AdaBoost suggestions #16

@paul-rogers

Thanks much for putting this material together!

Looking at Lucky 13: AdaBoost. A few items are a bit unclear to us newbies.

First, in the fit() method there is just a single pass over the data X, while the original Freund & Schapire (1995) paper suggests looping for T iterations, refitting the classifiers on each pass based on the evolving weights. It looks like the version here is based on Zhu et al. (2009). It might be worth a few words to explain the source of the algorithm, and also why this version needs to make only one pass over the samples. A sketch of the classic multi-round loop follows for comparison.
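
For reference, here is a minimal sketch of the multi-round loop as I understand it from the paper, assuming a decision-stump weak learner. The `DecisionStump` and `adaboost_fit` names are illustrative, not the repo's API:

```python
import numpy as np


class DecisionStump:
    """One-feature threshold classifier, chosen by weighted error."""

    def fit(self, X, y, w):
        _, n_features = X.shape
        best_err = np.inf
        for j in range(n_features):
            for thr in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(polarity * X[:, j] < polarity * thr, -1, 1)
                    err = w[pred != y].sum()   # weighted misclassification error
                    if err < best_err:
                        best_err = err
                        self.feature_idx, self.threshold, self.polarity = j, thr, polarity
        return best_err

    def predict(self, X):
        col = X[:, self.feature_idx]
        return np.where(self.polarity * col < self.polarity * self.threshold, -1, 1)


def adaboost_fit(X, y, n_rounds=5):
    """Run T boosting rounds; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionStump()
        err = stump.fit(X, y, w)         # refit the weak learner on the current weights
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * stump.predict(X))   # up-weight the mistakes
        w /= w.sum()                                 # renormalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```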

Second, just from a learning perspective, it would be great to provide a data set that mimics the illustrations in the video, so we can verify that things work as expected. For extra credit, use Matplotlib to create the decision-boundary visualization from the video. A rough sketch of both is included below.
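
Something like this might work as a starting point. It reuses the illustrative `adaboost_fit`/`DecisionStump` names from the loop sketch above, so it's a rough outline rather than a drop-in test:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs with labels in {-1, +1}, similar to the clusters in the video.
X = np.vstack([rng.normal(-1, 0.8, (100, 2)), rng.normal(1, 0.8, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

stumps, alphas = adaboost_fit(X, y, n_rounds=5)

def predict(X):
    # Weighted vote of the stumps, thresholded at zero.
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)

# Evaluate the ensemble on a grid and shade the two decision regions.
xx, yy = np.meshgrid(np.linspace(-4, 4, 300), np.linspace(-4, 4, 300))
zz = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3, cmap="coolwarm")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
plt.title("AdaBoost decision boundary on a toy two-blob dataset")
plt.show()
```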

Third, it might be worth pointing out refinements a real design would need. For example, here are the decision stumps created from the test code. Notice that feature 23 is used twice: same polarity, just a different threshold. Is this a limitation of this simple example, or actually a useful quirk of AdaBoost? (See the comparison after the listing below.)

```
0: {'polarity': -1, 'feature_idx': 27, 'threshold': 0.1424, 'alpha': 1.2271759901553476}
1: {'polarity': -1, 'feature_idx': 23, 'threshold': 728.3, 'alpha': 0.9273811402788633}
2: {'polarity': -1, 'feature_idx': 1, 'threshold': 19.98, 'alpha': 0.7916733128875748}
3: {'polarity': -1, 'feature_idx': 23, 'threshold': 876.5, 'alpha': 0.6099992009200025}
4: {'polarity': -1, 'feature_idx': 26, 'threshold': 0.2177, 'alpha': 0.5775069918855832}
```
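
For what it's worth, reusing a feature at a new threshold appears to be normal AdaBoost behavior: once the weights shift, a different cut on the same feature can again minimize the weighted error. The thresholds above look like scikit-learn's breast-cancer data (e.g. 728.3 for feature 23, "worst area"); if that is indeed the test data, the library implementation shows the same pattern. A quick check (using `estimator=`, which was `base_estimator=` before scikit-learn 1.2):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # depth-1 trees = decision stumps
    n_estimators=5,
    algorithm="SAMME",   # discrete AdaBoost; this argument is deprecated on scikit-learn >= 1.6
    random_state=0,
)
clf.fit(X, y)
for i, (stump, alpha) in enumerate(zip(clf.estimators_, clf.estimator_weights_)):
    # The root node of a depth-1 tree holds the chosen feature and threshold.
    print(i, stump.tree_.feature[0], stump.tree_.threshold[0], alpha)
```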
