
**AveBoost2: Boosting for Noisy Data**. Nikunj C. Oza. In *Fifth International
Workshop on Multiple Classifier Systems*, pp. 31–40, Springer-Verlag, Cagliari, Italy, June 2004.

AdaBoost \cite{frsc96} is a well-known ensemble learning algorithm that constructs its *base* models in sequence. AdaBoost constructs
a distribution over the training examples to create each base model. This distribution, represented as a vector, is constructed to
be orthogonal to the vector of mistakes made by the previous base model in the sequence \cite{kiwa99}. We previously \cite{oza03} developed
an algorithm, AveBoost, that constructed distributions orthogonal to the mistake vectors of all the previous models, and then averaged
them to create the next base model's distribution. Our experiments demonstrated the superior accuracy of this approach. In this
paper, we slightly revise our algorithm to obtain non-trivial theoretical results: bounds on the training error and generalization error
(difference between training and test error). Our averaging process has a regularizing effect which leads us to a worse training error
bound for our algorithm than for AdaBoost but a better generalization error bound. For this paper, we experimented with the data
that we used in \cite{oza03} both as originally supplied and with added label noise---some of the data has its original
label changed. Our algorithm's performance improvement over AdaBoost is even greater on the noisy data than the original data.
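The core mechanical difference from AdaBoost described above can be sketched in a few lines. This is a toy illustration only, not the paper's exact AveBoost2 algorithm: the one-dimensional decision-stump base learner, the function names (`aveboost_sketch`, `best_stump`), and the dataset are invented for this sketch. Each round applies the usual AdaBoost-style reweighting (which makes the new distribution orthogonal to the latest model's mistake vector) and then averages it with the running distribution from all previous rounds.

```python
import numpy as np

def best_stump(x, y, w):
    """Weighted decision stump on a 1-D feature: returns (error, threshold, polarity)."""
    best = (np.inf, None, None)
    for thr in np.unique(x):
        for s in (1.0, -1.0):
            pred = s * np.sign(x - thr + 1e-12)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, thr, s)
    return best

def aveboost_sketch(x, y, rounds=5):
    """Toy AveBoost-style loop: AdaBoost reweighting followed by averaging."""
    n = len(x)
    c = np.full(n, 1.0 / n)          # averaged distribution used to train each model
    models, alphas = [], []
    for t in range(1, rounds + 1):
        err, thr, s = best_stump(x, y, c)
        err = min(max(err, 1e-10), 0.5 - 1e-10)   # clamp away from 0 and 1/2
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = s * np.sign(x - thr + 1e-12)
        # AdaBoost-style update: orthogonal to this model's mistake vector
        d = c * np.exp(-alpha * y * pred)
        d /= d.sum()
        # AveBoost twist: average the new distribution with the previous ones
        c = (t * c + d) / (t + 1)
        models.append((thr, s))
        alphas.append(alpha)
    return models, alphas

def predict(models, alphas, x):
    """Weighted-majority vote of the trained stumps."""
    total = sum(a * s * np.sign(x - thr + 1e-12)
                for (thr, s), a in zip(models, alphas))
    return np.sign(total)
```

On a threshold-separable toy problem (e.g. `x = np.arange(10.0)`, `y = np.where(x >= 5, 1.0, -1.0)`), the sketch recovers the labels; the averaging step simply damps how fast the training distribution can move between rounds, which is the regularizing effect the abstract refers to.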

@inproceedings{oza04,
  author    = {Nikunj C. Oza},
  title     = {AveBoost2: Boosting for Noisy Data},
  booktitle = {Fifth International Workshop on Multiple Classifier Systems},
  publisher = {Springer-Verlag},
  address   = {Cagliari, Italy},
  editor    = {Fabio Roli and Josef Kittler and Terry Windeatt},
  pages     = {31--40},
  month     = {June},
  abstract  = {AdaBoost~\cite{frsc96} is a well-known ensemble learning algorithm that constructs its \emph{base} models in sequence. AdaBoost constructs a distribution over the training examples to create each base model. This distribution, represented as a vector, is constructed to be orthogonal to the vector of mistakes made by the previous base model in the sequence~\cite{kiwa99}. We previously~\cite{oza03} developed an algorithm, AveBoost, that constructed distributions orthogonal to the mistake vectors of all the previous models, and then averaged them to create the next base model's distribution. Our experiments demonstrated the superior accuracy of this approach. In this paper, we slightly revise our algorithm to obtain non-trivial theoretical results: bounds on the training error and generalization error (difference between training and test error). Our averaging process has a regularizing effect which leads us to a worse training error bound for our algorithm than for AdaBoost but a better generalization error bound. For this paper, we experimented with the data that we used in~\cite{oza03} both as originally supplied and with added label noise---some of the data has its original label changed. Our algorithm's performance improvement over AdaBoost is even greater on the noisy data than the original data.},
  bib2html_pubtype = {Refereed Conference},
  bib2html_rescat  = {Ensemble Learning},
  year      = {2004}
}

Generated by bib2html.pl (written by Patrick Riley) on Sun Jan 13, 2008 22:02:08