Nikunj C. Oza's Publications


AveBoost2: Boosting for Noisy Data

AveBoost2: Boosting for Noisy Data. Nikunj C. Oza. In Fifth International Workshop on Multiple Classifier Systems, pp. 31–40, Springer-Verlag, Cagliari, Italy, June 2004.

Download

[PDF] (211.6 kB)

Abstract

AdaBoost [frsc96] is a well-known ensemble learning algorithm that constructs its base models in sequence. AdaBoost constructs a distribution over the training examples to create each base model. This distribution, represented as a vector, is constructed to be orthogonal to the vector of mistakes made by the previous base model in the sequence [kiwa99]. We previously [oza03] developed an algorithm, AveBoost, that constructed distributions orthogonal to the mistake vectors of all the previous models, and then averaged them to create the next base model's distribution. Our experiments demonstrated the superior accuracy of this approach. In this paper, we slightly revise our algorithm to obtain non-trivial theoretical results: bounds on the training error and the generalization error (the difference between training and test error). Our averaging process has a regularizing effect, which leads to a worse training-error bound for our algorithm than for AdaBoost but a better generalization-error bound. For this paper, we experimented with the data that we used in [oza03], both as originally supplied and with added label noise (some of the data has its original label changed). Our algorithm's performance improvement over AdaBoost is even greater on the noisy data than on the original data.
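
The averaging step described in the abstract lends itself to a short illustration. The Python sketch below is only a rough reading of that description, not the paper's reference implementation: it assumes binary labels in {-1, +1}, scikit-learn decision stumps as base models, and a running-average update of the training distribution; the names aveboost_sketch and predict are hypothetical.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def aveboost_sketch(X, y, n_rounds=10):
        """Sketch of an AveBoost-style loop: an AdaBoost re-weighting step
        followed by averaging with the previous training distribution."""
        n = len(y)
        d = np.full(n, 1.0 / n)               # current training distribution
        models, alphas = [], []
        for t in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=d)
            pred = stump.predict(X)
            miss = (pred != y).astype(float)  # mistake vector of this model
            eps = float(np.dot(d, miss))      # weighted training error
            if eps <= 0.0 or eps >= 0.5:
                break
            alpha = 0.5 * np.log((1.0 - eps) / eps)
            models.append(stump)
            alphas.append(alpha)
            # Standard AdaBoost update: up-weight mistakes, down-weight
            # correctly classified examples, then renormalize.
            c = d * np.exp(alpha * np.where(miss == 1.0, 1.0, -1.0))
            c /= c.sum()
            # AveBoost-style step (assumed from the abstract): take a running
            # average of the new distribution with the previous one instead
            # of adopting c directly.
            d = ((t + 1) * d + c) / (t + 2)
        return models, alphas

    def predict(models, alphas, X):
        # Weighted-majority vote of the base models.
        score = sum(a * m.predict(X) for m, a in zip(models, alphas))
        return np.sign(score)

In this sketch the averaging constrains how far each round's distribution can move away from the previous one, which is one way to read the regularizing effect the abstract attributes to the averaging process.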

BibTeX Entry

@inproceedings{oza04,
	author={Nikunj C. Oza},
	title={AveBoost2: Boosting for Noisy Data},
	booktitle={Fifth International Workshop on Multiple Classifier Systems},
	publisher={Springer-Verlag},
	address={Cagliari, Italy},
	editor={Fabio Roli and Josef Kittler and Terry Windeatt},
	pages={31--40},
	month={June},
abstract={AdaBoost~\cite{frsc96} is a well-known ensemble learning algorithm
that constructs its \emph{base} models in sequence. AdaBoost
constructs a distribution over the training examples to create each
base model. This distribution, represented as a vector, is constructed
to be orthogonal to the vector of mistakes made by the previous base
model in the sequence~\cite{kiwa99}. We previously~\cite{oza03}
developed an algorithm, AveBoost, that constructed distributions
orthogonal to the mistake vectors of all the previous models, and then
averaged them to create the next base model's distribution. Our
experiments demonstrated the superior accuracy of this approach. In
this paper, we slightly revise our algorithm to obtain non-trivial
theoretical results: bounds on the training error and generalization
error (difference between training and test error). Our averaging
process has a regularizing effect which leads us to a worse training
error bound for our algorithm than for AdaBoost but a better
generalization error bound. For this paper, we experimented with the
data that we used in~\cite{oza03} both as originally supplied and with
added label noise---some of the data has its original label
changed. Our algorithm's performance improvement over AdaBoost is even
greater on the noisy data than on the original data.
},
	bib2html_pubtype={Refereed Conference},
	bib2html_rescat={Ensemble Learning},
	year={2004}
	}

Generated by bib2html.pl (written by Patrick Riley) on Sun Jan 13, 2008 22:02:08