Nikunj C. Oza's Publications


Online Ensemble Learning

Online Ensemble Learning. Nikunj C. Oza. Ph.D. Thesis, The University of California, Berkeley, CA, 2001.

Download

[PDF] 659.7 kB

Abstract

This thesis presents online versions of the popular bagging and boosting algorithms. We demonstrate theoretically and experimentally that the online versions perform comparably to their original batch counterparts in terms of classification performance. However, our online algorithms yield the typical practical benefits of online learning algorithms when the amount of training data available is large.

Ensemble learning algorithms have become extremely popular over the last several years because these algorithms, which generate multiple base models using traditional machine learning algorithms and combine them into an ensemble model, have often demonstrated significantly better performance than single models. Bagging and boosting are two of the most popular algorithms because of their good empirical results and theoretical support. However, most ensemble algorithms operate in batch mode, i.e., they repeatedly read and process the entire training set. Typically, they require at least one pass through the training set for every base model to be included in the ensemble. The base model learning algorithms themselves may require several passes through the training set to create each base model. In situations where data is being generated continuously, storing data for batch learning is impractical, which makes using these ensemble learning algorithms impossible. These algorithms are also impractical in situations where the training set is large enough that reading and processing it many times would be computationally prohibitive.

This thesis describes online versions of bagging and boosting. Unlike the batch versions, our online versions require only one pass through the training examples, in order, regardless of the number of base models to be combined. We discuss how we derive the online algorithms from their batch counterparts, as well as theoretical and experimental evidence that our online algorithms perform comparably to the batch versions in terms of classification performance. We also demonstrate that our online algorithms have the practical advantage of lower running time, especially for larger datasets. This makes our online algorithms practical for machine learning and data mining tasks where the amount of training data available is very large.
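
The single-pass idea is concrete enough to sketch. In this work (see also the companion paper, Oza and Russell, "Online Bagging and Boosting", 2001), online bagging replaces the batch bootstrap with a Poisson draw: each arriving example is presented k times to each base model, with k drawn from Poisson(1), which approximates sampling with replacement as the stream grows. The sketch below is illustrative, not the thesis code: the class name OnlineBagging, the use of scikit-learn's SGDClassifier as an incrementally trainable base learner, and the iris demo at the end are all assumptions of this sketch.

# A minimal sketch of online bagging: each example updates each base
# model k ~ Poisson(1) times, approximating the batch bootstrap without
# storing the training set. The base learner (scikit-learn's
# SGDClassifier) is an illustrative choice, not part of the thesis.
import numpy as np
from sklearn.linear_model import SGDClassifier

class OnlineBagging:
    def __init__(self, n_models, classes, seed=0):
        self.models = [SGDClassifier() for _ in range(n_models)]
        self.classes = np.asarray(classes)
        self.rng = np.random.default_rng(seed)

    def update(self, x, y):
        """Incorporate one (x, y) pair; one pass over the stream suffices."""
        x = np.asarray(x).reshape(1, -1)
        y = np.asarray([y])
        for model in self.models:
            # Poisson(1) stands in for bootstrap resampling: present this
            # example to the base model k times (possibly zero).
            for _ in range(self.rng.poisson(1.0)):
                model.partial_fit(x, y, classes=self.classes)

    def predict(self, x):
        """Unweighted majority vote over the base models trained so far."""
        x = np.asarray(x).reshape(1, -1)
        votes = [m.predict(x)[0] for m in self.models if hasattr(m, "coef_")]
        values, counts = np.unique(votes, return_counts=True)
        return values[np.argmax(counts)]

if __name__ == "__main__":
    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True)
    ensemble = OnlineBagging(n_models=10, classes=np.unique(y))
    for xi, yi in zip(X, y):        # simulate a data stream
        ensemble.update(xi, yi)
    print(ensemble.predict(X[0]))

Note that each example is seen exactly once, so the running time scales with the stream length rather than with (number of base models) x (passes over the data), which is the practical advantage the abstract describes.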

BibTeX Entry

@phdthesis{oza01,
	author = {Nikunj C. Oza},
	title = {Online Ensemble Learning},
	department = {Electrical Engineering and Computer Science},
	school = {The University of California},
	address = {Berkeley, CA},
	month = {Sep},
	year = {2001},
	abstract = {This thesis presents online versions of the popular
		bagging and boosting algorithms. We demonstrate
		theoretically and experimentally that the online versions
		perform comparably to their original batch counterparts in
		terms of classification performance. However, our online
		algorithms yield the typical practical benefits of online
		learning algorithms when the amount of training data
		available is large.
		Ensemble learning algorithms have become extremely popular
		over the last several years because these algorithms, which
		generate multiple \emph{base} models using traditional
		machine learning algorithms and combine them into an
		\emph{ensemble} model, have often demonstrated significantly
		better performance than single models. Bagging and boosting
		are two of the most popular algorithms because of their good
		empirical results and theoretical support. However, most
		ensemble algorithms operate in batch mode, i.e., they
		repeatedly read and process the entire training set.
		Typically, they require at least one pass through the
		training set for every base model to be included in the
		ensemble. The base model learning algorithms themselves may
		require several passes through the training set to create
		each base model. In situations where data is being generated
		continuously, storing data for batch learning is
		impractical, which makes using these ensemble learning
		algorithms impossible. These algorithms are also impractical
		in situations where the training set is large enough that
		reading and processing it many times would be
		computationally prohibitive. This thesis describes online
		versions of bagging and boosting. Unlike the batch versions,
		our online versions require only one pass through the
		training examples, in order, regardless of the number of
		base models to be combined. We discuss how we derive the
		online algorithms from their batch counterparts as well as
		theoretical and experimental evidence that our online
		algorithms perform comparably to the batch versions in terms
		of classification performance. We also demonstrate that our
		online algorithms have the practical advantage of lower
		running time, especially for larger datasets. This makes our
		online algorithms practical for machine learning and data
		mining tasks where the amount of training data available is
		very large.},
	bib2html_pubtype = {Other},
	bib2html_rescat = {Ensemble Learning}
}

Generated by bib2html.pl (written by Patrick Riley) on Sun Jan 13, 2008 22:02:08