Machine learning is the process of using past experience to predict the future. There are many machine learning methods, such as neural nets, support vector machines, and decision trees. The design trade-offs involved in optimizing them are a tricky business, still more art than science.
"Ensembles" are a machine learning meta-method that can be applied to most machine learning algorithms. Ensembles generally improve accuracy substantially, provably do no harm, are admirably suited to parallel and distributed computation, and are delightfully weird and counterintuitive. Further, when used properly, they can greatly reduce, or even remove, most of the design effort that goes into building the best machine learning method for a given data set.
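To make the "meta-method" idea concrete, here is a minimal sketch of bagging (bootstrap aggregating), one common ensemble technique; the abstract does not say which ensemble methods the talk covers. The wrapper `bag` treats the base learner as a black box: it trains copies on bootstrap resamples of the data and combines them by majority vote. The names `bag` and `train_stump` are hypothetical, the decision stump is only a stand-in for "existing machine learning code", and binary labels 0/1 are assumed.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

# A predictor maps one feature vector to a class label (assumed 0 or 1).
Predictor = Callable[[Sequence[float]], int]
# A learner is any training function that returns a predictor.
Learner = Callable[[List[Sequence[float]], List[int]], Predictor]


def train_stump(X: List[Sequence[float]], y: List[int]) -> Predictor:
    """Hypothetical base learner: a one-feature threshold rule (decision stump)."""
    best = None  # (errors, feature, threshold, label_if_above)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):
                errors = sum(
                    (above if row[f] > t else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda row: above if row[f] > t else 1 - above


def bag(learner: Learner, X: List[Sequence[float]], y: List[int],
        n_models: int = 25, seed: int = 0) -> Predictor:
    """Train n_models copies of `learner` on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(learner([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row: Sequence[float]) -> int:
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Toy one-feature data set: label is 1 when the feature exceeds 0.5.
    X = [[i / 10] for i in range(10)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    model = bag(train_stump, X, y)
    print([model(row) for row in X])
```

Because `bag` only calls `learner(X, y)` and the returned predictor, the same wrapper could be placed around a different training function unchanged; an odd `n_models` avoids tied votes for two classes.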
This talk will provide a terse introduction to machine learning ensembles: what they are, various theories on why they work, and how they can be applied simply to improve existing machine learning code in situ. It will then summarize a decade's research into how best to *use* ensemble methods, including late-breaking news on how a small change to decision tree learning methods can vastly improve the performance of their ensembles on heavily imbalanced data.
* Open to the general public.