Scaling up machine learning

2011 · 282 citations · Journal Article

Authors

Ron Bekkerman · LinkedIn (United States)

John Langford · Research!America (United States)

Abstract

This tutorial gives a broad view of modern approaches for scaling up machine learning and data mining methods on parallel/distributed platforms. Demand for scaling up machine learning is task-specific: for some tasks it is driven by the enormous dataset sizes, for others by model complexity or by the requirement for real-time prediction. Selecting a task-appropriate parallelization platform and algorithm requires understanding their benefits, trade-offs and constraints. This tutorial focuses on providing an integrated overview of state-of-the-art platforms and algorithm choices. These span a range of hardware options (from FPGAs and GPUs to multi-core systems and commodity clusters), programming frameworks (including CUDA, MPI, MapReduce, and DryadLINQ), and learning settings (e.g., semi-supervised and online learning). The tutorial is example-driven, covering a number of popular algorithms (e.g., boosted trees, spectral clustering, belief propagation) and diverse applications (e.g., recommender systems and object recognition in vision).

Topics & Keywords

Neural Networks and Applications · Data Stream Mining Techniques · Machine Learning and Data Classification

Publication Details

DOI: 10.1145/2107736.2107740

Field-Weighted Citation Impact: 21.20
