Pig latin

2008 · 1,741 citations · Journal Article

Authors

Christopher Olston · Yahoo (United States)

Utkarsh Srivastava · Yahoo (United States)

Abstract

There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative SQL style unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse.

Topics & Keywords

Advanced Database Systems and Queries · Data Management and Algorithms · Cloud Computing and Resource Management

UN Sustainable Development Goals

Industry, innovation and infrastructure

Publication Details

DOI: 10.1145/1376616.1376726

Field-Weighted Citation Impact: 167.93