Sunday, January 3, 2010

MapReduce vs. Parallel Database

First there was Google's MapReduce paper (OSDI '04).

Then came a paper from Brown/Wisconsin/Yale/MS/MIT, whose main point was that MapReduce's performance is no match at all for a modern parallel database (SIGMOD '09).

Now the Google camp is firing back (Communications of the ACM '10):

The query languages built into parallel database systems are also used to express the type of computations supported by MapReduce. A 2009 paper by Andrew Pavlo et al. (referred to here as the "comparison paper" [13]) compared the performance of MapReduce and parallel databases. It evaluated the open source Hadoop implementation [10] of the MapReduce programming model, DBMS-X (an unidentified commercial database system), and Vertica (a column-store database system from a company co-founded by one of the authors of the comparison paper). Earlier blog posts by some of the paper's authors characterized MapReduce as "a major step backwards." [5, 6] In this article, we address several misconceptions about MapReduce in these three publications:

* MapReduce cannot use indices and implies a full scan of all input data;
* MapReduce input and outputs are always simple files in a file system; and
* MapReduce requires the use of inefficient textual data formats.

We also discuss other important issues:

* MapReduce is storage-system independent and can process data without first requiring it to be loaded into a database. In many cases, it is possible to run 50 or more separate MapReduce analyses in complete passes over the data before it is possible to load the data into a database and complete a single analysis;
* Complicated transformations are often easier to express in MapReduce than in SQL; and
* Many conclusions in the comparison paper were based on implementation and evaluation shortcomings not fundamental to the MapReduce model; we discuss these shortcomings later in this article.

We encourage readers to read the original MapReduce paper and the comparison paper for more context.

http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext

That last sentence, in plain language, pretty much means "go back and study for a few more years before you come pick a fight."
