Wednesday, December 15, 2010

Hive, SQL-like, Facebook and the map join optimization

Hive is the SQL-like query processor on top of Hadoop, the open-source MapReduce implementation. Facebook recently described an interesting optimization based on the map join, which avoids the shuffle, sort, and reduce stages altogether: "The motivation of map join is to save the shuffle and reduce stages and do the join work only in the map stage. By doing so, when one of the join tables is small enough to fit into the memory, all the mappers can hold the data in memory and do the join work there. So all the join operations can be finished in the map stage. However there are some scaling problems with this type of map join. When thousands of mappers read the small join table from the Hadoop Distributed File System (HDFS) into memory at the same time, the join table easily becomes the performance bottleneck, causing the mappers to time out during the read operation".
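
To make the idea concrete, here is a rough Python sketch of a map-side hash join (just an illustration with made-up tables, not Hive code): every mapper keeps the small table in a hash map and joins its own split of the large table, so no shuffle or reduce stage is needed.

```python
# A rough sketch of a map-side hash join; the tables are made up.
small_table = [(1, "US"), (2, "IT"), (3, "FR")]             # (country_id, country)
big_table_split = [("a", 1), ("b", 3), ("c", 2), ("d", 1)]  # (user, country_id) seen by one mapper

def mapper(split):
    # Every mapper builds the same in-memory hash table from the small table...
    lookup = dict(small_table)
    # ...and joins its split of the big table right in the map stage,
    # so the shuffle/sort/reduce machinery is never involved.
    for user, country_id in split:
        if country_id in lookup:
            yield user, lookup[country_id]

print(list(mapper(big_table_split)))
# [('a', 'US'), ('b', 'FR'), ('c', 'IT'), ('d', 'US')]
```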

This is the optimization: "Hive-1641 solves this scaling problem. The basic idea of optimization is to create a new MapReduce local task just before the original join MapReduce task. This new task reads the small table data from HDFS to an in-memory hash table. After reading, it serializes the in-memory hash table into a hashtable file. In the next stage, when the MapReduce task is launching, it uploads this hashtable file to the Hadoop distributed cache, which populates these files to each mapper's local disk. So all the mappers can load this persistent hashtable file back into memory and do the join work as before."
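
The same mechanism, sketched in Python (again only an illustration, not the actual Hive implementation): a single local task materializes the hash table into a file, and each mapper simply deserializes that file from local disk, as if it had been shipped there by the distributed cache.

```python
import os
import pickle
import tempfile

small_table = [(1, "US"), (2, "IT"), (3, "FR")]  # hypothetical small join table

def local_task(rows, path):
    # Run once, before the join job: build the hash table and
    # serialize it into a "hashtable file".
    with open(path, "wb") as f:
        pickle.dump(dict(rows), f)

def mapper(split, path):
    # Each mapper loads the pre-built hash table from local disk
    # (standing in for the distributed cache) instead of re-reading
    # and re-hashing the small table from HDFS.
    with open(path, "rb") as f:
        lookup = pickle.load(f)
    for user, country_id in split:
        if country_id in lookup:
            yield user, lookup[country_id]

hashtable_file = os.path.join(tempfile.mkdtemp(), "small_table.hashtable")
local_task(small_table, hashtable_file)
print(list(mapper([("a", 1), ("b", 3)], hashtable_file)))
# [('a', 'US'), ('b', 'FR')]
```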

So a pre-computation step builds the hash table before the join operation itself, and the serialized result is then pre-loaded into the memory of each local mapper. The reported performance gains are impressive: the optimized map join is 12 to 26 times faster than the previous implementation.

I wonder how a fast compression scheme could help here. Paolo?
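
Just to picture what I mean (this is pure speculation on my side, not something in HIVE-1641): the hashtable file could be compressed with a fast codec before going into the distributed cache and decompressed in the mapper before loading.

```python
import pickle
import zlib

# Same hypothetical small join table as before.
small_table = {1: "US", 2: "IT", 3: "FR"}

# "Local task" side: serialize the hash table and compress it with a
# fast codec (zlib level 1 here) before it is shipped to the mappers.
payload = zlib.compress(pickle.dumps(small_table), 1)

# Mapper side: decompress and deserialize, then join as before.
# (On a real, larger table the payload shipped around would shrink;
# this toy table is too small to show any benefit.)
lookup = pickle.loads(zlib.decompress(payload))
assert lookup[3] == "FR"
```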
