Sunday, November 23, 2008

Sorting 1PB with MapReduce

This is just crazy. If you are a computer person, you understand just how crazy this is. If you aren't, imagine everything in your life: all the junk mail, the junk email, everything on your computer, everything in your cupboards, under the sink and in the sock drawers, that one junk drawer everyone has (yes, you do!), the nightstand, the basement and the garage. Basically every single thing that you and everyone in your city has ever owned, possessed, touched, looked at, or smelled in the last 5 years. Now imagine sorting *all* of that in, oh, a few seconds. All of it...every piece...sorted, organized, and arranged...oh yeah, and copied 3 times (before the sort) for disaster recovery.

Official Google Blog: Sorting 1PB with MapReduce: "We are excited to announce we were able to sort 1TB (stored on the Google File System as 10 billion 100-byte records in uncompressed text files) on 1,000 computers in 68 seconds. By comparison, the previous 1TB sorting record is 209 seconds on 910 computers.

Sometimes you need to sort more than a terabyte, so we were curious to find out what happens when you sort more and gave one petabyte (PB) a try. One petabyte is a thousand terabytes, or, to put this amount in perspective, it is 12 times the amount of archived web data in the U.S. Library of Congress as of May 2008. In comparison, consider that the aggregate size of data processed by all instances of MapReduce at Google was on average 20PB per day in January 2008."
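For the computer people, here is a quick back-of-the-envelope check of the 1TB numbers in that quote. It's a rough sketch that assumes decimal units (1 TB = 10^12 bytes) and counts only the input data once. It ignores replication, intermediate shuffle traffic, and everything else the announcement doesn't break down.

```python
# Back-of-the-envelope math on the quoted 1TB sort figures.
# Assumption: decimal units (1 TB = 10**12 bytes); replication and shuffle I/O are ignored.
RECORDS = 10_000_000_000   # 10 billion records (from the quote)
RECORD_BYTES = 100         # 100 bytes per record (from the quote)
TB = 10**12
PB = 10**15

def throughput(total_bytes, seconds, machines):
    """Return (aggregate bytes/second, bytes/second per machine)."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / machines

total = RECORDS * RECORD_BYTES                     # 10 billion x 100 bytes
new_agg, new_per = throughput(total, 68, 1_000)    # Google: 68 s on 1,000 computers
old_agg, old_per = throughput(total, 209, 910)     # previous record: 209 s on 910 computers

print(f"dataset size: {total / TB:.0f} TB")
print(f"new record: {new_agg / 1e9:.1f} GB/s total, {new_per / 1e6:.1f} MB/s per machine")
print(f"old record: {old_agg / 1e9:.1f} GB/s total, {old_per / 1e6:.1f} MB/s per machine")
print(f"wall-clock speedup: {209 / 68:.2f}x")
print(f"1 PB is {PB // TB} TB")
```

By this math, 10 billion 100-byte records come to exactly 1TB. Google's run moved about 14.7 GB/s in aggregate, roughly 14.7 MB/s per machine, against about 5.3 MB/s per machine for the old record. So the run was about 3x faster on the clock and still roughly 2.8x faster per computer.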
