Google engineer Mihai Parparita did a study of the webfeeds subscribed to in Google Reader. The goal of the work was to see which RSS/Atom namespace extensions were being used the most. The results were interesting, albeit to a narrow audience of aggregator builders and webfeed wonks like me. Still, it's good to have the information out there.
I was also struck by this casual, tossed-off line from Parparita:
"I wrote a small MapReduce program to go over our BigTable and get the top 50 namespaces based on the number of feeds that use them."
I haven't actually seen this code, but it feels like this was a one-day hack.
Across a huge number of subscription lists.
Using a large number of parallel machines.
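For a sense of how small such a program can be, here is a rough single-machine sketch of the same map-and-reduce shape in Python. To be clear, this is not Parparita's code, which I haven't seen. The feed records, the `namespaces` field, and the function names `map_feed`, `reduce_counts`, and `top_namespaces` are all made up for illustration, and a real MapReduce job would hand the map and reduce steps to a framework that spreads them across machines.

```python
from collections import Counter
from itertools import chain


def map_feed(feed):
    """Map step: emit (namespace, 1) once per distinct namespace in a feed."""
    # set() ensures a feed that declares a namespace twice counts only once.
    return [(ns, 1) for ns in set(feed["namespaces"])]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each namespace."""
    totals = Counter()
    for ns, n in pairs:
        totals[ns] += n
    return totals


def top_namespaces(feeds, k=50):
    """Return the k namespaces used by the most feeds, with their feed counts."""
    pairs = chain.from_iterable(map_feed(f) for f in feeds)
    return reduce_counts(pairs).most_common(k)


# Hypothetical sample records standing in for rows of feed data.
DC = "http://purl.org/dc/elements/1.1/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"
WFW = "http://wellformedweb.org/CommentAPI/"

sample_feeds = [
    {"url": "feed-a", "namespaces": [DC, CONTENT]},
    {"url": "feed-b", "namespaces": [DC, DC]},
    {"url": "feed-c", "namespaces": [DC, CONTENT, WFW]},
]

if __name__ == "__main__":
    for ns, count in top_namespaces(sample_feeds, k=2):
        print(count, ns)
```

The interesting part is what the sketch leaves out: partitioning the input, scheduling work on many machines, and recovering from failures. Those are the pieces the infrastructure takes off the engineer's plate.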
I'm engaging in a bit of speculation, but I think this is another example of how Google has a powerful, web-scale programming tool, developed by research, that enables frontline engineers to be creative. It's probably fairly rare in the distributed, parallel, and high-performance computing communities to hear a sentence start with, "I wrote a small Foo program to..."