
NMH: Weblogging Data Challenge

I've complained before about the lack of open data sets for weblog researchers and hackers. Be careful what you wish for...

I received the challenge data for the 2006 Weblogging Ecosystem Workshop. The first day of the set is a 200-odd MB compressed file that expands to over 900 MB. Yowsa!! And there are 17 days of this stuff.

Suffice it to say you can't just slurp the whole mess into an in-memory data structure and start noodling about. Gonna have to creatively slice this iceberg into somewhat more manageable chunks.
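Here's a rough sketch of what that slicing might look like in Python. It rests on a couple of assumptions: that the daily file is gzip-compressed and that the data is line-oriented. Neither has been checked against the actual challenge data, and the function name, the chunk naming scheme, and the `lines_per_chunk` parameter are all made up for illustration. The idea is just to stream the file and never hold more than one chunk in memory.

```python
import gzip
from itertools import islice
from pathlib import Path


def split_gzip_by_lines(src, out_dir, lines_per_chunk=100_000):
    """Stream a gzip file and rewrite it as a series of smaller gzip files.

    Assumes (hypothetically) a gzip-compressed, line-oriented input.
    Lines are handled as raw bytes, so no text encoding is assumed.
    Returns the list of chunk paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with gzip.open(src, "rb") as reader:
        while True:
            # Pull at most lines_per_chunk lines off the stream.
            chunk = list(islice(reader, lines_per_chunk))
            if not chunk:
                break
            path = out_dir / f"chunk-{len(written):05d}.gz"
            with gzip.open(path, "wb") as writer:
                writer.writelines(chunk)
            written.append(path)
    return written
```

Calling something like `split_gzip_by_lines("day01.gz", "day01-chunks")` (both names are placeholders) would leave a directory of chunk files that can each be loaded and noodled with on their own. Splitting on line count is only one option. If the records turn out to span multiple lines, the split would have to happen on record boundaries instead.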

© Brian M. Dennis.