
MacManus: Rhapsody Web Services

Nice overview of Rhapsody's Web Services, by Richard MacManus at Read/Write Web. Rhapsody is probably the biggest competitor to the iTunes Music Store, but does a much better job of leveraging streaming audio to support the celestial jukebox. If you actually read the Rhapsody Web Services SDK documentation, the API seems a bit thin. However, MacManus' post puts the facilities in context.

Interestingly, it's possible to get an RSS feed of your listening history.


Meunier and Silva: PLT Spy

Ask and ye shall receive. PLT Spy is an implementation of Python on top of PLT Scheme. The system was implemented by Philippe Meunier and Daniel Silva way back in 2003!! Shows how closely I've been paying attention.

Hat tip to Jens Axel Søgaard.


Aho, Lam, Sethi, Ullman: New Dragon Book

20 years on and now there's going to be a new edition of The Dragon Book, the canonical textbook on compilation. That $108 projected price is a bit daunting though!!

New features include discussions of interprocedural program analysis, garbage collection, just in time compilation, and optimization for parallel machines. They must cover instruction level parallelism somewhere and how it affects code generation. That's been a big change since 1986. Also, Addison-Wesley seems to be running some sort of online assignment tool called Gradiance, in conjunction with the book.

Of course the real question is, what's the cover going to look like? Will the Dragon still be there? How about the Knight? If so, what will their new editions look like?

Enquiring minds want to know!!


Felten: Net Neutrality Nuts & Bolts

If you haven't been keeping up with your network neutrality reading, or your position on the issue hasn't congealed into a fixed stance, Ed Felten's net neutrality backgrounder (PDF) makes for smart reading.


Torkington: Ning's Cool!

I have to admit that I had the same impressions of Ning that Nat Torkington did, and I don't even read TechCrunch. However, his writeup of a meeting with the Ning crew does a hell of a job selling the product. First off, he explains how out of the box Ning isn't for me, the "serious" hacker, but for communities of non-professional programmers, enabling a gentle slope for those who desire a Web application but don't have the obvious chops or connections. Second, the underlying Ning services seem eminently exploitable by a pro developer looking to outsource stuff like authentication and semi-structured data storage.

Like I said, I've been known to be wrong.


Hurst: Blogosphere Mapping

Matthew Hurst has been doing some interesting mapping and visualization of segments of the blogosphere.


Robb: Killing MS R&D

Riffing on an idle thought from John Robb. What would the impact on academic CS research be if MS started downsizing Microsoft Research? One could make an argument that educational institutions are already oversupplied with faculty material given current enrollments. This would just be piling on. I think you'd see a little boost in startups and a big boon for startups. There'd be a few more Paul Graham types running around to either start new things or pitch in with existing concerns. Also, Amazon, Yahoo, and Google would win big. You might also see some trickle over into other non-obvious disciplines like biology and the social sciences.

That would also be MS throwing in the towel on leadership in Web scale computing.


Broekema: CLPython

Neato! Willem Broekema has developed an implementation of Python written in Common Lisp. That makes for four different languages Python has been implemented in: C, Java, CL and Python. For PLDI snobs, two of the signs of a language being "real" are that it has multiple implementations, indicating a diversity of implementation strategies, and that it self-hosts, meaning that the language can be implemented within itself. Self-hosting also means that the language at least approaches having an operational semantics, if not a formal semantics. (Don't shoot me, real PLDI researchers!!) While there are plenty of other languages to hack around on, I would think working on program analysis and software engineering tools for Python code might be a valuable line of research.

Another interesting usage for such implementations is in the development of restricted execution environments. Say you want to give a Common Lisp application an embedded scripting language. The obvious choice is to use eval as an escape hatch into the same Common Lisp environment, but that's fraught with security issues. Instead you expose CLPython, where you can carefully craft the scripting environment to prevent bad things from happening.

Finally, I have to imagine CLPython could be straightforwardly transliterated to Scheme, whence one could start to show off all those snazzy applications of Scheme's continuations to implement Python's funky control structures.


Google: Academic Papers

Plus: Google updated its list of academic papers to 2005/2006.

Minus: Looks like the links to all the old papers disappeared. Except for a few high profile, Google specific papers.

Still, worth having around, although the new set of papers seems highly skewed to machine learning, trust, and cryptography in my eye. Makes sense given The Global War on Spam (TM). Not much on classic os/nsdi type stuff or distributed systems, though.

[Via ResourceShelf, which also has a bonus link to Microsoft's experimental Phlat desktop search project. I saw the talk on Phlat at CHI 2006 and the system looks pretty intriguing.]


Enthought: Python 2.4.3 Edition

Talk about the kitchen sink. Enthought packages up a bunch of freely available Python modules, some of which are tricky to build, and creates a nice, Windows only, installation. This amalgamation is particularly tuned to scientific computing, but a beginning Python programmer, or even budding computer science student, could do worse than grab the Enthought edition to get started with. This goes beyond batteries included to nuclear power plant included!


Kuchling: Python FP HOWTO

Andrew Kuchling warns that the current URL is unstable, but his HOWTO on functional programming in Python is a nice concise overview of functional programming features in the language. At the least, I learned of (actually probably relearned) the enumerate function. In geekspeak, enumerate generates, from a sequence, a sequence of indexes and sequence items. Exceedingly handy.
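For the curious, here's a minimal sketch of enumerate in action. The list of names is made up purely for illustration; nothing here comes from Kuchling's HOWTO.

```python
# Hypothetical sample data, purely for illustration.
languages = ["C", "Java", "Common Lisp", "Python"]

# enumerate pairs each item with its index, yielding (index, item) tuples.
pairs = list(enumerate(languages))

# The usual idiom: unpack each pair directly in the for loop,
# instead of juggling range(len(...)) and indexing by hand.
lines = []
for index, name in enumerate(languages):
    lines.append(f"{index}: {name}")
    print(f"{index}: {name}")
```

The win is mostly readability: the loop body gets the position and the item together without any manual bookkeeping.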


Linden: Personalization v Spam

Greg Linden nicely summarizes one argument for personalization in search and other forms of aggregation. If the results are personalized, the payoff for spammers decreases, hopefully disincenting (sp?) them from spam attempts.


NMH: Sayonara Evanston

Today I attended graduation for both Northwestern's College of Engineering and the College of Arts and Sciences. It was a graduation of sorts for me as well, as I sat with the other faculty seeing undergraduates off into a bright future. I told you guys I'd be out of this business by the age of 40, and if nothing else I try to be a man of my word. I'll keep this one by a cushy 9 months.

My appointment with NU officially ends the 31st of August. Within the next 36 hours, I'm going to grab a bunch of clothes, some books, and some computing gear, then road trip to the greater Washington, DC area where my wife's been for 9 months now. A native Chicagoan, she landed a gig that is a great career advance for her. For the rest of the summer, I'll be working with Medill's fellows in the News21 program, helping them implement web based, multimedia presentations of their reporting on privacy, civil liberties, and homeland security. Medill has a Washington, DC office so this works out perfectly. I should be back in the Chicago area a couple of times over the summer for various non-Northwestern related activities.

I'm not quite sure what I'll be doing after the summer, but if you need a versatile, multitasking, wicked fast learning, New Media Hacker yearning to scratch some new itches, I'm still available. DC area is a hard requirement unless you can give me enough money so my wife can retire. I could be convinced to commute to Chicago or NYC with the right perks.

New Media Hack will truck on for a few more months, while I wrap up loose ends here in Evanston. Depending on administrative largesse, it may even continue on into the next academic year. I'm debating whether to relocate the site, its current contents, and its sensibility, or reinvent in whole cloth. I'll have a decision in a month or two.

Ciao!!


Efimova: On ICWSM

Lilia wonders how the International Conference on Weblogs and Social Media will contrast with Blogtalk. If I had to guess from looking at the list of organizers and the program committee (a flyer was handed out at WWW '06) it'll be very North American, fairly academic (there's a decent number of industry types involved), and much more formal (probably a straightahead conference format). My guess is that there'll be plenty of iterations on text mining of weblogs, network analysis of blog communities, and dealing with the menace of splogs. Discussion on culture and practice of weblogs will probably be quite minor.

Not that there's anything wrong with that!

Paper deadline is December 8th, 2006. Conference dates, March 27th - 28th. Location, Boulder, Colorado, USA.


Henderson: Building Scalable Websites

Like many out there, I eagerly anticipated the arrival of Building Scalable Websites, by Cal Henderson. I got my copy this week and ripped through it in about 48 hours. Unfortunately, I was quite disappointed.

Probably, I was unconsciously holding the book up to the standard of Philip Greenspun's writings, including "Philip and Alex's Guide to Web Publishing" and "Internet Application Workbook", which are both still pretty good reads on the fundamentals of engineering web applications. Much has changed since the mid 90's, which is why Building Scalable Websites could have been a great update. My major complaints are three:

  • No Flickr war stories!! Where's the story about wiping out a terabyte of photos and having to miraculously rescue the data from barely working tape? Or rolling out a feature and subsequently having 36 hours of downtime? This is where Greenspun really excels, and it definitely helps break up the monotony of the rote listing of applicable technologies.

  • Shallowness. Overall, I felt the text covered topics just enough to convince a reader of the author's proficiency but not enough to transmit insight or at least the hairy details. For example, in Chapter 10, Spread is discussed as a technology for reliable multicasting of logging information. Good idea! And I know a little bit about Spread including the fact that Spread doesn't provide flow control, which means if a client can't keep up, data gets lost. I have to imagine this limitation is an issue in large scale websites, but no mention in the book. It felt like the discussion barely scraped the surface of the topic. I wonder how many of the other sections similarly lacked depth.

  • No images?! On the cover is a brazen black band in the corner with "The Flickr Way". I think there's exactly one image in the book, a poor photo at that. This is irony. It's also another area where Building Scalable... falls short in comparison to Philip and Alex's... . The images in that book, while astoundingly superfluous (depending on your sense of humor), were also astoundingly beautiful and helped break up the text of Philip's book, which never seemed to drag for me.

In short, I felt myself nodding off way too much, for small nuggets of wisdom. I'd still recommend purchasing the book, but more as a reference and case study to occasionally peek at and get a start at tackling a tough issue. However, you'll have to go to other texts when you eventually have to dig deeper. But what do I know. I didn't build Flickr.

Efimova: AAAI '06 Weblog Papers

Lilia Efimova collected links to papers presented at the AAAI 2006 Symposium on Computational Approaches to Analyzing Weblogs.

Speaking of the AAAI Weblog Symposium, some organizers from that meeting and the WWW '06 Weblog Ecosystem Symposium have banded together to launch a new standalone weblog conference: the International Conference on Weblogs and Social Media (site down as of this writing). The conference will first meet in March '07, with a submission deadline of November '06, if I remember correctly.


Wyman: PubSub in Trouble

Fooey. Looks like the PubSub company is in trouble and may have to close its doors soon. Like other search companies, they were having a tough time dealing with spam, and they never had a really sexy consumer facing product. The concepts and implementation of standing search combined with real-time notification, through a number of mechanisms, is a good idea, and well worth pursuing. A rich, new syndication ecosystem could be built on top of that foundation.


Nullsoft: Open Source AVS

With "copious spare time" due to the closeout of our spring quarter, I got to digging around in some old software gathering dust on a hard drive here and there. For a few years running I had students in my second quarter, intro programming course extend a plug-in for Winamp. My kludgy code to get them going sufficed for its purpose but I was always amazed by the Advanced Visualization Studio (AVS) plug-in. Spectacular visual effects with an extremely minimal data model and programming language. I'm not sure the AVS language is even Turing complete.

Last year, Nullsoft open sourced the plug-in code for AVS. I've always wondered about the core of what is essentially a programmable, real-time image processing engine. That should be an interesting read for anyone interested in doing computational art. Wonder how dependent on DirectX the code is.

Apropos of nothing, it seems like there's a Wikipedia entry for everything.


Offenhuber & Dirmoser: SemaSpace

Link parkin': SemaSpace looks like an interesting interactive graph visualization tool.

[Via infosthetics]


System One: Wikipedia3

The System One company has been converting the entire Wikipedia corpus into RDF. 47 million triples is a lot of data. A big heap o' XML data ready for machine manipulation. Drool.

The translation doesn't incorporate Wikipedia article text, but as toxi points out, big datasets for infoviz hacking aren't easy to come by.


Bumgardner: Building Tag Clouds

There's a whole e-book on "Building Tag Clouds in Perl and PHP". $9.99 for 48 pages. I'm not sure the subject needs that many pages, but then again tag cloud generation is an interesting case study for scripting languages.

Bumgardner also makes some cases for why you'd want to use tag clouds, other than being a web hipster.
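To give a flavor of why this makes a tidy scripting exercise, here's a minimal sketch of the core of a tag cloud, in Python rather than the e-book's Perl or PHP. The log scaling, the pixel sizes, the function name tag_cloud, and the sample tags are all my own assumptions for illustration, not Bumgardner's method.

```python
import math
from collections import Counter
from html import escape


def tag_cloud(tags, min_size=10, max_size=32):
    """Return HTML spans, one per distinct tag, sized by log-scaled frequency."""
    counts = Counter(tags)
    if not counts:
        return ""
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    spread = (hi - lo) or 1.0  # avoid dividing by zero when all counts match
    spans = []
    for tag in sorted(counts):
        # weight runs from 0.0 (rarest tag) to 1.0 (most common tag)
        weight = (math.log(counts[tag]) - lo) / spread
        size = round(min_size + weight * (max_size - min_size))
        spans.append(f'<span style="font-size:{size}px">{escape(tag)}</span>')
    return " ".join(spans)


# Hypothetical sample data.
sample = ["python"] * 8 + ["lisp"] * 2 + ["perl"]
cloud = tag_cloud(sample)
print(cloud)
```

Counting, scaling, and string templating: about twenty lines. The log scale is one common choice so that a single wildly popular tag doesn't flatten everything else to the minimum size.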


O'Reilly: Startup Cities

Tim O'Reilly has an analysis of cities which are generating the most startups. The methodology could be suspect, but at least it's clearly documented. The obvious suspects, SF Bay Area, Boston, Seattle, NYC are leading the pack, but I wouldn't have guessed DC #5. Chicago is a respectable ninth, but probably not so hot amortized over the entire population. Ditto LA.


Borevitz: State of the Union Explorer

Link parkin: Brad Borevitz's visualization of all of the US State of the Union Addresses. Borevitz also makes the source text of the addresses available.


NMH: NeWS Reminiscing

Apropos of nothing, is it possible that time has caught up with NeWS? Given the Pythonic features and increased sophistication anticipated in the next edition of JavaScript, the ubiquity of Flash, and sophisticated languages implemented on the Java VM, a.k.a. applets, you've got multiple high-level engines with 2D rendering horsepower equivalent to PostScript. Granted these platforms have their own programming languages, but the NeWS extensions to PostScript were quite elegant and made for a darn good windowing system.

Leafing through The NeWS Book, and other histories of NeWS, I was struck by how seemingly insignificant turns of fate have a big impact. Gosling helped port X10 to Sun hardware, and returned the source, eventually leading to The X Window System's wide adoption on the UNIX platform. Essentially, Gosling shot himself in the foot.

Still waiting for the day we can have windows shaped to arbitrary paths.


Croft: Django Overview

Speaking of Django, Jeff Croft has a nice overview of Django for non-programmers.


SixApart: MovableType 3.3

I haven't been big into MovableType since I handed off a couple of sites that use 3.2. Creaky old NMH is still at 2.51.

The new features of MovableType 3.3 look really juicy though. An improved plug-in mechanism and more extensibility of the admin interface were a couple of things I was really pining for.

Django or MT? MT or Django? Decisions, decisions.


NMH: Baseball Hacks in Hand

After a pre-release sighting earlier this year, I kept in the back of my mind the notion to buy Joseph Adler's Baseball Hacks. I got my dirty little mitts on a copy today.

Scanning through it, the book basically has three phases:

  1. Fundamentals of baseball and scoring
  2. Retrieving statistics from the Web and getting them into a database or statistics package
  3. Calculating all those statistics that SABRmetricians love

Pretty much what I anticipated, although I was surprised to find that there's actually a big hole in publicly available play-by-play data, from 1992 to 1999 I believe. Otherwise, the book did indeed satisfy my itch for sources of raw data. As an O'Reilly "Hacks" book, it doesn't appear to go into any one topic particularly deeply, which is fine. Although Statistics Hacks and a good intro stats book are probably good companions. In a riff on one of the book's hacks, it looks like the game log data for other sports is readily scrapable off of sites like CBS Sportsline or ESPN.com. Hmmmmmm!

Poe: Perl6 Features

I forced myself to learn Perl 5 a while ago, just so I could at least kvetch knowledgeably about the language. I wound up actually appreciating many of its features, although the OOP system and references are truly godawful.

The long promised Perl 6 may be arriving within our lifetimes, and given the feature set described by Curtis Poe I can only say one thing.

Common Lisp with Perl syntax is going to be really weird.

Honestly, he must have been kidding about that "Real, honest to goodness macros like LISP programmers enjoy,...", right? Right?! If not, that's one I have got to see. Macros and Algol syntax are like oil and water. If the Perl 6 guys have figured out a way to make it work, more power to 'em.

Greenspun's tenth law lives on.


Cutting, Cafarella, Bialecki: Hadoop

Once upon a time, I called Google's MapReduce capability a force multiplier. Looks like the militia are going to get this capability sooner rather than later.

Doug Cutting, of Lucene and Nutch fame, along with Mike Cafarella and Andrzej Bialecki, is developing Hadoop. Java based, Hadoop is an open source, large scale, distributed computing infrastructure, including a MapReduce implementation on the come.

[Via Tim Bray]
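
For anyone who hasn't bumped into the MapReduce model itself, here's a toy, single-process sketch of the idea in Python. To be loud about it: this is not Hadoop's API (Hadoop is Java) and there's nothing distributed going on. The function names and sample documents are hypothetical, chosen only to show the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; in a real system each document could live on a different machine.
documents = ["the hack is back", "the dragon is back"]
emitted = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
print(word_counts)
```

The appeal is that map_phase and reduce_phase never share state, so an infrastructure layer is free to farm them out across as many machines as it likes.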


WWW2006: Ecosystem Workshop Blog

Part of NMH's outage was due to attendance at WWW 2006 where I had a workshop paper. The day after my presentation, I crashed the Weblogging Ecosystem Workshop. The workshop blog recaps the many excellent papers. A couple of other nuggets came out regarding data to do experiments on. First, Intelliseek is extending the availability of the dataset provided for the workshop. Second, the venerable TREC put together a weblog test collection, housed at the University of Glasgow.


MS: Live Labs Grant Awards

At the beginning of February, I noted the launch of Microsoft's Live Labs and an accompanying academic grant program. The winners have been announced. Brief descriptions of the awardees' proposals are also available.

Since the key draw of the RFP was access to large scale query logs, data mining is a constantly recurring theme, although there is one winner investigating merging traditional search with del.icio.us style social search, and another project doing visualization of search results for health information.


De Bleser: NodeBox

Recently I've been fantasizing about a Processing-like environment, but built on top of Python instead of Java. Frederik De Bleser's NodeBox fits the bill. Mac OS only though. The NodeBox gallery has some pretty impressive stuff.

I've had a front row seat for the collapse of CS enrollment through five straight years of teaching the second quarter of our intro sequence. Now I'm thinking about alternative intro curricula that really get away from the command line/console straightjacket. I don't claim to be particularly original, but it seems to me you could carefully craft a set of courses that did the following:

  • Completely covers a standard ACM/IEEE CS Curriculum

  • Frames assignments in terms of interactive visual artifacts or media manipulation

  • Touches on the simple, but mind-blowing, concepts of CS, e.g: automata, complex networks, iterated function systems, multi-agent systems, etc.

At least at NU, our intro sequence is mired in the uninspiring minutiae of learning programming. Something like NodeBox looks like a good start to getting out of this trap.

[Via Daniel Jalkut, via Rui Carmo's The Tao of Mac]

Serrano, Gallesio, Loitsch: HOP

HOP is a language for programming the Web. My cursory scan indicates that the system has roughly the same architecture as OpenLaszlo for creating web apps. A domain specific language is compiled into a combination of client side DHTML code and server side business (in the broad sense) logic handlers. Scheme inside.

Looks like a neat framework, but the downer is that it can only target standards compliant browsers, a.k.a. anything besides IEs 5 and 6.


Gray & Vogels: Amazon Technology

Turing Award Winner Jim Gray interviews Amazon CTO Werner Vogels for ACM Queue. Good insights ensue.

Just a morsel, but Amazon proves the adage: "any useful complex system is grown from a simple working system."

I've talked about "Google scale" problems, but there's also a whole bin of "Amazon scale" problems which might not have the same order of magnitude of data to deal with, but whose interactivity requirements make for very different engineering challenges.


Holovaty: Mizzou JSchool Commencement

A few years ago I invited Adrian Holovaty to come to a seminar I organized here at Medill. The seminar was half baked but I'm glad to see Adrian recovered, made something of himself, and is lighting an entrepreneurial fire under some JSchool graduates.


NMH: The Hack is Back

Okay, I've been on hiatus for long periods before, but this is the first time the site has actually been down for a while. I got a sense that a feeling of panic was creeping throughout my vast audience. Anyhoo, a failing hard drive, a heap of travel, and the end of the quarter conspired to take creaky old costarica off the air and keep it that way for a few weeks. (Research tech support at NU? Shyah, right!).

Thankfully, the hard drive decided to make one last stand before I had to perform major surgery. New Media Hack has been backed up in all its glory so if/when the disk bites the bullet I'm in good shape to recover. Whew! I may just go proactive and cobble together a replacement machine when the quarter is over at the end of next week.

So you haven't quite gotten rid of me yet!! Heck, I think my Google ranking even went up.


Novell: Open Source FLAIM

FLAIM is an embeddable database engine à la Sleepycat's Berkeley DB. FLAIM is interesting in that it looks like it also provides a query language and indexing on the DB contents. Plus there's XFLAIM, which looks to be an XML store on top of FLAIM. Novell has open sourced FLAIM and XFLAIM.

Wonder if there's a Python binding.

[Via Hack the Planet]


MusicStrands: Applied Research Labs

MusicStrands is a music recommendation site and system akin to last.fm. There's a plug-in that you add to iTunes that watches what you listen to, and forwards observations to MusicStrands' servers. Then new music is recommended to you. MusicStrands also incorporates tagging and playlist sharing, encouraging social stickiness.

Who knew that they had an applied research lab? They not only document whizzy new features in MusicStrands, but publish the occasional technical paper. Torrens, Hertzog, and Arcos have one on music visualization (PDF), while Baccigalupo and Plaza are applying case-based reasoning to playlist recommendation (PDF). Kewl!!


Linden: Spotback

Greg Linden spotted the newest kid on the personalized news block: Spotback. Spotback's "twist" is that ratings drive the personalization. I found Linden's take surprisingly positive although he did find the recommendations a bit off.

The key problem is that rating is work. In this context, it's work for non-obvious impact. If you rate a bunch of stuff how do you know you're getting better personalized results? I know the last thing I want to do when I'm surfing is rate stuff for the benefit of some opaque ghost in the machine. And this doesn't even touch the issues around dealing with cheaters. Vote early, vote often as we say in Chicago!

Since this personalization transparency issue is becoming a hobby horse of mine, maybe I should hunker down and do some literature review to see how the recommender system and information filtering communities evaluate the quality of their results. I'm starting to think that the best such engine would convince me not that it had "more good stuff" but guaranteed that it "always had the important stuff".


Sphere: Now Live

Well I guess that answered my question! Sphere ain't zombied. In fact it's quite live.

Now if I could only see what the win is. Oh sure, the Sphere folks have some nice claims, but other than UI goodness, how do I know their search results are great? Heck, I don't even have a clue what they're indexing.

© Brian M. Dennis. Built using Pelican. Theme by Giulio Fidente on github.