Link parkin': Facebook now has web services APIs. Social software and network researchers, start your engines.
A slightly stale bit, but I ran across this piece by Alex Russell which defines Comet, an extension (improvement?) on Ajax style client-side communication. That's my brief summary, but Russell fleshes out the technical differences between Ajax and Comet in more detail. It took me a while to untangle a key figure in his post, but then I realized that the key point of the Comet architecture involves a server autonomously pushing data to a client side application, not just the client making requests and receiving responses.
The hope is that by putting a name to the collective techniques, and identifying specific examples in the wild, momentum and shared experiences can be built. For example, Dojo, a client-side JavaScript toolkit which Russell helps develop, already supports Comet techniques. If you read some followup posts on his blog, looks like there's already quite a bit of action around Comet.
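To make the push-versus-poll distinction concrete, here's a minimal sketch in Python. It is not Russell's code or Dojo's implementation: `PushChannel`, `server`, `client`, and `demo` are hypothetical names, and an in-process queue stands in for the long-lived HTTP connection a real Comet setup would hold open. The point is only that the server decides when data flows, and the client just waits.

```python
import queue
import threading
import time


class PushChannel:
    """Hypothetical in-process stand-in for a held-open Comet connection."""

    def __init__(self):
        self._events = queue.Queue()

    def publish(self, message):
        # Server side: push data whenever it exists, without a pending request.
        self._events.put(message)

    def wait_for_event(self, timeout=1.0):
        # Client side: block on the open "connection" until the server pushes.
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return None


def server(channel, messages, delay=0.01):
    # The server emits events on its own schedule.
    for message in messages:
        time.sleep(delay)
        channel.publish(message)


def client(channel, expected):
    # The client never asks "anything new?"; it simply waits for pushes.
    received = []
    while len(received) < expected:
        event = channel.wait_for_event()
        if event is None:
            break  # timed out waiting for the server
        received.append(event)
    return received


def demo():
    channel = PushChannel()
    producer = threading.Thread(target=server, args=(channel, ["a", "b", "c"]))
    producer.start()
    received = client(channel, expected=3)
    producer.join()
    return received
```

Contrast that with plain Ajax, where the client would have to call the server in a loop and mostly get back "nothing yet".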
Link parkin': Dethe Elza, the developer of Pastels, says, "Pastels is an example project for creating an OS X screensaver in Python using PyObjC." Mmmmmm, screensaver in Python.
Live coding is real time programming of a system as the system is running. The technique is of interest in creating computational art, being a means to put human performers into the creative loop.
Logan Koester did some hypothesizing and prototyping of live coding in Python. The technique strikes my fancy as a way for computer science instructors to make classes more performative and, shudder, interactive. Also, I just find this alternate universe of computational art, that I know little about, fascinating. It's fun to discover stuff like ChucK, SuperCollider, and Fluxus.
Kudos to MarsEdit's automatic post saving, which rescued today's efforts from a brief power outage induced death. Time to get a UPS.
Rexa.info is a search engine for the academic computing literature developed by the University of Massachusetts, Amherst Information Extraction and Synthesis Laboratory. Here's a brief summary of Rexa:
Rexa is a digital library and search engine covering the computer science research literature and the people who create it. Rexa aims to facilitate research progress and collaboration by providing efficient browsing, search, associations and analysis among papers, people, organizations, venues and research communities.
You have to sign up for an account to use Rexa, but in return you get to annotate and tag citations of interest.
[Via Stephen Glass]
Both Kevin Burton and Matthew Hurst are a bit suspicious of David Sifry's recent state of the blogosphere report. Count me in as well. Burton delves into the number of posts per blog, among other things, and through a couple of calculations argues that the number of "active" blogs is a few million and only growing linearly. The comments on his post are also worth reading.
Of course, this hinges on what you call an active blog, which is where Hurst comes in. The blogosphere, if you conflate that with Technorati's index, seems to include a lot of dead stuff. Hurst also points out, more accurately, that Technorati is just one sample of the blogosphere. The hard part is how to extrapolate from that to a comprehensive picture of blogging. Studious readers will note, however, that Hurst works for a Technorati competitor, just to be clear, and I've briefly met Hurst in a professional setting. I definitely lean towards his scientific approach to observing and analyzing the weblog ecology.
From my point of view, Sifry's report really calls out for some segmentation based upon blog activity. I'm not sure how productive "blogosphere is doubling" announcements are any more. They're akin to Google and Yahoo!'s index size announcements. Great! But how come I can't find that page I was searching for?
The eminent Simon Willison has put together a Python Developer Center for the Yahoo! Developer Network. All about how to take advantage of Yahoo!'s web services using the Python programming language. The only thing I'd add is some mention of the really wonderful pycurl, for high performance, high flexibility, highly standards-conforming web client code.
Googler Eric Case collects links to a Google Maps API Tutorial that was run by Developer.com.
Link parkin': Diigo is about "Social Annotation". The company's about page promotes the site as combining social bookmarking and social note taking that appears "in situ" on web pages. The tool got some pretty good press from the Techcrunch crowd.
Man! The ghost of Third Voice just won't die.
Okay, now Bloglines is losing blog posts that I've marked as "keep new", my low tech way of keeping a post around until I want to comment on the post or bookmark it. The lossage wouldn't be so bad if the left pane wasn't taunting me with counts that seem to indicate the system still has those marked posts somewhere.
Oh, and that duplicate post presentation effect seems to be spreading to other webfeeds as well.
Speaking of virtual machine technology, out there somewhere, there's probably a team of developers working on a programming model for massive swarms of virtual machines. The folks at Northwestern's Plab, along with a growing virtualization research community, are making great strides in the utility and performance of virtual machines. But I wonder if systems built out of creating, launching, and coordinating large numbers of virtual machines have fundamental differences from what we know of traditional distributed programming and concurrent programming? If so, would devising and implementing good tools for good programmers to build such systems endow a PageRank style competitive advantage?
VMware, great software for running virtual machines, has grown up quite a bit since I last worked with it. VMware Server is freely, as in beer, available. Plus VMware has solidified the notion of pre-built, packaged, task specific VMs called virtual appliances. And there seems to be a thriving market in virtual appliances.
[Via Jeremy Zawodny's linkblog 03/09/2006]
Apparently AOL has a research group focusing on information retrieval. Makes sense, and they've recently decloaked, releasing a number of large search related datasets and a search API.
[Via Matthew Hurst]
Google engineer Mihai Parparita did a study of webfeeds subscribed to in Google Reader. The goal of the work was to see what RSS/Atom namespace extensions were being used the most. The results were interesting albeit to a narrow audience of aggregator builders and webfeed wonks like me. Still it's good to have the information out there.
I was also struck by this casual toss off line by Parparita:
I wrote a small MapReduce program to go over our BigTable and get the top 50 namespaces based on the number of feeds that use them.
I haven't actually seen this code but it feels like this was a one day hack.
Across a huge number of subscription lists.
Using a large number of parallel machines.
I'm engaging in a bit of speculation, but I think this is another example of how Google has a powerful, Web scale programming tool, developed by research, that enables frontline engineers to be creative. It's probably fairly rare to hear in the distributed, parallel, and high performance computing communities a sentence start with, "I wrote a small Foo program to..."
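Since I haven't seen Parparita's code, the following is strictly my guess at the shape of such a job, written as ordinary single-machine Python rather than against Google's actual MapReduce or BigTable interfaces. The names `map_feed`, `reduce_counts`, and `top_namespaces`, and the feed record layout, are all made up for illustration.

```python
from collections import Counter
from itertools import chain


def map_feed(feed):
    # Map step: emit (namespace, 1) once per feed, however many times the
    # feed uses that namespace, since the ranking is by number of feeds.
    return [(namespace, 1) for namespace in set(feed["namespaces"])]


def reduce_counts(pairs):
    # Reduce step: sum the emitted counts for each namespace.
    counts = Counter()
    for namespace, n in pairs:
        counts[namespace] += n
    return counts


def top_namespaces(feeds, k=50):
    # Chain the map outputs together; a real framework would shuffle them
    # across machines before the reduce step.
    pairs = chain.from_iterable(map_feed(feed) for feed in feeds)
    return reduce_counts(pairs).most_common(k)
```

The appeal is that the map and reduce functions are about all an engineer would have to write; distributing them over a large cluster is the framework's problem.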
Jack Vinson picks up on some enterprise syndication features in KnowNow's new products. Looks like a meld of web crawling, feed crawling, and event notification on a big scale. There's also a hint of enhanced aggregation, which pulled Attensa's CEO, Craig Barnes, into the conversation. Apparently, Attensa is also doing enterprisey things with RSS.
I'd have to drill down on the postings and press releases to see if there's any there there, but where I feel desktop aggregation is a bit moribund, large corporate customers may be driving some "hidden" innovation.
Warning, serious nerd stuff coming.
So I'm trying to wrap my head around the recent buzz around DSLs, domain specific languages, given I was really into PLDI when there were a couple of academic conferences on the topic. As best I can tell, DSLs are really hot in the Ruby community, and Jim Freeze's introduction, "Creating DSLs with Ruby" is the most accessible I've seen so far. Can't say I'm overwhelmed so far, as these Ruby DSLs mainly look like crafty usage of a syntactic shortcut, eval, and default method lookup. Declarative style languages are handled straightforwardly, but how does this face up to other design points? While these techniques are very useful, I don't see a path to doing some of the major syntactic extensions à la Paul Graham in On Lisp, such as anaphoric macros. Another good point, made by Oleg Kiselyov, via Anton van Straaten, is that macros allow one to abstract over the stuff that isn't first class in the language.
Bonus: Lambda The Ultimate has a couple of good sections on DSLs
Neoformix has been having fun visualizing Boing Boing's post archive. The results are uncomfortably close to pie-charts for those with a Tufteian (Tuftian?) bent, but it's an interesting experiment.
[Via Matthew Hurst]
Wonder if undergraduate project presentations would be better done Pecha Kucha style? 20 slides, 20 seconds per slide, no exceptions. Sort of like submitting to the discipline of the haiku.
I'm not one for lists usually, but I have to add two submissions to Filmcritic.com's Top 50 Movie Endings of All Time. First, since they're surprisingly light on horror films, I'd add John Carpenter's Halloween. Between the disappearance of Michael Myers, the look on Jamie Lee Curtis' face, and the legendarily creepy music, you know in your gut that "You can't kill The Boogie Man." Plus, he's outside the theater somewhere waiting for you, maybe in your own house.
Second, a quintessential guy movie, Ocean's Eleven, has a quintessential bit of guy humor when George Clooney, as Daniel Ocean, is met by Brad Pitt's Rusty Ryan when Ocean gets out of prison wearing the tuxedo from the night he got busted in Vegas:
Rusty: I hope you were the groom.
Ocean (after a slight pause): Ted Nugent called. He wants his shirt back.
Not to mention the circularity of this scene as it ties to the beginning of the movie.
Bonus bantering
Tess Ocean: We need to get Rusty a girlfriend
Rusty: There's a women's prison just up the road
And further in the comedy vein, the end of Young Frankenstein is to die for. Madeline Kahn as The Bride followed up by Teri Garr asking "But what did you get from The Monster?" just leaves you howling.
Must... stop... thinking... about... this!
Just thinking out loud. What if you had a crawler/monitor that recorded a daily snapshot of the popular items or front pages of the big social media and blog sites: digg, reddit, memeorandum, slashdot, gawker empire, etc. Pull the links and stories out and record what the blogosphere does for 30 days around those stories, but don't publish anything. 30 days later, summarize and recap to give a delayed release retrospective. Rinse, cycle, repeat every day, providing a continuing look back at the real(?) impact of the socially selected items. Provide as a service and get rich!!
If someone ever says, "Did anybody ever look back at what happened with....?" say "We did!"
Of course all those blog search engines could already be doing this, presuming they archive back at least 30 days, or someone else could build it if they provided decent APIs. Such is life.
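For the record, the bookkeeping behind the idea is tiny. Here's a rough Python sketch with the crawling left out entirely; `RetrospectiveArchive` and its methods are hypothetical names, and "impact" is crudely assumed to mean the number of blog posts that linked to a story within the window.

```python
from datetime import date, timedelta


class RetrospectiveArchive:
    """Hypothetical store for daily snapshots and the reactions to them."""

    def __init__(self, delay_days=30):
        self.delay = timedelta(days=delay_days)
        self.snapshots = {}  # snapshot date -> list of story links
        self.reactions = {}  # story link -> list of (date, blog post URL)

    def record_snapshot(self, day, links):
        # What was popular on the front pages on this day.
        self.snapshots[day] = list(links)

    def record_reaction(self, day, link, blog_post):
        # A blog post observed linking to a story.
        self.reactions.setdefault(link, []).append((day, blog_post))

    def retrospective(self, today):
        # Publish nothing until the delay has passed, then summarize what
        # happened to the stories that were popular `delay` days ago.
        snapshot_day = today - self.delay
        report = {}
        for link in self.snapshots.get(snapshot_day, []):
            in_window = [
                post
                for day, post in self.reactions.get(link, [])
                if snapshot_day <= day <= today
            ]
            report[link] = len(in_window)
        return report
```

Run `retrospective` once a day and you get the rolling look back; the hard part, of course, is the crawl feeding it.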
Okay, I really need to subscribe to Matthew Hurst's blog, Data Mining. There, done.
Hurst works for BlogPulse/Intelliseek/BuzzMetrics/Nielsen, whatever they are today. They specialize in bringing hard core techniques from the AI community to bear on analyzing and understanding a large chunk of the blogosphere. He's had a recent series of interesting posts on a variety of topics including distance based coloring in graph viz, the long tail of the blogosphere, and modeling the influence of visit traffic sent from other blogs.
Socialtext has open sourced its enterprise wiki software. I was gonna kick the tires to see what extensibility features Socialtext had inside, but looking at the roadmap I think I'll wait a bit.
Bernardo Huberman e-mailed last week to let me know of a new work he'd co-authored with Fang Wu. Entitled "The Economics of Attention: Maximizing User Value in Information-Rich Environments" (paper PDF), the paper presents a mathematical model which, given a set of attributes on pieces of information and a requirement that only a small subset of those items can be displayed to a user, maximizes the user's utility, in the economic sense.
If you thought that sentence was a brainful, you should see the math in the paper. Usually, with patience and persistence, I can work out the math in a technical paper. However, this one uses terminology common to fields I've never been conversant in, and I have to admit this paper is a bit beyond my meager mathematical skills. Thankfully, the sections with the Greek symbols are wrapped by a pretty straightforward context and example.
If efficiently implementable, this would be the type of mechanism I'd put inside a webfeed aggregator to creep one step closer to The Celestial Daily Me.
Yahoo!'s web services APIs for search are much nicer than any other search engine out there, and now the new developer's SDK supports a bunch of fun languages.
I don't use digg but they've launched a couple of interesting real time information visualizations for the stories that get collaboratively selected: Stack and Swarm. I can't do justice to Stamen Design's beautiful work with a text description, but they have surface slickness if not demonstrable long term utility.
Interestingly, the digg labs site threatens to release a public API for the data feeding these visualizations.
[Via Waxy's Links]
Link parkin': Blogmarks.net, a social bookmarking system resembling a combination of del.icio.us and wists, uses the Atom Publishing Protocol as its web services API (API Documentation).
Contrast and comparison with the del.icio.us API would make an interesting case study.
And just because I'm thinking about it, feels like it's time for a new generation of collegiate courses about engineering web applications. The user facing API and client side interface programming aspects have changed radically over the past year or two, not to mention the advances embodied by the various approaches to web frameworks. This particular topic also combines practical market value for graduating students with enough meaty underlying CS issues (UI design, DB modeling, scaling performance, software development methodologies, etc.) that the course wouldn't be too vocational. Call it a team oriented, senior capstone and even the college administrators might love it!
digg has a sports page. Maybe I'll take a look now and see what all the excitement is about. Then again, if geeks run the page, it'll probably be a sad sight. It's my studied opinion that while many sports geeks are tech geeks, the converse does not hold.
[Via MicroPersuasion]
Joe Gregorio outlines how to implement the Atom Publishing Protocol, a RESTful API for publishing and editing Web resources, using Python. If you're grasping about looking for a way to provide a web services api for your application, starting with APP and folding in your own extensions or data model would be a good start. This would be a similar approach to what Google does with GData, which is just Atom, APP, and A9 stored queries.
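To give a flavor of what an APP client deals in, here's a small standard-library Python sketch that builds an Atom entry document. This is not Gregorio's code; `build_entry` and its parameters are names I made up, and only a handful of Atom elements are covered.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(title, content, author_name, entry_id, updated):
    # Serialize Atom elements with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author, f"{{{ATOM_NS}}}name").text = author_name
    body = ET.SubElement(entry, f"{{{ATOM_NS}}}content")
    body.set("type", "text")
    body.text = content
    # Return the entry as a string, ready to be sent as a request body.
    return ET.tostring(entry, encoding="unicode")
```

As I understand the protocol, a client creates a new resource by POSTing a document like this to a collection URI with an `application/atom+xml` content type, then edits it with PUT and removes it with DELETE on the URI the server hands back. Your own extensions would ride along as extra namespaced elements inside the entry.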
Clickr is a Common Lisp client for the Flickr API, written by Mark Probst. Probst has a bunch of interesting code projects, including Metapixel, a Photomosaic generator, which I've toyed around with a bit.
In his blog, he also notes a big hole in the much hyped Zooomr: no community features, unless you count tags as a community feature.
There may be some outage with costarica.cs.northwestern.edu over the next few days. The server should be moving physical locations and IP addresses, so it might take the DNS entries a bit to propagate.
Just thinking out loud. What if Amazon Web Services LLC offered a virtual machine service that was as dirt cheap and simple as S3 and SQS? A buck a virtual network, a dime a VM instantiation (10 GB disk, 512 MB RAM, 1GHz P4 performance), bandwidth at the same cost as S3 (maybe a discount for using S3 from a VM). A great web based specing and ordering experience with 4 or 5 really obvious base installations (including all of the top Web application stacks) tuned for building Amazon Web Services apps.
Might not be differentiated enough from the Virtual Private Server providers (e.g. Rimuhosting) for Amazon to pursue, but it would be a nice computational complement to S3 and SQS.
Now that would be game changing!!
Link parkin': Visually pleasing, complex timelines presented using a DHTML/AJAXy widget implemented by David François Huynh. This is part of MIT's SIMILE project which is developing robust, open source tools for the Semantic Web. While I haven't been converted into a Semantic Web True Believer, SIMILE has an interesting portfolio of projects.
[Via infosthetics]
Kevin Lang and Vivek Tawde of Yahoo! Research constructed an RSS reader that clusters items based upon a fixed set of topics. The aggregator runs as part of the Yahoo! Widget Engine so it's essentially a desktop RSS reader. Also, from my read of the project description, the selected sources are rigged or at least initialized to a providential set. However the task is still quite tricky since the cluster sizes apparently aren't fixed and have to be learned from a given pool of content.
Another piece of The Daily Me drifts into view.
+1 to Bloglines for picking up my blog pings near instantaneously.
-1 for not dealing with duplicates properly in a number of feeds I read. Steve Rubel's feed is probably the most irritating, because I'm repeatedly getting 20 or so of his posts over and over. Not that I don't like the content, or didn't the first time I read it!
IronPython, an implementation of Python on the .NET runtime, is fast approaching a 1.0 release. Jeff Cogswell has a longish introduction to what IronPython is and what it can do. One thing I can't figure out though, is where IronPython diverges from CPython.
Bonus link: A March Q&A with Guido van Rossum, the creator of Python. He doesn't say much about what he's doing at Google, but drops some hints about Python 3000.
Hat tip to J. Scott Miller.
I guess the best analog in the US is SXSW, but Futuresonic Live's music and Urban Play's artistic focus speak to me more, especially the Social Technologies Summit. There's gotta be something like it here in the States!
[Via Tom Carden]
Purchasing Baseball Hacks motivated me to refresh my horrifically stale understanding of basic statistics and probability. The last time I seriously engaged with stats was over 20 years ago for a sophomore course, and oddly I've never been forced to reconnect. Casting about for some additional material, I landed on two works authored by Jim Albert, a statistician at Bowling Green State University.
His Teaching Statistics Using Baseball is a thin work which is best used as a supplement to a traditional stats course. For example, there are a couple of places where the book assumes knowledge of MINITAB, a popular entry level stats package. Curve Ball: Baseball, Statistics, and the Role of Chance in the Game, written with Jay Bennett, is a more complete book, with few assumptions of the reader. Curve Ball clearly uses a lot of basic material from Teaching Statistics, so they're somewhat redundant. If I had to only recommend one it would be Curve Ball.
However, both books are good at highlighting the role of chance and working through concrete examples that separate performance, which is something observed, and ability, which is something claimed to be innate. Their best example is the streakiness of batting in players. One can often see negative and positive streaks in a player's hitting performance, but is that streakiness part of the player's inherent ability? In situations involving chance, some observed performance is simply attributable to randomness. Curve Ball and Teaching Statistics both delve into how to model and reason about these issues with concrete examples that are interesting to the average sports fan.
Also, the examples are easily translatable to hacking in Python, especially since Retrosheet makes a lot of baseball data available. Fair warning though. Their game event data files reflect the inherent messiness of scoring baseball, so parsing them isn't trivial, but that's a fun challenge unto itself.
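As a taste of that kind of hacking, here's a small simulation sketch of the streakiness question. It is my own toy and not an example from either book: it assumes every at-bat is an independent coin flip with a fixed hit probability, and the function names and default numbers (a .300 hitter, 500 at-bats) are invented for illustration. No Retrosheet parsing involved.

```python
import random


def simulate_at_bats(ability, n, rng):
    # Each at-bat is a hit with constant probability `ability`: the player's
    # innate skill never changes, so any streaks are pure chance.
    return [rng.random() < ability for _ in range(n)]


def longest_streak(outcomes, value):
    # Length of the longest run of consecutive outcomes equal to `value`.
    best = current = 0
    for outcome in outcomes:
        if outcome == value:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def streak_summary(ability=0.3, n=500, seasons=1000, seed=0):
    # Simulate many seasons and report the average and the worst
    # longest-hitless-streak seen across them.
    rng = random.Random(seed)
    hitless = [
        longest_streak(simulate_at_bats(ability, n, rng), False)
        for _ in range(seasons)
    ]
    return sum(hitless) / seasons, max(hitless)
```

Calling `streak_summary()` shows how long a slump a perfectly consistent hitter can fall into by luck alone, which is the performance versus ability distinction in miniature.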
Now to mashup the last two posts, Jeff Barr highlights the Cardbox application. Cardbox looks like a Win32 semi-structured database application, akin to Mac OS's venerable Tinderbox, and a contemporary of the late, lamented HyperCard. As advertised, Cardbox has both remote scripting and plug-ins. Barr points out that the Cardbox extensibility has been hijacked to support S3 integration for internet backup of Cardbox databases and explicit inclusion of S3 objects.
In November 2004, I gushed a little bit about Amazon's Simple Queue Service (SQS), a web hosted, reliable, persistent, messaging queue. Back then it was free, but beta. Now the SQS has gone production, meaning the feature set is stable, you can get an unlimited number of queues, and it costs money. But SQS is cheap on the scale of Amazon's S3.
Speaking of which, SQS and S3 start to make a nice foundation for building Web applications which I'm sure is Amazon's intent. You provide the compute engines and user interface, and they host the hard parts of the web scale infrastructure.
I've said before that I didn't think S3 was a game changer, but Amazon's web infrastructure strategy might be.
John Gruber articulates most elegantly why the combination of external scriptability and internal extensibility makes for powerful applications and eliminates some of the demand for open-sourcing those apps. In particular, while I agree with Kevin Burton that an open source aggregator would be nice, especially for research purposes, one with excellently designed scripting and plug-ins would be even better.
On Win32 the external scripting part isn't taken nearly as seriously as on MacOS even though COM is fairly decent for this purpose. Other than Microsoft's own apps, most tools have non-existent to crappy remote object models. As far as I know, no aggregators really support remote scripting. Awasu is one of the few that I know of that has any plug-in model.
If I ever teach my scripting languages class again, I also need to home in on this point, since such languages are really handy on both the remote scripting (duh!) and plug-in sides of the equation. This generates powerful effects.