LOL. Lucas Gonze's new mailing list idea is so stupid, it just might work.
A pretty deep geek signifier is when you're excited to get your review copy of Jon Kleinberg and Eva Tardos's new textbook Algorithm Design.
A quick scan indicates it's not completely graph/network wonky, and the motivating problems are definitely a little more up to date than CLR's, although I don't have the second edition of that textbook.
Erik Benson, in summarizing his view of ETech and SXSW 2005, hits on a point that gets continually washed out by the taxonomy/folksonomy arguments: people tag things for many different reasons. He concludes that people aren't being rational in their tagging. The big hope is that a pile of autonomous labels will reveal patterns of classification.
Hidden in there are patterns of communication, coordination, and collective activity. Cf. The Gates Memory Project, Squared Circle, and Day In The Life Of....
Tags are used as much for signaling others as for personal remembrance.
Oh, and being in Chicago, I have to say "less is more".
The Street finds its own uses for things - uses the manufacturers never imagined.
Rocket Radio, William Gibson, Rolling Stone, June 15, 1989.
The rumor mill can move on. Done deal, Yahoo! has acquired Ludicorp. Let's hope they see a better fate than Blogger.
Now's a good time for me to get off a few cheap shots at Flickr though, ;-/. One, what's up with an XML-RPC API where every response is an XML-RPC string wrapping the REST API's XML response? No wonder no one uses the XML-RPC API. Lame.
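For the uninitiated, here's the gripe in code form. A minimal sketch, assuming the XML-RPC endpoint and method name behave per Flickr's docs; the key is a placeholder, and the double parse is the point:

```python
import xmlrpc.client
import xml.etree.ElementTree as ET

API_KEY = "your-api-key-here"  # hypothetical

server = xmlrpc.client.ServerProxy("https://www.flickr.com/services/xmlrpc/")

# First decode: xmlrpc.client unwraps the XML-RPC envelope for us...
payload = server.flickr.test.echo({"api_key": API_KEY, "foo": "bar"})

# ...but the payload is just a string holding the REST response's XML,
# so we have to parse a second time by hand. (Wrapping in <rsp> is my
# assumption about the string's shape; adjust to what actually comes back.)
doc = ET.fromstring("<rsp>" + payload + "</rsp>")
for child in doc:
    print(child.tag, child.text)
```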
Two, annotations are FotoNotes on the surface, but downloaded photos don't have the annotation data embedded in their metadata headers.
Nits though. Those fine folks have been doing good work, making photos a first-class part of the Web. Expecting more good things to come...
Scott Golder, who I've shared a couple of meals with at conferences, is investigating shared annotation of webpages through his Webbed Footnotes project. I'm interested to see how the rating system works out. Unfortunately, I've got way too much work to seriously participate in someone else's research project, but maybe some of my 6 or so readers will join in. Besides, I don't read the NY Times, the target publication, online that much.
P.S. That's a joke, son. I know there are more than 6 of you out there.
Joe Kraus, head honcho at JotSpot, recently posted about the long tail for software. The rough idea is that in any business, there are a bunch of company- and situation-specific business processes. That adds up to a combinatorial explosion of opportunities for software solutions. The current solution for businesses is Email+Excel (might as well read Outlook+Excel), which has issues with version control, change notification, and integrating other documents. Of course, the wiki-based JotSpot is designed to solve these problems.
The thinking has gotten plenty of play amongst tech blogs, but I'm a bit skeptical, and mainly thinking out loud here. I don't doubt that there are a bazillion business process instances; I doubt that they can be aggregated like "searches" or "product matches". Also, business processes aren't so much disposable as adapted. Yeah, you might be able to think of each job search as requiring a slightly different business process, but it's not started from scratch.
Maybe I'm putting too fine a point on it, but there's a big difference between a bazillion problem instances and a million problem classes.
I have a real soft spot for cafes, up to the point of working in one for about a year as a grad student just for the hell of it (Nefeli for you Berkeley denizens). Recently I've been meandering throughout Evanston getting most of my work done at Mud, Liquid, and Unicorn. I'm a multiple visit per day type of person. Name a cafe in Evanston, and I've been in it.
Recently, I've been feeling, not quite ashamed, but odd pulling out my laptop. There's now a teeming horde of laptop folk. It's gotten to the point that you'll have pairs/couples working across two laptops at the same table.
Well, back in my old stomping grounds, Jon Snydal, Damon McCormick, and Sean Savage are investigating this effect, among other things, with Project Placesite. I couldn't quite put my finger on what was bugging me until I read their exposition of The Zombie Effect. Publicly staring deep into a virtual world that others can't join is probably a muted form of the public cell phone conversation effect. Of course, I'm a prime offender.
Oddly enough, I'm not sure this is actually a problem per se. Two of the cafes I frequent are essentially startups, and I know they would be seriously struggling if they didn't have a stable of Zombies drawn by free wi-fi. Plus, I don't particularly find laptop folks any more or less approachable than, say, snobby-looking folks reading The New Yorker.
Last point: Evanston would make a very interesting case study on the commercial impact of Wi-Fi on small vendors, especially cafes. We have a local incumbent which, seemingly reluctantly, moved to for-pay access, despite heavy usage by a CS department and tons of laptop-carrying undergrads. (Aside: our VP of IT claims 80% of our frosh came to campus with a laptop. Yikes!) Meanwhile, the aforementioned upstart cafes have arisen, along with Wi-Fi in a few other sit-down spots, plus deployments by the big boys like Barnes & Noble and Borders. I don't know if the slow move to Wi-Fi made a difference for Unicorn, but I see a lot of customers migrating to the new places.
Someone needs to take Colorization Using Optimization and hack it into a tool for munging Flickr photostreams. In fact, a general tool, or Photoshop plug-in, for pulling down Flickr photos, hacking on them, and reinserting them would be pretty useful.
For the intrigued, the referenced paper lets end users scribble a bit of color on a monochrome image, followed by an automated pass that colorizes the image in its entirety. It seems to generate pretty good results and extends to video as well. Kewl!!
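I haven't touched the authors' code, but the heart of the paper is a big sparse linear system: unconstrained pixels are tied to an intensity-weighted average of their neighbors, scribbled pixels are pinned to the user's color. A minimal sketch of that system for one chrominance channel; the sigma value and 4-neighborhood are my simplifications, not the paper's exact weights:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def colorize_channel(Y, scribble, mask, sigma=0.05):
    """Propagate one chrominance channel across grayscale image Y, in the
    spirit of Levin et al. mask is True where the user scribbled; scribble
    holds the user's chroma values at those pixels."""
    h, w = Y.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = lil_matrix((n, n))
    b = np.zeros(n)
    for i in range(h):
        for j in range(w):
            r = idx[i, j]
            if mask[i, j]:
                A[r, r] = 1.0          # scribbled pixel: pin to user's color
                b[r] = scribble[i, j]
                continue
            # Neighbors weighted by intensity affinity in Y.
            nbrs = [(i + di, j + dj)
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= i + di < h and 0 <= j + dj < w]
            ws = np.array([np.exp(-(Y[i, j] - Y[p, q]) ** 2 / (2 * sigma ** 2))
                           for p, q in nbrs])
            ws /= ws.sum()
            A[r, r] = 1.0              # U(r) = sum_s w_rs * U(s)
            for (p, q), wgt in zip(nbrs, ws):
                A[r, idx[p, q]] = -wgt
    return spsolve(A.tocsr(), b).reshape(h, w)
```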
Apropos of nothing, I don't particularly care what's in Ted Leung's bag, but I like knowing what his bags are.
I don't talk about sports much here, but I'm a closet sports junkie. I grew up at the intersection of ACC and Big East country. I've been listening to sports talk forever, practically daily since about 1993. The first weekend of the NCAA tournament was essentially a 4 day holiday for a few years in grad school.
In honor of that fine event, I'll tip my hat to PaidContent.org for actually taking online sports seriously and pointing towards Alan Karben's Statsology blog. The site is a veritable cornucopia of links to topics I've been looking for, for years. I always thought sports would be a fine segment of the online market to be involved in. Passionate communities, tons of numbers to crunch, serious money, and a patina of real journalism: what's not to like?
Karben is doing a great job of hitting each of these points in his coverage of the Fantasy Sports Trade Association. Who knew?! This strikes me as a space where some social software experiments would turn up some interesting results.
P.S. Looks like I went 16-0 on a bracket today, and while I do play multiple sheets, it's not like I go hog wild. One rational, one radical, and one for rooting does it.
Bob Wyman thinks blog tool innovation is marked for death with an apparently serious entry by Microsoft into the space. Toss in Yahoo! getting into the game and you can start to see some serious lock-in developing. Then again, the sky was falling when AOL Journals was about to come online too. Wyman is also being a bit parochial, in that he's an advocate of structured blogging, essentially the embedding of more task-specific tools within blog apps. If blog tools go the way of IE development, then he's got a point.
One possible interesting effect of this development is a serious arms race around Web services. If it's still about getting developer mind share as well as users, then being smart about Web services will be a serious strategic advantage, and the field is relatively level here. With big-time stakes, we may get an answer to that SOAP vs. REST debate real quick. Also, I generally don't bet against MS developers, but building scalable, reliable, secure Web services is new territory for them. Has MS actually delivered a Web service API that folks regularly build on?
In fact, other than Amazon I don't feel too many crews have actually gotten serious Web services right. Flickr is about as close as I've seen from a small shop, and they're still working out glitches. Google's AdSense might be roughly considered one as well, but concrete positive examples are few and far between.
If you're a researcher into various socially oriented technologies, you could do worse than to work through Michael Pazzani's Web Personalization seminar syllabus. Pazzani's a long-time machine learning guy and also did a stint as a director at NSF.
Via Seb Paquet
Despite the fact that I'm just this side of detesting screencasts, there were 5 seconds of insight at the end of Jon Udell's whirlwind overview of del.icio.us. Of course, since it's continuous media, I can't go back and find the exact quote, so I'll paraphrase, regarding how languages develop from pidgins:
All you need is an environment that lets people speak to each other, hear each other, and adapt their behavior to what they hear.
Recommended if you don't know much about del.icio.us.
Link parkin': For that next whizzy presentation, open source vector graphics, many at least halfway decent.
On the heels of Adrian Holovaty dismissing EmPRINT, Vin Crosbie makes the most eloquent case against digital editions I've seen yet.
David Sifry is posting The State of the Blogosphere from Technorati's perspective. Precis: big spike in weblogs tracked, lots of spam weblogs detected.
I'll reserve commentary other than to say two things. One, more of these from other institutions would be useful. Some independent verification would be nice. Two, more transparency on how such numbers are derived is sorely needed.
The scientist in me is looking for reproducible results.
Thought parkin'. Most of the photo annotations I've seen are pretty banal. And it doesn't appear that many of the photos on Flickr take advantage of the capability; I'm willing to bet a trivial percentage of Flickr photos are actually annotated. Annotation is work. Most photos probably speak as a unit. What's the point?
Two options. One: annotating becomes cooler, easier, more interactive, more networked, more weblike. Two: photos automatically generate a small set of annotations, which people can easily filter and add to.
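A tiny sketch of option two: mine the EXIF headers for candidate annotations a person could then prune or extend. This assumes the Pillow imaging library; the filename is a placeholder and I'm only grabbing a few obvious tags:

```python
from PIL import Image, ExifTags

# Pull candidate annotations out of a JPEG's EXIF headers.
img = Image.open("photo.jpg")  # placeholder filename
exif = img.getexif() or {}

candidates = {}
for tag_id, value in exif.items():
    name = ExifTags.TAGS.get(tag_id, str(tag_id))
    if name in ("DateTime", "Model", "Make", "GPSInfo"):
        candidates[name] = value

print(candidates)  # e.g. {'Make': 'Canon', 'DateTime': '2005:03:21 ...'}
```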
Despite a terminally confusing display and interaction, Lucas Gonze and Olivier Nerot have teamed up on a tantalizing Java-based visualization that provides interactive maps of Webjay's playlist space.
Memo to social software folks: there might be a network in your system, but network visualizations are notoriously hard to make comprehensible. You're probably better off looking at other types of displays, driven by results gleaned from network analysis.
But they do make pretty good eye candy!!
Microupdate: added visualization après Lucas Gonze's response.
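To make the memo concrete, here's the kind of alternative display I mean: run the network analysis, skip the hairball, show a ranked list. A toy sketch using the networkx package, with made-up playlist names:

```python
import networkx as nx

# Build a toy playlist-affinity graph; the names are invented.
G = nx.Graph()
G.add_edges_from([
    ("ambient_mix", "dub_set"), ("dub_set", "road_trip"),
    ("road_trip", "ambient_mix"), ("road_trip", "late_night"),
    ("late_night", "dub_set"), ("b_sides", "late_night"),
])

# Degree centrality is the simplest "who's central" measure; a plain
# ranked list is often more legible than drawing the graph itself.
for name, score in sorted(nx.degree_centrality(G).items(),
                          key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:.2f}")
```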
No, not a Bill Gates memory project. Christo's Central Park Gates.
Flickr and The Institute for the Future of the Book (is it me, or is there an institute for everything, other than the enrichment of junior faculty?!) are teaming up to record an online visual memory of The Gates installation. They don't quite know what they're going to do other than collect the photos. All those photo annotation and management folks should be licking their chops and proposing stuff.
Hint: since all the photos are just going into Flickr, public no less, with a particular tag, you can use the Flickr APIs to do some cool stuff.
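For instance, a minimal sketch of pulling the tagged photos via the documented flickr.photos.search REST method. The API key and tag are placeholders, and the static URL scheme is per Flickr's docs:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_KEY = "your-api-key-here"  # hypothetical

params = urllib.parse.urlencode({
    "method": "flickr.photos.search",
    "api_key": API_KEY,
    "tags": "thegatesmemoryproject",  # placeholder tag
    "per_page": "10",
})
with urllib.request.urlopen("https://api.flickr.com/services/rest/?" + params) as f:
    rsp = ET.parse(f).getroot()

for photo in rsp.iter("photo"):
    a = photo.attrib
    # Assemble the static photo URL from the response attributes.
    print(f"https://live.staticflickr.com/{a['server']}/{a['id']}_{a['secret']}.jpg")
```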
musicplayer is a Flash-based audio player that's driven by XSPF playlists. Concocted by Fabricio Zuardi, but contextualized by Lucas Gonze, musicplayer makes it easy to put a little Web audio player directly into a Web page. Spiffy!
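XSPF itself is dead simple. A sketch of generating the sort of playlist musicplayer could load, with placeholder track URLs:

```python
import xml.etree.ElementTree as ET

NS = "http://xspf.org/ns/0/"
ET.register_namespace("", NS)  # emit XSPF's namespace as the default

# Build a minimal playlist: a trackList of location elements.
playlist = ET.Element(f"{{{NS}}}playlist", version="1")
tracklist = ET.SubElement(playlist, f"{{{NS}}}trackList")
for url in ["http://example.com/one.mp3", "http://example.com/two.mp3"]:
    track = ET.SubElement(tracklist, f"{{{NS}}}track")
    ET.SubElement(track, f"{{{NS}}}location").text = url

ET.ElementTree(playlist).write("playlist.xspf", encoding="utf-8",
                               xml_declaration=True)
```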
Google points Brian Dennis to the last place I thought it ever would. But as anticipated, I still own media hack despite Wired News' might. And it's not like Penenberg has been slacking.
Still fightin' the power laws!!
Now Zoundry has an interesting approach to making money off of blogging. They've developed what looks like a nice blog editor that works with Atom API-supporting tools. The twist is that the editor makes it easy to add links to products, presuming you're writing reviews or other content that might encourage a purchase. Meanwhile, Zoundry provides a complementary affiliate program centralization service, which will manage your product click-throughs for a small cut. The win is that you sign up in one place, but reap from multiple programs.
Doubt if this is really for me, although I may try the blog editor as I really haven't seen one I like for Windows. I don't do affiliate programs. But this strikes me as a pretty innovative way to get desktop software to pay for itself.
Infosential is Tim Duckett and Wayne Robinson. They're a small UK tech consulting concern focusing on usages of social software in business. Vanilla enough.
Their workflow for helping a busy exec stay on top of the tech blogosphere is quite interesting. They use what I call watch engines (PubSub, Google Alerts) to track topics of interest to their client as an incoming stream. Then they sift and edit the material to construct short spoken summaries recorded as MP3s. The summaries are then shipped as podcasts, using RSS, through an aggregator, into the overwhelmed exec's iPod shuffle. Slick.
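The last leg of that workflow is plain RSS 2.0 with enclosures. A sketch of wrapping the recorded summaries so any podcast-aware aggregator can grab them; titles, URLs, and file sizes are invented:

```python
import email.utils
import xml.etree.ElementTree as ET

# (title, mp3 url, size in bytes) — all placeholders
summaries = [
    ("Monday tech brief", "http://example.com/briefs/2005-03-21.mp3", 1843200),
    ("Tuesday tech brief", "http://example.com/briefs/2005-03-22.mp3", 2011136),
]

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Exec Tech Briefs"
ET.SubElement(channel, "link").text = "http://example.com/briefs/"
ET.SubElement(channel, "description").text = "Edited spoken summaries"

for title, url, size in summaries:
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "pubDate").text = email.utils.formatdate()
    # The <enclosure> element is what makes this a podcast feed.
    ET.SubElement(item, "enclosure",
                  url=url, length=str(size), type="audio/mpeg")

ET.ElementTree(rss).write("briefs.xml", encoding="utf-8", xml_declaration=True)
```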
Now this strikes me as a decent alternative business model for aggregation and podcasting. Alternative in the sense that it really doesn't need advertising. As long as you're not looking for megabucks success, subscriptions could support a decent business I think.
And for you wankers thinking about how to automate the human voice out of the loop, give it up. It's probably easier to outsource the problem to India.
Hat tip to Bob Wyman, who rightly notes the very real small scale, social issues a busy exec faces: on airplanes all the time, can't slog paper, often disconnected, need for easy, cheap tech, etc. etc.
Oh yeah, and competitive intelligence is more ammo for why this aggregation stuff is important.
Blogdigger, a webfeed search engine I should have linked to a long time ago, and Webjay, the open playlist emporium, are teaming up to support better continuous media search. If Blogdigger can help Webjayers find open source media faster, the world will be a better place.
I'm late to the party, but Joe Gregorio is writing a column on building RESTful applications for O'Reilly. I particularly liked the article describing how to build a RESTful bookmark service, sort of like plumbing for a del.icio.us knockoff. I wonder how much better, worse, and/or different del.icio.us would be with similar design.
Hat tip to Ryan Tomayko's link blog.
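This isn't Gregorio's design, but for flavor, here's a minimal sketch of the RESTful verbs on a bookmark resource, using nothing but the Python standard library; a real service would persist and authenticate:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

bookmarks = {}  # in-memory store, keyed by bookmark id

class BookmarkHandler(BaseHTTPRequestHandler):
    def _key(self):
        # Last path segment is the bookmark id, e.g. /bookmarks/abc123
        return self.path.rstrip("/").split("/")[-1]

    def do_GET(self):
        # GET /bookmarks/<id> retrieves one bookmark.
        bm = bookmarks.get(self._key())
        self.send_response(200 if bm else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(bm or {"error": "not found"}).encode())

    def do_PUT(self):
        # PUT /bookmarks/<id> creates or replaces; PUT is idempotent.
        length = int(self.headers.get("Content-Length", 0))
        bookmarks[self._key()] = json.loads(self.rfile.read(length))
        self.send_response(201)
        self.end_headers()

    def do_DELETE(self):
        # DELETE /bookmarks/<id> removes the resource.
        bookmarks.pop(self._key(), None)
        self.send_response(204)
        self.end_headers()

HTTPServer(("localhost", 8080), BookmarkHandler).serve_forever()
```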
The New York Public Library unleashed a huge archive of images on the world. As something of a curator of these sites, I'm always curious as to what the reuse terms are. Apparently, most of the NYPL images are in the public domain, says the FAQ, but then it mumbles about "freely available for personal use". That doesn't jibe, does it?!
One other thing about these "public domain" archives. Why don't organizations ever make bundled versions of these collections available? Suppose I'm a scientist (I play one on TV) who thinks a collection of images or recordings would make a great dataset. Maybe for some image analysis experiments, or explorations into digital media authoring tools. What're my options for getting a hold of the complete archive in a nice handy tarball or ISO? One, ask nicely and pray. Two, write a Web crawler. Both feel a bit... unpalatable.
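Option two doesn't have to be impolite, at least. A sketch of a deliberately throttled bulk fetcher; the URL pattern and image IDs are hypothetical, and a real crawler should honor robots.txt and the archive's terms:

```python
import time
import urllib.parse
import urllib.request

BASE = "http://example.org/images/"   # placeholder archive
ids = ["00001", "00002", "00003"]     # placeholder image ids

for image_id in ids:
    url = urllib.parse.urljoin(BASE, image_id + ".jpg")
    # Fetch one image and write it to disk.
    with urllib.request.urlopen(url) as resp, open(image_id + ".jpg", "wb") as out:
        out.write(resp.read())
    time.sleep(2)  # throttle so we don't hammer the archive
```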
Can't quite figure out what the heck Wists are for, but they've got tags, buddy lists, and apparently mechanisms for grouping tagged items. Looks like del.icio.us with more spit and polish and some straightforward interface and collaboration extensions.
David Galbraith seems to be the ringleader behind the project. I have to point out though that automatically constructing social networks based on a click will eventually be counterproductive. Tire kicking leads to network management, which leads to work weeding out stuff every time I just test-drive a tie. Blech!
Odd how Wists and Blogmarks, another thumbnail/bookmark combo, surfaced in the same week. Oh, and what's the over/under on the first porn sighting?
Update: Fixed missing angle bracket, so I'm not dissing David Galbraith, who's a fine chap as far as I know.
Yahoo! has joined the party, providing Web Services APIs à la Amazon, Google, Technorati, Flickr, del.icio.us, et al. Jeremy Zawodny has a good roundup of Web commentary.
One thing I was going to kvetch about was the fact that they have a usage-limited API key, like Google's 1,000 calls per day. This feature makes it tough to build a desktop client to distribute for testing, because any decent number of users would quickly chew up your call allocation. According to Phil Ringnalda though, there's a bit of method in their rate limiting madness. Since it's IP address based, roughly speaking each user gets charged for calls, not the developer. Hackers can distribute away, although you can't be profligate with calls, or you'll cheese off users.
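If you're building that desktop client anyway, a client-side guard is cheap insurance. A sketch, using Google's old 1,000-calls-per-day figure as a stand-in since I haven't verified Yahoo!'s exact limits:

```python
import time

class DailyQuota:
    """Client-side guard for a per-day API call limit. The 1,000/day
    figure mirrors Google's cap; the actual limit may differ."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = time.strftime("%Y-%m-%d")
        self.used = 0

    def acquire(self):
        today = time.strftime("%Y-%m-%d")
        if today != self.day:          # new day, fresh allocation
            self.day, self.used = today, 0
        if self.used >= self.limit:
            raise RuntimeError("daily API quota exhausted; try tomorrow")
        self.used += 1

quota = DailyQuota()
quota.acquire()  # call before each API request
```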
Congrats Yahoo!
P.S. SOAP is passed over again.
At HICSS 38, John Seely Brown had pretty clearly focused on "mass amateurization" as one of his talking points. Mary Grush's interview with Brown in Campus Technology, captures some of what he's been thinking about. However, this Demos white paper on "The Pro-Am Revolution", by Charles Leadbeater and Paul Miller, better captures the spirit and in more detail.
A Rice CS team of Sandler, Mislove, Post, and Druschel has sketched a scheme for distributing webfeed content using P2P publish and subscribe, called FeedTree. To appear in the 4th International Workshop on Peer-to-Peer Systems, FeedTree is designed to help deal with webfeed bandwidth issues. I call it a sketch because the paper is a bit short on implementation details and has no performance analysis, but as Wes Felter says, "maybe Rice will actually ship".
The tricky bit, which the paper tries to address somewhat but not satisfactorily, is how to deal with legacy implementations. You could be looking at an IPv4 vs. IPv6 situation again. Also, deploying to PlanetLab seems lame to me. Build something real publishers might want to use and go from there, even if it's a limited subset of the real world. Hack up a budding open source client like Jaeger and work with one of the webfeed engines (PubSub, Technorati, Bloglines, FeedBurner) to see how this could play out. That'll provide bigger impact, plus an independent, empirical analysis of webfeed behavior would be more useful than simulations on PlanetLab. Besides, you'll need a model for those simulations anyway.
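For a feel of the bandwidth issue FeedTree is after, some back-of-envelope arithmetic with invented but plausible numbers:

```python
# Polling costs the publisher on every poll; push costs only on updates.
subscribers = 10_000
feed_bytes = 30_000            # size of the feed document
polls_per_day = 48             # one poll every 30 minutes
updates_per_day = 10

polling = subscribers * polls_per_day * feed_bytes
push = subscribers * updates_per_day * feed_bytes  # upper bound, no P2P fan-out

print(f"polling: {polling / 1e9:.1f} GB/day, push: {push / 1e9:.1f} GB/day")
# With P2P distribution the publisher's share of the push cost shrinks
# further, since subscribers re-share updates among themselves.
```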
Visible Path has started a relationship capital, née social network, related blog entitled Centrality. Looks like it's going to be a bit heavy on the corporate angle, but Stanley Wasserman, who literally wrote the book on social network analysis, is committed to writing for the blog, so it can't be all bad.
Ted Leung notes that he interacts with the Web much more through his aggregator than his browser, and he subscribes to a lot (1000?) of feeds. Combined with aggregator-side analysis, he sees an opportunity to productively burn future CPU cycles.
I have to respectfully disagree. When we're talking machines in the GHz range, and I think 1 GHz is the low end of desktops these days, with 1/2 GB of memory, they should be able to analyze 4,000 RSS items a day easily. This is purely a gut feeling, but I'm betting aggregators are actually network bound, not CPU bound. There are relatively mature, optimized packages out there for the techniques Leung's thinking about. Just as an example, see how Steven Johnson uses DevonThink, which must have some similar stuff inside. Granted, Johnson's research material might not be growing at the same rate as a complete archive of Leung's blogroll, but this analysis stuff has been pounded on for a long time. It's really only at Google/Amazon/Yahoo!/MS scale, trying to do it for millions of users in real time, that things get hairy. Besides, Leung is probably an outlier, although I realize making things better for prosumers typically propagates to all users.
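A gut check on that claim: time a naive bag-of-words pass over 4,000 synthetic items in pure Python, no optimized packages. All the numbers here are invented, but the order of magnitude is the point:

```python
import random
import string
import time

# Fabricate 4,000 items of ~200 words each from a 5,000-word vocabulary.
words = ["".join(random.choices(string.ascii_lowercase, k=6)) for _ in range(5000)]
items = [" ".join(random.choices(words, k=200)) for _ in range(4000)]

start = time.perf_counter()
for item in items:
    counts = {}
    for token in item.split():
        counts[token] = counts.get(token, 0) + 1
elapsed = time.perf_counter() - start
print(f"processed 4000 items in {elapsed:.2f}s")  # a second or so, tops,
                                                  # on anything modern
```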
Now, more cycles for rendering analysis-derived pretty pictures, visualizations, charts, and graphs, especially interactive ones, might be a welcome driver for future CPU sales.
Just a thought: can we get enough information to make decent decisions about long tail items in a hyper-niche information market? If you anticipate a power law distribution, most stuff won't be rated, linked to, or written about enough to decently feed a recommendation engine. Does it even matter? If stuff is cheap enough, I just need to get in the vicinity of items that might be attractive to me, and then I can root around to pick something. A few clunkers don't really matter as long as I get a steady stream of useful things, and a few hits.
Maybe the best long tail tools will help people window shop. Instead of trying to make recommendations, make the overall process of sifting enjoyable, efficient, and social.
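To put a rough number on the coverage problem, a sketch that draws Zipf-distributed ratings over a catalog and counts how many items ever accumulate enough to feed a recommender. All parameters are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_ratings = 100_000, 1_000_000

# Zipf-distributed item choices: rank k drawn with p ~ 1/k^1.1
picks = rng.zipf(1.1, size=n_ratings)
picks = picks[picks <= n_items]          # clip to the catalog
counts = np.bincount(picks, minlength=n_items + 1)

# What fraction of the catalog ever gets k or more ratings?
for k in (1, 5, 20):
    frac = (counts[1:] >= k).mean()
    print(f"items with >= {k:2d} ratings: {frac:.1%}")
```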
Rafat Ali jots a note regarding Trendum, including mention of a $3 million investment from AC Nielsen and former big-name clients such as Time Warner, CNN, and HBO. Trendum specializes in monitoring, analyzing, and reporting on various parts of the Internet for their clients' benefit.
On a couple of occasions, I've been given a hard time about why anyone should be interested in better tools for analyzing and understanding the blogosphere. How much analysis do we need of navel gazers, rabble rousers, teenage girls, and cat owners? The money placed in Trendum is a pretty concrete example of why this stuff might be important.
Leonard Richardson has cooked up the Ultra Gleeper, essentially a single-person, or small-group, recommendation engine for Web pages. The Ultra Gleeper is interesting in that it uses blogrolls, watch engines (Technorati, del.icio.us webfeeds), and ratings in an attempt to avoid a number of standard problems with recommender systems. You can read all about the design in Richardson's CodeCon paper.
I like it a lot as a potential experimentation platform, but in thinking a bit about potential uses, I forced myself into a conundrum. Without a specific task, what do people need recommendations for? This is a case where good enough is the enemy of best. If I'm just browsing, surfing, and trawling the blogosphere, I've got more than enough feeds to provide good morsels. If I need more serendipity I can just tap into Findory or the various dexes.
But if I don't have a particular task, how can a technology say it's made my life any better? Greg Linden will say news personalization has "learned what I need" and is giving it to me, but by what metric?
Alternatively, how do we know the Ultra Gleeper or Findory are doing any better than random? And what does "doing better" mean?
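One concrete way to cash out "better than random": hold out pages the user ended up liking, then compare precision at 10 against a shuffled baseline. A sketch with synthetic data; the "engine" here is a stand-in oracle, just to show the metric mechanics:

```python
import random

def precision_at_k(ranking, liked, k=10):
    """Fraction of the top-k recommendations the user actually liked."""
    return len(set(ranking[:k]) & liked) / k

pages = [f"page{i}" for i in range(500)]
liked = set(random.sample(pages, 25))   # held-out ground truth

# Stand-in "engine": a perfect oracle that ranks liked pages first.
# A real recommender would fall somewhere between this and random.
engine_rank = sorted(pages, key=lambda p: (p not in liked, random.random()))
random_rank = random.sample(pages, len(pages))

print("engine:", precision_at_k(engine_rank, liked))
print("random:", precision_at_k(random_rank, liked))
```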
However, all hope is not lost. There are plenty of places where a task might be explicit or implicit and a recommendation engine might be appropriate. For example, if I really am a blogger, writer, analyzer, synthesizer (no jokes from the peanut gallery; sometimes I wonder too), then a recommendation engine can keep an eye out for corners of the world I should be interested in, given what I write on. This is a soft definition of what I need, but at least there's some hope of knowing whether I get better at it by getting what I need. And there are plenty of contexts and activities where a similar story can be told. I pity the recommenders, though, that don't have much of a picture of what the user is trying to achieve, if anything.
Michael Sippey entertains the idea that mashups might be automatically generated.
+1
Seems like a no-brainer for a genetic algorithms or reinforcement learning researcher to try to tackle.
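To show how little machinery it takes, a toy genetic algorithm that evolves three-component "mashup pipelines" toward an invented fitness target. The component names and fitness function are stand-ins; the GA loop is the point:

```python
import random

COMPONENTS = ["flickr", "delicious", "maps", "feeds", "search", "calendar"]
TARGET = ["feeds", "maps", "flickr"]   # pretend this combo is "good"

def fitness(pipeline):
    # Count positions matching the target; a real system would score
    # pipelines by user uptake or some usefulness signal instead.
    return sum(a == b for a, b in zip(pipeline, TARGET))

def mutate(pipeline):
    p = pipeline[:]
    p[random.randrange(len(p))] = random.choice(COMPONENTS)
    return p

population = [[random.choice(COMPONENTS) for _ in range(3)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    # Keep the top half, refill with mutated copies of survivors.
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

print(generation, population[0])
```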
Moment of silence for Stating the Obvious....
Thank you.
I don't know why I hadn't been subscribed to the weblog for Vin Crosbie's consultancy Digital Deliverance, but I like what I've been seeing recently. In particular, his missive urging media companies to focus on customer relationship management in their online efforts seems right. No use shoveling your print/broadcast content down another pipe without any differentiation.
He also tells a fascinating tale of what's going on at Critical Mention, a startup that's leveraging computing to monitor TV programs for corporate intelligence and PR management (my summary). The bigger win is in techniques for making digitized video searchable.
Memo to Vin: check out your archives and permalink pages. I don't know if it's intentional but they sure don't look pretty at my end. Looks like a missing stylesheet somewhere.
Update: Tweaked Critical Motion to the correct Critical Mention.
BoingBoing is ably demonstrating a looming difficulty with webfeeds and advertising. BoingBoing is in the Creative Commons, using the Attribution-NonCommercial license. If I "remix" their feed, for free, and remove the ads, I'm guessing they wouldn't be happy.
The bigger tarpit is a potential clash of commercial interests. What if some other commercial entity wraps aggregated webfeed content with contextual ads that are in direct competition with the ads in a feed? Think trying to make a buck off of MyYahoo!. The content providers probably have the legal high ground and are fair in demanding some kind of payback. But then middleman or client-side wrappers and remixers have to negotiate hordes of licenses. Or bend over for centralized providers like Moreover. Strikes me as a mean disincentive to provide high quality aggregation services, which cost money to develop, deploy, and support.
Maybe there'll be a middle tier of "prosumer" aggregators, desktop with a one-time fee, or Web-based and subscription supported. These guys will innovate on useful but not wildly popular features, where you can get paid without conflicting with the ads.