Bar-Yossef & Rajagopalan: Template Detection

If you do any web indexing or information retrieval on HTML, templates can easily screw things up. Results from WWW 2002 indicate that there's hope for detecting and leveraging template elements.

Key nuggets: use tree structure, administrative authority, and link occurrence counts to find the recurring elements. Also, it can be done reasonably fast with standard RDBMS technology.
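To make the "link occurrence counts" nugget concrete, here's a minimal sketch. It is not the paper's algorithm, just an illustration of the counting idea: a link that shows up on most pages of a site is probably template, not content. The `template_links` helper, the 0.8 threshold, and the sample pages are all assumptions made up for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the set of href targets that appear on one page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def page_links(html):
    """Return the distinct link targets found in one HTML document."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def template_links(pages, threshold=0.8):
    """Return links occurring on at least `threshold` of the pages (hypothetical cutoff)."""
    counts = Counter()
    for html in pages:
        counts.update(page_links(html))  # each link counts once per page
    cutoff = threshold * len(pages)
    return {link for link, n in counts.items() if n >= cutoff}


# Hypothetical pages from a single site: shared navigation plus one story link each.
pages = [
    '<a href="/home">Home</a><a href="/about">About</a><a href="/story1">One</a>',
    '<a href="/home">Home</a><a href="/about">About</a><a href="/story2">Two</a>',
    '<a href="/home">Home</a><a href="/about">About</a><a href="/story3">Three</a>',
]
print(sorted(template_links(pages)))
```

The same per-page counting maps naturally onto a GROUP BY over a (page, link) table, which is presumably where the RDBMS angle comes in.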


Huberman & Adamic: Network Infodynamics

A new (at least to me) social networking paper from folks at HP's Information Dynamics Laboratory, Bernardo Huberman and Lada Adamic. "Information Dynamics in the Networked World."

You know, if you really want to be on the leading edge of this stuff, you need to keep an eye on the physicists. That is, if you can handle the math.

Update: Alex Halavais e-mailed to chide me for possibly slighting any traditional social network analysts out there. No slight intended, those folks have been on the leading edge of this stuff for decades. Hope you're not in a tizzy!


Leung: RSS Bandwidth is a Problem

I *think* Ted Leung is taking issue with my recent posting regarding RSS bandwidth. I got linked, but "Quite a few people have posted" on the issue.

We agree on one thing: RSS bandwidth could be a major issue if aggregator usage explodes. And I also agree that the explosion is definitely going to happen, not just probably.
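A back-of-the-envelope sketch of why an explosion would hurt, assuming every aggregator re-downloads the whole feed on every poll. The subscriber counts, the hourly polling, and the 20 KB feed size are made-up illustration values, not measurements.

```python
def daily_feed_bytes(subscribers, polls_per_day, feed_bytes):
    """Bytes served per day if every poll re-downloads the entire feed."""
    return subscribers * polls_per_day * feed_bytes


# All numbers below are hypothetical: hourly polling of a 20 KB feed.
for subscribers in (100, 10_000, 1_000_000):
    total = daily_feed_bytes(subscribers, polls_per_day=24, feed_bytes=20_000)
    print(f"{subscribers:>9} subscribers: {total / 1e9:.3f} GB/day")
```

The cost is linear in subscribers, so a feed that is a rounding error today stops being one as soon as the audience grows by a few orders of magnitude.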

However, it should be clear that part of the success of RSS is that by using HTTP, publishing RSS is damn easy for authors. Combined with loose formats that means transforming plain old HTML into a feed is trivial. Ergo, lots of content for all those aggregators. It's something of a virtuous cycle.

Why is this an issue for solving the bandwidth problem? Any solution will need content providers to buy in. If you're going to try and pry them off of HTTP, well good luck.

Lastly, my comment about RSS being a small part of HTTP traffic is targeted towards my own "fancy P2P schemes" thinking. Knuth's commentary about "premature optimization" was chiming in the back of my head.

In summary, I'm not saying RSS bandwidth isn't or won't be a problem, just that whizzy non-HTTP solutions have a low chance of success. IMHO.

Memo to self: someone needs to do a traffic characterization study across a wide range of HTTP/RSS feeds.


Nichani & Rajamanickam: Classifying Interactive Explanations

Maish Nichani and Venkat Rajamanickam have examined a number of interactive presentations, including a few on news sites, and come up with a simple classification scheme.

Looks like a promising start; it might be worth comparing with Nora Paul's attempts to come up with the "elements of interaction". Also, I might add a "comparative" category that works through contrast and comparison, especially of images.

Thanks to the WebWord.com Weblog


Thomson: RSS Comic Strips

David Thomson, a.k.a. dwlt.net, is providing Tapestry, which serves a number of comic strips through RSS. As a bonus, he has a nice catalog of RSS aggregators.

One more section for the RSS Newspaper.


Rosen: PressThink

Wow! Jay Rosen, the chair of the journalism program at NYU, has been keeping a pretty spiffy MT based blog.


Kaufman: On Sports Blogging

King Kaufman, in Salon, writes about the disparity between the number of baseball and football bloggers. Conclusion: baseball has a daily rhythm, plenty of stats, and a literary tradition that all lend themselves to decentralized, non-professional commentary. However, I was interested enough to get a daypass just to find out where these blogs are.

Why? With decent sports blogging, you could probably synthesize all the sections of USA Today out of RSS feeds. There are plenty of US politics related feeds for the front page, the BBC's RSS provides a pretty good start to an international section, you can get A&E to whatever granularity you want, and sports would almost complete the mix. Tech business falls out pretty well, but I'm not sure about more general economic/business reporting even in the vicinity of The Wall Street Journal. Maybe that could be cobbled out of vertical industry feeds.

Just a thought.

Memo to self: isn't it about time for someone to predict the death of Salon again?

Memo to Salon: Your RSS feed headlines are slowly sucking me in. Must... resist... subscribing...


sjdesign: Back Office Bag

The Big BackOffice computer backpack by Shaun Jackson Design is one of the few I've seen recently that I might actually buy. I've got a bandolier style that's been serviceable for the past couple of years, but I really prefer double straps. Most of the stuff I've seen out there is courier cargo bags, which are okay for short trips and light loads. But I tend to carry a couple of books in addition to my TiBook, which adds up quick, and screws up your neck and shoulder just as fast.


Ogbuji: State of Python & XML

Also thanks to Ted, Uche Ogbuji on "The State of the Python-XML Art"


Bowers: RSS Bandwidth Efficiency

Jeremy Bowers makes a number of good points about how current usage of RSS is pretty wasteful of bandwidth. He even proposes using rproxy to improve the situation.

Sometimes I think this is an interesting problem to tackle. Yesterday, I was cooking up fancy P2P schemes for scalable, persistent distribution of RSS fragments. Not an original idea, but just something to kick around.

Then I remember that a) RSS traffic is still ridiculously small relative to regular HTTP traffic and b) part of RSS's popularity is that it does just use HTTP.

It's a format optimized for publishers not consumers, so if you want to fix any problems you'll really have to start at that end of the pipe.

Thanks to Ted Leung


Futures Lab: Access Grid

Link parkin': The Access Grid Project. Since it looks like I'll be working with these guys.


MIT-Stanford Vlab: Social Software $$

Uh, oh. Here come the venture capitalists.

Social Software == Internet Hype 2.0?!!

Discuss.

An observation though. All of the software being discussed makes only minimal use of the lessons of social network and computing research. Friendster strikes me as more a triumph of marketing and cultural fascination than anything revolutionary from a technological or sociological standpoint.

Then again, Tim Berners-Lee didn't know much about hypertext either.


Bombich: Carbon Copy Cleaner

Link parkin': Carbon Copy Cleaner, for when I get the dough to upgrade my TiBook.


Pearson: Ecosystem Data

Philip Pearson of Second p0st has been gathering weblog network data.


SFSU: XPress Online

I've been trying to sow the seeds for moving the Medill News Service CMS to Movable Type. [X]Press Online, out of San Francisco State University's journalism program, looks like a nice example of what can be done. Link parkin' for future case making.


Creative Commons: Movie Contest

Need content? Run a contest! Ditto if you've got content.

Refining my creative contests trope, organizations with extensive content archives should be doing exactly what Creative Commons is doing. And on a regular basis.

Yeah there's some work in screening the submissions, but you get fresh, cheap content, you get a closer connection to your audience (both creatives and passive observers), and you get targeted (read more lucrative) opportunities for sponsorship and advertising.

Heck, depending on the material you could argue that it's journalism, giving more nuance, context, and perspective to stories.


Rojas: Digital Technics

Peter Rojas of Gizmodo sparks the techno lust with a report that Technics is inching towards a device with the SL1200 form factor, but really digital inside.

Imagine the cool points for hacking that. Talk about getting brothers into software...


Want: Mobile Personal Server

Acknowledging the limitations of a house interview, Intel's piece on Roy Want and the personal server is interesting. For what it leaves out.

Slap Bluetooth on an iPod and don't you have a personal server? Want's a smart guy, and there's smart folks in the group, but what's in the article doesn't sound particularly challenging.

Also, I'm wondering if users will really bite on a device that is always dependent on other components. Strikes me you'll have to attach some kind of I/O just so the server isn't rendered completely useless in places where it's inconvenient to install the supporting technology. Maybe the deal is to make the server fit in your shoe (a non-starter unless you expect to wear the same shoes every day), make the cell phone the minimal I/O device, and then interact with the richer gear when you get into presentation-rich spaces.

Okay now there are issues.

And making the server into jewelry (ring, bracelet, belt buckle) might be a better bet.


Parks: Shrook

Link parkin': Shrook is yet another RSS aggregator. However, this looks like it might actually be competition for NetNewsWire.


Glaser: Online News Pioneers

Mark Glaser, writing for Online Journalism Review, has shipped the first part of a two part series on folks who've been doing newsish things on the web for 10 years. Apparently, there was no news on the Web until the University of Florida did a journalism web site.

Anyhoo, Glaser's first chunk features interviews with John Battelle, Ana Marie Cox, Bernard Gwertzman, Craig Newmark, and Dave Winer. Check out the article if you don't know who they are.

I found Ann O'Tate, er Cox, to be the most cogent of the bunch. "That said, I think what's really revolutionary about weblogs isn't their content..."

Moment of silence for Suck. I've still got various tchotchkes they gave away. Sniff.

Where's the Suck for the new millennium when you need it?


NMH: Commodity Virtual Servers

Link parkin': JohnCompanies, JVDS.

Both of these guys provide virtual servers (Linux and/or FreeBSD) to the public.

JohnCompanies: $65 a month, to start. Nice!

JVDS: $15 a month to get going. Yowsa!!

I'd have to look at the performance numbers, but, there's basically no need to fork over any money to Dell (or anyone else) until you:

  1. Need desktop graphics
  2. Need privacy
  3. Need high performance
  4. Need a Microsoft OS

Frankly, for anybody whose time is worth anything and who knows (or can learn) an open OS, you'll recoup the costs in admin hassles.

The entry cost of owning your own server is near zero.

Thanks to 0xDECAFBAD.


Jain: Blogstreet

Link parking: Blogstreet, yet another MetaWeblogService (MWS) to steal a term from the Waypath guys.

Thanks to Scoble for the reminder.


NYTimes: Discussing Photos

Another note from the CyberJournalist.Net bulletin (still digging through leftover honeymoon e-mail), mentions that The New York Times online is now doing a "week in photos" feature. The new wrinkle, since MSNBC and The Washington Post do this already, is audio commentary from the photographers and photo editors.

Excellent use of the infinite news hole. Also, mining photo archives, either internally or by outside users, could make those extra images productive. As a great historical opportunity, why not give the world some kind of access to all of those unused 9/11 photos?

Memo to self: a serious study of photo archive usage on news websites would be a good capstone project.


Batagelj & Mrvar: Pajek for Network Analysis

Pajek looks like a pretty good tool for analyzing really big networks.

Thanks for the tip from Blog De Halavais.


Udell: RSS for E-mail, Bah!

Jon Udell echoes the silent majority in saying, "There's been a lot of talk about replacing email with RSS. I don't buy it." RSS is overkill to deal with ad-hoc, unsolicited, one-to-one communication.

So you say it can be fixed up? Well let me put it to you this way. If you thought the Necho/Pie/RSS/funky feed wars were mean-spirited, tedious, and unproductive, imagine the same debates with billions of dollars at stake. For every RSS user currently, there must be hundreds, if not thousands, of SMTP/IMAP users.

Actually, it wouldn't be much of a war, because Microsoft would take the opportunity presented by the confusion created in overhauling the system to stamp out all the other e-mail vendors.

E-mail is too big to fail.


Google: Friends Newsletter

I've been digging through some old e-mail that went unattended during my wedding buildup. One piece was a CyberJournalist.Net alert that pointed to a short interview with Krishna Bharat, principal genius behind Google News. Of course it's in the Google-Friends Newsletter so it's a bit of a softball, but still, interesting insights into how the dang thing works.

Heck, I didn't even know there was a Google-Friends Newsletter.


Waypath: Yet Another Dex

Squirreled away in JD Lasica's blog is a link to Waypath, yet another in the crop of indexing engines for weblogs. Definitely a bit less publicized than its counterparts: Blogdex, Daypop, Technorati, et al.

Update: The Waypath archives contain links to a whole bunch of other metaweblog (as the Waypath folks put it) services. June looks particularly ripe.

Memo to self: check to see if weblogs.com and blo.gs got pinged.


Bergman: Competing for Last

Somewhat stale, but still new to me, is a piece Cory Bergman penned over at The Lost Remote. Entitled "Competing for Last Place," it posits that competition in the news industry actually breeds a deadly sameness. To succeed, newsrooms need to strike and cover the same old stories, but with unique angles.

Bergman is mostly focusing on TV newsrooms but in the short term, the thinking is also applicable to news web sites. After being burnt in the mid-90's by wacky innovations (anyone remember the Chicago Tribune's old TV/Flash-like front page?), news sites on the Web have slowed down the innovation. They're all starting to look like the front of the NY Times.

On the web though, there's still plenty of room to do interesting things, especially in providing services for end users and developers to build on, and still look like everyone else, e.g. Yahoo! News providing RSS feeds.


Yahoo: RSS News Feeds

Soon to be making the rounds, Yahoo is now providing RSS feeds for its news pages.


NMH: NewsGator Scriptable?

Thinking out loud. NewsGator is an RSS aggregator embedded within Outlook. I haven't looked recently, but I bet Outlook is still scriptable, probably still with Visual Basic for Applications. This probably means one can trawl over at least some of NewsGator's data about your feeds.


Jon: Yet Another Blog Index

By dint of poking around in Daypop's citations I ran across Nicholas Jon's blogosphere.us. On the surface, blogosphere.us looks to be similar to Blogdex and Daypop, but as is usual, its techniques aren't documented.

Update: found a bit more info in another somewhat incestuous article, and its more detailed version.

Careful readers will note the implication that Daypop uses sites that index citations as part of its citation count (apparently). Whether this is a good idea or not is left as an exercise to the reader.


Awasu: Scriptable RSS Aggregator

While kludging together an RSS aggregator isn't all that difficult, I'm coming to the realization that implementing one I'd like to use every day is a challenging task. Yet for some of my research, I'd really like to modify the behavior of an RSS aggregator.

NetNewsWire already integrates AppleScript. So now I'm casting about for extensible aggregators under Windows. Awasu is the first one I've run across, although I need to dig into the other .Net based contenders, like SharpReader. Some of them, such as FeedReader and Syndirella are open source, so maybe I can hack in my own extensibility.


Jenkins: Email, Google, and Microsoft

While the spirit of Elwyn Jenkins' "Email, Google, Microsoft and the Lack of Diversity" is commendable, the piece is chock full of bad argumentation. I'll probably do a careful analysis, but statements like:

The problem with email is that every email client works much the same way regardless of who constructed it...

are just silly. Is there any non-Microsoft e-mail client that's responsible for spreading viruses? Heck, even newer versions of Outlook (from Microsoft itself) have script execution (the main virus transmission mechanism) turned off by default.


Gartner & Berkman: Copyright Post-Napster

Parking for later perusal: Copyright and Digital Media in a Post-Napster World.


Hunsinger: Damn Good Paper Idea

Jeremy Hunsinger writes a fictitious abstract regarding a paper that needs to be written by someone. The investigation would involve the validation/debunking of the Weblog power law meme and its implications. Alex Halavais provided the connection to Hunsinger's idea and amplified on the theme.

I completely agree with both gentlemen and would hasten to add that whatever one's stance in this debate, the foundations are still shaky. The methodology and data both strike me as a bit suspect. There hasn't really been any independent validation. Also, the time-varying aspects of these networks should prove interesting under scrutiny. Other than Andy Tomkins and crew, there doesn't seem to be a whole lot of action in this space.
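For anyone who wants to poke at the power law meme themselves, here's a rough sketch of one standard first step: a maximum-likelihood estimate of the exponent, instead of eyeballing a log-log plot. The estimator is the textbook one for a continuous power law; the `xmin` cutoff, the sample size, and the synthetic data are assumptions for illustration only. Real inbound link counts are discrete, so treat this as an approximation and not as the method any of the papers in question used.

```python
import math
import random


def powerlaw_alpha(values, xmin):
    """Maximum-likelihood exponent for a continuous power law with x >= xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, xmin, rng):
    """Draw n synthetic values by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so every sample is >= xmin.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic stand-in for per-weblog inbound link counts (not real data).
rng = random.Random(42)
synthetic = sample_powerlaw(20_000, alpha=2.5, xmin=1.0, rng=rng)
print(round(powerlaw_alpha(synthetic, xmin=1.0), 2))
```

Recovering an exponent is the easy part. An estimate alone doesn't show the data actually follow a power law; that takes a goodness-of-fit check against alternatives like the lognormal, which is exactly the kind of independent validation that seems to be missing.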


Raynes: MTOtherBlog

I'm starting to hack on reengineering the Medill News Service CMS to be built on top of Movable Type. Numerous benefits should accrue.

However, there's a tiny problem. The MNS CMS relies on a concept of beats to categorize the articles. I'd like to have each beat correspond to its own blog. Then there could be a "front page" blog that pulls from the various beat blogs. Unfortunately, the tags to pull entries from other blogs aren't baked into stock MT.

Enter David Raynes' MTOtherBlog plugin for Movable Type. Looks like it fits the bill.

Of course occasionally you'll want to grab from a merged view of a set of blogs. For that there's the Global Listings plugin from Stepan Riha.

Best Hawk Harrelson voice: I luv Movable Type extensions.


Dash: Web Tools Adapt

Anil Dash, in "Crossing the Threshold", succinctly captures something that had been kicking around in my head:

Web-based writing tools tend to be the opposite of desktop writing tools in the sense that the tools form around the ways people write, instead of the desktop application model, where the tool informs and influences the works that are created with it.

MovableType is what FrontPage should have been.

I'll add a minor contribution to this train of thought. Eventually the ways people read and write with Web tools will filter back to the desktop. For example, RSS aggregators got a kick start by having HTTP/HTML frontends (cf. Radio Userland, AmphetaDesk). Now desktop versions are coming back with a vengeance (cf. NetNewsWire, FeedDemon).

A thesis. Web development is so constrained that it forces developers to have a clear data model and stay focused on the task at hand.

Also, Web development really supports rapid, iterative prototyping. Granted, hitting reload isn't as much fun as a Common Lisp REPL, but it beats compile/link/debug.

I wonder if there are any companies out there working with a "prototype on the Web/deliver on the desktop" model. Strikes me as a possible winner.

And another nugget. If you believe Dash, bake extensibility into your app from the get go. This allows the tool to adapt to how people write. That may seem obvious, but you'd be surprised how many young developers have to be convinced that extensibility is a good idea.


Dyson: XML and PostgreSQL

Just thinking out loud, but I wonder if the combination of PostgreSQL and XML would make the foundation of a good weblog/RSS social network crawler.


NMH: Back...

caught you looking for the same thing. Now for something a bit different.

Had a fabulous time Down Under (TM) and highly recommend it. The picture below is of my lovely wife of roughly 3 weeks. When in Sydney, be sure to have a nice lunch at the Museum of Contemporary Art's outdoor cafe, as we did. The cafe is right on the Circular Quay.

The food was tasty, the wine good, and the service excellent. Despite the fact that the poor gentleman in the background had red wine spilled on him by a waitress.

A full glass actually.

On his jacket and shirt.

But it's a really nice restaurant, trust me!


NMH: No Worries

Howdy folks! For all 1.5 of my readers out there, this is just a little notice that I've been busy getting married, which explains last week's outage. Now I'm halfway around the world in Australia enjoying my honeymoon, which will explain the next two weeks. Don't expect anything new until August 18th at the earliest.

As an aside, looks like wireless Internet access hasn't quite struck down under yet. But they've got Web based cybercafes up the wazoo, at least in Melbourne so far.

Cheers!!

© Brian M. Dennis. Built using Pelican. Theme by Giulio Fidente on github.