David Davies' Radio Weblog
Stephen Downes
David Carter-Tod
Sebastian Fiedler
Andy Powell and
D'Arcy Norman
There are of course loads of other people, including Raymond Yee, Ben Toth, Oliver Wrede, George Siemens and Sébastien Paquet, who are deep thinkers in this area, and I'd love to have the chance to meet all of them at a meeting like this, too.
Who would be on your list?
Feeder by Ben Nolan
rssSearch by François Schiettecatte
Snarf by Brady Gaster
I'm sure this list will grow quite rapidly.
As a service to the RSS community I'll happily maintain a list of these engines as an OPML file so that Dave can link it into the RSS directory. Please let me know if you are working on a search engine or know of one not yet on the list.
Postscript: Dave has created a new node in the directory.
The following are not criticisms of Feedster or any RSS search engine. Instead I see them as challenges and without challenges there'd be no progress. Some have easy solutions, others not. In this piece I've used Feedster as the canonical example of an RSS search engine. The questions raised are relevant to all RSS search engines but as Feedster is the nom du jour I've taken the liberty of using it as a convenient reference.
In no particular order (which of these matters most depends upon what you think is important):
1. Relevance ranking.
Compare these two searches for 'RSS search engines', firstly using Google then using Feedster. I know Feedster is just starting up but we need to know how it's searching its data and what the relevance ranking model is.
2. Finding and searching legacy data.
How will Feedster cope with legacy data? Take a look at my weblog home page RSS feed:
http://daviddavies.name/rss.xml
It lists only 3 items yet I've been writing this weblog for a couple of years. How will Feedster discover the rest of my written content? If the date of the first post listed in any one site's RSS file is taken as year 0 for that site, then fair enough, at least it'll be current. But as I write more, older posts will fall off the end of my RSS feed. Will these items be stored in Feedster's database, or will the engine only ever search what's in the RSS feed at any one time? If the latter, then a lot of relevant content will always be hidden from Feedster's gaze (see also Walking the web - content scalability).
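To make the first of those two options concrete, here's a minimal sketch of an archive that keeps every item it has ever seen, so a post stays searchable after it drops out of the live feed. This is only my illustration of the idea, not how Feedster works: the names ItemArchive, parse_items and sample_feed are made up, the sample feeds are invented data, and I'm assuming an item can be identified by its guid or, failing that, its link.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Return (key, title) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Assumption: guid identifies an item; fall back to link, then title.
        key = item.findtext("guid") or item.findtext("link") or title
        items.append((key, title))
    return items


class ItemArchive:
    """Hypothetical store that remembers items after they leave the feed."""

    def __init__(self):
        self.items = {}  # key -> title

    def ingest(self, rss_xml):
        """Add any unseen items from one fetch; return how many were new."""
        new = 0
        for key, title in parse_items(rss_xml):
            if key not in self.items:
                self.items[key] = title
                new += 1
        return new

    def search(self, word):
        """Very naive search: titles containing the word, sorted."""
        word = word.lower()
        return sorted(t for t in self.items.values() if word in t.lower())


def sample_feed(entries):
    """Build a tiny RSS 2.0 document from made-up (number, title) pairs."""
    items = "".join(
        f"<item><title>{t}</title><link>http://example.org/{i}</link></item>"
        for i, t in entries
    )
    return f"<rss version='2.0'><channel><title>Sample</title>{items}</channel></rss>"


# Two fetches of the same feed: by the second, the oldest post has fallen off.
FEED_LAST_WEEK = sample_feed([(1, "Feedster launches"), (2, "RSS search engines")])
FEED_TODAY = sample_feed([(2, "RSS search engines"), (3, "Mobile video post")])
```

Ingest FEED_LAST_WEEK and then FEED_TODAY and the archive holds all three posts, including the one that is no longer in the feed. What it can't do is recover anything that fell off the end before the first fetch, which is the legacy data problem.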
3. Walking the web - content scalability.
Google can find new content by walking the web. It picks up a link and walks it to find pages with yet more links. Eventually the whole web, at least in theory, can be walked as Google plays the ultimate six degrees of Kevin Bacon game (everything is related to everything else by a number of links).
Feedster can walk this walk to some extent using RSS autodiscovery but sooner or later (most likely sooner) it'll come up against a page with no associated RSS file and presumably the walk stops there. One day maybe all pages will be part of an RSS feed somewhere but not for a while yet I suspect.
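For anyone who hasn't looked at how RSS autodiscovery works, here's a rough sketch using only Python's standard library. It looks for link elements with rel="alternate" and type="application/rss+xml" in a page and resolves their href against the page URL. The names FeedLinkParser and discover_feeds are mine, and I have no idea whether Feedster does it this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RSS_TYPE = "application/rss+xml"


class FeedLinkParser(HTMLParser):
    """Collects RSS feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value arrive as None; normalise to "".
        a = {name: (value or "") for name, value in attrs}
        rel = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").strip().lower() == RSS_TYPE
        if "alternate" in rel and is_rss and a.get("href"):
            url = urljoin(self.base_url, a["href"])
            if url not in self.feeds:
                self.feeds.append(url)


def discover_feeds(html, base_url):
    """Return the feed URLs a page advertises, or [] if it has none."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

A page with no such link tag gives back an empty list, and that is exactly the point where the walk stops.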
Here's my weblog:
I actually use my weblog as 3 separate weblogs, the home page and 2 categories (in Radio UserLand parlance). So in Feedster presumably my weblog has 3 instances:
http://daviddavies.name/rss.xml
http://daviddavies.name/categories/smsblog/rss.xml
http://daviddavies.name/categories/theviewfromhere/rss.xml
Now could Feedster guess that? Well yes, to some extent, it could walk my weblog and discover the RSS links from each page coming up with these 3 unique RSS URLs. But could Feedster use my weblog's domain name to discover more RSS feeds from other people's weblogs? Probably not. The domain name where my weblog resides is:
How can you infer from this what other weblogs exist in this domain, let alone what their home URL is?
So my guess is that right now Feedster has a more complex database than Google. It probably has a table listing all unique RSS feed URLs, then a larger database of each RSS item. The RSS items database is probably what gets searched. My guess is that Google doesn't maintain a separate table of top-level domains. Is this a problem for either system? Well, I guess it depends upon what you want to achieve. Feedster can use these extra data to its advantage in its advanced search. But so can Google.
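Here's roughly what I mean by that two-table layout, sketched with SQLite from Python. To be clear, this is my guess made concrete, not Feedster's actual schema: the table names, the columns and the make_db, add_item and search helpers are all invented for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a feeds table and an items table."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE feeds (
            id  INTEGER PRIMARY KEY,
            url TEXT UNIQUE NOT NULL        -- one row per unique RSS feed URL
        );
        CREATE TABLE items (
            id      INTEGER PRIMARY KEY,
            feed_id INTEGER NOT NULL REFERENCES feeds(id),
            link    TEXT NOT NULL,
            title   TEXT NOT NULL,
            body    TEXT NOT NULL,
            UNIQUE (feed_id, link)          -- don't store the same item twice
        );
        """
    )
    return db


def add_item(db, feed_url, link, title, body):
    """Register the feed if it is new, then store the item under it."""
    db.execute("INSERT OR IGNORE INTO feeds (url) VALUES (?)", (feed_url,))
    feed_id = db.execute(
        "SELECT id FROM feeds WHERE url = ?", (feed_url,)
    ).fetchone()[0]
    db.execute(
        "INSERT OR IGNORE INTO items (feed_id, link, title, body) VALUES (?, ?, ?, ?)",
        (feed_id, link, title, body),
    )


def search(db, word, feed_url=None):
    """Search the items table; optionally restrict results to one feed."""
    sql = (
        "SELECT feeds.url, items.title FROM items "
        "JOIN feeds ON feeds.id = items.feed_id "
        "WHERE (items.title LIKE ? OR items.body LIKE ?)"
    )
    params = [f"%{word}%", f"%{word}%"]
    if feed_url is not None:
        sql += " AND feeds.url = ?"
        params.append(feed_url)
    sql += " ORDER BY items.id"
    return db.execute(sql, params).fetchall()
```

The search runs against the items table, and the feeds table is the extra data: it's what lets an advanced search say "only show me results from this feed".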
Feedster can only discover content if it's part of an RSS feed. Google isn't limited in this respect and can find any page on the web providing it has a link to it from some page already in the Google database (you can of course suggest a link to both Google and Feedster to get a new site into their databases). Right now it would seem that RSS search engines are limited to a theoretically finite data-set of pages with an associated RSS feed. This may change with time (see also Finding and searching legacy data).
Google has a pretty quick turnaround, such that when a new web page is available Google will index it pretty quickly. But quick in this context is a few days (at least for most sites - Google probably scans some sources much more frequently, e.g. Google News sites). This is fine for many web pages but not for current news, or news as it happens. For example, a community of weblogs in Iraq could be a vital source of information providing news as it happens. Couple that to a picture or video weblog and you've got a powerful voice. I want to search for these kinds of events now, not in a day or so. OK, I know I could bookmark a weblog and view it often if I wanted up-to-the-minute news, but what if I didn't know a site existed and wanted to find it? Particularly just after an event has happened. Sometimes, even with more mundane searches, you want to find something that happened just minutes ago.
Weblogs are a good source of up-to-the-minute news as well as RSS feeds. An advantage of searching an aggregation of RSS feeds is that prior art has created a system whereby when a weblog is updated it can ping a central server or servers to notify it that it has changed. A search engine that knows to index a weblog (or any other site) that has just changed will always be right up to date. Google uses other tricks to increase the relevance of a search result, for example by looking at how many links point to that page. An RSS search engine will need to give some thought to how to create a relevance ranking in the absence of a rich set of inbound links (maybe provenance of a site or weblog will increase ranking?). If Feedster uses the same ping protocols as weblogs.com then it's got the jump on other RSS search engines and search engines in general as it'll always have access to the most recent information, in theory information only minutes old. If it doesn't then maybe Scott should look into this because Google has bought Blogger so will likely be looking at something like this (and if not they should be!).
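As a sketch of the receiving end of that idea: the weblogs.com ping is an XML-RPC call named weblogUpdates.ping that passes the weblog's name and URL, and a search engine listening for those pings could simply queue each pinged site for re-indexing. The PingQueue class below is entirely hypothetical and leaves out the XML-RPC plumbing; it only shows the "index what just changed first" logic.

```python
from collections import OrderedDict


class PingQueue:
    """Hypothetical queue of weblogs that have announced a change."""

    def __init__(self):
        # url -> weblog name, ordered by when the site was first pinged.
        self._pending = OrderedDict()

    def ping(self, name, url):
        """Record that a weblog says it has changed.

        A repeat ping for a site that is already waiting keeps its
        existing place in the queue rather than adding a second entry.
        """
        self._pending[url] = name
        return len(self._pending)

    def next_batch(self, size):
        """Hand the indexer up to `size` URLs, oldest ping first."""
        batch = []
        while self._pending and len(batch) < size:
            url, _name = self._pending.popitem(last=False)
            batch.append(url)
        return batch
```

An indexer that works through next_batch() every minute or so would only ever be a minute or so behind the weblogs that ping it, which is the whole attraction.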
Good luck Feedster!
I also predict that mobile weblogs will soon fade, at least in name, as weblogs in general become enhanced by all the technologies available for posting. Soon it'll make no more sense to talk about a mobile weblog than to call your regular weblog a stationary blog. You'll just have a weblog and how you post to it, and what you post to it, will depend upon where you are, what you're doing and what you want to say.
So I've made my first mobile video post. In days others will join in and very soon a new medium for personal expression will evolve. If the true value of weblogs is freedom of expression then come one, come all. In no time at all we'll look back and ask what all the fuss was about.
Note to eager m-bloggers: the assetManager tool now handles multimedia. An updated version will be available for you to download in a few days.