(The following post is cut-n-pasted from an email conversation I had recently about a Personal Search Engine with Jeff Barr. I'm posting it on my typepad so that I can use Google to find these notes again. I note the irony that if I had a Fisher, I wouldn't need to blog it publicly to be sure of finding it later when I need it.)
I've been thinking long and hard lately about finding things in my email archives (20 gigabytes representing 15 years of mail), or, as Rohit calls it, The "Search My Mail D*mnit" Problem, so I can properly deal with Ham. In the "email must evolve" context, I think ZOË-as-Personal-Server points in the right direction.
I find myself wondering if there is still an opportunity to launch a desktop search product that fits the classic definition of platform. The equivalent of a "Browser" for the next decade that brings together existing disparate tools by mixing SMTP and HTTP and throws in a healthy dose of instant messaging and RSS -- except that instead of browsing for information it lets you go (for lack of a better word) "Fish" for information. It's got a simple browser interface and query language (like Google), is lightning fast (due to regular re-indexing), and offers search results of your personal stuff in that simple UI.
Rohit is three steps ahead of me here -- he points out that any good "Fisher" of all your emails, IMs, desktop files, web history, and RSS feeds needs a great algorithm to rank the results of your "Go Fish" queries. (Ranking is something ZOË doesn't do, and therefore it cannot handle the volumes of email I receive daily.)
A "Fisher" isn't a replacement for our existing PIMs and browsers and IM clients and RSS readers, in the same way that Google doesn't replace the Web. Actually, Google is a good analogy since it provides a set of ranked results for any given query of the Public Web. But Google is an anonymous search of the Public Web of information. Fisher, by contrast, provides a set of ranked results for any given query with a personal search of your Private Web of information. It's a customized search of your personal stuff.
I don't have a better sketch of the opportunity yet, but if it works for your personal stuff anything close to the way Google works for public stuff, I think it's a killerApp just waiting for an auteur to get around to writing it. The platform play of course is that it is scriptable -- not just the query engine but the "tap on your ethernet" which can watch HTTP and SMTP traffic crossing your machine and do things on your behalf based on inspecting all the data that crosses through.
I have to think about it. Like I said, I'm sure Rohit is three steps ahead of me on this one... as was jwz six years ago in his write-up of Intertwingle (and short discussion), "a potential project to make it easier to deal with a massive volume of personal messages: excavating, traversing, relating, reporting, annotating. Intertwingle can be seen as a unification of a search tool and an address book. It is not, however, a mail reader. The presentation of query results could be done through a mail reader, but the intention is that one's choice of mail reader should be orthogonal to the use of this tool. The two kinds of tools just happen to operate on the same data." Might this be what the Twingle effort of Kasei is all about?
Update, 3/4/2004 at 5am. Rohit emailed me a description of Fisher in his own words:
Personally, I'm getting very aggravated by the irony that it's easier to find stuff on the Internet than on my own PC. The vast majority of this problem is email, specifically. And while email is impoverished in hyperlinks, ruling out the sorts of PageRank algorithms web search engines use, it is very rich in social network information. The correspondence information can help us choose which bits of text are likely to be the most relevant hit for a query, because it does matter who said what.
I believe Intertwingle is a good early manifesto about Fishers as a product category, and I think that ZOË and X1 are good first-generation instantiations of Fisher-as-a-product. Looking around SearchTools.Com I don't see any other Fishers... yet.
Admittedly, this may have something to contribute about searching multiple-agent discourses in general -- anywhere you can clearly identify authorship of a snippet, and thence calibrate which authors a user reads most.
Of course, if we simply ranked people by frequency of interaction, it would be kind of boring. Frame it as a simple principal-eigenvector problem -- count who reads your readers, and so on -- and interesting patterns emerge. It could well be as useful as PageRank itself was, by comparison to text search alone.
Imagine the Google UI for your own PC. You aim your web browser at localhost and get back a results page that looks eerily familiar, but the hits are actually documents, mails, photos, and cached web pages from your own personal archive. This means nailing the challenges of grouping similar results, generating short excerpts, converting file formats for indexing, and so on. There's more magic to how it installs -- you don't change your email client at all! -- but that's just more technology.
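To make the principal-eigenvector idea from Rohit's note a bit more concrete, here is a minimal sketch of what such a ranking might look like. It is not Rohit's algorithm, which the email does not spell out. It assumes the mail archive has already been reduced to (sender, recipient) pairs, and it borrows PageRank-style power iteration with a damping factor as a stand-in; the function name rank_correspondents and its parameters are hypothetical.

```python
from collections import defaultdict


def rank_correspondents(messages, damping=0.85, iterations=50):
    """Score people by who reads them, weighted by who reads those readers.

    messages: iterable of (sender, recipient) pairs (an assumed input format).
    Returns a dict mapping each person to a score; scores sum to 1.
    """
    # weight[reader][author] = number of messages reader received from author
    weight = defaultdict(lambda: defaultdict(float))
    people = set()
    for sender, recipient in messages:
        weight[recipient][sender] += 1.0
        people.update((sender, recipient))

    people = sorted(people)
    n = len(people)
    if n == 0:
        return {}

    # Start from a uniform distribution and apply power iteration.
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        # Damping term (an assumption borrowed from PageRank): a small
        # uniform share keeps the iteration stable on sparse graphs.
        nxt = {p: (1.0 - damping) / n for p in people}
        for reader in people:
            read = weight.get(reader)
            if not read:
                # Someone who receives no mail spreads attention evenly.
                share = damping * score[reader] / n
                for p in people:
                    nxt[p] += share
                continue
            total = sum(read.values())
            # A reader passes their own score on to the authors they read,
            # in proportion to how much mail they get from each author.
            for author, count in read.items():
                nxt[author] += damping * score[reader] * count / total
        score = nxt
    return score


if __name__ == "__main__":
    sample = [
        ("alice", "me"), ("alice", "me"), ("bob", "me"),
        ("carol", "alice"), ("me", "alice"),
    ]
    ranking = rank_correspondents(sample)
    for person in sorted(ranking, key=ranking.get, reverse=True):
        print(person, round(ranking[person], 3))
```

A Fisher could then use these scores as one signal among several, for example boosting hits written by highly ranked correspondents over hits that merely match the query text. Received mail is only a rough proxy for what someone actually reads; a real implementation would presumably also weigh replies and opened messages.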