Fisher As A Product Category

I've been thinking a lot lately about Personal Search Engines -- and about Fisher as a category of desktop software that indexes your email (and, as the product evolves, the content in your files, most of which have gotten to your hard drive through email, but also some that arrived through the Web, RSS feeds, instant messaging, and so on). I'm not the only one.

Three days ago, David Weinberger posted an enthusiastic endorsement of X1: "I use it maybe 5 times a day. Now X1 is starting to market itself. Good. It's worth the $100 in time savings alone. It's held up well as my email archive has grown to 110,000 messages." Actually, I don't see a marketing campaign. A scan of Google News for X1 reveals just one press release, which isn't even on X1's press page. However, I have been following X1 for about a year now, and I did download and try their product 3 months ago. And I can see clearly the vision in Nate Koechley's comment:

"It's wonderful, and will change how you think about your information. Gone are the days of extensive folder structures in Outlook (or your client of choice). Now, it doesn't matter where the message is, you can always find what you want in the same 2 seconds."

This is clearly a compelling vision: a Fisher has the potential to change my life. It is the starting interface I go to for all of my personal information (just as Google is the starting interface I go to for all public information), and it saves me the time I would have otherwise spent organizing email & files and waiting for search queries to complete.

Two days ago, John Battelle cited David's post in his Searchblog and gave several good reasons why Fisher-as-a-Product-Category solves a problem that a lot of people have, and that gets worse each day:

"Desktop search (ie searching your own hard drive) is one of those things that seems to have gotten worse in the past ten years ... I've got 40 gigs, I think, but no desktop search utility (Sherlock doesn't have text string search, far as I can tell). My email, for example, is a thicket of badly organized folders."

X1 currently runs only on Windows; John asks whether any such products are available for Mac OS X. The only first-generation Fisher that I can think of for OS X is the beta-grade open source project called Zoë. In reading the comments to John's post, I see that a few people suggested Zoë for searching personal email archives. I believe Zoë, like X1, is good -- good enough to criticize, if not to use daily.

First, let me note with due regard to my friend Raphaël Szwarc (Zoë's original visionary and author) and fellow Caltech alum Bill Gross (X1's original visionary) that "good enough to criticize" is actually fairly high praise (as Alan Kay famously noted). There are a lot of technologies that do not qualify as Fishers because they are too raw to be useful to end users: toolkits like Doug Cutting's AIAT (née V-Twin), Lucene, and Nutch, for example, are not 'complete end user products'; the current Windows XP "Search For Files Or Folders" Dog is silly and slow; the current Outlook 2003 Email Search facility is serious and slow; and grep gets stuck in the throat of anyone who's not a UNIX power user.

Back to X1 and Zoë. In my experience so far -- and remember, I'm a power email user with 20 Gig of email saved over the last 15 years -- neither of these products is good enough to put up with on a regular basis yet.

I believe those who suggested Zoë have never tried actually using it. Rohit Khare and I tried it on a Mac, and we found the system to be a good demonstration of the idea but not something we would actually use as the primary interface to all the information privately available on our desktop (the way we use Google a hundred times a day as the primary interface to all the information publicly available on the Web). The blog-like UI, which seems interesting and novel at first, gets real old real fast. More importantly, as mailbox size increases, Zoë becomes unbearable to use for search because:

(1) Its search results are not complete -- there appears to be an arbitrary cutoff of the total number of items returned;

(2) The search results themselves are effectively indistinguishable from random order -- either Zoë is using Lucene poorly, or Lucene works poorly for this application, because even returning the results most-recent-first would be an improvement;

(3) Searching with Zoë for any mailbox bigger than a few Megabytes is slow -- slow enough that it doesn't transcend Heidegger's categories from 'present-at-hand' to 'ready-to-hand' (as, say, Google itself does); and

(4) There is no simple query syntax like the one Google offers: not just booleans, but operators like to:adam or subject:cheese or attach:ppt.
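
To make point (4) concrete, here is a minimal sketch of the kind of field-operator syntax I have in mind. Everything in it is an assumption for illustration: the names `parse_query` and `matches`, the three recognized fields, and the shape of the message dict are mine, not the syntax or API of Google, X1, Bloomba, or Zoë.

```python
# Hypothetical set of field operators, taken from the examples in point (4).
FIELDS = {"to", "subject", "attach"}


def parse_query(query):
    """Split a query string into field filters and free-text terms."""
    filters = {}
    terms = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and value and field.lower() in FIELDS:
            filters.setdefault(field.lower(), []).append(value.lower())
        else:
            # Anything that is not a recognized field:value pair is a plain term.
            terms.append(token.lower())
    return filters, terms


def matches(message, filters, terms):
    """Return True if the message satisfies every filter and contains every term.

    `message` is assumed to be a dict with "to", "subject", "body" (strings)
    and "attachments" (a list of file names).
    """
    for value in filters.get("to", []):
        if value not in message["to"].lower():
            return False
    for value in filters.get("subject", []):
        if value not in message["subject"].lower():
            return False
    for value in filters.get("attach", []):
        # attach:ppt matches any attachment whose name ends in ".ppt".
        names = [name.lower() for name in message["attachments"]]
        if not any(name.endswith("." + value) for name in names):
            return False
    text = (message["subject"] + " " + message["body"]).lower()
    return all(term in text for term in terms)
```

A query such as `to:adam subject:cheese attach:ppt fondue` would then parse into three filters plus the free-text term `fondue`. The point is only that the syntax is cheap to support once the index already knows about recipients, subjects, and attachments.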

Evaluating X1 on Windows reminded me of my evaluation of Bloomba. Both stake their claim to fame on speed (that is, until my email grows beyond 100k messages, but apparently this is a problem very few of us have right now). However, their UIs are clunky fat clients that demand I change some aspect of how I work. With X1, I have to give up a half-inch of my screen for an unnecessarily-modal four-tab UI for searching either email text, attachments, files, or contacts; with Bloomba, I have to drop Outlook (or Eudora or Netscape), with all of the charming foibles that I've gotten used to in my mail client over the years.

In either case, there isn't a decent query syntax -- heck, X1 is no more than 1970s-era KWIC (keyword-in-context) string matching with a bizarre insistence on keystroke-by-keystroke redrawing, as if it's trying to nag me into acknowledging its "speed". Without the kind of excerpting that Google does in step "K", I find myself wading through dozens if not hundreds of hits for most queries... which brings me to a more important point: There is no ranking that makes the results better than grep. At least (unlike Zoë) X1 and Bloomba return matches in order of most-recent-first, but for quality search results, Google has proven to me that ranking is of utmost importance. There is no ranking algorithm a la PageRank that acknowledges even the simplest truths about my mail (stuff from Rohit ranks higher than Orkut notifications, say :).
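
To show how little it would take to beat plain most-recent-first ordering, here is a toy scoring function. It is emphatically not PageRank, and nothing here comes from X1, Bloomba, or Zoë: the sender weights, the 30-day recency decay, the addresses, and the message dict layout are all made-up assumptions, just one hypothesis about what "good" might mean for my own mail.

```python
from datetime import datetime

# Hypothetical per-sender weights. In practice these might be derived from
# whom I reply to most often; the numbers here are invented for illustration.
SENDER_WEIGHT = {
    "rohit@example.com": 3.0,
    "notifications@orkut.example": 0.2,
}


def score(message, terms, now):
    """Toy relevance score: term hits scaled by sender weight and recency."""
    text = (message["subject"] + " " + message["body"]).lower()
    hits = sum(text.count(term.lower()) for term in terms)
    if hits == 0:
        return 0.0
    age_days = max((now - message["date"]).days, 0)
    recency = 1.0 / (1.0 + age_days / 30.0)  # halves after about a month
    return hits * SENDER_WEIGHT.get(message["from"], 1.0) * recency


def rank(messages, terms, now):
    """Return only the matching messages, best score first."""
    scored = [(score(m, terms, now), m) for m in messages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for s, m in scored if s > 0]
```

With two otherwise identical messages of the same age, the one from Rohit sorts ahead of the Orkut notification, which is exactly the "simplest truth" that a purely chronological listing cannot express.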

Many folks can't even imagine 100K messages, but I'm closer to a million (!). Sounds absurd, sure, but the design target for Microsoft Longhorn is supposedly 1 terabyte PCs! And that speaks to another basic criticism of all the aforementioned tools: email may be the center of my universe, but not the entirety of it. How about an "image search" of my hard drive that didn't require me to laboriously pre-caption each photo? Or a "version search" of our latest spec sheet that doesn't trip up on the fact that there are 32 separate Word attachments that all contain the same paragraph over the last year? Or a way to search all the web pages I've visited before? There's hard drive to spare -- why not cache everything?
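
The "version search" wish, at least, does not look hard in principle. A minimal sketch, assuming the text has already been extracted from each Word attachment and that paragraphs are separated by blank lines (both assumptions, as are the function names): hash each normalized paragraph and group the documents that share it, so that 32 copies of the same paragraph show up as one hit with 32 sources.

```python
import hashlib
from collections import defaultdict


def paragraph_key(paragraph):
    """Hash a paragraph after collapsing whitespace and case differences."""
    normalized = " ".join(paragraph.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_by_paragraph(documents):
    """Map each paragraph hash to the set of document names containing it.

    `documents` is a dict of {name: plain text}.
    """
    groups = defaultdict(set)
    for name, text in documents.items():
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                groups[paragraph_key(paragraph)].add(name)
    return groups
```

Exact hashing only catches paragraphs that are identical after normalization; a lightly edited paragraph would need some fuzzier comparison, which this sketch does not attempt.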

To put it bluntly, Zoë and X1 are good first-generation manifestations of Fisher As A Product Category. But this product category must evolve before these products are a must-have for everyone who feels the pain of finding private information in-my-email-and-on-my-hard-drive:

In Summary
Google has shown me I can have it all: fast, ranked search with a simple UI and a rich query language. Is it too much to ask for that kind of search over my personal data, the way I can already search the public web?

By the by, several other folks in the ensuing discussion linked to projects I don't consider "good enough to criticize" (yet):

Jwz's Intertwingle insight is still a manifesto with which I wholeheartedly agree, but somehow it hasn't magically been implemented spontaneously by the open source community. Perhaps Chandler will do better -- it has great architecture plans -- but in my opinion its attention is too focused on competing with Outlook-the-GUI.

Launchbar is pretty much like locate(8) for non-UNIX types -- I love the single-keystroke access to its ultra-minimal UI, but I'm not enough of a developer to be excited about searching .h files for method names. I need to search restaurant names in unstructured text...

And Spring and Scopeware, for all their promise, have even clunkier fat-client UIs than X1 and Bloomba. I don't need any more User Interfaces in my life!

TrackBack

Listed below are links to weblogs that reference Fisher As A Product Category:

» Relax, Everything Is Deeply Intertwingled from ritilan.com
Over at relax, Adam has a nice bit about personal search for the desktop. I had not given it much thought until now but I can see where this will become another killer app. We need Google for the desktop.... [Read More]

» Search from Confluence: STS
Part of the GigabyteEmail value offered by Gmail is that not only does it hold a great deal of your clutter, it helps you sort through it.... [Read More]

Comments

Hi Adam,

(1) "Its search results are not complete"

There is indeed a cutoff. Only the top x hits are returned. This is under the user's control, though. On the other hand, what good will one million hits do you?

(2) "random order"

This is in the eyes of the beholder :) On the other hand, you can sort any result in any way you want.

(3) slow enough

Hmmm... perhaps... YMMV. On the other hand, with over half-a-million messages in my instance running on a humble laptop, it's fast enough.

(4) "simple query syntax"

"Simple query syntax"? Is that an oxymoron? :) In the meantime, the full Lucene syntax is supported. On the other hand:

"If you like regular expressions, boolean searches and SQL queries, this it not for you. If you thrive in complexity, just stay away. The point here is to make complex thing simple (and to keep simple thing simple). Not the other way around."

-- Unknown

http://zoe.nu/itstories/story.php?data=stories&num=16&sec=1

"I can have it all"

No, you can't. At least not what you seem to be implying by bringing Google into the mix. Something has to give one way or another :)

Mr. Zoë awakes -- cool to see you around! :-)

As for user-configurable, I hadn't noticed, but that's understandable. I don't live with it enough to have discovered that feature. However, returning "1M results" is *exactly* what I need. As an example, I'd like my own results for searching FoRK.mbox to rival what I get from Google's xent.com crawl of the same bits.

That said, the key is ranking: only the first 5-7 search results are really worthwhile to most users. [That said, there is a school of thought that users may be more persistent in searching their own email, since they may be "sure I've seen it before, d*mnit!"]

So one of the options facing us is just hacking Zoë to experiment with adding a ranker, but the license makes us a bit wary -- not that we've found a commercial revenue model that makes sense for Fishers yet! :-)

Hi Rohit,

Mr. Zoë awakes -- cool to see you around! :-)

Busy dealing with some pesky Mexican revolutionary ;)

As for user-configurable, I hadn't noticed, but that's understandable. I don't live with it enough to have discovered that feature. However, returning "1M results" is *exactly* what I need.

Hmmmm... how would that be useful in any practical sense?

As an example, I'd like my own results for searching FoRK.mbox to rival what I get from Google's xent.com crawl of the same bits.

I think that continuously referring to Google is misleading, as one of the fundamental flaws with Google & Co. is that they are "bottomless" so to speak. There are also things that one can do in a computer farm that are not realistic to envision on the client side. And vice versa.

That said, the key is ranking: only the first 5-7 search results are really worthwhile to most users. [That said, there is a school of thought that users may be more persistent in searching their own email, since they may be "sure I've seen it before, d*mnit!"]

Yes, ranking is important... but relative... keep in mind there is no "one and only one true, universal ranking"... PageRank, MessageRank or PeopleRank are not the definitive answer no matter how you look at it.

So one of the options facing us is just hacking Zoë to experiment with adding a ranker,

For what you have in mind, the answer may be in the SZLink class :)

but the license makes us a bit wary

Drop me a line. I'm sure we could work something out :)

-- not that we've found a commercial revenue model that makes sense for Fishers yet! :-)

Cheers,

Z.

Many of the points you made are subsumed in this one:

"think that continuously referring to Google is misleading as one of the fundamental flaw with Google & Co. is that they are "bottomless" so to speak. There is also things that one can do in a computer farm that are not realistic to envision on the client side. And vis-versa."

I disagree. First, on the technical side: client-side PC power just continues to increase inexorably to ridiculous proportions, while the speed of human reading is only constant -- I think one back-of-the-envelope measure is that if the fast speed-reader read continuously for a century, that still wouldn't even be 3GB of English text (uncompressed!). Processor, memory, and spare disk space will soon handily eclipse the powers of the earliest-generation search engines -- decade-old technology. (Heck, anyone remember Brian Pinkerton's very first WebCrawler, which sold to AOL for the then-unimaginable $1M; prototyped on NeXTstep using Digital Librarian?)

Second, though, I believe that my personal archive *is* "bottomless". [Not to mention that some of it may be topless, too :-] Not only in the minimal sense that I already have over 1M files and emails on my current laptop (literally true!) but also that I'd like to keep a copy of everything I've ever read, subscribe to all those RSS feeds, and thus keep copies of everything I'm *likely* to read -- my own offline Internet (keyword: rion).

This brings us back to ranking -- you're right, literally paging through 1M results is useless. And there is no one perfect ranking algorithm. "Goodness" is a teleological conundrum: is what-is-good what you are likely-to-read? Is what you are likely-to-read determined by what you have-already-read? It does still fall to a particular auteur to propose a hypothesis for ranking, and for users -- initially, packrats and professionals (say, journalists) -- to dispose...

Good luck with the Zappata revolution!
(see, at least Google can find this page, now that I used the magic word! :-)

I disagree. First, on the technical side: client-side PC power just continues to increase inexorably to ridiculous proportions, while the speed of human reading is only constant -- I think one back-of-the-envelope measure is that if the fast speed-reader read continuously for a century, that still wouldn't even be 3GB of English text (uncompressed!). Processor, memory, and spare disk space will soon handily eclipse the powers of the earliest-generation search engines -- decade-old technology. (Heck, anyone remember Brian Pinkerton's very first WebCrawler, which sold to AOL for the then-unimaginable $1M; prototyped on NeXTstep using Digital Librarian?)

In principle, I agree. In fact this is one of the tenets of ZOE, ZAPPATA & Co.: "The Personal Server".

"Take a closer look at your latest PC (that use to stand
for Personal Computer). It has hundreds of megabytes of memory.
Uncountable gigabytes of disk space. Lightning fast processor. Fast
internet connection. And what does it do, sitting alone by night? It
looks for alien life forms..."

-- Unknown, Practically Speaking

Second, though, I believe that my personal archive *is* "bottomless". [Not to mention that some of it may be topless, too :-]

I disagree. Your data is not bottomless. While significant, this data has nonetheless been filtered by you, Rohit, one way or another. With Google, that is not the case: Google & Co. aspire to universality and are therefore indiscriminate. And in the process a lot of context gets lost.

Not only in the minimal sense that I already have over 1M files and emails on my current laptop (literally true!) but also that I'd like to keep a copy of everything I've ever read, subscribe to all those RSS feeds, and thus keep copies of everything I'm *likely* to read -- my own offline Internet (keyword: rion).

Aha! Here we go: this is the fundamental difference between Google and, gasp, ZAPPATA for example. In principle, they do the same thing: collecting and finding stuff. In practice, ZAPPATA has the added benefit of, er, "social filtering". This makes a huge difference in the quality of your data and therefore your search ranking.

This brings us back to ranking -- you're right, literally paging through 1M results is useless. And there is no one perfect ranking algorithm. "Goodness" is a teleological conundrum: is what-is-good what you are likely-to-read? Is what you are likely-to-read determined by what you have-already-read? It does still fall to a particular auteur to propose a hypothesis for ranking, and for users -- initially, packrats and professionals (say, journalists) -- to dispose...

Yes, ranking is very important. But the quality of your data is fundamental also.

Good luck with the Zappata revolution!

Thanks :)

(see, at least Google can find this page, now that I used the magic word! :-)

Perhaps. But how helpful would that be if it's on the 5,345,654th page?

Cheers,

Z.

Hi from a non-techie. (My, you all sound SMART. I'm just a lowly writer in New Jersey.) I like the Cat in the Hat hat, btw. Found you via Google because we both read the book "Linked" and I was charmed by the Intertwingled title. I've just downloaded an X1 trial.... it's 23 percent through my files now, so we'll see how well it works when it's done.

Given your interest in links and intertwingling, you may appreciate reading about my misadventure with Technorati today.

Ok, shameless self-promotion. Check out FILEhand Search. It's modeled after Google in that it returns results sorted by relevance, and you can search using phrases and the AND, OR, and NOT booleans.

Good news/bad news: bad news: no email support -- yet. It's coming.

good news: the original intent was to support file searching, especially PDF, office, MP3 tags, etc. If you are looking for some information in, say, one of 15,000 PDF files, FILEhand Search will sort the results by relevance and show a scrollable text extract of what you were looking for. You likely wouldn't even have to open a PDF reader. And, it costs $39.

OK, so I'm the co-founder of Filehand. Did I do a bad thing by posting here? I don't know myself. But, after we support email in a few weeks, I think you'll find FILEhand Search a closer analogy to Google for the desktop.

What would a Fisher look like if you deconstructed the "search engine", shredded it apart and flipped it over so to speak, solving search in terms of coordination instead of indexing? Treat indexing as an active filtering problem, so that a topic such as /people/adam/terms/intertwingled/ maintains a collection of tuples referring to the emails sent by Adam that contain the term intertwingled. Perhaps the UI could map basic search terms into route configurations to return the more ephemeral structure that a query result would entail. Does this make sense?

In general I can't help but be fascinated by the idea of programming in the small, much along the lines of Clay's piece on Situated Software http://www.shirky.com/writings/situated_software.html . A Fisher seems to be an interesting problem for this sort of thinking.
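
One way to read the topic-path idea above, as a rough sketch only: each incoming message is "published" to a path per sender and term, and a query becomes a plain path lookup. The class name `TopicIndex`, its methods, and the tuple layout are hypothetical; they are not taken from any of the products discussed here.

```python
from collections import defaultdict


class TopicIndex:
    """Toy filter-style index keyed by /people/<sender>/terms/<term>/ paths."""

    def __init__(self):
        self.topics = defaultdict(list)

    @staticmethod
    def path(sender, term):
        return "/people/%s/terms/%s/" % (sender.lower(), term.lower())

    def publish(self, message_id, sender, body):
        """Route one message to every topic path it belongs to."""
        for term in set(body.lower().split()):
            self.topics[self.path(sender, term)].append((message_id, sender))

    def lookup(self, sender, term):
        """Return the tuples collected under one topic path (possibly none)."""
        return list(self.topics.get(self.path(sender, term), []))
```

After publishing a message from Adam containing the word "intertwingled", `lookup("adam", "intertwingled")` returns a tuple for that message. The filtering work happens once, at arrival time, rather than at query time.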

Left unmentioned here is Nelson - see www.caelo.com. It is way, way better than a "Google for Email": it does that as well as any other IR-based text search, but it also takes into account that:

a) People use Email to manage tasks
b) People use Email to store documents
c) People work on a project basis
d) People look up email by people, and by time period

Nelson handles all this but doesn't get in the way of your Outlook workflow.

BTW, jwz "invented" nothing. This is a very old topic.

Blog powered by TypePad
Member since 08/2003