
Comments

Zoe

Hi Adam,

(1) "Its search results are not complete"

There is indeed a cutoff. Only the top x hits are returned. This is under the user's control, though. On the other hand, what good will one million hits do you?

(2) "random order"

This is in the eyes of the beholder :) On the other hand, you can sort any result in any way you want.

(3) "slow enough"

Hmmm... perhaps... YMMV. On the other hand, with over half a million messages in my instance running on a humble laptop, it's fast enough.

(4) "simple query syntax"

"Simple query syntax"? Is that an oxymoron? :) In the meantime, the full Lucene syntax is supported. On the other hand:

"If you like regular expressions, boolean searches and SQL queries, this is not for you. If you thrive in complexity, just stay away. The point here is to make complex things simple (and to keep simple things simple). Not the other way around."

-- Unknown

http://zoe.nu/itstories/story.php?data=stories&num=16&sec=1

"I can have it all"

No, you can't. At least not in the sense you seem to imply by bringing Google into the mix. Something has to give one way or another :)
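Zoe's first two points, a user-controlled cap on how many hits come back and re-sorting whatever survives that cap, can be illustrated with a minimal Python sketch. This is not Zoë's actual code; the `Hit` record, `top_hits`, `resort`, the default limit of 100, and the sample messages are all hypothetical, made up for illustration.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    subject: str
    sender: str
    score: float  # relevance score assigned by the search engine

def top_hits(hits, limit=100):
    """Apply the cutoff: keep only the `limit` highest-scoring hits, best first."""
    return heapq.nlargest(limit, hits, key=lambda h: h.score)

def resort(hits, field="score"):
    """Re-sort an already cut-off result list by whatever field the user picks.
    Scores sort high-to-low; text fields sort alphabetically."""
    return sorted(hits, key=lambda h: getattr(h, field), reverse=(field == "score"))

# Hypothetical sample hits, for illustration only.
hits = [
    Hit("Re: ranking", "zoe", 0.91),
    Hit("Fishers demo", "rohit", 0.75),
    Hit("Old digest", "adam", 0.42),
]
best = top_hits(hits, limit=2)
print([h.subject for h in best])                   # ['Re: ranking', 'Fishers demo']
print([h.sender for h in resort(best, "sender")])  # ['rohit', 'zoe']
```

Using a heap for the cutoff keeps only `limit` hits in play instead of fully sorting every match, and the later re-sort touches just those survivors.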

Rohit

Mr. Zoë awakes -- cool to see you around! :-)

As for user-configurable, I hadn't noticed, but that's understandable. I don't live with it enough to have discovered that feature. However, returning "1M results" is *exactly* what I need. As an example, I'd like my own results for searching FoRK.mbox to rival what I get from Google's xent.com crawl of the same bits.

That said, the key is ranking: only the first 5-7 search results are really worthwhile to most users. [That said, there is a school of thought that users may be more persistent in searching their own email, since they may be "sure I've seen it before, d*mnit!"]

So one of the options facing us is just hacking Zoë to experiment with adding a ranker, but the license makes us a bit wary -- not that we've found a commercial revenue model that makes sense for Fishers yet! :-)

Zoe

Hi Rohit,

> Mr. Zoë awakes -- cool to see you around! :-)

Busy dealing with some pesky Mexican revolutionary ;)

> As for user-configurable, I hadn't noticed, but that's understandable. I don't live with it enough to have discovered that feature. However, returning "1M results" is *exactly* what I need.

Hmmmm... how would that be useful in any practical sense?

> As an example, I'd like my own results for searching FoRK.mbox to rival what I get from Google's xent.com crawl of the same bits.

I think that continuously referring to Google is misleading, as one of the fundamental flaws with Google & Co. is that they are "bottomless", so to speak. There are also things that one can do in a computer farm that are not realistic to envision on the client side. And vice versa.

> That said, the key is ranking: only the first 5-7 search results are really worthwhile to most users. [That said, there is a school of thought that users may be more persistent in searching their own email, since they may be "sure I've seen it before, d*mnit!"]

Yes, ranking is important... but relative... keep in mind there is no "one and only one true, universal ranking"... PageRank, MessageRank or PeopleRank are not the definitive answer no matter how you look at it.

> So one of the options facing us is just hacking Zoë to experiment with adding a ranker,

For what you have in mind, the answer may be in the SZLink class :)

> but the license makes us a bit wary

Drop me a line. I'm sure we could work something out :)

> -- not that we've found a commercial revenue model that makes sense for Fishers yet! :-)

Cheers,

Z.

Rohit Khare

Many of the points you made are subsumed in this one:

"I think that continuously referring to Google is misleading, as one of the fundamental flaws with Google & Co. is that they are 'bottomless', so to speak. There are also things that one can do in a computer farm that are not realistic to envision on the client side. And vice versa."

I disagree. First, on the technical side: client-side PC power just continues to increase inexorably to ridiculous proportions, while the speed of human reading is only constant -- I think one back-of-the-envelope measure is that if a fast speed-reader read continuously for a century, that still wouldn't even be 3GB of English text (uncompressed!). Processor, memory, and spare disk space will soon handily eclipse the powers of the earliest-generation search engines -- decade-old technology. (Heck, anyone remember Brian Pinkerton's very first WebCrawler, which sold to AOL for the then-unimaginable $1M; prototyped on NeXTstep using Digital Librarian?)

Second, though, I believe that my personal archive *is* "bottomless". [Not to mention that some of it may be topless, too :-] Not only in the minimal sense that I already have over 1M files and emails on my current laptop (literally true!) but also that I'd like to keep a copy of everything I've ever read, subscribe to all those RSS feeds, and thus keep copies of everything I'm *likely* to read -- my own offline Internet (keyword: rion).

This brings us back to ranking -- you're right, literally paging through 1M results is useless. And there is no one perfect ranking algorithm. "Goodness" is a teleological conundrum: is what-is-good what you are likely-to-read? Is what you are likely-to-read determined by what you have-already-read? It does still fall to a particular auteur to propose a hypothesis for ranking, and for users -- initially, packrats and professionals (say, journalists) -- to dispose...

Good luck with the Zappata revolution!
(see, at least Google can find this page, now that I used the magic word! :-)
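Rohit's back-of-the-envelope reading estimate is easy to redo, and the answer swings a lot with the inputs. The sketch below makes every input explicit; the reading speeds, hours per day, and roughly 6 bytes per word (about five letters plus a space, plain ASCII English) are assumptions, not figures from this thread. Under these particular inputs the totals land well above 3GB, so the exact number depends heavily on what one assumes.

```python
def lifetime_reading_gb(words_per_minute, years=100, hours_per_day=24.0, bytes_per_word=6.0):
    """Rough size, in decimal gigabytes (1e9 bytes), of all the text read.

    bytes_per_word = 6 is an assumption: ~5 ASCII letters plus a space.
    """
    minutes = years * 365.25 * hours_per_day * 60
    words = minutes * words_per_minute
    return words * bytes_per_word / 1e9

# Assumed: reading 24 hours a day for a century at 1,000 words per minute.
print(round(lifetime_reading_gb(1000), 1))                   # 315.6
# Assumed: a gentler 8 hours a day at 250 words per minute.
print(round(lifetime_reading_gb(250, hours_per_day=8), 1))   # 26.3
```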

Zoe

> I disagree. First, on the technical side: client-side PC power just continues to increase inexorably to ridiculous proportions, while the speed of human reading is only constant -- I think one back-of-the-envelope measure is that if a fast speed-reader read continuously for a century, that still wouldn't even be 3GB of English text (uncompressed!). Processor, memory, and spare disk space will soon handily eclipse the powers of the earliest-generation search engines -- decade-old technology. (Heck, anyone remember Brian Pinkerton's very first WebCrawler, which sold to AOL for the then-unimaginable $1M; prototyped on NeXTstep using Digital Librarian?)

In principle, I agree. In fact, this is one of the tenets of ZOE, ZAPPATA & Co.: "The Personal Server".

"Take a closer look at your latest PC (that used to stand for Personal Computer). It has hundreds of megabytes of memory. Uncountable gigabytes of disk space. Lightning fast processor. Fast internet connection. And what does it do, sitting alone by night? It looks for alien life forms..."

-- Unknown, Practically Speaking

> Second, though, I believe that my personal archive *is* "bottomless". [Not to mention that some of it may be topless, too :-]

I disagree. Your data is not bottomless. While significant, this data has nonetheless been filtered by you, Rohit, one way or another. That is not the case with Google: Google & Co. aspire to universality and are therefore indiscriminate. And in the process a lot of context gets lost.

> Not only in the minimal sense that I already have over 1M files and emails on my current laptop (literally true!) but also that I'd like to keep a copy of everything I've ever read, subscribe to all those RSS feeds, and thus keep copies of everything I'm *likely* to read -- my own offline Internet (keyword: rion).

Aha! Here we go: this is the fundamental difference between Google and, gasp, ZAPPATA, for example. In principle, they do the same thing: collecting and finding stuff. In practice, ZAPPATA has the added benefit of, er, "social filtering". This makes a huge difference in the quality of your data and therefore your search ranking.

> This brings us back to ranking -- you're right, literally paging through 1M results is useless. And there is no one perfect ranking algorithm. "Goodness" is a teleological conundrum: is what-is-good what you are likely-to-read? Is what you are likely-to-read determined by what you have-already-read? It does still fall to a particular auteur to propose a hypothesis for ranking, and for users -- initially, packrats and professionals (say, journalists) -- to dispose...

Yes, ranking is very important. But the quality of your data is just as fundamental.

> Good luck with the Zappata revolution!

Thanks :)

> (see, at least Google can find this page, now that I used the magic word! :-)

Perhaps. But how helpful would that be if it's on page 5,345,654?

Cheers,

Z.

Deb

Hi from a non-techie. (My, you all sound SMART. I'm just a lowly writer in New Jersey.) I like the Cat in the Hat hat, btw. Found you via Google because we both read the book "Linked" and I was charmed by the Intertwingled title. I've just downloaded an X1 trial... it's 23 percent through my files now, so we'll see how well it works when it's done.

Given your interest in links and intertwingling, you may appreciate reading about my misadventure with Technorati today.

Elliot

OK, shameless self-promotion. Check out FILEhand Search. It's modeled after Google in that it returns results sorted by relevance, and you can search using phrases and AND, OR, and NOT booleans.

Good news/bad news. The bad news: no email support -- yet. It's coming.

The good news: the original intent was to support file searching, especially PDF, Office, MP3 tags, etc. If you are looking for some information in, say, one of 15,000 PDF files, FILEhand Search will sort the results by relevance and show a scrollable text extract of what you were looking for. You likely wouldn't even have to open a PDF reader. And it costs $39.

OK, so I'm the co-founder of FILEhand. Did I do a bad thing by posting here? I don't know myself. But after we add email support in a few weeks, I think you'll find FILEhand Search a closer analogy to Google for the desktop.
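The search model Elliot describes (relevance-sorted results, phrase search, AND/OR/NOT booleans) fits the classic inverted-index design. Here is a minimal Python sketch of that general technique; it is not FILEhand's implementation, and the `TinyIndex` class, its method names, and the sample documents are hypothetical. Relevance here is plain summed term frequency, a deliberately crude stand-in for tf-idf or BM25.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, words):
    """True if `words` appears as a contiguous run inside `tokens`."""
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def _docs_with(self, term):
        return set(self.postings.get(term.lower(), {}))

    def search(self, all_of=(), any_of=(), none_of=(), phrase=""):
        """AND over all_of, OR over any_of, NOT over none_of, optional exact
        phrase; results come back best-first."""
        hits = set(self.docs)
        for term in all_of:
            hits &= self._docs_with(term)
        if any_of:
            hits &= set().union(*(self._docs_with(t) for t in any_of))
        for term in none_of:
            hits -= self._docs_with(term)
        words = tokenize(phrase)
        if words:
            hits = {d for d in hits if contains_phrase(tokenize(self.docs[d]), words)}
        # Score = summed term frequency of the positive query terms; ties break on doc id.
        terms = [t.lower() for t in (*all_of, *any_of)] + words

        def score(doc_id):
            return sum(self.postings.get(t, {}).get(doc_id, 0) for t in terms)

        return sorted(hits, key=lambda d: (-score(d), d))

idx = TinyIndex()
idx.add("a.pdf", "Search engines rank results by relevance")
idx.add("b.pdf", "Desktop search for email and PDF files; search is fast")
idx.add("c.pdf", "MP3 tags and office files")
print(idx.search(all_of=["search"]))                     # ['b.pdf', 'a.pdf']
print(idx.search(all_of=["search"], none_of=["email"]))  # ['a.pdf']
print(idx.search(phrase="PDF files"))                    # ['b.pdf']
```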

Robert

What would a Fisher look like if you deconstructed the "search engine", shredded it apart and flipped it over, so to speak, solving search in terms of coordination instead of indexing? Treat indexing as an active filtering problem, so that topics such as /people/adam/terms/intertwingled/ maintain a collection of tuples referring to the emails sent by Adam that contain the term intertwingled. Perhaps the UI could map basic search terms into route configurations to return the more ephemeral structure that a query result would entail. Does this make sense?

In general, I can't help but be fascinated by the idea of programming in the small, much along the lines of Clay's piece on Situated Software (http://www.shirky.com/writings/situated_software.html). A Fisher seems to be an interesting problem for this sort of thinking.
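Robert's "indexing as active filtering" idea can be sketched as a tiny routing structure in Python: rather than building an index and scanning it at query time, each incoming message is pushed into every topic it matches, and a search becomes a lookup of one topic path. The `/people/<sender>/terms/<term>/` path shape comes from his comment; the extra `/terms/<term>/` topic, the `TopicRouter` and `Message` names, and the sample messages are hypothetical.

```python
import re
from collections import defaultdict
from typing import NamedTuple

class Message(NamedTuple):
    msg_id: str
    sender: str
    body: str

class TopicRouter:
    """Indexing as active filtering: each message is pushed, on arrival, into
    topic collections named like /people/<sender>/terms/<term>/."""

    def __init__(self):
        # topic path -> list of (msg_id, sender) tuples, in arrival order
        self.topics = defaultdict(list)

    def publish(self, msg):
        sender = msg.sender.lower()
        entry = (msg.msg_id, sender)
        for term in set(re.findall(r"[a-z0-9]+", msg.body.lower())):
            self.topics[f"/people/{sender}/terms/{term}/"].append(entry)
            self.topics[f"/terms/{term}/"].append(entry)

    def query(self, term, sender=None):
        """A query is just a route lookup; nothing is scanned at query time."""
        term = term.lower()
        path = f"/people/{sender.lower()}/terms/{term}/" if sender else f"/terms/{term}/"
        return list(self.topics.get(path, []))

router = TopicRouter()
router.publish(Message("m1", "adam", "Everything is deeply intertwingled"))
router.publish(Message("m2", "rohit", "Intertwingled email search"))
print(router.query("intertwingled", sender="adam"))  # [('m1', 'adam')]
print(router.query("intertwingled"))                 # [('m1', 'adam'), ('m2', 'rohit')]
```

The trade-off is write amplification: every message is stored once per distinct term per topic family (twice here), in exchange for queries that cost a single dictionary lookup.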

Gary

Left unmentioned here is Nelson (see www.caelo.com). It is way, way better than a "Google for Email": it does IR-based text search as well as any other tool, but it also takes into account that:

a) People use Email to manage tasks
b) People use Email to store documents
c) People work on a project basis
d) People look up email by people, and by time period

Nelson handles all this but doesn't get in the way of your Outlook workflow.

BTW, jwz "invented" nothing. This is a very old topic.

The comments to this entry are closed.


Reading

  • John Battelle: The Search

    My favorite book of 2005. Period. (*****)

  • Steven D. Levitt: Freakonomics: A Rogue Economist Explores the Hidden Side of Everything

    "Just because two things are correlated does not mean that one causes the other. A correlation simply means that a relationship exists between two factors -- let's call them X and Y -- but it tells you nothing about the direction of that relationship. It's possible that X causes Y; it's also possible that Y causes X; and it may be that X and Y are both being caused by some other factor, Z.

    Economics is, at root, the study of incentives: how people get what they want, or need, especially when other people want or need the same thing.

    Incentives are the cornerstone of modern life. The conventional wisdom is often wrong. Dramatic effects often have distant, even subtle, causes. Experts use their informational advantage to serve their own agenda. Knowing what to measure and how to measure it makes a complicated world much less so." (*****)

  • Malcolm Gladwell: Blink

    A book of anecdotes about the power of thinking without thinking; this book is a more interesting read than Gladwell's previous, The Tipping Point.

    New York Times: "Gottman believes that each relationship has a DNA, or an essential nature. It's possible to take a very thin slice of that relationship, grasp its fundamental pattern and make a decent prediction of its destiny. Gladwell says we are thin-slicing all the time -- when we go on a date, meet a prospective employee, judge any situation. We take a small portion of a person or problem and extrapolate amazingly well about the whole."

    David Brooks, who wrote that review, adds: "Isn't it as possible that the backstage part of the brain might be more like a personality, some unique and nontechnological essence that cannot be adequately generalized about by scientists in white coats with clipboards?" (*****)

  • Paul Graham: Hackers and Painters

    I don't agree with some parts of this book, but I truly loved reading it, and it really made me think. I referenced it in my weblications and superhacker and phoneboy posts. Favorite chapter is How to Make Wealth. (Thanks, Ev.) (*****)

  • Joel Spolsky: Joel on Software

    Joel is really good at wielding "diverse and occasionally related matters of interest to software developers, designers, and managers, and those who, whether by good fortune or ill luck, work with them in some capacity."

    Joel on Software embodies the principle of "Welcome to management! Guess what? Managing software projects has nothing at all to do with programming." This book, a compendium of the website's wisdom, is useful for everyone from team leads estimating schedules to software CEOs developing competitive strategy. (*****)

  • Bruce Sterling: Tomorrow Now: Envisioning The Next Fifty Years

    Bruce wrote this book to come to terms with seven novel aspects of the twenty-first century, situations that are novel to that epoch and no other. It's about future possibilities.

    "This is the future as it is felt and understood: via human experience... The years to come are not merely imaginary. They are history that hasn't happened yet. People will be born into these coming years, grow to maturity in them, struggle with their issues, personify those years, and bear them in their flesh. The future will be lived." Hear, hear! Well spoken, Bruce. (*****)

  • John Vacca: The World's 20 Greatest Unsolved Problems

    "Science has extended life, conquered disease, and offered new sexual and commercial freedoms through its rituals of discovery, but many unsolved problems remain...

    If support for science falters and if the American public loses interest in it, such apathy may foster an age in which scientific elites ignore the public will and global imperatives." (*****)

  • Paul Hawken, Amory Lovins, L. Hunter Lovins: Natural Capitalism: Creating the Next Industrial Revolution

    I had the pleasure recently of meeting Amory Lovins and hearing him talk about Twenty Hydrogen Myths and the design of the Hypercar. (He also talked about Bonobos... wow.) I'm a convert to the way of thinking espoused in Natural Capitalism. I used to be cynical about the future, but Amory's work has made me a believer that many great things are about to come. The best way to predict the future is to invent it. (*****)

  • Merrill R. Chapman: In Search of Stupidity: Over 20 Years of High-Tech Marketing Disasters

    In hilarious prose, this book catalogs lots of stoopid high-tech marketing decisions. It offers clear, detailed analysis of many a marketing mishap, with what happened, why, and how to avoid such stupidity. Might just be the best. book. ever... (*****)

  • Paul Krugman: The Great Unraveling: Losing Our Way in the New Century

    A book exposing the pitfalls of crony capitalism, from corrupt corporations straight up to the executive branch of our government. Krugman is nonpartisan -- what he exposes is foolish short-term thinking on the part of recent United States policies. The patriotic thing to do, he advises, is to fix these economic problems now before they become much harder to solve.

  • Henry Petroski: Small Things Considered: Why There Is No Perfect Design

    "Design can be easy and difficult at the same time, but in the end, it is mostly difficult." (*****)

  • Alexander Blakely: Siberia Bound

    One of my favorite books of the past few years. Xander is a master storyteller. (*****)

  • Susan Scott: Fierce Conversations

    How to make every conversation count. One of my favorite books of the last decade. (*****)
