




one question: do they really need to count for you to be happy? i've noticed recently (maybe it's not really new) a branch of blogging devoted to self-optimisation, think of it as brain overclocking if you will, and then i think of pekka himanen writing in the hacker ethic about hackers breaking out of the iron cage, and i say to myself: they're not breaking out, they're adding extra bars from the inside.


I enjoyed your back-of-the-envelope calculations. Aside from reassuring me that the businesses of storing, transmitting, sorting and searching information will probably continue to grow, they made me wonder: when information is dirt cheap to store, transmit and share, how do you control it?

You just got the ball rolling for me on what became far too long a ramble to post here. If you're interested, you can read it here:


Alek, point well taken. They don't really need to count for me to be happy. However, I am always confronted with a thousand or more things I want to read, and I'm never going to be able to read them all. And so I continue to search for a means by which to prioritize what would be best for me to read next.

NudeCybot, your post is quite thought-provoking. I already know of a company that was formed for the express purpose of filing patents that could then be sold or licensed.

Perhaps to counter that we need to patent a machine that stores all combinations of 0's and 1's to a googol of bits, and patent them all -- and then show how anything anyone tries to patent from then on maps to one of the bit sequences we've already patented... perhaps the best way to level the playing field is to destroy it.


Greg Linden: "If you want to get started building your personal web, take a look at Seruku and Recall Toolbar."

Also, I'm very excited today. Fluffy Bunny (aka Google Desktop) has been a thrilling tool to use... instant gratification...


(I just found some notes from Rohit written just after I wrote this post. I'm including them here because I think they offer a compelling vision...)

Personal Web Platform

Battelle's Web 2.0 Conference is next week. The theme is "the web as
platform", the trend of applications moving from stand-alone, with
local data, to networked, with shared, remote data. As this happens,
the details of one's local operating system become less relevant. All
you'll need is a web browser, and perhaps a select few other web-based
applications. Because local applications will have less to do, this
threatens Microsoft's desktop dominance. My concern is whether
another company might replace Microsoft's desktop monopoly with a
"web OS" monopoly.

In this web-based world, I'd like to keep all my personal data
remotely, so that I can access it equally well from a Linux
workstation, an Apple laptop, a Palm phone and a Windows-based
internet-access terminal. Still, I'd like to leverage my local
resources. For example, my laptop and handheld should be able to
access my data while offline, and my workstation should be able to
search it quickly using a local database.

Another big advantage of storing data remotely is that, if my laptop
hard drive fails, or I get a new workstation, I don't have to worry
about copying all my stuff. I just log into my personal web and,
voila, all my stuff is right there.

As implied above, I should be able to search all my data. Apple's
Spotlight will search my local private data on a Mac, as will Gnome's
Beagle and Longhorn on Windows. But synchronizing my data across these
platforms is still a pain, so these don't really solve the problem.

What's needed to make this possible? We already have standards for
accessing most remote data. Email can be accessed with IMAP, address
books with LDAP, files with WebDAV, chat with Jabber, etc. There are
even providers for most of these services, and each has clients for
most platforms. So what's missing? A little glue, I think.

We need a standard way to store bookmarks, history and other personal
metadata (like the names of your mail, instant messaging, and WebDAV
accounts) and a standard way to intercept personal data so that it can
be indexed and searched, either locally or remotely. But who will
build the glue?

I think this has to be a cross-platform open-source project. It has
too much potential as a chokepoint to be entrusted to a commercial party.
Inspired by ideas around Fisher, much of this could be achieved with a
proxy server. It could proxy HTTP, IMAP, POP, LDAP, Jabber, etc.,
transparently indexing and caching things. One could connect to it as a
web server to search and configure things. Applications could contact it
through HTTP to get their configuration data. It could expose web
services APIs for all this too, so that native applications can be
built for search, etc. If we could, e.g., get Mozilla and other
desktop applications to look for the daemon on install and, when it's
present, configure themselves through it by default, then all you
would need to do on a new machine is tell the daemon where on the web
your personal configuration lives, and you're good to go. With one
step, your files, address book, bookmarks, cookies, logins, email
configuration, etc. would all be there.

The daemon would mostly be a framework for plugins. For example,
search needn't be hard-wired into it; it should be a plugin. Different
vendors might provide different personal search applications.
Similarly, a spam detector could easily be plugged into the email
processing pipeline, etc.

(Editor's note: sounds like Google Desktop is a great first step toward a Personal Proxy...)
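Rohit's "framework for plugins" idea can be illustrated with a minimal Python sketch. Everything here is hypothetical: `Message`, `PersonalProxyDaemon`, `keyword_spam_filter` and `SubstringSearchPlugin` are illustrative names, not part of Google Desktop or any existing proxy, and no real network traffic is intercepted. The sketch only shows the shape of the design: the core routes each item of personal data through whatever plugins are registered for its protocol, so search and spam filtering stay swappable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    """One hypothetical item of personal data seen by the proxy."""
    protocol: str                      # e.g. "http" or "imap"
    source: str                        # URL or mailbox path
    body: str
    labels: List[str] = field(default_factory=list)


# A plugin receives a message and returns it (possibly annotated),
# or returns None to stop it from reaching later plugins.
Plugin = Callable[[Message], Optional[Message]]


class PersonalProxyDaemon:
    """The core knows nothing about search or spam; plugins do the work."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, List[Plugin]] = {}

    def register(self, protocol: str, plugin: Plugin) -> None:
        self._pipelines.setdefault(protocol, []).append(plugin)

    def process(self, msg: Message) -> Optional[Message]:
        # Run the message through each plugin registered for its protocol, in order.
        for plugin in self._pipelines.get(msg.protocol, []):
            result = plugin(msg)
            if result is None:
                return None
            msg = result
        return msg


SPAM_WORDS = {"lottery", "winner"}  # toy keyword list, for illustration only


def keyword_spam_filter(msg: Message) -> Optional[Message]:
    # Label (rather than drop) messages containing any toy spam keyword.
    if any(word in msg.body.lower() for word in SPAM_WORDS):
        msg.labels.append("spam")
    return msg


class SubstringSearchPlugin:
    """Remembers every message it sees; search is a naive substring match."""

    def __init__(self) -> None:
        self.seen: List[Message] = []

    def __call__(self, msg: Message) -> Optional[Message]:
        self.seen.append(msg)
        return msg

    def search(self, term: str) -> List[str]:
        return [m.source for m in self.seen if term.lower() in m.body.lower()]


daemon = PersonalProxyDaemon()
index = SubstringSearchPlugin()
daemon.register("imap", keyword_spam_filter)
daemon.register("imap", index)
daemon.register("http", index)

daemon.process(Message("imap", "INBOX/42", "Lunch on Friday?"))
daemon.process(Message("http", "http://example.com/", "Notes from Friday's conference"))
flagged = daemon.process(Message("imap", "INBOX/43", "You are a lottery winner!"))
print(index.search("friday"))  # ['INBOX/42', 'http://example.com/']
print(flagged.labels)          # ['spam']
```

Swapping in a different vendor's search would mean registering a different callable; the daemon itself would not change.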


John Markoff on the current size of The Google Web:

Currently Google, the largest search engine, indexes about 4 billion Web pages, 880 million images and 845 million Usenet messages. The service is used by almost 82 million people each month, according to Nielsen/NetRatings.


Ten Reasons Why on Personal Web:

A good search obviates the need for a hierarchy. The conceptual flaw inherent in most email clients and bookmark managers is the assumption that the best way to make information useful is to organize it (usually manually) into categories or hierarchies. To be fair, this is probably less a conceptual flaw than a historical technical limitation -- you need a certain level of processing power and storage space.

Solving for this allowed Google to take a commanding lead in the "finding information" field. At one point in the history of the Web, human-managed hierarchical directories like Yahoo were still a valuable method to get to information kind of like what you wanted. Enter Google. By applying brute force processing power, suddenly a search turns up relevant results, so I don't need the legion of Yahoo indexers as much anymore. (Librarians shudder at this line of thought.)

Furl is more valuable than other bookmark managers because it indexes the full text of every page I "furl." Not just the page title, not just the metadata I add, but the full text of the page. I quickly did away with categorizing "furled" pages once I realized I can use Furl's fairly decent, Google-like query syntax. That's great, because hierarchies are a bitch to maintain and keep relevant (just ask Yahoo). Furl is like Lookout for bookmarks. Or, more to the point, it's like Google for bookmarks.

Well stated.
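The "full text instead of folders" point is easy to see with a toy inverted index. This is a minimal Python sketch, assuming simple lowercase word tokenization and AND semantics for multi-word queries; it is not how Furl or Google actually work, and `FullTextIndex` and `tokenize` are hypothetical names. Each word maps to the set of pages containing it, so answering a query needs no hierarchy at all.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class FullTextIndex:
    """A toy inverted index: word -> set of URLs whose text contains it."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        # Index every word of the page, not just a title or a user-chosen folder.
        for token in tokenize(text):
            self._postings[token].add(url)

    def query(self, q: str) -> List[str]:
        terms = tokenize(q)
        if not terms:
            return []
        # AND semantics: a page must contain every query term.
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)


index = FullTextIndex()
index.add("http://a.example/", "Google Desktop indexes your local files")
index.add("http://b.example/", "Furl stores the full text of bookmarked pages")
index.add("http://c.example/", "Full text search beats folder hierarchies")
print(index.query("full text"))  # ['http://b.example/', 'http://c.example/']
print(index.query("furl text"))  # ['http://b.example/']
```

Real engines add ranking, stemming and phrase queries on top, but the core lookup is this word-to-pages mapping rather than a category tree.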

I also found this nice table on the Size of the Prize from Charles Ferguson's article in the MIT Tech Review, What's Next for Google.

Size of Searchable Internet

Also, I found Jeremy's post on email and browser URL extraction and search to be particularly interesting in how it relates to one's Personal Web.

What I really need is a tool that acts like a personal that's automatically fed from the combination of URLs embedded in e-mail messages as well as my browser history. It could keep a database of those URLs, count the frequency with which I visit them as well as how often they appear in e-mail that I send or receive. And if it provided the ability to tag and annotate the URLs, all the better.

In fact, if it were like a private "satellite" version of that had the ability to check with the larger public, that'd be even better. The idea being that for public URLs which end up in my local (private) database, I could still benefit from the collective tagging and annotation efforts of those in the outside world.

I can imagine a second generation of this system that goes a step further: fetching the web content that each of the URLs points to, storing a cached copy locally, and indexing it just like a traditional web search engine might. Bonus points for integration with something like the Slogger extension for Firefox, so that it doesn't have to store duplicate data.

If I had a copy of the source code for handy, I could probably get the first cut of this going in a day's time. That might be a day well spent.

Hmm. Between Firefox (and Slogger) plus Thunderbird, it might even be possible to do this in a cross-platform way.
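Jeremy's first-generation tool could be sketched roughly like this in Python. It is a hypothetical illustration, not his code or any existing product: `PersonalLinkDB` and its methods are invented names, the URL regex is a deliberately simple assumption, and the "larger public" tagging service is stood in for by a plain dictionary rather than any real API. URLs are counted from browser history and from e-mail bodies, ranked by combined frequency, and private tags are merged with public ones for the same URL.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set

# Deliberately simple URL pattern; real-world extraction needs more care.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


class PersonalLinkDB:
    """Hypothetical personal link database fed by browser history and e-mail."""

    def __init__(self) -> None:
        self.visits: Counter = Counter()      # URL -> times visited
        self.mentions: Counter = Counter()    # URL -> times seen in e-mail
        self.tags: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_history(self, urls: Iterable[str]) -> None:
        self.visits.update(urls)

    def add_email(self, body: str) -> None:
        # Strip sentence punctuation that the regex may have swallowed.
        found = [u.rstrip(".,;:!?") for u in URL_RE.findall(body)]
        self.mentions.update(found)

    def tag(self, url: str, *labels: str) -> None:
        self.tags[url].update(labels)

    def score(self, url: str) -> int:
        return self.visits[url] + self.mentions[url]

    def top(self, n: int = 10) -> List[str]:
        # Highest combined count first; ties broken alphabetically.
        urls = set(self.visits) | set(self.mentions)
        return sorted(urls, key=lambda u: (-self.score(u), u))[:n]

    def merged_tags(self, url: str, public: Dict[str, Set[str]]) -> Set[str]:
        # Private tags plus collective tags from a (stand-in) public service.
        return set(self.tags.get(url, set())) | public.get(url, set())


db = PersonalLinkDB()
db.add_history(["http://example.com/", "http://example.org/a", "http://example.com/"])
db.add_email("Have you seen http://example.org/a? Also http://example.org/b.")
db.tag("http://example.org/a", "search", "personal-web")
public_tags = {"http://example.org/a": {"web2.0"}}  # stand-in for a public tagging service

print(db.top(2))  # ['http://example.com/', 'http://example.org/a']
print(sorted(db.merged_tags("http://example.org/a", public_tags)))
# ['personal-web', 'search', 'web2.0']
```

The second-generation version he describes would add a fetcher that caches each page's content and a text index on top of this; the sketch omits both.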

Sounds like some kind of wonderful.

Wai Yip Tung

Hi Adam, I stumbled upon your idea of a Personal Web. I must introduce you to MindRetrieve, the open-source personal search engine project I have launched. It does something fairly close to what you describe: an HTTP proxy that saves everything you've read and lets you search it. I did some math myself and am quite convinced that it is feasible to save a copy of everything we ever read. MindRetrieve starts modestly and saves only a trimmed-down version of web pages right now. It is already very handy for bringing back a lot of things I have recently read. It runs on Windows and Linux right now, with a Mac version to follow soon.






  • John Battelle: The Search

    My favorite book of 2005. Period.


  • Steven D. Levitt: Freakonomics: A Rogue Economist Explores the Hidden Side of Everything

    "Just because two things are correlated does not mean that one causes the other. A correlation simply means that a relationship exists between two factors -- let's call them X and Y -- but it tells you nothing about the direction of that relationship. It's possible that X causes Y; it's also possible that Y causes X; and it may be that X and Y are both being caused by some other factor, Z.

    Economics is, at root, the study of incentives: how people get what they want, or need, especially when other people want or need the same thing.

    Incentives are the cornerstone of modern life. The conventional wisdom is often wrong. Dramatic effects often have distant, even subtle, causes. Experts use their informational advantage to serve their own agenda. Knowing what to measure and how to measure it makes a complicated world much less so." (*****)

  • Malcolm Gladwell: Blink

    A book of anecdotes about the power of thinking without thinking; this book is a more interesting read than Gladwell's previous book, The Tipping Point.

    New York Times: "Gottman believes that each relationship has a DNA, or an essential nature. It's possible to take a very thin slice of that relationship, grasp its fundamental pattern and make a decent prediction of its destiny. Gladwell says we are thin-slicing all the time -- when we go on a date, meet a prospective employee, judge any situation. We take a small portion of a person or problem and extrapolate amazingly well about the whole."

    David Brooks, who wrote that review, adds: "Isn't it as possible that the backstage part of the brain might be more like a personality, some unique and nontechnological essence that cannot be adequately generalized about by scientists in white coats with clipboards?" (*****)

  • Paul Graham: Hackers and Painters

    I don't agree with some parts of this book, but I truly loved reading it, and it really made me think. I referenced it in my weblications and superhacker and phoneboy posts. Favorite chapter is How to Make Wealth. (Thanks, Ev.) (*****)

  • Joel Spolsky: Joel on Software

    Joel is really good at writing about "diverse and occasionally related matters of interest to software developers, designers, and managers, and those who, whether by good fortune or ill luck, work with them in some capacity."

    Joel on Software embodies the principle of "Welcome to management! Guess what? Managing software projects has nothing at all to do with programming." This book, a compendium of the website's wisdom, is useful for everyone from team leads estimating schedules to software CEOs developing competitive strategy. (*****)

  • Bruce Sterling: Tomorrow Now: Envisioning The Next Fifty Years

    Bruce wrote this book to come to terms with seven novel aspects of the twenty-first century, situations that are novel to that epoch and no other. It's about future possibilities.

    "This is the future as it is felt and understood: via human experience... The years to come are not merely imaginary. They are history that hasn't happened yet. People will be born into these coming years, grow to maturity in them, struggle with their issues, personify those years, and bear them in their flesh. The future will be lived." Hear, hear; well spoken, Bruce. (*****)

  • The World's 20 Greatest Unsolved Problems: John Vacca

    "Science has extended life, conquered disease, and offered new sexual and commercial freedoms through its rituals of discovery, but many unsolved problems remain...

    If support for science falters and if the American public loses interest in it, such apathy may foster an age in which scientific elites ignore the public will and global imperatives." (*****)

  • Paul Hawken, Amory Lovins, L. Hunter Lovins: Natural Capitalism: Creating the Next Industrial Revolution

    I had the pleasure recently of meeting Amory Lovins and hearing him talk about Twenty Hydrogen Myths and the design of the Hypercar. (He also talked about Bonobos... wow.) I'm a convert to the way of thinking espoused in Natural Capitalism. I used to be cynical about the future, but Amory's work has made me a believer that many great things are about to come. The best way to predict the future is to invent it. (*****)

  • Merrill R. Chapman: In Search of Stupidity: Over 20 Years of High-Tech Marketing Disasters

    In hilarious prose, this book catalogs lots of stoopid high-tech marketing decisions. It offers clear, detailed analysis of many a marketing mishap, with what happened, why, and how to avoid such stupidity. Might just be the best. book. ever... (*****)

  • Paul Krugman: The Great Unraveling: Losing Our Way in the New Century

    A book exposing the pitfalls of crony capitalism, from corrupt corporations straight up to the executive branch of our government. Krugman is nonpartisan -- what he exposes is foolish short-term thinking on the part of recent United States policies. The patriotic thing to do, he advises, is to fix these economic problems now before they become much harder to solve.

  • Henry Petroski: Small Things Considered: Why There Is No Perfect Design

    "Design can be easy and difficult at the same time, but in the end, it is mostly difficult." (*****)

  • Alexander Blakely: Siberia Bound

    One of my favorite books of the past few years. Xander is a master storyteller. (*****)

  • Susan Scott: Fierce Conversations

    How to make every conversation count. One of my favorite books of the last decade. (*****)

Member since 08/2003