On the future of search

Robert Scoble has put together a video presentation on how Techmeme, Facebook and Mahalo will kill Google in four years’ time. His basic premise is that SEOs who game Google’s algorithm are as bad as spam (and there are some pissed-off SEO experts waking up today!). People like the ideas he introduces about social filtering, but on the whole they are more skeptical of his world domination theory.

There are a few good posts, like Muhammad’s, on why the combo won’t prevail, but I think everyone is missing the real issue: the very concept of relevant results.

Relevance is personal

When I search, I am looking for answers. Scoble uses the example of searching for HDTV and notes that he would expect the top manufacturers at the top of the results. That is probably what he wants to see, but I would want to read about the technology behind it. What I am trying to illustrate here is that relevance is personal.

The argument for social filtering is that it makes results more relevant. For example, with a bunch of my friends associated with my Facebook account, an inference engine can reason that if my friend A is also friends with person B, who is friends with person C, then something I like must also be something that person C likes. When it comes to search results, that sort of social (collaborative) filtering doesn’t work, because relevance is complicated. The only value a social network can provide is telling me whether content is spam or not, a yes-or-no answer, and even that assumes someone in my network has actually come across the content. Just because my social network can (potentially) help filter out spam doesn’t make the search results higher quality; it just means fewer spam results. There is plenty of content that is on-topic but may as well be classed as spam.
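To make the friend-of-a-friend inference concrete, here is a minimal Python sketch. The friend graph, the likes data and the function names are all hypothetical, and the three-hop limit is an assumption chosen to match the me, A, B, C chain above; this is not how Facebook or any real inference engine works. It simply propagates likes along the chain, and in doing so it suggests the HDTV manufacturers that person C likes even though I care about the technology, which is exactly the problem with treating relevance as transitive.

```python
from collections import deque

# Hypothetical friend graph: me -> A -> B -> C, stored as an undirected adjacency list.
FRIENDS = {
    "me": ["A"],
    "A": ["me", "B"],
    "B": ["A", "C"],
    "C": ["B"],
}

# Hypothetical record of what each person likes.
LIKES = {
    "me": {"HDTV technology"},
    "C": {"HDTV manufacturers"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone 1..max_hops friendship links away from start."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                queue.append((friend, hops + 1))
    return found


def naive_inferred_likes(graph, likes, person, max_hops=3):
    """Naively assume anything liked within max_hops is also relevant to person."""
    inferred = set()
    for other in within_hops(graph, person, max_hops):
        inferred |= likes.get(other, set())
    return inferred - likes.get(person, set())


print(naive_inferred_likes(FRIENDS, LIKES, "me"))  # {'HDTV manufacturers'}
```

Shrinking the hop limit to two returns nothing at all, which shows how arbitrary the "my friends' friends like it, so I will too" assumption is.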

Google’s algorithm essentially uses the popularity of links to determine relevance. People can game it by inflating a website’s popularity, for example by pointing links at it from fake sites, along with other optimisations. But Google’s PageRank algorithm assumes that relevance is, at its core, purely about popularity. The innovation the Google guys brought to the world of search deserves applause, but the near-total lack of innovation in this area since then shows just how hard it is to come up with new ways of determining relevance. Popularity was a smart proxy for relevance (if most people link to something, most people will probably like it), but since it can be gamed, it no longer is.
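The popularity idea can be sketched with the textbook PageRank formula, computed by power iteration. This is the simplified version published by Brin and Page, not Google’s production ranking; the 0.85 damping factor is the commonly cited default, the even spreading of rank from pages with no outbound links is one common convention, and the page names are hypothetical. The second graph adds twenty fake pages that all link to `spam-shop`, which is enough to lift it above the genuinely useful `hdtv-tech` page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumed convention: a dangling page spreads its rank over every page.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph with no manipulation.
honest = {
    "review": ["hdtv-tech"],
    "blog": ["hdtv-tech", "spam-shop"],
    "hdtv-tech": ["review"],
    "spam-shop": [],
}

# The same graph plus a hypothetical link farm pointing at spam-shop.
gamed = dict(honest)
for i in range(20):
    gamed[f"fake-{i}"] = ["spam-shop"]

honest_rank = pagerank(honest)
gamed_rank = pagerank(gamed)
print(honest_rank["hdtv-tech"] > honest_rank["spam-shop"])  # True
print(gamed_rank["spam-shop"] > gamed_rank["hdtv-tech"])    # True
```

Nothing about `spam-shop` changed except how many pages point at it, which is the weakness of equating popularity with relevance.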

The semantic web

I still don’t quite understand why people don’t realise the potential of the semantic web, something I go on about over and over again (maybe not on this blog; maybe it’s time I did). But if anything is going to change search, it will be the semantic web, because it will structure data: it moves away from the document approach that webpages represent and towards a data approach that resembles a database table. It may not be able to make results more relevant to your personal interests, but it will better understand the sources of data that make up the search results, and can match them against whatever constructs you present to it.
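The difference between the document approach and the data approach can be shown with a small Python sketch, borrowing the Hilton Paris example raised in the comments below. The text snippets, the record fields (`type`, `brand`, `city`, `name`) and both search functions are hypothetical: keyword matching over documents returns every page that mentions both words, while a query over typed records behaves like a `WHERE` clause on a database table and returns only the hotel.

```python
# Document approach: pages are just strings of text (hypothetical snippets).
DOCUMENTS = [
    "Paris Hilton arrives at a Hollywood party",
    "Hilton hotel in Paris: rooms and rates",
    "Paris Hilton launches a new perfume",
]

# Data approach: the same subjects described as typed records, like table rows.
RECORDS = [
    {"type": "Person", "name": "Paris Hilton", "occupation": "celebrity"},
    {"type": "Hotel", "brand": "Hilton", "city": "Paris", "name": "Hilton (Paris)"},
]


def keyword_search(docs, query):
    """Match any document containing every query word, ignoring case and order."""
    words = query.lower().split()
    return [d for d in docs if all(w in d.lower() for w in words)]


def structured_search(records, **constraints):
    """Match records whose fields equal every constraint, like a SQL WHERE clause."""
    return [r for r in records
            if all(r.get(field) == value for field, value in constraints.items())]


print(len(keyword_search(DOCUMENTS, "Hilton Paris")))  # 3
print([r["name"] for r in structured_search(RECORDS, type="Hotel", city="Paris")])  # ['Hilton (Paris)']
```

The keyword search cannot tell the celebrity from the hotel because the words are identical; the structured query never has to guess, because the data says which is which.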

Like Google’s PageRank, the semantic web will require humans to structure data, from which a machine will then make inferences, much as PageRank makes inferences from the links people create. However, Scoble’s claim that humans can overtake a machine is silly: yes, humans have a far greater intellect and are better at filtering, but they cannot come close to matching the speed and power of a machine. Once the semantic web gets into full gear a few years from now, humans will have trained the machine to think, and it can then do the filtering for us.

Human intelligence will be crucial for the future of search, but not in the way Mahalo uses it, which is like manually sorting pieces of paper into a filing cabinet: it is not sustainable. It is a bit like the painters of the Sydney Harbour Bridge, who, when they finish painting it, have to start all over again because the other side is already starting to rust. Once we can teach a machine that, for example, a dog is an animal that has four legs and makes a sound like “woof”, the machine can act on our behalf, like a trained animal, and go fetch what we want. How those paper documents are stored then becomes irrelevant, because the machine can do the sorting for us.
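Teaching a machine that a dog is an animal can be sketched as a handful of subject, predicate, object triples plus two inference rules in the style of RDF Schema: `subClassOf` is transitive, and anything of a given type is also of that type’s superclasses. The facts, the predicate names such as `legCount` and `makesSound`, and the `fetch` helper are hypothetical, and a real system would use RDF and a proper reasoner rather than this hand-rolled loop. Nobody states that Rex is a living thing; the machine works it out, and can then go and fetch the animal that says “woof”.

```python
# Hypothetical facts as (subject, predicate, object) triples, in the spirit of RDF.
TRIPLES = {
    ("Dog", "subClassOf", "Animal"),
    ("Bird", "subClassOf", "Animal"),
    ("Animal", "subClassOf", "LivingThing"),
    ("Dog", "legCount", "4"),
    ("Dog", "makesSound", "woof"),
    ("Bird", "legCount", "2"),
    ("Rex", "type", "Dog"),
    ("Tweety", "type", "Bird"),
}


def infer(triples):
    """Apply two RDFS-style rules until nothing new can be derived:
    (A subClassOf B) and (B subClassOf C) give (A subClassOf C);
    (x type A) and (A subClassOf B) give (x type B)."""
    facts = set(triples)
    while True:
        new = set()
        for (s1, p1, o1) in facts:
            for (s2, p2, o2) in facts:
                if p1 in ("subClassOf", "type") and p2 == "subClassOf" and o1 == s2:
                    new.add((s1, p1, o2))
        if new <= facts:
            return facts  # fixed point reached
        facts |= new


def query(facts, predicate, obj):
    """Every subject s such that (s, predicate, obj) is a known fact."""
    return {s for (s, p, o) in facts if p == predicate and o == obj}


def fetch(facts, cls, predicate, value):
    """Instances of cls that belong to some class having (predicate, value)."""
    instances = query(facts, "type", cls)
    matching_classes = query(facts, predicate, value)
    return {x for x in instances
            if any((x, "type", c) in facts for c in matching_classes)}


facts = infer(TRIPLES)
print(("Rex", "type", "LivingThing") in facts)       # True
print(fetch(facts, "Animal", "makesSound", "woof"))  # {'Rex'}
print(fetch(facts, "LivingThing", "legCount", "2"))  # {'Tweety'}
```

The humans only supply the small facts; the rules do the filing, which is the division of labour argued for above.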

The Google killer of the future will come from the people who can convert the knowledge on the world wide web into information readable by computers, creating this (weak) form of artificial intelligence. Now that’s where it gets interesting.

4 Responses to “On the future of search”


  1. JofArnold

    In my own biased way, I couldn’t agree more. It’s a really interesting idea, but Mahalo’s method is surely not fast enough or scalable enough to compete with Google.

    The reason I am “biased” is that we’ve already proven, via our Blog Friends Facebook app, that simple keyword search, contextualized and filtered via your social network, can be achieved in practice for blogs.

    Jof Arnold
    COO, i-together ltd.

  2. Yihong Ding

    I agree with what you said. Personalization will definitely be an important feature of the WWW in the future. I will soon post a new article, “some truth about semantic web”, on the Semantic Focus blog. You might be interested in it.

    — Yihong

  3. Charlie

    Sorry to be obtuse here, but what is a practical example of the semantic web in action?

    I can already type ‘Define: dog’ into Google and it will tell me what a dog is. It’s not a huge leap to type ‘Buy: dog’ and come home after work to find a dog on your doorstep.

    Why do I need my search engine to understand the semantics of what I am asking it to do?

  4. Elias

    The semantic web isn’t about humans, it’s about machines, and that is a point lost on people who don’t understand why we need it. The current web is readable by humans only; the semantic web is about adding an extra layer so that computers can understand the data.

    So, for example, let’s say you are looking to buy a car. You manually search through all the car websites; you do research on car models; you are picky because the cars all come with varying degrees of information (some with missing service history, others with missing extras). It’s time consuming. Although the web helps you find information, you need to process it manually using your human intelligence. And even then, you only come to a decision because you get over it!

    Enter the semantic web, where all this data is machine readable. So you can tell a computer:
    1) Your past purchase history
    2) What your preferences are
    3) What limits you have (money, location of vehicle, etc.).

    …and then, like a digital personal assistant, a computer can sift through all the webpages (or rather data), and come back to you with a precise recommendation.

    Think Terminator… but before the destruction of the world! At its core, it’s about structuring the way we store data on the web so that a computer can query the entire web like one massive database. The web is unstructured at the moment. For example, a Google search for Hilton Paris will bring up pages on the celebrity Paris Hilton, not just pages on the hotel.

    Does that make sense?
