Tag Archive for 'google'



How Google reader can finally start making money

Today, you would have heard that Newsgator, Bloglines, Me.dium, Peepel, Talis and Ma.gnolia have joined the APML workgroup and are in discussions with workgroup members on how they can implement APML into their product lines. Bloglines created some news the other week on their intention to adopt it, and the announcement today about Newsgator means APML is now fast becoming an industry standard.

Google, however, is still sitting on the sidelines. I really like using Google Reader, but if they don’t announce support for APML soon, I will have to switch back to my old favourite Bloglines, which is doing some serious innovating. Seeing as Google Reader came out of beta recently, I thought I’d help them out to finally add a new feature (APML) that will see it generate some real revenue.

What a Google reader APML file would look like
Read my previous post on what exactly APML is. If the Google reader team was to support APML, what they could add to my APML file is a ranking of blogs, authors, and key-words. First an explanation, and then I will explain the consequences.

In terms of blogs I read, the percentage frequency of posting I read from a particular blog will determine the relevancy score in my APML file. So if I was to read 89% of Techcrunch posts – which is information already provided to users – it would convert this into a relevancy score for Techcrunch of 89% or 0.89.
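To make the arithmetic concrete, here is a minimal sketch of that conversion. It assumes the reader already knows how many posts a blog published and how many of them I read; the function name and the second blog's numbers are made up for illustration, and this is not how Google Reader actually computes anything.

```python
def blog_relevancy(posts_read: int, posts_published: int) -> float:
    """Fraction of a blog's posts the user read, as a 0..1 relevancy score."""
    if posts_published <= 0:
        return 0.0  # nothing published, so nothing to score
    return round(min(posts_read / posts_published, 1.0), 2)


# Hypothetical reading stats: blog -> (posts read, posts published)
reading_stats = {"Techcrunch": (89, 100), "Some Other Blog": (12, 48)}

scores = {blog: blog_relevancy(read, total)
          for blog, (read, total) in reading_stats.items()}
print(scores)
```

Reading 89 of 100 Techcrunch posts gives a score of 0.89, matching the example above.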


APML: pulling rank

In terms of authors I read, it can extract who posted the entry from the individual blog postings I read and, like the blog ranking above, perform a similar procedure. I don’t imagine it would be too hard to do this; however, given it’s a small team running the product, I would put this on a lower priority to support.

In terms of key-words, Google could employ its contextual analysis technology from each of the postings I read and extract key words. By performing this on each post I read, the frequency of extracted key words determines the relevance score for those concepts.

So that would be the how. The APML file generated from Google Reader would simply rank these blogs, authors, and key-words - and the relevance scores would update over time. Over time, the data is indexed and re-calculated from scratch so as concepts stop being viewed, they start to diminish in value until they drop off.
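A rough sketch of the keyword side, under stated assumptions: the key words for each post have already been extracted (the lists below are invented), and a concept's score is its count relative to the most frequent concept, which is my own choice of normalisation rather than anything in the APML spec. Recalculating from scratch over only the recent posts is what lets an old concept drop off.

```python
from collections import Counter


def keyword_relevance(posts_keywords: list[list[str]]) -> dict[str, float]:
    """Score each key word by its frequency relative to the most frequent one."""
    counts = Counter(kw.lower() for kws in posts_keywords for kw in kws)
    if not counts:
        return {}
    top = max(counts.values())
    return {kw: round(n / top, 2) for kw, n in counts.items()}


# Hypothetical key words extracted from the posts I read, oldest first
read_posts = [["dogs"], ["APML", "privacy"], ["APML", "RSS"], ["APML", "privacy"]]

all_time = keyword_relevance(read_posts)
recent = keyword_relevance(read_posts[-3:])  # recalculated over the latest posts only
print(all_time)
print(recent)
```

In the recent window "dogs" no longer appears at all, which is the drop-off behaviour described above.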

What Google reader can do with that APML file
1. Ranking of content
One of the biggest issues facing consumers of RSS is information overload. I am quite confident that people would pay a premium for any attempt to help rank the hundreds of items per day that a user needs to read. By having an APML file, over time Google Reader can match postings to a user’s ranked interests. So rather than presenting content in reverse chronological order (most recent to oldest), it can instead organise content by relevancy (items of most interest to least).

This won’t reduce the amount of RSS consumption by a user, but it will enable them to know how to allocate their attention to content. There are a lot of innovative ways you can rank the content, down to the way you extract key words and rank concepts, so there is scope for competing vendors to have their own methods. However, the point is that a feature to ‘Sort by Personal Relevance’ would be highly sought after, and I am sure quite a few people will be willing to pay the price for this godsend.
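One simple way such a sort could work, as a sketch only: score each unread post by adding up the relevance of the ranked concepts it mentions, then order by that score instead of by date. The interest scores, post titles and key words below are invented, and summing is just one of the many possible ranking methods a vendor might pick.

```python
def sort_by_personal_relevance(posts: list[dict], interests: dict[str, float]) -> list[dict]:
    """Order posts from most to least relevant to the user's ranked concepts."""
    def score(post: dict) -> float:
        # Concepts the user has no ranking for contribute nothing.
        return sum(interests.get(kw.lower(), 0.0) for kw in post["keywords"])
    return sorted(posts, key=score, reverse=True)


# Hypothetical concept ranking taken from a user's APML file
interests = {"apml": 1.0, "privacy": 0.67, "rss": 0.33}

# Hypothetical unread items, in the order a reader would normally show them
posts = [
    {"title": "Celebrity gossip roundup", "keywords": ["celebrity"]},
    {"title": "APML and privacy", "keywords": ["APML", "privacy"]},
    {"title": "New RSS reader launches", "keywords": ["RSS"]},
]

ranked = [p["title"] for p in sort_by_personal_relevance(posts, interests)]
print(ranked)
```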

I know Google seems to think contextual ads are everything, but maybe the Google Reader team can break from the mould and generate a different revenue stream through a value add feature like that. Google should apply its contextual advertising technology to determine key words for filtering, not advertising. It can use this pre-existing technology to generate a different revenue stream.

2. Enhancing its AdSense programme


Targeted advertising is still bloody annoying

One of the great benefits of APML is that it creates an open database about a user. Contextual advertising, in my opinion, is actually a pretty sucky technology and its success to date is only because all the other types of targeted advertising models are flawed. As I explain above, the technology should instead be used to better analyse what content a user consumes, through keyword analysis. Over time, a ranking of these concepts can occur – as well as being shared from other web services that are doing the same thing.

An APML file that ranks concepts is exactly what Google needs to enhance its AdWords technology. Don’t use it to analyse a post to show ads; use it to analyse a post to rank concepts. Then, in aggregate, the contextual advertising will work because it can be based off this APML file with great precision. And even better, a user can tweak it – which will be equivalent to tweaking what advertising a user wants to get. The transparency of a user being able to see what ‘concept ranking’ you generate for them is powerful, because a user is likely to monitor it to keep it accurate.

APML is contextual advertising’s biggest friend, because it profiles a user in a sensible way that can be shared across applications and monitored by the user. Allowing a user to tweak their APML file in pursuit of more targeted content aligns their self-interest with ensuring that the targeted ads thrown at them, based on those ranked concepts, are in fact relevant.

3. Privacy credibility
Privacy is the inflation of the attention economy. You can’t proceed to innovate with targeted advertising technology, whilst ignoring privacy. Google has clearly realised this the hard way by being labeled one of the worst privacy offenders in the world. By adopting APML, Google will go a long way to gain credibility in privacy rights. It will be creating open transparency with the information it collects to profile users, and it will allow a user to control that profiling of themselves.

APML is a very clever approach to dealing with privacy. It’s not the only approach, but it is one of the most promising. Even if Google never uses an APML file as I describe above, the pure brand-enhancing value of giving its users some control over their rightful attention data is something that alone would benefit the Google Reader product (and Google’s reputation itself) if they were to adopt it.


Privacy. Stop looking.

Conclusion
Hey Google - can you hear me? Let’s hope so, because you might be the market leader now, but so was Bloglines once upon a time.

Bloglines to support APML

Tucked away in a post by one of the leading RSS readers in the world, Bloglines has announced that they will be investigating how they can implement APML into their service. The thing about standards is that as fantastic as they are, if no one uses them, they are not a standard. Over the last year, dozens of companies have implemented APML support, and this latest announcement by a revitalised Bloglines team that is set to take back what Google took from them means we are going to be seeing a lot more innovation in an area that has largely gone unanswered.

The announcement has been covered by Read/WriteWeb and APML founders Faraday Media, and a thoughtful analysis has been done by Ross Dawson. Ben Metcalfe has also written a thought-provoking analysis of the merits of APML.

What does this mean?

APML is about taking control of data that companies collect about you. For example, if you are reading lots of articles about dogs, RSS readers can make a good guess you like dogs - and will tick the “likes dogs” box on the profile they build of you which they use to determine advertising. Your attention data is anything you give attention to - when you click on a link within Facebook, that’s attention data that reveals things about you implicitly.

The big thing about APML is that it solves a massive problem when it comes to privacy. If you look at my definition of what constitutes privacy, the ability to control what data is collected with APML completely fits the bill. I was so impressed when I first heard about it, because it’s a problem I have been thinking about for years, that I immediately joined the APML workgroup.

Privacy is the inflation of the attention economy, and companies like Google are painfully learning about the natural tension between privacy and targeted advertising. (Targeted advertising being the thing that Google is counting on to fund its revenue.) The web has seen a lot of technological innovation, which has disrupted a lot of our culture and society. It’s time that the companies that are disrupting the world’s economies started innovating to answer the concerns of the humans that are using their services. Understanding how to deal with privacy is a key competitive advantage for any company in the Internet sector. It’s good to see some finally realising that.

Don’t get the Semantic Web? You will after this

Prior to 2006, I had sort of heard of the Semantic Web. To be honest, I didn’t know much – it was just another buzzword. I’ve been hearing about Microformats for years, and cool but useless initiatives like XFN. However to me it was simply just another web thing being thrown around.

Then in August 2006, I came across Adrian Holovaty’s article where he argues journalism needs to move from a story-centric world to a data-centric world. And that’s when it dawned on me: the Semantic web is some serious business.

I have since done a lot of reading, listening, and thinking. I don’t profess to be a Semantic Web expert – but I know more than the average person as I have (painfully) put myself through videos and audios of academic types who confuse the crap out of me. I’ve also read through a myriad of academic papers from the W3C, which are like the times when you read a novel and keep re-reading the same page and still can’t remember what you just read.

Hell – I still don’t get things. But I get the vision, so that’s what I am going to share with you now. Hopefully, my understanding will benefit the clueless and the skeptical alike, because it’s a powerful vision which is entirely possible.

1) The current web is great for humans; useless for machines
When you search for ambiguous terms, at best, search engines can algorithmically predict some sort of answer that partially answers your query. Sometimes not. But the complexity of language, is not something engineers can engineer to deal with. After all, without ambiguity of natural languages, the existence of poetry is impossible.

Fine.

What did you think when you read that? As in: “I’ve had it – fine!” which is like another way of saying ok or agreeing with something. Perhaps you thought about that parking ticket I just got – illegal parking gets you fined. Maybe you thought I am applauding myself by saying that was one fine piece of wordcraftship I just wrote, or said in another context, like a fine wine.

Language is ambiguous, and depending on the context with other words, we can determine what the meaning of the word is. Search start-up company Powerset, which is hoping to kill Google and rule the world, is employing exactly this technique to improve search: intelligent processing of words depending on context. So by me putting in “it’s a fine”, it understands the context that it’s a parking ticket, because you wouldn’t say “it’s a” in front of ‘fine’ when you use it to agree with something (the ‘ok’ meaning above).

But let’s use another example: “Hilton Paris” in Google – the world’s most ‘advanced’ search engine. Obviously, as a human reading that query, you understand from the context of those words that I would like to find information about the Hilton in Paris. Well, maybe.

Let’s see what Google comes up with: Of the ten search results (as of when I wrote this blog posting), one was a news item on the celebrity; six were on the celebrity describing her in some shape or form, and three results were on the actual Hotel. Google, at 30/70 – is a little unsure.

Why is Paris Hilton, that blonde haired thingy of a celebrity, coming up in the search results?

Technologies like Powerset apparently produce a better result because they understand the order of the words and the context of the search query. But the problem with these searches isn’t only the interpretation of what the searcher wants, but also the ability to understand the actual search results. Powerset can only interpret so much of the gazillions of words out there. There is the whole problem of the source data, not just the query. Don’t get what I mean? Keep reading. But for now, learn this lesson:

Computers have no idea about the data they are reading. In fact, Google pumping out those search results is based on people linking. Google is a machine, and reads 1s and 0s – machine language. It doesn’t get human language.

2) The Semantic web is about making what humans read, machine readable
Tim Berners-Lee, the guy that invented the World Wide Web and the visionary behind the Semantic Web, prefers to call it the ‘data web’. The current web is a web of documents; by adding extra data to content (metadata, which is data about data), machines will be able to understand it.

A practical outcome of having a semantic web, is that Google would know that when it pulls up a web page regardless of the context of the words – it will understand what the content is. Think of every word on the web, being linked to a master dictionary.
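As a toy illustration of why that matters for the “Hilton Paris” query: below, each page carries a small piece of metadata saying what it is about. The pages, URLs and the `about` vocabulary are all invented for this sketch; real Semantic Web formats such as RDF are far richer. A plain keyword match cannot tell the celebrity from the hotel, while a search that can read the metadata can.

```python
from typing import Optional

# Hypothetical pages, each annotated with what the page is about
pages = [
    {"url": "gossip.example/paris-hilton",
     "text": "Paris Hilton spotted at a party",
     "about": {"type": "Person", "name": "Paris Hilton"}},
    {"url": "hotels.example/hilton-paris",
     "text": "Hilton Paris rooms and rates",
     "about": {"type": "Hotel", "name": "Hilton", "city": "Paris"}},
]


def search(pages: list[dict], words: list[str], wanted_type: Optional[str] = None) -> list[str]:
    """Match every query word in the text; optionally require a metadata type."""
    words = [w.lower() for w in words]
    return [
        p["url"] for p in pages
        if all(w in p["text"].lower() for w in words)
        and (wanted_type is None or p["about"]["type"] == wanted_type)
    ]


keyword_only = search(pages, ["Hilton", "Paris"])
hotels_only = search(pages, ["Hilton", "Paris"], wanted_type="Hotel")
print(keyword_only)
print(hotels_only)
```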

The benefit of the semantic web is not for humans – at least immediately. The Semantic Web is actually pretty boring with what it does – what is exciting, is what it will enable. Keep reading.

3) The Semantic web is for machines to interpret, not people
A lot of the skeptics of the semantic web usually don’t see the value of it. Who cares about adding all this extra metadata? I mean heck – Google still was able to get the website I needed – the Hilton in Paris. Sure, the other 70% of the results on that page were irrelevant, but I’m happy.

I once came across a Google employee and he asked “what’s the point of a semantic web; don’t we already have enough metadata?” To some extent, he’s right – there are some websites out there that have metadata. But the point of the semantic web is so that machines, once they read the information, can start thinking like a human would and connecting it to other information. There needs to be metadata across the board.

For example, my friend Michael was recently looking to buy a car. A painful process, because there are so many variables. So many different models, different makes, different dealers, different packages. We have websites, with cars for sale neatly categorised into profile pages saying what model it is, what colour it is, and how much. (Which may I add, are hosted on multiple car sites with different types of profiles). A human painfully reads through these profiles, and computes as fast as a human can. But a machine can’t read these profiles.

Instead of wasting his (and my) weekends driving around Sydney to find his car, a machine could find it for him. So, Mike would enter his profile – what he requires in a car, what his credit limit is, what his prior history with cars is – everything that would affect his judgement of a car. And then, the computer can query every online website with cars to match the criteria. Because the computer can interpret these websites across the board, it can evaluate and it can go back to Michael and say “this is the car for you, at this dealer – click yes to buy”.
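Once listings from every car site share the same machine-readable structure, Mike's search boils down to a filter. The sketch below assumes that has already happened; the sites, dealers, prices and field names are all made up, and picking the cheapest match is my own simplification of “the car for you”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    site: str     # which car website the listing came from
    dealer: str
    model: str
    colour: str
    price: int    # asking price in dollars


def find_car(listings: list[Listing], model: str, max_price: int,
             colours: set[str]) -> Optional[Listing]:
    """Return the cheapest listing that meets every criterion, or None."""
    matches = [c for c in listings
               if c.model == model and c.price <= max_price and c.colour in colours]
    return min(matches, key=lambda c: c.price, default=None)


# Hypothetical listings gathered from several sites
listings = [
    Listing("carsite-a.example", "Dealer One", "hatchback", "red", 18500),
    Listing("carsite-b.example", "Dealer Two", "hatchback", "blue", 17200),
    Listing("carsite-b.example", "Dealer Three", "sedan", "blue", 15000),
    Listing("carsite-c.example", "Dealer Four", "hatchback", "blue", 21000),
]

best = find_car(listings, model="hatchback", max_price=20000, colours={"blue", "red"})
print(best)
```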

The semantic web is about giving computers the information to be able to interpret data, so that they can do what they do really well – compute.

4) A worldwide database
What Berners-Lee essentially envisions is turning the entire world wide web into a database that can be queried. Currently, the web looks like Microsoft Word – one slab of text. However, if that slab of text was neatly categorised in an Excel spreadsheet, you could manipulate that data and do what you please – create reports, reorder them, filter, and do whatever to your heart’s content.

At university, I was forced to do an Information Systems subject which was essentially about the theory of databases. Damn painful. I learned only two things from that course. The first thing was that my lecturer, tutor, and classmates spoke less intelligible English than a caterpillar. But the second thing was that I learned what information is and how it differs from data. I am now going to share with you that lesson, and save you three months of your life.

You see, data is meaningless. For example, 23 degrees is data. On its own, it’s useless. Another piece of data is Sydney. Again – useless. I mean, you can think all sorts of things when you think of Sydney, but it doesn’t have any meaning.

Now put together 23 degrees and Sydney, and you have just created information. Information is about creating relationships between data. By creating a relationship, an association, between these two different pieces of data – you can determine it’s going to be a warm day in Sydney. And that is what information is: Relationship building; connecting the dots; linking the islands of data together to generate something meaningful.
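In database terms, that relationship is a join on a shared key. Here is a tiny sketch where two separate piles of data only become information once they are linked; the observation ids, the second city and the 20-degree cut-off for “warm” are assumptions for the example.

```python
WARM_THRESHOLD_C = 20  # assumed cut-off for calling a day "warm"

# Two islands of data, each meaningless on its own
temperatures = {"obs-1": 23, "obs-2": 9}             # observation id -> degrees
locations = {"obs-1": "Sydney", "obs-2": "Hobart"}   # observation id -> city


def forecast(obs_id: str) -> str:
    """Link a temperature to its city through the shared observation id."""
    temp = temperatures[obs_id]
    city = locations[obs_id]
    kind = "warm" if temp >= WARM_THRESHOLD_C else "cool"
    return f"It is going to be a {kind} day in {city} ({temp} degrees)"


print(forecast("obs-1"))
print(forecast("obs-2"))
```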

The semantic web is about allowing computers to query the sum of human knowledge like one big database to generate information.

Concluding thoughts
You are probably now starting to freak out and think “Terminator” images with computers suddenly erupting from under your computer desk, and smashing you against the wall as a battle between humans and computers begins. But I don’t see it like that.

I think about the thousands of hours humans spend trying to compute things. I think of cancer research, where all the experimentation occurring in labs is trying to connect new pieces of data with old data to create new information. I think about computers being able to query the entire taxation legislation to make sure I don’t pay any tax, because they know how it all fits together (having studied tax, I can assure you – it takes a lifetime to only understand a portion of tax law). In short, I understand the vision of the Semantic web as a way of linking things together, to enable computers to compute – so that I can sit on my hammock drinking my beer, as I can delegate the duties of my life to the machines.

All the semantic web is trying to do is make sure everything is structured in a consistent manner, with a consistent dictionary behind the content, so that a machine can draw connections. As Berners-Lee said in one of the videos I saw: “it’s all about creating links”.

The process to a Semantic Web is boring. But once we have those links, we can then start talking about those hammocks. And that’s when the power of the internet - the global network - will really take off.