What is data?

The leading voices in technology have erupted in discussion about data portability, data rights, and the future of web applications. As an active member of the DataPortability Policy group, I suggest the debate proceed by breaking it down. Michael Arrington seems pretty convinced you own all your data, but I don’t think that’s a fair thing to say, and that disagreement is at the core of his clash with Robert Scoble’s view. For things to proceed, I really think a deeper analysis of the issues needs to be made.

1) Define the difference between data, information and knowledge. There’s a big difference.
2) Determine what things are (is an e-mail address data or information?).
3) Recognise the difference between ownership, rights and their implications.
4) Determine what rights (if that’s what it is) the various entities have over data (users, web apps, etc).

This is a big area with a lot of abstract concepts, so break it down and debate each part on its own.

Some of my own thoughts to give some context

1) Data is an object and information is generated when you create linkages between different types of data - the ‘relationships’. Knowledge is the application of information.

  • 2000 is data: a symbol with no meaning. Connect it with other data, like the noun "year", and you have information, because 2000 now has meaning. Connect that information with other information, like "computer bug" and "HSBC", and you now have an application of that information: there was an issue with the Y2K bug that had something to do with the bank HSBC.
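To make the data, information, knowledge distinction a little more concrete, here is a rough Python sketch. It assumes (my own modelling choice, not a standard) that information can be represented as simple subject, relation, object links between pieces of data, and treats knowledge as following those links to answer a question. The `Link` type, relation names such as `related_to`, and both helper functions are hypothetical.

```python
from typing import NamedTuple

class Link(NamedTuple):
    """One relationship between two pieces of data (hypothetical model)."""
    subject: str
    relation: str
    obj: str

# Data: a bare symbol with no meaning of its own.
datum = "2000"

# Information: links that connect the datum to other data.
information = [
    Link("2000", "is_a", "year"),
    Link("Y2K bug", "is_a", "computer bug"),
    Link("Y2K bug", "occurred_in", "2000"),
    Link("Y2K bug", "related_to", "HSBC"),
]

def related(facts, entity):
    """Return every link that mentions `entity` as subject or object."""
    return [f for f in facts if entity in (f.subject, f.obj)]

def issues_linked_to(facts, organisation):
    """Chain links: find what is related to `organisation`, then when it occurred."""
    answers = []
    for f in facts:
        if f.relation == "related_to" and f.obj == organisation:
            years = [g.obj for g in facts
                     if g.subject == f.subject and g.relation == "occurred_in"]
            answers.append((f.subject, years))
    return answers

print(related(information, datum))
print(issues_linked_to(information, "HSBC"))  # [('Y2K bug', ['2000'])]
```

On its own, `"2000"` means nothing; `related` shows the links that give it meaning, and `issues_linked_to` chains two of those links to say something about HSBC.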

2) Define what things are

What is an e-mail address, a phone number, a social graph, an image, a podcast? I’m not entirely sure; I wouldn’t be blogging this if I had all the answers. Once we agree on definitions, we can start categorising them and applying criteria.

3) Ownership:

Here is something Steve Greenberg explained to me:

- Ownership is relevant when there is scarcity.
- Ownership is the ability to deny someone else’s use of the asset.
- So, if data is shared and publicly available, it is a practical impossibility for me to deny its use.
- And if data is available in a form where I can’t control others’ use of it, I cannot really claim to own it.

Nitin Borwankar has a very different argument: you should have ownership based on property rights. He explained that to me here.

4) Rights over data

I personally think no one owns data (a view that follows from defining data as inherently meaningless); instead, you own things further down the value chain, once that data becomes something with value. You own your blog posts as a whole, but not the individual words.

But again, this goes back to what is data?

The value chain for information

Lately, I’ve been doing a lot of thinking about the value chain of information, based on Porter’s model of value chain analysis. Given there is an undeniable trend towards a knowledge-based economy (that is, if we’re not already there!), it seems worthwhile to at least understand the different facets of the value chain, so we can better understand the information sector.

Below are some thoughts about what I think are the broad aspects of the value system, with some commentary under each to help you understand my thinking. I’ve used common social computing sites to help illustrate the concepts, as everyone can relate to them. I also rely on my earlier definitions of data, information, and knowledge.

The value chain
1) Data collection
- value is in the storage
Competitive advantage: who offers consumers the most storage at the lowest price. You should consider this not just in terms of hosting cost, but also in terms of whether it costs users their right to control some of their data.
Example: MySpace is where you store all your demographic data; SmugMug is where you store all your photos (which I consider data)

2) Data processing
- value is in the ability to manipulate the data
Competitive advantage: the infrastructure to process vast amounts of data at the highest output and the lowest cost
Example: Facebook calculates how many friends you have. Calculating that information requires substantial computing power, which is why Friendster buckled once it captured the imagination of the industry as the first major social networking site.

3) Information generation
- Value is in the type and diversity of information. Connecting data (objects) is what generates information, and it requires a unique ability to understand which data inputs to pull.
Competitive advantage: the ability to access the most data (i.e., relationships with the data storage components in the chain) and to apply that data creatively in a unique way.
Example: LinkedIn allows me to know that I am two degrees separated from a certain individual. LinkedIn’s ability to do that is a combination of what data they can use and their ability to process it. Essentially, it depends on the creativity of the company’s management in determining the feature’s value, and on its relationships with storage vendors or its methods of using its own storage. In a DataPortability-enabled world, what matters is not how much data you can store about a user, but how much you can access from the storage vendors, i.e., your relationships with those vendors.
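As a rough illustration of the LinkedIn example, and of why access to data matters more than storing it, here is a minimal Python sketch that finds degrees of separation with a breadth-first search. The friend graphs, the two "storage providers", and the helper names are hypothetical; I am not claiming this is how LinkedIn actually computes it.

```python
from collections import deque

def merge_sources(*sources):
    """Union friend lists from several data sources into one adjacency map."""
    merged = {}
    for source in sources:
        for person, friends in source.items():
            merged.setdefault(person, set()).update(friends)
    return merged

def degrees_of_separation(graph, start, target):
    """Breadth-first search: number of hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None

# Hypothetical data held by two different storage providers.
provider_a = {"me": {"alice"}, "alice": {"me"}}
provider_b = {"alice": {"carol"}, "carol": {"alice"}}

print(degrees_of_separation(provider_a, "me", "carol"))  # None
print(degrees_of_separation(merge_sources(provider_a, provider_b), "me", "carol"))  # 2
```

With only one source, the path to Carol is invisible; merge in the second source and the same search finds her two degrees away. The processing is identical in both runs; what changed is how much data the service could access.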

4) Knowledge application
- value is in the application of information
Competitive advantage: applying information in a unique way that has not been done before
Example: A network analysis of my social graph. If a social networking site can tell me that 48% of my friends are male, and also that 98% of them are heterosexual, then it follows that I am likely a straight male. Deriving insight from the many pieces of information available is limited to those with the unique ability to recognise how information can be applied in certain ways. The determination that I am straight is an inference, which is a higher-order type of value than information (which is grounded in hard data and is more fact-based).
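The information step in this example is essentially aggregation over a friend list. Below is a minimal Python sketch of that step only, computing percentage shares for a profile attribute; the profile fields and toy data are hypothetical. The inference that follows (guessing something about me from those shares) is the higher-order step the paragraph describes, and it is left out because it depends on exactly that human judgment.

```python
from collections import Counter

def attribute_shares(friends, attribute):
    """Return the percentage of friends holding each value of `attribute`.

    Friends whose profile lacks the attribute are skipped, so the
    percentages are relative to the profiles that actually report it.
    """
    values = [friend[attribute] for friend in friends if attribute in friend]
    if not values:
        return {}
    counts = Counter(values)
    total = len(values)
    return {value: round(100 * n / total, 1) for value, n in counts.items()}

# Hypothetical toy profiles; note that one friend has no "orientation" field.
friends = [
    {"name": "A", "gender": "male", "orientation": "heterosexual"},
    {"name": "B", "gender": "female", "orientation": "heterosexual"},
    {"name": "C", "gender": "female"},
    {"name": "D", "gender": "male", "orientation": "heterosexual"},
]

print(attribute_shares(friends, "gender"))       # {'male': 50.0, 'female': 50.0}
print(attribute_shares(friends, "orientation"))  # {'heterosexual': 100.0}
```

Skipping incomplete profiles is a design choice: it keeps the percentages honest about what was actually reported, but it also means the same friend list can yield shares computed over different totals.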

Implications of the value chain
It is important to note, and this is why the above may be difficult to conceptualise, that the Internet industry, which is the backbone of the information sector of the economy, is still relatively immature. Flickr, for example, does most of the value chain: they store my photos; they allow me to make changes to the photos and add additional data like tags; they generate information by allowing me to organise my photos into sets (giving more value to each photo by putting it into context). And of course, they allow for knowledge application through their community: people passing by and leaving comments is something unique to Flickr.

By better understanding the value chain, hopefully we can also realise that businesses can thrive by focussing on specific areas, and that it may not be in their interest to be in all of them. For example, the notion that locking up a person’s user data is a competitive advantage is silly if you can offer value through knowledge application.

To put the above in context, MySpace’s recent data availability announcement is a step in the direction of DataPortability (something that will take until the end of this year to finalise at minimum), but whilst Google and Facebook race to offer similar services that ‘lock’ their data, they are in fact missing the point. The value of MySpace, for example, is the community, and it gets value by accessing data and information from as many diverse places as possible and applying that in a unique way. Because these companies think locking in the data is what determines their business strategy, they are forced to compete in the data storage market, and that is a market I would not want to be in, given how easily it can be commoditised and the massive compliance demands from government and users’ expectations about their rights. As highlighted by Nitin, data redundancy is a big issue, so battling in the storage market puts you at risk if you are relying on it as your sole source of information and knowledge.

As always, I write my blog posts to extend my thinking. I’d love feedback, and for people to challenge the assumptions I’ve made, because I think this can be a very valuable tool in how we view businesses on the web.

Emerging trends? Nope, it’s been a long time coming

When I read the technology news, concepts like cloud computing still seem to be up for debate. I think to myself: you’re kidding me, right? I take a step back and think maybe the future won’t match the current mantra, but then again, trends take time to materialise.

Scanning through my hard disk, I couldn’t help but laugh when I found a document I wrote to a friend in February 2006. As I said in the document, "Those six points, as rough as they are, form core elements in my thinking on how I approach business on the Internet …[I’ve been thinking about it] since November 2003"

So below is literally a copy and paste of that document, which has seeds going back to 2003, when I submitted a grant application for a business idea (ahem, no response, obviously…). The fact that nearly half a decade has passed since I first synthesised these ideas (no doubt shaped by reading the thinkers of the day, not just my own imagination) means they are not flaky predictions: they are real. Ready?

1. Digital future. All information – news reports, television shows, educational textbooks, radio shows – is being digitalised, coexisting with its analogue versions. Whether the digital replicas replace their analogue counterparts is pure speculation. But one fact we cannot ignore is that the possibility is there – all content is now digital. And consumers will switch to the digital version if the value of the content consumed is better realised in digital form.

  • Quick case study. Many pundits believe newspapers will not exist in 15 years. I know they won’t exist in 15 years, and I have spent three years thinking about this very point. At first I used to think digital replicas, as shown by http://www.newsstand.com, were what was going to transform the newspaper business. What I didn’t realise is that the current newspaper experience far exceeds the digital replica (I was hung up on the idea of electronic paper [www.eink.com] – which still remains a big possibility). But I knew the digital future was going to make the current newspaper business obsolete – there is more value in digital. It only just hit me recently by observing my own behaviour – traditional newspapers are not going to be replaced by digital versions – rather, the method by which people receive their news is going to change. And this fact is embodied by the recent acknowledgment by the world’s great newspapers that they are no longer in the newspaper business, but in the information business. I used to read every single major newspaper, and several international newspapers, as I was a debater – I was a heavy news consumer, and I still am. Today, I still follow the news very closely – but I have not read a newspaper all year. Why? I receive all my information needs through websites, RSS feeds and blogs. A new method, made possible by the digital future. People’s means of consuming content will change because of digital.

2. Internet as infrastructure. It doesn’t take a genius to realise that the internet will be the core infrastructure of anything to do with information and communications. The power of the internet as infrastructure to communications and information unlocks opportunities that are transforming the world. Radio, TV, phone calls – you name it – can be done via the internet protocol now.


3. Content is king, distribution is queen – but advertising is what pays for the cost of that sting. Google now makes more revenue than the three prime time television stations in the USA. In monetary terms, that’s about $10 billion a year. And yet, 99% of that revenue comes from one thing – Google’s click-through advertising (about 45% from Google results, the rest from the Google network of publishers through AdSense). HarperCollins announced last week that they are trialling a new business model of providing books for free but supported by advertising – until then, the consumer book business was literally the only segment of media not reliant on advertising as a revenue model. Whilst broadcasting organisations make money from several sources, advertising is literally the backbone of their revenue. To make money out of any content, you place a huge reliance on advertising.
In short, if you want to make money out of content, you need to understand advertising.

4. One-to-one advertising is the superior form of advertising. Partly due to technological factors, the mass media could only advertise through a one-to-many medium – meaning one message to many. The digital-internet future has transformed that ability by allowing content to be customised on a one-to-one basis. If advertising and content can be targeted to an individual’s personality profile and preferences, the value of the content can be maximised, with 1-to-1 advertising delivering a higher return on campaigns – far superior to any other form of advertising. It is superior because it can make advertising more relevant for consumers (ie, a higher response rate); it can increase advertising inventory (mass media advertising is a bit like throwing pamphlets out of a plane, hoping the right people catch them, whereas 1-to-1 means the right people get it at minimal cost); and best of all, it creates better accountability, which is what advertisers now demand.

5. The best business practice for one-to-one advertising is not there yet. The internet is the platform that enables one-to-one advertising, and yet this opportunity has still not been fully exploited. There is a massive need in the market for a means of providing personalised advertising far superior to the current technologies and methods. Google popularised an innovative form of advertising through click-throughs. However, internet click-throughs, despite providing more accountable and better-targeted advertising, still lack the ability to unleash the real power of one-to-one advertising. The power of the internet as a one-to-one advertising platform is still in its infancy.

6. Privacy matters. Privacy is the right to determine what information is available about you, when you want it to be available, and to whom. Companies that gather as much information about you as they can, through your sales history, your activity on the web, and the like, are often doing so without the full knowledge of the consumer. It is information collected by spying on a consumer, and whilst some people retaliate by various measures (eg, fake information, anonymous proxies), there is great public mistrust in providing personal information, or rather, too much of it to one organisation. If information about people is to be used, there needs to be proper approval, both for legal reasons (a business model cannot rely on consumer stupidity) and for the integrity of the data (ie, a cooperative consumer will provide more reliable data).

  • Companies like DoubleClick, which collected your surfing history, relied on placing a cookie on your computer. What happens if you delete that cookie? And what happens if your dad, mum, and cousin from Brazil use the same computer as you? That creates a fairly inconsistent “profile” of the person being targeted.

I had totally forgotten I had written that. Reading it now, it’s a bit lame, and I could probably expand on a few things; in fact, I have written about some of them in blog posts over the last year. Better still, I can now point to actual evidence that these trends are advancing, like the existence of the VRM project for advertising, the big clash between Facebook and privacy (and let’s not forget the first time), and Microsoft’s recent announcement about moving away from software (to pick but a few examples).

If this is what I was seeing in November 2003 as a naive university student absorbing the industry trends of the day, and again in February 2006 when I wrote to my friend about what I thought he needed to consider about the future, and I still agree with it in May 2008, then I think things are beyond speculation: these are entrenched, long-term trends.