We at the DataPortability project have kick-started a research phase, because we’ve realised we need to spend more time consulting with the community to work out issues that don’t have a single answer.
As Chris Saad and I are also experimenting with a new type of social organisation as we incubate the DataPortability project, which I call wikiocracy (Chris calls it participant democracy), I thought I might post these issues on my blog to keep in line with the decentralised ethos we are encouraging with DataPortability. This is something the entire world should be questioning.
So below are some thoughts I have had. They’ve changed a lot since I first thought about what a user’s data rights are, and no doubt they will change again. But hopefully my thoughts can act as a catalyst for what people think data rights really are, and bring focus to the issue at stake, which I pose as a question at the end. I think the bill of rights for users on the social web is not quite adequate, and we need a more careful analysis of the issues.
It’s the data, stupid
Data is essentially an object. Standalone it’s useless - take for example the name “Elias”. In the absence of anything else, that datum means nothing. However, when you associate that name with my identity (i.e. appending my surname Bizannes or linking it to my Facebook profile), it suddenly becomes “information”. Data is an object, and information is generated when you create linkages between different pieces of data - the ‘relationships’.
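A rough way to picture this distinction is the minimal Python sketch below: the isolated values are data, and information is whatever the links between them say. The names `Datum`, `Link` and `describe`, and the relation labels, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A standalone value; on its own it says nothing about anyone."""
    value: str


@dataclass(frozen=True)
class Link:
    """A relationship between two data items; this is where information lives."""
    source: Datum
    relation: str  # hypothetical label, e.g. "has_surname"
    target: Datum


def describe(datum: Datum, links: list[Link]) -> list[str]:
    """Return every statement the links make about one datum."""
    return [
        f"{link.source.value} {link.relation} {link.target.value}"
        for link in links
        if link.source == datum
    ]


first_name = Datum("Elias")
surname = Datum("Bizannes")
profile = Datum("Facebook profile")

links = [
    Link(first_name, "has_surname", surname),
    Link(first_name, "is_linked_to", profile),
]

print(describe(first_name, []))     # no links: the bare name tells us nothing
print(describe(first_name, links))  # with links: statements about an identity
```

The same `Datum("Elias")` appears in both calls; only the presence of links changes what can be said about it.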
Take this data definition from DMReview which defines data (and information):
Items representing facts, text, graphics, bit-mapped images, sound, analog or digital live-video segments. Data is the raw material of a system supplied by data producers and is used by information consumers to create information.
Data is an object and information is a relationship between data - I’ve studied database theory at university to be authoritative on that! But since I didn’t do philosophy, then what is knowledge?
Knowledge can be considered as the distillation of information that has been collected, classified, organized, integrated, abstracted and value added
(source)
Relationships, facts, assumptions, heuristics and models derived through the formal and informal analysis or interpretation of data
(source)
So in other words, knowledge is the application of information to a scenario. Whilst I apologise if it appears that I am splitting hairs, I think clarifying what these terms are is fundamental to the implementation of DataPortability. Why this is relevant will be seen below, but now we need to move on to what the second concept means.
Portability
On first interpretation, portability means the ability to move something - exporting and importing. I think we shouldn’t take the ability to move data around as the sole definition of portability; it should also mean being able to port the context in which data is used. After all, information and knowledge are based on the manipulation of data, and you don’t need to move data per se but merely change the context to do that. A vendor can add value for a consumer by building unique relationships between data and applying them uniquely to other scenarios - where the original data is stored is irrelevant as long as it’s accessible.
Portability to me means a person needs to have the ability to determine where their data is used. But to do that, they need control over that data - which means determining how it is used. Yet there is little point being able to determine how your data is used if you can’t determine who can access your data. Therefore, the concept of portability invokes an understanding of what exactly control and accessibility mean.
So discussing portability requires us to also understand what data control and data accessibility really mean. You can’t “port” something unless you control it; and you can’t “control” something if you can’t determine who can “access” it. As I said, as long as the data is accessible, it can be located on the moon for all I care: for the concept of portability by context to exist, we must ensure as a condition that the data is open to access.
Ownership
Now here is where it gets complicated: who owns what? Maybe the conversation should come to who owns the information and knowledge generated from that data. Data on its own potentially doesn’t belong to anyone. My name “Elias” is shared by millions of other people in the world. Whilst I may own my identity, of which my name is a representation, is it fair to say I own the name “Elias”? On the flip side, if a picture I took is considered data - I think it’s fair to say I “own” that piece of data.
Information, on the other hand, requires a bit of work to create. Therefore, the generator of that information should get ownership. However, when we start applying this concept to something like a social relationship, it gets a bit tricky. If I add a friend on Facebook, and they accept me, who “owns” that relationship? Effectively both of us - so we become joint partners in ownership of that piece of information. If I were to add someone as a friend on MySpace, they don’t necessarily have to reciprocate - therefore it’s a one-way relationship. Does that mean I own that information?
This is when the concept of privacy comes in. If I am generating information about someone, am I entitled to it? If someone owns the underlying data I used to generate that information - then it would be fair to say I am “licensing” usage of that data to generate information which de facto is owned by them. But privacy as a concept and in the legislation of many countries doesn’t work like that. Privacy is even a right alongside other basic rights like freedom of expression and religion in the constitution of Iraq (Article 17). So what’s privacy in the context of information that relates to someone’s identity?
Perhaps we should define privacy as the right to control information that represents an entity’s identity (being a person or legal body). Such a definition ties in with defamation law, for example, and the principle of privacy: you have control over what’s been said about you, as a fundamental human right. But yet again, I’ve just opened up a can of worms: what is “identity”? Maybe the Identity Commons people can answer that? Would it be fair to say that, in the context of an “identity”, an entity like a person ‘owns’ that? So when it comes to information relating to someone’s identity, does this human right to privacy override the question of who owns that information, regardless of who generated it?
This posting is a question, rather than an answer. When we say we want “data portability”, we need to be clear what exactly this means. Companies, I believe, are slightly afraid of DataPortability because they think they will lose something, which is not true. Companies’ commercial interests are something I am very mindful of when we have these discussions, and I will ensure with my involvement that DataPortability pioneers not some unrealistic ideal but a genuine move forward in business thinking. It needs to be clear what constitutes ownership, and of what, so we can design a blueprint that accounts for users’ data rights without ruining the business models of companies that rely on our data.
Which brings me to my question - “who owns what”?

It’s certainly a good question. There’s such a staggering array of data that exists for each one of us, however, that the data itself is essentially worthless. I’m inclined to think a better question might be “what’s the best way to make sense of all this data?” What you define as information is what is truly important. Who owns the methods for managing the information is what counts. Those who work on Data Portability are generally concerned with making sure the public itself owns those methods.
It’s silly to suppose that Facebook or MySpace own our relationships, but they do own the means of making them useful (just to use one example). I have hesitated to invest much time in these, and most other communities, because I am wary of not being able to use the information I give them in ways that suit me. Hence, my interest in Data Portability. I’ll take a useful set of open standards that allow me to pipe and filter information over a walled garden any day.
Overall I think the concept of “owning” information is a bankrupt idea (lighting a candle, yadda yadda.). Technology has rendered it useless. By making it so easy to copy and distribute information, those who seek to own it are forced to throw up technical hurdles to limit the flow.
At the same time, technology has undermined our ideas of privacy. I say ideas because many different cultures hold different views on expectations of privacy. Indeed, we probably all hold diverse, sometimes contradictory, notions of privacy as individuals. This erosion of privacy, to me, is more interesting than ownership of information.
I once was denied a job I was sure I would get. Everything went well. I was highly qualified for the position. Then I found out I didn’t get the job. I received a copy of the background report sent to the company and was shocked to find a list of criminal offenses in states I had never even visited! There was no way for me to prove that the company didn’t hire me as a result of the report, but the whole episode soured me on how many decisions are made based on the wide dissemination of personal data. Credit and health issues are even more troublesome.
In this way, privacy is intrinsically related to “ownership” of data. We want to control who can access certain information about us. We might not care if our friends know we partied last weekend, but might not want current or future employers to know. The ease of search, coupled with the sheer persistence of data (erroneous or not), indicates troubled times for privacy.
Ultimately, what I’m saying is that rather than discuss ownership of data (which seems meaningless), the conversation should focus on access control. We need to develop standards relating to how data access is mediated. This is by no means a simple task, for there are many gradations and grey zones. But developing some kind of robots.txt for personal data is important.
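To make the robots.txt analogy a little more concrete, here is a minimal Python sketch. No such standard exists; the `Audience:` / `Allow:` / `Disallow:` syntax, the field names, and the functions `parse_policy` and `may_access` are all invented for illustration.

```python
# A made-up, robots.txt-style policy for personal data (not a real standard).
POLICY = """\
# Who may see which pieces of my data
Audience: friends
Allow: party_photos
Allow: email

Audience: employers
Disallow: party_photos
Allow: email
"""


def parse_policy(text: str) -> dict[str, dict[str, bool]]:
    """Parse the hypothetical format into {audience: {field: allowed}}."""
    rules: dict[str, dict[str, bool]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "audience":
            current = rules.setdefault(value, {})
        elif key in ("allow", "disallow") and current is not None:
            current[value] = key == "allow"
    return rules


def may_access(rules: dict[str, dict[str, bool]], audience: str, field: str) -> bool:
    """Deny by default: unknown audiences and unlisted fields get nothing."""
    return rules.get(audience, {}).get(field, False)


rules = parse_policy(POLICY)
print(may_access(rules, "friends", "party_photos"))    # True
print(may_access(rules, "employers", "party_photos"))  # False
print(may_access(rules, "strangers", "email"))         # False
```

Like robots.txt itself, a file of this kind only states wishes. It controls nothing unless the services reading it agree, or are obliged, to honour it.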
I hope I didn’t get too far off track . . .
Wow, what a great discussion.
I have a few comments to that blog post and finally a place to put them. First is that I disagree with Jonathan (but only a little bit). Not in principle but in method. I think control is the big issue and that privacy is not eroding but being discovered.
I once started off with the idea of owning data as well. The short story is that the year of my life spent chasing ownership brought a realisation that owning information isn’t a solution that will ever work.
It is also very clear that data portability is the name of a function, and for that information to be useful, the context of information is very important (I very much agree). To me, these thoughts over the past few years have evolved into an understanding that there needs to be a policy which accompanies the data, and this policy dictates the hierarchy of control.
Underneath this policy there is only one solid infrastructure that works, and that is the law. One thing that is really missing is the ability to see what these policies are in a way that is usable. Until then everything is just confusing, unquantified, etc.
In order for tools to be made where ‘people can see’ the context of data usage and control, there needs to be a framework that is common and standardising. This is something I am calling the ‘Identity Legal Framework’ which can be used to build a technical and legal infrastructure for liability and control while freeing information. To me there is an explicit difference between ownership and control of data.
As for privacy, most of our information, like our name or address or what we are doing, has never been private; it just has never been easily accessible. Now technology provides access, making it easy and giving the illusion of eroded privacy, but in fact we never owned our identity; information is now just less of a secret. I think that, put in the right perspective, this can be the path to a data portability solution. It does seem like this is a massive hill to climb, with the traditional power structures not inclined to be open minded. Still, to me it is inevitable that a solution of control and data empowerment will be distributed to the masses. The reason for this is that the data subject has the high ground, with legally protected rights of access and notice for explicit consent. These are the tools that can really make things happen.
To this end there is something I have been calling the ‘Master Controller Access Framework’ (MCAF), a hybrid legal and technical concept that uses the hierarchy of data use from something like the ‘Identity Legal Framework’ as a vehicle for the creator of information (aka the data subject) to make policy that provides control and facilitates the choice of privacy.
Are these concepts something that can be used to answer your question?
I too have become much more interested in privacy than portability but the problem is that most people don’t care about privacy till it’s too late. Does that mean I should become an elitist and promote laws, standards, and systems that increase control by individuals over their own information, even though they generally don’t care?
@Jonathan: No, thank you! Off track is still relevant on this discussion. Access control rather than ownership is something that I think you are right on the money with.
@Mark: Ownership and control is another good point. And I couldn’t agree more that “privacy is not eroding but being discovered”.
MCAF is something that looks interesting, and I will note it formally down so we can investigate it.
What relationship do you have with the other identity frameworks, like Identity Commons and the Higgins project?
@Dennis: The reason people don’t care is because they don’t understand it. I can assure you, the average person is freaked out about privacy issues, but it’s only in the context of practical examples like what people see with their Facebook profile (if they even have one). On a deeper philosophical level, by function you become elitist, but it’s not like we are stopping anyone else from contributing! Please don’t let apathy and ignorance be the cause of you not contributing.
I think privacy is an old world word, that happens to be an anchor to describe something that is now being better understood and better defined.
If people could see their privacy they would care a lot more. Take Identity Mapping, for instance, meaning knowledge of where one’s personal information is and for what purpose it is being used. Privacy as a concept at this time seems just too abstract to be relevant, which I think gives the impression that people don’t care. I think people really do care.
I have just proposed a working group at Identity Commons called Identity & Trust http://wiki.idcommons.net/index.php/Identity_Rights_Agreements.
Basically, a policy work group, but there hasn’t been much interest as of yet. Hence my excitement with this post and a focus of policy in data portability.
As for investigating the MCAF, let me know when.
Oops... That link is wrong; it was supposed to be: http://wiki.idcommons.net/index.php/Identity_Trust_Charter
That privacy is being discovered is an astute observation. And I think you’re right that it’s an old world word. To some extent, privacy has always been an illusion. Folks living in old world villages had to contend with persistent gossip about habits and events. For the most part though, this was confined by limits of time and space. While a rumor might go around that a local teacher was discreetly seeing the farmer behind the haystacks, the word was generally confined to her town. Even in the town, word may be limited to a certain group of adults.
In our age, news of the affair may spill out of these limited conditions and begin to be embellished on countless MySpace and Facebook pages. Who knows, stakeouts with digital cameras may commence! At this point, the persistence of innuendo completely alters the equation. One search changes everything!
This is probably not the best example, but it illustrates the point. It forces us to become more aware of our privacy, and thus to discover appropriate boundaries and precautions.
The MCAF seems to be the type of thing I was alluding to earlier with my comment about a robots.txt for personal info. I would like to have fine-grained access control for some of my personal information. For instance, I have public and private phone numbers and e-mail addresses. The XDI people seem to have thought a lot about accommodating this control. Let’s hear more!
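As a minimal sketch of what that fine-grained control over public and private phone numbers and e-mail addresses could look like, each item below carries its own list of audiences. This is not how MCAF or XDI works; the `ContactItem` class, the audience names and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContactItem:
    """One piece of contact data plus the audiences allowed to see it."""
    kind: str   # e.g. "phone" or "email"
    value: str
    audiences: set[str] = field(default_factory=set)


def visible_to(items: list[ContactItem], audience: str) -> list[str]:
    """Return the values this audience may see; 'public' items go to everyone."""
    return [
        item.value
        for item in items
        if "public" in item.audiences or audience in item.audiences
    ]


# Sample data only: reserved example numbers and domains.
contacts = [
    ContactItem("phone", "555-0100", {"public"}),
    ContactItem("phone", "555-0199", {"family"}),
    ContactItem("email", "me@example.org", {"public"}),
    ContactItem("email", "private@example.org", {"family", "friends"}),
]

print(visible_to(contacts, "stranger"))
print(visible_to(contacts, "friends"))
print(visible_to(contacts, "family"))
```

Attaching the audiences to each item, rather than to the whole profile, is what lets one person hold both a public and a private number.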
@Mark: I’m interested in participating in the Identity & Trust working group, but fear I may not have the legal background to help much. IANAL, but I have spent hundreds of hours reading relevant threads on Slashdot. Does that count for anything?
Great post!
I would like to add the following:
Portability might also mean moving data around in certain scenarios. Think of virtual worlds where an object can really be moved around from server to server (depending on the implementation). We are struggling with the implications of this right now over at the Grid Interoperability Working Group. Maybe think also of MP3s or so.
It probably all depends on definition, of course, but I think the question here is where we define the scope of what we work on (for now) in the DP Group. I guess the priority is probably profile or social graph information, which is either copied or referenced.
As for privacy, I am not sure we should discuss it on a philosophical level, but more so in the individual fields we are working on. E.g. if I have my email address in a profile, how should I be able to control who can read it? And even if I allow some friend to read or export it, how can I make sure that it’s not spreading from there? (Of course I cannot control that; it’s mostly a question of trust here. BTW, again the same problem we have in Second Life.)
Not having a legal background is definitely not an obstacle. In fact, interest is the number one precursor. This discussion makes me think that a combination of the Identity & Trust working group and Data Portability policy would be a better way to approach this.
As well, this post has inspired me to think of changing the Identity Legal Framework to Identity Data Legal Framework, which may be more appropriate. Perhaps @Jonathan, @Elias, @Dennis, you have an idea of how to organise this effort in a better way?
I have proposed this group because I felt it uncovered something very important that was lacking.