Data mine-ing

According to Jean-Jacques Rousseau ‘the first person who, having fenced in a piece of land, bethought himself to say “this is mine” and found others simple-minded enough to believe him was the real founder of civil society.’ While it is hard to share Rousseau's nostalgia for idealised anarchy, the patterning of social relations that is entailed when people start staking claims to property can have morally ambiguous consequences. Such claims are now, it appears, being staked on the emerging territories of electronic data.

Earlier this month, a discussion about the concept of data ownership was held at the Royal Academy of Engineering, under the auspices of the Royal Academy, the Royal Society and TechUK (a spin-off from last year’s report, Data management and use: Governance in the 21st century). The data ownership paradigm got short shrift at the meeting, although there are some serious attempts to develop systems that support it for personal data. It behoves us, therefore, to consider seriously what kind of data society such ‘simple-mindedness’ could found.

I think the application of the concept of ownership to data is both metaphysically and morally misguided. In the first place, treating data as a kind of stuff that can be owned and exchanged is reductive and it risks ignoring important relational aspects of data. It is a tautology to say that data are ‘given’, but to attend exclusively to what is given is to ignore the event of giving (which is always a ‘giving to’). Data speak of the observer (and of the instrument of observation) as much as of the observed. This is not so much a matter of semiotics as of information theory: data both mark an event and imply a relationship, even before they are decoded as information. Conceptually, this may be as relevant to machine-generated data as it is to human observation – where there are data there is difference. (I recall that Gregory Bateson once defined information as ‘the difference that makes a difference’.) But it is practically important when data circulate among environments governed by different norms. (There was a salutary presentation by anthropologist Helen Knox at the ‘data ownership’ meeting, which drew attention to how data function as tokens in three kinds of relationships – of donation, reciprocation and expropriation – which had the effect of liberating the discussion of data from a reductive disciplinary construction as potential commodities.) So: if we focus only on the given we may miss the giving; if we focus on the event we may still miss the relationships and norms it implies. This points to the relevance of interdisciplinary engagement to explore the diverse social meanings of data, not merely ‘stakeholder engagement’ to weigh contending interests against each other.

In the second place, treating data as property addresses what is arguably the wrong problem (that of value extraction) at the expense of more pressing questions about how data can influence personal and social wellbeing. It has, furthermore, a troubling implication: if we can treat data as things that can be appropriated, shared and exchanged, then there is a temptation to expect that all the interpersonal obligations and entitlements arising from data use can be resolved by the mechanisms of the ‘data market’. (This is what Michael Sandel calls a ‘market society’.) It is not clear that such an approach is desirable. Those who attended a seminar last week at the Nuffield Foundation, home of the newly established Ada Lovelace Institute (and, coincidentally, of the Nuffield Council on Bioethics), heard economist Diane Coyle warning about the ‘marketisation’ of data. In making the case for data as a public good, Professor Coyle focussed on the non-rivalrous quality of data. She did not discuss non-excludability (the other classic characteristic of a public good; excludability, conversely, is a key condition of things being property). Data are excludable, of course, although they are becoming radically less so, and differently from other commodities. A thought that could be better expressed in discussing the future of data society is the fundamental transformation of private and public environments that has been wrought by information technologies. We may have to get used to this transformation, and it may enable public good, but there are choices to be made about the structural terms on which it takes place.

Some of the discussion at the Nuffield Foundation event turned to data collected in health care contexts, in particular to the consequences of how NHS data are treated. We examined this question in our 2015 ‘Biodata’ report. A series of developments can be traced from the 2010 Strategy for UK Life Sciences (really, biomedical sciences), which might be glossed as ‘monetise the resources of the NHS and bet the farm on genomics.’ This signalled – or perhaps only reflected – the transition from an NHS as ‘imagined community’, one based on solidarity, to an NHS based on obligations of reciprocity, in which the relationship between public and private is foregrounded. Attempts have been made to put this on terms, through a more explicit kind of ‘social contract’ (the NHS Constitution of 2012, for example).

Genomics has been the privileged site on which the relationship between health policy and industrial strategy has been consolidated over the last decade. It was in 2011 that David Cameron, then Prime Minister, declared to representatives of the life sciences industry that ‘every NHS patient should be a research patient’. A year later the ‘100,000 genomes’ project was launched. It seems pretty clear that Health and Social Care Secretary Matt Hancock’s 2018 conference pledge to sequence five million genomes in five years is only a stepping stone to the comprehensive national roll-out envisaged in the sector’s 2017 Life Sciences Industrial Strategy proposals. The question of commercial access to NHS data (and the normative consistency of public and private environments, and the mutual implication of public and private interests) haunts all these discussions.

The reflection on data governance is therefore a timely anticipation, as the terms of a ‘social contract’ that governs the reciprocal responsibilities of patients and the NHS are again under discussion, pertinently in the context of the new genomic medicine service. (In our Biodata report we recommended that broader consideration should be given to the most appropriate model for the ethical use of genomic information generated in health services for public benefit before the one developed for the 100,000 genomes project becomes the de facto infrastructure for future initiatives.)

It is certainly possible to pursue social good by aligning it with commercial interests and it may be economically efficient to do so. For better or worse, such an innovation system has become locked in with industries such as pharmaceuticals and biotechnology. But we should not be simple-minded enough to assume that ‘ownership’ and a data market would resolve the social and ethical questions that data use engages.