Open data: a Commons? (S. Goëta)



________________________________________________________________________________

Article Source:

FOR FREE INFORMATION AND OPEN INTERNET. INDEPENDENT JOURNALISTS, COMMUNITY MEDIA AND HACKTIVISTS TAKE ACTION — PASSERELLE 11 FROM COREDEM.

Coredem, a Collective Initiative. Coredem (Community of Sites of Documentary Resources for a Global Democracy) is a space for exchanging knowledge and practices by and for actors of social change.

Ritimo, the Publisher. The organisation Ritimo is in charge of Coredem and of publishing the Passerelle Collection. Ritimo is a network for information and documentation on international solidarity and sustainable development.  Ritimo is actively involved in the production and dissemination of plural and critical information, by means of its website: www.ritimo.org


Coredem: Terms of Use

Reproduction, distribution and communication to the public of the information contained on this site are authorised under the following conditions: "Attribution – ShareAlike". The terms of this copyright licence are defined in the following Creative Commons licence: creativecommons.org/licenses/by-sa/2.0/fr.

______________________________________________________

Open data: a Commons?
Samuel Goëta. Co-founder of the Open Knowledge Foundation France.

(Passerelle 11, p.156-160)

Open data refers to the proactive public release of data, primarily by public administrations, to enable value creation by citizens and businesses, greater transparency of government action, and civic participation. From a legal point of view, this data has to be released under so-called open licenses, which allow anyone to use the data as long as they mention its source and share any improvement under the same license. Given that open data is poised to become common practice for governments, are we witnessing the emergence of a digital commons? Who are the communities that demand open data and defend it as a commons?


Let us first recall that open data originates in long-standing practices of dissemination and sharing of information. Access to information held by government was one of the central demands of the French Revolution. Article 15 of the 1789 Declaration of the Rights of Man and of the Citizen stipulates that “Society has the right to require of every public agent an account of his administration”. After World War II, the demand for accountability of public officials gained fresh momentum with the emergence of the concept of “open government”, which requires that citizens have access to state secrets. Facing criticism about the army’s lack of transparency during the Vietnam War, President Johnson signed the Freedom of Information Act (FOIA) in 1966, which gives every citizen the right to obtain information held by an administration, subject to certain conditions of confidentiality and national security. FOIA-like provisions were adopted in most democratic countries, and transparency of public action is now widely seen as a fundamental right. The first instances of voluntary exchange of data emerged in the realm of science.

With the development of computer networks and computing capabilities, sharing of scientific data has grown considerably. Genome sequencing is a particularly illustrative example, given that it requires large-scale exchange of data between laboratories with specialized research infrastructure. Bruno Strasser, in his study on the exchange of genetic data [1], recounts how the creation of the first genome database, GenBank, triggered a substantial debate on the need for data sharing. According to Strasser, sharing of scientific data is part of a “moral economy”, where researchers provide data if they are compensated in symbolic capital. He mentions the case of GenBank, which managed to establish itself as the main DNA database because it did not consider the data as its property, unlike the competing project, the Atlas of Protein Sequence and Structure. Since 1990, scientists have been obliged to deposit in GenBank the genome data used in any scientific article they publish. These scientific practices, which pioneered large-scale data sharing, were not yet called “open data”. The term first appeared in 1995, in a report by the US National Academy of Sciences entitled “On the Full and Open Exchange of Scientific Data”, in which the Academy demanded the sharing of data collected by satellites.

While the practice of open data was already well established in science, open data as a political claim first developed in the UK. In 2006, the Guardian published an op-ed demanding that all data held by the British government be made freely available. Titled “Give us back our crown jewels”, the op-ed demanded in particular that data from the Ordnance Survey, the United Kingdom’s Geographical Institute, the use of which was subject to the payment of a fee, be made publicly available. The argument, which later became a key principle of open data, was that this data should be made public because it was produced with taxpayer money. In 2007, the open data movement entered the political debate in the United States, following a meeting held in Sebastopol, California, which featured well-known digital activists such as Lawrence Lessig, Tim O’Reilly and Aaron Swartz. The purpose of the meeting was to formulate principles that could be taken up by candidates for the US presidential election. These principles included the release of public data in its entirety, as soon as it is produced and in the form in which it is collected, together with an open license for reuse by anyone. This demand for “raw” data signals the emergence of a new era of transparency, based on the reduction of information asymmetries between public administrations and citizens. It also sought to trigger a wave of innovation through developers and businesses reusing this data, as well as a better use of public resources. The open data movement led to a proliferation of government web portals at national level, such as data.gov, launched in the United States in 2009, data.gov.uk, launched in the UK in 2009, and data.gouv.fr, launched in France in 2011. In 2013, the principles of open data were taken up by the G8 in its “open data charter”, with the same objectives of transparency and economic value creation. The charter also stipulates that open data should become the default practice of the eight signatory countries’ public administrations.


A resource that is not managed as a commons

Open data demands are based on the argument that data produced by government and funded with public money is a common good that should be shared with society. But is the data effectively being managed as a commons? Are there governance mechanisms that would allow society to manage open data as a shared resource?

By requiring “raw” data, open data activists have drawn attention to previously unshared documents that are used in the daily work of administration officials. Publication of these documents is unusual for public administrations, which naturally leads to resistance and doubts. Most of the information published on open data portals consists of administrative records, not statistical data produced specifically to create general knowledge about society. This data is produced to meet the needs of an administrative department, not to build a resource that can be used by citizens. The selection of data to be published on these open data portals is also subject to negotiation between the people in charge of publishing the data and the producers of the data. This selection process, which we were able to observe during a field study in French administrations [2], goes against the fundamental principles of open data, as defined in 2007 in Sebastopol, which require the release of all public data as soon as it is produced. In practice, the data is selected by administrations according to various criteria such as the technical ease of extraction, interest for potential re-users, and the publication of similar data on other open data portals.

Moreover, the issue of data sensitivity, that is to say, the risk that its release would create for the public but also for the administration that produced it, is often a crucial consideration. Finally, the data is processed before its publication; it is not published in its unaltered, raw form. Data producers edit the data to remove problematic elements, improve formatting or remove comments, acronyms and columns that have no interest outside of the production environment.

All these steps, crucial in the construction of public data, occur away from the users’ gaze. The production, selection and modification of public data is a process that is invisible to citizens; it is behind the scenes of these administrations that government transparency is produced. Nor do citizens have any opportunity to participate in the governance of how this data is produced and distributed. Yet such a possibility exists for data from statistical agencies, whose task is to produce general knowledge about society. Since 1972, French government statistics have been subject to shared governance by a coordinating body, the National Council of Statistics (CNIS), which includes representatives of public institutions, trade unions, employers’ organisations, research institutions and civil society. CNIS has a say on the need for producing and distributing data, and on categorizing the data produced by statistical agencies. There is no similar form of shared governance for public administrations’ data, which remains out of citizens’ control.


Commoners: producing and managing data as commons

In the absence of effective and accountable governance of public data, commoners have seized on public data to preserve its nature as a commons. The Open Knowledge Foundation, a global network advocating open content and data, released in 2009 the Open Database License (ODbL), which applies the principle of copyleft to open data. This was previously impossible because the Creative Commons licenses did not apply to databases. The ODbL requires that reused data be shared under the same license, thus preserving its character as a commons.

Local authorities such as the City of Paris have used this license to avoid any “enclosure” of their data, fearing that its exploitation by private interests would not benefit the community in return. Commoners have also contributed to the emergence of public data as a commons by republishing their own improved data. In France, Regards Citoyens automatically extracts data from the websites of the National Assembly and the Senate and republishes it on its “citizen observatory” of parliamentary activity, nosdeputes.fr and nossenateurs.fr. This data, republished as open data in machine-readable formats, has enabled the development of many other tools for monitoring parliamentary activity. Similarly, the OpenCorporates website gathers data from the business registers of 75 jurisdictions around the world, and allows anyone to use this data, even for commercial purposes, as long as the ShareAlike clause, which secures the data as a commons, is respected.

Citizens have also produced their own databases, managed as commons. The best known is OpenStreetMap (OSM), a global geographic database that was founded in 2004 by a British researcher following the refusal of the Ordnance Survey to share its data. OSM relies on the contributions of citizens to publish a “cartographic Wikipedia”. The database now covers most countries in the world and competes with Google Maps. In 2013, OSM celebrated its millionth contributor. In the field of science, Tela Botanica is a network of both professional and amateur botanists who collaboratively create a shared database on the flora and fauna of the world. One could also mention OpenFoodFacts, which allows everyone to scan the nutritional data on food packages, or OpenMeteoData, for the free sharing of meteorological data. All these projects rely on citizen participation to create databases under free licenses that complement public data, and they all enable a shared governance of these critical digital assets.


[1] Strasser, Bruno J. (2011). “The Experimenter’s Museum: GenBank, Natural History, and the Moral Economies of Biomedicine”, Isis, vol. 102, no. 1, pp. 60–96.

[2] Denis, Jérôme and Goëta, Samuel. “La fabrique des données brutes. Le travail en coulisses de l’open data”, in Penser l’écosystème des données. Les enjeux scientifiques et politiques des données numériques, Mabi, C., Plantin, J.-C. and Monnoyer-Smith, L. (eds), Éditions FMSH, to be published. Available online: http://halshs.archives-ouvertes.fr
