“With the arrival of any new president, vast troves of information on government websites are at risk of vanishing within days. The fragility of digital federal records, reports and research is astounding.” (The New York Times, 'Harvesting Government History, One Web Page at a Time', Dec 1, 2016). Climate data is a case in point.
To whom does such data belong? Consider everything on data.gov. How important are repositories like it to your life and work? That these are not rhetorical questions was the premise of a public meeting, held at the Penn Libraries on 12/07, to evaluate #vulnerabledata and plan #datarefuge.
There are precedents for erasure by political appointees, for instance under the George W. Bush administration, and for the shortchanging of climate efforts, as in the federal shutdown of 2013. The idea behind #vulnerabledata, #datarescue and #datarefuge: campuses such as Penn might serve as a (modest!) node in an (ambitious!) network of responses to what will be, by the first few months of 2017, a data emergency. People affiliated with different institutions in Philadelphia -- such as the EPA Transition Watch, Azavea and Open Data Philly, Penn Libraries, Science Outreach Initiative, the Price Lab for Digital Humanities -- got together with members of the PPEH faculty working group and PPEH fellows to discuss responses to climate data's new precarity.
Others have been at it superbly. For instance: at Guerrilla Archiving, on 12/17 at the University of Toronto, volunteers will help the San Francisco-based Internet Archive with the End of Term 2016 project.
How does one build a toolkit for such work? How can such projects network so that efforts aren't duplicated? One can't, of course, download the internet; even those at data.gov, which is coordinating with archive.org on sustainable mirroring, have a limited capacity to work on this. There are limits to what web-culling and web-crawling can do without a broader network of impact providers, including Penn. The task, moreover, isn't just one of mirroring data or dipping into snapshots, but of describing data and checking what happens to it, and to a page, over time. It covers not just the low-hanging fruit but the details of hidden databases and the scripts needed to access them, and it demands entire cities of storage and server capacity, and labor-hours. Impossible -- yet urgent!
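What "describing data" and "checking what happens to a page over time" might look like in practice can be sketched in a few lines of Python. This is only an illustration, not a tool that any of the projects named here actually uses: the function names, the record fields and the example.org URL are all assumptions made for the sake of the example. Each capture of a page or dataset gets a small record (where it came from, when it was captured, how big it was, and a SHA-256 checksum), and two records can then be compared to see whether the content has quietly changed.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_snapshot(url, content, captured_at=None):
    """Build a small metadata record for one capture of a page or dataset.

    `content` is the raw bytes that were downloaded. The field names are
    illustrative assumptions, not an established archival schema.
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at": captured_at.isoformat(),
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def has_changed(earlier, later):
    """Report whether two captures of the same URL differ in content."""
    if earlier["url"] != later["url"]:
        raise ValueError("records describe different URLs")
    return earlier["sha256"] != later["sha256"]


if __name__ == "__main__":
    # Placeholder bytes stand in for two downloads of the same (hypothetical) file.
    url = "https://example.org/dataset.csv"
    first = describe_snapshot(url, b"year,temp\n2015,14.8\n")
    second = describe_snapshot(url, b"year,temp\n")
    print(json.dumps(first, indent=2))
    print("changed:", has_changed(first, second))
```

A checksum only says that something changed, not what or why; a record like this is a starting point for the human work of description, not a substitute for it.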
And that leads to a swarm of questions. Is such data proprietary? Any data produced by the government, it turns out, belongs to the public -- but access and survival are different issues. There's data that's available, even now, if you follow due process, but the quiet disappearance of data, either because of financial plug-pulling, or because of interference, is the worry now at the threshold between institutional science and the public.
How much data are we talking about? Hundreds of petabytes and more, in buildings larger than the Penn library. Which depositories are statutorily public? Much of it is gray literature, uncertainly defined. And when gray literature disappears slowly, it can provide a false sense that we have time, or that somebody else is likely to mobilize and intervene. The big institutions with well-known acronyms, such as NOAA, the EPA and NASA, have their own backups, of course; the EPA Transition Watch has 35 people collaborating across 25 universities; but funding for these backups will be seen, in the new regime's vocabulary, as "politicized science". Penn's own vision statements are geared towards a response to these issues: the Penn Compact 2020 commits to actions that "bridge the translational gap between academic research and societal change", and Penn's Climate Action Plan 2.0 commits to the integration of sustainability in all aspects of the university's work. What could it mean, then, for the university to function as a regional node in a larger, collective recuperation of open data in 2017?
Join us again this Friday, 12/09, at 1 p.m., at 627 Kislak Center, Van Pelt Library, Philadelphia, as we assemble lists -- institutions, needs, methods, specialists, offshoot-projects -- that will be crucial in the weeks to come.