How to rescue data

BIG QUESTIONS

In building DataRefuge, we face questions about what to save, how to save it, and why these activities are important. At a DataRescue event, you may find yourselves focused on only one or two of these questions, or on different questions altogether. Still other questions will emerge as DataRefuge expands. Together we’ll continue to develop ideas, tools, protocols, and practices to answer:

Why?

Ongoing human-made climate change is a widely accepted fact. It is accepted as such because of decades of data-driven analysis. Continued reliable access to federal data that feeds climate and environmental research is vital. The ease with which research communities can access federal data depends on social circumstances, including changing political contexts.

Our project also draws attention to the ways in which internet data is inherently unstable (consider recent epidemics of fake news). We thus also work to ensure trustworthy information in a digital world. The practices and protocols that go into creating DataRefuge will build trust in the data and educate people about its reliability. We hope people will use these events to learn together, talk about these issues, and look for ways that each DataRescue event may also be a teach-in, a panel of speakers, or a community conversation considering:

  • How does your community use climate and environmental data?

  • What makes information trustworthy and reliable?

  • Who does environmental and climate research impact?

  • What data does not get collected? Who decides collection priorities?

What?

Together with Project_ARCC and many partners in professional networks, we are gathering information about the datasets that these partners most value. We are also interviewing experts to assess the relative vulnerability of priority datasets, according to their legal, political, and technical status as well as to their unique content. If you'd like to help, please complete our survey of valuable data.

We have created a list of data that are especially valuable and vulnerable. We will be happy to distribute a subset of this list to each event to avoid unnecessary duplication of effort.

While an event may decide to use a subset of the list above, each DataRescue event should decide for itself which datasets to prioritize. You might begin with a conversation with local scientists, researchers, or community groups to identify the federal environmental data sources they most rely on or are most concerned about losing access to, or you may choose another approach. Watch our webinar on hosting a DataRescue event and/or visit our Host a DataRescue Event page for more!

How?

Some DataRescue events may gather people with developed expertise in collecting, preserving, and providing access to the vast array of data. All DataRescue events will also have access to tools and advice about how to do this work in ways that will create trustworthy copies. For those materials that can be harvested through web crawlers, nomination to the Internet Archive is a good step. The tools and resources developed by EDGI and made available as an event toolkit provide guidance on nominating sites for inclusion in the End of Term Harvest.

Together with partners at the University of Michigan Libraries and Project_ARCC, we are also working on developing tools and advice for saving data using methods other than web crawling.
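
As an illustration of what a "trustworthy copy" can mean in practice, one common approach is to record a checksum for every file when a dataset is saved, so that anyone can later confirm the copy has not changed. The sketch below is only an example under that assumption and is not the DataRefuge or EDGI tooling; the function names build_manifest and verify_manifest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir):
    """Map each file's path (relative to dataset_dir) to its SHA-256 digest."""
    root = Path(dataset_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(dataset_dir, manifest):
    """Return the relative paths that are missing or whose contents changed."""
    current = build_manifest(dataset_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A volunteer could call build_manifest on a downloaded dataset folder, store the result alongside the data, and later run verify_manifest on any copy; an empty list means every recorded file still matches.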

Check back early and often, and follow us on Twitter, where we will announce updates.

DataRescue Philly, January 13-14, 2017

DataRescue Philly was a huge success. Together we identified 3,692 seeds from NOAA websites to feed to the End of Term Harvest and bagged 17 datasets, about 15 GB of data, that will be described and made available via the DataRefuge repository at www.datarefuge.org. Biggest and warmest of thank yous to all our speakers, guests, and volunteers. We couldn't build DataRefuge without you!