Historical Page – Data Rescue Workflow | Penn Program in Environmental Humanities

-HISTORICAL PAGE-

DATA RESCUE WORKFLOW

BEFORE YOU BEGIN

We are so glad that you are participating in this project! You may have heard all about the workflow other events have used to archive data. We're in the process of retiring this workflow as we move into the next phases of this project. Many paths to preserve data exist that perhaps shouldn't revolve around the workflow so lovingly (but rapidly) developed when we began these events in January. This page includes many ideas, and we encourage you to think of the best ways your community can get involved as we move to Data Refuge 2.0!

DataRescue Event Ideas

Note that these are just some ideas -
Experiment to make your event right for your community!

Create Metadata for the End of Term Archive
The End of Term project collected an enormous amount of information and publications this year. Help them curate that material and make the publications discoverable and usable by creating metadata records for them. See bit.ly/eot-metadata for more. This task requires that users have accounts set up, so do that ahead of your event!

Cleaning Metadata from Data.gov
See this excellent workflow used at DataRescuePDX that focuses on working with metadata at data.gov. Cleaning up this metadata makes this important resource easier to archive.

Teaching Web Archiving Skills
Rather than doing the work of archiving, DataRescueNH in Dover focused on teaching the skills one needs to do web archiving in the first place.

Outreach & Education
Many events include hosting a teach-in or panel discussions about issues related to DataRescue such as data literacy, data management, the vulnerability of born-digital information, web archiving, and other topics. This is a great opportunity to highlight issues that matter to your community. DataRescueDC and DataRescuePhilly are two events that did a lot of teaching and education. The events taking place during Endangered Data Week are also great examples of types of events you could host.

Clean Up Records and Unzip Files in datarefuge.org
Hundreds of files have been added to the datarefuge.org repository - and most are zipped. See how to unzip files and clean up records here.
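
For organizers who have downloaded a batch of zipped files to a local folder, the unzipping step can be sketched in Python. This is only a minimal illustration, not the repository's own procedure: the function name unzip_all and the folder layout (one output subfolder per archive) are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def unzip_all(source_dir, dest_dir):
    """Extract every .zip in source_dir into its own subfolder of dest_dir.

    Hypothetical helper for illustration; returns the folders it created.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    extracted = []
    for archive in sorted(source.glob("*.zip")):
        # Skip files that carry a .zip extension but are not valid archives.
        if not zipfile.is_zipfile(archive):
            continue
        target = dest / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling unzip_all("downloads", "unzipped") would leave the original archives untouched, so records can be checked against them before anything is deleted.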

Storytelling
(Contact: datarefuge@ppehlab.org)
You will record stories about the importance of climate and environmental data in our everyday lives, share this work on social media, and document the event. DataRefuge’s Storytelling Kit includes Portraits of Data Rescuers and Field Notes, among others. Consider this path if you’re on social media (Facebook, Instagram, Twitter, whatever), if you can use Storify, if you have good listening and writing skills, and/or if you can make creative and engaging materials.

Citizen Science
World-Wide Weather Data Rescue Project, Old Weather, and Tomnod's Antarctic Weddell Seal Count are three examples of great citizen science projects to get involved with. Also check out Zooniverse for citizen science projects that may be of interest.

Wikipedia Edit-athon
Editing government agency, sub-agency, and organization Wikipedia entries is another activity that can add value to broader understanding of how these agencies are structured and related and what their purposes are. There are a lot of existing resources with tips for hosting a Wikipedia editing event.

DataRefuge Built into a Libraries+ Network
(Contact datarefuge@ppehlab.org)
To move DataRefuge to more sustainable footing, we’re partnering with other big research libraries. With the help of the Association of Research Libraries and the Mozilla Foundation, we're organizing a meeting in early May to envision a Libraries+ Network: a consortium which can--systematically, comprehensively, and on an ongoing basis--"pull" digital resources from adopted agencies. This idea builds on decades of research by librarians, including James Jacobs, Jim Jacobs, and others. (Check out their work on Free Government Information.) Help inform our meeting by taking our survey.

Three Stories in Our Town across Towns, Cities, Countries.
(Contact: datarefuge@ppehlab.org)
“Three Stories” goes beyond storytelling driven by DataRescue events to create local partners and knowledge communities who research local uses of open federal environmental and climate data and how it keeps them, their assets, and their communities safe and healthy. This project has now launched in Philadelphia, and we are actively inviting its adoption by other cities and towns. A template is being distributed via organizers of past and future DataRescue events, via the Urban Sustainability Directors Network (USDN), and more. We want to know what climate and environmental data is needed for local city planners and workers to do their work. In this first phase, you might, for example, develop three stories that consider: How does federal climate and environmental data inform the work of one city worker? Preserve one local landmark? Address one local health concern?

Previous Workflow
NOTE: We have retired this workflow

DataRefuge slack: Generate Invite
See the technical workflow here: https://datarefuge.github.io/workflow/

Website Archiving
(Contact: Maya Anjur-Dietrich, @maya)

This first path of a DataRescue event is accessible to all levels of skill. You’ll be working through federal websites to nominate, or “seed,” web pages, documents, and datasets to the Internet Archive (IA)’s End of Term archive, which preserves material using their web crawler. EDGI’s Agency Archiving Primers outline the structure of at-risk departments and identify key programs, datasets, and documents that are vulnerable to change and loss, which then helps guide volunteers through the web presence of federal agencies. Using EDGI’s Chrome Extension, you can record URLs from agency websites for inclusion in the IA’s Presidential Harvest 2016 and flag any datasets or documents that need to be preserved through other methods because they are “uncrawlable,” or can’t be collected by the IA’s web crawler. In Path II, these “uncrawlables” are collected manually. Consider this first path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task. You’ll have to learn what the IA’s web crawler can and cannot collect (poster).

Archiving More Complex Datasets ("Uncrawlables")
(Contact: App and harvesting- Matt Price, @mattprice; Checking, bagging, describing, and repository- Laurie Allen, @laurieallen)

This path further researches and preserves the “uncrawlables” identified in Path I. You will be researching, investigating, and preserving at-risk datasets identified in the Web Archiving track and helping to preserve them in the DataRefuge repository. Working with the Archivers App and harvesting tools, this track is guided by the DataRescue Workflow specification, developed by both EDGI and DataRefuge. Path II is made up of multiple roles, and participants should select roles based on their skills and interest. Consider particular roles in this path if you have strong front-end web experience, are a coder, have domain knowledge of scientific datasets, are a librarian or an information technologist, and overall have strong attention to detail.

As a researcher, you will review and investigate the URLs marked as “uncrawlable” in Path I.
As a harvester, you will figure out how to capture the “uncrawlable” data.
As a checker*, you will inspect harvested datasets to make sure they are complete.
As a bagger*, you will assure data quality and then package (or “bag”) the data.
As a describer*, you will describe the contents of “bags” of data.

* These roles require special permissions in the Archivers App. Your event organizer or path guide should be able to grant you these permissions if they have admin privileges.
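
To give a sense of what the checker and bagger roles involve, here is a minimal Python sketch of recording checksums for a harvested dataset and verifying them later. It is an illustration only: the page does not describe how the Archivers App or the DataRescue Workflow specification handle this, and the function names (sha256_of, build_manifest, verify_manifest) and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir, manifest):
    """Return relative paths that are missing or whose checksum changed."""
    root = Path(data_dir)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A harvester could build the manifest right after capturing the data, and a checker could run verify_manifest on the copy they receive; an empty list means every recorded file arrived unchanged. Note that this check does not detect extra files that were added after the manifest was built.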

EDGI’s Next Steps in Tech Development
(Contact: Dawn Walker, @dcwalk; also on github.com/edgi-govdata-archiving)

EDGI has been building online tools and creating networks to proactively preserve and track public environmental data and ensure its continued availability. You will help us discuss and strategize as we move beyond preservation into distributed and federated forms of holding data. Consider this project if you are interested in helping build an open web to share data. Our GitHub organization provides project overviews to support this track.