Data Rescue workflow:
Before you begin
We are so glad that you are participating in this project!
- If you are an event organizer:
  - Learn about what you need to do to prepare the event
  - Think about what event activities make sense for your community - the data archiving is just one idea!
- If you are a regular participant: get a role assignment (e.g., Seeder or Harvester), get the account credentials needed for your role, and go over the workflow corresponding to your role.
DataRescue Event Overview
(Contact: Maya Anjur-Dietrich, @maya)
This first path of a DataRescue event is accessible to all skill levels. You’ll work through federal websites to nominate, or “seed,” web pages, documents, and datasets to the Internet Archive’s (IA) End of Term archive, which preserves material using the IA’s web crawler. EDGI’s Agency Archiving Primers outline the structure of at-risk departments and identify key programs, datasets, and documents that are vulnerable to change and loss; they help guide volunteers through the web presence of federal agencies. Using EDGI’s Chrome Extension, you can record URLs from agency websites for inclusion in the IA’s Presidential Harvest 2016 and flag any datasets or documents that are “uncrawlable,” meaning the IA’s web crawler can’t collect them and they need to be preserved through other methods. In Path II, these “uncrawlables” are collected manually. Consider this first path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task. You’ll have to learn what the IA’s web crawler can and cannot collect (see the poster).
Archiving More Complex Datasets ("Uncrawlables")
(Contact: App and harvesting: Matt Price, @mattprice; Checking, bagging, describing, and repository: Laurie Allen, @laurieallen)
This path further researches and preserves the “uncrawlables” identified in Path I. You will research, investigate, and preserve at-risk datasets identified in the Web Archiving track and help preserve them in the DataRefuge repository. This track uses the Archivers App and harvesting tools and is guided by the DataRescue Workflow specification, developed by both EDGI and DataRefuge. Path II is made up of multiple roles, and participants should select roles based on their skills and interests. Consider particular roles in this path if you have strong front-end web experience, are a coder, have domain knowledge of scientific datasets, or are a librarian or information technologist, and if you have strong attention to detail overall.
- As a researcher, you will review and investigate the URLs marked as “uncrawlable” in Path I.
- As a harvester, you will figure out how to capture the “uncrawlable” data.
- As a checker*, you will inspect harvested datasets to make sure they are complete.
- As a bagger*, you will assure data quality and then package (or “bag”) the data.
- As a describer*, you will describe the contents of “bags” of data.
* These roles require special permissions in the Archivers App. Your event organizer or path guide should be able to grant you these permissions if they have admin privileges.
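To make the bagger and checker roles above more concrete, here is a minimal sketch of packaging a folder of harvested files with a checksum manifest and then re-checking it. It is only an illustration under stated assumptions, not the Archivers App's actual format or tooling: the function names (`make_bag`, `verify_bag`, `demo`), the zip layout, the `data/` prefix, and the `manifest-sha256.txt` file name are all hypothetical choices made for this example.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(data_dir: Path, bag_path: Path) -> dict[str, str]:
    """Zip every file under data_dir and add a checksum manifest.

    Hypothetical layout: files go under "data/" and the manifest is
    stored as "manifest-sha256.txt". bag_path should live outside data_dir.
    Returns the manifest as {relative path: sha256 hex digest}.
    """
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    manifest = {p.relative_to(data_dir).as_posix(): sha256_of(p) for p in files}
    with zipfile.ZipFile(bag_path, "w", zipfile.ZIP_DEFLATED) as bag:
        for rel in manifest:
            bag.write(data_dir / rel, arcname=f"data/{rel}")
        lines = [f"{digest}  data/{rel}" for rel, digest in manifest.items()]
        bag.writestr("manifest-sha256.txt", "\n".join(lines) + "\n")
    return manifest


def verify_bag(bag_path: Path) -> list[str]:
    """Return manifest entries that are missing or whose contents changed."""
    bad = []
    with zipfile.ZipFile(bag_path) as bag:
        names = set(bag.namelist())
        for line in bag.read("manifest-sha256.txt").decode().splitlines():
            digest, name = line.split("  ", 1)
            if name not in names or hashlib.sha256(bag.read(name)).hexdigest() != digest:
                bad.append(name)
    return bad


def demo() -> tuple[dict[str, str], list[str]]:
    """Bag two throwaway files in a temporary directory and verify the result."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "harvest"
        (data_dir / "tables").mkdir(parents=True)
        (data_dir / "readme.txt").write_bytes(b"hello\n")
        (data_dir / "tables" / "sites.csv").write_bytes(b"id,name\n1,example\n")
        bag_path = Path(tmp) / "bag.zip"
        manifest = make_bag(data_dir, bag_path)
        return manifest, verify_bag(bag_path)
```

Calling `demo()` returns the manifest and the list of failed entries, which should be empty for a freshly made bag. At an actual event, follow the tools and bag format your path guide points you to rather than this sketch.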
You will record stories about the importance of climate and environmental data in our everyday lives, share this work on social media, and document the event. DataRefuge’s Storytelling Kit includes Portraits of Data Rescuers and Field Notes, among others. Consider this path if you’re on social media (Facebook, Instagram, Twitter, whatever), if you can use Storify, if you have good listening and writing skills, and/or if you can make creative and engaging materials.
Outreach & Education
Many events include a teach-in or panel discussion about issues related to DataRescue, such as data literacy, data management, the vulnerability of born-digital information, web archiving, and other topics. This is a great opportunity to highlight issues that matter to your community. Some events that didn't focus solely on archiving include DataRescueNH in Dover, DataRescueDC, DataRescuePDX, and DataRescuePhilly. The events taking place during Endangered Data Week are also great examples of the types of events you could host.
Looking Beyond DataRescue Events
Consider this path if you’d like to work on building this movement through projects incorporating the community from DataRescue events, but also looking to the future.
DataRefuge Built into a Libraries+ Network
To move DataRefuge to more sustainable footing, we’re partnering with other big research libraries. With the help of the Association of Research Libraries, we're organizing a meeting in early May to envision a Libraries+ Network: a consortium which can--systematically, comprehensively, and on an ongoing basis--"pull" digital resources from adopted agencies. This idea builds on decades of research by librarians, including James Jacobs, Jim Jacobs, and others. (Check out their work on Free Government Information.) This is a fast-moving train, and if you'd like to hop aboard, please let us know.
DataRefuge’s Longer Path: Three Stories in Our Town across Towns, Cities, Countries.
“Three Stories” goes beyond storytelling driven by DataRescue events to create local partners and knowledge communities who research local uses of open federal environmental and climate data and how it keeps them, their assets, and their communities safe and healthy. This project has now launched in Philadelphia, and we are actively inviting its adoption by other cities and towns. A template is being distributed via organizers of past and future DataRescue events, via the Urban Sustainability Directors Network (USDN), and more. We want to know which climate and environmental data local city planners and workers need to do their work. In this first phase, you might, for example, develop three stories that consider: How does federal climate and environmental data inform the work of one city worker? Preserve one local landmark? Address one local health concern?
EDGI’s Next Steps in Tech Development
(Contact: Dawn Walker, @dcwalk; also on github.com/edgi-govdata-archiving)
EDGI has been building online tools and creating networks to proactively preserve and track public environmental data and ensure its continued availability. You will help us discuss and strategize as we move beyond preservation into distributed and federated forms of holding data. Consider this project if you are interested in helping build an open web to share data. Our GitHub organization provides project overviews to support this track.