Clean Up & Unzip Files
in datarefuge.org

 

Hundreds of datasets have been harvested, bagged, checked, described, and added to datarefuge.org. Most of the files are zip archives, and the metadata is minimal so far. Both should change to make the files more discoverable and usable.

Where to start & what to do

  • Make an account at datarefuge.org
  • Event leaders should email their username and selected organization to datarefuge@ppehlab.org to get set up as admins within the organization
    • It's easiest to pick one organization (agency) to work on. Editors have to be added individually to each organization they want to work with. See this slideshow for instructions about adding editors
  • The event leaders/admins should add all event participants as editors to the organization
  • Each person working on the project should send their downloads to a well-organized location. Zipped files are named by UUID, which is not a handy way to identify what you're working on. Once a record is complete, they should delete its files from their computer to reduce errors and confusion
  • Check the metadata in the record for accuracy. Does the file belong in that organization? Does the file appear to match the URL listed? Correct any information that's incorrect, including deleting author information if no author is known. Email datarefuge@ppehlab.org if you find misplaced files
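
One way to keep UUID-named downloads organized is to move each zip into a folder named after the dataset it belongs to, and to remove that folder once the record is done. The following is only a minimal sketch, assuming Python 3.9 or later; the function names (slugify, stage_download, clean_up) and the folder layout are hypothetical illustrations, not part of datarefuge.org.

```python
import re
import shutil
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a dataset title into a folder-safe name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def stage_download(zip_path: Path, dataset_title: str, work_root: Path) -> Path:
    """Move a UUID-named zip into work_root/<slug>/ and return its new path."""
    record_dir = work_root / slugify(dataset_title)
    record_dir.mkdir(parents=True, exist_ok=True)
    target = record_dir / zip_path.name
    shutil.move(str(zip_path), str(target))
    return target


def clean_up(record_dir: Path) -> None:
    """Delete the local working folder once the record is finished."""
    shutil.rmtree(record_dir)
```

Calling stage_download with the downloaded zip, the dataset's title as shown in the record, and a working folder of your choice gives each record its own recognizable folder; clean_up on that folder removes the local copies when you are done.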

How to

  • Go to your selected organization
  • Click on a dataset (make sure you're not working on the same one as others at your event)
  • Click Manage to edit the metadata
  • Check that the source matches the organization you're working with and remove erroneous authors. Correct anything else that is incorrect. Then click Update Dataset
Check other fields for accuracy as well.


  • Click on View Dataset
  • Click the green Explore button and then Go to resource to download the file
  • Extract and save the unzipped files to your computer
    This will vary for each record. Use your best judgment to make sure all the important files are added back to the resource in meaningful ways. If you want to document your process and share it with us, we'd love to see it! If a file is too big or will take too long to download, cancel the download and pick a different one.
     
  • Back in the record, click on the Explore button again but this time go to Edit
  • Click on All resources
  • Click Add new resource to add the files you extracted from the zipped file
  • Click Add to add the new resources (the unzipped files). Do NOT delete the zipped file or the JSON file
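
The extraction step can also be scripted, which helps when a zip contains many files and you want a list of what to add back as resources. This is a minimal sketch using only Python's standard library (Python 3.9 or later assumed); the function names extract_and_list and summarize are hypothetical, and the upload itself still happens through the site as described above.

```python
import zipfile
from pathlib import Path


def extract_and_list(zip_path: Path, dest_dir: Path) -> list[Path]:
    """Extract zip_path into dest_dir and return the extracted files, sorted."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Keep only regular files; folders are not uploaded as resources.
    return sorted(p for p in dest_dir.rglob("*") if p.is_file())


def summarize(files: list[Path], dest_dir: Path) -> list[tuple[str, int]]:
    """Return (path relative to dest_dir, size in bytes) for each file."""
    return [(f.relative_to(dest_dir).as_posix(), f.stat().st_size) for f in files]
```

The summary gives a quick inventory of relative paths and sizes, which can serve as a checklist while adding each extracted file as a new resource. The original zip is left untouched.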