The Seeders team adds URLs to the End of Term (EOT) archive to help the archive prioritize its efforts and crawl deeper into agency websites. Seeders canvass the resources of a given government agency and identify important URLs. They then sort these URLs by whether their data can be captured automatically by the Internet Archive webcrawler. URLs judged to be possibly crawlable are "nominated" (equivalently, "seeded") using our Chrome extension or bookmarklet. This sorting is only provisional: when in doubt, seeders mark a URL as possibly not crawlable, and these URLs populate a spreadsheet. Read more at https://datarefuge.github.io/workflow/seeding/.
Seeders and Sorters will use the EDGI subprimer systems, or a similar set of resources, to identify important or at-risk data. Individual events should set up spreadsheets or other tools in which search efforts can be recorded. The work of this group includes:
- Canvassing the resources of a given government agency, identifying important URLs.
- Identifying whether those URLs can be crawled by the Internet Archive's webcrawler.
- Nominating crawlable URLs to the EOT crawl using the EDGI Nomination Tool.
- Marking URLs that are not crawlable as "Uncrawlable" in the Extension.
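
For events that record search efforts in a spreadsheet, the sorting steps above can be sketched as a small local log. This is only an illustration: the `SortedUrl` record, the `sort_url` and `write_log` helpers, the column names, and the `example.gov` URLs are all hypothetical, and the script does not talk to the EDGI Nomination Tool or the Extension. The seeder's own judgment is the input; the code only applies the "when in doubt, treat it as not crawlable" rule and writes the result as CSV.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Optional, TextIO


@dataclass
class SortedUrl:
    """One row of a hypothetical seeding log."""
    url: str
    agency: str
    status: str  # "nominated" or "uncrawlable"


def sort_url(url: str, agency: str, looks_crawlable: Optional[bool]) -> SortedUrl:
    """Record a seeder's provisional judgment about one URL.

    looks_crawlable is True, False, or None when the seeder is unsure.
    Anything other than a clear True is logged as "uncrawlable".
    """
    status = "nominated" if looks_crawlable is True else "uncrawlable"
    return SortedUrl(url=url, agency=agency, status=status)


def write_log(rows: Iterable[SortedUrl], fileobj: TextIO) -> None:
    """Write the log as CSV with a header row (column names are assumptions)."""
    writer = csv.writer(fileobj)
    writer.writerow(["url", "agency", "status"])
    for row in rows:
        writer.writerow([row.url, row.agency, row.status])


if __name__ == "__main__":
    # Placeholder URLs and agency name, for illustration only.
    rows = [
        sort_url("https://example.gov/reports/", "Example Agency", True),
        sort_url("https://example.gov/data-portal/search", "Example Agency", False),
        sort_url("https://example.gov/maps/viewer", "Example Agency", None),
    ]
    buffer = io.StringIO()
    write_log(rows, buffer)
    print(buffer.getvalue())
```

The resulting CSV can be pasted into whatever shared spreadsheet an event uses. URLs logged as "nominated" would still need to be submitted through the nomination tool by hand.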