What are Datasets?


The Datasets database is a continuously maintained catalog of known police datasets. Each Dataset has a status and can potentially be scraped.
Scrapers are pieces of Python code designed to collect information from a Dataset. A Scraper is best understood and defined by the Dataset it scrapes.


Datasets are maintained and can be easily viewed in DoltHub. The same tables are in a Hadoop PostgreSQL mirror, which is a better back end for Scrapers.


  1. Country: The United States has ~18,000 Agencies across all regions.
  2. Region: A state, county, municipality, or other region containing multiple police agencies.
  3. Agency: A specific police organization, typically containing many Datasets.
  4. Dataset: A URL where one type of police data can be found.
  5. Scraper: A bit of code which downloads everything it can find on a Dataset.
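
The five levels above nest inside one another, and a small Python sketch can make that nesting concrete. This is only an illustration and not the project's actual schema: the class fields, the `status` values, the `unscraped_datasets` helper, and the example agency and URLs are all hypothetical, and attaching at most one Scraper to each Dataset is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scraper:
    """Code which downloads everything it can find on a Dataset."""
    name: str


@dataclass
class Dataset:
    """A URL where one type of police data can be found."""
    url: str
    data_type: str
    status: str = "unknown"            # hypothetical status value
    scraper: Optional[Scraper] = None  # assumption: at most one Scraper each


@dataclass
class Agency:
    """A specific police organization, typically with many Datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Region:
    """A state, county, municipality, or other region."""
    name: str
    agencies: List[Agency] = field(default_factory=list)


@dataclass
class Country:
    name: str
    regions: List[Region] = field(default_factory=list)


def unscraped_datasets(country: Country) -> List[Dataset]:
    """Walk Country -> Region -> Agency and collect Datasets with no Scraper."""
    return [
        dataset
        for region in country.regions
        for agency in region.agencies
        for dataset in agency.datasets
        if dataset.scraper is None
    ]


# Hypothetical example data; the agency name and URLs are placeholders.
example = Country(
    name="United States",
    regions=[
        Region(
            name="Example County",
            agencies=[
                Agency(
                    name="Example Police Department",
                    datasets=[
                        Dataset(
                            url="https://example.org/arrests",
                            data_type="arrests",
                            status="scraped",
                            scraper=Scraper(name="example_arrests_scraper"),
                        ),
                        Dataset(
                            url="https://example.org/incidents",
                            data_type="incidents",
                        ),
                    ],
                )
            ],
        )
    ],
)

todo = unscraped_datasets(example)
```

Here `todo` holds only the incidents Dataset, since the arrests Dataset already has a Scraper attached. Modelling the hierarchy top-down keeps each Scraper tied to the Dataset that defines it.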