Terminology

Term / Our definition

Agency

A police department or organization, like "Aurora Police Department". Agencies often have parent-child relationships with one another.
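Parent-child relationships between agencies form a tree. The sketch below assumes that each agency stores an optional reference to its parent. The `Agency` class, its `parent` field, the `lineage` method, and the "Example State Police" names are all hypothetical and serve only as an illustration:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Agency:
    """Hypothetical agency record; `parent` links a child agency to its parent."""
    name: str
    parent: Optional[Agency] = None

    def lineage(self) -> list[str]:
        """Return agency names from this agency up to its top-level ancestor."""
        chain: list[str] = []
        node: Optional[Agency] = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# Hypothetical example: a troop nested under a state police agency.
state_police = Agency("Example State Police")
troop = Agency("Example State Police Troop A", parent=state_police)
print(troop.lineage())  # ['Example State Police Troop A', 'Example State Police']
```

A parent pointer is the simplest way to model this. If you often need to list an agency's children, you would also want to store a reverse index.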

criminal legal system

Law enforcement, courts, and corrections. Our focus is on the United States.

data accessibility

This is a scale that we're still defining:

  1. The records should exist somewhere, but we need to locate them.

  2. We know where records can be accessed.

  3. There is historic data available in a stable archive.
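The scale is still a draft, but it can already be encoded as an ordered enumeration. This sketch assumes that a higher level means more accessible data, which matches the order of the list above. The `DataAccessibility` enum, its member names, and `needs_location_work` are hypothetical:

```python
from enum import IntEnum


class DataAccessibility(IntEnum):
    """Hypothetical encoding of the draft accessibility scale (levels 1-3)."""
    EXISTS_UNLOCATED = 1  # records should exist somewhere, but must be located
    LOCATED = 2           # we know where records can be accessed
    ARCHIVED = 3          # historic data is available in a stable archive


def needs_location_work(level: DataAccessibility) -> bool:
    """True when the records still have to be found before anything else."""
    return level < DataAccessibility.LOCATED


print(DataAccessibility(2).name)  # LOCATED
```

`IntEnum` keeps the levels comparable with `<` and `>`. If the scale gains levels or is reordered, those comparisons would have to be rechecked.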

data custody / provenance

Who collected and published the data?

agency_described (which agency is the data about?)

originating_entity (who generated the records?)

supplying_entity (who is publishing the records?)

Sometimes these are all the same entity; sometimes they are all different.
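The three custody fields can live together in one record per dataset. The field names below come from this glossary. The `Provenance` class, the `single_entity` helper, the choice to represent entities as plain strings, and the example entity names are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Custody fields for one dataset; field names follow the glossary."""
    agency_described: str    # which agency is the data about?
    originating_entity: str  # who generated the records?
    supplying_entity: str    # who is publishing the records?

    def single_entity(self) -> bool:
        """True when one entity fills all three roles."""
        roles = {self.agency_described, self.originating_entity, self.supplying_entity}
        return len(roles) == 1


# Hypothetical case: a department's own records, republished by a state portal.
record = Provenance(
    agency_described="Example Police Department",
    originating_entity="Example Police Department",
    supplying_entity="Example State Open Data Portal",
)
print(record.single_entity())  # False
```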

Data Source

A URL pointing to a place on a police website where public records may be scraped, like "police-agency.com/arrest-reports".

Data Source archive

A raw, unprocessed HTML snapshot of a Data Source, captured at a specific time.
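An archive pairs a Data Source URL with the raw HTML captured at one moment. This sketch assumes the HTML has already been fetched. A real pipeline would also handle fetching, retries, and storage. The `DataSourceArchive` class and the `archive` function are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DataSourceArchive:
    """Raw, unprocessed snapshot of a Data Source (hypothetical structure)."""
    source_url: str        # the Data Source this snapshot came from
    captured_at: datetime  # when the snapshot was taken (UTC)
    raw_html: str          # page body stored exactly as received


def archive(source_url: str, raw_html: str,
            captured_at: Optional[datetime] = None) -> DataSourceArchive:
    """Wrap already-fetched HTML in an archive record, stamping the capture time."""
    return DataSourceArchive(
        source_url=source_url,
        captured_at=captured_at or datetime.now(timezone.utc),
        raw_html=raw_html,  # deliberately not parsed or cleaned
    )


snapshot = archive("police-agency.com/arrest-reports", "<html>...</html>")
```

Storing the HTML untouched lets you re-run a scraper against the same snapshot later, for example after fixing a parsing bug.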

metadata

Information about when and how data was collected, packaged with the data it describes (such as a Data Source or a scraper extraction).

public records

Some information is required by federal, state, or local law to be public. Governments keep several types of public records and make them publicly available to different degrees.

scraper / data scraper

A bit of code responsible for collecting an extraction from a Data Source or a Data Source archive. Check out the GitHub repo.

Colloquially, "scraper" may also refer to a person who writes scrapers.

scraper extraction

The result of running a scraper. An extraction is usually produced by parsing or processing an HTML page or PDF into more usable data.
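Here is a minimal illustration of a scraper turning an HTML page into an extraction. It assumes the records sit in an HTML table and uses only Python's standard `html.parser`. PDFs, which the definition also mentions, are not handled here. The `TableRowScraper` class, the `extract` function, and the sample page are hypothetical:

```python
from html.parser import HTMLParser


class TableRowScraper(HTMLParser):
    """Hypothetical scraper: collects the text of each <td> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")  # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data  # accumulate text inside the open cell


def extract(raw_html: str) -> list[list[str]]:
    """Run the scraper over one page and return non-empty rows as the extraction."""
    scraper = TableRowScraper()
    scraper.feed(raw_html)
    scraper.close()
    # Header rows built from <th> cells stay empty and are dropped here.
    return [[cell.strip() for cell in row] for row in scraper.rows if row]


page = ("<table><tr><th>Date</th><th>Charge</th></tr>"
        "<tr><td>2024-01-05</td><td>Trespass</td></tr></table>")
print(extract(page))  # [['2024-01-05', 'Trespass']]
```

Real police pages vary widely, so an actual scraper is typically written against the specific structure of one Data Source.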
