id
is a UUID without hyphens. If you leave this field blank, one will be generated automatically.
url
is the web location from which we can obtain the data. It could be a directory of files, a link to an aggregator, or a map of incident reports. Our Examples & Best Practices guide will help you determine what to use for the url, as well as the other fields in this table.
status_id
is the current status of the dataset.
source_type_id
is where the data is sourced from.
data_types_id
is what kind of data this dataset is. Most of these are self-explanatory.
data-intake
will store a link to view the actual video.
format_types_id
is how the data is structured. As of writing, we have not fully fleshed out format types. There is no real standard for how agencies present data unless it comes from an aggregator such as CityProtect or ArcGIS. Usually scrapers will have a better idea of the format_type. Here are our current format_types:
agency_id
is a relationship to the agencies table. It links this dataset back to a specific agency.
update_frequency
is how often the data updates (usually the person responsible for retrieving the data will figure this out as they try to get it). These values are self-explanatory.
portal_type
is a legacy field from our original schema. If the data comes from a specific aggregator like CityProtect, Tyler Technologies, or ArcGIS, you can specify that here. This field may be merged into format_types_id in the future.
coverage_start
is when the agency started producing the data at this link, if you can find it. Some agencies keep historical records; others wipe their data and start fresh each day.
scraper_id
is not yet implemented; it is okay to leave it blank.
notes
holds any notes about the dataset we should know. This is especially useful if data_types_id is multi, so we can figure out at a glance what the dataset comprises.
can_scrape
records whether we may scrape the data. Some agencies, and some states, have laws expressly forbidding scraping, and we absolutely need to know if we are banned from scraping. Set this value to 0 if scraping is not allowed and 1 if there is no language indicating scraping is prohibited.
date_insert
is generated automatically when a new dataset is added.
last_modified
is generated automatically on insert, but must be updated manually whenever you modify the record.
machine_readable
is not fully implemented yet. If the dataset contains machine-readable output, such as an API endpoint or JSON, CSV, or XLSX files, set this to 1, since that output is easy to transform and load into a database. If the output is in a format that is not easily parsable, such as PDF or DOCX, it is considered not machine readable, so set this to 0.
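When filling in machine_readable for a direct file link, a rough first guess can come from the file extension. The sketch below is a hypothetical helper, not part of our tooling: the extension sets mirror the examples above, and anything it cannot classify (such as an API endpoint without a file extension) is left for a person to decide.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension sets based on the examples above: formats that load
# easily into a database map to 1, hard-to-parse documents map to 0.
MACHINE_READABLE = {".json", ".csv", ".xlsx"}
NOT_MACHINE_READABLE = {".pdf", ".docx"}


def guess_machine_readable(url: str) -> Optional[int]:
    """Guess the machine_readable value (1 or 0) from a file URL's extension.

    Returns None when the extension is in neither set (for example an API
    endpoint with no file extension), so a person can make the call.
    """
    # Look only at the URL path, so query strings like ?download=1 are ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in MACHINE_READABLE:
        return 1
    if suffix in NOT_MACHINE_READABLE:
        return 0
    return None


print(guess_machine_readable("https://example.org/data/incidents.csv"))  # 1
print(guess_machine_readable("https://example.org/reports/2023.pdf"))    # 0
```

The example.org URLs are placeholders. Returning None instead of defaulting to 0 avoids silently marking an API endpoint as not machine readable.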
id
has the same format as datasets.id: a UUID without hyphens.
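If you ever need to produce or normalize one of these hyphen-free UUIDs yourself, for example while preparing rows outside the database, Python's standard uuid module is enough. This is a minimal sketch; the function names are hypothetical and it is not necessarily how the database generates IDs.

```python
import uuid


def new_record_id() -> str:
    """Return a random UUID as 32 lowercase hex characters, with no hyphens."""
    return uuid.uuid4().hex


def normalize_record_id(value: str) -> str:
    """Convert a UUID string, with or without hyphens, to the hyphen-free form."""
    # uuid.UUID accepts both forms; .hex always returns 32 lowercase hex digits.
    return uuid.UUID(value).hex


print(normalize_record_id("12345678-1234-5678-1234-567812345678"))
# 12345678123456781234567812345678
```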
name
is the name of the agency, such as "California Highway Patrol" or "Greenwood County Sheriff's Office".
agency_type
is pretty self-explanatory:
state_iso
is a link to our states table: a two-letter code representing the state or province the agency is part of.
city
is the city the agency is part of (does not apply to federal, state, or county agencies).
zip
is the postal code of the agency; it may not always apply.
county_fips
is the Federal Information Processing Standards (FIPS) code for the agency's US county. A county FIPS code is 5 digits long and unique to each county: the first 2 digits identify the state, and the following 3 digits identify the county within that state. You can use the FCC API with the lat and lng to find the FIPS code.
lat
is the latitude of the agency's main headquarters (some agencies have multiple districts).
lng
is the longitude of the agency's main headquarters (some agencies have multiple districts).
date_insert
is generated automatically on insert.
data_policy
can store a link to an agency's Terms and Conditions or Data Policy page. This is very useful if they expressly forbid scraping.
country
should always be US for now, but is here in case we decide to start exploring other countries.
homepage_url
is the agency's homepage. If the dataset URLs break, we may be able to search the homepage again to find updated information.
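To fill in county_fips from an agency's lat and lng, the sketch below builds a lookup URL and splits a 5-digit county code into its state and county parts. The endpoint is an assumption (the FCC Census Block API) and should be checked against the FCC's current documentation; the code deliberately does not fetch or parse the response, since those field names should come from that documentation. The function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode

# Assumed endpoint (FCC Census Block API); verify against current FCC docs.
FCC_BLOCK_API = "https://geo.fcc.gov/api/census/block/find"


def fcc_lookup_url(lat: float, lng: float) -> str:
    """Build a query URL for looking up FIPS codes from coordinates."""
    query = urlencode({"latitude": lat, "longitude": lng, "format": "json"})
    return f"{FCC_BLOCK_API}?{query}"


def split_county_fips(county_fips: str) -> Tuple[str, str]:
    """Split a 5-digit county FIPS code into (state code, county code)."""
    if len(county_fips) != 5 or not county_fips.isdigit():
        raise ValueError(f"expected 5 digits, got {county_fips!r}")
    # First 2 digits: state; remaining 3 digits: county within that state.
    return county_fips[:2], county_fips[2:]


print(fcc_lookup_url(38.5816, -121.4944))
print(split_county_fips("06067"))  # ('06', '067')
```

For example, 06067 is Sacramento County, California: 06 is the state code for California and 067 identifies the county. Storing the code as a string keeps the leading zero.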