id is a UUID without hyphens. If you leave this field blank, it will be generated automatically.
url is the web location of the data for us to obtain. It could be a directory of files, a link to an aggregator, or a map of incident reports. Our Examples & Best Practices guide will help you determine what to use for the url, as well as the other fields in this table.
status_id is the current status of the dataset.
source_type_id is where the data is sourced from.
data_types_id is what kind of data this dataset is. Most of these are self-explanatory.
data-intake will store a link to view the actual video.
format_types_id is how the data is structured. As of writing, our format types are not fully fleshed out. There is no real standard for how agencies present data unless it comes from an aggregator such as CityProtect or ArcGIS. Usually scrapers will have a better idea of the format type. Here are our current format_types:
agency_id is a relationship to the agencies table. It links this dataset back to a specific agency.
update_frequency is how often the data updates (usually the person responsible for retrieving the data will figure this out as they try to get it). These are self-explanatory.
portal_type is a legacy field from our original schema. If the data comes from a certain aggregator such as CityProtect, Tyler Technologies, or ArcGIS, you can specify that here. This field may be merged into format_types_id in the future.
coverage_start is when the agency started producing the data at this link; if you can find it, record it here. Some agencies keep historical records, while others wipe and start fresh each day.
scraper_id is not yet implemented; it is okay to leave blank.
notes holds any notes about the dataset we should know. Especially useful if the data type is multi, so that we can see at a glance what it comprises.
can_scrape records whether scraping is permitted. Some agencies, and some states, have laws expressly forbidding scraping. We absolutely need to know if we are banned from scraping. Set this value to 0 if scraping is not allowed and 1 if there is no language indicating scraping is prohibited.
date_insert is generated automatically when a new dataset is added.
last_modified is generated automatically on insert, and will need to be manually updated when you modify the dataset.
machine_readable is not fully implemented. If the dataset contains machine-readable output such as an API endpoint or JSON, CSV, or XLSX files, we consider it machine readable: it is very easy to transform and load into a database, so the value would be 1. If the output is in a format such as PDF or DOCX that is not easily parsable, it is considered not machine readable, so the value would be 0.
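As an illustration of the id and machine_readable fields described above, here is a minimal Python sketch. The function names and the extension list are assumptions made for this example, not part of our schema or tooling. The extension check is only a heuristic: an API endpoint usually has no file extension, so it would still need to be marked 1 by hand.

```python
import uuid
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed list, based on the file formats named in the field description.
MACHINE_READABLE_EXTENSIONS = {".json", ".csv", ".xlsx"}


def new_dataset_id() -> str:
    """Return a random UUID as 32 hex characters, with no hyphens."""
    return uuid.uuid4().hex


def machine_readable_flag(url: str) -> int:
    """Guess machine_readable (1 or 0) from the file extension in a URL.

    Hypothetical helper: URLs without a recognised extension return 0
    and should be reviewed manually.
    """
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return 1 if suffix in MACHINE_READABLE_EXTENSIONS else 0


if __name__ == "__main__":
    print(new_dataset_id())
    print(machine_readable_flag("https://example.com/data/incidents.csv"))
    print(machine_readable_flag("https://example.com/reports/daily.pdf"))
```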
id is identical to the
name is the name of the agency, such as "California Highway Patrol" or "Greenwood County Sheriff's Office".
agency_type is pretty self-explanatory:
city is the city the agency is a part of (does not apply to federal, state, or county agencies).
zip is the postal code of the agency; it may not always apply.
county_fips is the Federal Information Processing Standards (FIPS) code for US counties. The full FIPS code for a census block is 15 characters long and unique to each geographic area. The first 2 digits identify the state, and the following 3 digits identify the county. You can use the FCC API with the lat and lng to find the FIPS code.
lat is the latitude of the main HQ of this particular agency (given that some agencies have multiple districts).
lng is the longitude of the main HQ of this particular agency (given that some agencies have multiple districts).
date_insert is generated automatically on insert.
data_policy stores a link to a Terms and Conditions or Data Policy page from an agency. Very useful if they expressly forbid scraping.
country should always be US for now, but is here in case we decide to start exploring other countries.
homepage_url is the agency's homepage. If the dataset URLs break, we may be able to search the homepage again for updated information.
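The county_fips lookup described above can be sketched in Python as follows. This assumes that county_fips stores the five-digit state-plus-county prefix of the full code; the endpoint URL and query parameter names are assumptions about the FCC census block lookup and should be checked against the FCC documentation before use. All function names are hypothetical, and the block code in the example is illustrative only.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names of the FCC block lookup API.
FCC_BLOCK_FIND_URL = "https://geo.fcc.gov/api/census/block/find"


def fcc_lookup_url(lat: float, lng: float) -> str:
    """Build the lookup URL for an agency's lat and lng (no request is sent)."""
    query = urlencode({"latitude": lat, "longitude": lng, "format": "json"})
    return f"{FCC_BLOCK_FIND_URL}?{query}"


def county_fips_from_full(full_fips: str) -> str:
    """Keep the first five digits: 2 for the state, then 3 for the county."""
    prefix = full_fips[:5]
    if len(prefix) != 5 or not prefix.isdigit():
        raise ValueError(f"not a valid FIPS code: {full_fips!r}")
    return prefix


def split_county_fips(county_fips: str) -> tuple[str, str]:
    """Split a five-digit county FIPS into (state, county) parts."""
    return county_fips[:2], county_fips[2:]


if __name__ == "__main__":
    print(fcc_lookup_url(38.58, -121.49))
    county = county_fips_from_full("060670011011085")  # illustrative block code
    print(county, split_county_fips(county))
```

Keeping the code as a string rather than an integer preserves the leading zero that many state codes have.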