SELECT REPLACE(uuid(), '-', '');
idcolumns have been set up to automatically generate the uuid for you if you do not specify the column in the insert.
5c2d0726d183487ba746402872573f42is our agency id! Now we perform a google search for Alameda Police Department CA, and we find their homepage. We can also see in the above table that
homepage_urlis blank, so we can run a quick update to store this homepage!
datasetrecord, as the scraper for each data type will most likely be a bit different and the table the data goes into will also be different. So let's start cross-referencing the data-types with what we see on the page to build to our datasets!
multi, which we can find on our data types reference list as id
27. This data being housed on an external source, makes it a
third partysource type, which looking that up is
3. It's also a good idea to leave a note for a
multidata type. The
status_idfor our purposes will always be
1because you are just adding the dataset but not scraping it. Scrapers will update this column to show the status of each dataset later! We have many other columns we can fill out for a dataset, but most of which will be used by data scrapers in their task, so it's not completely necessary to do now, unless you really want to!
datasetstable will automatically generate the
idfor you if you do not specify it. However, if you really wish to specify it yourself: you can use the sql function
REPLACE(uuid(), '-', '')(which can be found via this link). This will always be a random string of letters and numbers, so for me I received
c813aab6ba3d11ebb082927cb9e1207c. You also would need to specify it in the statement below such as
INSERT INTO datasets (id, url ...).
directsource type. Now we look again at the data types and we find
annual_reports. And we are ready to insert!
multidata type for PDFs. We can see the 3rd and 5th accordion: Monthly Crime Statistics and Policty and Hate Crimes Reports. The
datasetstable must be unique, so we cannot simply paste the url twice for both datatypes. In this case we need to use data type
multi, we can use
directsource type again, and gen a new id. The difference for this is I will leave a note of what the
select * from datasets where agency_id='5c2d0726d183487ba746402872573f42' and date_insert >= '2021-05-21'it will grab all of the records for the current agency, and the
date_insertwill help in case the agency you are working with already had a previous record added but you found more!