datasets.url
is https://www.alamedaca.gov/Departments/Police-Department/Annual-Arrest-Traffic-Statisticsid
columns have been set up to automatically generate the uuid for you if you do not specify the column in the insert.agencies
table.5c2d0726d183487ba746402872573f42
is our agency id! Now we perform a google search for Alameda Police Department CA, and we find their homepage. We can also see in the above table that homepage_url
is blank, so we can run a quick update to store this homepage!dataset
record, as the scraper for each data type will most likely be a bit different and the table the data goes into will also be different. So let's start cross-referencing the data-types with what we see on the page to build to our datasets!
multi
, which we can find on our data types reference list as id 27
. This data being housed on an external source, makes it a third party
source type, which looking that up is 3
. It's also a good idea to leave a note for a multi
data type. The status_id
for our purposes will always be 1
because you are just adding the dataset but not scraping it. Scrapers will update this column to show the status of each dataset later!
We have many other columns we can fill out for a dataset, but most of which will be used by data scrapers in their task, so it's not completely necessary to do now, unless you really want to!datasets
table will automatically generate the id
for you if you do not specify it. However, if you really wish to specify it yourself: you can use the sql function REPLACE(uuid(), '-', '')
(which can be found via this link). This will always be a random string of letters and numbers, so for me I received c813aab6ba3d11ebb082927cb9e1207c
. You also would need to specify it in the statement below such as INSERT INTO datasets (id, url ...)
.2
for a direct
source type. Now we look again at the data types and we find 24
which is annual_reports
. And we are ready to insert!multi
data type for PDFs. We can see the 3rd and 5th accordion: Monthly Crime Statistics and Policty and Hate Crimes Reports. The url
in the datasets
table must be unique, so we cannot simply paste the url twice for both datatypes. In this case we need to use data type 27
again for multi
, we can use 2
for a direct
source type again, and gen a new id. The difference for this is I will leave a note of what the multi
entails.select * from datasets where agency_id='5c2d0726d183487ba746402872573f42' and date_insert >= '2021-05-21'
it will grab all of the records for the current agency, and the date_insert
will help in case the agency you are working with already had a previous record added but you found more!