To use the `/common/etl` library from a specific scraped agency dir, pass the `schema.json` file to it.
It uses the `pdap/datasets` and `pdap/data-intake` repos.
It checks that the `agency_id` exists and grabs the appropriate record from the database.
It goes through the `data` items in the `schema.json`. It will create any datasets where the id is null; otherwise it will find an existing dataset and sync the data from the table with the `schema.json`. Whichever source has the most recent `last_modified`
time will be the "Source of Truth", and that's the data that will be used to sync.
For each `data.mapping` object, it will search for a table in `pdap/data-intake` with the exact name of the `data_type` and then sync the columns so they are there. If a new column is in the database but not in the `schema.json` file, it will add the missing column with a `__skip__` value.
It will use the `mapping`
object and insert the data into `pdap/data-intake`. Erroneous records are skipped and a message is displayed to the console.
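
The sync rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's actual code: the function names (`pick_source_of_truth`, `sync_mapping`, `insert_records`) are hypothetical, timestamps are assumed to be ISO 8601 strings, the mapping is assumed to be keyed by database column name, and ties on `last_modified` are assumed to favor the `schema.json` copy.

```python
from datetime import datetime

SKIP = "__skip__"  # placeholder value for columns that have no mapping yet


def pick_source_of_truth(schema_item: dict, db_record: dict) -> dict:
    """Return whichever copy has the most recent last_modified.

    Assumption: both copies carry an ISO 8601 last_modified string,
    and the schema.json copy wins when the two times are equal.
    """
    schema_time = datetime.fromisoformat(schema_item["last_modified"])
    db_time = datetime.fromisoformat(db_record["last_modified"])
    return schema_item if schema_time >= db_time else db_record


def sync_mapping(mapping: dict, table_columns: list) -> dict:
    """Return a copy of the mapping that covers every table column.

    Columns that exist in the table but not in the mapping are added
    with the __skip__ value; existing entries are left untouched.
    """
    synced = dict(mapping)
    for column in table_columns:
        synced.setdefault(column, SKIP)
    return synced


def insert_records(records: list, insert_one) -> int:
    """Insert records one by one, skipping any that raise an error.

    insert_one is a hypothetical callable that writes a single record.
    Returns the number of records inserted successfully.
    """
    inserted = 0
    for record in records:
        try:
            insert_one(record)
            inserted += 1
        except Exception as error:
            # Skip the bad record and report it on the console.
            print(f"Skipping record {record!r}: {error}")
    return inserted
```

For example, `sync_mapping({"case_number": "CaseNum"}, ["case_number", "officer"])` keeps the existing `case_number` entry and adds `officer` with the `__skip__` value, which marks the column as not yet mapped.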