2021-04-17 Meeting notes
Date
17 Apr 2021
Participants
Mitch Miller
Jeff Joskisch
Discussion topics
Item
Notes
Dolt / Databases
Lots of changes being made in datasets
agencies
Does Dolt support SQL COPY?
Richard has MongoDB creds for anyone who would like to experiment
How slow is Dolt? Slow enough to break things? We’re going to have ~2 million records.
We can host a mirror on our server and run an instance if people want the same data quicker / without the Dolt UI
Podcast fame
Jeff was interviewed on Privacy Please and mentioned PDAP; the episode releases this week (Wednesday)
OCR
TensorFlow has been suggested
We may need a lot of training data, which we don’t have.
Is there a way we could slowly start to feed this stuff to TensorFlow now?
Requirements:
We need to be able to comma-delimit things on the way in
In the meantime, there’s no harm in publishing unedited PDFs
This may inspire contributors to help with OCR
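A minimal sketch of the comma-delimiting requirement above, using Python's standard csv module. The function name to_csv_row and the sample field values are hypothetical, made up for illustration; nothing here reflects a decided ingestion format.

```python
import csv
import io


def to_csv_row(fields):
    """Serialize one record's fields as a single comma-delimited line.

    csv.writer quotes any field that itself contains a comma, so values
    such as "Last, First" names survive the round trip intact.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Hypothetical record: a date, a name containing a comma, a document type.
row = to_csv_row(["2021-04-17", "Smith, John", "Incident report"])
print(row, end="")
```

Letting the csv module do the quoting avoids hand-rolled string joins, which break as soon as a field contains a comma or a quote character.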
Front end (FE)
There are some folks ready to work on stuff for when we have data
For now it’s pretty small and people could clone it and run it locally
Miles is converting Gatsby to JSX which will make iteration easier
MongoDB
We should have a template if we’re using Mongo
Docker Compose file
Mitch is working on a Docker Compose file for Dolt and MongoDB, which will be helpful as we get our ETL framework together
Action items
Eric Turner to ping Mitch in Slack when sql-server POC is done (nearly)
Josh Chamberlain to get the policy / rationale for PII into the docs (this is a high priority)
Josh Chamberlain to make rough base tables from examples of other data types
Josh Chamberlain to do meeting notes in Docs next time so they can be shared