2021-04-17 Meeting notes

Date

17 Apr 2021

Participants

Mitch Miller

Josh Chamberlain

Eric Turner

Jeff Joskisch

Richard Ji

Discussion topics

Item

Notes

Dolt / Databases

  • Lots of changes being made in datasets

    • agencies

  • Does Dolt support SQL COPY?

  • Richard has MongoDB creds for anyone who would like to experiment

  • How slow is Dolt? Slow enough to break things? We’re going to have ~2 million records.

  • We can host a mirror on our server and run an instance if people want the same data quicker / without the Dolt UI

Podcast fame

Jeff was interviewed on Privacy Please and mentioned PDAP; the episode releases this week (Wednesday)

OCR

TensorFlow has been suggested

We may need a lot of training data, which we don’t have.

  • Is there a way we could slowly start to feed this stuff to TensorFlow now?

Requirements:

  • We need to be able to comma-delimit things on the way in
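
A minimal sketch of what comma-delimiting on the way in could look like, assuming OCR output arrives as a list of text fields per record. The helper name to_csv_line and the sample record are hypothetical and were not discussed in the meeting. It uses Python's standard csv module so that fields which themselves contain commas get quoted rather than split.

```python
import csv
import io


def to_csv_line(fields):
    """Join fields into one comma-delimited line, quoting where needed."""
    buffer = io.StringIO()
    # csv.writer quotes any field that contains a comma, quote, or newline.
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


# Hypothetical fields pulled from one OCR'd record (made-up sample data).
record = ["Example Police Dept", "Smith, John", "2021-04-17"]
print(to_csv_line(record), end="")
```

Going through a CSV writer instead of a plain ",".join(...) keeps names like "Smith, John" in a single column when the data is loaded later.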

In the meantime, there’s no harm in publishing unedited PDFs

  • may inspire contributors to help with OCR

FE

There are some folks ready to work on stuff for when we have data

For now it’s pretty small and people could clone it and run it locally

Miles is converting Gatsby to JSX, which will make iteration easier

MongoDB

We should have a template if we’re using Mongo

Docker Compose file

Mitch is working on a Docker Compose file for Dolt and Mongo, which will be helpful as we get our ETL framework together

Action items

  • Eric Turner to ping Mitch in Slack when the sql-server POC is done (it's nearly there)

  • Josh Chamberlain to get the policy / rationale for PII into the docs (this is a high priority)

  • Josh Chamberlain to make shitty base tables from examples of other data types

  • Josh Chamberlain to do meeting notes in Docs next time so they can be shared