# 2021-04-10 Meeting notes

## Date <a href="#id-2021-04-10meetingnotes-date" id="id-2021-04-10meetingnotes-date"></a>

10 Apr 2021

## Participants <a href="#id-2021-04-10meetingnotes-participants" id="id-2021-04-10meetingnotes-participants"></a>

* [Josh Chamberlain](https://pdap.atlassian.net/wiki/people/6068f9e790e3950069fbaaf4?ref=confluence)
* Eddie Brown
* Alec Akin
* Eric Turner
* Stabs

## Goals <a href="#id-2021-04-10meetingnotes-goals" id="id-2021-04-10meetingnotes-goals"></a>

* Answer schema questions to unblock [dolt implementation](https://pdap.atlassian.net/browse/PDAP-60).
* Decide whether TablePlus or DBeaver would be useful in the dolt pipeline ([discussion here](https://policeaccessibility.slack.com/archives/C014Q3ZT2GG/p1617822153049800)).

## Discussion topics <a href="#id-2021-04-10meetingnotes-discussiontopics" id="id-2021-04-10meetingnotes-discussiontopics"></a>

| Item                                      | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Datasets                                  | <ul><li>UUIDs have been added in dolt.</li><li>We may want to remove the hyphens to save space (they currently use the MySQL built-in format).</li><li>Once things are more stable, it’s probably worth creating some views.</li><li><p>Eric wrote a script that keeps CityProtect datasets up to date and commits them to the dolt DB.</p><ul><li>Fork-then-merge is a necessity.</li><li>We could put these on a cron job or just run them manually, once for each portal type.</li><li>These update quarterly, so automating that is not our biggest problem.</li><li>355 agencies in CityProtect are available for bulk downloading, with an average of 10 CSV files each.</li></ul></li><li>We’ll need to get <code>data</code> out of <code>datasets</code>.</li><li>Needing clones and forks just to make simple additions becomes problematic.</li></ul> |
| Dolt x Scale                              | <p>Alternative for down the road: <a href="https://www.esri.com/en-us/arcgis/products/arcgis-open-data">https://www.esri.com/en-us/arcgis/products/arcgis-open-data</a></p><p>Risk: Dolt is young and missing advanced SQL features. Dolt is not our advanced analytics tool. Dolt is not our deep storage layer.</p><p>Databases can’t currently reference each other, and repos are 1:1 with databases.</p><p>Data is nominally unlimited, but the current technical cap is \~200 GB.</p><p>Can Dolt be an intake tool and not grow?</p><p>This may not even be our presentation layer because of the low amount of data it can store.</p><ul><li>One way around this might be checking data out of one Dolt repo into another, preserving the paper trail. Does Dolt support this, and is it reasonable?</li><li>Otherwise, priority ↑ for <a href="https://pdap.atlassian.net/browse/PDAP-80">https://pdap.atlassian.net/browse/PDAP-80</a></li></ul> |
| Where are we storing our data properties? | <p>Is breadth (number of columns) an issue in Dolt? Would this table be prohibitively gigantic?</p><p>The SQL column limit is \~1000; MySQL’s is 4000. We should be good.</p> |
| Hosting                                   | If we went with MongoDB directly, we could bolt it into our DigitalOcean hosting. |
| TablePlus / DBeaver                       | <p><a href="https://policeaccessibility.slack.com/archives/C014Q3ZT2GG/p1617822153049800">context</a></p><p>The free version of TablePlus has some limitations (open tabs).</p> |
| Scrapers                                  | For now, people should write scrapers in whatever way is most convenient for them. If we want to refactor, we can figure out what the priority is and do it later. |
| Server access                             | We don’t use passwords, just keys. Send yours to Alec if you need access. |
| OCR                                       | Stabs tried \~5 different OCR tools and none of them worked well enough. Alec has tried TensorFlow. |
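
The note under Datasets about removing UUID hyphens to save space can be illustrated with a small sketch. It assumes the IDs are stored as text in the hyphenated 36-character form; the helper names `compact` and `restore` are hypothetical and not part of any PDAP script. Dropping the hyphens saves 4 characters per ID, and the raw binary form is shown only for comparison.

```python
import uuid


def compact(u: uuid.UUID) -> str:
    """Return the 32-character hex form of a UUID (hyphens removed)."""
    return u.hex


def restore(compact_id: str) -> uuid.UUID:
    """Rebuild a UUID object from its 32-character hex form."""
    return uuid.UUID(hex=compact_id)


# Fixed example value so the sizes below are reproducible.
example = uuid.UUID("12345678-1234-5678-1234-567812345678")

hyphenated_len = len(str(example))    # canonical text form with hyphens
compact_len = len(compact(example))   # text form without hyphens
binary_len = len(example.bytes)       # raw bytes, for comparison

print(hyphenated_len, compact_len, binary_len)
```

The conversion is lossless, so the hyphenated form can still be produced for display with `str(restore(...))`.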

## Action items <a href="#id-2021-04-10meetingnotes-actionitems" id="id-2021-04-10meetingnotes-actionitems"></a>

* proof of concept for separate dolt DB (check whether DBs can easily reference one another) [PDAP-144](https://pdap.atlassian.net/browse/PDAP-144) (Done)
* proof of concept for `global_properties` table referenced by data\_types views [PDAP-121](https://pdap.atlassian.net/browse/PDAP-121?src=confmacro)
* [Josh Chamberlain](https://pdap.atlassian.net/wiki/people/6068f9e790e3950069fbaaf4?ref=confluence) to make a request in the DoltHub Discord for one:many repos:DBs
* [Alec Akin](https://pdap.atlassian.net/wiki/people/60319bf02a42cc0069af9ac8?ref=confluence) to reach out to 3 major cloud providers + MongoDB for sponsorships
* [Josh Chamberlain](https://pdap.atlassian.net/wiki/people/6068f9e790e3950069fbaaf4?ref=confluence) to point database volunteers in the direction of [Dolt SQL viewer integrations](https://github.com/dolthub/docs/tree/gitbook-dev/content/integrations)
* [Josh Chamberlain](https://pdap.atlassian.net/wiki/people/6068f9e790e3950069fbaaf4?ref=confluence) to add scraper philosophies to the README (do what you want; we can always refactor later)

## Decisions <a href="#id-2021-04-10meetingnotes-decisions" id="id-2021-04-10meetingnotes-decisions"></a>

* Dolt is not our advanced analytics tool. Dolt is not our deep storage layer.
