2021-04-10 Meeting notes


Date

10 Apr 2021

Participants

  • Eddie Brown

  • Alec Akin

  • Eric Turner

  • Stabs

Goals

  • Answer schema questions to unblock .

  • Decide whether TablePlus or DBeaver would be useful in the Dolt pipeline ().

Discussion topics

Item

Notes

Datasets

  • UUIDs have been added in dolt

  • We may want to remove the hyphens to save space (currently using the MySQL built-in UUID format)

  • Once things are more stable it’s probably worth doing some views

  • Eric wrote a script to keep CityProtect datasets up to date as well as commit the dataset to the dolt db

    • Fork then merge is a necessity

    • We could put these on a cron job or just run them manually, once for each portal type

    • These update quarterly, so automating that is not our biggest problem.

    • 355 agencies in CityProtect are available for bulk downloading, averaging about 10 CSV files each

  • We’ll need to get data out of datasets

  • Clones and forks become problematic just to make simple additions
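To make the UUID hyphen question above concrete, here is a minimal Python sketch comparing the size of three common UUID encodings. The helper name `uuid_storage_sizes` is hypothetical, and the column types in the comments are only illustrative; the meeting did not settle on a storage format.

```python
import uuid


def uuid_storage_sizes(value: uuid.UUID) -> dict[str, int]:
    """Return the length, in characters or bytes, of three UUID encodings."""
    return {
        "hyphenated": len(str(value)),  # e.g. a CHAR(36) column
        "hex": len(value.hex),          # hyphens removed, e.g. CHAR(32)
        "binary": len(value.bytes),     # raw bytes, e.g. BINARY(16)
    }


example = uuid.uuid4()
sizes = uuid_storage_sizes(example)
print(sizes)

# Dropping the hyphens is lossless: the hex form parses back to the same UUID.
round_tripped = uuid.UUID(example.hex)
```

Removing hyphens saves 4 characters per value; storing raw bytes would save more, at the cost of readability when browsing the table.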

Dolt x Scale

Risk: dolt is young and missing advanced SQL features. Dolt is not our advanced analytics tool. Dolt is not our deep storage layer.

Databases can’t currently reference each other, and repos are 1:1 with databases.

Data is nominally unlimited, but the current technical cap is ~200 GB.

Can Dolt be an intake tool and not grow?

This may not even be our presentation layer because of the low amount of data it can store

  • One way around this might be checking data out of one Dolt repo into another, preserving the paper trail. Does Dolt support this, and is it reasonable?

Where are we storing our data properties?

Is breadth an issue in Dolt? Would this table be prohibitively gigantic?

SQL column limit is ~1000; MySQL's is ~4000. We should be good.

Hosting

If we went with MongoDB directly, we could bolt it into our DigitalOcean

TablePlus / DBeaver

The free tier of TablePlus has some limitations (e.g. number of open tabs)

Scrapers

For now—people should write whatever they want / is most convenient for them. If we want to refactor, we can figure out what the priority is on it + do it later.

Server access

We don’t use passwords, just keys—send it to Alec if you need access.

OCR

Stabs tried ~5 different OCR tools and none of them worked well enough. Alec has tried TensorFlow.

Action items

  • proof of concept for separate dolt DB (check whether DBs can easily reference one another) Done

  • proof of concept for global_properties table referenced by data_types views

  • to make request in dolthub discord for one:many repos:dbs

  • reach out to 3 major cloud providers + mongodb for sponsorships

  • Point database volunteers in the direction of

  • add scraper philosophies to readme (do what you want, we can always refactor later)

Decisions

  • Dolt is not our advanced analytics tool. Dolt is not our deep storage layer.

Alternative for down the road:

Otherwise priority ↑ for

Josh Chamberlain
dolt implementation
discussion here
PDAP-121
Josh Chamberlain
Alec Akin
Josh Chamberlain
Dolt SQL viewer integrations
Josh Chamberlain
https://www.esri.com/en-us/arcgis/products/arcgis-open-data
https://pdap.atlassian.net/browse/PDAP-80
context
PDAP-144