2020-11-21 Tech Stack discussion

Broad strokes: we determined that the MVP fulfills the pieces not currently covered by Splunk, since Splunk itself could serve as the entire front end at very small scale. This means data integrity and depth are the core of the value added by PDAP.

Parts

Ingestion

Archival Storage

Search & analysis

Core principles, aka value add

Data stewardship

Transparency

Discipline

“Librarian”

Insights

"I want police data."

Make a query → get information

Enter search via UI (selects) or code → present specific data

api (json), chart, csv
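A minimal sketch of the json and csv output formats listed above, assuming query results arrive as a list of flat dictionaries. The function names and the example record are hypothetical and only for illustration; nothing here reflects an agreed PDAP API.

```python
import csv
import io
import json


def to_json(records):
    """Serialize query results as a JSON string (the "api" output)."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize query results as CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record, made up for this example.
records = [{"agency": "Example PD", "year": 2019, "incidents": 12}]
json_text = to_json(records)
csv_text = to_csv(records)
```

Chart output is left out; it would depend on whichever front end ends up presenting the data.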

"I want to be able to analyze the police data I found."

Analysis tools or analysis that is done for you

Find extremes in the data automatically
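One possible reading of "find extremes automatically" is a simple outlier check. The sketch below flags values that sit more than a chosen number of standard deviations from the mean. The z-score approach, the function name, and the threshold are assumptions for illustration, not something decided in the discussion.

```python
from statistics import mean, pstdev


def find_extremes(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up monthly counts; the last value is deliberately far from the rest.
counts = [10, 12, 11, 9, 10, 11, 95]
extremes = find_extremes(counts)
```

A z-score is sensitive to the very outliers it is looking for, so a median-based rule may be a better fit for small or skewed datasets.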

"PDAP needs to verify data."

  • Guard the submissions process

  • Credibility score for each type of data

"PDAP needs to be like a librarian."

  • Nonintrusive

  • Pedigree

    • Legally captured?

    • Multiple sources?

    • Anonymity?
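The pedigree questions above could be recorded per dataset as a small structure. This is a sketch under assumptions: the class name, field names, and the use of None for "not yet answered" are hypothetical, and it does not attempt to define a credibility score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pedigree:
    """Answers to the pedigree questions; None means the question is still open."""
    legally_captured: Optional[bool] = None
    multiple_sources: Optional[bool] = None
    anonymity_preserved: Optional[bool] = None

    def open_questions(self):
        """List the questions that have not been answered yet, in declaration order."""
        return [name for name, value in vars(self).items() if value is None]


# Example: only the legal question has been answered so far.
pedigree = Pedigree(legally_captured=True)
remaining = pedigree.open_questions()
```

Keeping unanswered questions distinct from "no" answers would let a reviewer see what still needs to be requested from a submitter.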

"What does it mean to verify data?"

"We need to be able to get data out of cold storage."

  • Eventually it'll be too much data for Splunk

Workflows powered by Splunk

Query data → Analyze data → Export insights

Upload data → Analyze data*

*The user agrees that we can keep the data, and provides information or verification about it.

Save an Analysis → Share the Analysis with someone else

Save an Analysis → Revisit it with updated data

Alert user if an analysis changes based on updated information
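One way to detect that a saved analysis has changed is to store a fingerprint of its result and compare it after the data is updated. The sketch below assumes results are JSON-serializable; the function names are hypothetical, and how the alert is actually delivered is left open.

```python
import hashlib
import json


def fingerprint(result):
    """Hash a JSON-serializable result; key order does not affect the hash."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def analysis_changed(saved_fingerprint, new_result):
    """Return True if re-running the analysis produced a different result."""
    return fingerprint(new_result) != saved_fingerprint


# Made-up results for illustration.
saved = fingerprint({"total": 3, "by_year": {"2019": 3}})
same = analysis_changed(saved, {"by_year": {"2019": 3}, "total": 3})
different = analysis_changed(saved, {"total": 4, "by_year": {"2019": 4}})
```

Storing only the fingerprint tells us that something changed but not what; keeping the previous result as well would allow showing the user a diff.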

Specific abilities granted by Splunk

  • easily write regex

  • accept any type of data

    • oddly / non-delimited

    • many file types

  • faster analysis / searching on the server rather than locally

  • automatically find "interesting fields"

  • search

  • analysis

Workflows not supported yet

Supply data to PDAP by volunteering or other sources

Verify submitted data → Request more info from a submitter

Provide an unprecedented breadth of data

Safely archive historic data for the foreseeable future

Understand the categorization structure

We need to make sure the structure is future-proof, and establish policies for sortation that cannot easily be corrupted.