Archival, verifiable, and secure data storage is the core of our mission.
An Intake Database, where the public can submit data. Members of the public can also find unprocessed data here. This raw archive is the cornerstone of our ability to audit any data we publish.
A Gold Standard Database: the unified archive managed by the PDAP community. Volunteers can contribute machine learning, ETL (extract, transform, load), and manual processing to make the source material more useful. This is where personally identifiable information (PII) is removed.
A public-facing Data Access Point. This is where the data collection community meets the needs of the data consumer community.
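The flow between these three components can be sketched as follows. This is a minimal illustration under stated assumptions, not PDAP code: the class and function names (`IntakeRecord`, `GoldRecord`, `process`, `audit`, `publish`) are hypothetical, and the phone-number regex stands in for real PII removal, which needs far more than one pattern. The point it shows is that each Gold Standard record keeps a SHA-256 hash of the raw intake record it came from, so any published record can be traced back to unaltered source material.

```python
import hashlib
import re
from dataclasses import dataclass


# Hypothetical Intake Database record: raw bytes exactly as submitted.
@dataclass(frozen=True)
class IntakeRecord:
    raw: bytes

    @property
    def digest(self) -> str:
        # Content hash: lets anyone check the raw record was never altered.
        return hashlib.sha256(self.raw).hexdigest()


# Hypothetical Gold Standard record: processed text plus a link to its source.
@dataclass(frozen=True)
class GoldRecord:
    text: str
    source_digest: str


# Toy PII pattern (US-style phone numbers only); an assumption for illustration.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def process(intake: IntakeRecord) -> GoldRecord:
    # Gold Standard step: remove PII and record where the data came from.
    text = intake.raw.decode("utf-8")
    return GoldRecord(text=PHONE.sub("[REDACTED]", text), source_digest=intake.digest)


def audit(gold: GoldRecord, intake: IntakeRecord) -> bool:
    # A processed record is auditable if its source hash matches the raw record.
    return gold.source_digest == intake.digest


def publish(gold: GoldRecord) -> dict:
    # Data Access Point: expose only processed text and its provenance.
    return {"text": gold.text, "source_sha256": gold.source_digest}


raw = IntakeRecord(b"Officer badge 1234, call 555-867-5309")
gold = process(raw)
print(publish(gold)["text"])  # Officer badge 1234, call [REDACTED]
print(audit(gold, raw))  # True
```

Keeping the raw record untouched and hashing it, rather than editing it in place, is what lets the intake archive serve as the audit baseline.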
We don't yet have much data in permanent storage, so Dolt meets our requirements for now. We'll revisit that choice if we outgrow it or need a critical capability it doesn't offer.
Dolt can't store unprocessed data, so we plan to store it on Hadoop servers as we scale.
Our servers are configured as Digital Ocean droplets. They have no user-level authentication, so we can't keep an edit history yet; only the data on DoltHub should be considered auditable.
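To illustrate why an edit history depends on user-level authentication, here is a minimal sketch of a hypothetical, hash-chained edit log. It assumes nothing about our actual servers: the `EditLog` class and its fields are invented for illustration. Each entry must name an authenticated user, and each entry's hash covers the previous entry's hash, so silently changing an earlier edit makes verification fail.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def _entry_hash(body: dict) -> str:
    # Hash a canonical JSON encoding so key order doesn't affect the result.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


# Hypothetical append-only edit log; not an existing PDAP or Digital Ocean API.
@dataclass
class EditLog:
    entries: list = field(default_factory=list)

    def record(self, user: Optional[str], change: str) -> dict:
        if not user:
            # With no authenticated user, there is no one to attribute the edit to.
            raise PermissionError("edit rejected: no authenticated user")
        prev = self.entries[-1]["hash"] if self.entries else ""
        body = {"user": user, "change": change, "prev": prev}
        entry = {**body, "hash": _entry_hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check each entry links to the one before it.
        prev = ""
        for e in self.entries:
            body = {"user": e["user"], "change": e["change"], "prev": e["prev"]}
            if e["prev"] != prev or _entry_hash(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = EditLog()
log.record("alice", "fix agency name")
log.record("bob", "add 2021 records")
print(log.verify())  # True
```

A version-controlled database like Dolt attaches similar author and history metadata to each commit, which is why DoltHub is the auditable store for now.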
Scrapers can be run from one of our Digital Ocean droplets, but we haven't solved authentication there yet either.