What is a Data Source?

A Data Source is a web page, file, database, or filing cabinet that contains records about a criminal legal system agency (law enforcement, courts, corrections), such as the Pittsburgh Bureau of Police or Allegheny County Jail. Often, the records are published by the agency itself. Many more are hidden behind records request processes.
Data Sources can be records describing an agency's activities, like traffic stops or use-of-force policies.
You can see for yourself at this public view.
To contribute Data Sources, start here!

Why is this important?

People need to be able to find data before they can do anything with it. The foundation of our work is creating a common system for classifying and tracking public data. Does your organization have a giant, unwieldy spreadsheet tracking FOIA requests and websites? You're not alone, and you can share your work with others.

Things we can build using a Data Sources database

  • Automatic archives of each URL, creating a lasting resource for future research and web scraping. As it stands, information is lost to time due to data retention policies.
  • A classification system using metadata about which records are available, how they were collected, and how they relate to other records. This is the path to doing big, complicated aggregation projects.
  • Better transparency. We can act as a hub for people who are using what's already there, finding its limits, and addressing them one by one.
  • Shared tools. When someone finds a Data Source in our database, they will also be able to see associated scrapers, extractions, and archives.
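
To make the list above a little more concrete, here is a minimal sketch of what a Data Source record and two small tools built on it might look like. Everything in it is an assumption made for illustration: the field names, the example.org URLs, and the helper functions are hypothetical and do not describe our actual schema or code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSource:
    """Hypothetical record shape; field names are illustrative only."""
    name: str
    agency: str                      # e.g. "Pittsburgh Bureau of Police"
    record_type: str                 # e.g. "traffic stops"
    url: Optional[str] = None        # None when records exist only behind a request process
    access: str = "web"              # assumed values: "web" or "records request"
    related: List[str] = field(default_factory=list)  # names of related Data Sources


def sources_for_agency(sources, agency):
    """Return every Data Source that describes the given agency."""
    return [s for s in sources if s.agency == agency]


def urls_to_archive(sources):
    """Collect the distinct URLs an archiving job would need to snapshot."""
    return sorted({s.url for s in sources if s.url})


# Made-up entries; the URLs are placeholders, not real agency pages.
catalog = [
    DataSource("Traffic stops", "Pittsburgh Bureau of Police", "traffic stops",
               url="https://example.org/pbp/traffic-stops"),
    DataSource("Use-of-force policy", "Pittsburgh Bureau of Police", "policies",
               access="records request"),
    DataSource("Jail population", "Allegheny County Jail", "population counts",
               url="https://example.org/acj/population"),
]

pittsburgh = sources_for_agency(catalog, "Pittsburgh Bureau of Police")
to_archive = urls_to_archive(catalog)
```

Keeping the agency, record type, and access method as separate fields is what would let one shared catalog answer questions like "what exists for my hometown?" and "which URLs should be archived?" without each organization maintaining its own spreadsheet.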

What kinds of Data Sources are best?

Finding all the Data Sources for your hometown is a human-sized project that can make a real impact.
If it's about a criminal legal system agency, we want to track it. This includes FOIA'd documents, web URLs, and independently scraped records.