A data pipeline to aggregate doctor disciplinary actions across the United States (Fall 2018)

Technologies used: Python, Selenium (for web scraping), EC2, S3, SQLite

About the Project

Doctor disciplinary data is siloed across states. When a doctor is disciplined in one state, the action is reported only by that state’s medical board. If the doctor then moves to another state, patients there may not learn of their physician’s prior disciplinary actions. The goal of our project was to create a data pipeline that addresses this gap.

About the Client: ProPublica

ProPublica is an American nonprofit newsroom based in New York City that aims to produce investigative journalism in the public interest.

You can find a link to their website here.


The desired effect is to aggregate doctor disciplinary data into one central database that the client can view and analyze.

The Team

Fall 2018 Team members:


  • Ability to web scrape doctor disciplinary actions in a state

  • Ability to aggregate actions in one central database

  • Ability to store PDFs of disciplinary action information

  • Ability to run scripts periodically
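
As a rough illustration of the aggregation feature above, the sketch below maps rows from two state scrapers into one shared record shape and writes them to a single SQLite table. It is not the project's actual code: the raw field names, the date formats, the `actions` table, and the functions `parse_ca`, `parse_ny`, `normalize`, and `store` are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime

# Hypothetical per-state parsers. The raw field names and date formats
# are illustrative assumptions, not the real medical board layouts.
def parse_ca(row):
    return {
        "state": "CA",
        "license_no": row["License Number"].strip(),
        "doctor_name": row["Name"].strip(),
        "action_date": datetime.strptime(row["Date"], "%m/%d/%Y").date().isoformat(),
        "action_type": row["Action"].strip(),
    }

def parse_ny(row):
    return {
        "state": "NY",
        "license_no": row["lic"].strip(),
        "doctor_name": row["physician"].strip(),
        "action_date": datetime.strptime(row["effective"], "%Y.%m.%d").date().isoformat(),
        "action_type": row["penalty"].strip(),
    }

STATE_PARSERS = {"CA": parse_ca, "NY": parse_ny}

def normalize(state, row):
    """Convert one raw scraped row into the shared record shape."""
    return STATE_PARSERS[state](row)

def store(conn, records):
    """Create the central table if needed and append the records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actions (
               state TEXT, license_no TEXT, doctor_name TEXT,
               action_date TEXT, action_type TEXT)"""
    )
    conn.executemany(
        "INSERT INTO actions VALUES "
        "(:state, :license_no, :doctor_name, :action_date, :action_type)",
        records,
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    raw = [
        ("CA", {"License Number": " A123 ", "Name": "Jane Doe",
                "Date": "10/03/2018", "Action": "Probation"}),
        ("NY", {"lic": "987654", "physician": "John Roe",
                "effective": "2018.11.15", "penalty": "Suspension"}),
    ]
    store(conn, [normalize(state, row) for state, row in raw])
    for record in conn.execute("SELECT * FROM actions ORDER BY action_date"):
        print(record)
```

Keeping one small parser per state means a new state only needs a new function and an entry in `STATE_PARSERS`; the storage code stays unchanged.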

Technical Challenges

  • Building scripts for states with very different ways of recording and displaying disciplinary actions

  • Removing duplicate data each time the script is run
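
One common way to handle the duplicate-data challenge in SQLite is to let the database reject repeats, so that re-running a scraper is harmless. The sketch below is a minimal example of that idea, not the team's documented approach: it assumes an action is uniquely identified by state, license number, date, and action type, and the table and function names (`actions`, `open_db`, `save_actions`) are hypothetical.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and make sure the actions table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actions (
               state TEXT NOT NULL,
               license_no TEXT NOT NULL,
               action_date TEXT NOT NULL,
               action_type TEXT NOT NULL,
               -- Assumed natural key: these four fields identify one action.
               UNIQUE (state, license_no, action_date, action_type)
           )"""
    )
    return conn

def count_actions(conn):
    return conn.execute("SELECT COUNT(*) FROM actions").fetchone()[0]

def save_actions(conn, rows):
    """Insert rows, silently skipping any that are already stored.

    Returns the number of rows that were actually new.
    """
    before = count_actions(conn)
    conn.executemany("INSERT OR IGNORE INTO actions VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return count_actions(conn) - before

if __name__ == "__main__":
    conn = open_db()
    scraped = [
        ("CA", "A123", "2018-10-03", "Probation"),
        ("NY", "987654", "2018-11-15", "Suspension"),
    ]
    print(save_actions(conn, scraped))  # first run stores both rows
    print(save_actions(conn, scraped))  # second run finds nothing new
```

The choice of key matters: if a state board later corrects a date or rewords an action, the corrected row would be stored as a new record under this scheme, so a real pipeline would need its own rule for such updates.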