A data pipeline to aggregate doctor disciplinary actions across the United States (Fall 2018)
Technologies used: Python, Selenium (for web scraping), EC2, S3, SQLite
About the Project
Doctor disciplinary data is siloed across states. When a doctor is disciplined in one state, the action is reported only by that state’s medical board. If the doctor then moves to another state, patients there may not receive adequate information about their physician’s prior disciplinary actions. The goal of our project was to build a data pipeline that addresses this gap.
About the Client: ProPublica
ProPublica is an American nonprofit newsroom based in New York City that aims to produce investigative journalism in the public interest.
You can find a link to their website here.
The intended outcome is a single central database of doctor disciplinary data that the client can view and analyze.
Fall 2018 Team members:
Ability to web scrape doctor disciplinary actions in a state
Ability to aggregate actions in one central database
Ability to store PDFs of disciplinary action information
Ability to run scripts periodically
Building scripts for states with very different ways of recording and showing disciplinary actions
Removing duplicate data each time a script is run
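One way to handle the duplicate-data problem above is to let SQLite reject repeats at insert time, so a script can be re-run on a schedule without a separate cleanup pass. The sketch below is only an illustration under stated assumptions: the `actions` table, its columns, and the helper names `open_db` and `store_actions` are hypothetical and are not taken from the project's code. It also assumes that a state, license number, action date, and action type together identify one disciplinary action.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
# The UNIQUE constraint defines what counts as "the same" action.
SCHEMA = """
CREATE TABLE IF NOT EXISTS actions (
    state        TEXT NOT NULL,
    license_no   TEXT NOT NULL,
    doctor_name  TEXT NOT NULL,
    action_date  TEXT NOT NULL,
    action_type  TEXT NOT NULL,
    UNIQUE (state, license_no, action_date, action_type)
)
"""


def open_db(path=":memory:"):
    """Open (or create) the central database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def count_actions(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM actions").fetchone()[0]


def store_actions(conn, rows):
    """Insert scraped rows, silently skipping ones already stored.

    Returns the number of rows that were actually new.
    """
    before = count_actions(conn)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO actions "
            "(state, license_no, doctor_name, action_date, action_type) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return count_actions(conn) - before


if __name__ == "__main__":
    conn = open_db()
    scraped = [
        ("NY", "123", "Jane Doe", "2018-09-01", "Suspension"),
        ("CA", "456", "John Roe", "2018-10-15", "Probation"),
    ]
    store_actions(conn, scraped)  # first run stores both rows
    store_actions(conn, scraped)  # a repeat run adds nothing
```

Because each state's scraper would feed rows into the same table, the same constraint covers both aggregation and deduplication. Whether those four columns are the right uniqueness key depends on how each medical board publishes its records.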