Machine Coded Data? Not Always Better Than Human Coding for Underreporting Bias
Insights from the Field
Tags: underreporting bias | state repression events | Agence France-Presse | Associated Press | machine coding | Methodology | PSR&M
Files: 1 text file, 1 archive (hosted on Dataverse)
The Prevalence and Severity of Underreporting Bias in Machine and Human Coded Data was authored by Benjamin Bagozzi, Patrick Brandt, John Freeman, Jennifer Holmes, Alisha Kim, Agustin Palao Mendizabal, and Carly Potz-Nielsen. It was published by Cambridge University Press in Political Science Research and Methods (PSR&M) in 2019.

This research investigates a common problem in textual political science data: underreporting bias. News sources often fail to report state repression events, and similar omissions can occur when human coders process those reports.

The authors apply Cook et al.'s method, which estimates the extent of unreported repression by comparing coverage across multiple sources, using the Agence France-Presse and Associated Press news datasets as examples.

The researchers then applied the same technique to machine-coded data from the Integrated Crisis Early Warning System (ICEWS) dataset. Both sets of estimates (human vs. machine coding) were evaluated against external measures of human rights protections in Africa and Colombia.
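
The core intuition of estimating unreported events by comparing overlapping sources can be illustrated with a simple two-source capture-recapture calculation. This is only a minimal sketch of the multi-source comparison idea, not the Cook et al. statistical model used in the paper, and the source labels and counts below are hypothetical:

```python
# Illustrative two-source (Lincoln-Petersen) capture-recapture estimate of the
# total number of repression events, given two partially overlapping sources.
# This is a simplified stand-in for the multi-source comparison idea, NOT the
# Cook et al. model applied in the paper; all counts are made up.

def estimate_total_events(n_source_a: int, n_source_b: int, n_both: int) -> float:
    """Estimate total events from two overlapping sources.

    n_source_a: events reported by source A (e.g., AFP-based coding)
    n_source_b: events reported by source B (e.g., AP-based coding)
    n_both:     events reported by both sources
    """
    if n_both == 0:
        raise ValueError("Need at least one event reported by both sources.")
    return n_source_a * n_source_b / n_both


# Hypothetical counts: 120 events coded from source A, 90 from source B,
# and 60 events appearing in both.
total = estimate_total_events(120, 90, 60)
observed = 120 + 90 - 60  # events seen in at least one source
print(f"Estimated total events: {total:.0f}")                   # 180
print(f"Estimated unreported events: {total - observed:.0f}")   # 30
```

The gap between the estimated total and the events observed in at least one source is the estimated extent of underreporting; the paper's model addresses the same quantity with a more rigorous statistical framework.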

The findings reveal that underreporting bias affects both human- and machine-coded data to a similar degree across different contexts, including Colombia's political landscape.

This means researchers must actively account for potentially missing events whether they analyze human-coded news reports or algorithmically coded texts.

Data
Find on Google Scholar
Find on JSTOR
Find on CUP
Political Science Research & Methods
Podcast host: Ryan