This research investigates a common problem in textual political science data: underreporting bias. News sources often fail to report state repression events, and human coders can introduce similar omissions.
Cook et al.'s method estimates the extent of unreported repression by comparing coverage across multiple sources, with the Agence France-Presse and Associated Press news datasets serving as examples.
The researchers apply this technique to machine-coded data from the World-Integrated Crisis Early Warning System dataset. The human-coded and machine-coded models are then evaluated against external measures of human rights protections in Africa and Colombia.
The findings indicate that underreporting bias affects human-coded and machine-coded data to a similar degree across contexts, including Colombia.
This means researchers must actively account for potentially missing information, whether they are analyzing news reports or algorithmically coded texts.






