Why Predicting Multiple Labels Together Improves Political Text Coding
Insights from the Field
multi-label
text-as-data
supervised learning
Mexico
human rights
Methodology
Pol. An.
Dataverse: 43 datasets, 1 text file, 27 other files
"Multi-label Prediction for Political Text-as-Data" was authored by Aaron Erlich, Stefano Dantas, Benjamin Bagozzi, Daniel Berliner and Brian Palmer-Rubin. It was published by Cambridge University Press in Political Analysis in 2022.

📌 What's the problem?

Political scientists increasingly use supervised machine learning to code multiple labels from the same texts. The current practice of training separate supervised models for each label ignores relationships among labels and is likely to under-perform as a result.
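
To make "relationships among labels" concrete, the sketch below computes the phi coefficient (the Pearson correlation for two binary variables) between every pair of labels in a small label matrix. The label names and values are hypothetical and are not taken from the paper's data. The summary does not say how the authors measured label association, so phi is only one reasonable choice.

```python
import math
from itertools import combinations

def phi(a, b):
    """Phi coefficient between two equal-length lists of 0/1 labels."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant label has no defined correlation; this sketch reports 0.0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical hand-coded labels for six documents (one list per label).
labels = {
    "health":   [1, 1, 0, 0, 1, 0],
    "budget":   [1, 1, 0, 0, 0, 0],
    "security": [0, 1, 0, 1, 1, 0],
}

# Correlation for every unordered pair of labels.
pairwise = {(p, q): phi(labels[p], labels[q]) for p, q in combinations(labels, 2)}
for (p, q), value in pairwise.items():
    print(f"{p} vs {q}: {value:.2f}")
```

Training a separate model per label never looks at this table, so any signal in it goes unused.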

🔍 What was done and how it was evaluated

A multi-label prediction framework is introduced as a solution that leverages inter-label associations when coding multiple features from the same texts. The framework is reviewed and then applied in direct comparisons with standard single-label supervised learning approaches.
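
The contrast between separate per-label models and a model that uses inter-label associations can be sketched in a few lines. The summary does not say which multi-label algorithm the authors use, so the example substitutes a classifier chain, one well-known approach in which the model for a later label receives earlier labels as extra features. The data, the names `label_a` and `label_b`, and the small logistic regression are all illustrative assumptions, and the toy data were deliberately built so that the second label depends on the first.

```python
import math

def train_logistic(X, y, epochs=5000, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus truth
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    """Return 1 when the linear score is positive, else 0."""
    w, b = model
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

def accuracy(truth, pred):
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

# Toy data (hypothetical): two binary text features and two labels.
# label_a is 1 only when both features fire; label_b is 1 when exactly one fires.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
label_a = [0, 0, 0, 1]
label_b = [0, 1, 1, 0]

# Standard practice: one independent model per label.
indep_a = train_logistic(X, label_a)
indep_b = train_logistic(X, label_b)
indep_pred_b = [predict(indep_b, x) for x in X]

# Classifier chain: the model for label_b also sees label_a as a feature
# (true values during training, predicted values at prediction time).
chain_b = train_logistic([x + [a] for x, a in zip(X, label_a)], label_b)
chain_pred_b = [predict(chain_b, x + [predict(indep_a, x)]) for x in X]

print("independent accuracy on label_b:", accuracy(label_b, indep_pred_b))
print("chain accuracy on label_b:", accuracy(label_b, chain_pred_b))
```

On this toy set a linear model cannot separate `label_b` from the two features alone, but it can once `label_a` is added as a feature. The scores are computed on the training examples, so they illustrate the mechanism and say nothing about out-of-sample performance.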

📂 Texts and coding tasks examined

  • Access-to-information requests submitted to the Mexican government
  • Country-year human rights reports

🔑 Key findings

  • Multi-label prediction outperforms standard supervised learning approaches for coding multiple labels from the same texts.
  • The performance advantage holds even in cases where correlations among the multiple labels are low, indicating benefits beyond obvious label dependencies.
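
Comparing a multi-label model with per-label models requires scores that cover all labels at once. The summary does not state which metrics the authors report, so the sketch below shows three common choices as an assumption: Hamming loss, subset (exact-match) accuracy, and micro-averaged F1. The label matrices are made up for illustration.

```python
def hamming_loss(y_true, y_pred):
    """Share of individual document-label cells that are predicted wrongly."""
    cells = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / cells

def subset_accuracy(y_true, y_pred):
    """Share of documents whose full label set is predicted exactly."""
    exact = sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred))
    return exact / len(y_true)

def micro_f1(y_true, y_pred):
    """F1 computed after pooling true/false positives and negatives over all labels."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical gold labels and predictions: 3 documents x 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]

print("Hamming loss:", round(hamming_loss(y_true, y_pred), 3))
print("Subset accuracy:", round(subset_accuracy(y_true, y_pred), 3))
print("Micro-F1:", round(micro_f1(y_true, y_pred), 3))
```

Subset accuracy is the strictest of the three because one wrong label makes the whole document count as an error, which makes it sensitive to how well a model captures combinations of labels.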

🌍 Why it matters

Multi-label prediction offers a practical improvement for text-as-data work that requires assigning multiple, potentially related labels to the same documents. Researchers and practitioners coding political texts should consider multi-label approaches to capture cross-label information and boost predictive performance.
