What's the problem?
Political scientists increasingly use supervised machine learning to code multiple labels from the same texts. The prevailing practice of training a separate supervised model for each label ignores relationships among the labels and is likely to underperform as a result.
What was done and how it was evaluated
A multi-label prediction framework is introduced that exploits inter-label associations when coding multiple features from the same texts. The framework is reviewed and then compared head-to-head with standard single-label supervised learning approaches.
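To make the contrast concrete, here is a minimal sketch in Python, assuming scikit-learn and synthetic data: `OneVsRestClassifier` stands in for the standard one-model-per-label practice, while `ClassifierChain` is one common multi-label approach that lets each label's model condition on the labels predicted before it. The data, estimators, and metric are illustrative assumptions, not the authors' exact framework.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain

# Toy stand-in for document features (X) and several hand-coded labels per text (Y).
X, Y = make_multilabel_classification(
    n_samples=2000, n_features=50, n_classes=4, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Standard practice: one independent classifier per label, ignoring label relationships.
independent = OneVsRestClassifier(LogisticRegression(max_iter=1000))
independent.fit(X_train, Y_train)

# Multi-label alternative: a classifier chain feeds earlier labels' predictions
# into later labels' models, so inter-label associations can inform the coding.
chain = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
chain.fit(X_train, Y_train)

for name, model in [("separate models", independent), ("classifier chain", chain)]:
    score = f1_score(Y_test, model.predict(X_test), average="macro")
    print(f"{name}: macro-F1 = {score:.3f}")
```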
Texts and coding tasks examined
- Access-to-information requests submitted to the Mexican government
- Country-year human rights reports
Key findings
- Multi-label prediction outperforms standard supervised learning approaches for coding multiple labels from the same texts.
- The performance advantage holds even in cases where correlations among the multiple labels are low, indicating benefits beyond obvious label dependencies.
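As a rough illustration of what "low correlations among the labels" means in practice, the sketch below computes pairwise phi coefficients (Pearson correlations on binary label indicators) from a hand-coded label matrix. The choice of measure and the reuse of `Y_train` from the earlier sketch are assumptions, not necessarily the paper's own diagnostic.

```python
import numpy as np

def label_correlations(Y):
    """Pairwise Pearson (phi) correlations among binary label columns."""
    return np.corrcoef(np.asarray(Y, dtype=float), rowvar=False)

# With a hand-coded label matrix such as Y_train from the sketch above,
# off-diagonal entries near zero indicate weakly correlated labels.
# print(label_correlations(Y_train).round(2))
```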
Why it matters
Multi-label prediction offers a practical improvement for text-as-data work that requires assigning multiple, potentially related labels to the same documents. Researchers and practitioners coding political texts should consider multi-label approaches to capture cross-label information and boost predictive performance.