Bring Multilingual Texts Into Comparative Politics With Topic Models

Insights from the Field

STM

Machine translation

Multilingual

Text analysis

Comparative politics

"Computer-Assisted Text Analysis for Comparative Politics" was authored by Christopher Lucas, Richard A. Nielsen, Margaret E. Roberts, Brandon M. Stewart, Alex Storer, and Dustin Tingley. It was published by Cambridge in Political Analysis in 2015.

🔎 What this article tackles

Recent advances in tools for systematic text analysis are opening new research possibilities across the social sciences. For comparative politics, where interest often centers on non-English or multilingual corpora, those advances can be hard to access. This article maps the practical issues that arise at each stage of multilingual text work and emphasizes how standard procedures change from one language to another.

🔧 How text processing, management, translation, and analysis differ by language

Discusses practical steps for handling textual data across languages, including preprocessing, file and metadata management, translation choices, and analytic workflows
Highlights language-specific challenges and where common pipelines need adjustment
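
To make the idea of language-specific adjustment concrete, here is a minimal Python sketch of a preprocessing step that switches rules by language. It is not the authors' software or method. The function name `preprocess`, the tiny stopword lists, and the character-bigram fallback for languages written without spaces are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword lists (illustration only).
STOPWORDS = {
    "en": {"the", "of", "and", "a", "in"},
    "es": {"el", "la", "de", "y", "en"},
}

# Illustrative subset of languages written without spaces between words.
UNSEGMENTED = {"zh", "ja"}


def preprocess(text, lang):
    """Return a list of tokens for `text`, using rules chosen by `lang`."""
    # Normalize Unicode so visually identical strings compare equal,
    # then lowercase (a no-op for scripts without letter case).
    text = unicodedata.normalize("NFKC", text).lower()

    if lang in UNSEGMENTED:
        # Whitespace gives no word boundaries here, so fall back to
        # overlapping character bigrams (an assumed stand-in for a
        # proper word segmenter).
        chars = [c for c in text if c.isalnum()]
        return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]

    # For space-delimited languages, split on word characters and
    # drop stopwords for that language, if a list is available.
    tokens = re.findall(r"\w+", text)
    stop = STOPWORDS.get(lang, set())
    return [t for t in tokens if t not in stop]


if __name__ == "__main__":
    print(preprocess("The politics of the state", "en"))
    print(preprocess("La política de la nación", "es"))
    print(preprocess("政治学", "zh"))
```

The same call handles all three inputs, but the English and Spanish paths share a tokenizer and differ only in their stopword list, while the Chinese path needs a different notion of a token altogether. A real pipeline would likely replace the bigram fallback with a dedicated segmenter and add stemming or lemmatization per language.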

🧪 Two applied demonstrations using the Structural Topic Model

Combines the procedures described into two concrete examples of automated text analysis using the recently introduced Structural Topic Model (STM)
Shows how STM can be applied both to native-language corpora and to data that have been converted into a single language via machine translation

📂 Tools and reproducibility

All described methods are implemented in open-source software packages made available by the authors

⚖️ Why it matters

Provides a practical, language-aware roadmap for comparative politics researchers who want to leverage modern automated text methods while avoiding common pitfalls when working with non-English and multilingual data.