🔍 What This Paper Does
Examines the validity of automatic speech recognition (ASR) for quantitative political text analysis, focusing on how ASR transcripts perform with standard bag-of-words methods when human transcription is unavailable or prohibitively expensive.
🧾 Where This Matters
- Political speech sources that are often not routinely transcribed: parliamentary speeches, party conferences, television interviews and talk shows, and other recorded political events
- Contexts where on-demand human transcription is cost-prohibitive for research projects
🧪 How Validity Was Tested
- Introduces a novel word error rate simulation (WERSIM) procedure to probe how transcription errors affect downstream bag-of-words analyses
- Implements WERSIM in R and uses it to simulate varying levels of ASR transcription error (a minimal illustration of the idea appears after this list)
- Applies quantitative text-analysis workflows to ASR-generated transcripts to evaluate robustness to transcription noise
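To make the simulation idea concrete, here is a minimal R sketch of how word-level errors could be injected into clean transcripts at a target rate. This is not the authors' WERSIM implementation: the function name `simulate_wer`, its arguments, the error model, and the sample texts are illustrative assumptions only.

```r
set.seed(42)

# Minimal sketch of word-error-rate simulation (NOT the released WERSIM code):
# each token is, with probability target_wer, deleted, replaced by a random
# vocabulary word, or followed by a random insertion.
simulate_wer <- function(text, target_wer = 0.2) {
  tokens <- strsplit(text, "\\s+")[[1]]
  vocab  <- unique(tokens)
  out    <- character(0)
  for (tok in tokens) {
    if (runif(1) < target_wer) {
      err <- sample(c("deletion", "substitution", "insertion"), 1)
      if (err == "deletion")     next
      if (err == "substitution") out <- c(out, sample(vocab, 1))
      if (err == "insertion")    out <- c(out, tok, sample(vocab, 1))
    } else {
      out <- c(out, tok)
    }
  }
  paste(out, collapse = " ")
}

# Hypothetical clean transcripts, used purely for illustration
clean_texts <- c(
  "the minister defended the budget and praised the reform",
  "the opposition criticised the budget and demanded a new vote",
  "members debated the reform of the electoral system at length",
  "the committee questioned the minister about spending priorities"
)

# Corrupt each document at a simulated 20% word error rate
noisy_texts <- vapply(clean_texts, simulate_wer, character(1),
                      target_wer = 0.2, USE.NAMES = FALSE)
```

Re-running a bag-of-words analysis on such corrupted texts, at several error rates, shows how sensitive the downstream estimates are to transcription noise.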
📈 Key Findings
- Demonstrates that ASR output can be combined with bag-of-words models to address open questions in political science
- Presents two substantive applications that illustrate the approach across different kinds of political speech
- Shows that systematic robustness checks (via WERSIM) are essential for interpreting results from ASR-derived text
⚠️ Limitations and Practical Challenges
- Accuracy of ASR varies by context, speaker, audio quality, and language, which affects downstream inferences
- ASR does not eliminate the need for validation: researchers must assess error sensitivity for their specific research designs
- Practical hurdles include model choice, preprocessing decisions, and integration with existing text-as-data pipelines (a minimal pipeline sketch follows this list)
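The following sketch shows one way ASR transcripts might be folded into a standard quanteda bag-of-words pipeline. The data frame `asr_df`, its columns, and every preprocessing choice are assumptions for illustration, not the paper's exact setup.

```r
library(quanteda)

# Hypothetical table of ASR output: one transcript per row
asr_df <- data.frame(
  doc_id = c("speech_01", "speech_02", "speech_03"),
  text   = c("the minister defended the budget",
             "the opposition demanded a new vote on the budget",
             "members debated the reform of the electoral system")
)

corp <- corpus(asr_df, docid_field = "doc_id", text_field = "text")

# Typical preprocessing decisions that a researcher would need to justify
dfmat <- corp |>
  tokens(remove_punct = TRUE, remove_numbers = TRUE) |>
  tokens_remove(stopwords("en")) |>
  tokens_wordstem() |>
  dfm() |>
  dfm_trim(min_termfreq = 2)  # dropping rare terms is itself a robustness lever
```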
🔧 Tools Provided
- An R implementation of WERSIM and a workflow demonstrating how to combine ASR transcripts with bag-of-words text-analysis methods (an illustrative robustness workflow is sketched below)
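As a rough picture of such a workflow, the sketch below re-estimates a Wordfish scaling model at increasing simulated error rates and compares the document positions with the error-free baseline. It reuses `simulate_wer()` and `clean_texts` from the earlier sketch; the choice of Wordfish and all object names are illustrative assumptions, not the authors' released code.

```r
library(quanteda)
library(quanteda.textmodels)

wer_levels <- c(0, 0.1, 0.2, 0.3)

# Estimated document positions at each simulated error rate
theta_by_wer <- lapply(wer_levels, function(w) {
  noisy <- vapply(clean_texts, simulate_wer, character(1),
                  target_wer = w, USE.NAMES = FALSE)
  dfmat <- dfm(tokens(corpus(noisy)))
  textmodel_wordfish(dfmat)$theta
})

# Wordfish positions are identified only up to reflection, so compare
# absolute correlations with the WER = 0 baseline
sapply(theta_by_wer, function(theta) abs(cor(theta, theta_by_wer[[1]])))
```

Reporting how these correlations decay as the simulated error rate rises is one concrete way to communicate the robustness of a finding to transcription noise.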
Why it matters: Expands the set of usable political speech sources for text-as-data research while offering a practical framework to evaluate and report the risks posed by transcription error.