🔍 What This Paper Does
Examines the validity of automatic speech recognition (ASR) for quantitative political text analysis, focusing on how ASR transcripts perform with standard bag-of-words methods when human transcription is unavailable or prohibitively expensive.
🧾 Where This Matters
- Political speech sources that are often not routinely transcribed: parliamentary speeches, party conferences, television interviews and talk shows, and other recorded political events
- Contexts where on-demand human transcription is cost-prohibitive for research projects
🧪 How Validity Was Tested
- Introduces a novel word error rate simulation (WERSIM) procedure to probe how transcription errors affect downstream bag-of-words analyses
- Implements WERSIM in R and uses it to simulate varying levels of ASR transcription error (a minimal illustration of the idea appears after this list)
- Applies quantitative text-analysis workflows to ASR-generated transcripts to evaluate robustness to transcription noise
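To make the simulation idea concrete, here is a minimal R sketch of how word-level errors could be injected into clean transcripts at a target rate. This is not the authors' WERSIM implementation: the function name `simulate_wer`, its arguments, the error model, and the sample texts are illustrative assumptions only.

```r
set.seed(42)

# Minimal sketch of word-error-rate simulation (NOT the released WERSIM code):
# each token is, with probability target_wer, deleted, replaced by a random
# vocabulary word, or followed by a random insertion.
simulate_wer <- function(text, target_wer = 0.2) {
  tokens <- strsplit(text, "\\s+")[[1]]
  vocab  <- unique(tokens)
  out    <- character(0)
  for (tok in tokens) {
    if (runif(1) < target_wer) {
      err <- sample(c("deletion", "substitution", "insertion"), 1)
      if (err == "deletion")     next
      if (err == "substitution") out <- c(out, sample(vocab, 1))
      if (err == "insertion")    out <- c(out, tok, sample(vocab, 1))
    } else {
      out <- c(out, tok)
    }
  }
  paste(out, collapse = " ")
}

# Hypothetical clean transcripts, used purely for illustration
clean_texts <- c(
  "the minister defended the budget and praised the reform",
  "the opposition criticised the budget and demanded a new vote",
  "members debated the reform of the electoral system at length",
  "the committee questioned the minister about spending priorities"
)

# Corrupt each document at a simulated 20% word error rate
noisy_texts <- vapply(clean_texts, simulate_wer, character(1),
                      target_wer = 0.2, USE.NAMES = FALSE)
```

Re-running a bag-of-words analysis on such corrupted texts, at several error rates, shows how sensitive the downstream estimates are to transcription noise.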
📈 Key Findings
- Demonstrates that ASR output can be combined with bag-of-words models to address open questions in political science
- Presents two substantive applications that illustrate the approach across different kinds of political speech
- Shows that systematic robustness checks (via WERSIM) are essential for interpreting results from ASR-derived text
⚠️ Limitations and Practical Challenges
- Accuracy of ASR varies by context, speaker, audio quality, and language, which affects downstream inferences
- ASR does not eliminate the need for validation: researchers must assess error sensitivity for their specific research designs
- Practical hurdles include model choice, preprocessing decisions, and integration with existing text-as-data pipelines (a minimal pipeline sketch follows this list)
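The following sketch shows one way ASR transcripts might be folded into a standard quanteda bag-of-words pipeline. The data frame `asr_df`, its columns, and every preprocessing choice are assumptions for illustration, not the paper's exact setup.

```r
library(quanteda)

# Hypothetical table of ASR output: one transcript per row
asr_df <- data.frame(
  doc_id = c("speech_01", "speech_02", "speech_03"),
  text   = c("the minister defended the budget",
             "the opposition demanded a new vote on the budget",
             "members debated the reform of the electoral system")
)

corp <- corpus(asr_df, docid_field = "doc_id", text_field = "text")

# Typical preprocessing decisions that a researcher would need to justify
dfmat <- corp |>
  tokens(remove_punct = TRUE, remove_numbers = TRUE) |>
  tokens_remove(stopwords("en")) |>
  tokens_wordstem() |>
  dfm() |>
  dfm_trim(min_termfreq = 2)  # dropping rare terms is itself a robustness lever
```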
🔧 Tools Provided
- An R implementation of WERSIM and a workflow demonstrating how to combine ASR transcripts with bag-of-words text-analysis methods (an illustrative robustness workflow is sketched below)
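As a rough picture of such a workflow, the sketch below re-estimates a Wordfish scaling model at increasing simulated error rates and compares the document positions with the error-free baseline. It reuses `simulate_wer()` and `clean_texts` from the earlier sketch; the choice of Wordfish and all object names are illustrative assumptions, not the authors' released code.

```r
library(quanteda)
library(quanteda.textmodels)

wer_levels <- c(0, 0.1, 0.2, 0.3)

# Estimated document positions at each simulated error rate
theta_by_wer <- lapply(wer_levels, function(w) {
  noisy <- vapply(clean_texts, simulate_wer, character(1),
                  target_wer = w, USE.NAMES = FALSE)
  dfmat <- dfm(tokens(corpus(noisy)))
  textmodel_wordfish(dfmat)$theta
})

# Wordfish positions are identified only up to reflection, so compare
# absolute correlations with the WER = 0 baseline
sapply(theta_by_wer, function(theta) abs(cor(theta, theta_by_wer[[1]])))
```

Reporting how these correlations decay as the simulated error rate rises is one concrete way to communicate the robustness of a finding to transcription noise.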
Why it matters: Expands the set of usable political speech sources for text-as-data research while offering a practical framework to evaluate and report the risks posed by transcription error.