Small Corpora, Big Insights: Word2Vec Tracks 161 Years of Shifting Meanings
Insights from the Field
word2vec
semantic change
newspapers
bootstrap
pretraining
Methodology
Pol. An.
Replication materials: 1 archive and 1 other file, hosted on Dataverse.
"A Timely Intervention: Tracking the Changing Meanings of Political Concepts With Word Vectors" was written by Emma Rodman and published by Cambridge University Press in Political Analysis (Pol. An.) in 2020.

Political text analysis faces two linked challenges: many political corpora are relatively small, and many research questions depend on tracking how word meanings change over time. Word vector methods have mostly proven themselves on massive, stable corpora, so it has been unclear whether they can recover cultural-semantic change in typical political science use cases.

📚 161 Years of Newspapers as a Gold Standard

  • A modest dataset of 161 years of U.S. newspaper coverage was assembled to serve as a human-grounded benchmark.
  • The focus was on public dialogue around the concept of equality in America, allowing comparison of algorithmic outputs to human assessments of semantic change.
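The summary does not say which statistic was used to compare the algorithmic outputs with the human assessments. As a hypothetical illustration only, one common way to measure agreement between two orderings of time periods is Spearman's rank correlation. The function names and the per-period scores below are made up for the sketch.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))


# Toy per-period scores (made up): model output vs. a human-coded ranking.
model = [0.10, 0.25, 0.40, 0.35]
human = [1, 2, 4, 3]
print(round(spearman(model, human), 6))  # prints 1.0
```

A value near 1 means the model orders the periods the way the human benchmark does. A value near -1 means the model reverses that order.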

🧪 Four Time-Sensitive Word2Vec Approaches Compared

  • Four different time-aware implementations of word2vec were evaluated against the newspaper-based gold standard.
  • The evaluation specifically addressed the methodological hurdles posed by small corpora and time-varying meanings.
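A common first step in time-aware word2vec work is to split the corpus into time slices, so that each slice can be used to train or update its own vectors. The summary does not specify how the paper defined its periods. The sketch below therefore assumes fixed-width year bins and a naive whitespace tokenizer, and `Article` and `slice_by_period` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Article:
    """A hypothetical document record: publication year plus raw text."""
    year: int
    text: str


def slice_by_period(articles, start_year: int, width: int) -> dict:
    """Group tokenized articles into consecutive `width`-year periods.

    Keys are each period's first year; values are lists of token lists,
    i.e. one training corpus per time slice.
    """
    if width <= 0:
        raise ValueError("width must be a positive number of years")
    slices = defaultdict(list)
    for art in articles:
        if art.year < start_year:
            continue  # outside the study window
        period_start = start_year + ((art.year - start_year) // width) * width
        tokens = art.text.lower().split()  # naive whitespace tokenizer (assumption)
        slices[period_start].append(tokens)
    return dict(sorted(slices.items()))


articles = [
    Article(1855, "Equality before the law"),
    Article(1862, "equal rights"),
    Article(1871, "Equality of opportunity"),
]
print(slice_by_period(articles, start_year=1855, width=10))
# prints {1855: [['equality', 'before', 'the', 'law'], ['equal', 'rights']], 1865: [['equality', 'of', 'opportunity']]}
```

Wider bins give each slice more text, which matters for small corpora, but they blur finer-grained change. The bin width is a tradeoff to choose, not a fixed setting.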

🔎 Key Findings

  • One implementation clearly outperformed the others in matching human judgments about how meanings around equality shifted over time.
  • Word2vec, when appropriately applied, can offer much more granular, temporally sensitive measures of meaning than many common text-as-data alternatives.
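One way a temporally sensitive measure can be read off per-period vectors is to track the cosine similarity between a target word and a probe word within each slice. This is a minimal sketch, not the paper's measure. The 2-d vectors are toy values, and "rights" is a hypothetical probe word.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_trajectory(period_vectors, target, probe):
    """Cosine similarity of `target` and `probe` inside each period's space.

    Both words are looked up in the same period's vectors, so no cross-period
    alignment is needed: an orthogonal rotation of a whole space leaves these
    scores unchanged. Periods where either word is missing are skipped.
    """
    trajectory = {}
    for period, vectors in sorted(period_vectors.items()):
        if target in vectors and probe in vectors:
            trajectory[period] = cosine(vectors[target], vectors[probe])
    return trajectory


# Toy 2-d vectors (made up); real word2vec vectors have far more dimensions.
period_vectors = {
    1855: {"equality": [1.0, 0.0], "rights": [0.0, 1.0]},
    1905: {"equality": [1.0, 1.0]},  # probe word absent in this slice
    1955: {"equality": [1.0, 1.0], "rights": [0.0, 1.0]},
}
for period, sim in similarity_trajectory(period_vectors, "equality", "rights").items():
    print(period, round(sim, 3))  # prints "1855 0.0", then "1955 0.707"
```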

๐Ÿ› ๏ธ Practical Recommendations for Small, Time-Series Corpora

  • Use bootstrap resampling of documents to assess stability and uncertainty in semantic estimates.
  • Pretrain vectors on larger or related corpora before time-sliced training to improve performance.
  • Prefer the tested implementation that best matched the human benchmark for similar small-corpus, longitudinal questions.
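The summary does not detail the bootstrap procedure. The sketch below shows only the general idea of resampling documents with replacement, recomputing a statistic on each resample, and reading off a percentile interval. In a real analysis the statistic would retrain word2vec on each resample and return something like a cosine similarity, which is expensive. The toy statistic here is a stand-in, and all function names are hypothetical.

```python
import random


def bootstrap_documents(docs, statistic, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap that resamples whole documents with replacement.

    Returns (point_estimate, lower, upper), where the bounds are approximate
    alpha/2 and 1 - alpha/2 percentiles of the statistic across resamples.
    """
    if not docs:
        raise ValueError("need at least one document")
    if n_boot < 1 or not 0 < alpha < 1:
        raise ValueError("need n_boot >= 1 and 0 < alpha < 1")
    rng = random.Random(seed)  # seeded so results are reproducible
    point = statistic(docs)
    draws = []
    for _ in range(n_boot):
        # Same size as the original corpus; a document can appear several times.
        sample = [rng.choice(docs) for _ in docs]
        draws.append(statistic(sample))
    draws.sort()
    lo_idx = min(n_boot - 1, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return point, draws[lo_idx], draws[hi_idx]


def share_mentioning_equality(docs):
    """Toy stand-in statistic: fraction of documents containing 'equality'."""
    return sum("equality" in d.lower().split() for d in docs) / len(docs)


docs = ["Equality now", "liberty", "equal rights", "equality of all"]
print(bootstrap_documents(docs, share_mentioning_equality))
```

The document is the unit of resampling because word2vec learns from co-occurrences within each text. Resampling whole documents keeps those contexts intact, while resampling individual tokens would break them apart.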

💡 Why It Matters

  • Demonstrates that word vector methods can be reliable for tracing semantic change in many political science contexts, not only in massive, static corpora.
  • Provides concrete best practices for researchers seeking to study cultural meanings over time using word2vec in smaller datasets.