Cross-Encoders Beat Word Methods for Short Political Texts
Insights from the Field
Tags: cross-encoder, embeddings, short texts, text similarity, Supreme Court, Methodology, AJPS
Dataverse files: 2 R files, 28 Datasets, 14 PDF, 6 Text, 12 HTML, 35 Other
"Using Cross-Encoders to Measure the Similarity of Short Texts in Political Science," by Gechun Lin, was published by Wiley in the American Journal of Political Science (AJPS) in 2025.

🔍 The Problem

Scholars often need to estimate whether two political texts convey the same meaning. Methods commonly used in political science rely heavily on shared words, which limits their ability to detect semantic equivalence. The problem is most acute for short documents, an increasingly common form of data in political research.
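
To make the word-overlap limitation concrete, here is a minimal Python sketch that is not taken from the article. It computes two common shared-word measures, Jaccard overlap and bag-of-words cosine similarity, on headlines invented for illustration. The function names and the example texts are assumptions.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Share of unique words that the two texts have in common."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between raw word-count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical headlines, written only for this illustration.
paraphrase_a = "Justices strike down state abortion law"
paraphrase_b = "Supreme Court overturns Texas restrictions on terminating pregnancies"
opposite_a = "Court upholds the state law"
opposite_b = "Court rejects the state law"

print("paraphrase pair:", jaccard(paraphrase_a, paraphrase_b), bow_cosine(paraphrase_a, paraphrase_b))
print("opposite pair:  ", round(jaccard(opposite_a, opposite_b), 2), round(bow_cosine(opposite_a, opposite_b), 2))
```

In this sketch the paraphrase pair scores 0.0 on both measures because the two headlines share no words, while the pair with opposite meanings scores much higher because it shares most of its words. Shared-word measures can therefore rank short texts in the wrong order.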

🛠️ What Was Introduced and How It Works

Building on recent advances in computer science, the article introduces cross-encoders as a tool for precise semantic similarity measurement in short texts. Key features:

  • Use of pair-level embeddings that directly model the relationship between two texts rather than embedding each text independently.
  • Availability as off-the-shelf models or as customizable models tailored to specific research tasks.
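
The contrast between pair-level and independent embeddings can be sketched with the open-source sentence-transformers library. This is an illustration under stated assumptions, not the article's own pipeline: the model names `cross-encoder/stsb-roberta-base` and `all-MiniLM-L6-v2` are publicly available checkpoints chosen for the example, and the headlines and helper names are hypothetical. Calling `main()` requires the library to be installed and downloads model weights on first use.

```python
from itertools import combinations

# Hypothetical headlines, written only for this illustration.
HEADLINES = [
    "Justices strike down state abortion law",
    "Supreme Court overturns Texas restrictions on terminating pregnancies",
    "Court upholds the state law",
]


def build_pairs(texts):
    """Return every unordered pair of texts; a cross-encoder scores pairs, not single texts."""
    return list(combinations(texts, 2))


def cross_encoder_scores(pairs, model_name="cross-encoder/stsb-roberta-base"):
    """Score each pair jointly: both texts pass through the model together."""
    # Imported lazily so build_pairs works without the library installed.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)  # assumed off-the-shelf checkpoint
    return [float(score) for score in model.predict(pairs)]


def bi_encoder_scores(pairs, model_name="all-MiniLM-L6-v2"):
    """Embed each text independently, then compare the two vectors by cosine similarity."""
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(model_name)  # assumed sentence-embedding checkpoint
    left = model.encode([a for a, _ in pairs], convert_to_tensor=True)
    right = model.encode([b for _, b in pairs], convert_to_tensor=True)
    # cos_sim returns the full left-by-right matrix; the diagonal holds each pair's score.
    return util.cos_sim(left, right).diagonal().tolist()


def main():
    pairs = build_pairs(HEADLINES)
    cross = cross_encoder_scores(pairs)
    independent = bi_encoder_scores(pairs)
    for (a, b), c, s in zip(pairs, cross, independent):
        print(f"cross={c:.2f}  sentence={s:.2f}  | {a!r} vs {b!r}")
```

The design difference is visible in the function bodies: the sentence-level approach never sees the two texts together, whereas the cross-encoder receives each pair as a single input. The cost is that a cross-encoder must run once per pair, so the number of model calls grows with the number of pairs rather than the number of texts.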

📚 How the Approach Was Tested

Performance is illustrated across three applied examples using short political texts:

  • Social messages generated in a telephone-game setup
  • News headlines about U.S. Supreme Court decisions
  • Facebook posts from members of Congress

These examples compare cross-encoders to traditional word-based techniques and to sentence-level embedding approaches.

📈 Key Findings

  • Cross-encoders, which rely on pair-level embeddings, outperform both traditional word-based techniques and sentence-level embedding approaches in all three tasks.
  • They better identify when two short texts convey the same meaning even when they share few or no words.
  • The advantage holds across diverse short-text sources (experimental messages, headlines, social media posts).

💡 Why It Matters

More accurate semantic-similarity measurement for short texts improves the validity of research that relies on headlines, social media, survey open-ends, and other brief political communications. The availability of off-the-shelf and customizable cross-encoders provides a practical path for political scientists to adopt these methods and overcome the limitations of word-overlap and sentence-level embedding approaches.

American Journal of Political Science