Probabilistic Model Streamlines Large-Scale Data Merging
Insights from the Field
probabilistic model
record linkage
administrative data
large-scale merging
Methodology
APSR
2 other files
1 PDF file
Dataverse
"Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records" was authored by Ted Enamorado, Benjamin Fifield, and Kosuke Imai. It was published by Cambridge University Press in the American Political Science Review (APSR) in 2019.

Merging large administrative data sets is challenging due to missing identifiers and inaccurate records. This paper introduces a new algorithm for probabilistic record linkage that handles these issues efficiently at scale.

Data & Methods: The authors develop a fast, scalable algorithm that implements the canonical probabilistic model of record linkage (the Fellegi-Sunter model). The method scales to millions of observations while accounting for:

  • Missing data
  • Measurement error
  • Incorporation of auxiliary information
  • Adjustment for merge uncertainty in post-merge analyses
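
To make the modeling idea concrete, the following is a minimal sketch of a two-class mixture model for record linkage fitted by EM. It is not the authors' implementation. It assumes binary agreement indicators per field, conditional independence across fields given match status, and missing values that can simply be skipped in the likelihood (missing at random). The function names `match_probability` and `fit_em`, the starting values, and the clamping constant are all hypothetical choices made for illustration.

```python
EPS = 1e-6  # illustrative bound that keeps probabilities away from 0 and 1


def _clamp(p):
    """Keep a probability strictly inside (0, 1) for numerical stability."""
    return min(max(p, EPS), 1.0 - EPS)


def match_probability(pattern, lam, m, u):
    """Posterior probability that a record pair is a match.

    pattern: tuple with 1 (fields agree), 0 (disagree), or None (missing).
    lam:     prior probability that a random pair is a match.
    m, u:    per-field agreement probabilities among matches / non-matches.
    """
    pm, pu = lam, 1.0 - lam
    for a, mj, uj in zip(pattern, m, u):
        if a is None:
            continue  # assumption: a missing comparison contributes nothing
        pm *= mj if a == 1 else 1.0 - mj
        pu *= uj if a == 1 else 1.0 - uj
    return pm / (pm + pu)


def fit_em(patterns, n_iter=200, lam=0.1, m_init=0.9, u_init=0.1):
    """Estimate (lam, m, u) by EM from a list of agreement patterns."""
    k = len(patterns[0])
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior match probability of every pair
        xi = [match_probability(g, lam, m, u) for g in patterns]
        # M-step: weighted shares, using only pairs where field j is observed
        lam = _clamp(sum(xi) / len(xi))
        for j in range(k):
            obs = [(g[j], x) for g, x in zip(patterns, xi) if g[j] is not None]
            w_m = sum(x for _, x in obs)
            w_u = sum(1.0 - x for _, x in obs)
            if w_m > 0:
                m[j] = _clamp(sum(x for a, x in obs if a == 1) / w_m)
            if w_u > 0:
                u[j] = _clamp(sum(1.0 - x for a, x in obs if a == 1) / w_u)
    return lam, m, u
```

Calling `fit_em` on a list of agreement patterns returns the estimated match rate and per-field agreement probabilities, and `match_probability` then scores any single pair. Those posterior probabilities are the quantity that a downstream analysis could use as weights to reflect merge uncertainty.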

Simulation Studies: The algorithm was evaluated in simulation studies designed to reflect realistic merging scenarios.

Real Applications: Case studies apply the method to merging campaign contribution records, survey data, and voter files. An open-source implementation, the R package fastLink, is available to researchers.

American Political Science Review