Machine vs Human Geocoding: Similar Validity, Different Errors in Colombian Rights Data
Insights from the Field
geocoding
Colombia
human-rights
spatial-probit
INLA
Methodology
Pol. An.
17 R files
15 other files
1 PDF file
9 text files
Dataverse
Human Rights Violations in Space: Assessing the External Validity of Machine Geo-coded vs. Human Geo-coded Data was authored by Logan Stundal, Benjamin Bagozzi, John Freeman, and Jennifer Holmes. It was published by Cambridge University Press in Political Analysis in 2022.

🔎 What Was Compared:

This study compares human- and machine-geocoded records of human rights violations in Colombia against an independent ground-truth source to assess external validity. Agreement rates between the two geocoding approaches are evaluated for an eight-year focal period, for three consecutive two-year subperiods, and for a selected set of (non)journalistically remote municipalities.

📊 How The Data Were Tested:

  • Event type: human rights violations in Colombia.
  • Temporal scope: one eight-year focal period and three consecutive two-year subperiods.
  • Spatial scope: nationwide with targeted analysis of (non)journalistically remote municipalities.
  • Benchmark: an independent ground-truth dataset used to measure agreement between human and machine geocodes.
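The agreement comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' replication code: the municipality names and event assignments below are toy values, and the real study works with full event records rather than simple label lists.

```python
# Minimal sketch: municipality-level agreement between two geocoded
# event lists and an independent ground-truth source.

def agreement_rate(coded_a, coded_b):
    """Share of events assigned to the same municipality by both sources."""
    assert len(coded_a) == len(coded_b)
    matches = sum(a == b for a, b in zip(coded_a, coded_b))
    return matches / len(coded_a)

# Toy assignments of the same five events to municipalities
machine = ["Bogota", "Cali", "Medellin", "Cali", "Tumaco"]
human   = ["Bogota", "Cali", "Medellin", "Pasto", "Tumaco"]
truth   = ["Bogota", "Cali", "Medellin", "Pasto", "Tumaco"]

print(agreement_rate(machine, human))  # machine vs. human: 0.8
print(agreement_rate(machine, truth))  # machine vs. ground truth: 0.8
print(agreement_rate(human, truth))    # human vs. ground truth: 1.0
```

The same rate can be computed within subperiods or within the (non)journalistically remote municipalities by filtering the event lists first.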

🧮 How The Models Compared Predictive Performance:

Spatial probit models were estimated separately on each of the three datasets to compare predictive patterns. These models incorporate Gaussian Markov Random Field (GMRF) error processes, are constructed via a stochastic partial differential equation (SPDE) approach, and are estimated using integrated nested Laplace approximation (INLA). The models test whether datasets:

  • Produce comparable predictions;
  • Underreport events relative to the same covariates; and
  • Share similar patterns of prediction error.
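The third question, whether the datasets share prediction errors, can be illustrated with a toy calculation. This sketch stands in for the paper's SPDE/INLA spatial probit analysis: it takes predicted probabilities from two hypothetical models as given, flags per-unit classification errors, and measures how much the two error sets overlap. The outcomes, probabilities, and 0.5 cutoff are all illustrative assumptions.

```python
# Sketch: do two models fitted to different datasets err on the
# same spatial units? Overlap near 1 means shared errors; overlap
# near 0 means distinct error patterns despite similar accuracy.

def error_indicators(y_obs, p_hat, cutoff=0.5):
    """1 where the model's classification misses the observed outcome."""
    return [int((p >= cutoff) != bool(y)) for y, p in zip(y_obs, p_hat)]

def error_overlap(err_a, err_b):
    """Jaccard overlap of the two models' error sets across units."""
    both = sum(a and b for a, b in zip(err_a, err_b))
    either = sum(a or b for a, b in zip(err_a, err_b))
    return both / either if either else 1.0

y = [1, 0, 0, 1, 1, 0]                      # observed violations by unit
p_machine = [0.7, 0.2, 0.6, 0.4, 0.8, 0.1]  # errs on units 2 and 3
p_human   = [0.9, 0.6, 0.3, 0.2, 0.7, 0.2]  # errs on units 1 and 3

overlap = error_overlap(error_indicators(y, p_machine),
                        error_indicators(y, p_human))
print(round(overlap, 3))  # 0.333: both models err, but mostly on different units
```

In the paper's setting the error indicators come from spatial probit fits, so the interesting comparison is not just the overlap but where on the map the non-overlapping errors fall.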

🔑 Key Findings:

  • Agreement analysis against the ground truth shows that machine- and human-geocoded datasets are comparable in terms of external validity for this subnational conflict.
  • Geostatistical (spatial probit) models reveal that prediction errors differ in important respects across the datasets, indicating distinct spatial error structures despite similar overall validity.

🌍 Why It Matters:

These results caution researchers and practitioners: machine-geocoded event data can be externally valid at the subnational level, but spatially structured prediction errors may affect inference and mapping of conflict risk. Choosing between human and machine geocoding should consider not only agreement with ground truth but also how geocoding method shapes spatial error patterns and subsequent model-based predictions.
