Categorical Data Favor Conditional Multiple Imputation Over MVN
Insights from the Field
multiple imputation
multivariate normal
conditional imputation
categorical data
ANES
Methodology
Pol. An.
"Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches" was authored by Jonathan Kropko, Ben Goodrich, Andrew Gelman, and Jennifer Hill. It was published by Cambridge in Political Analysis (Pol. An.) in 2014.

📌 What Was Compared

This study compares two common approaches to multiple imputation (MI). Joint multivariate normal (MVN) MI models the complete data as a sample from a joint multivariate normal distribution and typically treats discrete categories as probabilistic draws from underlying continuous values. Conditional MI instead models each variable conditional on all the others.
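To make the contrast concrete, here is a minimal sketch of how the two approaches might fill in one missing binary variable `y` from one fully observed binary predictor `x`. It is not the authors' implementation. The function names, the toy data, the clipping step, and the add-one smoothing are all assumptions made for illustration, and a full MI procedure would also draw the model parameters from their posterior and repeat the process over several completed datasets.

```python
import random
import statistics

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def impute_mvn_style(x_obs, y_obs, x_mis, rng):
    """MVN-style (assumed reading): treat binary y as continuous, draw a
    continuous value, then use it as the probability of a 0/1 draw."""
    a, b, sd = fit_linear(x_obs, y_obs)
    out = []
    for x in x_mis:
        latent = rng.gauss(a + b * x, sd)   # continuous draw, may leave [0, 1]
        p = min(1.0, max(0.0, latent))      # clip so it can serve as a probability
        out.append(1 if rng.random() < p else 0)
    return out

def impute_conditional(x_obs, y_obs, x_mis, rng):
    """Conditional-style: model binary y directly as Bernoulli given x."""
    out = []
    for x in x_mis:
        ys = [y for xo, y in zip(x_obs, y_obs) if xo == x]
        p = (sum(ys) + 1) / (len(ys) + 2)   # add-one smoothed share of ones
        out.append(1 if rng.random() < p else 0)
    return out

# Toy data (hypothetical): P(y = 1) is 0.8 when x = 1 and 0.2 when x = 0.
rng = random.Random(42)
x_all = [rng.randint(0, 1) for _ in range(300)]
y_all = [1 if rng.random() < (0.8 if x else 0.2) else 0 for x in x_all]
x_obs, y_obs = x_all[:200], y_all[:200]   # cases where y is observed
x_mis = x_all[200:]                       # cases where y must be imputed

mvn_draws = impute_mvn_style(x_obs, y_obs, x_mis, rng)
cond_draws = impute_conditional(x_obs, y_obs, x_mis, rng)
```

The conditional version never leaves the 0/1 scale, while the MVN-style version has to map a continuous draw back onto it. That mapping is the step the summary identifies as a likely source of error.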

📊 How The Comparison Was Done

  • Two performance targets were assessed:
      • Accuracy of the imputed values themselves
      • Accuracy of coefficients and fitted values from analysis models run on completed datasets
  • Simulations covered a range of variable types:
      • Continuous, binary, ordinal, and unordered-categorical
  • Two simulation sources were used:
      • Synthetic data drawn from a multivariate normal distribution
      • Realistic data drawn from the 2008 American National Election Studies (ANES)
  • Missingness was generated to satisfy the conditions for data to be Missing At Random (MAR), a less restrictive and more realistic setup than is often used in missing-data simulation studies.
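As an illustration of the MAR condition, the sketch below deletes values of `y` with a probability that depends only on a fully observed variable `x`, never on `y` itself. This is a simplified example and not the paper's simulation design. The function name `make_mar`, the logistic form, and the `intercept` and `slope` values are assumptions chosen for the example.

```python
import math
import random

def make_mar(rows, rng, intercept=-1.0, slope=1.5):
    """Return rows with y set to None with probability logistic(intercept + slope * x).

    The missingness probability uses only x, which stays fully observed,
    so the mechanism is MAR with respect to x.
    """
    out = []
    for x, y in rows:
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        out.append((x, None if rng.random() < p_missing else y))
    return out

# Toy data (hypothetical): two independent standard normal variables.
rng = random.Random(0)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
masked = make_mar(rows, rng)

def missing_share(pairs):
    """Fraction of (x, y) pairs whose y has been removed."""
    return sum(1 for _, y in pairs if y is None) / len(pairs)

share_high_x = missing_share([r for r in masked if r[0] > 0])
share_low_x = missing_share([r for r in masked if r[0] <= 0])
```

With a positive `slope`, cases with larger `x` lose `y` more often, so an analysis restricted to complete cases would over-represent low values of `x`.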

🔍 What Was Found

  • In these simulations, conditional MI produced more accurate imputations and more accurate analysis results than joint MVN MI whenever the dataset included categorical variables.
  • The advantage of conditional MI held across both the MVN-generated data and the ANES-based simulations, and across binary, ordinal, and unordered-categorical variables.
  • Joint MVN MI’s common practice of treating discrete outcomes as derived from continuous latent values appears to undercut its accuracy when categorical variables are present.

💡 Why It Matters

These results imply that applied researchers using MI on datasets with any categorical variables should favor conditional imputation approaches over standard joint MVN implementations, because conditional MI yields more accurate imputations and downstream inferences under realistic MAR conditions.
