📌 What Was Compared
This study compares two common multiple imputation (MI) approaches. Joint multivariate normal (MVN) MI models the complete data as a sample from a single multivariate normal distribution, and its implementations typically handle discrete variables by drawing continuous latent values and converting them into categories. Conditional MI instead models each variable conditional on all of the others, so every variable can be imputed with a model matched to its type.
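To make the contrast concrete, here is a minimal toy sketch of the two ideas for a single binary variable with missing values. It uses hand-rolled models and scikit-learn rather than the packages the study itself evaluates; the variable names, the 0.5 rounding rule, and the two-variable setup are illustrative assumptions, not the paper's procedure.

```python
# Toy sketch (not the paper's code): one fully observed continuous predictor x
# and one binary variable y with roughly 30% of its values missing.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                                   # fully observed
y = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))          # binary, depends on x
miss = rng.random(n) < 0.3
y_obs = y.astype(float)
y_obs[miss] = np.nan
obs = ~miss
X_obs, X_miss = x[obs].reshape(-1, 1), x[miss].reshape(-1, 1)

# Joint-MVN flavour: treat (x, y) as jointly normal, draw y for the missing
# cases from the conditional normal given x, then force the continuous draw
# back into {0, 1} by rounding.
lin = LinearRegression().fit(X_obs, y_obs[obs])
resid_sd = np.std(y_obs[obs] - lin.predict(X_obs))
latent = lin.predict(X_miss) + rng.normal(scale=resid_sd, size=miss.sum())
y_mvn = y_obs.copy()
y_mvn[miss] = (latent > 0.5).astype(float)

# Conditional flavour: model y given x with a model that respects y's type
# (logistic regression here) and draw imputations as Bernoulli outcomes.
logit = LogisticRegression().fit(X_obs, y[obs])
p_hat = logit.predict_proba(X_miss)[:, 1]
y_cond = y_obs.copy()
y_cond[miss] = rng.binomial(1, p_hat).astype(float)
```

Real joint MVN MI estimates one mean vector and covariance matrix for all variables at once rather than working pairwise as above, but the essential contrast survives the simplification: the joint model reaches categories through continuous latent draws, while the conditional model lets each variable keep an imputation model suited to its type.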
📊 How The Comparison Was Done
- Two performance targets were assessed:
- Accuracy of the imputed values themselves
- Accuracy of coefficients and fitted values from analysis models run on completed datasets
- Simulations covered a range of variable types:
- Continuous, binary, ordinal, and unordered-categorical
- Two simulation sources were used:
- Synthetic data drawn from a multivariate normal distribution
- Realistic data drawn from the 2008 American National Election Studies (ANES)
- Missingness was generated so that the Missing At Random (MAR) conditions held by construction, with the probability of a value being missing depending only on observed values of other variables; this is a less restrictive and more realistic setup than the Missing Completely At Random designs often used in missing-data simulation studies (the sketch after this list illustrates the MAR setup together with the two accuracy targets above).
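Below is a compact sketch of how such a design can be wired up, covering the MAR missingness rule and both accuracy targets from the list above. The variable names, the logistic missingness rule, and the use of scikit-learn's IterativeImputer to produce a single completed dataset are my illustrative assumptions, not the paper's actual simulation code.

```python
# Sketch of MAR missingness generation plus the two accuracy targets.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                       # always observed
x = 0.8 * z + rng.normal(size=n)             # will receive missing values

# MAR: the probability that x is missing depends only on the observed z,
# not on the possibly unseen value of x itself (that would be MNAR) and
# not on pure chance alone (that would be MCAR).
p_miss = 1 / (1 + np.exp(-(z - 0.5)))
miss = rng.random(n) < p_miss
data = np.column_stack([z, x])
data[miss, 1] = np.nan

# Target 1: accuracy of the imputed values against the held-back truth.
completed = IterativeImputer(random_state=1).fit_transform(data)
rmse_imputed = np.sqrt(np.mean((completed[miss, 1] - x[miss]) ** 2))

# Target 2: accuracy of an analysis-model coefficient estimated from the
# completed data, compared with the coefficient from the full data.
beta_full = LinearRegression().fit(z.reshape(-1, 1), x).coef_[0]
beta_imputed = LinearRegression().fit(completed[:, [0]], completed[:, 1]).coef_[0]
print(rmse_imputed, beta_full, beta_imputed)
```

A full MI workflow would repeat the imputation step several times and pool the analysis-model estimates across the completed datasets; a single completed dataset is shown here only to keep the sketch short.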
🔍 What Was Found
- In these simulations, conditional MI produced more accurate imputations and more accurate analysis results than joint MVN MI whenever the dataset included categorical variables.
- The advantage of conditional MI held across both the MVN-generated data and the ANES-based simulations, and across binary, ordinal, and unordered-categorical variables.
- Joint MVN MI’s common practice of treating discrete outcomes as derived from continuous latent values appears to undercut its accuracy when categorical variables are present.
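As a toy illustration of that last point (my own simplification, not an example from the paper), take a binary variable and impute it marginally the latent-continuous way: fit a normal with the observed mean and variance, draw a latent value, and round at 0.5. The imputed rate drifts away from the true rate, and the size and direction of the drift depend on that rate.

```python
import numpy as np

rng = np.random.default_rng(2)
for p_true in (0.1, 0.2, 0.4):
    y = rng.binomial(1, p_true, size=200_000).astype(float)
    # Normal model with the observed mean and variance, latent draw, rounding.
    latent = rng.normal(loc=y.mean(), scale=y.std(), size=y.size)
    p_rounded = (latent > 0.5).mean()
    print(p_true, round(p_rounded, 3))   # e.g. 0.1 -> ~0.09, 0.2 -> ~0.23, 0.4 -> ~0.42
```

Practical joint MVN implementations condition on the other variables rather than imputing marginally, but the rounding step is where this kind of distortion can creep in.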
💡 Why It Matters
These results suggest that applied researchers using MI on datasets containing any categorical variables should favor conditional imputation approaches over standard joint MVN implementations: in these simulations, conditional MI yielded more accurate imputations and more accurate downstream inferences under realistic MAR conditions.