🧩 What the model tackles:
Classical generalized linear models (GLMs) assume that marginal effects are homogeneous across units once observed covariates are taken into account. That assumption may fail in practice. Building on recent proposals that use Dirichlet process priors to allow for latent heterogeneity, this paper introduces a hierarchical Dirichlet process of generalized linear models in which latent heterogeneity can depend on context-level features.
📊 How context enters the model:
- The hierarchical Dirichlet process lets covariate effects vary across latent subgroups.
- Crucially, the probability of subgroup membership is modeled as a function of context- or country-level features, so heterogeneity is context-dependent rather than purely random.
- This structure is especially relevant for comparative analyses where data are pooled from multiple countries and country-level characteristics may shape latent effect variation.
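The idea of context-dependent subgroup membership can be sketched as a small simulation. This is a simplified illustration, not the paper's specification: it assumes a fixed number of two latent subgroups and a softmax link from a single country-level feature to membership probabilities, in place of the hierarchical Dirichlet process. The names `simulate_context_mixture`, `gamma`, and `beta`, and all numeric values, are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def simulate_context_mixture(n_countries=20, n_per_country=100, seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical country-level feature (for example, a standardized index).
    w = rng.normal(size=n_countries)
    W = np.column_stack([np.ones(n_countries), w])   # (countries, 2)

    # Assumed gating coefficients: rows = (intercept, w), columns = subgroups.
    # Subgroup 0 is the reference; a larger w raises the odds of subgroup 1.
    gamma = np.array([[0.0, 0.0],
                      [0.0, 2.0]])
    pi = softmax(W @ gamma)                          # (countries, 2)

    # Assumed subgroup-specific (intercept, slope): the covariate effect
    # has opposite signs in the two latent subgroups.
    beta = np.array([[0.0, 1.5],
                     [0.0, -1.5]])

    country = np.repeat(np.arange(n_countries), n_per_country)
    x = rng.normal(size=country.size)

    # Each unit draws its latent subgroup with its own country's probabilities.
    z = (rng.random(country.size) < pi[country, 1]).astype(int)

    # Gaussian outcome with a subgroup-specific linear predictor.
    y = beta[z, 0] + beta[z, 1] * x + rng.normal(scale=0.5, size=country.size)
    return {"w": w, "pi": pi, "country": country, "x": x, "z": z, "y": y}

if __name__ == "__main__":
    sim = simulate_context_mixture()
    print(sim["pi"][:5])   # membership probabilities for the first five countries
```

In this sketch the effect of `x` is not random noise around a common slope: which slope a unit receives depends, through `pi`, on the country it belongs to.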
⚙️ Computation and estimation:
- A general Gibbs sampler is provided for the full hierarchical model.
- A specialized, efficient Gibbs sampler is developed for Gaussian outcome variables.
- For discrete outcomes, estimation uses a Hamiltonian Monte Carlo-within-Gibbs scheme to handle nonconjugacy and improve sampling efficiency.
🔬 Evidence and demonstrations:
- A Monte Carlo exercise illustrates the importance of accounting for latent, context-dependent heterogeneity.
- Two empirical applications replicate recent scholarly analyses and show how inference changes when heterogeneity is modeled as context-dependent rather than ignored.
- The results demonstrate that Simpson's paradox can appear in empirical analysis when latent heterogeneity is ignored, and that the proposed model can recover and quantify heterogeneity in covariate effects.
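The way Simpson's paradox arises when a latent grouping is ignored can be shown with a textbook-style simulation. This is not the paper's Monte Carlo design or either of its applications: the two subgroups, their covariate means, intercepts, and slopes are all assumed values chosen to make the reversal visible, and `simpson_demo` is a hypothetical helper.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a least-squares fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def simpson_demo(n=1000, seed=0):
    rng = np.random.default_rng(seed)

    # Latent subgroup label: n units in each of two subgroups.
    z = np.repeat([0, 1], n)

    # Assumed design: subgroup 0 has low x and a high intercept,
    # subgroup 1 has high x and a low intercept.
    x = rng.normal(loc=np.where(z == 0, -2.0, 2.0), scale=1.0)
    intercept = np.where(z == 0, 6.0, -6.0)

    # The within-subgroup effect of x is +1 in both subgroups.
    y = intercept + 1.0 * x + rng.normal(scale=1.0, size=z.size)

    return {
        "pooled": ols_slope(x, y),                  # ignores the latent grouping
        "group0": ols_slope(x[z == 0], y[z == 0]),
        "group1": ols_slope(x[z == 1], y[z == 1]),
    }

if __name__ == "__main__":
    for name, slope in simpson_demo().items():
        print(f"{name}: {slope:+.2f}")
```

Under these assumed settings the pooled slope is negative even though the slope is positive inside each subgroup, because the between-subgroup differences in `x` and in the intercept dominate the pooled covariance. A model that recovers the latent grouping would report the within-subgroup effects instead.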
💡 Why this matters:
- The approach generalizes existing Dirichlet-process GLM methods by linking latent grouping to observable context features, improving interpretation in comparative settings.
- The provided estimation algorithms make the model practical for both continuous and discrete outcomes, enabling researchers to detect and estimate context-driven effect heterogeneity that standard GLMs would miss.