🧾 What Dressel and Farid Reported
Dressel and Farid (2018) collected recidivism predictions from Amazon Mechanical Turk participants and argued that these nonexperts can match algorithmic approaches in both predictive accuracy and fairness. Their claim has been taken to cast broad doubt on the value of algorithmic recidivism prediction.
🔍 How the Same Data Was Reassessed
The reanalysis applies additional evaluation techniques to the original Dressel and Farid dataset, comparing the outputs of statistical learning procedures with the MTurkers' assessments. Its focus is the quality of the predicted probabilities produced by the models versus the judgments provided by the nonexpert respondents.
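To make the comparison concrete, here is a minimal sketch of the kind of statistical learning procedure at issue: a logistic regression that outputs predicted probabilities rather than hard labels. The data and feature names (age, prior offenses) are synthetic stand-ins, not the variables or models from the study.

```python
# Minimal sketch (synthetic data): fit a simple statistical learning model
# and extract predicted probabilities, the quantity the reanalysis evaluates.
# Feature names are hypothetical and are not the study's variables.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(18, 70, n)    # hypothetical defendant age
priors = rng.poisson(2.0, n)     # hypothetical count of prior offenses
X = np.column_stack([age, priors])

# Synthetic outcome loosely tied to the features, for illustration only.
logits = -1.0 + 0.4 * priors - 0.02 * age
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Predicted probabilities of recidivism, not just hard 0/1 labels.
p_model = model.predict_proba(X_test)[:, 1]
print(p_model[:5])
```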
📊 Key Comparisons and Evidence
- Data source: Responses from Amazon Mechanical Turk used in Dressel and Farid (2018).
- Comparison target: Predicted probabilities from statistical learning procedures versus MTurkers' evaluations.
- Evaluation approach: Additional techniques and metrics beyond those reported in the original paper, aimed at the quality of the predicted probabilities rather than accuracy alone; a sketch of such metrics follows this list.
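Proper scoring rules such as the Brier score and log loss are standard metrics for probability quality; whether these are the exact metrics used in the reanalysis is an assumption here. The sketch below applies them to synthetic stand-ins for model probabilities and crowd judgments.

```python
# Minimal sketch of proper scoring rules for probability quality; whether
# these exact metrics were used in the reanalysis is an assumption.
# y_true, p_model, and p_crowd below are synthetic stand-ins.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.4, 500)
# Hypothetical pattern: sharper model probabilities, noisier crowd judgments.
p_model = np.clip(0.4 + 0.3 * (y_true - 0.4) + rng.normal(0, 0.10, 500), 0.01, 0.99)
p_crowd = np.clip(0.5 + 0.1 * (y_true - 0.4) + rng.normal(0, 0.20, 500), 0.01, 0.99)

for name, p in [("model", p_model), ("crowd", p_crowd)]:
    # Lower is better for both proper scoring rules.
    print(f"{name}: Brier = {brier_score_loss(y_true, p):.3f}, "
          f"log loss = {log_loss(y_true, p):.3f}")
```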
📈 Main Findings
- The metrics presented show that statistical learning approaches outperform nonexpert crowd judgments on important dimensions of probabilistic prediction.
- The superiority pertains to the quality of the predicted probabilities rather than only headline accuracy measures (see the sketch after this list).
- Given these results, the conclusion that Dressel and Farid's findings "cast significant doubt on the entire effort of algorithmic recidivism prediction" is difficult to accept.
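To see why probability quality can diverge from headline accuracy, the following synthetic sketch shows two predictors with identical accuracy but very different Brier scores; the numbers are illustrative only.

```python
# Synthetic illustration: two predictors with identical headline accuracy
# but very different probability quality (Brier score; lower is better).
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

y_true = np.array([1, 1, 1, 0, 0, 0])
p_sharp  = np.array([0.90, 0.90, 0.40, 0.10, 0.10, 0.60])  # confident probabilities
p_hedged = np.array([0.60, 0.60, 0.45, 0.40, 0.40, 0.55])  # near-0.5 guesses

for name, p in [("sharp", p_sharp), ("hedged", p_hedged)]:
    # Both threshold to the same labels, so accuracy is identical,
    # yet the Brier scores differ substantially.
    acc = accuracy_score(y_true, (p >= 0.5).astype(int))
    print(f"{name}: accuracy = {acc:.2f}, Brier = {brier_score_loss(y_true, p):.3f}")
```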
⚖️ Why This Matters
These reanalyses clarify that careful evaluation of probabilistic outputs can reveal advantages of statistical models missed by headline comparisons. For policy and research on recidivism prediction, comparing the full distributional and calibration properties of model predictions, not just raw accuracy, changes the inference about whether algorithms are meaningfully inferior to nonexperts.
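As a minimal illustration of what a calibration check involves, the sketch below bins predicted probabilities and compares them with observed outcome rates using scikit-learn's `calibration_curve`; the data is synthetic and constructed to be well calibrated, and nothing here reproduces the study's analysis.

```python
# Minimal sketch of a calibration check on synthetic, well-calibrated data:
# outcomes are drawn from the stated probabilities, so the binned observed
# rates should track the mean predicted probabilities closely.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)
p = rng.uniform(0.05, 0.95, 2000)   # predicted probabilities
y_true = rng.binomial(1, p)         # outcomes drawn from those probabilities

frac_pos, mean_pred = calibration_curve(y_true, p, n_bins=10)
for m, f in zip(mean_pred, frac_pos):
    print(f"mean predicted {m:.2f} -> observed rate {f:.2f}")
```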