🧾 What Dressel and Farid Reported
Dressel and Farid (2018) collected recidivism predictions from Amazon Mechanical Turk participants and argued that these nonexperts can match algorithmic approaches in both predictive accuracy and fairness. Their claim has been taken to cast broad doubt on the value of algorithmic recidivism prediction.
🔍 How the Same Data Was Reassessed
The reanalysis applies additional evaluation techniques to the original Dressel and Farid dataset, comparing the outputs of statistical learning procedures with the MTurkers' assessments. Its focus is the quality of the predicted probabilities produced by the models versus the judgments provided by the nonexpert respondents.
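To make the comparison concrete, here is a minimal sketch of the kind of statistical learning procedure at issue: a logistic regression that outputs predicted probabilities rather than hard labels. The data and feature names (age, prior offenses) are synthetic stand-ins, not the variables or models from the study.

```python
# Minimal sketch (synthetic data): fit a simple statistical learning model
# and extract predicted probabilities, the quantity the reanalysis evaluates.
# Feature names are hypothetical and are not the study's variables.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(18, 70, n)    # hypothetical defendant age
priors = rng.poisson(2.0, n)     # hypothetical count of prior offenses
X = np.column_stack([age, priors])

# Synthetic outcome loosely tied to the features, for illustration only.
logits = -1.0 + 0.4 * priors - 0.02 * age
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Predicted probabilities of recidivism, not just hard 0/1 labels.
p_model = model.predict_proba(X_test)[:, 1]
print(p_model[:5])
```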
📊 Key Comparisons and Evidence
- Data source: Responses from Amazon Mechanical Turk used in Dressel and Farid (2018).
- Comparison target: Predicted probabilities from statistical learning procedures versus MTurkers' evaluations.
- Evaluation approach: Additional techniques and metrics beyond those reported in the original paper, aimed at the quality of the predicted probabilities rather than accuracy alone; a sketch of such metrics follows this list.
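Proper scoring rules such as the Brier score and log loss are standard metrics for probability quality; whether these are the exact metrics used in the reanalysis is an assumption here. The sketch below applies them to synthetic stand-ins for model probabilities and crowd judgments.

```python
# Minimal sketch of proper scoring rules for probability quality; whether
# these exact metrics were used in the reanalysis is an assumption.
# y_true, p_model, and p_crowd below are synthetic stand-ins.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.4, 500)
# Hypothetical pattern: sharper model probabilities, noisier crowd judgments.
p_model = np.clip(0.4 + 0.3 * (y_true - 0.4) + rng.normal(0, 0.10, 500), 0.01, 0.99)
p_crowd = np.clip(0.5 + 0.1 * (y_true - 0.4) + rng.normal(0, 0.20, 500), 0.01, 0.99)

for name, p in [("model", p_model), ("crowd", p_crowd)]:
    # Lower is better for both proper scoring rules.
    print(f"{name}: Brier = {brier_score_loss(y_true, p):.3f}, "
          f"log loss = {log_loss(y_true, p):.3f}")
```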
📈 Main Findings
- The metrics presented show that statistical learning approaches outperform nonexpert crowd judgments on important dimensions of probabilistic prediction.
- The superiority pertains to the quality of the predicted probabilities rather than only headline accuracy measures (see the sketch after this list).
- Given these results, the conclusion that Dressel and Farid's findings "cast significant doubt on the entire effort of algorithmic recidivism prediction" is difficult to accept.
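To see why probability quality can diverge from headline accuracy, the following synthetic sketch shows two predictors with identical accuracy but very different Brier scores; the numbers are illustrative only.

```python
# Synthetic illustration: two predictors with identical headline accuracy
# but very different probability quality (Brier score; lower is better).
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

y_true = np.array([1, 1, 1, 0, 0, 0])
p_sharp  = np.array([0.90, 0.90, 0.40, 0.10, 0.10, 0.60])  # confident probabilities
p_hedged = np.array([0.60, 0.60, 0.45, 0.40, 0.40, 0.55])  # near-0.5 guesses

for name, p in [("sharp", p_sharp), ("hedged", p_hedged)]:
    # Both threshold to the same labels, so accuracy is identical,
    # yet the Brier scores differ substantially.
    acc = accuracy_score(y_true, (p >= 0.5).astype(int))
    print(f"{name}: accuracy = {acc:.2f}, Brier = {brier_score_loss(y_true, p):.3f}")
```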
⚖️ Why This Matters
These reanalyses clarify that careful evaluation of probabilistic outputs can reveal advantages of statistical models missed by headline comparisons. For policy and research on recidivism prediction, comparing the full distributional and calibration properties of model predictions, not just raw accuracy, changes the inference about whether algorithms are meaningfully inferior to nonexperts.
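As a minimal illustration of what a calibration check involves, the sketch below bins predicted probabilities and compares them with observed outcome rates using scikit-learn's `calibration_curve`; the data is synthetic and constructed to be well calibrated, and nothing here reproduces the study's analysis.

```python
# Minimal sketch of a calibration check on synthetic, well-calibrated data:
# outcomes are drawn from the stated probabilities, so the binned observed
# rates should track the mean predicted probabilities closely.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)
p = rng.uniform(0.05, 0.95, 2000)   # predicted probabilities
y_true = rng.binomial(1, p)         # outcomes drawn from those probabilities

frac_pos, mean_pred = calibration_curve(y_true, p, n_bins=10)
for m, f in zip(mean_pred, frac_pos):
    print(f"mean predicted {m:.2f} -> observed rate {f:.2f}")
```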