Model-X knockoffs in the replication crisis era: Reducing false discoveries and researcher bias in social science research

Jing Zhou, Sebastian Scherr

Research output: Contribution to journal › Article › peer-review


Abstract

The present study addresses problems that data-driven social science faces when there is either too much or too little data. In particular, an abundance of data, or a (sudden) lack thereof, makes it challenging to identify the most important predictors in a sea of noise using the most parsimonious and reproducible model possible. In this article, we introduce the model-X knockoff method of Candès et al. (2018), which reduces the false identification of significant effects due to flexibility-ambiguity issues, to a broader audience, particularly within the social sciences and humanities. Our goal is to provide an accessible starting point and, ideally, to spark interest among researchers in these fields in exploring how model-X knockoffs can enhance their work. The findings of a performance-contrast simulation indicate that model-X knockoffs select fewer variables than other automatic variable-selection methods and, as a result, make fewer false discoveries. The simulation also shows that model-X knockoffs are stable and less sensitive to even small changes in the dataset than competing procedures, making them a viable way to reduce researcher degrees of freedom and increase the reproducibility of scientific findings. A real-data example further demonstrates the practical utility of the method.
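To make the procedure concrete, the following is a minimal Python/NumPy sketch of the model-X knockoff filter for Gaussian predictors with a known covariance matrix, following the general construction in Candès et al. (2018). The simulated data, parameter values, and variable names are illustrative assumptions and are not taken from the article's simulation study.

```python
# Minimal sketch of the model-X knockoff filter (Gaussian predictors, known covariance).
# All data and settings below are illustrative, not the article's simulation design.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Simulated data: n observations, p predictors, only the first k truly relevant.
n, p, k, q = 500, 60, 10, 0.10                 # q = target false discovery rate
rho = 0.3
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1) covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.normal(size=n)

# Construct equicorrelated Gaussian knockoffs:
# X_tilde | X ~ N(X (I - Sigma^{-1} D), 2D - D Sigma^{-1} D) with D = diag(s).
s = min(1.0, 2 * np.min(np.linalg.eigvalsh(Sigma)))   # keeps the joint covariance PSD
D = s * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)
cond_mean = X - X @ Sigma_inv @ D
cond_cov = 2 * D - D @ Sigma_inv @ D
L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
X_tilde = cond_mean + rng.normal(size=(n, p)) @ L.T

# Importance statistics: lasso coefficient differences W_j = |b_j| - |b~_j|.
lasso = LassoCV(cv=5).fit(np.hstack([X, X_tilde]), y)
W = np.abs(lasso.coef_[:p]) - np.abs(lasso.coef_[p:])

# Knockoff+ threshold: smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.
tau = np.inf
for t in np.sort(np.abs(W[W != 0])):
    if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
        tau = t
        break

selected = np.where(W >= tau)[0]
print("Selected predictors:", selected)
```

In this sketch, the selected set is read off by comparing each original predictor's importance with that of its knockoff copy; the data-dependent threshold is what provides the finite-sample false discovery rate guarantee that motivates the method.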
Original language: English
Article number: 101380
Journal: Social Sciences & Humanities Open
Volume: 11
Early online date: 28 Feb 2025
DOIs
Publication status: Published - 2025

Keywords

  • Big data
  • High dimensional statistics
  • Model-X knockoffs
  • Multivariate linear regression
  • Simulation study
  • True model
  • Variable selection
