Large Language Models in Survey Research: Generating Synthetic Data and Unlocking New Possibilities

Fabio Yoshio Suguri Motoki, Januario Monteiro, Ricardo Malagueño, Victor Rodrigues

Research output: Working paper › Preprint

Abstract

This study examines the potential of large language models (LLMs) to generate synthetic survey data for organizational research. We propose a structured framework for prompting LLMs to simulate human-like responses, incorporating persona creation and impulse variables to enhance variability. Using previously validated constructs in the organizational deviance literature, we assess whether synthetic data exhibits response patterns aligned with theoretical expectations. Our findings suggest that LLMs can produce structured data that approximates real-world constructs, paving the way for more efficient pre-testing and refinement of survey instruments. While our results highlight promising potential, they also reveal key challenges, including response homogeneity, overestimated reliability, and model-specific biases. Ethical considerations, such as bias propagation and transparency, further emphasize the need for careful application. As LLMs continue to advance, their role in methodological innovation may expand, enabling researchers to explore new avenues in survey-based studies. This study represents a foundational step in integrating synthetic data into organizational research, broadening methodological possibilities while acknowledging current limitations.
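The abstract outlines a prompting framework built on persona creation and impulse variables to generate varied synthetic survey responses. As a rough illustration only, the sketch below shows what such a setup might look like; the `simulate_response` helper, the `impulse` parameter, the example survey item, and the `gpt-4o-mini` model are assumptions for demonstration, not the authors' actual framework or instrument.

```python
"""Illustrative sketch of persona-based synthetic survey generation.

NOT the authors' implementation: the model name, the `impulse`
variable, and the Likert item below are assumed for illustration.
"""
import random
from openai import OpenAI  # assumes the official openai>=1.0 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical survey item in the style of an organizational-deviance scale.
ITEM = "I sometimes take longer breaks than permitted. (1 = never, 7 = daily)"

def simulate_response(persona: str, impulse: float, item: str = ITEM) -> str:
    """Ask the LLM to answer one survey item in character.

    `impulse` is a hypothetical noise variable (0-1) meant to increase
    between-respondent variability, as the abstract describes.
    """
    prompt = (
        f"You are {persona}. Your impulsiveness today is {impulse:.2f} on a 0-1 scale.\n"
        f"Answer the following survey item with a single integer from 1 to 7.\n\n{item}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-capable model would do
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # higher temperature to reduce response homogeneity
    )
    return reply.choices[0].message.content.strip()

if __name__ == "__main__":
    personas = [
        "a 34-year-old warehouse supervisor who values rules",
        "a 26-year-old sales associate who feels underpaid",
    ]
    for p in personas:
        print(p, "->", simulate_response(p, impulse=random.random()))
```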
Original language: English
Publisher: SSRN
Publication status: Published - 6 Nov 2023
