Episode

Human Psychometric Questionnaires Mischaracterize LLM Behavior

Podcast: Daily Paper Cast
Published: Jun 10, 2026
Duration seconds: 1508
Processing state: not_requested
Canonical source: https://share.transistor.fm/s/30fb1fb6
Audio: https://media.transistor.fm/30fb1fb6/cbba53b5.mp3
JSON: /v1/public/podcasts/daily-paper-cast-7079649/episodes/human-psychometric-questionnaires-mischaracterize-llm-behavior
Markdown: /podcast/daily-paper-cast-7079649/human-psychometric-questionnaires-mischaracterize-llm-behavior.md

Actions

POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/human-psychometric-questionnaires-mischaracterize-llm-behavior/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/daily-paper-cast-7079649/human-psychometric-questionnaires-mischaracterize-llm-behavior.md
Read the agent-friendly Markdown representation of this episode resource.
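
A minimal sketch of how the two actions above could be built with Python's standard library. The URLs are taken from the listing. The empty POST body, the absence of authentication headers, and the helper names `build_transcription_request` and `build_markdown_request` are assumptions for illustration, since the page does not specify them.

```python
import urllib.request

BASE = "https://stenobird.com"
EPISODE_SLUG = "human-psychometric-questionnaires-mischaracterize-llm-behavior"
PODCAST_SLUG = "daily-paper-cast-7079649"


def build_transcription_request() -> urllib.request.Request:
    """Build (but do not send) the POST that requests a transcript.

    Assumption: the endpoint accepts an empty body and needs no auth header.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST_SLUG}"
        f"/episodes/{EPISODE_SLUG}/transcription-requests"
    )
    return urllib.request.Request(url, data=b"", method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build (but do not send) the GET for the Markdown representation."""
    url = f"{BASE}/podcast/{PODCAST_SLUG}/{EPISODE_SLUG}.md"
    return urllib.request.Request(url, method="GET")
```

Passing either object to `urllib.request.urlopen` would send the request. Because the POST is described as idempotent, repeating it should not queue duplicate transcription jobs.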

Summary

🤗 Upvotes: 31 | cs.CL, cs.AI
Authors: Woojung Song, Dongmin Choi, Yoonah Park, Jongwook Han, Eun-Ju Lee, Yohan Jo
Title: Human Psychometric Questionnaires Mischaracterize LLM Behavior
Arxiv: http://arxiv.org/abs/2509.10078v4

Abstract: We examine whether human psychometric questionnaires can serve as reliable tools for characterizing and predicting LLM behavior in everyday user interactions. We analyze eight open-source LLMs by comparing their value and personality profiles derived from two different methods: Likert self-reports on established questionnaires (PVQ-40/21 and BFI-44/10) and generation probabilities over value-laden responses to everyday user queries. The two profiles diverge substantially. Within-construct item consistency, often cited as evidence of stable LLM dispositions, disappears in generation probabilities. We attribute this gap to the fact that explicit lexical cues in established questionnaire items allow models to recognize the target construct and respond in alignment-consistent, socially desirable ways, whereas realistic user queries provide no such cues. In addition, demographic persona prompts shift models' responses to human questionnaires in ways consistent with real human patterns, but no such shifts appear in the generation probabilities of responses to realistic user queries, showing their limited ability to simulate the behaviors of target demographics in real-world user interactions. Overall, our study shows that human psychometric questionnaires are insufficient tools for predicting LLM behavior and suggests generation-based profiling as a more accurate measure.
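
The abstract refers to within-construct item consistency without naming a statistic. One common way to quantify it is Cronbach's alpha. The sketch below assumes that measure; the paper may use a different one. The function name `cronbach_alpha` and the layout of the input (one row per prompt or run, one column per item of the same construct) are hypothetical choices for illustration.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix of scores.

    Rows are observations (e.g. repeated runs or prompts); columns are
    items assumed to measure the same construct. Uses sample variance.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 observations")

    # Variance of each item across observations.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the per-observation total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total score has zero variance; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

Under this assumption, the same function could be applied once to Likert ratings and once to per-item scores derived from generation probabilities. A high alpha in the first case and a low alpha in the second would correspond to the pattern the abstract describes.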