# Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

- Page: https://stenobird.com/podcast/daily-paper-cast-7079649/perception-or-prejudice-can-mllms-go-beyond-first-impressions-of-personality
- Text version: https://stenobird.com/podcast/daily-paper-cast-7079649/perception-or-prejudice-can-mllms-go-beyond-first-impressions-of-personality.md
- Podcast: [Daily Paper Cast](https://stenobird.com/podcast/daily-paper-cast-7079649)
- Published: 2026-05-23T04:29:24+00:00
- Episode link: https://share.transistor.fm/s/05a1e45c
- Audio file: https://media.transistor.fm/05a1e45c/de8ed868.mp3
- Processing state: not_requested
- JSON: https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/perception-or-prejudice-can-mllms-go-beyond-first-impressions-of-personality
- Duration seconds: 1433

## Resource

🤗 Upvotes: 152 | cs.AI, cs.CV, cs.CY

Authors: Caixin Kang, Tianyu Yan, Sitong Gong, Mingfang Zhang, Liangyang Ouyang, Ruicong Liu, Bo Zheng, Huchuan Lu, Kaipeng Zhang, Yoichi Sato, Yifei Huang

Title: Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Arxiv: http://arxiv.org/abs/2605.22109v1

Abstract: Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding.
(ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/perception-or-prejudice-can-mllms-go-beyond-first-impressions-of-personality/transcription-requests`. Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/daily-paper-cast-7079649/perception-or-prejudice-can-mllms-go-beyond-first-impressions-of-personality.md`. Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
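
For agents that want to invoke the two actions listed under Actions, a minimal Python sketch follows. It only builds the requests and does not send them. The URLs and HTTP methods are taken from this page; the helper names (`transcript_request`, `markdown_request`) are hypothetical, and the sketch assumes that no request body or authentication header is required, which this page does not state.

```python
from urllib.request import Request

# Values copied from the URLs on this page.
BASE = "https://stenobird.com"
PODCAST = "daily-paper-cast-7079649"
EPISODE = (
    "perception-or-prejudice-can-mllms-go-beyond-"
    "first-impressions-of-personality"
)


def transcript_request() -> Request:
    """Build (without sending) the POST for the request_transcript action."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    # Assumption: an empty POST is enough; the page does not document a body.
    return Request(url, method="POST")


def markdown_request() -> Request:
    """Build (without sending) the GET for the read_markdown action."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")
```

Passing either object to `urllib.request.urlopen` would send it. Because the page describes `request_transcript` as idempotent, repeating the POST should not enqueue duplicate work, though the response format is not documented here.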