Blog: Why should we be careful when using LLMs as human participants?
LLMs are increasingly being proposed as stand-ins for human participants in surveys and user studies. However, the paper by Wang et al. [1], 'Large language models that replace human participants can harmfully misportray and flatten identity groups', shows that there is an inherent flaw in how these LLMs are currently trained that makes them misportray and flatten the representation of identity groups. The authors therefore urge caution in use cases where LLMs are intended to replace human participants.
There are three main limitations to using LLMs as replacements for human participants:
- Misportrayal of identity groups
  - LLMs are trained on text scraped from the internet, where the text is rarely attributed to a specific demographic identity. For example, most articles online do not mention the demographic background of the author, so it is difficult to associate a text with a particular demographic identity. Even when a demographic identity is mentioned in the text, the passage could come from an in-group member (e.g. a Black person writing about themselves) or an out-group member (e.g. a White person writing about Black people).
  - This causes LLMs to misportray identity groups, i.e. the LLM acts out what out-group members think of a group rather than what in-group members think of themselves.
  - This is harmful because it can reinforce stereotypes about these demographics and also reinforces the practice of speaking for others. For example, in the disability community, out-group members (such as guardians and relatives of disabled people) are often more vocal than in-group members, so any LLM-generated representation of disabled people is likely to reflect out-group views rather than the views of in-group members.
- Flattening representation of identity groups
  - LLMs are trained with likelihood-based loss functions such as cross-entropy, which reward the model for producing the most likely text outputs (a toy illustration follows this list).
  - This leads the LLM to flatten the representation of identity groups, i.e. to erase sub-group heterogeneity (e.g. within women, Black women differ from White women).
  - This is especially harmful for historically marginalized groups (such as Black people), who have often been represented as one-dimensional personalities in the media.
- Identity essentialization
  - When we prompt these LLMs with identities, we inherently essentialize identity as a factor of difference. Yet, depending on the task, initializing these LLMs with sensitive demographic characteristics may be unnecessary.
  - Using identity prompts leads to identity essentialization, where identities are reduced to a fixed and rigid set of characteristics. Such a reductionist representation of people is harmful.
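To make the flattening point concrete, below is a minimal sketch (a toy example with made-up numbers, not taken from the paper) of how cross-entropy training combined with likelihood-maximizing decoding can erase a minority sub-group's perspective:

```python
# Toy illustration: cross-entropy training + likelihood-maximizing decoding
# can flatten sub-group heterogeneity. All numbers here are invented.
import numpy as np

# Hypothetical corpus: responses attributed to one identity group,
# where 80% reflect one sub-group's perspective ("A") and 20% another ("B").
responses = np.array(["A"] * 80 + ["B"] * 20)
vocab = ["A", "B"]

# Empirical distribution induced by the corpus.
counts = np.array([(responses == t).sum() for t in vocab], dtype=float)
p_data = counts / counts.sum()  # [0.8, 0.2]

def cross_entropy(q):
    """Expected negative log-likelihood of the data under model distribution q."""
    return -(p_data * np.log(q)).sum()

# Cross-entropy is minimized when the model matches the data distribution...
print(cross_entropy(np.array([0.8, 0.2])))  # ~0.50 (optimal)
print(cross_entropy(np.array([0.5, 0.5])))  # ~0.69 (worse)

# ...but picking the single most likely output (temperature -> 0) always
# emits the majority response, erasing the 20% minority perspective entirely.
q_model = np.array([0.8, 0.2])
print(vocab[int(np.argmax(q_model))])  # always "A"

# Sampling at higher temperatures spreads probability mass back out,
# which is one of the mitigations the authors suggest (see the list below).
def sample(q, temperature, n=10, seed=0):
    rng = np.random.default_rng(seed)
    scaled = q ** (1.0 / temperature)
    probs = scaled / scaled.sum()
    return rng.choice(vocab, size=n, p=probs)

print(sample(q_model, temperature=1.0))  # draws from [0.80, 0.20]
print(sample(q_model, temperature=2.0))  # draws from [0.67, 0.33], more diverse
```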
Thus, the authors suggest being careful when using LLMs as human participants. However, when the goal is to supplement rather than replace human participants, they offer some approaches to mitigate these limitations (a short prompt sketch follows the list):
- Use identity-coded names (e.g. Darnell Pierre) instead of demographic labels (e.g. 'a Black person'). The authors found this generates representations that are closer to in-group responses.
- Use higher temperature values and other decoding techniques that improve generation diversity, yielding a wider range of outputs.
- Do not use identity prompts when they are unnecessary. In tasks where identity does not play a role, opt for other prompts such as behavioral personas or astrology signs.
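As a rough illustration of these three mitigations, here is a sketch assuming an OpenAI-style chat API; the prompts and model name are hypothetical placeholders, not the paper's actual setup:

```python
# Sketch of the three mitigations above, assuming the OpenAI Python SDK (v1).
# Prompts and model name are illustrative placeholders only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt, temperature=1.0):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # higher values give more diverse outputs
    )
    return response.choices[0].message.content

# 1. Identity-coded name instead of a demographic label.
label_prompt = "You are a Black person. Describe your ideal weekend."
name_prompt = "You are Darnell Pierre. Describe your ideal weekend."

# 2. Higher temperature to avoid collapsing onto the single most likely answer.
diverse_answers = [ask(name_prompt, temperature=1.2) for _ in range(5)]

# 3. When identity is irrelevant to the task, use a non-demographic persona.
persona_prompt = "You are a meticulous planner. Describe your ideal weekend."
print(ask(persona_prompt))
```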
I really enjoyed reading this paper, and I recommend checking it out for the full methodology and discussion.
- Written by Divya Mani Adhikari