OpenTalks #74 | 大语言模型的内隐偏见

文摘   2024-07-08 16:30   江苏  

OpenTalks是OpenScience的学术策划小组与neurochat团队联合组织的在线学术交流活动,旨在促进研究者之间的交流。我们将邀请国内外的研究者,尤其是青年研究者,进行在线学术报告与讨论。学术报告的主题涉及可重复性、神经影像等。分享语言以中文为主,分享人偏好英文时,将使用英文。

OpenTalks希望增加青年研究者的曝光度,增加青年研究者的影响力,因此也非常欢迎青年研究者自荐作为报告人,报告主题不一定需要与Open Science直接相关。


报告信息

题目





Measuring Implicit Bias in Explicitly Unbiased Large Language Models

大语言模型的内隐偏见

报告语言



中文

分享嘉宾


Xuechunzi Bai
University of Chicago


Dr. Bai is an incoming assistant professor in the Department of Psychology at the University of Chicago, affiliated with the Center for Decision Research at Booth School of Business, Computational Social Science, and Cognitive Science. She studies dynamic social minds: the interplay between individual decision processes and societal phenomena in the field of social cognition. Her current work explores the psychological origins of social stereotypes. https://www.xuechunzibai.com/

 


Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: as LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks. Both measures are based on psychological research: LLM Implicit Bias adapts the Implicit Association Test, widely used to study the automatic associations between concepts held in human minds; and LLM Decision Bias operationalizes psychological results indicating that relative evaluations between two candidates, not absolute evaluations assessing each independently, are more diagnostic of implicit biases. Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories (race, gender, religion, health) in 21 stereotypes (such as race and criminality, race and weapons, gender and science, age and negativity). Our prompt-based LLM Implicit Bias measure correlates with existing language model embedding-based bias methods, but better predicts downstream behaviors measured by LLM Decision Bias. These new prompt-based measures draw from psychology's long history of research into measuring stereotype biases based on purely observable behavior; they expose nuanced biases in proprietary value-aligned LLMs that appear unbiased according to standard benchmarks.

时间


北京时间[GMT+8] 7月9日(周二) 21:00~22:00

新加坡时间[GMT+8] 7月9日(周二) 21:00~22:00
欧洲中部时间[CET] 7月9日(周二) 15:00~16:00

美国东部时间[EDT] 7月9日(周二) 9:00~10:00

Zoom信息


Meeting ID: 863 0404 9478(房间容量100人)


报告流程


报告30~40分钟, 提问10~20分钟

主持人




胡传鹏

(南京师范大学心理学院)

其他


本次线上报告提供录

COSN_live

https://space.bilibili.com/252509184




组织团队(按名字首字母倒序排列)
COSN 学术策划小组
张晗(博士), A*STAR, Singapore
张磊(博士), University of Birmingham, UK
楊毓芳(博士), Freie Universität Berlin, Germany
杨金骉(博士), MPI Psycholinguistics, the Netherlands
徐婷(博士), Child Mind Institute, USA
肖钦予, University of Vienna, Austria
王鑫迪(博士), 待业
王庆(博士), 上海精神卫生中心
鲁彬(博士), 中科院心理所
刘泉影(博士), 南方科技大学
金淑娴, University of Sussex, UK
金海洋(博士), 浙江理工大学
胡传鹏(博士), 南京师范大学
耿海洋(博士), 天桥脑科学研究院
葛鉴桥(博士), 北京大学
高梦宇(博士), 北京师范大学
陈志毅(博士), 第三军医大学
陈妍秀(博士), 中科院心理所
陈骥(博士), 上海交通大学
曹淼(博士), 澳大利亚国家影像设施

neurochat团队
张洳源(上海交通大学)
彭玉佳(北京大学)
应浩江(苏州大学)
徐婷(Child Mind Institute, USA)
王鑫迪(待业)
鲁彬(中国科学院心理研究所)
孔祥祯(浙江大学)
胡传鹏(南京师范大学)



更多精彩推荐,请关注我们



排版:董海龙

审核:胡传鹏

OpenScience
Chinese Open Science Network, a network for Transparent, Open, & Reproducible Science.开放科学中文社区,传播透明、开放和可重复的基础研究理念和实践。
 最新文章