OpenTalks #74 | Implicit Bias in Large Language Models


OpenTalks is an online academic exchange series jointly organized by the OpenScience academic planning group and the neurochat team, aiming to promote exchange among researchers. We invite researchers from China and abroad, especially early-career researchers, to give online talks followed by discussion. Topics cover reproducibility, neuroimaging, and related areas. Talks are given primarily in Chinese; when a speaker prefers English, English is used.

OpenTalks aims to raise the visibility and influence of early-career researchers, so we warmly welcome early-career researchers to nominate themselves as speakers; talk topics need not be directly related to Open Science.


Talk Information

Title

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

Language

Chinese

Speaker

Xuechunzi Bai
University of Chicago


Dr. Bai is an incoming assistant professor in the Department of Psychology at the University of Chicago, affiliated with the Center for Decision Research at the Booth School of Business, Computational Social Science, and Cognitive Science. She studies dynamic social minds: the interplay between individual decision processes and societal phenomena in social cognition. Her current work explores the psychological origins of social stereotypes. https://www.xuechunzibai.com/


Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: as LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks. Both measures are based on psychological research: LLM Implicit Bias adapts the Implicit Association Test, widely used to study the automatic associations between concepts held in human minds; and LLM Decision Bias operationalizes psychological results indicating that relative evaluations between two candidates, not absolute evaluations assessing each independently, are more diagnostic of implicit biases. Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories (race, gender, religion, health) in 21 stereotypes (such as race and criminality, race and weapons, gender and science, age and negativity). Our prompt-based LLM Implicit Bias measure correlates with existing language model embedding-based bias methods, but better predicts downstream behaviors measured by LLM Decision Bias. These new prompt-based measures draw from psychology's long history of research into measuring stereotype biases based on purely observable behavior; they expose nuanced biases in proprietary value-aligned LLMs that appear unbiased according to standard benchmarks.
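As a rough illustration of what such a prompt-based probe can look like, the sketch below mimics the IAT-style association prompt described in the abstract. It is a minimal sketch only: the `generate` callable stands in for any chat-LLM call, and the prompt wording, parsing, and scoring are hypothetical simplifications, not the authors' exact protocol.

```python
# Minimal sketch of an IAT-style, prompt-based association probe
# (illustrative only; not the exact protocol of Bai et al.).
from collections import Counter
from typing import Callable, List


def iat_style_prompt(groups: List[str], attributes: List[str]) -> str:
    """Build a prompt asking the model to pair each attribute with a group."""
    return (
        f"Here are two words: {groups[0]} and {groups[1]}. "
        f"For each word in this list: {', '.join(attributes)}, "
        "write the word and the one of the two it goes with more naturally, "
        "one 'attribute: group' pair per line."
    )


def implicit_association_score(generate: Callable[[str], str],
                               groups: List[str],
                               attributes: List[str]) -> float:
    """Fraction of attributes the model pairs with groups[0].

    0.5 means no preference; values far from 0.5 suggest an implicit
    association between one group and this attribute set.
    """
    reply = generate(iat_style_prompt(groups, attributes))
    counts: Counter = Counter()
    valid = {g.lower() for g in groups}
    for line in reply.splitlines():
        if ":" not in line:
            continue  # skip lines that are not 'attribute: group' pairs
        _, group = line.split(":", 1)
        group = group.strip().lower()
        if group in valid:
            counts[group] += 1
    total = sum(counts.values())
    return counts[groups[0].lower()] / total if total else 0.5
```

With a real client, `generate` would wrap a chat-completion call. The paper's companion measure, LLM Decision Bias, would then check whether the same associations surface when the model chooses between two candidates (a relative evaluation) rather than rating each one alone.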

Time


Beijing Time [GMT+8]: Tuesday, July 9, 21:00~22:00

Singapore Time [GMT+8]: Tuesday, July 9, 21:00~22:00

Central European Summer Time [CEST]: Tuesday, July 9, 15:00~16:00

US Eastern Time [EDT]: Tuesday, July 9, 9:00~10:00

Zoom Information


Meeting ID: 863 0404 9478 (room capacity: 100)


Format


Talk: 30~40 minutes; Q&A: 10~20 minutes

Host


胡传鹏

(School of Psychology, Nanjing Normal University)

Other


A recording of this online talk will be available on the COSN_live Bilibili channel:

https://space.bilibili.com/252509184




Organizing Team (in reverse alphabetical order by name)
COSN Academic Planning Group
张晗 (PhD), A*STAR, Singapore
张磊 (PhD), University of Birmingham, UK
楊毓芳 (PhD), Freie Universität Berlin, Germany
杨金骉 (PhD), MPI Psycholinguistics, the Netherlands
徐婷 (PhD), Child Mind Institute, USA
肖钦予, University of Vienna, Austria
王鑫迪 (PhD), unaffiliated
王庆 (PhD), Shanghai Mental Health Center
鲁彬 (PhD), Institute of Psychology, Chinese Academy of Sciences
刘泉影 (PhD), Southern University of Science and Technology
金淑娴, University of Sussex, UK
金海洋 (PhD), Zhejiang Sci-Tech University
胡传鹏 (PhD), Nanjing Normal University
耿海洋 (PhD), Tianqiao and Chrissy Chen Institute
葛鉴桥 (PhD), Peking University
高梦宇 (PhD), Beijing Normal University
陈志毅 (PhD), Third Military Medical University
陈妍秀 (PhD), Institute of Psychology, Chinese Academy of Sciences
陈骥 (PhD), Shanghai Jiao Tong University
曹淼 (PhD), Peking University

neurochat Team
张文昊 (UT Southwestern Medical Center, USA)
张洳源 (Shanghai Jiao Tong University)
张磊 (University of Birmingham, UK)
应浩江 (Soochow University)
徐婷 (Child Mind Institute, USA)
王鑫迪 (unaffiliated)
滕相斌 (The Chinese University of Hong Kong)
鲁彬 (Institute of Psychology, Chinese Academy of Sciences)
孔祥祯 (Zhejiang University)
胡传鹏 (Nanjing Normal University)
邸新 (New Jersey Institute of Technology, USA)






Layout: 董海龙

Review: 胡传鹏


OpenScience
Chinese Open Science Network: a community for transparent, open, and reproducible science, promoting these ideas and practices in basic research.