SmartFlowAI
SmartFlowAI (机智流) Top Conference & Journal Discussion Group
Full text: about 2,400 characters; estimated reading time: 6 minutes.
From November 12 to 16, EMNLP 2024 is being held in Florida, USA, and this year's best papers were announced in the early hours of today, Beijing time. This post rounds up the five papers that won Best Paper. We will continue to publish roundups of highly cited EMNLP 2024 work in different areas; reply "盘点" in the chat box of the SmartFlowAI (机智流) official account to join the conference paper roundup discussion group.
1. An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
2. Towards Robust Speech Representation Learning for Thousands of Languages
3. Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
4. Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
5. CoGen: Learning from Feedback with Coupled Comprehension and Generation
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
https://arxiv.org/pdf/2404.01247
Summary: With the rise of multimedia content, human translators increasingly adapt not only words but also other modalities such as images for cultural relevance, so as to convey the same meaning, yet machine translation systems remain confined to handling language in speech and text. This paper takes a first step towards translating images to make them culturally relevant. It builds three pipelines composed of state-of-the-art generative models to perform the task and constructs a two-part evaluation dataset (a concept part with 600 cross-culturally coherent images and an application part with 100 images curated from real-world applications), then conducts a multi-faceted human evaluation of the translated images to assess cultural relevance and meaning preservation. The results show that current image-editing models fail at this task but can be improved by bringing LLMs and retrievers into the loop; the best pipelines translate only 5% of images for some countries even in the easier concept dataset, and fail to produce any successful translation for some countries in the application dataset, highlighting how challenging the task is. Code and data are released at the URL given in the paper.
Abstract: Given the rise of multimedia content, human translators increasingly focus on culturally adapting not only words but also other modalities such as images to convey the same meaning. While several applications stand to benefit from this, machine translation systems remain confined to dealing with language in speech and text. In this work, we take a first step towards translating images to make them culturally relevant. First, we build three pipelines comprising state-of-the-art generative models to do the task. Next, we build a two-part evaluation dataset: i) concept: comprising 600 images that are cross-culturally coherent, focusing on a single concept per image, and ii) application: comprising 100 images curated from real-world applications. We conduct a multi-faceted human evaluation of translated images to assess for cultural relevance and meaning preservation. We find that as of today, image-editing models fail at this task, but can be improved by leveraging LLMs and retrievers in the loop. Best pipelines can only translate 5% of images for some countries in the easier concept dataset and no translation is successful for some countries in the application dataset, highlighting the challenging nature of the task. Our code and data is released here: this https URL.
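As a rough illustration of the kind of "LLMs and retrievers in the loop" pipeline the abstract alludes to, here is a minimal sketch. The caption_model, llm, retriever, and editor interfaces are hypothetical assumptions, not the authors' actual implementation.

```python
def transcreate_image(image, target_country, caption_model, llm, retriever, editor):
    """Sketch of an LLM-and-retriever-in-the-loop transcreation pipeline (illustrative)."""
    # 1) Describe what the source image depicts.
    caption = caption_model.describe(image)  # e.g. "a plate of pancakes"
    # 2) Ask an LLM for a culturally equivalent concept for the target audience.
    concept = llm.ask(
        f"Suggest a culturally equivalent replacement for '{caption}' "
        f"for an audience in {target_country}. Answer with a short noun phrase."
    )
    # 3) Prefer retrieving a real image of the adapted concept...
    hits = retriever.search(concept, country=target_country)
    if hits:
        return hits[0]
    # 4) ...otherwise fall back to instructing an image-editing model.
    return editor.edit(image, instruction=f"Replace the main subject with {concept}.")
```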
Towards Robust Speech Representation Learning for Thousands of Languages
https://arxiv.org/abs/2407.00837
Summary: Self-supervised learning (SSL) has reduced the need for labeled data and helped extend speech technologies to more languages, but models remain far from supporting the world's 7,000+ languages. This paper proposes XEUS, a Cross-lingual Encoder for Universal Speech, trained on more than 1 million hours of data spanning 4,057 languages, extending the language coverage of SSL models four-fold. It combines 1 million hours of speech from existing public corpora with a newly created corpus of 7,400+ hours covering 4,057 languages (to be publicly released), and, to handle the diverse conditions of multilingual speech data, augments the standard SSL masked-prediction objective with a new dereverberation objective for added robustness. XEUS outperforms or matches state-of-the-art (SOTA) SSL models on multiple benchmarks and sets a new SOTA on ML-SUPERB, beating MMS 1B and w2v-BERT 2.0 v2 by 0.8% and 4.4% respectively despite having fewer parameters or less pre-training data. Checkpoints, code, and data are available at the URL given in the paper.
Abstract: Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 million hours of speech from existing publicly accessible corpora with a newly created corpus of 7400+ hours from 4057 languages, which will be publicly released. To handle the diverse conditions of multilingual speech data, we augment the typical SSL masked prediction approach with a novel dereverberation objective, increasing robustness. We evaluate XEUS on several benchmarks, and show that it consistently outperforms or achieves comparable results to state-of-the-art (SOTA) SSL models across a variety of tasks. XEUS sets a new SOTA on the ML-SUPERB benchmark: it outperforms MMS 1B and w2v-BERT 2.0 v2 by 0.8% and 4.4% respectively, despite having less parameters or pre-training data. Checkpoints, code, and data are found in this https URL.
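The abstract describes augmenting masked prediction with a dereverberation objective. Below is a minimal, hedged sketch of how such a combined loss might look, assuming a hypothetical model with encode, prediction_head, and dereverb_head components; the paper's actual architecture and loss may differ.

```python
import torch.nn.functional as F

def ssl_loss(model, reverberant_wave, clean_feats, mask, codebook_targets, w_derev=1.0):
    """Masked prediction plus a dereverberation term (illustrative only)."""
    hidden = model.encode(reverberant_wave)       # frame-level representations, shape (T, d)
    # 1) Standard SSL masked prediction: predict discrete targets at masked frames.
    logits = model.prediction_head(hidden[mask])  # (num_masked, codebook_size)
    l_masked = F.cross_entropy(logits, codebook_targets[mask])
    # 2) Dereverberation objective: reconstruct clean ("dry") features from the
    #    reverberant input, encouraging robustness to recording conditions.
    recon = model.dereverb_head(hidden)           # (T, feat_dim)
    l_derev = F.l1_loss(recon, clean_feats)
    return l_masked + w_derev * l_derev
```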
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
https://arxiv.org/abs/2402.12865
Summary: Understanding how Transformer-based language models (LMs) learn and recall information is a key goal of deep learning research. Recent interpretability methods project weights and hidden states from the forward pass onto the model's vocabulary, helping to reveal how information flows within LMs. This work extends that methodology to the backward pass and gradients: it first proves that a gradient matrix can be cast as a low-rank linear combination of the forward and backward passes' inputs, then develops methods to project these gradients onto vocabulary items, exploring how new information is stored in LM neurons.
Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the models' vocabularies, helping to uncover how information flows within LMs. In this work, we extend this methodology to LMs' backward pass and gradients. We first prove that a gradient matrix can be cast as a low-rank linear combination of its forward and backward passes' inputs. We then develop methods to project these gradients into vocabulary items and explore the mechanics of how new information is stored in the LMs' neurons.
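For intuition on the claim that a gradient matrix is a low-rank combination of forward- and backward-pass inputs: for a single example through a linear layer y = W x, the weight gradient is the outer product of the backward signal and the forward input. The sketch below shows that identity plus a logit-lens-style projection onto the vocabulary; the project_to_vocab helper and tokenizer interface are illustrative assumptions, not the paper's exact method.

```python
import torch

def linear_layer_grad(x, delta):
    """Gradient of the loss w.r.t. W for y = W x, on a single example.

    x is the forward input (d_in,); delta is dL/dy from the backward pass (d_out,).
    Their outer product is rank one; over a batch the gradient is a sum of such
    rank-one terms, hence a low-rank combination of forward/backward inputs.
    """
    return torch.outer(delta, x)  # shape (d_out, d_in)

def project_to_vocab(vec, unembedding, tokenizer, k=10):
    """Logit-lens-style reading of a hidden-space vector (illustrative helper).

    unembedding is the (vocab_size, d_model) output-embedding matrix; tokenizer is
    any object exposing a decode(list_of_ids) method.
    """
    scores = unembedding @ vec
    top_ids = torch.topk(scores, k).indices.tolist()
    return [tokenizer.decode([i]) for i in top_ids]
```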
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
https://arxiv.org/abs/2409.14781
Summary: As the training corpora of large language models (LLMs) grow, model developers are increasingly reluctant to disclose details about their data, which poses challenges for scientific evaluation and ethical deployment. Pretraining data detection methods have been explored; the Min-K% Prob method achieves strong results but has limitations, since it tends to misclassify non-training texts composed of common, high-probability words. To address this, the paper introduces a divergence-based calibration method that calibrates token probabilities for pretraining data detection, and builds PatentMIA, a Chinese-language benchmark for evaluating detection methods on Chinese text. Experiments show the proposed method outperforms existing approaches; the code and benchmark are available at the URL given in the paper.
Abstract: As the scale of training corpora for large language models (LLMs) grows, model developers become increasingly reluctant to disclose details on their data. This lack of transparency poses challenges to scientific evaluation and ethical deployment. Recently, pretraining data detection approaches, which infer whether a given text was part of an LLM's training data through black-box access, have been explored. The Min-K% Prob method, which has achieved state-of-the-art results, assumes that a non-training example tends to contain a few outlier words with low token probabilities. However, the effectiveness may be limited as it tends to misclassify non-training texts that contain many common words with high probabilities predicted by LLMs. To address this issue, we introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. We compute the cross-entropy (i.e., the divergence) between the token probability distribution and the token frequency distribution to derive a detection score. We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text. Experimental results on English-language benchmarks and PatentMIA demonstrate that our proposed method significantly outperforms existing methods. Our code and PatentMIA benchmark are available at this https URL.
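To make the contrast concrete, here is a hedged sketch of the baseline Min-K% Prob score next to one plausible reading of the divergence-based calibration, namely discounting each token's model log-probability by its frequency in a reference corpus. The paper's exact formula may differ, and corpus_freq is an assumed lookup table.

```python
import math

def min_k_prob_score(token_logprobs, k=0.2):
    """Baseline Min-K% Prob: mean of the lowest k fraction of token log-probs."""
    n = max(1, int(len(token_logprobs) * k))
    lows = sorted(token_logprobs)[:n]
    return sum(lows) / n  # higher => more likely to be training data

def frequency_calibrated_score(token_ids, token_logprobs, corpus_freq):
    """Hedged sketch of a divergence-based calibration.

    Discounts each token's model log-probability by the log of its relative
    frequency in a reference corpus, so that texts made of common, inherently
    high-probability tokens are not mistaken for training members.
    corpus_freq: dict mapping token id -> relative frequency (assumed lookup).
    """
    calibrated = [
        lp - math.log(corpus_freq.get(tid, 1e-9))
        for tid, lp in zip(token_ids, token_logprobs)
    ]
    return sum(calibrated) / len(calibrated)
```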
CoGen: Learning from Feedback with Coupled Comprehension and Generation
https://arxiv.org/abs/2408.15992
Summary: Systems that can both comprehend and generate language benefit from the tight connection between the two abilities. This work studies coupling comprehension and generation, with a focus on continually learning from interaction with users. It proposes techniques that tightly integrate the two capabilities for both learning and inference, using two-player reference games as the study setting and deploying various models over thousands of interactions with human users while learning from interaction feedback signals. The results show dramatic improvement over time: comprehension-generation coupling yields performance gains of up to 26% in absolute terms and up to 17% higher accuracy than a non-coupled system, and coupling also has a substantial qualitative effect on the system's language, making it markedly more human-like.
Abstract: Systems with both language comprehension and generation capabilities can benefit from the tight connection between the two. This work studies coupling comprehension and generation with focus on continually learning from interaction with users. We propose techniques to tightly integrate the two capabilities for both learning and inference. We situate our studies in two-player reference games, and deploy various models for thousands of interactions with human users, while learning from interaction feedback signals. We show dramatic improvements in performance over time, with comprehension-generation coupling leading to performance improvements up to 26% in absolute terms and up to 17% higher accuracies compared to a non-coupled system. Our analysis also shows coupling has substantial qualitative impact on the system's language, making it significantly more human-like.
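One common way to couple generation with comprehension at inference time in a reference game is to rerank generated candidates by how reliably a comprehension model resolves them back to the intended referent. The sketch below illustrates only that idea; gen_model.sample and comp_model.prob_of_target are assumed interfaces rather than CoGen's actual API, and the paper additionally couples the two capabilities during learning.

```python
def coupled_generation(context, intended_referent, gen_model, comp_model, n_candidates=10):
    """Rerank generated descriptions by how well a comprehender resolves them.

    context is the set of candidate referents visible to both players;
    gen_model and comp_model are assumed, duck-typed interfaces.
    """
    candidates = [gen_model.sample(intended_referent, context) for _ in range(n_candidates)]
    # Keep the description the comprehension model is most likely to resolve
    # back to the intended referent (a speaker-listener consistency check).
    return max(
        candidates,
        key=lambda utterance: comp_model.prob_of_target(utterance, context, intended_referent),
    )
```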