2025-01-02 Paper Digest | Recent Advances in Large Language Models


Paper Digest | Research Advances in Large Language Models

From the 25 papers collected between 2024-12-27 and 2025-01-02, we have selected 5 outstanding works to share with readers.

  1. GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
  2. Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback
  3. MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models
  4. Toward Adaptive Reasoning in Large Language Models with Thought Rollback
  5. Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection

1.GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors

Authors: Chengming Zhang, Xinheng Ding, Baixi Sun, Xiaodong Yu, Weijian Zheng, Zhen Xie, Dingwen Tao

https://arxiv.org/abs/2412.19829

Abstract

Heterogeneous hardware like the Gaudi processor has been developed to enhance computations, especially matrix operations for Transformer-based large language models (LLMs) for generative AI tasks. However, our analysis indicates that Transformers are not fully optimized on such emerging hardware, primarily due to inadequate optimizations in non-matrix computational kernels like Softmax and in heterogeneous resource utilization, particularly when processing long sequences. To address these issues, we propose an integrated approach (called GFormer) that merges sparse and linear attention mechanisms. GFormer aims to maximize the computational capabilities of the Gaudi processor's Matrix Multiplication Engine (MME) and Tensor Processing Cores (TPC) without compromising model quality. GFormer includes a windowed self-attention kernel and an efficient outer product kernel for causal linear attention, aiming to optimize LLM inference on Gaudi processors. Evaluation shows that GFormer significantly improves efficiency and model performance across various tasks on the Gaudi processor and outperforms state-of-the-art GPUs.

Brief Review

This paper proposes GFormer, a new approach that integrates sparse and linear attention mechanisms to optimize the performance of large language models (LLMs) on Gaudi processors. Its primary goal is to resolve the performance bottleneck that the Softmax operation causes when processing long sequences.

The paper's key points center on the following three aspects:

  1. Fusing sparse and linear attention mechanisms: this is the paper's core innovation. By combining these two different attention mechanisms, the model can still capture long-range dependencies in the input sequence while completing heavy computation within a bounded time budget (see the sketch after this review).

  2. Experimental validation: the paper's experiments show that GFormer delivers significant speedups across different datasets, and is markedly more efficient than conventional approaches for training and inference with Transformer models such as GPT and ViT.

  3. Hardware-specific design: GFormer is built around the Gaudi processor, with a windowed self-attention kernel and a causal linear-attention outer-product kernel targeting its Matrix Multiplication Engine (MME) and Tensor Processing Cores (TPC), which makes its reported gains on this platform credible.

In summary, GFormer offers a new way to optimize LLM performance and backs it with empirical evidence of effectiveness and practicality. Its design and results point to strong potential for long-sequence workloads, especially in scenarios that demand fast processing of large volumes of data, making it a direction worth further exploration.
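
To make the two attention mechanisms concrete, here is a minimal, self-contained PyTorch sketch of windowed (sparse) attention and causal linear attention. It is illustrative only: GFormer implements these as hand-optimized kernels for Gaudi's MME and TPC, and the feature map, window size, and 50/50 mixing below are our assumptions, not the paper's design.

```python
import torch

def windowed_attention(q, k, v, window: int):
    # Sliding-window (sparse) attention: each position attends only to the
    # `window` most recent positions, which bounds the Softmax cost per token.
    T, scale = q.size(0), q.size(-1) ** 0.5
    out = torch.zeros_like(v)
    for t in range(T):
        s = max(0, t - window + 1)
        scores = q[t] @ k[s:t + 1].T / scale
        out[t] = torch.softmax(scores, dim=-1) @ v[s:t + 1]
    return out

def causal_linear_attention(q, k, v):
    # Causal linear attention: a positive feature map (elu + 1, a common
    # choice) replaces Softmax, and running outer products k_t v_t^T make
    # the cost linear in sequence length.
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    S = torch.zeros(q.size(-1), v.size(-1))  # running sum of outer products
    z = torch.zeros(q.size(-1))              # running normalizer
    out = torch.zeros_like(v)
    for t in range(q.size(0)):
        S = S + torch.outer(k[t], v[t])
        z = z + k[t]
        out[t] = (q[t] @ S) / (q[t] @ z + 1e-6)
    return out

# Toy usage: mix the two mechanisms for one head; GFormer instead fuses them
# at the kernel level across the Gaudi compute engines.
T, d = 16, 8
q, k, v = (torch.randn(T, d) for _ in range(3))
y = 0.5 * windowed_attention(q, k, v, window=4) + 0.5 * causal_linear_attention(q, k, v)
print(y.shape)  # torch.Size([16, 8])
```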

2.Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

Authors: Seong Jin Lee, Will Wei Sun, Yufeng Liu

https://arxiv.org/abs/2412.19436

Abstract

Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses significant challenges for reward learning. To address this, we propose a Low-rank Contextual RLHF (LoCo-RLHF) framework that integrates contextual information to better model heterogeneous feedback while maintaining computational efficiency. Our approach builds on a contextual preference model, leveraging the intrinsic low-rank structure of the interaction between user contexts and query-answer pairs to mitigate the high dimensionality of feature representations. Furthermore, we address the challenge of distributional shifts in feedback through our Pessimism in Reduced Subspace (PRS) policy, inspired by pessimistic offline reinforcement learning techniques. We theoretically demonstrate that our policy achieves a tighter sub-optimality gap compared to existing methods. Extensive experiments validate the effectiveness of LoCo-RLHF, showcasing its superior performance in personalized RLHF settings and its robustness to distribution shifts.

Brief Review

The Low-rank Contextual RLHF (LoCo-RLHF) framework proposed in this paper is a notable contribution to handling heterogeneous human feedback in reinforcement learning. It incorporates contextual information and exploits low-rank structure to make reward learning more efficient, while also coping with distribution shift. The authors back their claims with theoretical guarantees and extensive empirical validation.

The core contribution is a new low-rank contextual preference model for the heterogeneity of human feedback in RLHF. By factoring the interaction between user contexts and query-answer pairs, the model substantially reduces computational complexity, and the accompanying theoretical guarantees give the method solid footing in practice (a toy version of this bilinear reward model is sketched after this review).

The experiments show strong performance in personalized settings. The breadth of the empirical analysis supports the conclusion that LoCo-RLHF has clear advantages for heterogeneous feedback, particularly under distribution shift.

In summary, LoCo-RLHF addresses a real limitation of existing RLHF methods and makes progress on both the theoretical and empirical fronts. Its adoption could improve the performance of RLHF systems and the quality of personalized human-AI interaction.
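
As a rough illustration of the low-rank contextual idea, the sketch below fits a bilinear reward r(x, phi) = x^T (U V^T) phi on pairwise preferences with a Bradley-Terry loss. The dimensions, rank, and synthetic data are illustrative assumptions; the paper's estimator, PRS policy, and theory are not reproduced here.

```python
import torch

# x is a user context, phi the features of a query-answer pair; the low-rank
# factors U, V capture their interaction with far fewer parameters than a
# full d_ctx x d_feat matrix.
d_ctx, d_feat, rank = 10, 20, 3
U = torch.randn(d_ctx, rank, requires_grad=True)
V = torch.randn(d_feat, rank, requires_grad=True)

def reward(x, phi):
    return x @ U @ V.T @ phi

def bt_loss(x, phi_a, phi_b):
    # Bradley-Terry loss for "answer a preferred over answer b" feedback.
    return -torch.nn.functional.logsigmoid(reward(x, phi_a) - reward(x, phi_b))

opt = torch.optim.Adam([U, V], lr=1e-2)
for _ in range(200):
    # Synthetic pairwise comparisons stand in for real human feedback here.
    x, phi_a, phi_b = torch.randn(d_ctx), torch.randn(d_feat), torch.randn(d_feat)
    loss = bt_loss(x, phi_a, phi_b)
    opt.zero_grad(); loss.backward(); opt.step()
```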

3.MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models

Authors: Kaiwen Zuo, Yirui Jiang

https://arxiv.org/abs/2412.18947

Abstract

Medical Large Language Models (MLLMs) have demonstrated potential in healthcare applications, yet their propensity for hallucinations—generating medically implausible or inaccurate information—presents substantial risks to patient care. This paper introduces MedHallBench, a comprehensive benchmark framework for evaluating and mitigating hallucinations in MLLMs. Our methodology integrates expert-validated medical case scenarios with established medical databases to create a robust evaluation dataset. The framework employs a sophisticated measurement system that combines automated ACHMI (Automatic Caption Hallucination Measurement in Medical Imaging) scoring with rigorous clinical expert evaluations, and utilizes reinforcement learning methods to achieve automatic annotation. Through an optimized reinforcement learning from human feedback (RLHF) training pipeline specifically designed for medical applications, MedHallBench enables thorough evaluation of MLLMs across diverse clinical contexts while maintaining stringent accuracy standards. We conducted comparative experiments involving various models, utilizing the benchmark to establish a baseline for widely adopted large language models (LLMs). Our findings indicate that ACHMI provides a more nuanced understanding of the effects of hallucinations compared to traditional metrics, thereby highlighting its advantages in hallucination assessment. This research establishes a foundational framework for enhancing MLLM reliability in healthcare settings and presents actionable strategies for addressing the critical challenge of AI hallucinations in medical applications.

Brief Review

MedHallBench is the first comprehensive benchmark focused on the hallucination problem in medical large language models (MLLMs). By combining expert-validated medical cases with established databases, it introduces an automated hallucination score (ACHMI) and pairs it with clinical expert evaluation for precise assessment of model hallucinations. The paper shows ACHMI's advantages for quantifying hallucinations and optimizes an RLHF training pipeline for medical scenarios, laying groundwork for more reliable MLLMs in healthcare (a generic harness sketch follows). The experimental results and methodological details could be clearer, but the effort to tackle the critical problem of medical AI hallucination deserves attention!
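
As a structural illustration only, here is a deliberately generic evaluation-harness sketch in the shape the abstract describes: per-case automated scoring combined with clinician ratings. The ACHMI formula is not specified in the abstract, so `achmi_score` is a hypothetical placeholder, as are all field names.

```python
from dataclasses import dataclass

@dataclass
class CaseResult:
    case_id: str
    achmi: float   # automated hallucination score (the paper's ACHMI metric)
    expert: float  # clinician rating, collected separately

def achmi_score(model_output: str, reference: str) -> float:
    # Placeholder: the abstract does not specify the ACHMI formula, so this
    # must be replaced with the benchmark's real metric.
    raise NotImplementedError

def evaluate(model, cases) -> list[CaseResult]:
    # Score each expert-validated clinical case with both signals.
    results = []
    for case in cases:
        output = model(case["prompt"])
        results.append(CaseResult(case["id"],
                                  achmi_score(output, case["reference"]),
                                  case["expert_rating"]))
    return results
```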

4.Toward Adaptive Reasoning in Large Language Models with Thought Rollback

Authors: Sijia Chen, Baochun Li

https://arxiv.org/abs/2412.19707

Abstract

Large language models (LLMs) have been routinely used to solve various tasks using step-by-step reasoning. However, the structure of intermediate reasoning steps, or thoughts, is rigid and unidirectional, such as chains, trees, or acyclic-directed graphs. Consequently, the resulting inflexible and forward-only reasoning may not address challenging tasks and fail when the LLM frequently gives false responses, i.e., “hallucinations.” This paper proposes a new reasoning framework, called Thought Rollback (TR), allowing LLMs to adaptively build thought structure while maintaining effective reasoning toward problem-solving under “hallucinations.” The core mechanism of TR is rolling back thoughts, which enables LLMs to perform error analysis on thoughts, thus rolling back to any previously mistaken thought for revision. By including such trial-and-error in the prompt to guide the LLM, each rollback leads to one more reliable reasoning path. Therefore, starting with a simple prompt without human annotations, LLM with TR adaptively and gradually explores thoughts for a correct solution. Comprehensive experiments on mathematical problems and multi-task reasoning demonstrate the state-of-the-art performance of TR in terms of problem-solving rate and interaction cost. For instance, the solving rate of GPT-4 with TR outperforms the current best by 9% on the MATH dataset. The source code is available under the folder examples/ThoughtRollback of https://github.com/iQua/llmpebase.

Brief Review

This paper proposes Thought Rollback (TR), a framework that improves the reasoning of large language models (LLMs) by letting them flexibly revise earlier thoughts. The authors show that TR strengthens LLMs on complex reasoning tasks while reducing interaction cost. The rollback mechanism handles errors in generated thoughts and yields a more adaptive reasoning structure than fixed chains, trees, or DAGs.

The experiments confirm improved solving rates on mathematical problems and multi-task reasoning, supporting the method's effectiveness. The framework is also lightweight by design and supports trial-and-error learning, which is an appealing property.

In summary, the paper's rollback mechanism is an inventive way to improve LLM reasoning, and the experiments confirm that it is both effective and practical; its applications are worth further exploration (a minimal control-loop sketch follows this review).
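
The following is a minimal control-loop sketch of the rollback idea as we read it from the abstract. The `llm` function, prompt wording, and the "ERROR <i>" reporting convention are all hypothetical placeholders; the authors' actual implementation lives under examples/ThoughtRollback in https://github.com/iQua/llmpebase.

```python
# Sketch of a Thought Rollback-style loop; prompts and protocols here are
# illustrative assumptions, not the paper's.
def llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your model API here

def solve_with_rollback(question: str, max_rollbacks: int = 5, max_steps: int = 12) -> str:
    thoughts: list[str] = []
    experience = ""  # accumulated error analyses, prepended to later prompts
    for _ in range(max_rollbacks + 1):
        for _ in range(max_steps):
            # Extend the current reasoning path by one thought.
            prompt = (f"{experience}Question: {question}\nSteps so far:\n"
                      + "\n".join(thoughts)
                      + "\nNext step (or 'DONE <answer>'):")
            step = llm(prompt)
            if step.startswith("DONE"):
                return step[len("DONE"):].strip()
            thoughts.append(step)
            # Ask the model to review the path; a reply like "ERROR 2: <why>"
            # flags the thought at index 2 as the first mistake.
            verdict = llm("Review these steps and report the first error as "
                          "'ERROR <i>: <why>', or reply 'OK':\n" + "\n".join(thoughts))
            if verdict.startswith("ERROR"):
                idx = int(verdict.split()[1].rstrip(":"))
                # Record the trial-and-error so later prompts can avoid it,
                # then roll back to just before the mistaken thought.
                experience += f"A previous path failed at step {idx}: {verdict}\n"
                thoughts = thoughts[:idx]
                break
        else:
            break  # step budget exhausted without a rollback
    return "unsolved"
```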

5.Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection

Authors: Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina

https://arxiv.org/abs/2412.20595

Abstract

This study demonstrates that the modern generation of Large Language Models (LLMs, such as GPT-4) suffers from the same out-of-domain (OOD) performance gap observed in prior research on pre-trained Language Models (PLMs, such as BERT). We demonstrate this across two non-topical classification tasks: 1) genre classification and 2) generated text detection. Our results show that when demonstration examples for In-Context Learning (ICL) come from one domain (e.g., travel) and the system is tested on another domain (e.g., history), classification performance declines significantly.

To address this, we introduce a method that controls which predictive indicators are used and which are excluded during classification. For the two tasks studied here, this ensures that topical features are omitted while the model is guided to focus on stylistic rather than content-based attributes. This approach reduces the OOD gap by up to 20 percentage points in a few-shot setup. Straightforward Chain-of-Thought (CoT) methods, used as the baseline, prove insufficient, while our approach consistently enhances domain transfer performance.

Brief Review

This paper examines the out-of-domain (OOD) performance gap of large language models (LLMs) on two non-topical tasks: genre classification and generated-text detection. The proposed method controls which predictive features the prompt allows, steering the model toward stylistic rather than topical attributes, which improves few-shot in-context classification across domains. The authors report that this reduces the OOD gap by up to 20 percentage points and compare the approach in detail against Chain-of-Thought baselines. The paper targets an important practical problem, demonstrates the effectiveness of controlling prompt features, and provides clear experimental comparisons across models and configurations, offering both an effective strategy for improving OOD performance and useful insights for future research (a prompt-construction sketch follows).
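
To illustrate the idea of excluding topical indicators while foregrounding stylistic ones, here is a hedged prompt-construction sketch. The feature lists and instruction wording are our assumptions for illustration, not the authors' released prompts.

```python
# Hypothetical indicator lists; the paper's actual feature sets may differ.
STYLISTIC = ["sentence length and rhythm", "formality", "person and voice",
             "hedging and modality", "punctuation habits"]
TOPICAL = ["domain-specific vocabulary", "named entities", "subject matter"]

def build_prompt(demos: list[tuple[str, str]], text: str) -> str:
    # Instruct the model to use only stylistic indicators, since the ICL
    # demonstrations and the test text come from different domains.
    rules = ("Classify the genre of the text. Base your decision ONLY on "
             f"stylistic cues ({', '.join(STYLISTIC)}); ignore topical cues "
             f"({', '.join(TOPICAL)}).")
    shots = "\n".join(f"Text: {t}\nGenre: {g}" for t, g in demos)
    return f"{rules}\n\n{shots}\n\nText: {text}\nGenre:"

# Demonstrations from one domain (travel), test text from another (history).
demos = [("We wandered the old town at dusk, map in hand...", "personal narrative")]
print(build_prompt(demos, "The treaty of 1648 redrew the borders of Europe..."))
```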


We welcome your valuable suggestions in the comments! Including but not limited to:

  • Point out shortcomings in the paper reviews in this post!
  • Share recent papers you find more worth recommending, along with your reasons!

END

Recommended Reading

2025-01-01 Paper Digest | Recent Advances in Agents
2024-12-31 Paper Digest | Recent Advances in Multimodal Large Models
2024-12-30 Paper Digest | Recent Advances in Recommender Systems
2024-12-27 Paper Digest | Recent Advances in Large Language Models

智荐阁
Covering frontier advances in generative large models and recommender systems, including but not limited to: large language models, recommender systems, agent learning, reinforcement learning, generative recommendation, guided recommendation, recommendation agents, and agent-based recommendation.