2024-12-24 Paper Sharing | Latest Advances in Large Language Models

Digest   2024-12-24 10:43   Anhui


Paper Sharing | Research Advances in Large Language Models

From the 30 papers published between 2024-12-19 and 2024-12-24, we have selected 5 outstanding works to share with our readers.

  1. Accelerating Retrieval-Augmented Generation
  2. VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models
  3. Rethinking Uncertainty Estimation in Natural Language Generation
  4. Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
  5. Length Controlled Generation for Black-box LLMs

1. Accelerating Retrieval-Augmented Generation

Authors: Derrick Quinn, Mohammad Nouri, Neel Patel, John Salihu, Alireza Salemi, Sukhan Lee, Hamed Zamani, Mohammad Alian

https://arxiv.org/abs/2412.15246

Abstract

An evolving solution to address hallucination and enhance accuracy in large language models (LLMs) is Retrieval-Augmented Generation (RAG), which involves augmenting LLMs with information retrieved from an external knowledge source, such as the web. This paper profiles several RAG execution pipelines and demystifies the complex interplay between their retrieval and generation phases. We demonstrate that while exact retrieval schemes are expensive, they can reduce inference time compared to approximate retrieval variants because an exact retrieval model can send a smaller but more accurate list of documents to the generative model while maintaining the same end-to-end accuracy. This observation motivates the acceleration of the exact nearest neighbor search for RAG.

In this work, we design Intelligent Knowledge Store (IKS), a type-2 CXL device that implements a scale-out near-memory acceleration architecture with a novel cache-coherent interface between the host CPU and near-memory accelerators. IKS offers 13.4--27.9× faster exact nearest neighbor search over a 512GB vector database compared with executing the search on Intel Sapphire Rapids CPUs. This higher search performance translates to 1.7--26.3× lower end-to-end inference time for representative RAG applications. IKS is inherently a memory expander; its internal DRAM can be disaggregated and used for other applications running on the server to prevent DRAM — which is the most expensive component in today's servers — from being stranded.

Brief Review

This paper examines the design of the Intelligent Knowledge Store (IKS), a hardware architecture for accelerating retrieval-augmented generation (RAG), and the performance gains it delivers. The work targets the problem of efficiently retrieving high-quality information in RAG applications and proposes a new way to accelerate exact nearest-neighbor search. Through a detailed profiling of RAG execution pipelines, the authors show that exact retrieval can outperform approximate methods end to end, even though it consumes more resources per query. The IKS architecture integrates low-power near-memory accelerators, substantially increasing retrieval speed and thereby shortening end-to-end inference time.

The experimental results show that IKS performs very well, providing strong empirical support for the effectiveness and practicality of the proposed solution. This work lays a solid foundation for future research on handling retrieval over large-scale datasets more efficiently and on using scale-out techniques to improve system performance.
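The paper's argument hinges on exact nearest-neighbor search over document embeddings being worth accelerating. As a point of reference, the computation IKS offloads to near-memory hardware — brute-force exact k-NN — can be sketched in a few lines of NumPy; the function name `exact_knn` and the toy data below are illustrative, not from the paper:

```python
import numpy as np

def exact_knn(queries, database, k):
    """Brute-force exact k-nearest-neighbor search by Euclidean distance.

    queries:  (q, d) array of query embeddings
    database: (n, d) array of document embeddings
    Returns a (q, k) array of indices of the k closest documents per query.
    """
    # Full pairwise squared distances: the "exact" part means scanning all n.
    d2 = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(axis=2)
    # argpartition selects the k smallest in O(n); then sort just those k.
    topk = np.argpartition(d2, k, axis=1)[:, :k]
    order = np.argsort(np.take_along_axis(d2, topk, axis=1), axis=1)
    return np.take_along_axis(topk, order, axis=1)

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64)).astype(np.float32)
# Queries are slightly perturbed copies of the first three database rows,
# so each query's true nearest neighbor is its source row.
q = db[:3] + 0.01 * rng.standard_normal((3, 64)).astype(np.float32)
idx = exact_knn(q, db, k=5)
```

Unlike approximate indexes, this scan is guaranteed to return the true top-k, which is exactly why the paper argues a smaller retrieved list can preserve end-to-end accuracy.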

2. VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models

Authors: Dexter Neo, Tsuhan Chen

https://arxiv.org/abs/2412.15739

Abstract

Large Vision-Language Models (LVLMs) have made remarkable developments along with the recent surge of large language models. Despite their advancements, LVLMs have a tendency to generate plausible yet inaccurate or inconsistent information based on the provided source content. This phenomenon, also known as "hallucinations", can have serious downstream implications during the deployment of LVLMs. To address this, we present VORD, a simple and effective method that alleviates hallucinations by calibrating token predictions based on ordinal relationships between modified image pairs. VORD is presented in two forms: 1) a minimalist training-free variant which eliminates implausible tokens from modified image pairs, and 2) a trainable objective function that penalizes unlikely tokens. Our experiments demonstrate that VORD delivers better calibration and effectively mitigates object hallucinations on a wide range of LVLM benchmarks. Our code is available at: https://github.com/dexterdley/VORD.

Brief Review

VORD is a method for mitigating object hallucinations in large vision-language models (LVLMs) by calibrating token predictions based on ordinal relationships between modified image pairs. It comes in a training-free variant as well as a trainable objective function, both aimed at improving calibration and reducing hallucinations.

The work offers a novel perspective: using the ordinal relationship between an original image and a modified one to calibrate token predictions, a new direction for the LVLM field. VORD's dual strategy, a simple training-free method and a trainable loss function, provides flexibility for different applications.

Overall, the paper centers on the object-hallucination problem in LVLMs, with particular attention to the challenge of deploying them safely in sensitive applications. The proposed VORD method not only addresses the problem effectively but also introduces an innovative perspective, making it a significant contribution.
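To make the core idea concrete, here is a highly simplified sketch of what a training-free ordinal filter could look like: if degrading the image does not lower a token's probability, that token is likely driven by language priors rather than visual evidence, so it is masked out and the rest renormalized. The `ordinal_filter` function and its inputs are hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

def ordinal_filter(p_orig, p_mod, margin=0.0):
    """Simplified training-free ordinal calibration sketch.

    p_orig: next-token probabilities given the original image
    p_mod:  next-token probabilities given a modified (degraded) image
    A visually grounded token should become *less* likely when the image
    is degraded; tokens violating this ordinal relation are zeroed out.
    """
    keep = p_orig > p_mod + margin
    filtered = np.where(keep, p_orig, 0.0)
    total = filtered.sum()
    if total == 0.0:          # fall back if every token got masked
        return p_orig
    return filtered / total   # renormalize the surviving tokens

p_orig = np.array([0.5, 0.3, 0.2])
p_mod = np.array([0.2, 0.4, 0.1])   # token 1 *gains* probability: suspect
p = ordinal_filter(p_orig, p_mod)
```

Here token 1 becomes more likely under the degraded image, so it is removed from the sampling distribution — the intuition behind eliminating implausible tokens from modified image pairs.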

3. Rethinking Uncertainty Estimation in Natural Language Generation

Authors: Lukas Aichberger, Kajetan Schweighofer, Sepp Hochreiter

https://arxiv.org/abs/2412.15176

Abstract

Large Language Models (LLMs) are increasingly employed in real-world applications, driving the need to evaluate the trustworthiness of their generated text. To this end, reliable uncertainty estimation is essential. Since current LLMs generate text autoregressively through a stochastic process, the same prompt can lead to varying outputs. Consequently, leading uncertainty estimation methods generate and analyze multiple output sequences to determine the LLM's uncertainty. However, generating output sequences is computationally expensive, making these methods impractical at scale. In this work, we inspect the theoretical foundations of the leading methods and explore new directions to enhance their computational efficiency. Building on the framework of proper scoring rules, we find that the negative log-likelihood of the most likely output sequence constitutes a theoretically grounded uncertainty measure. To approximate this alternative measure, we propose G-NLL, which has the advantage of being obtained using only a single output sequence generated by greedy decoding. This makes uncertainty estimation more efficient and straightforward, while preserving theoretical rigor. Empirical results demonstrate that G-NLL achieves state-of-the-art performance across various LLMs and tasks. Our work lays the foundation for efficient and reliable uncertainty estimation in natural language generation, challenging the necessity of more computationally involved methods currently leading the field.

Brief Review

This paper proposes G-NLL, a new method for uncertainty estimation in natural language generation (NLG). It uses the negative log-likelihood of the most likely output sequence as the uncertainty measure, aiming to make uncertainty estimation both more computationally efficient and more reliable. Compared with the computationally demanding multi-sample methods that currently dominate, G-NLL offers a clear advantage: it needs only a single greedy-decoded sequence.

Through theoretical argument and empirical study, the paper demonstrates G-NLL's effectiveness across a range of tasks and models. These results suggest that G-NLL addresses important open problems in NLG, such as assessing the trustworthiness of generated text, which is critical for real-world deployment.

Overall, G-NLL is a novel and effective method that offers a fresh perspective on uncertainty in natural language generation and is likely to spur further progress in the field.
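The measure itself is easy to state: decode greedily once and sum the negative log-probabilities of the chosen tokens. A toy sketch follows; the `g_nll` helper and the dict-based distributions are illustrative, and a real implementation would read per-token log-probs from the model's API:

```python
import math

def g_nll(step_probs):
    """G-NLL sketch: negative log-likelihood of the greedy sequence.

    step_probs: one next-token distribution per decoding step, each a
    dict mapping token -> probability. Greedy decoding picks the argmax
    token at every step, so the sequence NLL is the sum of
    -log p(argmax token) over steps. Higher G-NLL = more uncertain.
    """
    nll = 0.0
    for dist in step_probs:
        p_max = max(dist.values())   # probability of the greedy token
        nll += -math.log(p_max)
    return nll

# A confident generation (peaked distributions) scores low uncertainty...
confident = [{"a": 0.9, "b": 0.1}, {"c": 0.95, "d": 0.05}]
# ...while a flat, ambiguous one scores high.
uncertain = [{"a": 0.5, "b": 0.5}, {"c": 0.55, "d": 0.45}]
```

This is what makes the method cheap: one forward decoding pass replaces the many sampled sequences that semantic-entropy-style estimators require.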

4. Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models

Authors: Yingshui Tan, Boren Zheng, Baihui Zheng, Kerui Cao, Huiyun Jing, Jincheng Wei, Jiaheng Liu, Yancheng He, Wenbo Su, Xiangyong Zhu, Bo Zheng

https://arxiv.org/abs/2412.15265

Abstract

With the rapid advancement of Large Language Models (LLMs), significant safety concerns have emerged. Fundamentally, the safety of large language models is closely linked to the accuracy, comprehensiveness, and clarity of their understanding of safety knowledge, particularly in domains such as law, policy, and ethics. This factuality ability is crucial in determining whether these models can be deployed and applied safely and compliantly within specific regions. To address these challenges and better evaluate the factuality ability of LLMs to answer short questions, we introduce the Chinese SafetyQA benchmark. Chinese SafetyQA has several properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate, Safety-related, Harmless). Based on Chinese SafetyQA, we perform a comprehensive evaluation on the factuality abilities of existing LLMs and analyze how these capabilities relate to LLM abilities, e.g., RAG ability and robustness against attacks.

Brief Review

This paper studies the safety-related factuality of large language models (LLMs) in the Chinese-language context. It introduces a dataset of more than 2,000 examples designed to fill the gap left by existing benchmarks, which cannot comprehensively evaluate safety knowledge. The authors argue that the benchmark addresses current shortcomings in evaluating legal and ethical understanding.

Its key strengths are its focus on safety and factuality and its coverage of a wide range of safety-related topics, which tackles the limitations of existing benchmarks. The paper also details the data-collection methodology, combining automated and expert-driven processes to ensure data quality and accuracy.

Overall, the paper presents an effective framework for evaluating LLMs' safety and legal/ethical understanding, providing a valuable foundation for future research. By offering a comprehensive, high-quality dataset, it helps advance research in this area, especially on questions involving laws, regulations, and ethics.

5. Length Controlled Generation for Black-box LLMs

Authors: Yuxuan Gu, Wenjie Wang, Xiaocheng Feng, Weihong Zhong, Kun Zhu, Lei Huang, Tat-Seng Chua, Bing Qin

https://arxiv.org/abs/2412.14656

Abstract

Large language models (LLMs) have demonstrated impressive instruction following capabilities, while still struggling to accurately manage the length of the generated text, which is a fundamental requirement in many real-world applications. Existing length control methods involve fine-tuning the parameters of LLMs, which is inefficient and suboptimal for practical use. In this paper, we propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy. This framework efficiently and reliably regulates LLMs to generate length-constrained text without modifying the underlying parameters, thereby preserving the original capabilities of LLMs. Experimental results demonstrate that our framework achieves almost 100% success rates of length control on LLAMA3.1 for tasks such as length-controlled abstractive summarization and length-constrained instruction following, with minimal additional computational overhead. This also highlights the significant potential of our method for precise length control across a broader range of applications, without compromising the versatility of LLMs.

Brief Review

This paper proposes an iterative sampling framework for controlling the length of text generated by large language models (LLMs), achieved by combining the Metropolis-Hastings algorithm with importance sampling. The approach controls output length effectively without modifying model parameters, and the experiments report high success rates across a variety of tasks.

The proposed combination of Metropolis-Hastings and importance sampling improves efficiency while delivering strong length control. The paper also demonstrates the method's advantages in practical settings, offering a valuable reference for future work.

Overall, the paper both solves the length-control problem in LLM text generation and contributes an efficient, practical method that merits further study.
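To illustrate the sampling loop, here is a toy Metropolis-Hastings sketch in which a black-box generator proposes candidates and the target density rewards matching a desired character length. The proposal is treated as symmetric for simplicity, and `toy_llm` stands in for a real LLM; this is a sketch of the general idea under those assumptions, not the paper's framework (which additionally uses importance sampling for acceleration):

```python
import math
import random

def mh_length_control(sample_fn, target_len, steps=200, tau=1.0, seed=0):
    """Toy Metropolis-Hastings loop for length-constrained generation.

    sample_fn(rng) is a black-box generator producing candidate texts.
    The target density pi(x) ~ exp(-|len(x) - target_len| / tau) rewards
    outputs near the desired length; treating the proposal as symmetric,
    a candidate is accepted with probability min(1, pi(cand)/pi(current)).
    The generator's parameters are never touched, only its samples.
    """
    rng = random.Random(seed)

    def log_pi(text):
        return -abs(len(text) - target_len) / tau

    current = sample_fn(rng)
    for _ in range(steps):
        cand = sample_fn(rng)
        # Log-uniform acceptance test; 1e-12 guards against log(0).
        if math.log(rng.random() + 1e-12) < log_pi(cand) - log_pi(current):
            current = cand
    return current

# Toy "black-box LLM": emits a random number of words.
def toy_llm(rng):
    return " ".join("word" for _ in range(rng.randint(1, 20)))

out = mh_length_control(toy_llm, target_len=24)  # 24 chars == 5 words here
```

Because the chain only accepts moves toward (or rarely away from) the target length, it converges to outputs satisfying the constraint while leaving the generator itself untouched — the black-box property the paper emphasizes.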


We welcome your valuable suggestions in the comments, including but not limited to:

  • Pointing out shortcomings in the brief reviews above!
  • Sharing recent papers you find even more worth recommending, with your reasons!

END

Recommended Reading

2024-12-23 Paper Sharing | Latest Advances in Multimodal Large Models
2024-12-20 Paper Sharing | Latest Advances in Agents
2024-12-19 Paper Sharing | Latest Advances in Large Language Models
2024-12-18 Paper Sharing | Latest Advances in Agents

智荐阁
Covering frontier advances in generative large models and recommender systems, including but not limited to: large language models, recommender systems, agent learning, reinforcement learning, generative recommendation, guided recommendation, recommendation agents, and agent-based recommendation.