Paper Digest | Recent Advances in Large Language Model Research
From the 36 papers posted between 2024-12-17 and 2024-12-19, we have selected 5 outstanding works to share with readers.
1. Large Language Model Federated Learning with Blockchain and Unlearning for Cross-Organizational Collaboration
2. A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
3. Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
4. Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
5. Leveraging Foundation Language Models (FLMs) for Automated Cohort Extraction from Large EHR Databases
1. Large Language Model Federated Learning with Blockchain and Unlearning for Cross-Organizational Collaboration
Authors: Xuhan Zuo, Minghao Wang, Tianqing Zhu, Shui Yu, Wanlei Zhou
https://arxiv.org/abs/2412.13551
Abstract
Large language models (LLMs) have transformed the way computers understand and process human language, but using them effectively across different organizations remains still difficult. When organizations work together to improve LLMs, they face several main challenges. First, organizations hesitate to share their valuable data with others. Second, competition between organizations creates trust problems during collaboration. Third, new privacy laws require organizations to be able to delete specific data when requested, which is especially difficult when multiple organizations are learning from shared data. Traditional federated learning approaches do not address these interconnected challenges, particularly in scenarios where participants cannot fully trust each other or the central aggregator. To overcome these limitations, we propose a hybrid blockchain-based federated learning framework that uniquely combines public and private blockchain architectures with multi-agent reinforcement learning. Our framework enables transparent sharing of model updates through the public blockchain while protecting sensitive computations in private chains. Each organization operates as an intelligent agent, using Q-learning to optimize its participation strategy and resource allocation, thus aligning individual incentives with collective goals. Notably, we introduce an efficient unlearning mechanism based on Low-Rank Adaptation (LoRA) that enables selective removal of specific data contributions without compromising the model's overall performance. Through extensive experimentation on real-world datasets, we demonstrate that our framework effectively balances privacy protection, trust establishment, and regulatory compliance while maintaining high model performance. Case studies in healthcare and education sectors validate our approach's practical applicability in sensitive domains where data privacy and trust are paramount.
Brief Review
This paper proposes a hybrid blockchain-based federated learning framework for training large language models (LLMs) across organizations. The framework combines public and private blockchain architectures, multi-agent reinforcement learning, and an efficient LoRA-based unlearning mechanism to address challenges of privacy, trust, and regulatory compliance: model updates are shared transparently through the public chain while sensitive computations stay on private chains, each organization acts as a Q-learning agent that optimizes its participation strategy and resource allocation, and LoRA allows specific data contributions to be removed without degrading overall model performance. Experiments on real-world datasets show that the framework balances transparency and privacy, establishes trust, and ensures regulatory compliance while maintaining high model performance.
In addition, the authors present detailed case studies in the healthcare and education sectors that demonstrate the framework's practical applicability in these domains. Overall, the paper offers an innovative methodology for the complex problems faced when training large language models across organizations, and the case studies across different industries further validate its effectiveness and applicability, providing new ideas and strategies for future cross-organizational LLM training.
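To make the LoRA-based unlearning idea concrete, below is a minimal sketch, assuming each organization's contribution is logged as a separate low-rank delta so it can later be subtracted out. The shapes, organization names, and the aggregate/unlearn helpers are illustrative assumptions made for this digest, not the authors' implementation; the blockchain logging and Q-learning components are omitted.

```python
import numpy as np

# Minimal sketch: each organization contributes a low-rank update (B @ A) to a
# shared weight matrix. "Unlearning" an organization then amounts to subtracting
# its logged contribution. Names and shapes are illustrative assumptions only.

rank, d_in, d_out = 4, 16, 16
rng = np.random.default_rng(0)

base_weight = rng.normal(size=(d_out, d_in))          # frozen base model weight
org_updates = {                                        # per-organization LoRA factors
    org: (rng.normal(size=(d_out, rank)) * 0.01,       # B
          rng.normal(size=(rank, d_in)) * 0.01)        # A
    for org in ["hospital_a", "hospital_b", "school_c"]
}

def aggregate(base, updates):
    """Add every organization's low-rank delta B @ A onto the base weight."""
    w = base.copy()
    for B, A in updates.values():
        w += B @ A
    return w

def unlearn(weight, updates, org):
    """Remove one organization's contribution without retraining from scratch."""
    B, A = updates[org]
    return weight - B @ A

global_weight = aggregate(base_weight, org_updates)
after_removal = unlearn(global_weight, org_updates, "hospital_b")

# Sanity check: removing hospital_b recovers the aggregate of the other two.
expected = aggregate(base_weight, {k: v for k, v in org_updates.items() if k != "hospital_b"})
print(np.allclose(after_removal, expected))  # True
```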
2. A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Authors: Shilin Sun, Wenbin An, Feng Tian, Fang Nan, Qidong Liu, Jun Liu, Nazaraf Shah, Ping Chen
https://arxiv.org/abs/2412.14056
Abstract
Artificial intelligence (AI) has rapidly developed through advancements in computational power and the growth of massive datasets. However, this progress has also heightened challenges in interpreting the black-box nature of AI models. To address these concerns, eXplainable AI (XAI) has emerged with a focus on transparency and interpretability to enhance human understanding and trust in AI decision-making processes. In the context of multimodal data fusion and complex reasoning scenarios, the proposal of Multimodal eXplainable AI (MXAI) integrates multiple modalities for prediction and explanation tasks. Meanwhile, the advent of Large Language Models (LLMs) has led to remarkable breakthroughs in natural language processing, yet their complexity has further exacerbated the issue of MXAI. To gain key insights into the development of MXAI methods and provide crucial guidance for building more transparent, fair, and trustworthy AI systems, we review the MXAI methods from a historical perspective and categorize them across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs. We also review evaluation metrics and datasets used in MXAI research, concluding with a discussion of future challenges and directions. A project related to this review has been created at https://github.com/ShilinSun/mxai_review.
Brief Review
This paper presents a comprehensive and in-depth review of the development of Multimodal Explainable AI (MXAI). It traces how MXAI methods have evolved across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs, and categorizes and analyzes data-driven, model-driven, and post-hoc explanation approaches. It also surveys the relevant datasets and evaluation metrics, providing researchers with a valuable reference. In short, the paper lays out a complete framework for the development of MXAI and systematically discusses its importance and application prospects, which gives it clear theoretical and practical value for advancing the field.
3. Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
Authors: Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, Kai Han
https://arxiv.org/abs/2412.13602
Abstract
Large Language Models (LLMs) are increasingly deployed in real-world applications that demand complex reasoning. To track progress, robust benchmarks are required to evaluate their capabilities beyond superficial pattern recognition. However, current LLM reasoning benchmarks often face challenges such as insufficient interpretability, performance saturation or data contamination. To address these challenges, we introduce GAMEBOT, a gaming arena designed for rigorous and transparent assessment of LLM reasoning capabilities. GAMEBOT decomposes complex reasoning in games into predefined modular subproblems. This decomposition allows us to design a suite of Chain-of-Thought (CoT) prompts that leverage domain knowledge to guide LLMs in addressing these subproblems before action selection. Furthermore, we develop a suite of rule-based algorithms to generate ground truth for these subproblems, enabling rigorous validation of the LLMs' intermediate reasoning steps. This approach facilitates evaluation of both the quality of final actions and the accuracy of the underlying reasoning process. GAMEBOT also naturally alleviates the risk of data contamination through dynamic games and head-to-head LLM competitions. We benchmark 17 prominent LLMs across eight games, encompassing various strategic abilities and game characteristics. Our results suggest that GAMEBOT presents a significant challenge, even when LLMs are provided with detailed CoT prompts. Project page: https://visual-ai.github.io/gamebot
Brief Review
This paper proposes GAMEBOT, a benchmark for evaluating the complex decision-making of LLMs in competitive game environments. By decomposing game decisions into modular subproblems, the framework uses CoT prompts together with rule-based algorithms that verify intermediate reasoning steps, so both the final action and the underlying reasoning process can be assessed. Its key strengths are the emphasis on interpretability and robustness, and the breadth of games covered, spanning different strategic abilities and game characteristics. The rule-based generation of ground truth for each subproblem further improves the rigor and explainability of the evaluation.
In summary, the paper offers useful insight into how to build a thorough benchmark of LLM capabilities and shows its potential for understanding, explaining, and predicting game-playing behavior. It is a finding worth studying in depth and should help advance game-playing AI.
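To illustrate the decompose-and-verify idea, here is a minimal sketch in which a rule-based oracle computes the ground truth for one predefined subproblem (the squares that win immediately) and the model's intermediate answer is graded against it before its final move is ever scored. The game (tic-tac-toe), the subproblem, and the answer-parsing format are assumptions made for this example; GAMEBOT's actual games, prompts, and checkers differ.

```python
# Minimal sketch of decompose-and-verify: a rule-based oracle produces ground
# truth for a predefined subproblem, and the model's intermediate answer is
# graded against it. Board encoding and answer format are illustrative only.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
             (0, 4, 8), (2, 4, 6)]                 # diagonals

def winning_moves(board, player):
    """Rule-based ground truth: empty cells where `player` wins immediately."""
    moves = set()
    for i, cell in enumerate(board):
        if cell != ".":
            continue
        trial = board[:i] + player + board[i + 1:]
        if any(all(trial[j] == player for j in line) for line in WIN_LINES):
            moves.add(i)
    return moves

def grade_subproblem(llm_answer, board, player):
    """Compare the model's claimed winning squares with the oracle's answer."""
    claimed = {int(tok) for tok in llm_answer.replace(",", " ").split() if tok.isdigit()}
    truth = winning_moves(board, player)
    return claimed == truth, truth

# 9-character board, cells 0-8 row by row; X can win only at cell 8 (main diagonal).
board = "XO..XO..."
ok, truth = grade_subproblem("the winning squares are 8", board, "X")
print(ok, truth)  # True {8}
```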
4. Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Authors: Xiangxiang Gao, Weisheng Xie, Yiwei Xiang, Feng Ji
https://arxiv.org/abs/2412.12639
Abstract
Striking an optimal balance between minimal drafting latency and high speculation accuracy to enhance the inference speed of Large Language Models remains a significant challenge in speculative decoding. In this paper, we introduce Falcon, an innovative semi-autoregressive speculative decoding framework fashioned to augment both the drafter's parallelism and output quality. Falcon incorporates the Coupled Sequential Glancing Distillation technique, which fortifies inter-token dependencies within the same block, leading to increased speculation accuracy. We offer a comprehensive theoretical analysis to illuminate the underlying mechanisms. Additionally, we introduce a Custom-Designed Decoding Tree, which permits the drafter to generate multiple tokens in a single forward pass and accommodates multiple forward passes as needed, thereby boosting the number of drafted tokens and significantly improving the overall acceptance rate. Comprehensive evaluations on benchmark datasets such as MT-Bench, HumanEval, and GSM8K demonstrate Falcon's superior acceleration capabilities. The framework achieves a lossless speedup ratio ranging from 2.91x to 3.51x when tested on the Vicuna and LLaMA2-Chat model series. These results outstrip existing speculative decoding methods for LLMs, including Eagle, Medusa, Lookahead, SPS, and PLD, while maintaining a compact drafter architecture equivalent to merely two Transformer layers.
Brief Review
Falcon is a semi-autoregressive speculative decoding framework that aims to speed up inference for large language models (LLMs) by increasing the drafter's parallelism and token acceptance rate. It combines Coupled Sequential Glancing Distillation (CSGD) with a custom-designed decoding tree to achieve these goals, and the authors present comprehensive experiments showing significant speedups over existing methods.
The paper is motivated by a core tension in speculative decoding: the drafter must be cheap enough to keep drafting latency low, yet accurate enough that the target model accepts most of its tokens. Falcon addresses this by strengthening inter-token dependencies within each drafted block through CSGD and by letting the custom decoding tree produce multiple tokens per forward pass (and run multiple passes when needed), raising the number of drafted tokens and the overall acceptance rate.
The paper's main strengths are its careful study of existing techniques and its distinctive solution, and the experiments confirm that Falcon does improve inference speed, with reported lossless speedups of 2.91x to 3.51x on the Vicuna and LLaMA2-Chat series. Overall, it offers a promising direction for the slow-inference problem of large language models and is worth further exploration and application.
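For readers unfamiliar with speculative decoding, the sketch below shows the generic draft-and-verify loop that Falcon accelerates: a cheap drafter proposes several tokens, and the target model keeps the longest prefix it agrees with. The toy next-token functions and the greedy acceptance rule are assumptions for illustration only; Falcon's semi-autoregressive drafter, CSGD training, and custom decoding tree are not reproduced here.

```python
# Minimal sketch of the draft-and-verify loop underlying speculative decoding.
# Toy deterministic next-token functions stand in for the real drafter / target
# LLM; acceptance is exact-match (greedy), and all names are assumptions.

def drafter_next(context):
    """Cheap draft model: guesses the next token (toy rule)."""
    return context[-1] + 1 if context else 0

def target_next(context):
    """Expensive target model: the answer we must match exactly (toy rule)."""
    last = context[-1] if context else -1
    return last + 1 if last % 5 != 4 else 0   # occasionally diverges from the drafter

def speculative_step(context, k=4):
    """Draft k tokens, then keep the longest prefix the target model agrees with."""
    draft, ctx = [], list(context)
    for _ in range(k):
        tok = drafter_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    accepted, ctx = [], list(context)
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)          # target's correction ends the step
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

context = [0]
for _ in range(3):
    step = speculative_step(context)
    context += step
    print("accepted", step, "->", context)
```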
5. Leveraging Foundation Language Models (FLMs) for Automated Cohort Extraction from Large EHR Databases
Authors: Purity Mugambi, Alexandra Meliou, Madalina Fiterau
https://arxiv.org/abs/2412.11472
Abstract
A crucial step in cohort studies is to extract the required cohort from one or more study datasets. This step is time-consuming, especially when a researcher is presented with a dataset that they have not previously worked with. When the cohort has to be extracted from multiple datasets, cohort extraction can be extremely laborious. In this study, we present an approach for partially automating cohort extraction from multiple electronic health record (EHR) databases. We formulate the guided multi-dataset cohort extraction problem in which selection criteria are first converted into queries, translating them from natural language text to language that maps to database entities. Then, using FLMs, columns of interest identified from the queries are automatically matched between the study databases. Finally, the generated queries are run across all databases to extract the study cohort. We propose and evaluate an algorithm for automating column matching on two large, popular and publicly-accessible EHR databases -- MIMIC-III and eICU. Our approach achieves a high top-three accuracy of, correctly matching out of the columns of interest, when using a small, pre-trained general purpose language model. Furthermore, this accuracy is maintained even as the search space (i.e., size of the database) increases.
Brief Review
This paper shows how foundation language models (FLMs) can be used to partially automate cohort extraction from multiple electronic health record (EHR) databases. The approach converts selection criteria into queries and then matches the relevant columns across databases. The authors focus their evaluation on this column-matching step, reporting 92% top-three accuracy on the MIMIC-III and eICU databases.
The core contribution is an automated way to extract cohorts from multiple EHR databases, an important problem in clinical research, and the use of a foundation language model for column matching is novel and promising. Although the paper provides solid background, methodology, and result analysis, it gives relatively little detail on the data-processing pipeline and the experimental design. Overall, it is a noteworthy piece of work that demonstrates how modern language models can be applied to a practical problem.
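As an illustration of the column-matching step, the sketch below embeds column names (with short descriptions) from two schemas using a small pre-trained sentence-embedding model and ranks candidate matches by cosine similarity, keeping the top three. The model choice (all-MiniLM-L6-v2 via sentence-transformers), the column lists, and the descriptions are assumptions made for this example, not the paper's actual setup for MIMIC-III and eICU.

```python
# Minimal sketch of FLM-based column matching between two EHR schemas:
# embed "name: description" strings and rank target columns by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

source_cols = {
    "admittime": "timestamp when the patient was admitted to the hospital",
    "dischtime": "timestamp when the patient was discharged",
    "gender": "administrative gender of the patient",
}
target_cols = {
    "hospitaladmittime24": "time of hospital admission",
    "hospitaldischargetime24": "time of hospital discharge",
    "gender": "patient gender",
    "age": "patient age at admission",
}

model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose embedding model
src_texts = [f"{c}: {d}" for c, d in source_cols.items()]
tgt_texts = [f"{c}: {d}" for c, d in target_cols.items()]
src_emb = model.encode(src_texts, normalize_embeddings=True)
tgt_emb = model.encode(tgt_texts, normalize_embeddings=True)

sims = src_emb @ tgt_emb.T                        # cosine similarity (vectors normalized)
tgt_names = list(target_cols)
for i, col in enumerate(source_cols):
    top3 = np.argsort(-sims[i])[:3]               # three most similar target columns
    print(col, "->", [(tgt_names[j], round(float(sims[i, j]), 2)) for j in top3])
```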
We welcome your valuable suggestions in the comments section, including but not limited to:
pointing out shortcomings of the brief reviews in this post, or sharing recent papers you find more worth recommending, along with your reasons!
END