On December 7, 2023, at the invitation of Rick Goh, Yong Liu (刘勇), Fei Gao (高斐), and Dr. Yuting Song (宋宇婷) of Singapore's Agency for Science, Technology and Research (A*STAR), the Youth Working Committee of the Chinese Information Processing Society of China (CIPS) visited the A*STAR Institute of High Performance Computing (IHPC) for an academic exchange. Yong Liu, Senior Principal Scientist at A*STAR IHPC and Deputy Head of its Computing and Intelligence Department, opened the event with a welcome address and an introduction to the department's main research directions and key projects. Zhongyu Wei (魏忠钰), Deputy Director of the CIPS Youth Working Committee, then introduced the committee's mission and responsibilities to the attendees. The meeting was co-chaired by Dr. Yuting Song of A*STAR and Dr. Liang Pang (庞亮) of the Institute of Computing Technology, Chinese Academy of Sciences.
The meeting brought together invited members of the CIPS Youth Working Committee, including Zhongyu Wei (魏忠钰), Associate Professor at the School of Data Science, Fudan University; Zhaochun Ren (任昭春), Associate Professor at Leiden University; Ningyu Zhang (张宁豫), Associate Professor at Zhejiang University; Hao Fei (费豪), Research Fellow at the NExT++ Research Centre, National University of Singapore; and Liang Pang (庞亮), Associate Researcher at the Key Laboratory of AI Safety (智能算法安全重点实验室), Institute of Computing Technology, Chinese Academy of Sciences. Together with other committee members and A*STAR researchers, they engaged in in-depth academic exchange and discussion. In the academic salon, the five committee experts gave talks on "ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks", "Learning to Tokenize for Generative Retrieval", "Editing Large Language Models: Problems, Methods, and Opportunities", "From Multimodal LLM to AGI", and "Trustworthy Large Language Models: Challenges and Approaches". In addition, A*STAR IHPC scientist Yiming Qian (钱一鸣) and senior scientists Yang Zhou (周阳) and Ranjan Satapathy shared their work on "Strategic Optimization of Language Model Utilization for Maximum Cost Efficiency", "Language-Guided Design Generation and Medical Annotation", and "Integrating NLP in Financial Analysis and Decision-Making".
Title: ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Abstract: Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluated. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats and construct the ReForm-Eval benchmark. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors.
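The abstract does not spell out the re-formulation procedure, so the following is only a minimal sketch of the general idea: turning an existing task-oriented annotation (here, a hypothetical VQA item with a gold answer and distractors) into a unified multiple-choice prompt that any LVLM can answer. The function name and prompt template are illustrative assumptions, not the actual ReForm-Eval pipeline.

```python
# Illustrative sketch only: ReForm-Eval's actual re-formulation pipeline is not
# specified here; this shows the general idea of turning an existing
# task-oriented annotation into a unified multiple-choice prompt for an LVLM.
import random


def reformulate_vqa_item(question, gold_answer, distractors, seed=0):
    """Turn a (question, answer, distractors) annotation into a
    multiple-choice prompt plus the letter of the correct option."""
    rng = random.Random(seed)
    options = distractors + [gold_answer]
    rng.shuffle(options)
    letters = "ABCD"[: len(options)]
    lines = [f"Question: {question}"]
    lines += [f"({letter}) {opt}" for letter, opt in zip(letters, options)]
    lines.append("Answer with the letter of the correct option.")
    prompt = "\n".join(lines)
    gold_letter = letters[options.index(gold_answer)]
    return prompt, gold_letter


prompt, gold = reformulate_vqa_item(
    "What color is the bus?", "red", ["blue", "green", "yellow"]
)
print(prompt)
print("Gold:", gold)
```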
Title: Learning to Tokenize for Generative Retrieval
Abstract: As a new paradigm in information retrieval, generative retrieval directly generates a ranked list of document identifiers for a given query using generative language models. How to assign each document a unique docid (referred to as document tokenization) is a critical problem. We propose a novel document tokenization learning method, GENRET, which learns to encode the complete document semantics into docids. GENRET learns to tokenize documents into short discrete representations (i.e., docids) via a discrete auto-encoding approach. We develop a progressive training scheme to capture the autoregressive nature of docids and diverse clustering techniques to stabilize the training process. Based on the semantic-embedded docids of any set of documents, the generative retrieval model can learn to generate the most relevant docid for a query according to the docids' semantic relevance to that query. We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets. GENRET establishes a new state of the art on NQ320K. Compared with generative retrieval baselines, GENRET achieves significant improvements on unseen documents, and it also outperforms comparable baselines on MS MARCO and BEIR, demonstrating the method's generalizability.
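GENRET learns the document tokenization end-to-end together with the retriever, which is not reproduced here. As a much simpler stand-in that only illustrates what a "semantic docid" looks like, one could quantize document embeddings with residual k-means so that each document receives a short sequence of discrete codes; the function name, code sizes, and toy embeddings below are assumptions made purely for illustration.

```python
# Not GENRET: a deliberately simple stand-in showing what a "semantic docid"
# looks like, using residual k-means quantization over document embeddings.
# GENRET instead learns these discrete codes end-to-end with the retriever.
import numpy as np
from sklearn.cluster import KMeans


def semantic_docids(doc_embeddings, levels=3, codebook_size=8, seed=0):
    """Assign each document a short sequence of discrete codes (its docid)."""
    residual = np.asarray(doc_embeddings, dtype=float).copy()
    codes = []
    for _ in range(levels):
        km = KMeans(n_clusters=codebook_size, n_init=10, random_state=seed)
        ids = km.fit_predict(residual)
        codes.append(ids)
        residual = residual - km.cluster_centers_[ids]  # quantization residual
    return np.stack(codes, axis=1)  # shape: (num_docs, levels)


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 32))  # toy document embeddings
print(semantic_docids(embeddings)[:5])   # e.g. [[3 1 5], [0 6 2], ...]
```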
Title: From Multimodal LLM to AGI
Abstract: In this report, I will briefly present the recent trend of multimodal LLMs, which motivates our recent work on NExT-GPT, an end-to-end, general-purpose, any-to-any MM-LLM system that can perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. I will then discuss the latest trends in multimodal LLMs and how they point toward more intelligent agents in future AI.
Title: Strategic Optimization of Language Model Utilization for Maximum Cost Efficiency
Abstract: The operating cost of running LLM systems is a major expense for AI companies. With the right techniques, these costs can be reduced significantly. This talk introduces several methods that lower the token cost of calling LLMs, the data-cleaning cost of human annotation, and the server-hosting cost of query search over vector databases.
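The talk's specific methods are not detailed in the abstract; the sketch below only illustrates one common token-cost lever of this kind, routing short queries to a cheaper model and caching repeated calls. The model names, prices, and routing heuristic are placeholders, not techniques presented in the talk.

```python
# Illustrative only: a toy cost-aware router with response caching.
# Model names, per-token prices, and the routing rule are placeholders.
from functools import lru_cache

PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.03}


def route(query: str) -> str:
    """Cheap heuristic: send short, simple queries to the small model."""
    return "small-model" if len(query.split()) < 30 else "large-model"


@lru_cache(maxsize=10_000)
def cached_call(model: str, query: str) -> str:
    # Placeholder for a real API call; repeated identical queries cost nothing.
    return f"[{model} answer to: {query}]"


def answer(query: str) -> tuple[str, float]:
    model = route(query)
    est_tokens = len(query.split()) * 1.3          # rough token estimate
    cost = est_tokens / 1000 * PRICE_PER_1K_TOKENS[model]
    return cached_call(model, query), cost


print(answer("What is the capital of France?"))
```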
Title: Editing Large Language Models: Problems, Methods, and Opportunities
Abstract: Despite the ability to train capable LLMs, the methodology for maintaining their relevance and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to efficiently alter the behavior of LLMs within a specific domain without negatively impacting performance on other inputs. This talk focuses on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and the challenges associated with model editing, along with an in-depth empirical analysis of the most advanced methods currently at our disposal.
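Model edits are typically judged on whether the new behavior takes effect (reliability) while unrelated behavior is preserved (locality). The sketch below only illustrates that evaluation loop with a toy dictionary standing in for the model; the editing step itself (e.g., a weight-update or memory-based method) is abstracted behind a placeholder and is not any specific method from the talk.

```python
# Illustrative only: the editing method itself is abstracted behind
# `apply_edit`; this sketch shows the reliability / locality checks commonly
# used to evaluate an edit, with a plain dict standing in for the model.

def apply_edit(model, prompt, new_answer):
    """Placeholder editing step: here it simply records an override.
    Real methods modify the model's weights or attach editable memory."""
    edited = dict(model)
    edited[prompt] = new_answer
    return edited


def generate(model, prompt):
    return model.get(prompt, "<original model answer>")


base_model = {"The capital of France is": "Paris"}
edited = apply_edit(base_model, "The president of the USA is", "Joe Biden")

# Reliability: the edited fact should change.
assert generate(edited, "The president of the USA is") == "Joe Biden"
# Locality: unrelated knowledge should be untouched.
assert generate(edited, "The capital of France is") == "Paris"
print("edit applied and locality preserved")
```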
Title: Language-Guided Design Generation and Medical Annotation
Abstract: This talk explores two research studies that highlight the influence of natural language descriptions on design generation and medical annotation. The first study, "Tell2Design," focuses on generating floor plans directly from natural language instructions. The researchers introduce the Tell2Design (T2D) dataset, a comprehensive collection of floor plans paired with corresponding language descriptions. To address this novel task, they propose a robust sequence-to-sequence model as a foundational approach and benchmark its performance against various existing text-conditional image generation methods, providing valuable insights into this promising field. The second study, "MedRPG," tackles the crucial task of automatically annotating medical images: identifying the most relevant region in a medical image based on a descriptive phrase highlighting specific findings. The proposed approach, MedRPG, leverages a lightweight vision-language transformer architecture to efficiently predict the relevant region's bounding box, even with limited data. By delving into both language-guided design generation and medical annotation, the talk underscores the potential of using language in design and medical image analysis, opening the door to future advancements and transformative applications in both fields.
Ranjan Satapathy, Senior Scientist, A*STAR IHPC
Title: Integrating NLP in Financial Analysis and Decision-Making
Abstract: The integration of NLP into financial analysis and decision-making is driven by the need to manage large volumes of data more effectively, make informed decisions quickly, enhance customer service, and improve overall efficiency and risk management in the financial sector. The talk focuses on two use cases: greenwashing detection using NLP techniques and explainability in the finance domain using aspect-based sentiment analysis.
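As a rough illustration of the aspect-based angle (not the system described in the talk), one could pair hand-picked ESG aspect keywords with an off-the-shelf sentiment model and flag uniformly positive language around vague environmental claims for manual review. The aspect lexicon and the use of the default `transformers` sentiment pipeline are assumptions made purely for this sketch.

```python
# Rough illustration only, not the system from the talk: ESG-related aspect
# keywords paired with an off-the-shelf sentiment model, so glowing language
# around vague environmental claims can be surfaced for human review.
from transformers import pipeline  # assumes the transformers package is installed

ASPECTS = {
    "emissions": ["carbon", "emission", "net zero"],
    "materials": ["recycled", "sustainable", "eco-friendly"],
}

sentiment = pipeline("sentiment-analysis")  # default English sentiment model


def aspect_sentiment(report_sentences):
    """Return (aspect, sentiment label, score, sentence) for matching sentences."""
    hits = []
    for sent in report_sentences:
        for aspect, keywords in ASPECTS.items():
            if any(k in sent.lower() for k in keywords):
                result = sentiment(sent)[0]
                hits.append((aspect, result["label"], round(result["score"], 3), sent))
    return hits


sample = [
    "Our products are 100% eco-friendly and sustainable.",
    "Scope 3 emissions increased by 12% year over year.",
]
for row in aspect_sentiment(sample):
    print(row)
```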
Title: Trustworthy Large Language Models: Challenges and Approaches
Abstract: Ensuring the reliability, credibility, and traceability of content produced by large language models (LLMs) such as ChatGPT is of utmost importance. In this work, we identify that trustworthiness problems arise from hallucinated content, source bias, and implicit unfairness. We then tackle these challenges comprehensively, addressing issues at the data source, during interactive processes, and through model audits. First, we introduce a robust fusion strategy designed to combine retrieved information with generated content, thereby enhancing the credibility of the data source. Next, we explore interactive approaches that pair information retrieval models with LLMs; this collaborative process serves to rectify the inaccuracies and false information often produced by LLMs. Finally, we present a statistics-based methodology to discern the origin of generated text, distinguishing between contributions from the LLM and from human inputs.
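The "robust fusion strategy" itself is not specified in the abstract; the sketch below only shows the generic idea underneath it: grounding generation in retrieved evidence with explicit citations so that claims remain traceable to their sources. The retriever and the LLM call are left as placeholders.

```python
# Generic sketch of retrieval-grounded prompting, not the fusion strategy from
# the talk: retrieved passages are numbered and cited so that generated claims
# stay traceable to their sources.

def build_grounded_prompt(question, retrieved_passages):
    """Assemble a prompt that asks the model to answer only from cited evidence."""
    evidence = "\n".join(
        f"[{i + 1}] {passage}" for i, passage in enumerate(retrieved_passages)
    )
    return (
        "Answer the question using ONLY the evidence below, and cite the "
        "passage numbers you rely on. If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
    )


passages = [
    "ChatGPT was released by OpenAI in November 2022.",
    "Large language models can produce fluent but unsupported statements.",
]
print(build_grounded_prompt("When was ChatGPT released?", passages))
```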