CIPS Youth Working Committee Academic Salon: Lecture Series at the Agency for Science, Technology and Research (A*STAR), Singapore, Successfully Held

Academic   2023-12-13 14:52   Shanghai



On December 7, 2023, at the invitation of Rick Goh, Yong Liu, Fei Gao, and Dr. Yuting Song of A*STAR, the Youth Working Committee of the Chinese Information Processing Society of China (CIPS) visited A*STAR's Institute of High Performance Computing (IHPC) for an academic exchange. Yong Liu, Senior Principal Scientist and Deputy Head of the Computing and Intelligence Department at A*STAR IHPC, delivered the opening remarks and introduced the department's main research directions and key projects. Zhongyu Wei, Deputy Director of the CIPS Youth Working Committee, then introduced the committee's mission and responsibilities to the attendees. The session was co-chaired by Dr. Yuting Song of A*STAR and Dr. Liang Pang of the Institute of Computing Technology, Chinese Academy of Sciences.

The event brought together invited members of the CIPS Youth Working Committee, including Zhongyu Wei (Associate Professor, School of Data Science, Fudan University), Zhaochun Ren (Associate Professor, Leiden University), Ningyu Zhang (Associate Professor, Zhejiang University), Hao Fei (Research Fellow, NExT++ Research Centre, National University of Singapore), and Liang Pang (Associate Researcher, Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences), who held in-depth academic exchanges and discussions with other committee members and A*STAR researchers. During the salon, the five committee experts gave talks on "ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks", "Learning to Tokenize for Generative Retrieval", "Editing Large Language Models: Problems, Methods, and Opportunities", "From Multimodal LLM to AGI", and "Trustworthy Large Language Models: Challenges and Approaches". In addition, A*STAR IHPC scientist Yiming Qian and senior scientists Yang Zhou and Ranjan Satapathy shared their work on "Strategic Optimization of Language Model Utilization for Maximum Cost Efficiency", "Language-Guided Design Generation and Medical Annotation", and "Integrating NLP in Financial Analysis and Decision-Making".

Invited Talks

01

Zhongyu Wei, Associate Professor, Fudan University
Title: ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Abstract: Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluated. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats and construct the ReForm-Eval benchmark. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors.
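
As a concrete illustration of what such a re-formulation can look like (a toy Python sketch, not the actual ReForm-Eval code; the field names and prompt wording are illustrative assumptions), the snippet below recasts a free-form VQA record into a unified multiple-choice item and scores predictions by plain accuracy.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class UnifiedItem:
    """One benchmark sample after re-formulation into multiple-choice form."""
    image_path: str
    prompt: str
    options: List[str]
    answer_index: int

def reformulate_vqa(image_path: str, question: str, answer: str,
                    distractors: List[str], seed: int = 0) -> UnifiedItem:
    """Recast a free-form VQA record as a lettered multiple-choice item."""
    rng = random.Random(seed)
    options = [answer] + distractors
    rng.shuffle(options)
    letters = [chr(ord("A") + i) for i in range(len(options))]
    body = "\n".join(f"({l}) {o}" for l, o in zip(letters, options))
    prompt = (f"Question: {question}\n{body}\n"
              "Answer with the letter of the correct option.")
    return UnifiedItem(image_path, prompt, options, options.index(answer))

def accuracy(predictions: List[int], items: List[UnifiedItem]) -> float:
    """Score predicted option indices against the gold indices."""
    return sum(p == it.answer_index for p, it in zip(predictions, items)) / len(items)

if __name__ == "__main__":
    item = reformulate_vqa("img_001.jpg", "What animal is shown?",
                           answer="a cat", distractors=["a dog", "a horse", "a rabbit"])
    print(item.prompt)
    print("accuracy:", accuracy([item.answer_index], [item]))
```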

02

Zhaochun Ren, Associate Professor, Leiden University

Title: Learning to Tokenize for Generative Retrieval

Abstract: As a new paradigm in information retrieval, generative retrieval directly generates a ranked list of document identifiers for a given query using generative language models. How to assign each document a unique docid (denoted as document tokenization) is a critical problem. We propose a novel document tokenization learning method, GENRET, which learns to encode the complete document semantics into docids. GENRET learns to tokenize documents into short discrete representations (i.e., docids) via a discrete auto-encoding approach. We develop a progressive training scheme to capture the autoregressive nature of docids and diverse clustering techniques to stabilize the training process. Based on the semantic-embedded docids of any set of documents, the generative retrieval model can learn to generate the most relevant docid only according to the docids’ semantic relevance to the queries. We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets. GENRET establishes the new state-of-the-art on the NQ320K dataset. Compared to generative retrieval baselines, GENRET can achieve significant improvements on unseen documents. Moreover, GENRET can also outperform comparable baselines on MS MARCO and BEIR, demonstrating the method’s generalizability.
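
GENRET learns the tokenizer end to end with a discrete auto-encoder; as a rough, much simpler stand-in for the idea of compressing document semantics into short discrete docids, the sketch below uses residual quantization with a tiny k-means over toy embeddings (the sizes and embeddings are illustrative assumptions, not the GENRET algorithm).

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """A tiny k-means that returns k centroids (enough for a toy example)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids

def residual_quantize(doc_embeddings: np.ndarray, codebook_size: int = 4, length: int = 3):
    """Assign each document a short discrete docid: at every step, pick the nearest
    centroid of the residual, record its index, and subtract it (residual quantization)."""
    residual = doc_embeddings.copy()
    docids = np.zeros((len(doc_embeddings), length), dtype=int)
    for step in range(length):
        centroids = kmeans(residual, codebook_size, seed=step)
        assign = np.argmin(((residual[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        docids[:, step] = assign
        residual = residual - centroids[assign]
    return docids

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    embeddings = rng.normal(size=(12, 8))  # stand-in for document encoder outputs
    print(residual_quantize(embeddings))   # each row is one document's docid token sequence
```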

03

Hao Fei, Research Fellow, NExT++ Research Centre, NUS

Title: From Multimodal LLM to AGI

Abstract: In this talk I will briefly review recent trends in multimodal LLMs, which motivate our recent work, NExT-GPT, an end-to-end, general-purpose, any-to-any MM-LLM system that can perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. I will then discuss the latest directions in multimodal LLMs and how they point toward more intelligent agents for future AI.

04

Yiming Qian, Scientist, A*STAR IHPC

Title: Strategic Optimization of Language Model Utilization for Maximum Cost Efficiency

Abstract: The cost of operating LLM systems is a major expense for AI companies, and with the right techniques it can be reduced significantly. This talk introduces several methods that lower the token costs of calling LLMs, the data-cleaning costs of human annotation, and the server-hosting costs of query search over vector databases.
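
The talk's specific techniques are not reproduced here; as a hedged illustration of one common recipe for cutting token costs, the sketch below combines response caching with a model cascade that escalates to an expensive model only when a cheap model is unconfident (the `cheap`/`strong` callables and the confidence threshold are hypothetical).

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical model callables: each returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]

class CascadeRouter:
    """Serve repeated queries from a cache, try a cheap model first, and fall back
    to the strong (expensive) model only when the cheap answer looks unconfident."""

    def __init__(self, cheap: ModelFn, strong: ModelFn, threshold: float = 0.8):
        self.cheap, self.strong, self.threshold = cheap, strong, threshold
        self.cache: Dict[str, str] = {}

    def answer(self, query: str) -> str:
        key = hashlib.sha256(query.encode()).hexdigest()
        if key in self.cache:               # repeated queries cost zero tokens
            return self.cache[key]
        text, confidence = self.cheap(query)
        if confidence < self.threshold:     # escalate only when needed
            text, _ = self.strong(query)
        self.cache[key] = text
        return text

if __name__ == "__main__":
    cheap = lambda q: ("42", 0.95) if "short" in q else ("unsure", 0.3)
    strong = lambda q: ("a carefully reasoned answer", 0.99)
    router = CascadeRouter(cheap, strong)
    print(router.answer("short question"))
    print(router.answer("hard question"))
    print(router.answer("short question"))  # served from cache
```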

05

Ningyu Zhang, Associate Professor, Zhejiang University

Title: Editing Large Language Models: Problems, Methods, and Opportunities

Abstract: Despite the ability to train capable LLMs, the methodology for maintaining their relevancy and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to efficiently alter the behavior of LLMs within a specific domain without negatively impacting performance across other inputs. This talk will focus on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we will provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
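
To make the evaluation side concrete, the sketch below (a simplified stand-in, not any of the surveyed editing methods) treats a model as a text-to-text function, applies a naive lookup-style "edit", and checks the three criteria commonly used in the editing literature: reliability, generality, and locality.

```python
from typing import Callable, Dict, List

# A "model" here is just a text-to-text function. Real editors modify LLM
# parameters; this stand-in only illustrates what an edit is judged by.
Model = Callable[[str], str]

def apply_edit(base: Model, edited_answers: Dict[str, str]) -> Model:
    """Naive 'edit': override the answer for specific prompts, delegate otherwise."""
    def edited(prompt: str) -> str:
        return edited_answers.get(prompt, base(prompt))
    return edited

def evaluate_edit(edited: Model, base: Model, edit_prompt: str, new_answer: str,
                  paraphrases: List[str], unrelated: List[str]) -> Dict[str, bool]:
    """Reliability, generality, and locality of a single edit."""
    return {
        "reliability": edited(edit_prompt) == new_answer,                 # edited fact changed
        "generality": all(edited(p) == new_answer for p in paraphrases),  # paraphrases follow
        "locality": all(edited(p) == base(p) for p in unrelated),         # other behavior intact
    }

if __name__ == "__main__":
    base = lambda p: "Paris" if "capital of France" in p else "unknown"
    edited = apply_edit(base, {"What is the capital of France?": "Lyon"})
    print(evaluate_edit(edited, base,
                        edit_prompt="What is the capital of France?", new_answer="Lyon",
                        paraphrases=["Name France's capital city."],
                        unrelated=["What is the capital of Japan?"]))
    # The naive lookup passes reliability and locality but fails generality,
    # which is exactly why parameter-editing methods are studied.
```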

06

Yang Zhou, Senior Scientist, A*STAR IHPC

Title: Language-Guided Design Generation and Medical Annotation

Abstract: This talk explores two significant research studies that highlight the influence of natural language descriptions on design generation and medical annotation. The first study, "Tell2Design," focuses on generating floor plans directly from natural language instructions. The researchers introduce the Tell2Design (T2D) dataset, a comprehensive collection of floor plans paired with corresponding language descriptions. To address this novel task, they propose a robust Sequence-to-Sequence model as a foundational approach. The performance of this model is then benchmarked against various existing text-conditional image generation methods, providing valuable insights into this promising field. The second study, "MedRPG," tackles the crucial task of automatically annotating medical images. This process involves identifying the most relevant region in a medical image based on a descriptive phrase highlighting specific findings. The proposed approach, MedRPG, leverages a lightweight vision-language transformer architecture to efficiently predict the relevant region's bounding box, even with limited data. By delving into both language-guided design generation and medical annotation, this talk seeks to underscore the remarkable potential of utilizing language in design and medical image analysis. This opens doors for future advancements and transformative applications in both fields.
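
Phrase-grounding outputs such as the predicted boxes in MedRPG are commonly judged by intersection-over-union against the annotated region; a minimal sketch of that metric is given below (the example boxes are made up).

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

if __name__ == "__main__":
    predicted = (30, 40, 120, 160)     # box predicted for a descriptive phrase
    ground_truth = (35, 50, 115, 150)  # expert-annotated region
    print(f"IoU = {iou(predicted, ground_truth):.3f}")  # often counted correct above 0.5
```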

07

Ranjan Satapathy, Senior Scientist, A*STAR IHPC

Title: Integrating NLP in Financial Analysis and Decision-Making

Abstract: The integration of NLP in financial analysis and decision-making is driven by the need to manage large volumes of data more effectively, make more informed decisions quickly, enhance customer service, and improve overall efficiency and risk management in the financial sector. The talk focuses on two use cases: greenwashing detection using NLP techniques and explainability in the finance domain using aspect-based sentiment analysis.
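
As a toy illustration of aspect-based sentiment on financial text (real systems use trained transformer models; the lexicons and window size here are illustrative assumptions), the sketch below scores sentiment separately for each financial aspect term it finds in a sentence.

```python
# Illustrative lexicons; a deployed system would learn these signals from data.
ASPECTS = {"revenue", "emissions", "guidance", "margins"}
POSITIVE = {"beat", "exceeded", "improved", "strong"}
NEGATIVE = {"missed", "declined", "weak", "rising"}

def aspect_sentiment(sentence: str, window: int = 3) -> dict:
    """Score each aspect term by the polarity words within `window` tokens of it."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    results = {}
    for i, tok in enumerate(tokens):
        if tok in ASPECTS:
            context = tokens[max(0, i - window): i + window + 1]
            score = sum(t in POSITIVE for t in context) - sum(t in NEGATIVE for t in context)
            results[tok] = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return results

if __name__ == "__main__":
    print(aspect_sentiment("Revenue beat expectations, but margins declined on rising costs."))
    # {'revenue': 'positive', 'margins': 'negative'}
```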

08

Liang Pang, Associate Researcher, Institute of Computing Technology, Chinese Academy of Sciences

Title: Trustworthy Large Language Models: Challenges and Approaches

Abstract: Ensuring the reliability, credibility, and traceability of content produced by Large Language Models (LLMs), such as ChatGPT, is of utmost importance. In this work, we identify that trustworthiness problems arise from hallucinated context, source bias, and implicit unfairness. We then tackle these challenges comprehensively, addressing issues at the data source, during interactive processes, and through model audits. First, we introduce a robust fusion strategy designed to combine retrieved information with generated content, thereby enhancing the credibility of the data source. Next, we explore the potential of interactive approaches that involve information retrieval models and LLMs; this collaborative process serves to rectify the inaccuracies and false information often generated by large language models. Finally, we present a statistics-based methodology to discern the origin of generated text, distinguishing between contributions from the LLM and human inputs.
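
The talk's own statistical method is not reproduced here; a common, much simpler baseline for deciding whether a passage is model- or human-generated is to threshold its average token log-likelihood under the LLM, as in the hedged sketch below (the scores and threshold are hypothetical).

```python
from typing import List

def avg_log_likelihood(token_logprobs: List[float]) -> float:
    """Mean per-token log-probability of a passage under the language model."""
    return sum(token_logprobs) / len(token_logprobs)

def classify_origin(token_logprobs: List[float], threshold: float = -2.5) -> str:
    """Text the model finds very predictable (high average log-likelihood) is flagged
    as likely model-generated; the threshold here is an illustrative assumption."""
    return ("likely LLM-generated"
            if avg_log_likelihood(token_logprobs) > threshold
            else "likely human-written")

if __name__ == "__main__":
    # Hypothetical per-token log-probs scored by an LLM for two passages.
    machine_like = [-0.4, -0.9, -0.3, -1.1, -0.6]
    human_like = [-3.2, -4.1, -2.8, -5.0, -3.7]
    print(classify_origin(machine_like))  # likely LLM-generated
    print(classify_origin(human_like))    # likely human-written
```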




Youth Working Committee of the Chinese Information Processing Society of China
The Youth Working Committee of the Chinese Information Processing Society of China (cips_ywc) is an academic body under the Chinese Information Processing Society of China, dedicated to serving young scholars and students in Chinese information processing nationwide. This official account promptly publishes the committee's activities, hot topics, and major news.