In-Person Lecture | "Using Artificial Intelligence to Transform Classical Chinese Texts into Searchable Databases" 11.8

Digest   2024-11-07 08:31   China

Transforming Classical Chinese Texts into Searchable Databases with AI

November 7 @ 12:00 pm – 1:00 pm

Speaker: Guenther Lomas, Founder, Sigtica

As artificial intelligence becomes integral to the digital humanities, it offers new methods that expand research capabilities and surface new insights into historical texts and cultural narratives. This talk will demonstrate how AI-powered pipelines can turn large volumes of unstructured classical Chinese texts, such as genealogies and Qing dynasty government employee records (including those from the Da Qing jin shen quan shu), into organized, searchable databases.

The pipeline addresses a longstanding challenge in classical Chinese studies: labor-intensive manual data entry. It is designed to process millions of pages of historical Chinese texts efficiently, handling complexities such as layout identification and precise text extraction. Central to this effort is customized Optical Character Recognition (OCR), which improves the accuracy of data extraction, paired with Named Entity Recognition (NER) models that identify key fields. The result is a set of clean, tabular databases that improve accessibility and let researchers analyze Chinese historical content far more efficiently than manual entry allows. The methodology also has potential applications for other languages, including Japanese, Korean, Arabic, and Latin, which would broaden its impact.
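
The last two stages described above (identifying key fields and producing a searchable table) can be illustrated with a minimal sketch. It assumes OCR and layout analysis have already produced one whitespace-separated text line per entry. The gazetteer lookup is only a stand-in for an NER model, and the names `tag_line`, `build_db`, `search`, the table schema, and the sample lines are all hypothetical. None of this reflects the speaker's actual implementation.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical gazetteers standing in for a trained NER model.
OFFICE_TITLES = {"知縣", "知府", "同知"}
PROVINCES = {"浙江", "江蘇", "山東"}

# Invented sample lines (placeholder names), as if produced by an OCR step.
SAMPLE_LINES = ["知縣 王某 浙江", "知府 李某 山東", "殘缺"]


@dataclass
class Record:
    office: str
    name: str
    native_place: str


def tag_line(line: str) -> Optional[Record]:
    """Toy field tagger: label tokens by gazetteer lookup.

    Returns None when the line cannot be resolved into exactly one
    office title, one native place, and one remaining token (the name).
    """
    tokens = line.split()
    office = next((t for t in tokens if t in OFFICE_TITLES), None)
    place = next((t for t in tokens if t in PROVINCES), None)
    rest = [t for t in tokens if t not in OFFICE_TITLES and t not in PROVINCES]
    if office is None or place is None or len(rest) != 1:
        return None
    return Record(office=office, name=rest[0], native_place=place)


def build_db(records: Iterable[Record]) -> sqlite3.Connection:
    """Load tagged records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE officials (office TEXT, name TEXT, native_place TEXT)"
    )
    conn.executemany(
        "INSERT INTO officials VALUES (?, ?, ?)",
        [astuple(r) for r in records],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, native_place: str) -> List[Tuple[str, str]]:
    """Return (office, name) pairs for everyone from the given place."""
    cur = conn.execute(
        "SELECT office, name FROM officials WHERE native_place = ? ORDER BY name",
        (native_place,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    tagged = [r for r in map(tag_line, SAMPLE_LINES) if r is not None]
    db = build_db(tagged)
    print(search(db, "浙江"))
```

Lines that cannot be resolved are dropped here for brevity; a production pipeline would more plausibly route them to manual review rather than discard them.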

By exploring these methodologies and their implications, this presentation aims to show how integrating advanced technological tools enriches scholarly inquiry in the digital humanities, providing deeper insights into patterns and narratives within Chinese history and beyond. This approach promises to revolutionize data collection, paving the way for alternative research practices across various linguistic contexts.

Details
Date: November 7
Time: 12:00 pm – 1:00 pm
Event Category: Special Event
Website: https://forms.office.com/r/BD82VZ6r2L
Organizer: Digital China Initiative
Venue: CGIS South Room S354, 1730 Cambridge St, Cambridge, MA 02138, United States

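Studying History in the US (在美国学历史)
Sharing the latest academic news on overseas history conferences, lectures, and new books. For more services, search for the vx shop “历史留学”; to submit a contribution, write to zsy1998@outlook.com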