Research Center for Digital Humanities of PKU 北京大学数字人文研究中心

图片来源: 北京大学, 未提供日期

名称

北京大学数字人文研究中心

成立年份

2020

简要描述

2020 年,学校成立校级实体 「北京大学数字人文研究中心」,同时组建北京大学数字人文开放实验室。2022 年 3 月起,该中心开始接受字节跳动的公益捐赠,从事古籍资源智能开发与利用研究 (北京大学数字人文研究中心, 2024)。为感谢字节跳动的公益支持,实验室更名为 「北京大学-字节跳动数字人文开放实验室」,隶属于北京大学人工智能研究院。实验室目前的研究方向包括自然语言处理、深度学习、本体与知识图谱、信息可视化、交互设计、用户信息行为研究等。另外,北京大学数字人文研究中心是一个跨学科的研究机构,来自北京大学各个院系的研究导师共同指导实验室的学生,比如历史学、计算机科学技术、中国语言文学、外国语言文学、地球与空间科学、哲学和信息管理。

数字人文教学

北京大学的数字人文中心主要致力于推进博士级别的研究和教育,通过跨学科的教授团队提供全面的博士生督导。该中心还举办面向更广泛受众的研讨会和培训项目,包括研究生、高年级本科生和年轻教师。

关键学者

王军教授,北京大学信息管理系教授。他还担任北京大学数字人文研究中心主任。

苏祺博士,计算语言学研究所副教授,兼任。她的主要研究兴趣涉及自然语言处理、计算语言学和语料库语言学等领域。

杨浩博士,哲学系助理教授,同时在儒学经典整理研究中心工作。

位通博士,计算机科学博士,信息管理系和北京大学数字人文研究中心双聘助理教授。

主要项目及链接

  1. 古典文本知识图谱生成平台的开发,2021年8月至2022年8月。
    • 「宋元学案知识图谱系统」对《宋元学案》的240万字进行了文本处理和分析。它从学案中提取了人物、时间、地点、作品等实体及其复杂的语义关系,构建了一个知识图谱。该系统提供了可视化、交互式浏览和语义查询等功能,使用户能够以更直观和结构化的方式探索文本中的关系和信息。
  2. 古代经典文本目录数据集成与分析系统,2021年8月至2022年8月。
    • 由北京大学数字人文研究中心和中国科学院自然科学史研究所共同开发,用于分析古代中国文献目录中相似书籍和书目之间的关系的可视化分析系统。
  3. 国家图书馆与北京大学数字人文研究中心共同承担的国家珍稀古籍目录知识库的建设,从2021年8月至2022年8月。
    • 该知识库系统结合了交互式可视化技术和语义关联技术,实现了对“国家珍稀古籍目录”中所收录古籍的多维查询和探索。它展示了不同类型的文学、文本、历史时期、版本和地理区域中珍贵古籍的分布情况。这旨在突出中国文化,为公众理解和研究中国经典提供指导和线索。
  4. 高清图像数据库编《永乐大典》(第一卷)高清图像数据库编制工作, 由中国国家图书馆出版社(国家图书馆出版社, 未提供日期)与北京大学数字人文研究中心共同承担, 从2021年8月至2022年8月。
    • 《永乐大典》高清图像数据库系统的总体目标是利用《永乐大典》的高清图像和相应文本作为核心材料。它还补充了有关《永乐大典》格式、编纂过程、历史背景和传播的信息。通过采用数字人文技术和多媒体传播的表达力,该系统旨在充分展示《永乐大典》的文化、文学和艺术价值。
  5. 中国儒家学术史知识图谱的构建, 由中国国家自然科学基金国际重大合作项目资助,涉及北京大学数字人文研究中心与哈佛大学费正清中国研究中心的合作,跨越2021年至2025年。
    • 该平台收集了200多部古代中国哲学经典著作的全文数据,提供了关于各自时代、作者等详细信息。利用深度学习算法,它自动分析词汇,识别句子结构,从而探索文学作品的时空维度。旨在阐明文字和句子的错综复杂中的思想和文化演变,为人文研究提供了有价值的帮助。
  6. 古籍数字化关键技术创新与应用研究, 由中宣部出版局资助,2020年。
    • 「中国传记数据库」(CBDB)WEB检索系统的第二版,2010年至2020年开发。中国传记数据库(CBDB)是一个免费的关系数据库。截至2020年5月,它包含了大约47万条从7世纪到19世纪的传记条目。除了作为传记信息的参考外,该数据库还旨在促进统计和空间分析。
  7. 古典文学大数据分析平台的开发,自2020年起持续进行。
    • 这个平台是「北京大学-字节跳动数字人文开放实验室」的产物,致力于智能开发和利用古籍资源,旨在开发一个基于古代文本智能处理的‘识经平台’。该平台将对公众免费开放,提供对数字化古籍资源的访问和利用。‘识经平台’旨在探索检索方法、异体字支持、文本质量、阅读辅助和浏览体验等各个方面。其目标是建立一个具有高质量文本、丰富功能和优秀阅读体验的古代文本阅读平台。

其他信息

数字人文暑期工作坊,每年都会以不同的主题举办。最近的工作坊包括:

国际数字人文联合暑期工作坊2023

北京大学、哈佛大学和普林斯顿大学联合建立了「数字人文暑期工作坊」。该项目向全球学生开放,采用面对面教学,轮流在北京大学、哈佛大学和普林斯顿大学的校园举办。首次联合工作坊于2023年8月初在北京大学举行,主题为「智能信息环境中的人文创新」,人文学者和人工智能专家共同授课。课程对具有独立研究能力的研究生、高年级本科生以及对该主题感兴趣的年轻教师开放。本次工作坊主要采用中国历史和古代思想史作为实验材料。

数字人文暑期工作坊2022

北京大学数字人文研究中心与北京大学人工智能研究所于2022年7月18日至7月30日举办了「数字人文暑期工作坊」。该课程吸引了来自文学、历史、哲学、艺术、考古、人工智能、计算语言学和软件工程等各个学科的学生。除了讲座外,工作坊还组织了跨学科研讨会和研究实践,培养既具备人文素养又具备信息技术能力的跨学科人才。

北京大学数字人文工作坊2020

「北京大学数字人文工作坊」是由北京大学数字人文研究中心提供的专业在线培训课程。该课程教授了数字人文领域常用的方法和工具,旨在培养应用计算方法解决人文研究问题的意识和能力。该系列工作坊共分为六个会话,由来自德国马克思·普朗克研究所、德国柏林国家图书馆、台湾大学和北京大学的六位领域专家讲授。

Image credit: Neo, 2019

Peking University Digital Humanities Research Center

Name

Research Center for Digital Humanities of PKU

Year of Foundation

2020

Short Description

In 2020, the university established the campus-level entity ‘Peking University Digital Humanities Research Center’ and simultaneously formed the Peking University Digital Humanities Open Laboratory. From March 2022, the center began accepting donations from ByteDance to support research on the intelligent processing and utilisation of ancient book resources (Research Centre for Digital Humanities of PKU, 2024). Following this donation, the laboratory was renamed ‘Peking University-ByteDance Digital Humanities Open Laboratory’; it operates under the Institute of Artificial Intelligence at Peking University. Current research areas include natural language processing, deep learning, ontologies and knowledge graphs, information visualization, interaction design, and user information behaviour. The Research Center for Digital Humanities of PKU is an interdisciplinary institution: research mentors from departments across Peking University, including History, Computer Science and Technology, Chinese Language and Literature, Foreign Languages and Literatures, Earth and Space Sciences, Philosophy, and Information Management, jointly guide the laboratory’s students.

Teaching

Peking University’s Digital Humanities Center focuses primarily on research and education at the doctoral level, with PhD students supervised by professors from a range of disciplines. The center also hosts workshops and training programs open to a wider audience, including graduate students, senior undergraduate students, and young teachers.

PhD students

Doctoral students are jointly supervised by professors from various disciplines including the Department of Information Management, Institute of Artificial Intelligence, Department of Philosophy, Department of Chinese Language and Literature, Department of History, School of Foreign Languages, and others.

Key Academics

Prof Jun Wang, Professor at the Department of Information Management at Peking University. He also serves as the director of the Peking University Digital Humanities Research Center. 

Dr Qi Su, associate professor with a dual appointment at the Institute of Computational Linguistics. Her primary research interests lie in natural language processing, computational linguistics, and corpus linguistics.

Dr Hao Yang, assistant professor in the Department of Philosophy, who also works at the Center for Compilation and Research of Confucian Classics.

Dr Tong Wei, assistant professor with a Ph.D. in Computer Science. He holds a joint appointment at the Department of Information Management and the Peking University Digital Humanities Research Center.

Key Projects with links

  1. Development of a Knowledge Graph Generation Platform for Classical Texts, August 2021 – August 2022.
    • The ‘Song-Yuan Study Case Knowledge Graph System’ carried out text processing and analysis on the 2.4 million characters of the ‘Song-Yuan Study Case’. It extracted entities such as people, times, places, and works, together with their complex semantic relationships, to construct a knowledge graph. The system provides visualization, interactive browsing, and semantic queries, enabling users to explore the relationships and information within the text in a more intuitive and structured manner.
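  2. Integration and Analysis System for Catalog Data of Classical Texts throughout the Dynasties, August 2021 – August 2022.
    • Jointly developed by the Peking University Digital Humanities Research Center and the Institute for the History of Natural Sciences of the Chinese Academy of Sciences, this visual analysis system is used to analyse the relationships between similar books and bibliographies in ancient Chinese catalogues.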
  3. Construction of a Knowledge Base for the Catalogue of National Rare Ancient Books
    • Jointly undertaken by the National Library and the Peking University Digital Humanities Research Center, from August 2021 to August 2022. 
    • This knowledge base system combines interactive visualization technology with semantic correlation techniques to enable multidimensional querying and exploration of the ancient texts included in the “National Catalogue of Precious Ancient Books.” It showcases the distribution of valuable ancient texts across different types of literature, scripts, historical periods, versions, and geographical regions. This aims to highlight Chinese culture, providing guidance and clues for the public to understand and study Chinese classics.
  4. Compilation of a High-Definition Image Database for the ‘Yongle Encyclopedia’ (Volume 1)
    • Jointly undertaken by the National Library of China Publishing House (NLP Press, nd) and the Peking University Digital Humanities Research Center, from August 2021 to August 2022.
    • The overall goal of the ‘Yongle Encyclopedia’ High-Definition Image Database System is to use high-definition images and the corresponding texts of the ‘Yongle Encyclopedia’ as its core materials. These are supplemented with information on the format, compilation process, historical context, and dissemination of the ‘Yongle Encyclopedia’ itself. By employing digital humanities techniques and the expressive power of multimedia, the system aims to fully present the cultural, literary, and artistic value of the ‘Yongle Encyclopedia’.
  5. Construction of a Knowledge Graph of Chinese Confucian Academic History
    • Funded by the National Natural Science Foundation of China International Key Cooperation Project, involving a collaboration between the Peking University Digital Humanities Research Center and the Fairbank Center for Chinese Studies at Harvard University, spanning from 2021 to 2025. 
    • The platform gathers full-text data from over two hundred classic works of ancient Chinese philosophy, providing detailed information about their respective eras, authors, and more. Utilizing deep learning algorithms, it automatically segments and analyses vocabulary, identifies sentence structures, and thus explores the temporal and spatial dimensions of the literature. The aim is to elucidate the evolution of ideas and culture within the intricacies of words and sentences, serving as a valuable aid to humanities research.
  6. Innovation and Application Research on Key Technologies for Ancient Books Digitization
    • Funded by the Publishing Bureau of the Central Propaganda Department, in 2020.
    • Second edition of the ‘China Biographical Database’ (CBDB) web retrieval system, developed from 2010 to 2020. The China Biographical Database (CBDB) is a freely accessible relational database. As of May 2020, it contained approximately 470,000 biographical entries spanning the 7th to the 19th century. Apart from serving as a reference for biographical information, the database also aims to facilitate statistical and spatial analysis.
  7. Development of a Big Data Analysis Platform for Classical Literature, ongoing since 2020.
    • An output of the ‘Peking University-ByteDance Digital Humanities Open Laboratory’, this project is dedicated to the intelligent development and utilization of ancient book resources. It aims to develop the ‘Recognizing Classics’ reading platform, based on intelligent processing of ancient texts. The platform will be freely accessible to the public, providing access to and use of digitized ancient book resources. It explores aspects such as retrieval methods, variant character support, text quality, reading assistance, and browsing experience. The goal is a platform for reading ancient texts that offers high-quality text, rich functionality, and an excellent reading experience.

Other info

Digital Humanities Summer Workshops are held every year, each on a different topic. The most recent ones include:

International Joint Summer Workshop on Digital Humanities 2023

Peking University, Harvard University, and Princeton University have jointly established the ‘Digital Humanities Summer Workshop’. This program, open to students worldwide, is taught face to face and rotates between the campuses of Peking University, Harvard University, and Princeton University. The first joint workshop was held at Peking University in early August 2023 on the theme ‘Humanistic Innovation in the Intelligent Information Environment’, with humanists and artificial intelligence experts co-teaching the sessions. The course was open to graduate students and senior undergraduate students with independent research capabilities, as well as young teachers interested in the subject. This session drew mainly on Chinese history and the history of ancient thought for its experimental materials.

Digital Humanities Summer Workshop 2022

The Peking University Digital Humanities Center and the Institute of Artificial Intelligence at Peking University held a ‘Digital Humanities Summer Workshop’ from July 18th to July 30th, 2022. The course included students from various disciplines, including literature, history, philosophy, art, archaeology, artificial intelligence, computational linguistics, and software engineering. In addition to lectures, the workshop organized interdisciplinary seminars and research practices to cultivate interdisciplinary talents who possess both humanities literacy and information technology skills. 

Peking University Digital Humanities Workshop 2020

The ‘Peking University Digital Humanities Workshop’ was a specialized online training course offered by the Peking University Digital Humanities Research Center. The course taught commonly used methods and tools in digital humanities, aiming to cultivate awareness of and capability in applying computational methods to problems in humanities research. The workshop series consisted of six sessions, with lectures delivered by six domain experts from the Max Planck Institute in Germany, the National Library of Berlin in Germany, National Taiwan University, and Peking University.