Meet Lei Lei

Lei is a final-year PhD candidate in the History department at SOAS. Her research examines the transformation of economic thought and practice in late Qing China, particularly how the intricate interconnections among the Qing central government, the literati-officialdom and comprador-merchants shaped shifts in economic thought, economic policy and the ownership of modern enterprises amid the global rise of neomercantilist ideas. During her doctoral studies, she has presented parts of her work at several national and international symposia and contributed to conference proceedings. She is a member of the European Association for Chinese Studies (EACS), the Economic History Society (EHS) and the Association for the History of Chinese Economic Thought. Before her PhD, Lei completed a Master’s degree in Economic History at the London School of Economics and a Bachelor’s degree at the University of Liverpool. She aims to pursue an academic career that combines digital humanities research methods with an economic perspective on premodern Chinese books.

1. How do you define Digital Humanities?

Digital Humanities (DH) is an innovative interdisciplinary approach that integrates traditional humanistic inquiry with modern digital and computational technologies. It enables scholars to explore questions that were previously impractical to tackle because of the scale, complexity or inaccessibility of the data.

2. How did you become interested in DH? 

My interest in digital humanities methods was first sparked by my academic supervisor and deepened after a 2024 symposium on Chinese economic thought. There I presented a chapter on the Huangchao jingshi wenbian collections, in which I combined digital text mining with critical source analysis to quantify how often key works were used and to evaluate their significance. During the symposium, a session chair recommended that I try the Term Frequency-Inverse Document Frequency (TF-IDF) technique.
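For readers unfamiliar with TF-IDF, here is a minimal sketch of the idea in Python. It assumes each text has already been segmented into tokens (classical Chinese is written without spaces, so segmentation is itself a research decision) and uses one common textbook variant: tf = count / document length and idf = ln(N / df). The function name `tf_idf` and the toy token lists are hypothetical illustrations, not drawn from the Huangchao jingshi wenbian or from the author's actual workflow. Libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed idf by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised (non-empty) document.

    tf  = count of the term in the document / number of tokens in the document
    idf = ln(N / df), where N is the number of documents and df is the
          number of documents that contain the term at least once
    """
    n_docs = len(docs)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)  # assumes the document is not empty
        weights.append(
            {term: (count / length) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return weights


# Hypothetical toy "documents", each already split into single characters.
docs = [
    ["商", "利", "税", "商"],
    ["税", "田", "税"],
    ["商", "船", "利"],
]

for i, w in enumerate(tf_idf(docs)):
    # Highest-weighted terms first: frequent in this text, rare in the others.
    ranked = sorted(w.items(), key=lambda kv: kv[1], reverse=True)
    print(i, [(t, round(s, 3)) for t, s in ranked])
```

In the second toy document, 田 outranks 税 even though 税 occurs twice, because 税 also appears in the first document while 田 appears nowhere else; a term found in every document would receive a weight of exactly zero. This is what makes TF-IDF useful for highlighting the works or terms that are distinctive to particular parts of a collection.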

3. Tell us about your dissertation

My research examines the transformation of economic thought and practice in late Qing China, particularly how the intricate interconnections among the Qing central government, the literati-officialdom and comprador-merchants shaped shifts in economic thought, policy and modern enterprise amid the global rise of neomercantilist ideas.

4. And a DH project you like?

One digital humanities project that impressed me was a digital scholarship initiative by the British Library. The number of early printed books involved is enormous—around 23,000 volumes published between 1908 and 2007. Moreover, PhD candidates who are interested in collaborating can propose their own projects and consult with the library’s teams. The scale of the collection and the flexibility in project design made me realize how efficient and innovative digital humanities can be.

介绍Lei Lei

Lei 是伦敦大学亚非学院(SOAS)历史系的一名博士毕业年级学生。她的研究聚焦于清朝晚期经济思想和实践的转变,特别关注清朝中央政府、士大夫官僚阶层与买办商人之间错综复杂的联系,在经济思想、政策及现代企业所有权转变过程中所起的作用,这一切都发生在新重商主义思想全球兴起的背景下。在攻读博士学位期间,她曾在多个国内外学术研讨会上展示过部分研究成果,并为会议论文集作出贡献。她是欧洲汉学学会(EACS)、经济史学会(EHS)以及中国经济思想史学会的成员。

在攻读博士学位之前,Lei 曾在伦敦政治经济学院(London School of Economics)获得经济史硕士学位,并在利物浦大学(University of Liverpool)完成了本科学业。她计划在学术界发展职业生涯,将数字人文研究方法与中国古籍的经济视角相结合。

1. 你如何定义数字人文?

数字人文(Digital Humanities, DH)是一种创新的跨学科研究方法,将传统的人文学术研究与现代数字和计算技术相结合。它使学者能够探索过去因数据规模、复杂性或难以获取而难以研究的问题。

2. 你是如何对数字人文产生兴趣的?

我对数字人文方法的兴趣最初来自我的学术导师,在2024年一场关于中国经济思想的学术研讨会上得到了进一步深化。在会上,我展示了关于《皇朝经世文编》文献的研究章节,采用数字文本挖掘与批判性文献分析的方法,量化关键著作的使用频率,并评估其思想意义。会议期间,一位主持人建议我尝试使用“词频-逆文档频率”(TF-IDF)技术来进一步分析文本。

3. 请介绍你的论文课题。

我的研究关注晚清中国经济思想与实践的转型,尤其是清中央政府、士大夫官僚群体与买办商人之间复杂的互动关系在经济思想、政策和现代企业发展中的作用。这一转变也与全球新重商主义思潮的兴起密切相关。

4. 有什么让你印象深刻的数字人文项目?

让我印象深刻的一个数字人文项目是由英国图书馆发起的数字学术研究计划。该项目涉及的早期印刷书籍数量庞大,约有23,000册,出版时间从1908年到2007年不等。同时,任何有意参与的博士研究者都可以提出自己的项目,并与图书馆的研究团队进行合作。这一项目在资源规模和研究设计上的灵活性让我深刻认识到数字人文在学术研究中的高效性与创新性。

Meet Penelope Nguyen

Penelope Gia Bao Huu Nguyen is a UKRI Doctoral Fellow in CASCADE, a Marie Skłodowska-Curie Doctoral Network. Her project aims to trace how concepts change over time by automatically constructing an English-language historical thesaurus. Penelope holds a bachelor’s degree in English Studies from Can Tho University in Vietnam, where she first developed her passion for linguistics, particularly pragmatics. As a Fulbright scholar, she completed a master’s in Linguistics at Purdue University (USA), focusing on impoliteness and emoji use on Vietnamese Facebook pages. Her research combines computational and corpus linguistics to address questions that advance linguistic theory and related interdisciplinary fields.


1. How do you define Digital Humanities?

To me, Digital Humanities (DH) means using digital tools to study the humanities. This covers a wide range of subjects, from history and cultural heritage to language and literature. Digital tools can support every stage of research: data collection, data management, data analysis, dissemination of results and so on. In other words, DH enables scholars to revisit familiar questions through innovative, data-driven methodologies.

2. How did you become interested in DH? 

I first heard the term “Digital Humanities” while pursuing an MA in Linguistics at Purdue, where a graduate certificate in DH was advertised to graduate students. I took an introductory DH course there called Computational Text Analysis and immediately wished I had discovered this area of research sooner, as it opens up so many new and exciting interdisciplinary avenues for high-impact projects. I was amazed to see how literary scholars use computational tools for “distant reading”, for example in stylometry and authorship attribution. In the age of AI, DH gives researchers in different fields the opportunities and tools to communicate and to collaborate on ground-breaking work. Nowadays, it is no longer unusual to see a computer scientist and an archaeologist, or even a 3D-printing expert, working together on a DH project. Another appealing aspect of DH projects is that they are usually accessible to the public, often as a web page or a piece of software. This way, knowledge is not locked away in paywalled academic journals.

3. Tell us about your dissertation

Discursive concepts are historically significant concepts that cannot be captured by a single lexical item (e.g., ethics or taxation) or a collocational structure (e.g., climate change or generation gap), but can be pinned down using a method called concept modelling. First, a processor produces a concept model of quads: sets of four words found to be strongly associated, based on PMI (Pointwise Mutual Information) scores, across a large span of text. Then, a series of pragmatic routines and encyclopaedic enrichment are applied while close-reading a number of texts containing those quads, to arrive at the discursive concept. For example, quads such as day – hour – minute – moon and day – eclipse – minute – moon, extracted from EEBO-TCP, can point to the practice of observing celestial bodies for practical purposes. However, this labour-intensive approach is time-consuming, prone to bias and computationally expensive, and it lacks a temporal dimension (i.e., it cannot track how concepts change over time). I’m working on a new method that automatically generates a catalogue of zeitgeists, or historically significant concepts, from a collection of texts across time, while balancing processing time, computational cost, human effort and robustness. The catalogue, built bottom-up, can serve as a reference for historians, historical and/or computational linguists, NLP researchers and others.
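The association step described above can be illustrated with a small PMI calculation. PMI compares how often two words actually co-occur with how often they would co-occur by chance: PMI(x, y) = log2 [p(x, y) / (p(x) p(y))]. The sketch below is a minimal Python illustration, assuming pre-tokenised text, a fixed sliding window and plain relative-frequency estimates. `window_pmi` and `quad_score` are hypothetical helpers, and scoring a quad by the mean PMI of its six word pairs is an illustrative simplification, not the Linguistic DNA processor or the method being developed in this thesis. The toy tokens merely reuse words from the example quads; they are not EEBO-TCP data.

```python
import math
from collections import Counter
from itertools import combinations


def window_pmi(tokens, window=5):
    """PMI (base 2) for unordered word pairs that fall inside a sliding
    window of `window` consecutive tokens."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for v in tokens[i + 1 : i + window]:
            if v != w:
                pair_counts[frozenset((w, v))] += 1  # unordered pair key
    n_words = len(tokens)
    n_pairs = sum(pair_counts.values())
    pmi = {}
    for pair, count in pair_counts.items():
        x, y = tuple(pair)
        p_xy = count / n_pairs
        p_x = word_counts[x] / n_words
        p_y = word_counts[y] / n_words
        pmi[pair] = math.log2(p_xy / (p_x * p_y))
    return pmi


def quad_score(quad, pmi):
    """Hypothetical quad score: mean PMI over the six word pairs in the quad.
    Returns None if any pair never co-occurs within the window."""
    scores = []
    for a, b in combinations(quad, 2):
        key = frozenset((a, b))
        if key not in pmi:
            return None
        scores.append(pmi[key])
    return sum(scores) / len(scores)


tokens = ("day hour minute moon king tax law court "
          "day eclipse minute moon").split()
pmi = window_pmi(tokens, window=4)
print(quad_score(("day", "hour", "minute", "moon"), pmi))
print(quad_score(("day", "eclipse", "minute", "moon"), pmi))
print(quad_score(("hour", "court", "tax", "eclipse"), pmi))  # hour and court never share a window
```

Higher scores mean the four words keep turning up together more often than chance would predict. Using `frozenset` keys treats each pair as unordered, so PMI(x, y) and PMI(y, x) are the same entry. In the approach described above, such candidate quads are only a starting point for the close reading and encyclopaedic enrichment that turn them into discursive concepts.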

4. And a DH project you like?

I might be biased because both of my supervisors were actively involved in this project, but I was very much drawn to Linguistic DNA (https://www.linguisticdna.org/) while preparing my PhD application. The idea is brilliant: we all want to capture context in texts effectively, and concept modelling offers a way to do so. I was more used to traditional corpus-linguistic measures and techniques, so learning a completely new approach and workflow was eye-opening. You can try it yourself by interacting with the Demonstrator and reading the accompanying blog posts. I hope you’ll be as excited as I was when I first played with it!

介绍Penelope Nguyen

Penelope Gia Bao Huu Nguyen是Marie Skłodowska-Curie博士网络“CASCADE”项目的英国研究与创新署(UKRI)博士研究员。她目前的研究聚焦于通过自动构建一部英语历史同义词词典(historical thesaurus),追踪概念随时间的演变。

Penelope 拥有越南芹苴大学(Can Tho University)的英语研究学士学位,正是在那里,她首次培养了对语言学的热情,尤其是在语用学领域。作为一名富布赖特学者,她在美国普渡大学(Purdue University)完成了语言学硕士学位,研究重点是越南Facebook页面上的不礼貌现象及表情符号的使用。

她的研究融合了计算语言学和语料库语言学,旨在通过探索语言学理论和跨学科问题,推动相关领域的发展。

1. 您如何定义数字人文?

在我看来,数字人文(DH)是指利用数字工具来研究人文学科。这涵盖了广泛的主题,从历史与文化遗产到语言与文学。数字工具可以用于研究的各个阶段:数据采集、数据管理、数据分析、研究成果的传播等等。换句话说,数字人文使学者能够通过创新的、数据驱动的方法重新审视熟悉的问题。

2. 您是如何对数字人文产生兴趣的?

我第一次听说“数字人文”这个术语是在普渡大学攻读语言学硕士学位时,当时学校向研究生推荐一个数字人文的研究生证书项目。我选修了一门名为“计算文本分析”的入门课程,立刻就希望自己能更早了解这个研究领域,因为它为高影响力的跨学科项目打开了许多新的、令人兴奋的可能性。我非常惊讶地看到,文学研究者如何利用计算工具进行“远读”,来研究文体学和作者归属问题。在人工智能时代,数字人文为各个领域的研究者提供了交流与协作的机会和工具,以实现突破性的研究成果。如今,计算机科学家与考古学家,甚至3D打印专家,在同一个数字人文项目中合作已不足为奇。另一个有趣的方面是,数字人文项目通常是对公众开放的,可能以网页或软件的形式呈现。通过这种方式,知识不再仅仅被封锁在付费墙后的学术期刊中。

3. 请介绍您的论文课题。

话语概念(discursive concepts)是一些具有历史意义的概念,不能通过单一词汇(如“伦理”或“税收”)或固定搭配结构(如“气候变化”或“代沟”)来完整表达,但可以通过一种名为“概念建模”(concept modelling)的方法来识别。首先,使用处理器生成一个四词组合(quad)的概念模型,这些词在大规模语料中通过PMI(点互信息)计算得出强关联。然后,研究者通过细读包含这些词组的文本,结合语用操作和百科式的信息补充,从而提炼出具体的话语概念。例如,从EEBO-TCP语料中提取的“day – hour – minute – moon”、“day – eclipse – minute – moon”等四词组合,可以指代“为了实用目的观察天体的做法”。然而,这种方法非常耗费人力与计算资源,且仍容易产生偏差,也缺乏时间因素的考量(即无法追踪概念如何随时间演变)。我正在致力于开发一种新方法,能够自动地从一系列文本中、跨越时间地提取“时代精神目录”(catalogue of zeitgeists),即历史上具有意义的概念。这个目录采用自下而上的构建方式,在考虑时间、计算成本、人力投入和稳健性的前提下,最终可供历史学家、历史语言学家、计算语言学家、自然语言处理研究者等使用。

4. 您特别喜欢的一个数字人文项目?

我在准备博士申请时被“语言DNA”(Linguistic DNA, https://www.linguisticdna.org/)这个项目所吸引。也许我偏爱这个项目因为我的两位导师都曾积极参与其中。项目希望能够有效捕捉文本中的语境,而概念建模提供了一种实现方式。此前我更熟悉传统的语料库语言学方法和技术,因此学习这种全新的研究方法和工作流程对我来说非常有启发。你也可以亲自体验一下,通过与“演示器”(Demonstrator)互动并阅读相关的博客文章。我希望你会像我第一次接触它时一样感到兴奋!

Meet Błażej Mikuła

Błażej Mikuła is a team member of the Cultural Heritage Imaging Laboratory (CHIL) at Cambridge University Library. He is involved in digitising manuscripts and creating short films for various projects.

Previously, Błażej worked as a photojournalist, covering significant historical events such as the war in Afghanistan and the famine in South Sudan. Now he is rediscovering the hidden past in books using modern technology such as multispectral photography. He has contributed to several projects, including the Newton, Darwin and Genizah projects. He is currently working on the Wong Avery Project, creating 3D scans of Chinese oracle bones that allow previously broken bones to be reconstructed digitally.

Spending long hours in the library had an unexpected side effect: he developed an interest in book collecting. Today he owns a large collection of “Klubówka”, pirated editions of science fiction and fantasy books printed in Poland during the collapse of communism, between 1980 and 1990. These were the first Polish editions of iconic works such as Conan, Star Wars and Isaac Asimov’s I, Robot. Often poorly translated, crudely printed and unquestionably illicit, these books are relics of a time when literature slipped through the cracks of censorship, smuggled in ink and paper.

1. How do you define Digital Humanities?

This is where curiosity meets computation—using technology to deepen our understanding of culture, history, language, and society. Whether it’s exploring why 11th-century scribes in the Middle East chose specific inks or recovering erased text from an ancient palimpsest, Digital Humanities (DH) offers innovative ways to uncover answers. It brings together scholars, coders, librarians, photographers, and others to collaborate across disciplines. Tools like multispectral photography, reflectance transformation imaging, and 3D scanning reveal insights that would otherwise remain hidden. By bridging the past and the present through digital tools, DH not only transforms how we study the humanities—it redefines what’s possible when we ask new questions in new ways.

2. How did you become interested in DH?

It wasn’t love at first sight—mainly because I simply didn’t know what it was. At first, I was just a photographer, and my task was to make images. I joined the Parker on the Web project at Corpus Christi College, Cambridge, in the same year Apple introduced the first iPhone, Netflix launched its streaming service, Amy Winehouse’s Rehab played on the radio, and the Doomsday Clock moved from 7 to 5 minutes to midnight.

It took some time before I realized I could be more involved in Digital Humanities. When new technology came along, I jumped right on it. Today, I’m working with MSI, RTI, and 3D scanning, helping researchers uncover the mysteries of some of the most beautiful items in the library.

I consider myself lucky to have witnessed the growth of DH—from a time when barely anyone applied for the job. It was an amazing time—though today, we’re just 89 seconds to midnight.

3. Tell us about one of your DH projects

It’s dragons and magic. In 1899, when Wang Yirong, a Chinese scholar and official, visited a medicine shop, he noticed inscriptions on what were being sold as “dragon bones”. This marked the beginning of oracle-bone studies (oraculology), the study of the earliest known form of Chinese writing.

Today, the Wong Avery Project, a collaboration between the University of Cambridge and UC San Diego, is cataloguing and digitising the Chinese Collection. We have already photographed all of the oracle bones and created several 3D scans with cross-polarised textures. There is still plenty to do and, as is typical in Digital Humanities, I’m waiting for someone to ask a new question, preferably one that would let me make a CT scan of CUL.52!

4. And a DH project you like?

Small Performances is an interdisciplinary project exploring the history and future of printing through the unique collection of typographic punches made by John Baskerville (1707–1775), now housed at Cambridge University Library. Supported by the CHERISH Hub, it brings together historians, scientists and craftspeople to reconstruct 18th-century punch-cutting using a combination of pioneering scientific and artisanal methods. These efforts benefit both modern industry and education. While my role is limited by my commitments to the Wong Avery Project, I’m contributing 3D models and a short film documenting stone letter carving.

介绍Błażej Mikuła

Błażej Mikuła是剑桥大学图书馆(Cambridge University Library)文化遗产影像实验室(Cultural Heritage Imaging Laboratory, CHIL)的团队成员。他参与手稿的数字化工作,并为多个项目制作短片。

此前,Błażej 曾是一名新闻摄影记者,记录了包括阿富汗战争和南苏丹饥荒在内的重要历史事件。如今,他借助多光谱摄影等现代技术,在书籍中重新发掘被遗忘的历史。

他曾参与多个项目,包括牛顿(Newton)、达尔文(Darwin)和吉尼扎(Genizah)等项目。目前,他正参与“黄艾芙丽项目”(Wong Avery Project),通过三维扫描技术对中国甲骨进行数字重建,使原本破损的骨片得以“复原”。

长时间待在图书馆意外激发了他对藏书的兴趣。如今,他拥有大量被称为“Klubówka”的书籍收藏,这些是波兰在1980年至1990年共产主义崩溃期间印制的科幻与奇幻类盗版书籍。这些书包括《科南》、《星球大战》以及艾萨克·阿西莫夫的《我,机器人》等著名作品的波兰首译版。这些书往往翻译粗糙、印刷简陋,显然属于非法出版物,却是那个时代的历史遗物:文学曾借助油墨与纸张,在审查制度的缝隙中流通传播。

1. 您如何定义数字人文?

数字人文是好奇心与计算技术的相遇——运用科技加深我们对文化、历史、语言与社会的理解。无论是探究11世纪中东文士为何选择特定墨水,还是从古老的重写羊皮纸中恢复被抹去的文字,数字人文(DH)都为我们提供了创新的方法去寻找答案。它汇集了学者、程序员、图书馆员、摄影师等各类专业人士,共同开展跨学科合作。多光谱摄影、反射变换成像(RTI)、三维扫描等工具揭示了原本隐藏的细节。通过数字工具连接过去与现在,数字人文不仅改变了我们研究人文学科的方式,更重新定义了当我们用新方式提出新问题时,研究能达到的可能性。

2. 您是如何对数字人文产生兴趣的?

这并不是一见钟情——主要是因为我一开始根本不知道它是什么。起初,我只是一个摄影师,我的任务就是拍照片。我是在苹果发布第一代 iPhone、Netflix 启动流媒体服务、艾米·怀恩豪斯的《Rehab》在广播中播放、末日时钟从午夜前7分钟拨到5分钟的那一年,加入了剑桥科珀斯克里斯蒂学院的“帕克网页项目”(Parker on the Web)。

过了一段时间,我才意识到自己可以更深入地参与数字人文。当新技术出现时,我毫不犹豫地投入其中。如今,我使用多光谱成像(MSI)、反射变换成像(RTI)和三维扫描,帮助研究人员揭示图书馆中最精美藏品背后的奥秘。

我觉得自己很幸运,见证了数字人文的成长——从几乎没人申请相关职位的时期走到今天。那是令人振奋的时光——尽管现在,我们距离“午夜”只剩下89秒。

3. 请告诉我们一个您的数字人文项目?

这个项目与龙和魔法有关。1899年,中国学者兼官员王懿荣在药铺中发现被称为“龙骨”的药材上刻有文字。这一发现开启了甲骨学的研究,并引发了对中国已知最早文字形式的深入探讨。

如今,“黄艾芙丽项目”(Wong Avery Project)是剑桥大学与加州大学圣地亚哥分校合作开展的项目,致力于整理和数字化中国馆藏。我们已经完成了全部甲骨的拍摄,并制作了多个带有交叉偏振纹理的三维扫描模型。目前还有大量工作待完成,而正如数字人文领域的一贯特点,我正在等待有人提出一个新的研究问题,最好是那种能让我为CUL.52进行CT扫描的问题!

4. 您特别喜欢的一个数字人文项目?

“微型演出”(Small Performances)是一个跨学科项目,通过剑桥大学图书馆收藏的约翰·巴斯克维尔(John Baskerville, 1707–1775)字体冲头,探索印刷术的历史与未来。在 CHERISH Hub 的支持下,该项目联合历史学家、科学家与工艺师,利用先进的科学与传统手工相结合的方法,重建18世纪字体雕刻工艺。这些努力不仅对现代工业有益,也推动了教育发展。

尽管我因参与黄艾芙丽项目而无法全程投入该项目,但我仍参与了三维建模,并制作了一部关于石刻字母雕刻过程的短片。