Meet Penelope Nguyen

Penelope Gia Bao Huu Nguyen is a UKRI Doctoral Fellow on the Marie Skłodowska-Curie Doctoral Network CASCADE. She is working to automate the study of concepts through time by building an English-language historical thesaurus automatically. Penelope holds a bachelor’s degree in English Studies from Can Tho University in Vietnam, where she first developed her passion for linguistics, particularly pragmatics. As a Fulbright scholar, she completed her master’s in Linguistics at Purdue University (USA), focusing on impoliteness and emoji use on Vietnamese Facebook pages. Her research integrates computational and corpus linguistics to address questions that advance both linguistic theory and related interdisciplinary fields.


1. How do you define Digital Humanities?

To me, Digital Humanities (DH) is the use of digital tools to study the humanities. This encompasses a wide array of subjects, ranging from history and cultural heritage to language and literature. Digital tools can be used at every stage of research: data collection, data management, data analysis, dissemination of results, and so on. In other words, DH enables scholars to revisit familiar questions through innovative, data-driven methods.

2. How did you become interested in DH? 

I first heard the term “Digital Humanities” while pursuing an MA in Linguistics at Purdue, where a graduate certificate in DH was advertised to graduate students. I took an introductory DH course there called Computational Text Analysis and immediately wished I had known about this area of research sooner, as it opens up so many new and exciting interdisciplinary avenues for high-impact projects. I was amazed to see how literary scholars use computational tools for “distant reading”, for example in stylometry and authorship attribution. In the age of AI, DH gives researchers in various fields the opportunities and tools to communicate with each other and collaborate on ground-breaking work. Nowadays, it’s no longer uncommon to see a computer scientist and an archaeologist, or even a 3D-printing expert, working together on a DH project. Another interesting aspect of DH projects is that they’re usually accessible to the public, perhaps in the form of a web page or a piece of software. In this way, knowledge isn’t locked away in paywalled academic journals.

3. Tell us about your dissertation

Discursive concepts are historically significant concepts that cannot be captured by a single lexical item (e.g., ethics or taxation) or by a collocation (e.g., climate change or generation gap), but that can be pinned down using a method called concept modelling. First, a processor produces a concept model made up of quads: sets of four words that are strongly associated according to their PMI (Pointwise Mutual Information) scores across a large span of text. Then, researchers close-read a number of texts containing those quads, applying a series of pragmatic routines and encyclopaedic enrichment, to arrive at the discursive concept. For example, quads such as day – hour – minute – moon and day – eclipse – minute – moon, extracted from EEBO-TCP, can refer to the practice of observing celestial bodies for practical purposes.

However, this labour-intensive approach is time-consuming, prone to bias, and computationally expensive, and it lacks a temporal dimension (i.e., it does not track how concepts change over time). I’m working to devise a new method for automatically generating a catalogue of zeitgeists, or historically significant concepts, from a collection of texts across time, while weighing processing time, computational cost, human effort, and robustness. The catalogue, built using a bottom-up approach, can serve as a reference for historians, historical and/or computational linguists, NLP researchers, and others.
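To make the PMI step more concrete, here is a minimal Python sketch. It is not the Linguistic DNA processor: the fixed co-occurrence window, the rule of scoring a quad by the mean PMI of its six word pairs, the toy sentence, and the function names count_cooccurrences, pmi and top_quads are all hypothetical choices made for illustration. It uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) · p(y))), where a positive score means two words co-occur more often than chance alone would predict.

```python
import math
from collections import Counter
from itertools import combinations

# Illustrative sketch only: window size, quad scoring rule and names are
# hypothetical assumptions, not the Linguistic DNA implementation.


def count_cooccurrences(tokens, window=5):
    """Count word frequencies and unordered word-pair co-occurrences.

    Assumption: two tokens co-occur if the second appears within
    `window` positions after the first.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                # Sort the pair so (a, b) and (b, a) share one key.
                pair_counts[tuple(sorted((w, v)))] += 1
    return word_counts, pair_counts


def pmi(x, y, word_counts, pair_counts):
    """Pointwise Mutual Information: log2(p(x, y) / (p(x) * p(y))).

    Returns None when the pair never co-occurs (PMI is undefined there).
    """
    joint = pair_counts.get(tuple(sorted((x, y))), 0)
    if joint == 0:
        return None
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    p_xy = joint / n_pairs
    p_x = word_counts[x] / n_words
    p_y = word_counts[y] / n_words
    return math.log2(p_xy / (p_x * p_y))


def top_quads(tokens, vocabulary, window=5, k=3):
    """Rank 4-word sets from `vocabulary` by the mean PMI of their six pairs.

    Hypothetical rule: a quad is kept only if all six pairs co-occur.
    """
    word_counts, pair_counts = count_cooccurrences(tokens, window)
    scored = []
    for quad in combinations(sorted(vocabulary), 4):
        scores = [pmi(a, b, word_counts, pair_counts)
                  for a, b in combinations(quad, 2)]
        if None in scores:
            continue
        scored.append((sum(scores) / len(scores), quad))
    scored.sort(reverse=True)  # highest mean PMI first
    return scored[:k]


if __name__ == "__main__":
    # Toy text invented for illustration; not drawn from EEBO-TCP.
    sample = (
        "on that day the moon was watched hour by hour and minute by minute "
        "for the eclipse and the next day the eclipse of the moon lasted an hour "
        "the king raised the tax and the merchants paid the tax that day"
    ).split()
    vocab = {"day", "hour", "minute", "moon", "eclipse", "king", "tax"}
    for score, quad in top_quads(sample, vocab, window=8):
        print(f"{score:.2f}", " - ".join(quad))
```

Averaging is only one way to combine pairwise scores into a quad score; a stricter variant could take the minimum pairwise PMI, so that a single weak link is enough to exclude a quad.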

4. And a DH project you like?

I might be biased because both of my supervisors were actively involved in this project, but I was very much drawn to Linguistic DNA (https://www.linguisticdna.org/) when researching my PhD application. The idea is brilliant: we all want to capture context in texts effectively, and concept modelling offers a way to do so. I was more used to traditional corpus-linguistic measures and techniques, so learning a completely new approach and workflow was eye-opening. You can try it yourself by interacting with the Demonstrator and reading the accompanying blog posts. I hope you’ll be as excited as I was when I first played with it!
