A hierarchical topic analysis tool to facilitate digital humanities research
Aslib Journal of Information Management (IF 2.4), Pub Date: 2022-04-29, DOI: 10.1108/ajim-11-2021-0325
Chih-Ming Chen, Szu-Yu Ho, Chung Chang

Purpose

This study aims to develop a hierarchical topic analysis tool (HTAT) based on hierarchical Latent Dirichlet allocation (hLDA) to support digital humanities research that requires topic exploration on the Digital Humanities Platform for Mr. Lo Chia-Lun’s Writings (DHP-LCLW). HTAT supports humanities scholars in distant reading through hierarchical text-topic analysis: it classifies time-stamped texts into multiple historical eras, performs hierarchical topic modeling (HTM) on the texts of each era and presents the results through visualization. A comparative network diagram further helps scholars compare the topics they wish to explore and track how the concept of a topic changes over time from a particular perspective. In addition, HTAT lets scholars view the source texts. Because it integrates the topic exploration functions of both distant reading and close reading, it has high potential to improve the effectiveness of topic exploration.
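The era-division step can be pictured with a minimal sketch. Assuming each text carries a date and each era is defined by its start date, a text belongs to the era with the latest start date not after the text's date; each era's group then becomes the corpus for its own hierarchical topic model. The era labels and boundary dates below are hypothetical placeholders, not the periodization used in DHP-LCLW, and the topic modeling itself is not shown.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical era boundaries (illustrative only): each era starts on its date.
ERA_STARTS = [
    (date(1900, 1, 1), "era_1"),
    (date(1930, 1, 1), "era_2"),
    (date(1950, 1, 1), "era_3"),
]

def group_by_era(texts, era_starts=ERA_STARTS):
    """Group (date, text) pairs by era; texts dated before the first era are skipped."""
    starts = [start for start, _ in era_starts]
    groups = defaultdict(list)
    for stamp, text in texts:
        # Index of the last era whose start date is <= the text's date.
        idx = bisect_right(starts, stamp) - 1
        if idx < 0:
            continue  # predates every defined era
        groups[era_starts[idx][1]].append(text)
    return dict(groups)

texts = [
    (date(1925, 5, 4), "text A"),
    (date(1935, 1, 1), "text B"),
    (date(1955, 6, 1), "text C"),
]
print(group_by_era(texts))
# {'era_1': ['text A'], 'era_2': ['text B'], 'era_3': ['text C']}
```

Using start dates with `bisect_right` keeps the lookup logarithmic in the number of eras and places a text dated exactly on a boundary in the era that begins on that date.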

Design/methodology/approach

This study adopts a counterbalanced experimental design to examine whether there are significant differences in the effectiveness of topic inquiry, the number of relevant topics identified and the time spent when research participants alternately conducted text exploration using DHP-LCLW with HTAT or DHP-LCLW with a Single-layer Topic Analysis Tool (SLTAT). A technology acceptance questionnaire and semi-structured interviews were also used to understand the participants' perceptions of and feelings toward using the two tools to assist topic inquiry.
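Counterbalancing here means that half of the participants use one tool first and the other half use the other tool first, so that order and practice effects do not systematically favor either tool. A minimal sketch of such an AB/BA assignment follows. The participant IDs are hypothetical, and the sketch does not model how the study paired the tools with its two assigned perspectives.

```python
from collections import Counter
from itertools import cycle

def counterbalance(participants, conditions=("HTAT", "SLTAT")):
    """Alternate participants between the AB order and the reversed BA order."""
    ab = tuple(conditions)
    ba = ab[::-1]
    # zip stops when the participant list is exhausted; cycle repeats AB, BA, AB, ...
    return dict(zip(participants, cycle([ab, ba])))

def first_condition_counts(schedule):
    """Count how many participants start with each condition."""
    return Counter(order[0] for order in schedule.values())

schedule = counterbalance(["P1", "P2", "P3", "P4"])  # hypothetical participant IDs
print(schedule["P1"], schedule["P2"])  # ('HTAT', 'SLTAT') ('SLTAT', 'HTAT')
print(dict(first_condition_counts(schedule)))  # {'HTAT': 2, 'SLTAT': 2}
```

Alternating over a list keeps the example deterministic. In practice, the participant list would usually be shuffled first so that assignment to an order is random.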

Findings

The experimental results show that, compared with DHP-LCLW with SLTAT, DHP-LCLW with HTAT better helped the research participants grasp, within a short period, the topic context of the texts from the two perspectives assigned in this study. In addition, the interviews revealed that DHP-LCLW with HTAT provided topic terms that better met the participants' expectations and needs than SLTAT did, and effectively guided them to the corresponding texts for close reading. The technology acceptance and interview data further show that the participants were strongly and positively inclined to use DHP-LCLW with HTAT to assist topic inquiry.

Research limitations/implications

In this study, the Jieba Chinese word segmentation system was used in the Mr. Lo Chia-Lun’s Writings Database to segment Mr. Lo Chia-Lun’s texts for hLDA-based topic modeling. Because Jieba is a lexicon-based segmentation system, it cannot reliably identify new words that are not yet included in its lexicon. The accuracy of word segmentation on the target texts therefore affects both the results of hLDA topic modeling and the effectiveness of HTAT in assisting humanities scholars with topic inquiry.
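To see why unlisted words cause trouble for lexicon-based segmentation, consider greedy forward maximum matching, a classic lexicon-based method. This is a simplified stand-in and not Jieba's actual algorithm, which builds a directed acyclic graph of candidate words from its prefix dictionary, selects the most probable path and applies an HMM to unregistered character runs. It does illustrate the general failure mode described above: a name missing from the lexicon is broken into single characters. The toy lexicon is hypothetical.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching over a word lexicon.

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; if none matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

lexicon = {"先生", "著作"}  # hypothetical toy lexicon without the author's name
print(forward_max_match("羅家倫先生著作", lexicon))
# ['羅', '家', '倫', '先生', '著作']
print(forward_max_match("羅家倫先生著作", lexicon | {"羅家倫"}))
# ['羅家倫', '先生', '著作']
```

Adding the missing word to the lexicon repairs the split. Jieba offers the analogous remedy of a user dictionary (`jieba.load_userdict`) or `jieba.add_word`.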

Practical implications

In this study, HTAT was developed to support digital humanities research. With HTAT, DHP-LCLW provides humanities scholars with topic clues from different hierarchical perspectives for textual exploration. It also offers temporal and comparative network diagrams that help them track how the topics of specific perspectives evolve over time, and so gain a more comprehensive understanding of the overall context of the texts.

Originality/value

In recent years, topic analysis techniques that automatically extract key topic information from large volumes of text have developed rapidly. However, the topics generated by traditional topic models such as LDA (Latent Dirichlet allocation) make it difficult for users to understand how topics differ across hierarchical levels. This study therefore proposes HTAT, which uses hLDA to build a tree-structured hierarchical topic tree without requiring the number of topics to be defined in advance. This enables humanities scholars to quickly grasp the concepts behind textual topics and to use different hierarchical perspectives for further textual exploration. HTAT also combines temporal division with comparative network diagrams to help humanities scholars explore topics and how they change across eras, which can help them discover more useful research clues or findings.
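hLDA does not need a preset number of topics because of its prior, the nested Chinese restaurant process (nCRP). Each document draws a root-to-leaf path of fixed depth. At every level it either follows an existing branch, with probability proportional to how many earlier documents took that branch, or opens a new branch, with probability proportional to a concentration parameter `gamma`. The sketch below simulates only this prior. Full hLDA additionally assigns words to levels and infers paths by Gibbs sampling, which is not shown. The function name and parameter values are illustrative assumptions.

```python
import random

def ncrp_paths(num_docs, depth, gamma, seed=0):
    """Sample one fixed-depth root-to-leaf path per document under a nested CRP.

    Nodes (topics) are tuples of child indices from the root; the root is ().
    """
    rng = random.Random(seed)
    visits = {}    # node -> number of documents whose path passes through it
    children = {}  # node -> number of child branches opened so far
    paths = []
    for _ in range(num_docs):
        node = ()  # the root topic, shared by every document
        path = [node]
        for _level in range(depth - 1):
            n_children = children.get(node, 0)
            # Existing branches weighted by popularity, plus one "new branch" option.
            weights = [visits[node + (c,)] for c in range(n_children)] + [gamma]
            choice = rng.choices(range(n_children + 1), weights=weights)[0]
            if choice == n_children:
                children[node] = n_children + 1  # open a new branch, i.e. a new topic
            node = node + (choice,)
            path.append(node)
        for n in path:
            visits[n] = visits.get(n, 0) + 1
        paths.append(path)
    return paths

paths = ncrp_paths(num_docs=50, depth=3, gamma=1.0)
topics = {node for path in paths for node in path}
print(len(topics))  # the number of topics emerges from the data instead of being fixed
```

A larger `gamma` makes new branches more likely and therefore yields a wider tree. The depth is fixed, but the branching at each level is unbounded.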

Updated: 2022-04-29