Computational text analysis within the Humanities: How to combine working practices from the contributing fields?
Language Resources and Evaluation (IF 1.7), Pub Date: 2019-06-26, DOI: 10.1007/s10579-019-09459-3
Jonas Kuhn

This position paper is based on a keynote presentation at the COLING 2016 Workshop on Language Technology for Digital Humanities in Osaka, Japan. It starts from observations about how working practices in Humanities disciplines that follow a hermeneutic tradition of text interpretation differ from the method-oriented research strategies of Computational Linguistics (CL). The respective praxeological traditions are quite different. Yet more and more researchers are open to truly transdisciplinary collaborations that try to exploit advanced methods from CL within research that ultimately addresses questions from the traditional Humanities disciplines and the Social Sciences. The article identifies two central workflow-related issues for this type of collaborative project in the Digital Humanities (DH) and Computational Social Science: (1) a scheduling dilemma, which concerns the point in the course of the project at which the specifications of the core analysis task are fixed (as early as possible from the computational perspective, but as late as possible from the Humanities perspective), and (2) the subjectivity problem, which concerns the degree of intersubjective stability of the target categories of analysis. CL methodology demands high inter-annotator agreement and theory-independent categories, while the categories in hermeneutic reasoning are often tied to a particular interpretive approach (viz. a theory of literary interpretation) and may bear a non-trivial relation to a reader’s pre-understanding. Building a comprehensive methodological framework that helps overcome these issues requires considerable time and patience. The established computational methodology has to be gradually opened up to more hermeneutically oriented research questions; resources and tools for the relevant categories of analysis have to be constructed. This article does not question that well-targeted efforts along this path are worthwhile.
Yet it makes the following additional programmatic point regarding directions for future research: it might be fruitful to explore, in parallel, the potential of DH-specific variants of the concept of rapid prototyping from Software Engineering. To get an idea of how computational analysis of some aspect of text might contribute to a hermeneutic research question, a prototypical analysis model is constructed, e.g., from related data collections and analysis categories, using transfer techniques. While the initial quality of analysis may be limited, the idea of rapid probing allows scholars to explore how the analysis fits into an actual workflow on the target text data, and it can thus provide early feedback for the process of refining the modeling. If the rapid probing method can indeed be incorporated in a hermeneutic framework to the satisfaction of well-disposed Humanities scholars, a swifter exploration of alternative paths of analysis would become possible. This may generate considerable additional momentum for transdisciplinary integration. It is as yet too early to point to truly Humanities-oriented examples of the proposed rapid probing technique. To nevertheless make the programmatic idea more concrete, the article uses two experimental scenarios to argue how rapid probing might help address the scheduling dilemma and the subjectivity problem, respectively. The first scenario illustrates the transfer of complex analysis pipelines across corpora; the second addresses rapid annotation experiments targeting character mentions in literary text.

Updated: 2019-06-26