A framework for information extraction from tables in biomedical literature
International Journal on Document Analysis and Recognition (IF 1.8), Pub Date: 2019-02-15, DOI: 10.1007/s10032-019-00317-0
Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic

The scientific literature is growing exponentially, and professionals are no longer able to cope with the current volume of publications. Text mining has in the past provided methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. Research on mining table data still lacks an integrated approach that considers all the complexities and challenges of a table. Our research examines methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables in the clinical literature. We present a requirement analysis template and an integral methodology for information extraction from tables in the clinical domain that contains 7 steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) cell selection and (7) syntactic processing and extraction. Our approach achieved F-measures between 82% and 92%, depending on the variable, task and its complexity.
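
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure is typically computed from extraction counts. It assumes the balanced F1 variant (beta = 1); the abstract does not state which variant was used. The function name `f_measure` and the counts in the usage lines are hypothetical illustrations, not figures from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts.

    tp: correctly extracted values (true positives)
    fp: extracted values that were wrong (false positives)
    fn: values that should have been extracted but were missed (false negatives)
    beta: weight of recall relative to precision (1.0 gives the balanced F1;
          assumed here, not confirmed by the abstract)
    """
    if tp == 0:
        # No correct extractions: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only.
balanced = f_measure(tp=9, fp=1, fn=1)
no_hits = f_measure(tp=0, fp=5, fn=5)
```

With equal precision and recall, the F-measure equals that shared value, so a score in the reported 82% to 92% range implies that precision and recall were both reasonably high for each variable.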

Updated: 2019-02-15