Large-scale data mining of rapid residue detection assay data from HTML and PDF documents: improving data access and visualization for veterinarians, Frontiers in Veterinary Science

Frontiers in Veterinary Science (IF 2.6), Pub Date: 2021-06-21, DOI: 10.3389/fvets.2021.674730
Majid Jaberi-Douraki (1, 2, 3), Soudabeh Taghian Dinani (1, 3, 4), Nuwan Indika Millagaha Gedara (1, 3, 5), Xuan Xu (1, 2, 3), Emily Richards (6), Fiona Maunsell (7), Nader Zad (1, 2, 8), Lisa A Tell (6)

Extra-label drug use in food animal medicine is authorized by the US Animal Medicinal Drug Use Clarification Act (AMDUCA), and estimated withdrawal intervals are based on published scientific pharmacokinetic data. Occasionally, a paucity of scientific data on which to base a withdrawal interval, or a large number of animals being treated, drives the need to test for drug residues. Rapid commercial farm-side assays are essential for monitoring drug residues in animal products to protect human health. The active ingredients, sensitivity, matrices, and species that have been evaluated for commercial rapid assay tests are typically reported on manufacturers' websites or in PDF documents that are available to consumers but may require a special access request. Additionally, this information is not always correlated with FDA-approved tolerances. Furthermore, parameter changes for these tests can be very challenging to identify on a regular basis, especially for tests listed on websites or in documents that are not publicly available. Therefore, artificial intelligence plays a critical role in efficiently extracting the data and ensuring that the information is current. Extracting tables from PDF and HTML documents has been investigated by both academia and commercial tool builders. Text mining of such documents has become a widespread yet challenging area of applied natural language processing. However, table extraction techniques are still in their infancy and are being investigated and improved by researchers. In this study, we developed and evaluated a data-mining method for automatically extracting rapid assay data from electronic documents. Our automatic electronic data extraction method includes a software package module, a custom-developed pattern recognition tool, and a data mining engine. Assay details were provided by several commercial entities that produce these rapid drug residue assay tests.
During this study, we developed a real-time conversion system and method for reflowing the contents of these files for accessibility practice and research data mining. Embedded information was extracted with AI-based text extraction and text mining and converted to structured formats. These data were then made available to veterinarians and producers via an online interface, which allows interactive searching and presents the commercial test assay parameters in reference to FDA-approved tolerances.
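As a rough illustration of the workflow the abstract outlines (pulling an assay table out of an HTML page, converting it to structured records, and presenting each sensitivity in reference to a tolerance), the following minimal Python sketch may help. It is not the authors' method or software: the function names, column headers, drug names, and all numeric values are hypothetical placeholders, and it uses only the standard-library `html.parser` module rather than any AI-based extraction.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Return one dict per data row, keyed by the text of the header row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def compare_to_tolerance(records, tolerances_ppb):
    """Annotate each record with its tolerance and whether the assay detects at or below it."""
    annotated = []
    for rec in records:
        tolerance = tolerances_ppb.get(rec["Active ingredient"])
        sensitivity = float(rec["Sensitivity (ppb)"])
        detects = None if tolerance is None else sensitivity <= tolerance
        annotated.append({**rec,
                          "Tolerance (ppb)": tolerance,
                          "Detects at or below tolerance": detects})
    return annotated


# Hypothetical manufacturer table; names and numbers are placeholders, not real assay data.
SAMPLE_HTML = """
<table>
  <tr><th>Active ingredient</th><th>Matrix</th><th>Sensitivity (ppb)</th></tr>
  <tr><td>Drug A</td><td>milk</td><td>5</td></tr>
  <tr><td>Drug B</td><td>milk</td><td>50</td></tr>
  <tr><td>Drug C</td><td>urine</td><td>8</td></tr>
</table>
"""

# Hypothetical tolerances in ppb; Drug C deliberately has no entry.
SAMPLE_TOLERANCES_PPB = {"Drug A": 10.0, "Drug B": 20.0}

if __name__ == "__main__":
    for row in compare_to_tolerance(extract_records(SAMPLE_HTML), SAMPLE_TOLERANCES_PPB):
        print(row)
```

A record with no matching tolerance is flagged as `None` rather than `False`, so a missing reference value is not mistaken for an assay that fails to reach the tolerance. Real manufacturer pages and PDF documents are far less regular than this sample, which is part of why the abstract describes table extraction as still maturing.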

Updated: 2021-06-21