An Industrial Approach to Using Artificial Intelligence and Natural Language Processing for Accelerated Document Preparation in Drug Development

Viswanath, Shekhar; Fennell, Jared W.; Balar, Kalpesh; Krishna, Praful

doi:10.1007/s12247-020-09449-x

Original Article
Published: 15 May 2020

Journal of Pharmaceutical Innovation
Volume 16, pages 302–316 (2021)

Shekhar Viswanath ORCID: orcid.org/0000-0002-4742-667X¹,
Jared W. Fennell¹,
Kalpesh Balar² &
Praful Krishna²

Abstract

Purpose

Because scientific regulatory reporting demands exceptionally high standards of accuracy and data integrity, any tool that aims to streamline the process must gather data at least as efficiently as a team of scientists, without costing more time or resources. For this reason, an artificial intelligence (AI)-based tool with parallel search, document creation, and data integrity review capabilities is being investigated as a potential solution. This paper describes a proof-of-concept project to develop an AI-based tool that rapidly assembles an end-of-phase 2 (EOP2) briefing document for a potential medicine. We call the tool the Intelligent Machine for Document Preparation (IMDP).

Methods

A training corpus of approximately 65,000 PDF documents derived from electronic lab notebooks and technical reports related to five molecules (including Merestinib) was ingested, and prior EOP2 documents from the remaining four molecules were used to generate training questions and answers. Then, an annotation-light natural language processing algorithm analyzed a set of structured and unstructured data regarding Merestinib. A simple user interface was created that allows scientists to query the system in natural language; table builder, image/plot finder, and free-text addition features were added to support advanced search without dependence on keywords.
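
The paper does not disclose the retrieval algorithm (the platform is proprietary), so the following is only a minimal sketch of the query flow described above: a natural-language question is scored against ingested passages and the best match is returned. The bag-of-words cosine similarity used here is a deliberately simple stand-in, not the IMDP's method, and the passages and the names `tokenize`, `cosine`, and `best_passage` are hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_passage(question, passages):
    """Return (score, passage) for the passage most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical passages standing in for text extracted from ingested PDFs.
passages = [
    "The drug substance melts at 150 C according to the thermal analysis report.",
    "Batch 12 was stored at 25 C and 60 percent relative humidity for six months.",
    "The tablet formulation contains lactose and magnesium stearate.",
]

score, passage = best_passage(
    "What excipients are in the tablet formulation?", passages
)
print(f"{score:.2f}  {passage}")
```

A word-overlap score like this one still depends on shared vocabulary between question and passage; a system that searches "without dependence on keywords" would presumably rely on learned representations instead, which this sketch does not attempt.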

Results

Three significant innovations were designed in to improve overall performance relative to our benchmark solution without sacrificing usability. First, the AI-based IMDP was built to improve accuracy and accelerate document creation with a remarkably small amount of training. Second, image search capability was added to enrich the knowledge base. Third, the IMDP was integrated with the existing process rather than adding a step to the workflow. Finally, accuracy and total document creation time were compared with those of the existing (benchmark) tool. Our experiments show that the AI-based technology reached 89% accuracy, surpassing the internal benchmark of 54%, and retrieved the right information 3.6 times faster.
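
To put the reported figures in perspective, the short sketch below works through the arithmetic: 89% versus 54% is a gain of 35 percentage points, or roughly 1.65 times the benchmark accuracy. The helper functions `accuracy` and `speedup` are hypothetical and only illustrate how such metrics are commonly computed from graded answers and timings; the abstract does not describe the authors' evaluation procedure in this detail.

```python
def accuracy(graded):
    """Fraction of answers graded correct (graded is a list of booleans)."""
    return sum(graded) / len(graded)


def speedup(baseline_seconds, new_seconds):
    """How many times faster the new tool is than the baseline."""
    return baseline_seconds / new_seconds


# Figures reported in the abstract.
imdp_accuracy = 0.89
benchmark_accuracy = 0.54

# Absolute gain in percentage points and gain relative to the benchmark.
point_gain = round((imdp_accuracy - benchmark_accuracy) * 100)
relative_gain = imdp_accuracy / benchmark_accuracy

print(f"Gain: {point_gain} percentage points")
print(f"Relative accuracy: {relative_gain:.2f}x the benchmark")
```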

Conclusions

The main contribution of this study is to show the value of artificial intelligence-based tools in accelerating all major stages of regulatory report creation while allowing a team of scientists to seamlessly collaborate.

Acknowledgments

The authors wish to acknowledge Rocketspace Inc. as a key collaborator before and during the project execution, as well as Justin Burt, Himanshu Gupta, and Harshad Kulkarni.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN, 46285, USA
Shekhar Viswanath & Jared W. Fennell
Arbot Solutions Inc dba Coseer, 301 Mission St Suite 9F, San Francisco, CA, 94105, USA
Kalpesh Balar & Praful Krishna

Corresponding author

Correspondence to Shekhar Viswanath.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Availability of Data and Material

Because this is a natural language processing application on pharmaceutical chemistry, manufacturing, and controls (CMC) data, the raw data described in the paper comprise approximately 65,000 PDFs, and sharing that many files is not practical.

Code Availability

The solution was built on a proprietary vendor platform augmented with custom code. The code is therefore not available for review.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Viswanath, S., Fennell, J.W., Balar, K. et al. An Industrial Approach to Using Artificial Intelligence and Natural Language Processing for Accelerated Document Preparation in Drug Development. J Pharm Innov 16, 302–316 (2021). https://doi.org/10.1007/s12247-020-09449-x

Published: 15 May 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s12247-020-09449-x
