Constructing and evaluating automated literature review systems

Abstract

Automated literature reviews have the potential to accelerate knowledge synthesis and provide new insights. However, a lack of labeled ground-truth data has made it difficult to develop and evaluate these methods. We propose a framework that uses the reference lists from existing review papers as labeled data, which can then be used to train supervised classifiers, allowing for experimentation and testing of models and features at a large scale. We demonstrate our framework by training classifiers using different combinations of citation- and text-based features on 500 review papers. We use the R-Precision scores for the task of reconstructing the review papers’ reference lists as a way to evaluate and compare methods. We also extend our method, generating a novel set of articles relevant to the fields of misinformation studies and science communication. We find that our method can identify many of the most relevant papers for a literature review from a large set of candidate papers, and that our framework allows for development and testing of models and features to incrementally improve the results. The models we build are able to identify relevant papers even when starting with a very small set of seed papers. We also find that the methods can be adapted to identify previously undiscovered articles that may be relevant to a given topic.
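
A minimal sketch of the evaluation loop the abstract describes: hold out part of a review's reference list, train a supervised classifier on candidate papers, and compute R-Precision on the task of recovering the held-out references. Here `candidates` and `features` stand in for the candidate-generation and feature-extraction steps, which are outside this sketch, and the single in-sample fit glosses over the train/test splitting a real evaluation would use; this is an illustration of the framework, not the authors' code.

```python
import random

import numpy as np
from sklearn.linear_model import LogisticRegression


def r_precision(ranked_ids, target_ids):
    """Fraction of the top-R ranked candidates that are true targets,
    where R is the number of held-out references."""
    r = len(target_ids)
    return len(set(ranked_ids[:r]) & set(target_ids)) / r


def evaluate_review(reference_ids, candidates, features, seed_size=50, random_seed=1):
    """Hold out part of a review's reference list and measure how well a
    classifier recovers it. `candidates` is a list of paper ids (e.g., the
    citation neighborhood of the seed papers) and `features[paper_id]` is
    that paper's feature vector; both come from steps outside this sketch.
    """
    rng = random.Random(random_seed)
    seeds = set(rng.sample(reference_ids, seed_size))
    targets = {p for p in reference_ids if p not in seeds}

    X = np.array([features[p] for p in candidates])
    y = np.array([p in targets for p in candidates])

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(X)[:, 1]
    ranked = [p for _, p in sorted(zip(scores, candidates), key=lambda t: -t[0])]
    return r_precision(ranked, targets)
```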

Notes

  1. For the clustering, we used the cleaned version of the Web of Science network as described in the “Data” section. We used the network after cleaning for citations, but before removing papers with other missing metadata. This version of the network had 73,725,142 nodes and 1,164,650,021 edges.

  2. Since every node is in exactly one cluster (even if the cluster is only one node), and the leaves of the hierarchy tree represent the nodes themselves, the minimum depth in the hierarchy is 2. In this case, the first level is the cluster the node belongs to, and the second level is the node.

  3. We divide the standard measure of distance between nodes in a tree by the sum of the nodes’ depths. In hierarchical Infomap clustering, the total depth varies across the tree, so the absolute depth of a pair of nodes is arbitrary when describing the distance between them. For example, a pair of nodes in the same bottom-level cluster at a depth of level 5 are no closer together than a pair of nodes in the same bottom-level cluster at level 2. (One possible formulation is sketched after these notes.)

  4. Machine learning experiments were conducted using scikit-learn version 0.20.3 running on Python 3.6.9.

  5. Although we only performed ranking and clustering once, it would be ideal to remove all nodes and links past the year of the review paper, as well as the review paper itself, and cluster this network. However, performing a separate clustering for each review paper would be computationally infeasible. Nevertheless, any bias introduced by this should be small, as the clustering method we use considers the overall flow of information across multiple pathways, which makes it robust to the removal of individual nodes and links in large networks.

  6. We report the best-performing model for each experiment, rather than restricting ourselves to a single classifier type; this decision did not have a large effect on the results. We allowed the classifier to vary because there are differences among the review articles, and we will continue to explore the nature of these differences in future work.

  7. The actual feature used was the absolute difference between a paper’s publication year and the mean publication year of the seed papers (see the sketch after these notes).

  8. We used the spaCy library (version 2.2.3) with a pretrained English language model (core_web_lg version 2.2.5); a sketch follows these notes.

  9. The models that had both network and title embedding features, but not publication year (“Cluster, PageRank, Embeddings”), performed worse in general than models with embeddings alone, with scores tending to be between 0.5 and 0.7. The reason for this is unclear.

  10. Since the same random seeds (1, 2, 3, 4, 5) were used each time, the smaller seed sets are always subsets of the larger ones. For example, for a given review article and a given random seed, the 100 seed papers identified are all included in the set of 150; the set of 50 seed papers is contained in both the set of 100 and the set of 150; and so on. (One way to construct such nested sets is sketched after these notes.)

  11. See Data and Methods at http://www.misinformationresearch.org for details.
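
The following sketches illustrate several of the notes above. First, for note 3: a minimal sketch of one plausible formulation of the normalized tree distance, assuming each node is identified by its root-to-leaf path of cluster ids in the Infomap hierarchy; the authors' exact definition may differ in detail.

```python
def normalized_tree_distance(path_u, path_v):
    """Tree distance between two leaves, divided by the sum of their depths.

    Each path is the sequence of cluster ids from the root down to the
    node itself, e.g. (3, 1, 42) for a node nested two clusters deep.
    """
    # Depth of the lowest common ancestor: length of the shared prefix.
    lca_depth = 0
    for a, b in zip(path_u, path_v):
        if a != b:
            break
        lca_depth += 1
    # Standard path length between the two nodes through their LCA,
    # normalized so the maximum possible distance is 1 regardless of
    # how deep this part of the hierarchy happens to be.
    distance = (len(path_u) - lca_depth) + (len(path_v) - lca_depth)
    return distance / (len(path_u) + len(path_v))


# Nodes in different top-level clusters are maximally distant (1.0)
# whether their subtrees are shallow or deep.
assert normalized_tree_distance((1, 5, 8), (2, 4, 9)) == 1.0
assert normalized_tree_distance((1, 5), (2, 4)) == 1.0
```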
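For note 7, a minimal sketch of the publication-year feature, with illustrative years:

```python
import numpy as np

seed_years = np.array([2004, 2006, 2008, 2009])   # hypothetical seed papers
candidate_years = np.array([1998, 2007, 2015])    # hypothetical candidates

# Absolute difference from the mean seed-paper publication year.
year_feature = np.abs(candidate_years - seed_years.mean())
print(year_feature)  # [8.75 0.25 8.25]
```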
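For note 8, a minimal sketch of deriving title embeddings with spaCy. The note does not say how token vectors were pooled into a single title vector; averaging them via `doc.vector` is one standard option, assumed here. Current spaCy releases name the model `en_core_web_lg`.

```python
import spacy

# Pretrained English model that ships with word vectors (300 dimensions).
nlp = spacy.load("en_core_web_lg")

titles = [
    "Community detection in graphs",
    "Maps of random walks on complex networks reveal community structure",
]

# doc.vector averages the token vectors, yielding one fixed-length
# embedding per title that can be fed to a classifier as features.
title_vectors = [nlp(title).vector for title in titles]
```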
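For note 10, a minimal sketch of one way to guarantee nested seed sets: shuffle the reference list once per random seed and take prefixes, so the 50-paper set is contained in the 100- and 150-paper sets by construction. The authors' actual sampling code may differ.

```python
import random


def nested_seed_sets(reference_ids, sizes=(50, 100, 150), random_seed=1):
    """Return seed sets of each size such that, for the same random seed,
    smaller sets are always subsets of larger ones."""
    shuffled = list(reference_ids)
    random.Random(random_seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sizes}


sets_by_size = nested_seed_sets([f"paper_{i}" for i in range(250)])
assert set(sets_by_size[50]) <= set(sets_by_size[100]) <= set(sets_by_size[150])
```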

References

  • Albarqouni, L., Doust, J., & Glasziou, P. (2017). Patient preferences for cardiovascular preventive medication: A systematic review. Heart, 103(20), 1578–1586. https://doi.org/10.1136/heartjnl-2017-311244.

  • Ammar, W., Groeneveld, D., Bhagavatula, C., Beltagy, I., Crawford, M., Downey, D., et al. (2018). Construction of the literature graph in Semantic Scholar. In Proceedings of the 2018 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies, Volume 3 (Industry Papers) (pp. 84–91). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-3011.

  • Bae, S. H., Halperin, D., West, J., Rosvall, M., & Howe, B. (2013). Scalable flow-based community detection for large-scale network analysis. In 2013 IEEE 13th international conference on data mining workshops (pp. 303–310). https://doi.org/10.1109/ICDMW.2013.138.

  • Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-five trials and eleven systematic reviews a day: How will we ever keep up? PLOS Medicine, 7(9), e1000326. https://doi.org/10.1371/journal.pmed.1000326.

  • Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: A literature survey. International Journal on Digital Libraries, 17(4), 305–338. https://doi.org/10.1007/s00799-015-0156-0.

  • Belter, C. W. (2016). Citation analysis as a literature search method for systematic reviews. Journal of the Association for Information Science and Technology, 67(11), 2766–2777. https://doi.org/10.1002/asi.23605.

  • Chen, T. T. (2012). The development and empirical study of a literature review aiding system. Scientometrics, 92(1), 105–116. https://doi.org/10.1007/s11192-012-0728-3.

  • Cormack, G. V., & Grossman, M. R. (2014). Evaluation of machine-learning protocols for technology-assisted review in electronic discovery. In Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, SIGIR ’14 (pp. 153–162). New York, NY: ACM. https://doi.org/10.1145/2600428.2609601.

  • Djidjev, H. N., Pantziou, G. E., & Zaroliagis, C. D. (1991). Computing shortest paths and distances in planar graphs. In J. L. Albert, B. Monien, & M. R. Artalejo (Eds.), Automata, languages and programming (pp. 327–338). Berlin: Springer.

  • Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3–5), 75–174.

  • Greenhalgh, T., & Peacock, R. (2005). Effectiveness and efficiency of search methods in systematic reviews of complex evidence: Audit of primary sources. BMJ, 331(7524), 1064–1065. https://doi.org/10.1136/bmj.38636.593461.68.

  • Gupta, S., & Varma, V. (2017). Scientific article recommendation by using distributed representations of text and graph. In Proceedings of the 26th international conference on world wide web companion, WWW ’17 Companion (pp. 1267–1268). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland. https://doi.org/10.1145/3041021.3053062.

  • Horsley, T., Dingwall, O., & Sampson, M. (2011). Checking reference lists to find additional studies for systematic reviews. Cochrane Database of Systematic Reviews. https://doi.org/10.1002/14651858.MR000026.pub2.

  • Janssens, A. C. J. W., & Gwinn, M. (2015). Novel citation-based search method for scientific literature: Application to meta-analyses. BMC Medical Research Methodology, 15(1), 84. https://doi.org/10.1186/s12874-015-0077-z.

  • Jha, R., Abu-Jbara, A., & Radev, D. (2013). A system for summarizing scientific topics starting from keywords. In Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: short papers) (pp. 572–577).

  • Kanakia, A., Shen, Z., Eide, D., & Wang, K. (2019). A scalable hybrid research paper recommender system for Microsoft Academic. In The World Wide Web conference, WWW ’19 (pp. 2893–2899). Association for Computing Machinery. https://doi.org/10.1145/3308558.3313700.

  • Kong, X., Mao, M., Wang, W., Liu, J., & Xu, B. (2018). VOPRec: Vector representation learning of papers with text information and structural identity for recommendation. IEEE Transactions on Emerging Topics in Computing. https://doi.org/10.1109/TETC.2018.2830698.

  • Larsen, K. R., Hovorka, D., Dennis, A., & West, J. (2019). Understanding the elephant: The discourse approach to boundary identification and corpus construction for theory review articles. Journal of the Association for Information Systems, 20, 7. https://doi.org/10.17705/1jais.00556.

  • Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval (1st ed.). New York: Cambridge University Press.

  • Miwa, M., Thomas, J., O’Mara-Eves, A., & Ananiadou, S. (2014). Reducing systematic review workload through certainty-based screening. Journal of Biomedical Informatics, 51, 242–253. https://doi.org/10.1016/j.jbi.2014.06.005.

  • Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Cambridge: MIT Press.

  • National Academies of Sciences, Engineering, and Medicine. (2017). Communicating science effectively: A research agenda. Washington, DC: National Academies Press.

  • O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M., & Ananiadou, S. (2015). Using text mining for study identification in systematic reviews: A systematic review of current approaches. Systematic Reviews, 4(1), 1–22. https://doi.org/10.1186/2046-4053-4-5.

  • Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab. http://ilpubs.stanford.edu:8090/422.

  • Portenoy, J., & West, J. D. (2019). Supervised learning for automated literature review. BIRNDL, 2019, 9.

  • Robinson, K. A., Dunn, A. G., Tsafnat, G., & Glasziou, P. (2014). Citation networks of related trials are often disconnected: Implications for bidirectional citation searches. Journal of Clinical Epidemiology, 67(7), 793–799. https://doi.org/10.1016/j.jclinepi.2013.11.015.

  • Ronzano, F., & Saggion, H. (2015). Dr. Inventor framework: Extracting structured information from scientific publications. In International conference on discovery science (pp. 209–220). Springer.

  • Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118–1123.

  • Silva, F. N., Amancio, D. R., Bardosova, M., Costa, L. D. F., & Oliveira, O. N. (2016). Using network science and text analytics to produce surveys in a scientific topic. Journal of Informetrics, 10(2), 487–502. https://doi.org/10.1016/j.joi.2016.03.008.

  • Tsafnat, G., Dunn, A., Glasziou, P., & Coiera, E. (2013). The automation of systematic reviews: Would lead to best currently available evidence at the push of a button. BMJ, 346(7891), 8–8.

  • Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C., & Schmid, C. H. (2010). Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics, 11(1), 55. https://doi.org/10.1186/1471-2105-11-55.

  • Webster, J., & Watson, R. T. (2002). Analyzing the past to prepare for the future: Writing a literature review. MIS Quarterly, 26(2), xiii–xxiii.

  • Williams, K., Wu, J., Choudhury, S. R., Khabsa, M., & Giles, C. L. (2014). Scholarly big data information extraction and integration in the CiteSeerX digital library. In 2014 IEEE 30th international conference on data engineering workshops (pp. 68–73). IEEE. https://doi.org/10.1109/ICDEW.2014.6818305.

  • Yu, Z., Kraft, N. A., & Menzies, T. (2018). Finding better active learners for faster literature reviews. Empirical Software Engineering, 23(6), 3161–3186. https://doi.org/10.1007/s10664-017-9587-0.

  • Yu, Z., & Menzies, T. (2019). FAST2: An intelligent assistant for finding relevant papers. Expert Systems with Applications, 120, 57–71. https://doi.org/10.1016/j.eswa.2018.11.021.

  • Zitt, M. (2015). Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation. Scientometrics, 102(3), 2223–2245. https://doi.org/10.1007/s11192-014-1482-5.


Acknowledgements

We thank Dr. Chirag Shah for helpful conversations around evaluation measures, and Clarivate Analytics for the use of the Web of Science data. We also thank three anonymous reviewers for constructive feedback. This work was facilitated through the use of advanced computational, storage, and networking infrastructure provided by the Hyak supercomputer system and funded by the STF at the University of Washington.

Author information

Corresponding author

Correspondence to Jason Portenoy.

Appendix

Example of autoreview results

Below is a sample of results (random samples of true positives, false positives, true negatives, and false negatives) from the autoreview classifier using the references from Fortunato (2010), a review of community detection in graphs, with a random seed of 5. “Rank” is the position of the candidate paper when all candidates are ordered by descending classifier score. The false positives, while not in the original reference list, still appear relevant to the topic (e.g., “Overlapping Community Search for Social Networks”). The true negatives tend to rank lower than the false negatives, suggesting that the score is predictive of relevance even for papers below the cutoff.
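
A minimal sketch of how the four tables below can be derived from classifier output, assuming the cutoff between predicted positives and negatives is set at rank R, where R is the number of held-out references (the same R used for R-Precision); the identifiers and scores here are illustrative.

```python
def confusion_at_r(scores_by_id, target_ids):
    """Split ranked candidates into TP/FP/TN/FN at the rank-R cutoff."""
    ranked = sorted(scores_by_id, key=scores_by_id.get, reverse=True)
    r = len(target_ids)
    targets = set(target_ids)
    top, rest = ranked[:r], ranked[r:]
    return {
        "true_positives": [p for p in top if p in targets],
        "false_positives": [p for p in top if p not in targets],
        "false_negatives": [p for p in rest if p in targets],
        "true_negatives": [p for p in rest if p not in targets],
    }


# Illustrative use: two targets, so the cutoff falls after rank 2.
example = confusion_at_r({"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}, ["a", "c"])
# {'true_positives': ['a'], 'false_positives': ['b'],
#  'false_negatives': ['c'], 'true_negatives': ['d']}
```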

True Positives

| Rank | Title | Year |
|------|-------|------|
| 10 | Role Models For Complex Networks | 2007 |
| 21 | Adaptive Clustering Algorithm For Community Detection In Complex Networks | 2008 |
| 24 | Random Field Ising Model And Community Structure In Complex Networks | 2006 |
| 50 | Bayesian Approach To Network Modularity | 2008 |
| 58 | The Effect Of Size Heterogeneity On Community Identification In Complex Networks | 2006 |
| 76 | Loops And Multiple Edges In Modularity Maximization Of Networks | 2010 |
| 97 | Synchronization Interfaces And Overlapping Communities In Complex Networks | 2008 |
| 118 | The Analysis And Dissimilarity Comparison Of Community Structure | 2006 |
| 119 | Searching For Communities In Bipartite Networks | 2008 |
| 208 | Epidemic Spreading In Scale-Free Networks | 2001 |

False Positives

| Rank | Title | Year |
|------|-------|------|
| 72 | Modularity From Fluctuations In Random Graphs And Complex Networks | 2004 |
| 94 | Clustering Coefficient And Community Structure Of Bipartite Networks | 2008 |
| 98 | Detecting Overlapping Community Structures In Networks | 2009 |
| 106 | Size Reduction Of Complex Networks Preserving Modularity | 2007 |
| 129 | Extracting Weights From Edge Directions To Find Communities In Directed Networks | 2010 |
| 146 | Identifying The Role That Animals Play In Their Social Networks | 2004 |
| 150 | Seeding The Kernels In Graphs: Toward Multi-Resolution Community Analysis | 2009 |
| 159 | Overlapping Community Search For Social Networks | 2010 |
| 162 | Modularity Clustering Is Force-Directed Layout | 2009 |
| 185 | Cartography Of Complex Networks: Modules And Universal Roles | 2005 |

True Negatives

| Rank | Title | Year |
|------|-------|------|
| 2967 | Graph Models Of Complex Information-Sources | 1979 |
| 120959 | Parallel Distributed Network Characteristics Of The Dsct | 1992 |
| 322251 | Hidden Semantic Concept Discovery In Region Based Image Retrieval | 2004 |
| 327308 | A Multilevel Matrix Decomposition Algorithm For Analyzing Scattering From Large Structures | 1996 |
| 394850 | Multiple-Model Approach To Finite Memory Adaptive Filtering | 1992 |
| 749175 | Statistical Computer-Aided Design For Microwave Circuits | 1996 |
| 943999 | Segmental Anhidrosis In The Spinal Dermatomes In Sjogrens Syndrome-Associated Neuropathy | 1993 |
| 1121787 | Rheological And Dielectrical Characterization Of Melt Mixed Polycarbonate-Multiwalled Carbon Nanotube Composites | 2004 |
| 1177851 | Explaining The Rate Spread On Corporate Bonds | 2001 |
| 1256866 | The Cyanobacterial Cell Division Factor Ftn6 Contains An N-Terminal Dnad-Like Domain | 2009 |

False Negatives

| Rank | Title | Year |
|------|-------|------|
| 259 | Heterogeneity In Oscillator Networks: Are Smaller Worlds Easier To Synchronize? | 2003 |
| 324 | Assessing The Relevance Of Node Features For Network Structure | 2009 |
| 385 | The Use Of Edge-Betweenness Clustering To Investigate Biological Function In Protein Interaction Networks | 2005 |
| 6605 | A Measure Of Betweenness Centrality Based On Random Walks | 2005 |
| 19863 | On Decomposition Of Networks In Minimally Interconnected Subnetworks | 1969 |
| 59900 | Objective Criteria For Evaluation Of Clustering Methods | 1971 |
| 139178 | Optimization With Extremal Dynamics | 2001 |
| 250583 | The Tie Effect On Information Dissemination: The Spread Of A Commercial Rumor In Hong Kong | 2002 |
| 281952 | Compartments Revealed In Food-Web Structure | 2003 |
| 1203248 | Dynamic Asset Trees And Portfolio Analysis | 2002 |


Cite this article

Portenoy, J., West, J.D. Constructing and evaluating automated literature review systems. Scientometrics 125, 3233–3251 (2020). https://doi.org/10.1007/s11192-020-03490-w
