Introduction

Science is a dynamic system of advances in which breakthrough discoveries are important developmental markers whose impact may extend beyond their own field of study. From the perspective of scientific research and policy, breakthrough discoveries have attracted the interest of researchers in many fields. In fact, several scholars have established theoretical models to identify and explore the features of breakthrough research (Chen, 2012; Chen et al., 2009; Ponomarev et al., 2014; Winnink et al., 2016; Wolcott et al., 2016). At the same time, to accelerate scientific development, governments and funding agencies around the world have invested considerable resources into prioritizing and funding emerging research, especially transformative research. In 2007, the American National Science Board proposed creating programs to enhance support for transformative research (National Science Board, 2007). More recently, the US National Institutes of Health (2016) established the Transformative Research Projects Program (R01), which specifically supports “exceptionally innovative and/or unconventional research projects with the potential to create or overturn fundamental paradigms”. In 2017, the Report of the 19th National Congress of the Communist Party of China (2017) promoted the same ends, emphasizing the need to make China a country of innovators who could contribute major breakthroughs to pioneering basic research. Although each country has established funding and research infrastructures to foster breakthrough research, determining how best to allocate funding and which projects to fund remains an important challenge. Significant gains in scientific research could result for both scientists and agencies if breakthrough research could be identified early and promoted through sympathetic funding policies. Our purpose in this research, therefore, was to develop a feasible method of filtering breakthrough discoveries out of the vast expanse of academic literature, ideally before they have become “old news”.

What is a breakthrough?

At present, there is no uniform definition of a breakthrough, which obviously makes it difficult to identify one. Hollingsworth (2008) defines a breakthrough as “a finding or process, often preceded by numerous small advances, which leads to a new way of thinking about a problem…”, and argues that this “new way of thinking about a problem” is its essential property. Another significant feature of breakthroughs is their link to creative, transformative, and groundbreaking research (Ponomarev et al., 2014; Winnink et al., 2019; Wolcott et al., 2016). According to Kuhn’s theory of scientific development (Kuhn, 1962), periods of “conventional science” can be interrupted by “scientific revolutions”, which transform science and usher in a new stage of “conventional science”. During times of conventional science, incremental innovation supplements previous research under the existing paradigm, promoting the cumulative development of science. But during the relatively rare occurrences of a “scientific revolution”, science evolves rapidly through new insights, research methods, or theoretical explanations, overthrowing the old paradigm in favor of a new framework. Rather than operating independently, incremental research and scientific breakthroughs stand in a dynamic and interactive relationship, with the gradual accumulation of incremental innovations ultimately leading to a new scientific paradigm.

The American National Science Board and National Science Foundation—two major scientific decision-making and funding agencies—champion the notion of “transformational research” as the means by which breakthroughs are made in scientific research. While the National Science Board (2007) defines transformative research in terms of policy, and the National Science Foundation (2015) identifies it through the management perspective of research funding, both believe that such research has the potential to lead to paradigm shifts. In that sense, transformative research has a similar connotation to the notion of a scientific revolution in Kuhn’s theory.

Transformative research, therefore, has the essential attributes of breakthrough research. At the same time, while important scientific discoveries produced by incremental research may not have sufficient transformative potential, they can provide new ideas and knowledge that play a crucial role in breakthrough research. As such, we regard both transformative research and major incremental scientific innovations as breakthroughs, and seek methods by which to recognize either and both in scientific publications.

Prior work identifying breakthrough publications

Current approaches to identifying breakthroughs mostly fall into two categories. One is the qualitative identification method used by academic communities, known as peer review. The other is quantitative, grounded in scientometrics.

Peer review is a comprehensive evaluation approach that, no matter what automated methods are invented, will remain an important and effective means of identifying breakthroughs. But it also has obvious shortcomings: it is time-consuming and highly dependent on expert opinion. Further, given the explosive growth of the scientific literature, peer review is becoming an increasingly inefficient way to identify valuable breakthrough research among publications.

Most scientific research to identify breakthrough publications therefore falls within the field of scientometrics. Methods of identifying breakthroughs often prioritize citation statistics. High citation counts have been used to identify high-value articles and even predict Nobel Prize winners (Garfield & Welljams-Dorof, 1992). More recently, techniques such as that developed by Ponomarev et al. (2014) formulate single indicators for the early detection of candidate breakthroughs based on the dynamics of publication citation. Wolcott et al. (2016) incorporated multiple time-dependent and time-independent features of publications into a model to differentiate known breakthrough research, such as the key publications reported in a high-quality data set like The American Society of Clinical Oncology (ASCO) Annual Report, from randomly selected control papers. Some researchers have used citation or long-term reference analysis to detect publications containing seminal research (Comins & Hussey, 2015) or research milestones (Comins & Leydesdorff, 2017). The idea is that transformative research appears to cause a “disruption” in the citation chain of the prevailing research paradigm. Hence, papers are given a “disruption score”, and those exceeding a threshold may contain breakthrough research. The technique has been tested successfully with research in the fields of physics, computer science, and biomedicine (Huang et al., 2013, 2014). Winnink et al. (2016, 2019) demonstrate that characteristic patterns in the citation profiles of known breakthrough publications can be used in the early detection of discoveries with an important impact on scientific development. However, identifying breakthrough research using citation statistics relies on a correlation between breakthrough publications and external indicators rather than causation. For example, not all citations are equally important or even positive (Hernandez-Alvarez et al., 2017): many are simply neutral, and some are negative. To reveal the true worth of an article, one needs a positive evaluation by the academic community. Even more problematic for this approach is the time lag between publication and citations gathering momentum.

With the availability of full-text articles, citances have gradually attracted attention as a means of identifying breakthroughs, and scholars have attempted to identify transformative scientific findings by combining citation analysis with content characteristics. Citing sentences, also called citances, are sentences from full-text articles that contain one or more references (Nakov et al., 2004). Citing sentences contain additional information not found in abstracts (Elkiss et al., 2008) and therefore represent the contribution of an article to scientific development more accurately, because citing sentences collectively reflect how peer researchers judge the importance of an article (Radev & Abu-Jbara, 2012). Some scholars have accordingly summarized the contributions of research based on citing sentences (Chen & Zhuge, 2014).

This peculiar property of citances provides a new direction for recognizing breakthroughs. For example, Guo et al. (2014) combined citation analysis (analysis of reference duration together with highly cited and co-citation relationships) with citance analysis to identify milestone articles that form an “academic chain”. In the citance analysis, the authors filtered papers using distinctive evaluative words such as “first”, “broken”, and “breakthrough”. Small et al. (2017) extracted citances containing the cue word “*discover*” and the corresponding references to form “discovery citance-reference” pairs, and subsequently manually screened the articles with at least 20 “discovery citances” (293 articles) to identify scientific discoveries (128 articles). Their study illustrates the important role cue words can play in identifying transformative research. A common feature of the aforementioned studies is their use of citances to gauge the intrinsic value of references, making up for the limited accuracy of external indicators.

One consistent factor in all the research reviewed above, whether it identifies breakthroughs through external indicators, content analysis, or a combination of both, is the central role played by the citation relationship. What these studies lack is the authors’ own evaluation of their research. Whether such an evaluation can be found in the abstract text is an interesting question to investigate. Research into abstract-based literature classification suggests that it is possible to identify breakthroughs based on the content of abstracts.

In this study, we aim to combine the evaluations of others with self-evaluations to identify potential breakthrough publications. Small et al.’s study (2017) offers insights into breakthrough identification based on linguistic features; however, that study focused only on the word “discover” and its variants. It makes sense to identify and explore other words that signal an innovation or breakthrough evaluation.

There are two main tasks, then, that we focus on in our study. The first is to search for more words that indicate breakthrough research through word frequency analysis. The second is to identify potential breakthrough publications through the others-self evaluation processes and with the help of classification algorithms.

Materials and methods

High-quality breakthrough papers

The first challenge in predicting breakthrough publications is defining a core set of high-quality breakthrough publications, both to explore how they are cited by others and to see how their authors evaluate the breakthroughs. Given the explosive growth of publications in the biomedical field, it is difficult to identify breakthrough papers with which to create a high-quality data set. In this study, we used articles recognized by the Nobel Prize Committee for Physiology or Medicine or by a Science Breakthrough of the Year award as “ground truths” of biomedical breakthroughs. Each year, the Nobel Prize Committee acknowledges the key publications of the prize winners that represent their award-winning achievements and highlight the scientific breakthrough being recognized. Although there is typically a significant time lag between the actual research and its winning a Nobel Prize, these publications nevertheless undeniably represent scientific breakthroughs. We also included papers recognized with a Science Breakthrough of the Year award. Each year, Science’s editors and writers choose a significant development as the Breakthrough of the Year (with nine runners-up) and provide the references that resulted in this recognition. The high-quality data set of breakthrough papers, therefore, includes the following:

  1. Key publications of Nobel Prize winners in Physiology or Medicine from 1981 to 2018 (for a total of 103 articles); and

  2. Publications acknowledged in the Science Breakthrough of the Year award in the biomedical field from 1996 to 2018 (for a total of 556 articles).

Using these two sources, we identified 648 unique breakthrough publications indexed in the PubMed database.

Breakthrough cue word extraction

Citances are evaluations by others, made by researchers who have read a paper and wish to convey or build upon its ideas in their own work. Abstracts are self-evaluations: an author’s summary of their own research, including its benefits and worth. To find the different cue words signaling a possible breakthrough in others- versus self-evaluations, we collected the citances and abstracts of the high-quality breakthrough papers and performed both word frequency statistics and manual screening. Using two databases, Comments on Literature in Literature (CoLiL; Fujiwara & Yamamoto, 2015) and PubMed, we retrieved 135,526 citances and 467 abstracts tied to the high-quality breakthrough papers. The number of abstracts differs from the total number of breakthrough publications because some articles, such as letters, do not have abstracts.
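For readers who wish to reproduce the collection step, the sketch below shows one way to pull abstracts for a batch of PubMed IDs through the NCBI E-utilities efetch endpoint; the PMID used is an arbitrary example, and retrieval of citances from CoLiL is not shown because its interface is not detailed here.

```python
# Minimal sketch: fetch plain-text abstracts for a batch of PubMed IDs via
# the NCBI E-utilities efetch endpoint. The example PMID is arbitrary.
import requests

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fetch_abstracts(pmids):
    """Return the plain-text abstracts for the given PubMed IDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),      # efetch accepts a comma-separated list
        "rettype": "abstract",
        "retmode": "text",
    }
    resp = requests.get(EFETCH, params=params, timeout=30)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    print(fetch_abstracts(["23335087"]))  # arbitrary example PMID
```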

To identify and extract the cue words, we used the Stanford CoreNLP tool to perform word segmentation and compute word frequency statistics for each corpus. The steps are shown in Fig. 1. For each sentence, we first performed word segmentation to reduce the sentence to single words, merging the different inflected forms of the same word appearing in the text. Part-of-speech tagging then assigned each word a grammatical category based on its context. Word frequencies were calculated after the words were lemmatized and tagged. Finally, the cue words were selected manually from a high-frequency word list based on whether a word’s meaning could reflect a breakthrough evaluation.

Fig. 1 An example of the word frequency statistics workflow
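A rough illustration of this pipeline is sketched below using Stanza, the Python package that succeeds the Stanford CoreNLP interface, rather than CoreNLP itself; the toy corpus and the (lemma, POS) bookkeeping are our assumptions about how counts like those in Fig. 1 could be reproduced.

```python
# Minimal sketch of the Fig. 1 pipeline: tokenize, POS-tag, and lemmatize
# each sentence, then count (lemma, POS) pairs so that inflected forms of
# the same word are merged. Uses Stanza in place of Stanford CoreNLP.
from collections import Counter
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma")

corpus = [  # illustrative citance-like sentences
    "This was the first demonstration of a novel repair pathway [2].",
    "Their discovery changed prevailing views of tumour suppression [5].",
]

freq = Counter()
for text in corpus:
    for sentence in nlp(text).sentences:
        for word in sentence.words:
            freq[(word.lemma.lower(), word.upos)] += 1

for (lemma, pos), n in freq.most_common(10):
    print(f"{lemma}\t{pos}\t{n}")
```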

In the field of information retrieval, Recall and Precision (Cleverdon, 1967) are important indicators of a process’s effectiveness. Hence, these were the metrics we used to evaluate the cue words extracted through our process.

The test dataset included two kinds of papers from the Faculty Opinions (formerly F1000) database: those designated by at least five reviewers as presenting a “new finding”, i.e., articles presenting novel methods, models, etc., deemed to be breakthroughs, and those designated by reviewers as only reflecting “negative/null results”, i.e., articles presenting less-valuable results, regarded as non-breakthroughs. In terms of counts, the test dataset comprised 183 abstracts and 1895 citances from “new finding” publications and 125 abstracts and 1840 citances from “negative/null results” publications.

Recall and Precision were calculated according to Eqs. (1) and (2), respectively. The definitions of abbreviations in the formulas are shown in Table 1. Both indicators reflect the ability to retrieve breakthrough publications using the extracted cue words.

$${\text{Recall}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FN}}} \right)$$
(1)
$${\text{Precision}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FP}}} \right)$$
(2)
Table 1 Definitions of TP, FP, TN, and FN in the Precision and Recall calculation formulas
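To make the calculation concrete, here is a minimal sketch of Eqs. (1) and (2); the TP/FP/FN counts below are hypothetical, whereas the real counts come from matching cue words against the labelled test set described above.

```python
# Minimal sketch of Eqs. (1) and (2). The counts below are hypothetical;
# in the study they come from cue-word retrieval over the labelled test set.
def recall(tp: int, fn: int) -> float:
    """Eq. (1): share of true breakthroughs that the cue words retrieve."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Eq. (2): share of retrieved items that are true breakthroughs."""
    return tp / (tp + fp)

print(f"Recall    = {recall(tp=152, fn=31):.2%}")    # hypothetical counts
print(f"Precision = {precision(tp=152, fp=63):.2%}")
```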

Identifying potential breakthroughs

The results in the “Breakthrough cue words” section show that cue word Precision was higher for the citances than for the abstracts. Hence, to identify breakthrough publications as accurately as possible from such a large volume of literature, we read the full articles containing the citances to glean insights into cue word usage. We found that the cue words were largely used by authors to describe the work in references, such as in a literature review. Our next step was therefore to download all the references (from the CoLiL database) that had been described with a breakthrough keyword and read those articles, too. We refer to a citance containing a breakthrough cue word as a “breakthrough citance”. According to Small et al. (2017), however, a strategy of limiting cue words to those appearing in citances can still fail to find true links between cue words and references or, conversely, create a false association if the reference and cue words occur in the same sentence but are semantically unrelated. These problems can be compensated for (to some extent) by requiring that cue words and a specific reference occur together in multiple citances. Therefore, we selected the papers with at least 100 breakthrough citances for subsequent analysis. Selecting papers whose breakthrough citances meet a threshold is part of an “others-evaluation” process: it means that many researchers used cue words in their evaluations of those papers.
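A minimal sketch of the flagging step is given below; the cue word list is the citance list reported later in the paper, and matching inflected forms via a wildcard suffix is our assumption about how original word forms were handled.

```python
# Minimal sketch: a citance counts as a "breakthrough citance" if it
# contains any citance cue word. The \w* suffix admits inflected forms
# such as "discovered" or "newly" (an assumption, not the paper's spec).
import re

CUE_WORDS = ["change", "first", "potential", "new", "novel",
             "since", "discovery", "discover"]
CUE_RE = re.compile(r"\b(" + "|".join(CUE_WORDS) + r")\w*\b", re.IGNORECASE)

def is_breakthrough_citance(citance: str) -> bool:
    return CUE_RE.search(citance) is not None

print(is_breakthrough_citance(
    "This pathway was first demonstrated by Smith et al. [3]."))  # True
```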

Turning to the process of self-evaluation, where authors provide cue words about the worth of their own research, it was important to consider that abstracts serve many purposes. They can provide a summary of the research, state shortcomings in the literature, or outline problems to be solved. They may discuss the intended audience, and they can provide an evaluation of the research or its implications. However, an abstract may not include a positive evaluation. After all, many scholars err on the side of conservatism and caution, letting others judge the significance of their work. Alternatively, the results may be neutral or negative. Both cases make it difficult to determine whether the research is of groundbreaking significance. Further, reviews, surveys, guidelines, statistical reports, etc., might also contain cue words yet not reflect original innovations or major breakthroughs in themselves. Both possibilities made manual screening necessary. During this process, we coded abstracts without positive evaluations, plus reviews and the like, as “0”, i.e., non-breakthrough. To be coded “1”, an article had to meet two criteria: (1) the abstract had to include a definitively positive evaluation; and (2) the research results had to include new findings, change prevailing thinking, or prove a scientific phenomenon for the first time. We ultimately applied this self-assessment process to the approximately 2000 papers selected by the others-evaluation process.

With these manual reviews done, we then used a text classification algorithm to separate the valuable abstracts from the not-so-valuable. We tested three algorithms, SVM (Chang & Lin, 2011), TextCNN (Zhang & Wallace, 2015), and BERT (Devlin et al., 2018), and chose the best-performing one. Abstracts are highly accessible, so we formed a dataset of abstracts based on the papers with 100 or more breakthrough citances. 80% of the data was used for training (selected at random), with the remaining 20% used as the test set. The model that produced the best results in terms of Precision, Recall, F1-score, and Accuracy was selected for subsequent error analysis.
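For orientation, a minimal baseline in the spirit of this set-up is sketched below: an 80/20 random split and a TF-IDF plus linear SVM classifier. The toy texts and labels are placeholders for the abstract dataset, and the feature choices are our assumptions, since the paper does not specify the SVM features.

```python
# Minimal sketch of the classification set-up: 80/20 random split and a
# TF-IDF + linear SVM baseline. Toy data stands in for the real abstracts;
# feature and hyperparameter choices here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

abstracts = [
    "We report the first direct evidence of a novel repair mechanism.",
    "These findings change current models of tumour suppression.",
    "We identify a new and potentially druggable signalling target.",
    "No significant association was observed in this cohort.",
    "The effect did not replicate across the three study sites.",
    "Results were consistent with previously published estimates.",
] * 2                              # duplicated only to make the toy split work
labels = [1, 1, 1, 0, 0, 0] * 2    # 1 = breakthrough, 0 = non-breakthrough

X_train, X_test, y_train, y_test = train_test_split(
    abstracts, labels, test_size=0.2, random_state=42, stratify=labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LinearSVC(class_weight="balanced"))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), digits=2))
```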

Results

This section presents the experimental results for each of the steps in the process. A full set of results can be found in Figs. 2, 3, 4, 5, 6 and 7.

Fig. 2 Steps for identifying biomedical breakthrough papers and the corresponding results

Fig. 3 The proportion of parts of speech in the abstracts and citances corpora. NN, JJ, VB, and RB refer to nouns, adjectives, verbs, and adverbs, respectively

Fig. 4 The distribution of potential breakthrough papers over time

Fig. 5 The categories in which the 237 potential breakthrough articles were recommended by F1000 and the corresponding numbers of articles

Fig. 6 The area under the precision-recall curve of the BERT model

Fig. 7 The area under the precision-recall curve of the BERT-KeyPos-Last sentence model

Breakthrough cue words

After performing word frequency analysis on the abstracts and citances, we extracted 7058 possible cue words from the abstracts and 70,995 from the citances. We merged words with the same stem and then tagged each word with its part of speech, as shown in Fig. 3. Three parts of speech dominated the results for both the abstracts and the citances, namely nouns (NN), adjectives (JJ), and verbs (VB), indicating that abstracts and citances are similar in composition. However, there were slight differences in the proportions of the three groups. For example, the proportion of verbs in the abstracts was significantly higher than in the citances, suggesting that authors use verbs more frequently when describing their own research.

During the cue word selection process, we took the top 500 most frequently used words as candidate terms. After eliminating medical specialty words, such as MeSH terms, as well as words unrelated to breakthrough evaluation, we ultimately selected 8 cue words from the abstracts corpus: “new”, “novel”, “potential”, “key”, “change”, “evidence”, “basis”, and “base”, and 8 cue words from the citances corpus: “change”, “first”, “potential”, “new”, “novel”, “since”, “discovery”, and “discover”. Table 2 provides the descriptive statistics for these sets. On the whole, whether from the perspective of self-evaluation or others-evaluation, researchers often use the words “new”, “potential”, and “novel” (and their variants) to describe innovative and valuable research. However, further into an article, they tend to use a richer vocabulary, and their attitudes toward the research tend to be spelled out more plainly.

Table 2 The breakthrough cue words selected from the abstracts and citances corpora, with the corresponding parts of speech, frequencies, and original words

The next step was to calculate the Recall and Precision scores. The plan was to use the cue words to retrieve breakthroughs in the test dataset and then calculate the Recall and Precision of the cue words at retrieving breakthrough publications according to Eqs. (1) and (2). Each corpus was searched with the cue words from the corresponding source, and we used all the original word forms rather than lemmas. The Recall and Precision for the citances were much higher than for the abstracts (citances: Precision 70.77%, Recall 83.13% vs. abstracts: Precision 58.54%, Recall 52.46%). These results indicate that citance cue words are much more effective at retrieving breakthrough publications. Therefore, for the remainder of the study, we used only the citance cue words and dataset for analysis.

Potential breakthroughs identified via the others-self assessment process

From the PMC open access subset, we retrieved a total of about 2.32 million articles containing at least one cue word somewhere in the full text. Tracing the references cited in these papers through PubMed yielded 12 million articles. All citances were downloaded from the CoLiL database using the unique PubMed ID of each reference, and all data were stored in a MySQL database for subsequent queries and analysis. After eliminating publications without citances and those without breakthrough citances, roughly 4.5 million articles remained, with a total of 13.3 million breakthrough citances (about 3 breakthrough citances per paper).
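Continuing the earlier sketch, the others-evaluation filter reduces to counting breakthrough citances per cited paper and applying the threshold; the record layout below is a hypothetical stand-in for the MySQL tables, which the paper does not specify.

```python
# Minimal sketch of the others-evaluation filter: count breakthrough
# citances per cited paper and keep papers with at least 100 of them.
# The (citance text, cited PMID) rows are hypothetical stand-ins for the
# MySQL store described in the text.
import re
from collections import Counter

CUE_RE = re.compile(
    r"\b(change|first|potential|new|novel|since|discover)\w*\b",
    re.IGNORECASE)

citance_records = [  # illustrative rows only
    ("This pathway was first described in [1].", "10500000"),
    ("A novel regulator was identified in [1].", "10500000"),
    ("These results are consistent with [2].", "10500001"),
]

THRESHOLD = 100  # breakthrough citances per paper, as in the study

counts = Counter(pmid for text, pmid in citance_records
                 if CUE_RE.search(text))
candidates = [pmid for pmid, n in counts.items() if n >= THRESHOLD]
print(f"{len(candidates)} papers meet the threshold of {THRESHOLD}")
```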

As described in the “Methods” section, we set the threshold of breakthrough citances per paper to 100 in the others-evaluation process. Only 2117 articles met this threshold. These articles had a total of around 378,000 breakthrough citances, accounting for 2.84% of the total number of citances, with an average of 179 breakthrough citances per article. Of these 2117 articles, the paper “Global Cancer Statistics”, published in 2011, had the highest number of breakthrough citances, at about 1891. Authors often cite facts taken from statistical reports in the introduction or background sections of their papers, so these papers tend to accumulate a high citation count over a shorter period of time. But such statistical reports are not breakthrough studies, which shows the necessity of manual screening. After manual screening and, again, removing reviews and the like, only 237 of the 2117 papers met the dual criteria for being classified as breakthrough candidates. Thus, the majority of papers (88.80%) were classified as non-breakthroughs, which is consistent with our understanding that breakthrough research is quite rare.

After further filtering via the others-self evaluation process, we identified a total of 237 potential breakthrough articles, all published between 1973 and 2015 (Fig. 4). Most were published after 2000, which could be for a variety of reasons. For example, prior to 2000, digitization was rare, and so was open science; hence, more articles published post-2000 are available for analysis. No breakthrough articles were identified after 2015, which may be related to citation lag and the time it takes to accumulate enough breakthrough citances to meet the threshold.

To further verify the effectiveness of the identification method and determine whether the identified articles did, in fact, represent breakthrough research, we extended our assessment to include the F1000 indicators and recommendation counts per category. Among the 237 candidate papers, 145 had been recommended at least once. Figure 5 shows the categories in which the 237 potential breakthrough articles were recommended by F1000. Most articles were recommended as a “new finding”; some were considered to be technological advances, interesting hypotheses, or the discovery of novel drug targets.

All the evaluation results support the effectiveness of the others-self evaluation method in identifying potential breakthroughs. For example, consider the top 20 potential breakthrough articles with the largest numbers of breakthrough citances (Table 3): 17 were recommended as containing new findings, and a few were also designated in the “Novel Drug Target” and “Technical Advance” categories.

Table 3 The potential breakthrough articles with the largest numbers of breakthrough citances (top 20) and the corresponding counts and categories recommended by F1000

Deep learning results

As identified by the others-self evaluation method, only 237 articles were candidates for breakthrough research and 1880 were non-breakthroughs. Some articles did not contain abstracts, reducing the final training dataset to 1960 abstracts. We used the SVM, TextCNN, and BERT algorithms to conduct preliminary training and then optimized the classification models.

Table 4 summarizes the results. From these, we see that all the models have a strong ability to predict the samples coded “0”. This may be related to the large imbalance between the number of breakthrough and non-breakthrough samples (the ratio of samples coded “1” to samples coded “0” is 1:7). However, the BERT model showed a stronger ability to identify breakthrough research and achieved a better balance between identifying positive and negative samples. The F1-score of this model was 0.79, with an accuracy of 0.89, both higher than the other two models.

Table 4 A summary of classification results for the four models (SVM, TextCNN, BERT, and BERT-KeyPos-Last sentence) in terms of Recall, Precision, F1-score, and Accuracy

In the self-evaluation screening experiment, we made decisions based on whether the authors had made a positive evaluation of their own research by extracting “judgment sentences”. During this process, we found that, in addition to the breakthrough cue words above, some 2-gram phrases, such as “new view/insight/direction/avenue” and “first time/report/demonstration”, and some 3-gram phrases, such as “open the way”, “narrows the gap”, and “challenge the view”, also represented a positive evaluation. Unlike single words, which are often polysemous, the meaning of such phrases is more precise. From an analysis of the judgment sentences, 71.3% contained positive keywords and 82.87% appeared at the end of the abstract. A sketch of how such phrases can be surfaced follows below.
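A minimal sketch of surfacing candidate 2-gram and 3-gram phrases by frequency is given below; the example judgment sentences are illustrative, and in practice the candidate phrases were screened manually.

```python
# Minimal sketch: rank 2-gram and 3-gram phrases in judgment sentences by
# frequency as candidates for positive evaluation phrases. The sentences
# here are illustrative; real candidates were screened manually.
from sklearn.feature_extraction.text import CountVectorizer

judgment_sentences = [
    "These data provide new insight into tumour suppression.",
    "We demonstrate for the first time that the receptor is essential.",
    "Our findings challenge the view that the pathway is redundant.",
    "This approach opens the way to genome-scale screens.",
]

vec = CountVectorizer(ngram_range=(2, 3))
counts = vec.fit_transform(judgment_sentences).sum(axis=0).A1
ranked = sorted(zip(vec.get_feature_names_out(), counts),
                key=lambda pair: -pair[1])
for phrase, n in ranked[:10]:
    print(f"{phrase}\t{n}")
```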

Given the results of the model comparison, we optimized BERT. We extracted the sentences containing positive keywords, along with the last sentence of each abstract, to construct “judgment sentences”. The positive keywords included the cue words obtained by word frequency analysis as well as the 2-gram and 3-gram phrases obtained during the self-evaluation screening process. During training, we gave only the “judgment sentences” as inputs to the BERT model. A comparison of the areas under the precision-recall curves and the F1-scores of the BERT and BERT-KeyPos-Last sentence models (Figs. 6, 7 and Table 4) shows that the latter significantly improved the ability to identify abstracts containing breakthrough evaluations (F1 = 0.89).
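The input construction can be pictured as follows; the keyword list is a small illustrative subset, and the naive sentence splitter is our assumption, since the paper does not describe the exact preprocessing.

```python
# Minimal sketch of building the "judgment sentence" input for the
# BERT-KeyPos-Last sentence model: keep sentences containing a positive
# keyword, plus the abstract's final sentence. The keyword list is a small
# illustrative subset and the sentence splitter is deliberately naive.
import re

POSITIVE_KEYWORDS = ["new", "novel", "potential", "first time",
                     "first report", "open the way", "challenge the view"]

def judgment_input(abstract: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    keep = [s for s in sentences
            if any(k in s.lower() for k in POSITIVE_KEYWORDS)]
    if sentences and sentences[-1] not in keep:
        keep.append(sentences[-1])      # always include the last sentence
    return " ".join(keep)               # fed to BERT for classification

abstract = ("Colorectal screening methods were compared in two cohorts. "
            "We identify a novel non-invasive marker. "
            "These findings open the way to routine early detection.")
print(judgment_input(abstract))
```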

Error analysis

During the training experiments, the significant imbalance between the number of breakthroughs and non-breakthroughs caused the traditional machine learning classification algorithms, SVM and TextCNN, to have difficulty identifying abstracts with positive descriptions, while BERT, a more recent deep learning algorithm, performed much better. After optimization, BERT’s classification ability improved even further. Among the four classification models, the KeyPos-Last sentence model based on the BERT algorithm had the strongest breakthrough-identification ability, and we used it to perform further error analysis.

There were 7 false negatives and 11 false positives in the results of the BERT-KeyPos-Last sentence model on the test dataset. The 7 false negatives (papers the manual classification deemed breakthroughs but the machine did not) concerned: the identification of new roles for gut microbiota and unannotated RNAs; the discovery of a novel orthobunyavirus; a potential non-invasive molecular marker (MiR-92) for colorectal cancer screening; a new strategy, named Drop-seq, for quickly separating cells into nanoliter-sized aqueous droplets; and mRNA transcripts. The machine judged these as non-breakthrough research, perhaps because the last sentence of the abstract could not serve as a basis for judgment or because the breakthrough meaning of the keywords was not obvious. The 11 false positives (papers the model deemed breakthroughs but the manual classification did not) arose primarily because the abstracts contained breakthrough keywords whose meaning was assessed as non-breakthrough during manual screening. Although the deep learning model made errors in identifying abstracts containing breakthrough evaluations, it can still quickly find many high-quality, highly evaluated articles among a large number of candidates and can be of considerable assistance in breakthrough identification.

Discussion

Breakthrough research can greatly advance scientific progress and development. If a breakthrough can be identified in its early stages, more funding can be invested, and invested sooner, in the relevant research fields, and the breakthrough may then occur more quickly. Researchers have made exploratory attempts at identifying breakthrough research but, so far, no method has proven particularly worthwhile. With this study, we hope we have presented a promising method for filtering breakthrough research out of academia’s vast knowledge stores. However, we openly admit that early recognition remains elusive.

In this article, we proposed an “others-self” dual evaluation method for breakthrough identification. The method combines evaluations of others’ work, via citances, with authors’ evaluations of their own research, via claims made in abstracts. To assess the efficacy of the method, we compared our results to the F1000 insights, which derive from qualitative evaluations of research. The results showed that most of the articles we identified are included in the F1000 database, recommended in one or more of the categories New Finding, Novel Drug Target, and Technical Advance. Hence, we are confident in claiming that the others-self evaluation method can identify high-value publications. Additionally, we fed the abstracts to a deep learning model in the hope that it could automatically identify those with positive evaluations, which simplifies the self-evaluation process by replacing the manual review of each article. The results showed that the BERT-KeyPos-Last sentence model had the best identification ability, with an F1-score of 0.89. However, there are some process issues that need to be resolved before we can deem the method truly useful, the main one being the influence of citation lag. During our process, we had to set the threshold for the number of breakthrough citances to 100 to distill the candidate articles down to a manageable number. As a result, all recent publications were eliminated from the candidate list, leaving the most recent candidate nearly six years old. More on these points is discussed alongside some other limitations of the study shortly.

The “others-self” dual evaluation process can also be regarded as a way of evaluating articles. When evaluating a paper, both the authors’ own evaluation and the assessments of the authors who cite it are taken into account, which enriches traditional academic evaluation methods. In traditional academic evaluation, the number of citations is still the main index reflecting the academic value and influence of an article. But not all citations are equivalent: some are positive, most are neutral, and some are even negative (Radev & Abu-Jbara, 2012). During the others-evaluation process, we used the number of citances containing breakthrough cue words to evaluate each article, on the basis that the more breakthrough citances, the greater the article’s academic value. Although this method of evaluating articles is more accurate, it has limited usefulness for recent publications due to citation lag (see Fig. 4).

However, for recent publications or articles with relatively few citations, the automatic identification model tested in this study can be used to classify abstracts and identify significant articles from the author’s own perspective for the purpose of article evaluation. It is not uncommon for breakthrough research to be overlooked or ignored for years, or for it to conflict with existing scientific paradigms and therefore be harshly critiqued by the scientific community. Over time, these “sleeping beauty” publications are awakened by a “prince” publication and ultimately accumulate a large number of citations. This is a self-reinforcing loop in which citations confer importance on the research, perpetuating more approval and more citations. However, authors usually have a global overview of their research area when publishing their papers, so they are well aware of the significance of their findings and often indicate as much in their abstracts. Abstract-based identification of breakthroughs is thus possible and provides a new avenue for evaluating recent publications or special articles.

The method proposed in this study, therefore, has practical applications in several respects. The dual evaluation method can be used to accurately identify potential breakthrough articles in any biomedical subfield by first filtering on the number of breakthrough citances and then applying the model to identify abstracts with positive evaluations. For recent publications or those with few citations, the approach can directly identify abstracts with positive evaluations and offer a point of departure for subsequent study and analysis.

In our research, we also explored how authors who cite breakthrough articles differ in their descriptions and evaluations from the breakthrough articles’ own authors. It turns out that citing authors often use nouns to describe references. Although the authors of breakthrough papers also used nouns to describe their own research, the proportion of verbs increased significantly, which likely reflects their direct involvement in the research. From the perspective of the academic community, authors who cite breakthrough papers often use “change”, “first”, “potential”, “new”, “novel”, “since”, “discovery”, and “discover” to evaluate breakthrough articles. The authors of breakthrough articles themselves prefer “new”, “novel”, “potential”, “key”, “change”, “evidence”, “basis”, and “base” to describe the significance of their own research. We also explored whether these words can be used to retrieve breakthrough research. As it turns out, both the Recall and the Precision of retrieval based on citance cue words are much higher than those based on cue words in abstracts. Citance-based retrieval is likely more effective at identifying valuable and significant research than the more commonly used abstract-field search, because the knowledge and contributions mentioned in citances are what peers think has had an important influence on their research. Retrieval based on citances may therefore be more helpful in finding breakthrough articles.

There are also several limitations to our study. First, this method may not be suitable for recent publications or articles with few citations. Even though the threshold for the number of breakthrough citances per article is adjustable, we still used a threshold during the others-evaluation process to narrow the number of articles, and therefore potentially excluded articles with breakthrough potential. Second, in extracting breakthrough text features from abstracts and citances, we selected breakthrough cue words only from the top 500 highest-frequency words, ignoring words that appear less frequently but express strong positive sentiment, such as “milestone” and “landmark” (even the term “breakthrough” itself was excluded due to its low frequency). Another limitation is that we regarded any citance containing a breakthrough cue word as a breakthrough citance during the others-evaluation process. But, because of polysemy, the same word can be used in different contexts; for example, “new case” and “nucleic acid base” do not indicate breakthrough evaluations but are common phrases in the biomedical field. It is therefore challenging to determine whether a citance containing breakthrough cue words actually carries a breakthrough meaning, which weakens the reliability of the others-evaluation screening results (although we compensated for this limitation, at the cost of introducing another, by setting a relatively high threshold for the number of breakthrough citances).

There remain significant future opportunities for research into breakthrough identification. Most current studies on breakthroughs (including this one) remain focused on retrospective identification. However, early identification is more meaningful, as it could provide useful guidance for scientific research planning and funding, as well as new research directions for researchers. In the experiments above, we showed the feasibility of using a text classification algorithm to identify breakthrough articles. Although it remains more practical to identify valuable articles based on whether there is a breakthrough evaluation in the abstract (because almost all abstracts are openly accessible in literature databases), identifying them through citances is more meaningful, because citances represent, in concentrated form, the value the academic community places on important original research. At the same time, citances have the power to signal breakthrough research early. While a low citation count alone says little about whether an article will become a breakthrough, citances that repeatedly state the article contains a “first finding…” strongly suggest the article may have groundbreaking potential.

At present, researchers have studied the polarity of citations, dividing them into positive, negative, and neutral (Abu-Jbara et al., 2013), or identifying meaningful citations among all citations (Hassan et al., 2018; Valenzuela et al., 2015). As an alternative, we suggest using text classification algorithms to divide citances into those with and without breakthrough evaluations. This method could be used to identify recent publications, or articles with relatively few citations, that nevertheless have breakthrough potential. It is worth noting that the results obtained by these identification methods are indicative only. Whether the identified studies are truly groundbreaking remains an open question, and recent examples of articles with important findings that were later retracted for academic misconduct give us pause. Such issues cannot be resolved in advance, which makes the identification of breakthrough research and major scientific discoveries all the more difficult.