Introduction

The President of the American Information Industry Association, Paul Zurkowski, in 1974, said, “Information literacy is a kind of skill that comprehensively uses information tools and information sources to solve problems encountered” (Zurkowski, 1974). In 1989, the American Library Association Presidential Committee noted, “To be information literate, a person must be able to recognize when information is needed and have the ability to locate, evaluate, and use effectively the needed information” (ALA, 1989). Since then, the range of skills and knowledge required for information literacy has expanded to include newer forms of literacy to accommodate the continually developing requirements for effective information handling and to become more suitable for complex and diversified information environments (Behrens, 1994). On the one hand, scholars or practitioners divide information literacy skills differently, including basic theoretical knowledge and basic application skills of information and information technology; the ability to use information technology to study, communicate, cooperate and solve problems; and information awareness and ethics (ACRL, 2000; UNESCO, 2010). On the other hand, “computer literacy”, “media literacy”, “network literacy”, “digital literacy”, “ICT”, “data literacy”, “mobile literacy”, and other concepts have emerged as extensions of the concept of information literacy and have been widely discussed (Bruce, 2000; Pinto et al., 2010, Pinto et al., 2019; Stopar & Bartol, 2019). Although the connotation and extension of information literacy are constantly being adjusted and revised, it is always defined as a series of definable and standardized capabilities that guide people to obtain, screen, evaluate and integrate useful information from rich and diverse information sources to determine a direction of action.

Other scholars and practitioners believe that information literacy is no longer a simple universal skill but a practice activity that cannot be taught independently of the knowledge domains, organizations, and practical tasks in which these skills are used (Limberg et al., 2012; Tuominen et al., 2005). In fact, information literacy in the workplace has been a concern for a long time and is considered “a model of an information seeking and using process in the workplace” (Wai‐yi, 1998). Previously, the information literacy of different roles was explored (Lloyd, 2005), but today such research is relatively scarce. Fortunately, with the explosive growth of information in the fields of education and health, which has posed a substantial challenge to professionals and citizens, information literacy has rapidly developed in specific work scenarios. In education, the information literacy of students (Zhu et al., 2019), teachers (Zhou et al., 2020) and education managers (Celep & Tülübaş, 2014) has been widely examined for the promotion of education reform and the realization of education innovation. In the domain of health, many health information resources published on the Internet increasingly require people to effectively locate, critically evaluate, and efficiently use them (Haruna & Hu, 2018).

Therefore, information literacy is a concept with a long developmental time, rich connotation and extension, and wide application scenarios, making it of great significance to adapt to the vigorous development of technology. Studies on information literacy topics and their changes can be informative for researchers and can help them to better understand the domains of such topics, that is, by clarifying the developmental context of information literacy, finding weak points and grasping future trends, which will contribute to the development and innovation of information literacy. Some studies have classified the topics of information literacy via qualitative systematic reviews (Sproles et al., 2013), content analyses (Wu, Li, et al., 2020), bibliometric analyses (Kolle, 2017; Park & Kim, 2011), citation analyses (Taşkin et al., 2013) and word co-occurrence analyses (Pinto, 2015; Pinto et al., 2014, b, 2019). These studies are often focused on limited scenarios, such as higher education and information literacy education. However, the analyses of hot spots and topics are often broad, and a dynamic trend analysis has yet to be conducted.

In this paper, we conducted a topic model dynamic analysis of the articles on information literacy studies in the Web of Science (WOS) core collection database from 2005 to 2019. The global topics and their popularities, topic similarities and correlations, along with the temporal evolution of local topics and the subject diffusion of local topics were analyzed and presented. In addition, future research directions were discussed. This study focused on three major research questions:

  1. What are the research topics within information literacy?
  2. How did the research topics evolve along the temporal dimension?
  3. How did the research topics diffuse among the fields of subjects?

Literature review

Studies on information literacy and its changes can be informative for researchers and can enable them to better understand its domains. Researchers and practitioners usually explore the current status and trends of information literacy research in various ways.

In the early stages, information literacy standards or projects were reviewed to grasp the key points of information literacy development. Behrens (1994) analyzed the concept of information literacy by investigating some leading definitions and descriptions of information literacy and found that “educating for information literacy”, “exploring the literacy continuum” and the “role of librarians” were developmental trends. Bruce (2000) provided an overview of contemporary information research and practice and summarized various efforts to seek new directions in educational, community and workplace contexts. Marcum (2002) proposed that information literacy be refocused away from information towards learning and beyond literacy in the direction of sociotechnical fluency. At this stage, information literacy was still in its infancy, focused on the connotation and extension of the concept.

With the development of bibliometrics, scholars began to explore the development status of information literacy through the statistical analysis of existing research (Nazim & Ahmad, 2007; Panda et al., 2013; Pinto et al., 2013). In addition to the analysis of yearly statistical trends, active authors, productive institutions, and representative journals, popular research topics were identified according to keyword frequency. Pinto et al. (2010) showed that “research libraries”, “information literacy”, “information seeking”, “academic librarianship” and “literacy skills” were the most significant terms in the selected texts, illustrating how information literacy has been progressively incorporated into the library and academic fields. Aharony (2010) found three main topic types—“miscellaneous”, “health and medicine”, and “education”—which reflected a tendency to associate information literacy with health and medicine and stressed people’s need for information literacy in this specific context. Park and Kim (2011) found five major clusters calculated by the nearest neighbor cluster program, while “user training” and “students” were major descriptors in the subtopic area of information literacy, confirming that the research area has focused mostly on higher education, school libraries, and the education sector in general. Thus, information literacy research in the fields of library and information science (LIS) and education still plays a leading role and is expanding the research territory for diverse populations in communities, workplaces, and other contexts.

Next, researchers paid more attention to the topics of information literacy in specific fields or domains with effective methods, such as multidimensional scaling and network analyses. Hsieh et al. (2013) found that “information literacy”, “media literacy”, and “digital literacy” were treated significantly differently between the United States and Taiwan by exploring the characteristics of theses and dissertations on information literacy from 1988 to 2010. Sproles et al. (2013) analyzed the literature on information literacy and library instruction from 2001 to 2010 and discovered that key topics continue to be collaboration, assessment, and the application of technology to instruction efforts. Pinto et al. (2014) compared the conceptual structure of information literacy between the social sciences and health sciences from 1974 to 2011. The area of health sciences yielded four clusters, and the most central descriptor was “education”, which was strongly linked to “information retrieval” and weakly linked to “information skills”, “information seeking”, and “information science”. The social sciences had six clusters in which “information literacy” and “education” had the most occurrences. Kumar (2014) examined the scientific productivity of digital literacy in Online Library Information Science and Technology Abstracts (LISTA) from 1997 to 2011 and found that the majority of articles focused mainly on academic education. Pinto et al. (2014) diagnosed the scientific production of Ibero-American researchers on information literacy from 1985 to 2013. In addition to the most common term “information literacy”, “training in information competences” has gradually gained a presence in education, informatics, and communication areas. Li et al. (2015) found that “information literacy” and “information” were the most frequent keywords in the domain of health information–seeking behavior. Pinto (2015) examined the subject area of Information Literacy Assessment in Higher Education (ILAHE) in a retrospective and selective search from 2000 to 2011. Five clusters (“evaluation education”, “assessment”, “students efficacy”, “learning research”, and “library”) were identified but overlapped significantly as a result of their terminological “proximity”. Tallolli and Mulla (2016) indicated that library and information science researchers have made significant contributions to information literacy studies. Bhardwaj (2017) showed that research on information literacy in developing countries was unpopular in the humanities and social sciences. Related studies focused on the primary disciplines showed that “information literacy”, “education” and “evaluation” remained the core terms or topics.

Influenced by social development, technological updates and field integration, related studies have also paid more attention to the future direction of information literacy topics. According to Kolle (2017), “digital divide”, “media literacy”, “pedagogy”, “higher education” and “critical thinking” were popular research topics in the information literacy (IL) domain from 2005 to 2014. In addition, he pointed out that the assessment of IL among students in higher education appears to be a new area of IL research. Martzoukou and Sayyad Abdi (2017) categorized the key research directions into four broad contextual areas (“encompassing leisure and community activities”, “citizenship and the fulfillment of social roles”, “public health” and “critical life situations”), which pointed to the need to develop an information literacy mindset. Stopar and Bartol (2019) showed that the computer-, information-, and digital-related terms that define these clusters predominate in different periods: the computer-related terms are earlier terms, which are followed by the information- and digital-related terms, as mature terminology is used to embrace more trendy novel concepts. They suggested that all fields should collaborate in the future. Pinto et al. (2019) offered a bibliometric analysis of the scientific production on Mobile Information Literacy in Higher Education published between 2006 and 2017. They noted that incorporating IL in the context of online learning highlights how mobile technologies are gaining ground at all levels. Pinto et al. (2020) further investigated the development of mobile information literacy from 2006 to 2019, and six clusters were identified (“IL and e-learning”, “mobile devices and competencies”, “ethics”, “library and e-resources”, “educational technology” and “technological environment”), demonstrating the growing interdisciplinarity of scientific publications on “mobile information literacy”. Wu, Li, et al. (2020) conducted source tracing, an association analysis and feature mining of the research content of information literacy theory from five aspects (“placement”, “dialogue”, “application”, “evaluation” and “generation”), emphasizing that information literacy should be interdisciplinary, should deepen theory and should be people-oriented.

Overall, the research topics of information literacy diversified over time, and interdisciplinary integration was encouraged. However, those studies remain targeted towards limited scenarios, such as higher education and information literacy education. In addition, the analyses of hot spots and topics are often broad, and a dynamic trend analysis has yet to be conducted. Therefore, it is necessary to further retrospectively examine the topical development of information literacy from a dynamic perspective. On the other hand, those studies have classified the topics of information literacy via qualitative systematic review content, bibliometric, citation and word co-occurrence analyses. However, with the development of technology and the differences in domain characteristics and user requirements, the corresponding information literacy studies have a limited focus, which is often ignored in the analysis of high-frequency words. It is highly necessary to combine the textual content (such as titles and abstracts) to obtain a deeper classification, which will facilitate understanding of the topical development of information literacy studies. For topic detection and tracking, topic modeling is regarded as more flexible and effective than alternative approaches such as document clustering (Kuhn, 2018). A word-embedding algorithm can be used to represent words semantically based on the context to calculate topic correlations (Chen et al., 2017; Xie et al., 2020). Therefore, this study attempts to use LDA, which is a widely used topic modeling tool, to describe the topics of information literacy and to explore the evolution and diffusion of these topics in combination with the BERT word-embedding algorithm.

Data and methodology

Figure 1 presents a research overview, which illustrates the dataset acquisition and analysis methodology. The overall scheme can be divided into three subprocesses: (1) data retrieval and preprocessing, (2) topic modeling, and (3) topical evolution and diffusion. Meanwhile, the analytical process of this paper is based on the Python environment, and some open-source textual analysis algorithms and data visualization tools are used to analyze or visualize the results.

Fig. 1 Research overview

Data retrieval and preprocessing

The nature of our study required us to identify the search terms, as there are related or similar concepts. On the one hand, the concept is not limited to a single category of information; related terms that are similar to “computer”, “network”, “media”, “ICT”, “digital”, “data” and “mobile” have emerged based on the background of technological development. On the other hand, literacy, competence and skill are often used in the context of these terms. Scientific publication data that were related to information literacy and its related concepts were gathered via a search strategy from the WOS core collection database in June 2020. Based on the above, a Boolean search equation was applied to expand the search to related concepts:

TS= (“computer competen*” OR “computer literac*” OR “computer skill*” OR “data competen*” OR “data literac*” OR “data skill*” OR “digital competen*” OR “digital literac*” OR “digital skill*” OR “ICT competen*” OR “ICT literac*” OR “ICT skill*” OR “information competen*” OR “information literac*” OR “information skill*” OR “media literac*” OR “internet literac*” OR “meta literac*” OR “mobile literac*” OR “Information Communications Technology competen*” OR “Information Communications Technology skill*”)

The selected document types included articles, books, book chapters, and proceedings papers. Based on the search results, only a few related papers were published before 2005, so we finally collected 8461 English-language documents published from 2005 to 2019. The distributions of the publication years and the main subjects are shown in Fig. 2. These distributions show the developmental trend and multidisciplinarity of this field. In 2015, the number of documents increased significantly, and the documents involved 128 subjects (WOS research areas), which included 12 main subjects that were covered by more than 200 published papers. The percentage of papers that covered these subjects was 97.70%. The researchers are mainly in the fields of education and educational research (EER) and information science and library science (ISIS) for the following reasons: (1) information literacy was developed from and has become an important integral part of ISIS, and, (2) with the rapid development of educational technology, the information literacy and other related digital competences of teachers and students have received substantial attention.

Fig. 2 Distributions of the publication years and the main subjects

To increase the data quality, preprocessing was conducted by using the Natural Language Toolkit (NLTK) Python package (Bird et al., 2009). First, the title and the abstract were used as the text corpus to extract the terms. Second, numbers, punctuation symbols, and stop words were deleted. Third, all terms were converted to singular form and lower case and were stemmed. Fourth, bigrams that appeared 20 times or more were added to documents. Fifth, we filtered out words that occurred in fewer than 20 documents or in more than 50% of the documents and saved the remaining words as a dictionary. Finally, we transformed the documents to a vectorized form by computing the frequencies of the dictionary words in each document.
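The bigram, dictionary-filtering and vectorization steps can be sketched in plain Python. This is a minimal illustration and not the authors' pipeline: the stop-word list, the function names and the toy documents are assumptions, and singularization and stemming are omitted. The thresholds are parameters, so the values reported above (20 occurrences, 20 documents, 50%) would be passed in for the real corpus.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); the paper relied on NLTK instead.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "is", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def add_bigrams(docs, min_count):
    """Append 'w1_w2' tokens for adjacent pairs seen at least min_count times."""
    pair_counts = Counter()
    for doc in docs:
        pair_counts.update(zip(doc, doc[1:]))
    frequent = {pair for pair, c in pair_counts.items() if c >= min_count}
    return [doc + [f"{a}_{b}" for a, b in zip(doc, doc[1:]) if (a, b) in frequent]
            for doc in docs]

def build_dictionary(docs, no_below, no_above):
    """Keep terms found in >= no_below documents and in <= no_above share of them."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    n_docs = len(docs)
    kept = sorted(t for t, df in doc_freq.items()
                  if df >= no_below and df / n_docs <= no_above)
    return {term: idx for idx, term in enumerate(kept)}

def to_bow(doc, dictionary):
    """Count how often each dictionary term occurs in one document."""
    counts = Counter(t for t in doc if t in dictionary)
    return sorted((dictionary[t], c) for t, c in counts.items())

# Toy corpus with small thresholds so the effect is visible.
raw = [
    "Information literacy of students in higher education",
    "Information literacy and library instruction",
    "Digital skills of teachers",
    "Health information on the Internet",
]
docs = add_bigrams([tokenize(r) for r in raw], min_count=2)
dictionary = build_dictionary(docs, no_below=2, no_above=0.5)
bows = [to_bow(d, dictionary) for d in docs]
```

In the toy corpus, "information" is dropped because it occurs in more than half of the documents, while the bigram "information_literacy" survives both filters.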

Topic modeling

Latent Dirichlet allocation (LDA), which was proposed by Blei et al. (2003), was used to extract topics from the corpus. LDA is a probabilistic model in which each document in a corpus is described by a random mixture over latent topics. Each of the latent topics is characterized by a distribution over words. The Gensim library (Rehurek & Sojka, 2010) was utilized for LDA modeling.

As LDA parameters, the perplexity and average topic coherence (ATC) were estimated within Gensim for the selection of the number of topics (as presented in Fig. 3). The perplexity is only a crude measure; it is helpful for getting close to a suitable number of topics in a corpus. The average topic coherence, which measures the degree of semantic similarity between high-scoring words in the topics, can help distinguish between topics that are semantically interpretable and topics that are artifacts of statistical inference (Stevens et al., 2012). After comprehensive consideration, a 9-topic model was finally selected.
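To make the idea of topic coherence concrete, the sketch below computes a UMass-style score from document co-occurrence counts. The text does not state which coherence measure was used within Gensim, so the choice of UMass here is an assumption, as are the function names and the toy corpus. The score is averaged over word pairs rather than summed, which is a simplification.

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """Mean of log((D(w_i, w_j) + 1) / D(w_j)) over ranked pairs of a topic's top words.

    top_words must be ordered by descending topic probability, and every
    word is assumed to occur in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def doc_count(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for j, i in combinations(range(len(top_words)), 2):  # j is the higher-ranked word
        w_j, w_i = top_words[j], top_words[i]
        scores.append(math.log((doc_count(w_i, w_j) + 1) / doc_count(w_j)))
    return sum(scores) / len(scores)

def average_topic_coherence(topics, docs):
    """Average the per-topic coherence over all topics of one model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

toy_docs = [
    ["student", "learning", "library"],
    ["student", "learning"],
    ["health", "patient"],
    ["health", "patient", "care"],
]
coherent = umass_coherence(["student", "learning"], toy_docs)
mixed = umass_coherence(["student", "patient"], toy_docs)
```

Words that co-occur in the same documents score higher than words that never appear together, which is why a model with a higher average score tends to have more interpretable topics.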

Fig. 3 Perplexity and average topic coherence for an n-topic model

LDA modeling yielded two matrices: a matrix of document probabilities, which contains the probability distribution of each document that belongs to one of the topics, as presented in Table 1, and a matrix of word probabilities in the corpus and their association with each topic, as presented in Table 2.

Table 1 Document probabilities
Table 2 Word probabilities

In addition, the pyLDAvis package (Sievert & Shirley, 2014) can be used to visualize topics. An intertopic distance map, which was obtained via multidimensional scaling, is presented on the left, and the top 30 most relevant words and their frequencies for each topic are presented on the other side.

Topical evolution and diffusion

We explored the developmental paths of information literacy topics from the temporal and subject perspectives. To investigate the temporal dynamics, the corpus was divided into five three-year time spans: 2005–2007, 2008–2010, 2011–2013, 2014–2016, and 2017–2019. The topic models were trained individually for each time span. To identify the topical relationships between the main subjects, each subject topic model was estimated. For differentiation, we refer to the 9 general topics as global topics, the time-span topics as temporal local topics, and the subject topics as subject local topics.

Global topic popularity

Global topic popularity was calculated by aggregating the per-document topic distribution by year or subject dimension. First, the probabilities of the same topic in the same dimension were summed to obtain the total probability. Next, the results were normalized by the number of publications in the dimension. Finally, we plotted the global topic popularity values across years or subjects using visualization tools. A higher proportion indicates a more popular topic. In addition, a graphing tool named Cluster Purity Visualizer (Swamy, 2016) was implemented to obtain a basic distribution graph in different subjects.
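The aggregation described here amounts to a grouped mean of the per-document topic distributions. A minimal sketch follows, with hypothetical function and variable names and made-up probabilities:

```python
from collections import defaultdict

def topic_popularity(doc_topics, doc_groups):
    """Sum topic probabilities per group, then divide by the group's document count.

    doc_topics: one list of topic probabilities per document.
    doc_groups: the year (or subject) of each document, in the same order.
    """
    sums = {}
    counts = defaultdict(int)
    for probs, group in zip(doc_topics, doc_groups):
        if group not in sums:
            sums[group] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[group][k] += p
        counts[group] += 1
    return {g: [s / counts[g] for s in totals] for g, totals in sums.items()}

# Three toy documents, two topics, two publication years.
doc_topics = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
doc_years = [2005, 2005, 2006]
popularity = topic_popularity(doc_topics, doc_years)
```

Because each document's probabilities sum to one, the normalized values within a year also sum to one and can be read as topic shares.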

Topical similarity

A topical similarity formula established by (Xie et al., 2020) was extended for calculating the multiperiod and multisubject topical similarity as follows:

$${\text{Sim}}(T_{i}^{N} ,T_{j}^{M} ) = \mathop \sum \limits_{k = 1}^{K} p_{k}^{n} \left( {T_{i,k}^{N} } \right) \times p_{k}^{n} \left( {T_{j,k}^{M} } \right) \times S_{{k\left( {N,M} \right)}} \left( {T_{i,k}^{N} ,T_{j,k}^{M} } \right)$$

where \(T_{i}^{N}\) is topic i in topic set N (the topics of one time span or subject), \(T_{i,k}^{N}\) is the k-th topic word in topic i, \(T_{j}^{M}\) is topic j in topic set M, \(T_{j,k}^{M}\) is the k-th topic word in topic j, K is the number of topic words in a topic, and \(p_{k}^{n}\) is the normalized value that is obtained via the formula \(1- \log (p_{k} )\), in which \(p_{k}\) is the LDA probability value. Then, \(S_{{k\left( {N,M} \right)}} \left( {T_{i,k}^{N} ,T_{j,k}^{M} } \right)\) is the averaged tensor similarity between \(\left( {T_{i,k}^{N} ,T_{j,k}^{M} } \right)\), which is obtained by the embedding algorithm.
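The formula can be read as a weighted sum over rank-aligned word pairs. The sketch below follows it term by term under two labeled assumptions: the logarithm is taken as the natural log, and the tensor similarity S is taken as cosine similarity (the text does not name the measure). Function names and the toy vectors are hypothetical.

```python
import math

def weight(p):
    """Normalized word weight 1 - log(p); the natural log is an assumption."""
    return 1.0 - math.log(p)

def cosine(u, v):
    """Cosine similarity, used here as a stand-in for the tensor similarity S."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def topic_similarity(topic_a, topic_b):
    """Each topic is a list of (lda_probability, averaged_vector) for its K top words.

    The k-th word of one topic is paired with the k-th word of the other,
    mirroring the shared index k in the formula.
    """
    return sum(weight(pa) * weight(pb) * cosine(va, vb)
               for (pa, va), (pb, vb) in zip(topic_a, topic_b))

# Toy topics with K = 2 and 2-dimensional vectors instead of 768.
topic_a = [(0.5, [1.0, 0.0]), (0.25, [0.0, 1.0])]
topic_b = [(0.5, [0.0, 1.0]), (0.25, [1.0, 0.0])]
same = topic_similarity(topic_a, topic_a)
orthogonal = topic_similarity(topic_a, topic_b)
```

With identical topics every cosine term equals one, so the score reduces to the sum of squared weights; with orthogonal word vectors it is zero.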

We selected BERT (Devlin et al., 2019) as the representation algorithm. The BERT algorithm is a new language representation model that stands for bidirectional encoder representations from transformers. In contrast to word2vec, BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning both the left and right contexts in all layers. The BERT base uncased with 12 layers, 768 hidden nodes, 12 heads, and 110 M parameters from the Google BERT site was used for superior training. First, sentences were separated by separator [CLS] at the front and separator [SEP] at the end of each sentence. Then, three embeddings were calculated for each sentence: token, segment and position. The input of BERT was formed by superimposing the three embeddings. Finally, we obtained tensors for various sentences. The tensor of a word may differ among contexts. Therefore, the averaged tensor was calculated based on the word-corresponding sentences in the local range for a target topic word. Figure 4 presents an example of the calculation process for word \(T_{1,1}^{G}\), which denotes the first word, namely, “student”, in global topic 1. Then, the final tensor is compared with the word “student” in global topic 9.

Fig. 4 Calculation process for the word “student”

Documents that contained the word “student” numbered 3598, 994 of which belonged to global topic 1 and 22 of which belonged to global topic 9. In topic 1, 3670 sentences and their tensors were further extracted by using the BERT pretrained model. Each sentence was represented by a vector with 768 dimensions. Then, the average tensor was used as the final tensor for the word “student” in topic 1. The “student” final average tensor in topic 9 was calculated via the same method. The high-dimensional tensors of the “student” average values (in the red rectangle) and 20 randomly selected sentences in each topic (pink for topic 1 and blue for topic 9) were visualized by the Embedding Projector. The meaning of “student” differed among topics, and the semantic relationships among sentences were closer under the same topic. Therefore, this method can effectively identify the direct similarities of topic words.
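The averaging step for a topic word can be sketched as follows. The sentence vectors are assumed to come from a pretrained BERT model; tiny 3-dimensional stand-ins are used here so that the example runs without any model, and all names are hypothetical.

```python
def average_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def word_vector_for_topic(word, sentences, sentence_vectors):
    """Average the vectors of the sentences, taken from one topic, that contain `word`."""
    hits = [vec for sent, vec in zip(sentences, sentence_vectors)
            if word in sent.lower().split()]
    if not hits:
        raise ValueError(f"no sentence contains {word!r}")
    return average_vector(hits)

# Stand-in data: three sentences from one topic and their (fake) sentence vectors.
sentences = ["Student learning improved", "The library opened", "Each student searched"]
sentence_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
student_vec = word_vector_for_topic("student", sentences, sentence_vectors)
```

Running the same function on the sentences of another topic yields a different vector for the same word, which is what allows "student" in topic 1 to be compared with "student" in topic 9.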

Topical correlations

Based on all final topic similarities in the corresponding context, we examined the topic similarity average value and the products of this value and 0.8, 0.9, 1.1, 1.2, 1.5, 2, etc. To identify the correlations between topics as clearly as possible, we set two thresholds. If the similarity score between two topics exceeded the corresponding threshold, we defined the relation as topic correlation. The average value was set as the threshold for temporal local topics, while 1.2 times the average value was used for subject local topics.
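The thresholding rule is a comparison of each pairwise similarity against a multiple of the mean. A minimal sketch with hypothetical names and made-up scores, where `factor` would be 1.0 for temporal local topics and 1.2 for subject local topics:

```python
def correlated_pairs(similarities, factor=1.0):
    """Return the topic pairs whose similarity exceeds factor times the mean similarity."""
    threshold = factor * sum(similarities.values()) / len(similarities)
    return {pair for pair, score in similarities.items() if score > threshold}

# Toy similarity scores between three topics; the mean is 2.0.
sims = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 3.0}
temporal = correlated_pairs(sims, factor=1.0)
subject = correlated_pairs(sims, factor=1.2)
loose = correlated_pairs(sims, factor=0.8)
```

Raising the factor keeps only the strongest relations, while a factor below one admits weaker pairs.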

Evolution of temporal local topics

We named each temporal local topic based on the temporal span and proportion ranks. For example, the topic with the largest proportions from 2005 to 2007 was named 05-07_topic_1. We calculated the similarities between the temporal local topics and their subsequent temporal local topics, and we identified the correlation pairs until all periods had been considered. Then, the evolutionary paths of all topics were combined to obtain the evolutionary path of information literacy topics by using the Sankey diagram as a visualization tool with Apache ECharts (Li et al., 2018).
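The pairing of consecutive periods can be sketched as the construction of a link list in the source/target/value shape that Sankey diagrams consume. The function name, the similarity lookup and the scores below are hypothetical; only the topic naming scheme comes from the text.

```python
def evolution_links(periods, similarity, threshold):
    """Link each topic to the correlated topics of the next period.

    periods: list of (label, [topic names]) in chronological order.
    similarity: callable returning the similarity score of two topic names.
    """
    links = []
    for (_, earlier), (_, later) in zip(periods, periods[1:]):
        for src in earlier:
            for dst in later:
                score = similarity(src, dst)
                if score > threshold:
                    links.append({"source": src, "target": dst, "value": score})
    return links

# Two toy periods with made-up similarity scores.
periods = [
    ("2005-2007", ["05-07_topic_1"]),
    ("2008-2010", ["08-10_topic_1", "08-10_topic_2"]),
]
scores = {
    ("05-07_topic_1", "08-10_topic_1"): 0.9,
    ("05-07_topic_1", "08-10_topic_2"): 0.2,
}
links = evolution_links(periods, lambda a, b: scores.get((a, b), 0.0), threshold=0.5)
```

Topics that never appear as a source or target in the resulting list are the ones that emerge or disappear without a correlated neighbor.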

Diffusion of subject local topics

We named each subject local topic based on the subject and proportion ranks. For example, the topic with the largest proportions in education and educational research (EER) was named EER_topic_1. The similarities between topics in different subjects were calculated to obtain all topic correlations. Next, all topic pairs for the increasing year (cumulative to 25%) of subject local topics were combined to obtain the diffusion of information literacy topics by using chord diagrams in Apache ECharts.

Results

Global topics

The results for the global topics are presented in Table 3. For each topic, Table 3 lists the topic_id, the suggested topic labels, the top 5 words with the highest probabilities, and the topic proportions in the complete corpus.

Table 3 Results for the global topics

Nine topics resulted—“learning and education”, “library service”, “new digital technology”, “teacher ICT”, “health information”, “Internet use”, “medium literacy”, “evaluation”, and “computer skill”—which were suggested by two domain experts. The relationship among the topics is illustrated in Fig. 5.

Fig. 5 Global topics with pyLDAvis

Topic 1, which has the largest share and the most related topics, focuses on student information literacy learning. It is the original objective of information literacy development and an important issue in the information or intelligence age. Topic 2, which concerns library service for information literacy, ranked second. Libraries have long adopted the important responsibility of information literacy education due to their rich information resources, advanced information technology, professional training teams, and suitable learning environments. Topic 3, which also occupies a relatively large share, considers information literacy practice from a broader perspective and focuses on the objective of keeping pace with the current technology, such as digital technology. Topics 4 and 5 consider information literacy for two groups: teachers and patients, respectively. Education and health are both important application domains of information literacy. Topics 6–9 occupy smaller shares and are independent of other topics. Topic 7 is related to medium literacy, of which the scope is information in various media, while the objective is the same as that of information literacy: digital survival. The other topics are related to information literacy measurement: topics 6 and 9 are the dimensions, while topic 8 corresponds to evaluation research.

Global topical popularity

The global topical popularity of years is presented in Fig. 6. “Learning and education” and “library service” were relatively stable and have many papers, which together constitute a share of nearly 40%. “New digital technology” and “teacher ICT” showed increasing trends, each with 10% growth, while the shares of “health information” and “computer skill” halved. The remaining topics—namely, “Internet use”, “medium literacy”, and “evaluation”—underwent generally mild trends and had relatively low shares.

Fig. 6 Global topical popularity of years

The global topic popularities of the main subjects are presented in Fig. 7. The topic preference differs among the subjects. More topics are covered in the fields of education and educational research (EER); communication (CO); computer science (CS); health care science and services (HCSS); nursing (NU) and public, environmental and occupational health (PEOH). In EER, multiple topics show increasing trends. In CS, the popularities of the top five topics are relatively average; hence, the research content of information literacy in the CS field is wider. In each of the remaining subjects, studies focused on a single topic; for example, in information science and library science (ISIS), studies concentrated on “learning and education”, and, in psychology (PS), studies focused on “Internet use”.

Fig. 7 Global topical popularities of the main subjects

Temporal local topics

The top five words of all temporal local topics were statistically classified into six categories: ability, technology, field, people, place and application. The main words and frequencies of each category are listed in Table 4.

Table 4 Words in temporal local topics

First, most of the key words still focus on the embodiment of ability and technology for the field, people and place. In addition to the emergence of media and the Internet, various changes have occurred in technology. Earlier words were mainly “computer”, “online” and “web”, which were related to the main technologies that appeared at that time. In recent years, “digital”, “social” and “data” have emerged, which have posed new challenges to people's information literacy. Moreover, field, people, and place were mainly concentrated in education and health and were affected by the informationized development of these two fields. Unfortunately, there was no special bias in the application, which had not yet formed a relatively mature and rich application of the research dimension.

Evolution of information literacy topics

According to the similarities between the temporal local topics, the topical evolution from 2005 to 2019 is presented in Fig. 8. An emerging topic refers to a topic that had weaker or no correlations in the previous period, such as 11-13_topic_8. A disappearing topic is one that has weaker or no correlations in the subsequent period, such as 08-10_topic_6 and 11-13_topic_7. In addition, some topics appear only in a single period of time and subsequently disappear, such as 05-07_topic_7, 08-10_topic_8, 11-13_topic_5, 17-19_topic_4, and 17-19_topic_7; they are not shown on the evolutionary path. The topics that are discussed above mainly refer to two aspects, thereby reflecting two evolutionary mechanisms:

Fig. 8 Evolution of temporal local topics

Transferring

Several topics contain “medium”; their similarities were slightly lower due to their differences in context. In 05-07_topic_7 (medium, literacy, medium_literacy, use, and system), medium literacy was emphasized. In 08-10_topic_8 (medium, adult, child, digital, and literacy), children and adults were emphasized. In 11-13_topic_8 (medium, child, intervention, program, and adolescent), young people's abilities were developed through intervention or projects. In addition, emerging topic 17-19_topic_7 (child, test, group, use, and reading), in which there was no mention of “medium”, focused on children’s media literacy in the smaller scope of “reading”. The focuses of these topics vary; hence, the evolutionary path shows no continuity.

Suspending

Here, topics on “health” were considered. The skills of nursing students were considered in 08-10_topic_6 (student, program, skill, health, and nursing). Both 11-13_topic_7 (patient, health, care, access, and based) and 17-19_topic_4 (health, patient, method, skill, and student) attached importance to evaluation, but consideration of these topics was suspended from 2014 to 2016. These topics have since received renewed attention.

Most of the topics were in the process of continuous merging and splitting, thereby leading to the dynamic evolution of the research topic focus. Two evolutionary mechanisms were identified:

Crossing

Crossing typically occurs on topics with widely used words. These topics had a wide range of meanings, with many sources and divisions, such as 05-07_topic_1 (student, learning, literacy, information_literacy, and education), 08-10_topic_2 (medium, education, learning, practice, and student), 11-13_topic_9 (computer, use, student, technology, and learning), 14-16_topic_6 (teacher, ict, competence, education, and school), and 17-19_topic_1 (student, teacher, learning, education, and digital). These topic words were mostly core high-frequency words, which easily correlate with other topics. However, these topics and evolutionary pathways are too broad to reflect knowledge transfer.

Growing

This term refers to the relatively limited and stable mechanism of topical change in evolution. The number of related topics should be controlled within a limited range; an example is presented in Table 5, which corresponds to the dynamics of 08-10_topic_4 and 14-16_topic_4. 08-10_topic_4 was the combination of health and computer-related skills to form the content of health online search, which was divided into health and computer in subsequent studies. Compared to the words in 2005–2007, the health and computer research content changed in 2011–2013: 11-13_topic_6 placed more emphasis on technology (such as “internet” and “computer”), and 11-13_topic_9 emphasized “learning”. Then, these merged into 14-16_topic_4, which considered the use of health data. This topic was related to data and was influenced by the development of big data. However, the related topics in 2017–2019 did not contain the word “health” but had wide ranges. This finding may be due to the smaller amount of research on data analysis in the health domain. Hence, 14-16_topic_4 can be divided only into broad topics. However, in any case, these topics continue to grow as the environment changes.

Table 5 Dynamics of 08-10_topic_4 and 14-16_topic_4

In summary, crossing was the main evolutionary mechanism in the temporal local topics of information literacy. The core research topics—such as learning, library and technology—were all considered, but the focus changed according to the background of the times. However, the core topic words were relatively stable, and unfortunately, few new research directions in information literacy have been investigated in recent years.
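The definitions of emerging, disappearing and single-period topics given at the start of this subsection can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the similarity values, the cutoff of 0.3 and the topic names (`p1_a`, `p2_b`, and so on) are all hypothetical, and the sketch assumes only that a pairwise similarity between topics in adjacent periods is already available.

```python
from typing import Dict, List, Tuple


def classify_topics(
    periods: List[List[str]],
    similarity: Dict[Tuple[str, str], float],
    threshold: float = 0.3,  # hypothetical cutoff for "weaker or no correlation"
) -> Dict[str, str]:
    """Label each topic by its links to the previous and the next period.

    Note: topics in the first period have no predecessor and topics in the
    last period have no successor, so their labels are boundary artifacts.
    """
    labels: Dict[str, str] = {}
    last = len(periods) - 1
    for i, topics in enumerate(periods):
        prev_topics = periods[i - 1] if i > 0 else []
        next_topics = periods[i + 1] if i < last else []
        for topic in topics:
            linked_back = any(
                similarity.get((p, topic), 0.0) >= threshold for p in prev_topics
            )
            linked_forward = any(
                similarity.get((topic, n), 0.0) >= threshold for n in next_topics
            )
            if linked_back and linked_forward:
                labels[topic] = "continuing"
            elif linked_forward:
                labels[topic] = "emerging"       # no correlation with the previous period
            elif linked_back:
                labels[topic] = "disappearing"   # no correlation with the next period
            else:
                labels[topic] = "single-period"  # left off the evolutionary path
    return labels


# Toy input: three consecutive periods with hypothetical topics and similarities.
periods = [["p1_a", "p1_b"], ["p2_a", "p2_b", "p2_c", "p2_d"], ["p3_a", "p3_b"]]
similarity = {
    ("p1_a", "p2_a"): 0.8,
    ("p2_a", "p3_a"): 0.7,
    ("p1_b", "p2_b"): 0.6,
    ("p2_c", "p3_b"): 0.5,
    ("p1_a", "p2_d"): 0.1,  # below the cutoff, so treated as no correlation
}
labels = classify_topics(periods, similarity)
```

Only the labels of the middle period are meaningful in this toy input, because the first and last periods lack one neighbour.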

Subject local topics

The top five words of all subject local topics were statistically classified as described above. The main words and frequencies of each part are listed in Table 6. On the left are words that appear in multiple subject local topics, and those words cover more than one-third of the subjects. On the right are words with fewer subjects; there were 16 words in three subjects, 36 words in two subjects, and 74 words in only one subject.

Table 6 Words in subject local topics

High-frequency words have also appeared in various subject local topics. However, many words still only appeared in a few subjects. For example, in terms of technology, social media was more discussed in psychology (PS) and social science (SS) regarding the use of various types of users on social media. Health information was emphasized in health care science and service (HCSS) and public, environmental and occupational health (PEOH), which refers to a special type of information. Web was also used in two subjects on influencing factors around related intervention experiments. Application had more words that appeared only a few times because each field has its own characteristic application aspects. For example, care was used in three health-related subjects (HCSS, NU, and MI), and self was related to measurements, which were more common in EER, ISIS and PS.
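The split between widely shared words and words confined to a few subjects can be sketched as a simple count of how many subjects each word appears in. The Python sketch below rests on assumptions: the subject labels (`S1` to `S6`) and the word lists are toy placeholders rather than the study's data, and the one-third cutoff is taken from the description of Table 6.

```python
from collections import Counter
from typing import Dict, List, Tuple


def subject_coverage(subject_topics: Dict[str, List[List[str]]]) -> Counter:
    """Count, for every word, the number of distinct subjects whose topics contain it."""
    coverage: Counter = Counter()
    for topics in subject_topics.values():
        # A word counts once per subject, however many of its topics use it.
        words_in_subject = {word for topic in topics for word in topic}
        coverage.update(words_in_subject)
    return coverage


def split_by_coverage(
    coverage: Counter, n_subjects: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Separate words covering more than one-third of the subjects from the rest."""
    cutoff = n_subjects / 3
    shared = {w: c for w, c in coverage.items() if c > cutoff}
    narrow = {w: c for w, c in coverage.items() if c <= cutoff}
    return shared, narrow


# Toy input: six hypothetical subjects, each with one topic of top words.
toy = {
    "S1": [["student", "learning"]],
    "S2": [["student", "library"]],
    "S3": [["student", "health"]],
    "S4": [["health", "care"]],
    "S5": [["news"]],
    "S6": [["learning"]],
}
coverage = subject_coverage(toy)
shared, narrow = split_by_coverage(coverage, n_subjects=len(toy))
```

With 12 subjects, as in this study, the same cutoff would treat a word as widely shared once it appears in more than four subjects.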

Diffusion of information literacy topics

The topical diffusion of subject local topics among 12 main subjects is illustrated in Fig. 9. Among the 108 subject local topics, 47 (43.52%) were related to other topics. Most of the EER, PS, and NU topics were related to other subject topics, while only one or two local topics in HCSS, BE and PEOH appeared in the diffusion diagram.

Fig. 9 Diffusion of subject local topics

Table 7 lists and describes the nine topics that have extensive links to other topics. CS_topic_4 (student, ict, teacher, school, and learning), which has 33 links with other topics, has two core words that correspond to ICT: teachers and students. Similarly, the contents of other topics were core words in various subjects, which mainly included students’ learning, the use of a computer or network, and children’s skills, among others. Most of these topics were developed prior to 2010. The earliest was NU_topic_2 (technology, learning, student, education, and communication). This topic corresponds to student education in the field of nursing. The nursing field has focused on information literacy education for a long time due to the development of health informatization.

Table 7 Topics with extensive links to other topics

In addition, this paper examined the diffusion of topics between each pair of subjects and identified the diffusion mechanisms, as presented in Table 8.

Table 8 Diffusion mechanisms between the subject local topics

Absorbing

This term refers to the scenario in which there is only a single topic connection between two subjects and the rising years of the two topics are ordered in time. This ordering supports an absorption relationship of the topic between subjects. Building on absorbing, absorbing with division and absorbing with merging involve a larger number of topics. The former refers to many topics of one subject having absorbed another subject, while the latter refers to topic absorbance by another discipline. Here, we consider the subject communication as an example, as illustrated in Fig. 10.

Fig. 10 Diffusion mechanisms between communication local topics and other topics

Four topics in communication (CO) have diffusion relations with other subjects. First, CO_topic_5 (internet, use, child, skill, parent) corresponds to Internet usage in the home, for which the rising year was 2013. It absorbed EER_topic_6 (computer, self, ict, level, and efficacy) and NU_topic_2 (technology, learning, student, education, and communication) and was absorbed by SS_topic_7 (medium, young, child, news, and sexual). Several CO topics absorbed topics in CS, PS and BE with the diffusion mechanism of absorbing with division, namely, CS_topic_1 (computer, digital, use, ict, and student), CS_topic_4 (student, ict, teacher, school, and learning), PS_topic_3 (computer, self, efficacy, self_efficacy, and ict), PS_topic_4 (use, child, internet, online, and parent), PS_topic_5 (medium, girl, body, medium_literacy, and intervention), and BE_topic_7 (literacy, advertising, training, effect, and child). In addition, MI_topic_5 (informatics, competency, skill, readiness, and emr) and MI_topic_6 (patient, older, adult, older_adult, and intervention) were absorbed by CO_topic_5 with merging. These topics focused on children’s or students’ computer, Internet and media usage and self-efficacy, but the characteristics of these subjects differed.

Paralleling

In paralleling, the rising years of similar topics in two subjects are the same. Interlacing refers to the multiple interactions between two subjects without a temporal sequence. The diffusion mechanisms of medical informatics (MI) are presented in Fig. 11 as an example.

Fig. 11 Diffusion mechanisms between medical informatics local topics and other topics

There was a parallel relationship between MI and HCSS, while there was an interlacement with PS. Such linkages were often influenced by other common foundations and developed in a common direction. For example, MI_topic_1, MI_topic_5 and HCSS_topic_6 were associated with NU_topic_1 (student, literacy, health, practice, and evidence) and NU_topic_2 (technology, learning, student, education, and communication). These results were from an initial study of students’ health literacy and technology learning and subsequent studies of professional health technologies. Other associations between MI and PS were more complex. In addition to absorbing and being absorbed, these topics also acted with a diffusion mechanism with other subjects in time. ISIS_topic_6 and CS_topic_4, which contained core words in topics, were easily connected with MI and PS, absorbed their relevant content and influenced it. Overall, MI was closely related to many health-related subjects and was developing in the direction of measurement based on psychological interaction.

First, based on the diffusion mechanisms between the subject local topics, NU had the most diffusion mechanisms with other subjects, which mainly involved “student”, “computer/internet”, “health/care” and “evidence_based” topic words. Studies in NU began to consider the evaluation of students' information literacy very early. NU has formed the research scope of health-related information literacy with MI, HCSS and PEOH. In addition, it has provided ideas for the evaluation of many subjects. Second, PS, CS and ISIS presented a state of transmission, in which they absorbed or were absorbed by several subjects, namely, CS for student ICT, PS for usage of technology, and ISIS for library. These subjects were often interdisciplinary; hence, the related research presented complex diffusion mechanisms. Third, EER, CO and LI absorbed the topics of related subjects; however, they were less absorbed by other subjects due to their specificity. For example, in addition to the common topic words, EER corresponded to “education”, “teacher” and “school”; CO corresponded to “social”, “news” and “message”; and LI was associated with words such as “text” and “English”. Finally, the interaction of SS and BE was relatively weak, and their research scopes were relatively small.

Overall, there were no readily observable differences in the core topics of various subjects, but preferences were identified according to the subject characteristics. There were five diffusion mechanisms of subject local topics, among which absorbing with division and absorbing were the main mechanisms, which supported the diffusion progress of information literacy studies among subjects. NU developed earlier; PS, CS and ISIS were multidisciplinary; and EER, CO and LI showed research preferences.
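One way to read the five diffusion mechanisms is as a rule over the links between a pair of subjects, where each link joins a topic in one subject to a topic in the other and each topic carries a rising year. The Python sketch below is an interpretation under stated assumptions, not the procedure used in this study: the `Link` structure, the topic names and the years are hypothetical, and the rule that separates division from merging (more than one absorbing topic versus a single absorbing topic) is inferred from the communication example.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    topic_a: str  # topic in subject A (hypothetical name)
    year_a: int   # rising year of topic_a
    topic_b: str  # topic in subject B (hypothetical name)
    year_b: int   # rising year of topic_b


def classify_diffusion(links: List[Link]) -> str:
    """Name the diffusion mechanism between two subjects from their topic links."""
    if not links:
        return "none"
    # Direction of each link: 1 if A rose first, -1 if B rose first, 0 if same year.
    directions = {(l.year_a < l.year_b) - (l.year_a > l.year_b) for l in links}
    if directions == {0}:
        return "paralleling"   # similar topics with the same rising year
    if len(directions) > 1:
        return "interlacing"   # multiple interactions without one temporal order
    if len(links) == 1:
        return "absorbing"     # single connection with ordered rising years
    # Several links with one consistent direction: the later topics absorb the earlier ones.
    a_rose_first = directions == {1}
    absorbing_topics = {l.topic_b if a_rose_first else l.topic_a for l in links}
    if len(absorbing_topics) > 1:
        return "absorbing with division"  # assumption: several topics do the absorbing
    return "absorbing with merging"       # assumption: several topics merge into one


single = classify_diffusion([Link("A_1", 2009, "B_1", 2013)])
parallel = classify_diffusion([Link("A_1", 2010, "B_1", 2010)])
division = classify_diffusion(
    [Link("A_1", 2009, "B_1", 2013), Link("A_1", 2009, "B_2", 2014)]
)
merging = classify_diffusion(
    [Link("A_1", 2009, "B_1", 2013), Link("A_2", 2010, "B_1", 2013)]
)
interlaced = classify_diffusion(
    [Link("A_1", 2009, "B_1", 2013), Link("A_2", 2014, "B_2", 2011)]
)
```

The sketch returns only the mechanism name; recording which subject absorbs which would require keeping the link direction as well.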

Discussion

An analysis of the above results reveals potential issues and directions that may require more attention and effort from scholars by focusing on the latest research on information literacy.

Focusing on the impact of information technology

The studies on information literacy presented strong “technology-related” characteristics. Three global topics were related to information technologies. The discussion on “new digital technology” was relatively more extensive and growing compared to those on “Internet use” and “computer skill”. Regarding the words in the local topics, in addition to “technology”, “computer”, “internet”, and “digital”, technology-related words often appeared, such as “medium”, “social”, “online”, and “data”. This occurrence is because information literacy is the basic literacy for citizens adapting to the information age or even the intelligent age.

Meanwhile, information technology has been developing rapidly. Figure 12 shows the search trends for the words “technology”, “computer”, “internet”, “big data”, and “artificial intelligence” in the past ten years. These trends are similar to the extension of the concept of information literacy. Bawden (2001) expanded “literacy” to include newer forms of literacy, which were more suitable for complex information environments, such as “media literacy”, “computer literacy”, “library literacy”, “network literacy”, and “internet literacy”; these types of literacy are based largely on selected skills. Furthermore, based on the background of big data, the Association of College and Research Libraries (ACRL) has used the concept of “data literacy” many times (Prado & Marzal, 2013). Pinto et al. (2019) adopted the concept of “mobile information literacy” for ubiquitous learning, connectivism, and multimodal learning. Thus, technology has always been an important factor in the development of information literacy.

Fig. 12 Search trend chart of technical keywords

In many studies, technology is used only as the background for exploring the influencing factors of technology acceptance (Ketikidis et al., 2012; Scherer et al., 2019), integration (Farjon et al., 2019) or use (Hughes et al., 2020) for specified people. Fewer information literacy studies focus on the impact of information technology. Technology plays an intermediary role. The deep integration of technology and other fields can contribute to a more convenient, more intelligent and fairer social environment. According to Aljanabi and AL-Hadban (2018), information literacy can strengthen learners’ satisfaction via social networking technology use. Califf and Brooks (2020) found that literacy facilitation reduces the effects of techno-stressors.

Increasing the breadth and depth of the research field

Studies on information literacy have focused on education and educational research (EER) and information science & library science (ISIS). Among the research fields, education (such as “learning” and “education”) and health (such as “health”, “nursing”, “clinical”, and “eHealth”) were the main contents; media (such as “reading” and “news”) was also considered. These topics were included because (1) information literacy is the basic literacy that citizens need to adapt to the information age, so it is highly necessary to attach importance to its cultivation in students, and, (2) with the development of education and health informatization, it is highly important to improve the information literacy of related users in these fields. Nevertheless, the breadth and depth of the research field must be strengthened.

The research on education and health information literacy is as rich as ever, especially during the COVID-19 pandemic (Wang et al., 2020; Wu et al., 2020b). In addition, more attention has been focused on the development of information literacy in various, diverse and specific fields in recent years. Liaqat et al. (2020) proved that financial information literacy can help manage earnings, Ahmad et al. (2020) investigated the relationship between CEO information literacy and innovation in enterprises, and Chebet and Cheruiyot (2020) identified the contributions of farmers’ information literacy to productivity and profitability. The topic is of great significance to improve citizens' lives and work, promote social progress and eliminate the information gap.

In addition to enriching the content, studies on information literacy can increase the depth of research in three ways. First, they can focus on the role of information literacy in selected issues, such as the identification of fake news. Haggar (2020) analyzed George Orwell's diaries through an information literacy lens, which demonstrated how concerns regarding the need to evaluate information sources were represented. Second, they can explore the application of information literacy in various scenarios. Pinto et al. (2020) used scale validation techniques and an exploratory factorial analysis to analyze the use of mobile technologies in the teaching–learning of information competencies. Third, they can explore the influence mechanism of information literacy via new methods. Wu et al. (2020c) examined the impacts of parents on adolescents’ information literacy through a person-centered approach that utilized a latent profile analysis.

Innovation of evaluation methods based on data

Evaluation is highly important for improving information literacy, yet few studies have been conducted on “evaluation”. The share of the global topic “evaluation” is only 4.9%. From the perspective of the temporal dimension, information literacy evaluation initially focused on “self-efficacy”; gradually developed to “program”, “intervention” and experiment-related words; and has focused on “test” and “group” in recent years. From the perspective of subjects, NU emphasized the measurement of information literacy earlier, but it focused mainly on “self-efficacy”, which was absorbed by many subjects, especially for children’s use. Overall, the information literacy evaluation studies that were conducted were relatively traditional and lacked deep-seated evaluation and exploration.

In the evaluation of information literacy, typical standards and frameworks have been established (AASL&AECT, 1998; ACRL, 2000, 2015; IEA, 2013; ILFA, 2006; SCONUL, 1999, 2011; UNESCO, 2010, 2011, 2013). The current authoritative standards are used mainly for learning and education, which affirms the importance of information literacy for technology development and adaptation but shows a relative lack of situational specificity, which is consistent with the above discussion. However, scholars have expanded and constructed special information literacy evaluation indicators for various fields and objects (Niemelä et al., 2012; Zhou et al., 2020; Zhu et al., 2019), which provide satisfactory support for the precise evaluation of information literacy in different scenarios.

Regarding the methods of evaluating information literacy, most scholars use only self-designed questionnaires that consist of self-assessments with closed-ended test questions (Pinto et al., 2019). Other scholars have combined interviews (Walters et al., 2020), experiments (Ding & Ma, 2013) and other survey methods to evaluate the developmental level of information literacy. However, these methods lack enthusiasm and flexibility, and the data collection process requires substantial cooperation from users, which is time-consuming and laborious. With the development of the Internet, big data, artificial intelligence and other emerging technologies, massive and diverse process data can be recorded, which provides the possibility of data-driven evaluation. Various studies on technology behavior have used user process data; for example, Han et al. (2019) assessed the degrees and features of teachers’ online participation in BL implementation, and Walters et al. (2020) used students’ comments on a library instruction session to evaluate their information literacy. Kim et al. (2020) used online search behavior to identify differences between self-perceived eHealth literacy and performance in judging the authenticity of cancer information. These studies provide a new approach for process evaluation, but few data-driven evaluations of information literacy have been conducted using multispatial fusion data.

Conclusion

This study conducted a topic model dynamic analysis of the articles on information literacy research in the WOS core collection database from 2005 to 2019. The global topic and its popularity, topic similarity and correlation, temporal evolution of local topics, and subject diffusion of local topics were analyzed and presented.

For the first research question, nine global topics were identified by the LDA model—namely, “learning and education”, “library service”, “new digital technology”, “teacher ICT”, “health information”, “internet use”, “medium literacy”, “evaluation”, and “computer skill”—which presented the following characteristics: “learning and education” and “library service” were relatively stable and were covered by many papers. “New digital technology” and “teacher ICT” showed increasing trends, especially in EER, indicating that information literacy plays an important role in the process of education modernization. The research content of information literacy in the CS field was wider ranging. The development of technology is closely related to the requirements of information literacy, and it is the necessary result of the research and development of data-driven information literacy teaching and evaluation (Gómez-García et al., 2020). In the future, researchers in all fields should pay attention to the development of information literacy in the computer field and find a more suitable entry point through cross fusion.

For the second and third research questions, the words showed different temporal and subject characteristics in the local topics and focused on ability, technology, field, people, place and application in terms of information literacy. The temporal local topics of information literacy showed four evolutionary mechanisms: transferring, suspending, crossing and growing. Crossing, which typically occurs on topics with widely used words, was the main mechanism; hence, the core topic words remained stable with few emerging topics. For the subject local topics of information literacy, by examining the diffusion of topics between each pair of subjects, five diffusion mechanisms were identified: absorbing, absorbing with division, absorbing with merging, paralleling, and interlacing. Among them, absorbing with division and absorbing were the main mechanisms, which supported the diffusion progress of information literacy studies. Various interdisciplinary subjects, such as PS, CS and ISIS, showed more interactivity, while other subjects, such as EER, CO and LI, highlighted their research preferences. In addition, information on the assessment of information literacy and related abilities in the field of health was disseminated. However, the current research on information literacy remains insufficient, few new research topics have emerged, a cooperation mechanism between disciplines has not been formed, and successful cases with universal practice are lacking. Many scholars have advocated the multidisciplinary integration of information literacy research and application in the early stage (Aharony, 2010; Weiner, 2011). Although research topics in different subjects have crossed, integration has not truly been achieved in terms of theoretical basis, research methods, practical exploration, etc., and more in-depth exploration is needed to form a new research paradigm (Huang et al., 2020).

Furthermore, this study discussed the future research direction of information literacy to provide references for educators, researchers and practitioners. First, the rapid development and diversity of technology impose higher requirements, which means that information literacy education should be strengthened through such measures as training professional teachers and covering more audiences (Huang et al., 2020). In addition, research on the impact of technology on information literacy should receive more attention. Second, the scope of information literacy will continue to expand and develop in more fields. In addition, the occurrence of the 2019 novel coronavirus (2019-nCoV) highlights the importance and urgency of improving information literacy for large-scale and long-term online teaching (Wu et al., 2020b) and identifying false public health information (Wang et al., 2020). Third, a gradual turn to data-driven information literacy evaluation research is needed because information behavior and information literacy are correlated (Hepworth et al., 2014). A large amount of user data is recorded in the information and intelligent environment, which can be used to conduct more real, accurate, multidimensional, undisturbed and continuous evaluations.