  • On the Persistence of Persistent Identifiers of the Scholarly Web
    arXiv.cs.DL Pub Date : 2020-04-06
    Martin Klein; Lyudmila Balakireva

    Scholarly resources, just like any other resources on the web, are subject to reference rot as they frequently disappear or significantly change over time. Digital Object Identifiers (DOIs) are commonplace to persistently identify scholarly resources and have become the de facto standard for citing them. We investigate the notion of persistence of DOIs by analyzing their resolution on the web. We derive

  • Mapping Three Decades of Intellectual Change in Academia
    arXiv.cs.DL Pub Date : 2020-04-02
    Daniel Ramage; Christopher Manning; Daniel A. McFarland

    Research on the development of science has focused on the creation of multidisciplinary teams. However, while this coming together of people is symmetrical, the ideas, methods, and vocabulary of science have a directional flow. We present a statistical model of the text of dissertation abstracts from 1980 to 2010, revealing for the first time the large-scale flow of language across fields. Results

  • An alternative analysis on the scientific output of Spanish Sociology: What can altmetrics tell us?
    arXiv.cs.DL Pub Date : 2020-04-03
    Daniel Torres-Salinas; Wenceslao Arroyo-Machado; Nicolás Robinson-García

    In recent years, new indicators known as altmetrics have been introduced to measure the impact of scientific activity. These indicators are derived from mentions on different social media platforms, and several aggregators collect them in a single database, Altmetric.com being the most popular. However, in spite of the popularization of these metrics, several

  • Lost or found? Discovering data needed for research
    arXiv.cs.DL Pub Date : 2019-09-01
    Kathleen Gregory; Paul Groth; Andrea Scharnhorst; Sally Wyatt

    Finding data is a necessary precursor to being able to reuse data, although relatively little large-scale empirical evidence exists about how researchers discover, make sense of and (re)use data for research. This study presents evidence from the largest known survey investigating how researchers discover and use data that they do not create themselves. We examine the data needs and discovery strategies

  • GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
    arXiv.cs.DL Pub Date : 2020-04-01
    Supatsara Wattanakriengkrai; Bodin Chinthanet; Hideaki Hata; Raula Gaikovina Kula; Christoph Treude; Jin Guo; Kenichi Matsumoto

    Traceability between published scientific breakthroughs and their implementation is essential, especially when Open Source Software implements bleeding-edge science in its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the link impact remains unknown. This paper investigates the role of academic paper references contained in these

  • The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle
    arXiv.cs.DL Pub Date : 2020-03-31
    Xinyue Wang; Zhiwu Xie

    The WARC file format is widely used by web archives to preserve collected web content for future use. With the rapid growth of web archives and the increasing interest to reuse these archives as big data sources for statistical and analytical research, the speed to turn these data into insights becomes critical. In this paper we show that the WARC format carries significant performance penalties for

  • Ontology Extraction and Usage in the Scholarly Knowledge Domain
    arXiv.cs.DL Pub Date : 2020-03-27
    Angelo A. Salatino; Francesco Osborne; Enrico Motta

    Ontologies of research areas have proven useful in many applications for analysing and making sense of scholarly data. In this chapter, we present the Computer Science Ontology (CSO), which is the largest ontology of research areas in the field of Computer Science, and discuss a number of applications that build on CSO, to support high-level tasks, such as topic classification, metadata extraction

  • Persistent Identification Of Instruments
    arXiv.cs.DL Pub Date : 2020-03-29
    Markus Stocker; Louise Darroch; Rolf Krahl; Ted Habermann; Anusuriya Devaraju; Ulrich Schwardmann; Claudio D'Onofrio; Ingemar Häggström

    Instruments play an essential role in creating research data. Given the importance of instruments and associated metadata to the assessment of data quality and data reuse, globally unique, persistent and resolvable identification of instruments is crucial. The Research Data Alliance Working Group Persistent Identification of Instruments (PIDINST) developed a community-driven solution for persistent

  • Opioids for pain treatment of cancer: a knowledge maturity mapping
    arXiv.cs.DL Pub Date : 2020-03-28
    C. Aguado; A. Silva; V. M. Castano

    The conceptual structure of opioids was constructed based on a bibliometric analysis of 4,935 articles from the Web of Science. The results were processed by identifying the most cited articles to extract the main connections and frequencies of key words, authors, journals, countries, institutions, and their tendencies and their connection and degree of collaboration. The temporal tendencies, the word

  • A bibliometric analysis of research based on the Roy Adaptation Model: a contribution to Nursing
    arXiv.cs.DL Pub Date : 2020-03-29
    Paulina Hurtado-Arenas; Miguel R. Guevara

    Objective. To perform a modern bibliometric analysis of the research based on the Roy Adaptation Model, a founding nursing model proposed by Sister Callista Roy in the 1970s. Method. A descriptive and longitudinal study. We used information from the two dominant scientific databases, Web Of Science and SCOPUS. We obtained 137 publications from the Core Collection of WoS, and 338 publications from SCOPUS

  • Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web
    arXiv.cs.DL Pub Date : 2020-03-29
    Daniel Garijo; María Poveda-Villalón

    With the adoption of Semantic Web technologies, an increasing number of vocabularies and ontologies have been developed in different domains, ranging from Biology to Agronomy or Geosciences. However, many of these ontologies are still difficult to find, access and understand by researchers due to a lack of documentation, URI resolving issues, versioning problems, etc. In this chapter we describe guidelines

  • Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid
    arXiv.cs.DL Pub Date : 2020-03-30
    Penny Labropoulou; Katerina Gkirtzou; Maria Gavriilidou; Miltos Deligiannis; Dimitrios Galanis; Stelios Piperidis; Georg Rehm; Maria Berger; Valérie Mapelli; Mickaël Rigault; Victoria Arranz; Khalid Choukri; Gerhard Backfried; José Manuel Gómez Pérez; Andres Garcia Silva

    The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies

  • State of Open Access penetration in universities worldwide
    arXiv.cs.DL Pub Date : 2020-03-27
    Nicolas Robinson-Garcia; Rodrigo Costas; Thed N. van Leeuwen

    The implementation of policies promoting the adoption of an Open Science culture must be accompanied by indicators that allow monitoring the penetration of such policies and their potential effects on research publishing and sharing practices. This study presents indicators of Open Access (OA) penetration at the institutional level for universities worldwide. By combining data from Web of Science,

  • Text-based Technological Signatures and Similarities: How to create them and what to do with them
    arXiv.cs.DL Pub Date : 2020-03-27
    Daniel Hain; Roman Jurowetzki; Tobias Buchmann; Patrick Wolf

    This paper describes a new approach to measure technological similarity between patents by leveraging their textual description. Using embedding techniques from natural language processing, we represent their description as a high-dimensional numerical vector, thus capturing their technological signature. Deploying a near-linear-scaling approximate nearest neighbor matching technique, we are

  • Scientific elite revisited: Patterns of productivity, collaboration, authorship and impact
    arXiv.cs.DL Pub Date : 2020-03-27
    Jichao Li; Yian Yin; Santo Fortunato; Dashun Wang

    Throughout history, a relatively small number of individuals have made a profound and lasting impact on science and society. Despite long-standing, multi-disciplinary interests in understanding careers of elite scientists, there have been limited attempts for a quantitative, career-level analysis. Here, we leverage a comprehensive dataset we assembled, allowing us to trace the entire career histories

  • Overview of the TREC 2019 Fair Ranking Track
    arXiv.cs.DL Pub Date : 2020-03-25
    Asia J. Biega; Fernando Diaz; Michael D. Ekstrand; Sebastian Kohlmeier

    The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstracts

  • A Heterogeneous Dynamical Graph Neural Networks Approach to Quantify Scientific Impact
    arXiv.cs.DL Pub Date : 2020-03-26
    Fan Zhou; Xovee Xu; Ce Li; Goce Trajcevski; Ting Zhong; Kunpeng Zhang

    Quantifying and predicting the long-term impact of scientific writings or individual scholars has important implications for many policy decisions, such as funding proposal evaluation and identifying emerging research fields. In this work, we propose an approach based on Heterogeneous Dynamical Graph Neural Network (HDGNN) to explicitly model and predict the cumulative impact of papers and authors

  • Covid-19 Tweeting in English: Gender Differences
    arXiv.cs.DL Pub Date : 2020-03-24
    Mike Thelwall; Saheeda Thelwall

    At the start of 2020, COVID-19 became the most urgent threat to global public health. Uniquely in recent times, governments have imposed partly voluntary, partly compulsory restrictions on the population to slow the spread of the virus. In this context, public attitudes and behaviors are vitally important for reducing the death rate. Analyzing tweets about the disease may therefore give insights into

  • Which papers cited which tweets? An empirical analysis based on Scopus data
    arXiv.cs.DL Pub Date : 2020-03-25
    Robin Haunschild; Lutz Bornmann

    Many altmetric studies analyze which papers were mentioned how often in specific altmetrics sources. In order to study the potential policy relevance of tweets from another perspective, we investigate which tweets were cited in papers. If many tweets were cited in publications, this might demonstrate that tweets have substantial and useful content. Overall, a rather low number of tweets (n=5506) were

  • Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
    arXiv.cs.DL Pub Date : 2020-03-22
    Malte Ostendorff; Terry Ruas; Moritz Schubotz; Georg Rehm; Bela Gipp

    Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what is the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents

  • Embedding technique and network analysis of scientific innovations emergence in an arXiv-based concept network
    arXiv.cs.DL Pub Date : 2020-03-23
    Serhii Brodiuk; Vasyl Palchykov; Yurij Holovatch

    Novelty is an inherent part of innovations and discoveries. Such processes may be considered as an appearance of new ideas or as an emergence of atypical connections between the existing ones. The importance of such connections suggests investigating innovations through a network or graph representation in the space of ideas. In such a representation, a graph node corresponds to the relevant concept

  • Interdisciplinarity metric based on the co-citation network
    arXiv.cs.DL Pub Date : 2020-03-23
    Juan María Hernández; Pablo Dorta-González

    Quantifying the interdisciplinarity of research is a relevant problem in evaluative bibliometrics. The concept of interdisciplinarity is ambiguous and multidimensional. Thus, different measures of interdisciplinarity have been proposed in the literature. However, few studies have proposed interdisciplinarity metrics without previously defining classification sets, and none use the co-citation

  • A Corpus of Adpositional Supersenses for Mandarin Chinese
    arXiv.cs.DL Pub Date : 2020-03-18
    Siyao Peng; Yang Liu; Yilun Zhu; Austin Blodgett; Yushi Zhao; Nathan Schneider

    Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in

  • Prevalence of Potentially Predatory Publishing in Scopus on the Country Level
    arXiv.cs.DL Pub Date : 2020-03-18
    Tatiana Savina (National Research University Higher School of Economics, Moscow, Russian Federation); Ivan Sterligov (National Research University Higher School of Economics, Moscow, Russian Federation)

    We present the results of a large-scale study of potentially predatory journals (PPJ) represented in the Scopus database, which is widely used for research evaluation. Journal metrics as well as country-level and disciplinary data have been evaluated for different groups of PPJ: those listed by Jeffrey Beall and those delisted by Scopus because of "publication concerns". Our results show that even after years

  • New Research Trends in Unconventional Oil and Gas Environmental Issue: A Bibliometric Analysis
    arXiv.cs.DL Pub Date : 2020-03-13
    Dan Bi; Ju-e Guo; Shouyang Wang; Shaolong Sun

    With the worldwide boom in unconventional gas production, how to balance environmental pollution risk against the economics of unconventional gas has become a common dilemma. The aim of this study is to elucidate the research on environmental issues arising from the development of the unconventional oil and gas industry. To achieve this goal, we present a bibliometrics overview of this field

  • Expressiveness and machine processability of Knowledge Organization Systems (KOS): An analysis of concepts and relations
    arXiv.cs.DL Pub Date : 2020-03-11
    Manolis Peponakis; Anna Mastora; Sarantos Kapidakis; Martin Doerr

    This study considers the expressiveness (that is the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the Semantic Web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD)

  • A Catalogue of Locus Algorithm Pointings for Optimal Differential Photometry for 23,779 Quasars
    arXiv.cs.DL Pub Date : 2020-03-10
    Oisín Creaner; Kevin Nolan; David Grennan; Niall Smith; Eugene Hickey

    This paper presents a catalogue of optimised pointings for differential photometry of 23,779 quasars extracted from the Sloan Digital Sky Survey (SDSS) Catalogue and a score for each indicating the quality of the Field of View (FoV) associated with that pointing. Observation of millimagnitude variability on a timescale of minutes typically requires differential observations with reference to an ensemble

  • Cross-tier web programming for curated databases: A case study
    arXiv.cs.DL Pub Date : 2020-03-08
    Simon Fowler; Simon D. Harding; Joanna Sharman; James Cheney

    Curated databases have become important sources of information across scientific disciplines, and due to the manual work of experts, often become important reference works. Features such as provenance tracking, archiving, and data citation are widely regarded as important features for curated databases, but implementing such features is challenging, and small database projects often lack the resources

  • A Quantitative History of A.I. Research in the United States and China
    arXiv.cs.DL Pub Date : 2020-03-05
    Daniel Ish; Andrew Lohn; Christian Curriden

    Motivated by recent interest in the status and consequences of competition between the U.S. and China in A.I. research, we analyze 60 years of abstract data scraped from Scopus to explore and quantify trends in publications on A.I. topics from institutions affiliated with each country. We find the total volume of publications produced in both countries grows with a remarkable regularity over tens of

  • The citation advantage of linking publications to research data
    arXiv.cs.DL Pub Date : 2019-07-04
    Giovanni Colavizza; Iain Hrynaszkiewicz; Isla Staden; Kirstie Whitaker; Barbara McGillivray

    Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data,

  • ASMD: an automatic framework for compiling multimodal datasets
    arXiv.cs.DL Pub Date : 2020-03-04
    Federico Simonetta; Stavros Ntalampiras; Federico Avanzini

    This paper describes an open-source Python framework for handling datasets for music processing tasks, built with the aim of improving the reproducibility of research projects in music computing and assessing the generalization abilities of machine learning models. The framework enables the automatic download and installation of several commonly used datasets for multimodal music processing. Specifically

  • Impact of JD Bernal Thoughts in the Science of Science upon China: Implications for Quantitative Studies of Science Today
    arXiv.cs.DL Pub Date : 2020-03-03
    Yong Zhao; Jian Du; Yishan Wu

    John Desmond Bernal (1901-1970) was one of the most eminent scientists in molecular biology, and is also regarded as the founding father of the Science of Science. His book The Social Function of Science laid the theoretical foundations for the discipline. In this article, we summarize four chief characteristics of his ideas in the Science of Science: the socio-historical perspective, theoretical models

  • Textual analysis of artificial intelligence manuscripts reveals features associated with peer review outcome
    arXiv.cs.DL Pub Date : 2019-10-21
    Philippe Vincent-Lamarre; Vincent Larivière

    We analysed a dataset of scientific manuscripts that were submitted to various conferences in artificial intelligence. We performed a combination of semantic, lexical and psycholinguistic analyses of the full text of the manuscripts and compared them with the outcome of the peer review process. We found that accepted manuscripts scored lower than rejected manuscripts on two indicators of readability

  • China may need to support more small teams in scientific research
    arXiv.cs.DL Pub Date : 2020-02-29
    Linlin Liu; Jianfei Yu; Junming Huang; Feng Xia; Tao Jia

    Modern science is dominated by scientific productions from teams. Large teams have demonstrated a clear advantage over small teams in applying for research funding, performing complicated research tasks and producing research works with high impact. Recent research, however, shows that both large and small teams have their own merits. Small teams tend to expand the frontier of knowledge by creating

  • Domain-topic models with chained dimensions: charting the evolution of a major oncology conference (1995-2017)
    arXiv.cs.DL Pub Date : 2019-12-31
    Alexandre Hannud Abdo; Jean-Philippe Cointet; Pascale Bourret; Alberto Cambrosio

    This paper presents three main contributions to the computational study of science from bibliographic corpora. First, by combining hypergraphs and stochastic block models, it introduces a new approach to model corpora based on their substantive contents and integrating both temporal and other metadata dimensions. We call this simultaneous modeling of documents and words "domain-topic models", and their

  • The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources
    arXiv.cs.DL Pub Date : 2020-03-02
    Jennifer D'Souza; Anett Hoppe; Arthur Brack; Mohamad Yaser Jaradeh; Sören Auer; Ralph Ewerth

    We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts in 10 STEM disciplines

  • Gender Disparities in International Research Collaboration: A Large-scale Bibliometric Study of 25,000 University Professors
    arXiv.cs.DL Pub Date : 2020-03-01
    Marek Kwiek; Wojciech Roszka

    In this research, we examine the hypothesis that gender disparities in international research collaboration differ by collaboration intensity, academic position, age, and academic discipline. The following are the major findings: (1) while female scientists exhibit a higher rate of general, national, and institutional collaboration, male scientists exhibit a higher rate of international collaboration

  • Who pays? Comparing cost sharing models for a Gold Open Access publication environment
    arXiv.cs.DL Pub Date : 2020-02-27
    Andre Bruns; Christine Rimmert; Niels Taubert

    The article focuses on possible financial effects of the transformation towards Gold Open Access publishing based on article processing charges and studies an aspect that has so far been overlooked: Do possible cost sharing models lead to the same overall expenses or do they result in different financial burdens for the research institutions involved? It takes the current state of Gold OA publishing

  • A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility
    arXiv.cs.DL Pub Date : 2020-02-06
    Nicholas J Tierney; Karthik Ram

    Data makes science possible. Sharing data improves visibility, and makes the research process transparent. This increases trust in the work, and allows for independent reproduction of results. However, a large proportion of data from published research is often only available to the original authors. Despite the obvious benefits of sharing data, and scientists' advocating for the importance of sharing

  • Impact Factor volatility to a single paper: A comprehensive analysis
    arXiv.cs.DL Pub Date : 2019-11-05
    Manolis Antonoyiannakis

    We study how a single paper affects the Impact Factor (IF) by analyzing data from 3,088,511 papers published in 11,639 journals in the 2017 Journal Citation Reports of Clarivate Analytics. We find that IFs are highly volatile. For example, the top-cited paper of 381 journals caused their IF to increase by more than 0.5 points, while for 818 journals the relative increase exceeded 25%. And one in 10

  • Universality of citation distributions and its explanation
    arXiv.cs.DL Pub Date : 2020-02-24
    Michael Golosovsky

    Universality or near-universality of citation distributions was found empirically a decade ago but its theoretical justification has been lacking so far. Here, we systematically study citation distributions for different disciplines in order to characterize this putative universality and to understand it theoretically. Using our calibrated model of citation dynamics, we find microscopic explanation

  • Author Name Disambiguation on Heterogeneous Information Network with Adversarial Representation Learning
    arXiv.cs.DL Pub Date : 2020-02-23
    Haiwen Wang; Ruijie Wang; Chuan Wen; Shuhao Li; Yuting Jia; Weinan Zhang; Xinbing Wang

    Author name ambiguity causes inadequacy and inconvenience in academic information retrieval, which raises the necessity of author name disambiguation (AND). Existing AND methods can be divided into two categories: the models focusing on content information to distinguish whether two papers are written by the same author, the models focusing on relation information to represent information as edges

  • MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis
    arXiv.cs.DL Pub Date : 2020-02-20
    Hanshu Cai; Yiwen Gao; Shuting Sun; Na Li; Fuze Tian; Han Xiao; Jianxiu Li; Zhengwu Yang; Xiaowei Li; Qinglin Zhao; Zhenyu Liu; Zhijun Yao; Minqiang Yang; Hong Peng; Jing Zhu; Xiaowei Zhang; Xiping Hu; Bin Hu

    According to the World Health Organization, the number of mental disorder patients, especially depression patients, has grown rapidly and become a leading contributor to the global burden of disease. However, the present common practice of depression diagnosis is based on interviews and clinical scales carried out by doctors, which is not only labor-consuming but also time-consuming. One important

  • Dr. Strangelove or: how I learned to stop worrying and love the citations
    arXiv.cs.DL Pub Date : 2020-02-21
    Alberto Saracco

    Citations are getting more and more important in the career of a researcher. But how to use them in the best possible way? This is a satirical paper, showing a bad trend currently happening in citation practices, due to intensive use of citation metrics. I am putting this on the arXiv and on ResearchGate. Should you be interested to publish this paper in a journal of which you are editor, let me know

  • The practice of self-citations: a longitudinal study
    arXiv.cs.DL Pub Date : 2019-03-14
    Silvio Peroni; Paolo Ciancarini; Aldo Gangemi; Andrea Giovanni Nuzzolese; Francesco Poggi; Valentina Presutti

    In this article, we discuss the outcomes of an experiment where we analysed whether and to what extent the introduction, in 2012, of the new research assessment exercise in Italy (a.k.a. Italian Scientific Habilitation) affected self-citation behaviours in the Italian research community. The Italian Scientific Habilitation attests to the scientific maturity of researchers and in Italy, as in many other

  • Are nationally oriented journals indexed in Scopus becoming more international? The effect of publication language and access modality
    arXiv.cs.DL Pub Date : 2020-02-18
    Henk F. Moed; Felix de Moya-Anegon; Vicente Guerrero-Bote; Carmen Lopez-Illescas

    An exploratory, descriptive analysis is presented of the national orientation of scientific, scholarly journals as reflected in the affiliations of publishing or citing authors. It calculates for journals covered in Scopus an Index of National Orientation (INO), and analyses the distribution of INO values across disciplines and countries, and the correlation between INO values and journal impact factors

  • HybridCite: A Hybrid Model for Context-Aware Citation Recommendation
    arXiv.cs.DL Pub Date : 2020-02-15
    Michael Färber; Ashwath Sampath

    Citation recommendation systems aim to recommend citations for either a complete paper or a small portion of text called a citation context. The process of recommending citations for citation contexts is called local citation recommendation and is the focus of this paper. In this paper, firstly, we develop citation recommendation approaches based on embeddings, topic modeling, and information retrieval

  • Knowledge and Social Relatedness Shape Research Portfolio Diversification
    arXiv.cs.DL Pub Date : 2020-02-15
    Giorgio Tripodi; Francesca Chiaromonte; Fabrizio Lillo

    Scientific discovery is shaped by scientists' choices and thus by their career patterns. The increasing knowledge required to work at the frontier of science makes it harder for an individual to embark on unexplored paths. Yet collaborations can reduce learning costs -- albeit at the expense of increased coordination costs. In this article, we use data on the publication histories of a very large sample

  • Citation Recommendation: Approaches and Datasets
    arXiv.cs.DL Pub Date : 2020-02-17
    Michael Färber; Adam Jatowt

    Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets

  • menoci: Lightweight Extensible Web Portal enabling FAIR Data Management for Biomedical Research Projects
    arXiv.cs.DL Pub Date : 2020-02-07
    Markus Suhr; Christoph Lehmann; Christian Robert Bauer; Theresa Bender; Cornelius Knopp; Luca Freckmann; Björn Öst Hansen; Christian Henke; Georg Aschenbrandt; Lea Kühlborn; Sophia Rheinländer; Linus Weber; Bartlomiej Marzec; Marcel Hellkamp; Philipp Wieder; Harald Kusch; Ulrich Sax; Sara Yasemin Nussbeck

    Background: Biomedical research projects deal with data management requirements from multiple sources like funding agencies' guidelines, publisher policies, discipline best practices, and their own users' needs. We describe functional and quality requirements based on many years of experience implementing data management for the CRC 1002 and CRC 1190. A fully equipped data management software should

  • Unveiling the research landscape of Sustainable Development Goals and their inclusion in Higher Education Institutions and Research Centers: major trends in 2000-2017
    arXiv.cs.DL Pub Date : 2020-02-12
    Nuria Bautista-Puig; Ana Marta Aleixo; Susana Leal; Ulisses Azeiteiro; Rodrigo Costas

    Sustainable Development Goals are the blueprint to achieve a better and more sustainable future for society. Its legacy is linked with the Millennium Development Goals, set up in 2000. A bibliometric analysis was conducted to 1) measure "core" research output from 2000-2017, with the aim to map the global research of sustainability goals, 2) describe thematic specialization based on keywords co-occurrence

  • Testing of Support Tools for Plagiarism Detection
    arXiv.cs.DL Pub Date : 2020-02-11
    Tomáš Foltýnek; Dita Dlabolová; Alla Anohina-Naumeca; Salim Razı; Július Kravjar; Laima Kamzola; Jean Guerrero-Dib; Özgür Çelik; Debora Weber-Wulff

    There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may

  • Science through Wikipedia: A novel representation of open knowledge through co-citation networks
    arXiv.cs.DL Pub Date : 2020-02-11
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Enrique Herrera-Viedma; Esteban Romero-Frías

    This study provides an overview of science from the Wikipedia perspective. A methodology has been established for the analysis of how Wikipedia editors regard science through their references to scientific papers. The method of co-citation has been adapted to this context in order to generate Pathfinder networks (PFNET) that highlight the most relevant scientific journals and categories, and their

  • MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream
    arXiv.cs.DL Pub Date : 2018-07-16
    Vladimir V. Arlazarov; Konstantin Bulatov; Timofey Chernov; Vladimir L. Arlazarov

    A lot of research has been devoted to identity documents analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets which are useful for associated subtasks but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition more specialized datasets

  • Chemistry research in India in a global perspective- A scientometrics profile
    arXiv.cs.DL Pub Date : 2020-02-08
    Muthu Madhan; Subbiah Gunasekaran; Rani M T; Subbiah Arunachalam; T A Abinandanan

    Papers from India are cited 14.68 times on average compared to cites per paper of 45.34 for Singapore, 30.47 for USA, 23.12 for China, 26.51 for the UK, 21.77 for South Korea and 24.77 for Germany. Less than 39% of papers from India are found in quartile 1 (high impact factor) journals, compared to 53.6% for China and 53.8% for South Korea. Percent share of papers in quartile 1 journals from India

  • A tale of two databases: The use of Web of Science and Scopus in academic papers
    arXiv.cs.DL Pub Date : 2020-02-07
    Junwen Zhu; Weishu Liu

    Web of Science and Scopus are two world-leading and competing citation databases. By using the Science Citation Index Expanded and Social Sciences Citation Index, this paper conducts a comparative, dynamic, and empirical study focusing on the use of Web of Science (WoS) and Scopus in academic papers published between 2004 and 2018. This brief communication reveals that although both Web of Science and

  • Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations
    arXiv.cs.DL Pub Date : 2020-02-07
    Andre Greiner-Petter; Moritz Schubotz; Fabian Mueller; Corinna Breitinger; Howard S. Cohl; Akiko Aizawa; Bela Gipp

    Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv

  • Experiments with Different Indexing Techniques for Text Retrieval tasks on Gujarati Language using Bag of Words Approach
    arXiv.cs.DL Pub Date : 2020-02-05
    Dr. Jyoti Pareek; Hardik Joshi; Krunal Chauhan; Rushikesh Patel

    This paper presents results of various experiments carried out to improve text retrieval of Gujarati text documents. Text retrieval involves searching and ranking text documents for a given set of query terms. We have tested various retrieval models that use the bag-of-words approach. The bag-of-words approach is a traditional approach, still in use today, where the text document is represented

  • The role of geographic proximity in knowledge diffusion, measured by citations to scientific literature
    arXiv.cs.DL Pub Date : 2020-02-03
    Giovanni Abramo; Ciriaco Andrea D'Angelo; Flavia Di Costa

    This paper analyses the influence of geographic distance on knowledge flows, measured through citations to scientific publications. Previous works using the same approach are limited to single disciplines. In this study, we analyse the Italian scientific production in all disciplines matured in the period 2010-2012. To calculate the geographic distances between citing and cited publications, each one

  • Keeping out the Masses: Understanding the Popularity and Implications of Internet Paywalls
    arXiv.cs.DL Pub Date : 2019-02-18
    Panagiotis Papadopoulos; Peter Snyder; Benjamin Livshits

    Funding the production of quality online content is a pressing problem for content producers. The most common funding method, online advertising, is rife with well-known performance and privacy harms, and an intractable subject-agent conflict: many users do not want to see advertisements, depriving the site of needed funding. Because of these negative aspects of advertisement-based funding, paywalls

