• arXiv.cs.IR Pub Date : 2020-04-07
Léo Bouscarrat (QARMA, TALEP); Antoine Bonnefoy (LIF, QARMA); Cécile Capponi (LIF, QARMA); Carlos Ramisch (TALEP)

Translating biomedical ontologies is an important challenge, but doing it manually requires much time and money. We study the possibility of using open-source knowledge bases to translate biomedical ontologies. We focus on two aspects: coverage and quality. We look at the coverage of two biomedical ontologies focusing on diseases with respect to Wikidata for 9 European languages (Czech, Dutch, English

• arXiv.cs.IR Pub Date : 2020-04-06
Yumo Xu; Mirella Lapata

We consider the problem of better modeling query-cluster interactions to facilitate query focused multi-document summarization (QFS). Due to the lack of training data, existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments. In this work, we leverage distant supervision from question answering where various resources are available to more

• arXiv.cs.IR Pub Date : 2020-04-06
Iztok Fister Jr.; Karin Fister; Iztok Fister

The COVID-19 pandemic has already proven itself to be a global challenge. It proves how vulnerable humanity can be. It has also mobilized researchers from different sciences and different countries in the search for a way to fight this potentially fatal disease. In line with this, our study analyses the abstracts of papers related to COVID-19 and coronavirus-related research using association rule text

• arXiv.cs.IR Pub Date : 2020-04-01
Francesco Sovrano; Monica Palmirani; Fabio Vitali

The main goal of this research is to produce useful software for the United Nations (UN) that could help speed up the process of qualifying UN documents according to the Sustainable Development Goals (SDGs), in order to monitor progress at the world level in fighting poverty, discrimination, and climate change. In fact, human labeling of UN documents would be a daunting task given the size of the

• arXiv.cs.IR Pub Date : 2020-04-03
Lukas Stankevičius; Mantas Lukoševičius

The recent introduction of the Transformer deep learning architecture led to breakthroughs in various natural language processing tasks. However, non-English languages could not leverage such new opportunities with the English text pre-trained models. This changed with research focusing on multilingual models, where less-spoken languages are the main beneficiaries. We compare pre-trained multilingual BERT

• arXiv.cs.IR Pub Date : 2019-09-19
Arindam Mitra; Pratyay Banerjee; Kuntal Kumar Pal; Swaroop Mishra; Chitta Baral

Recently several datasets have been proposed to encourage research in Question Answering domains where commonsense knowledge is expected to play an important role. Recent language models such as ROBERTA, BERT and GPT that have been pre-trained on Wikipedia articles and books have shown reasonable performance with little fine-tuning on several such Multiple Choice Question-Answering (MCQ) datasets.

• arXiv.cs.IR Pub Date : 2019-11-10
Pratyay Banerjee; Kuntal Kumar Pal; Murthy Devarakonda; Chitta Baral

In this work, we formulate the NER task as a multi-answer question answering (MAQA) task and provide different knowledge contexts, such as entity types, questions, definitions and definitions with examples. This formulation (a) enables systems to jointly learn from varied NER datasets, enabling systems to learn more NER specific features, (b) can use knowledge-text attention to identify words having

• arXiv.cs.IR Pub Date : 2019-11-08
Andrew O. Arnold; William W. Cohen

Perhaps the simplest type of multilingual transfer learning is instance-based transfer learning, in which data from the target language and the auxiliary languages are pooled, and a single model is learned from the pooled data. It is not immediately obvious when instance-based transfer learning will improve performance in this multilingual setting: for instance, a plausible conjecture is this kind

• arXiv.cs.IR Pub Date : 2020-04-03
Bo Peng; Zhiyun Ren; Srinivasan Parthasarathy; Xia Ning

Next-basket recommendation considers the problem of recommending a set of items into the next basket that users will purchase as a whole. In this paper, we develop a new mixed model with preferences and hybrid transitions for the next-basket recommendation problem. This method explicitly models three important factors: 1) users' general preferences; 2) transition patterns among items and 3) transition

• arXiv.cs.IR Pub Date : 2020-04-01
Dietmar Jannach; Ahtsham Manzoor; Wanling Cai; Li Chen

Recommender systems are software applications that help users to find items of interest in situations of information overload. Current research often assumes a one-shot interaction paradigm, where the users' preferences are estimated based on past observed behavior and where the presentation of a ranked list of suggestions is the main, one-directional form of user interaction. Conversational recommender

• arXiv.cs.IR Pub Date : 2020-04-01
Erik Quintanilla; Yogesh Rawat; Andrey Sakryukin; Mubarak Shah; Mohan Kankanhalli

We have recently seen great progress in image classification due to the success of deep convolutional neural networks and the availability of large-scale datasets. Most of the existing work focuses on single-label image classification. However, there are usually multiple tags associated with an image. The existing works on multi-label classification are mainly based on lab curated labels. Humans assign

• arXiv.cs.IR Pub Date : 2019-08-28
Michael P. J. Camilleri; Adrian Muscat; Victor Buttigieg; Maria Attard

In this paper we describe Vjaġġ, a battery-aware journey detection algorithm that executes on the mobile device. The algorithm can be embedded in the client app of the transport service provider or in a general purpose mobility data collector. The thick client setup allows the customer/participant to select which journeys are transferred to the server, keeping customers in control of

• arXiv.cs.IR Pub Date : 2016-12-14
Yike Liu; Tara Safavi; Abhilash Dighe; Danai Koutra

While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing

• arXiv.cs.IR Pub Date : 2020-03-31
Ramya Tekumalla; Juan M. Banda

There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code or data for replicating their studies. With minimal

• arXiv.cs.IR Pub Date : 2020-03-28
Tooba Aamir; Hai Dong; Athman Bouguettaya

The extensive use of social media platforms and overwhelming amounts of imagery data create unique opportunities for sensing, gathering and sharing information about events. One of its potential applications is to leverage crowdsourced social media images to create a tapestry scene for scene analysis of designated locations and time intervals. Existing attempts, however, ignore the temporal-semantic

• arXiv.cs.IR Pub Date : 2020-03-31
Ramya Tekumalla; Juan M. Banda

With the increase in popularity of deep learning models for natural language processing (NLP) tasks, in the field of Pharmacovigilance, more specifically for the identification of Adverse Drug Reactions (ADRs), there is an inherent need for large-scale social-media datasets aimed at such tasks. With most researchers allocating large amounts of time to crawl Twitter or buying expensive pre-curated datasets

• arXiv.cs.IR Pub Date : 2020-03-31
Aaron Feng; Shuwei Chen; Yuliang Li; Hiroshi Matsuda; Hidekazu Tamaki; Wang-Chiew Tan

Existing e-commerce search engines typically support search only over objective attributes, such as price and locations, leaving the more desirable subjective attributes, such as romantic vibe and worklife balance unsearchable. We found that this is also the case for Recruit Group, which operates a wide range of online booking and search services, including jobs, travel, housing, bridal, dining, beauty

• arXiv.cs.IR Pub Date : 2020-03-31
Björn Friedrich; Jürgen Bauer; Andreas Hein

Proper nutrition is very important for the well-being and independence of elderly people. A significant loss of body weight or, correspondingly, a decrease in the Body Mass Index (BMI) is an indicator of malnutrition. Continuous monitoring of the BMI enables doctors and nutritionists to intervene when malnutrition is impending. However, continuous monitoring of the BMI by professionals is not applicable and self-monitoring

• arXiv.cs.IR Pub Date : 2020-03-27
Angelo A. Salatino; Francesco Osborne; Enrico Motta

Ontologies of research areas have proven useful in many applications for analysing and making sense of scholarly data. In this chapter, we present the Computer Science Ontology (CSO), which is the largest ontology of research areas in the field of Computer Science, and discuss a number of applications that build on CSO to support high-level tasks, such as topic classification, metadata extraction

• arXiv.cs.IR Pub Date : 2020-03-28
AJ Venkatakrishnan; Arjun Puranik; Akash Anand; David Zemmour; Xiang Yao; Xiaoying Wu; Ramakrishna Chilaka; Dariusz K. Murakowski; Kristopher Standish; Bharathwaj Raghunathan; Tyler Wagner; Enrique Garcia-Rivera; Hugo Solomon; Abhinav Garg; Rakesh Barve; Anuli Anyanwu-Ofili; Najat Khan; Venky Soundararajan

The COVID-19 pandemic demands assimilation of all available biomedical knowledge to decode its mechanisms of pathogenicity and transmission. Despite the recent renaissance in unsupervised neural networks for decoding unstructured natural languages, a platform for the real-time synthesis of the exponentially growing biomedical literature and its comprehensive triangulation with deep omic insights is

• arXiv.cs.IR Pub Date : 2020-03-30
Xusheng Luo; Luxin Liu; Yonghua Yang; Le Bo; Yuanpeng Cao; Jinhang Wu; Qiang Li; Keping Yang; Kenny Q. Zhu

One of the ultimate goals of e-commerce platforms is to satisfy various shopping needs for their customers. Much effort is devoted to creating taxonomies or ontologies in e-commerce towards this goal. However, user needs in e-commerce are still not well defined, and none of the existing ontologies has enough depth and breadth for universal user needs understanding. The semantic gap in-between

• arXiv.cs.IR Pub Date : 2020-03-29
Elena Leitner; Georg Rehm; Julián Moreno-Schneider

We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal

• arXiv.cs.IR Pub Date : 2020-03-30
Tomislav Duricic; Hussain Hussain; Emanuel Lacic; Dominik Kowald; Denis Helic; Elisabeth Lex

In this work, we study the utility of graph embeddings to generate latent user representations for trust-based collaborative filtering. In a cold-start setting, on three publicly available datasets, we evaluate approaches from four method families: (i) factorization-based, (ii) random walk-based, (iii) deep learning-based, and (iv) the Large-scale Information Network Embedding (LINE) approach. We find

• arXiv.cs.IR Pub Date : 2020-03-30
Noemi Mauro; Liliana Ardissono

Collaborative Filtering is largely applied to personalize item recommendation but its performance is affected by the sparsity of rating data. In order to address this issue, recent systems have been developed to improve recommendation by extracting latent factors from the rating matrices, or by exploiting trust relations established among users in social networks. In this work, we are interested in

• arXiv.cs.IR Pub Date : 2020-03-30
Noemi Mauro; Liliana Ardissono; Adriano Savoca

Textual queries are largely employed in information retrieval to let users specify search goals in a natural way. However, differences in user and system terminologies can challenge the identification of the user's information needs, and thus the generation of relevant results. We argue that the explicit management of ontological knowledge, and of the meaning of concepts (by integrating linguistic

• arXiv.cs.IR Pub Date : 2020-03-27
Xuan Wang; Xiangchen Song; Yingjun Guan; Bangzheng Li; Jiawei Han

We created this CORD-19-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). This CORD-19-NER dataset covers 74 fine-grained named entity types. It is automatically generated by combining the annotation results from four sources: (1) pre-trained NER model on 18 general entity types from Spacy, (2) pre-trained NER

• arXiv.cs.IR Pub Date : 2020-03-27
Alexander Schindler; Sergiu Gordea; Peter Knees

We present an approach to unsupervised audio representation learning. Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness. By applying Latent Semantic Indexing (LSI) we embed corresponding textual information into a latent vector space from which we derive track relatedness for online triplet selection. This LSI

• arXiv.cs.IR Pub Date : 2020-03-25
Himan Abdollahpouri; Robin Burke; Masoud Mansoury

Fairness in machine learning has been studied by many researchers. In particular, fairness in recommender systems has been investigated to ensure the recommendations meet certain criteria with respect to certain sensitive features such as race, gender, etc. However, recommender systems are often multi-stakeholder environments in which fairness towards all stakeholders should be taken care of. It

• arXiv.cs.IR Pub Date : 2020-03-25
Asia J. Biega; Fernando Diaz; Michael D. Ekstrand; Sebastian Kohlmeier

The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstracts

• arXiv.cs.IR Pub Date : 2020-03-26
Julian Risch; Ralf Krestel

Comment sections below online news articles enjoy growing popularity among readers. However, the overwhelming number of comments makes it infeasible for the average news consumer to read all of them and hinders engaging discussions. Most platforms display comments in chronological order, which neglects that some of them are more relevant to users and are better conversation starters. In this paper

• arXiv.cs.IR Pub Date : 2020-03-25
Mikael Sørensen; Charilaos I. Kanatsoulis; Nicholas D. Sidiropoulos

Generalized Canonical Correlation Analysis (GCCA) is an important tool that finds numerous applications in data mining, machine learning, and artificial intelligence. It aims at finding 'common' random variables that are strongly correlated across multiple feature representations (views) of the same set of entities. CCA and, to a lesser extent, GCCA have been studied from the statistical and algorithmic

• arXiv.cs.IR Pub Date : 2020-03-25
Bin Liu; Chenxu Zhu; Guilin Li; Weinan Zhang; Jincai Lai; Ruiming Tang; Xiuqiang He; Zhenguo Li; Yong Yu

Learning effective feature interactions is crucial for click-through rate (CTR) prediction tasks in recommender systems. In most of the existing deep learning models, feature interactions are either manually designed or simply enumerated. However, enumerating all feature interactions brings large memory and computation cost. Even worse, useless interactions may introduce unnecessary noise and complicate

• arXiv.cs.IR Pub Date : 2020-03-25
Noemi Mauro; Liliana Ardissono

Exploratory information search can challenge users in the formulation of efficacious search queries. Moreover, complex information spaces, such as those managed by Geographical Information Systems, can disorient people, making it difficult to find relevant data. In order to address these issues, we developed a session-based suggestion model that proposes concepts as a "you might also be interested

• arXiv.cs.IR Pub Date : 2020-03-25
Noemi Mauro; Liliana Ardissono; Zhongli Filippo Hu

Many collaborative recommender systems leverage social correlation theories to improve suggestion performance. However, they focus on explicit relations between users and they leave out other types of information that can contribute to determine users' global reputation; e.g., public recognition of reviewers' quality. We are interested in understanding if and when these additional types of feedback

• arXiv.cs.IR Pub Date : 2020-03-23
Kunwoo Park; Taegyun Kim; Seunghyun Yoon; Meeyoung Cha; Kyomin Jung

In digital environments where substantial amounts of information are shared online, news headlines play essential roles in the selection and diffusion of news articles. Some news articles attract audience attention by showing exaggerated or misleading headlines. This study addresses the headline incongruity problem, in which a news headline makes claims that are either unrelated or opposite

• arXiv.cs.IR Pub Date : 2020-03-21
Siqi Wu; Marian-Andrei Rizoiu; Lexing Xie

A comprehensive understanding of data bias is the cornerstone of mitigating biases in social media research. This paper presents in-depth measurements of the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing two complete tweet streams, we show that the Twitter rate limit message is an accurate measure for the volume of

• arXiv.cs.IR Pub Date : 2020-03-21
Tao Qi; Fangzhao Wu; Chuhan Wu; Yongfeng Huang; Xing Xie

News recommendation aims to display news articles to users based on their personal interest. Existing news recommendation methods rely on centralized storage of user behavior data for model training, which may lead to privacy concerns and risks due to the privacy-sensitive nature of user behaviors. In this paper, we propose a privacy-preserving method for news recommendation model training based on

• arXiv.cs.IR Pub Date : 2020-03-22
Malte Ostendorff; Terry Ruas; Moritz Schubotz; Georg Rehm; Bela Gipp

Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents

• arXiv.cs.IR Pub Date : 2020-03-22
Yao Qiang; Xin Li; Dongxiao Zhu

Existing aspect based sentiment analysis (ABSA) approaches leverage various neural network models to extract the aspect sentiments via learning aspect-specific feature representations. However, these approaches heavily rely on manual tagging of user reviews according to the predefined aspects as the input, a laborious and time-consuming process. Moreover, the underlying methods do not explain how and

• arXiv.cs.IR Pub Date : 2020-03-22
Alexander C. Nwala; Michele C. Weigle; Michael L. Nelson

We investigate the overlap of topics of online news articles from a variety of sources. To do this, we provide a platform for studying the news by measuring this overlap and scoring news stories according to the degree of attention in near-real time. This can enable multiple studies, including identifying topics that receive the most attention from news organizations and identifying slow news days

• arXiv.cs.IR Pub Date : 2020-03-23
Yang Liu; Liang Chen; Xiangnan He; Jiaying Peng; Zibin Zheng; Jie Tang

The prevalence of online social networks makes it compulsory to study how social relations affect user choice. However, most existing methods leverage only first-order social relations, that is, the direct neighbors that are connected to the target user. The high-order social relations, e.g., the friends of friends, which are very informative for revealing user preference, have been largely ignored. In this

• arXiv.cs.IR Pub Date : 2020-03-23
Caleb Belth; Xinyi Zheng; Jilles Vreeken; Danai Koutra

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work

• arXiv.cs.IR Pub Date : 2020-03-23
Venkatesh S. Kadandale; Juan F. Montesinos; Gloria Haro; Emilia Gómez

A fairly straightforward approach for music source separation is to train independent models, wherein each model is dedicated to estimating only a specific source. Training a single model to estimate multiple sources generally does not perform as well as the independent dedicated models. However, Conditioned U-Net (C-U-Net) uses a control mechanism to train a single model for multi-source separation

• arXiv.cs.IR Pub Date : 2020-03-23
Eric Müller-Budack; Jonas Theiner; Sebastian Diering; Maximilian Idahl; Ralph Ewerth

The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. Photo content can be decorative, depict additional important information, or even contain misleading information. Therefore, automatic approaches to quantify cross-modal

• arXiv.cs.IR Pub Date : 2020-03-19
Parag Agrawal; Tulasi Menon; Aya Kamel; Michel Naim; Chaikesh Chouragade; Gurvinder Singh; Rohan Kulkarni; Anshuman Suri; Sahithi Katakam; Vineet Pratik; Prakul Bansal; Simerpreet Kaur; Neha Rajput; Anand Duggal; Achraf Chalabi; Prashant Choudhari; Reddy Satti; Niranjan Nayak

Having a bot for seamless conversations is a much-desired feature that products and services today seek for their websites and mobile apps. These bots help reduce traffic received by human support significantly by handling frequent and directly answerable known questions. Many such services have huge reference documents such as FAQ pages, which makes it hard for users to browse through this data. A

• arXiv.cs.IR Pub Date : 2020-03-10
Shuo Jiang; Jianxi Luo; Guillermo Ruiz Pava; Jie Hu; Christopher L. Magee

The patent database is often used in searches of inspirational stimuli for innovative design opportunities because of its large size, extensive variety and rich design information in patent documents. However, most patent mining research only focuses on textual information and ignores visual information. Herein, we propose a convolutional neural network (CNN)-based patent image retrieval method. The

• arXiv.cs.IR Pub Date : 2020-03-01
David Pickup; Xianfang Sun; Paul L Rosin; Ralph R Martin; Z Cheng; Zhouhui Lian; Masaki Aono; A Ben Hamza; A Bronstein; M Bronstein; S Bu; Umberto Castellani; S Cheng; Valeria Garro; Andrea Giachetti; Afzal Godil; Luca Isaia; J Han; Henry Johan; L Lai; Bo Li; C Li; Haisheng Li; Roee Litman; X Liu; Z Liu; Yijuan Lu; L Sun; G Tam; Atsushi Tatsuma; J Ye

3D models of humans are commonly used within computer graphics and vision, and so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new models

• arXiv.cs.IR Pub Date : 2020-02-26
Nitish Nag; Bindu Rajanna; Ramesh Jain

With the exponential growth in the usage of social media to share live updates about life, taking pictures has become an unavoidable phenomenon. Individuals unknowingly create a unique knowledge base with these images. The food images, in particular, are of interest as they contain a plethora of information. From the image metadata and using computer vision tools, we can extract distinct insights for

• arXiv.cs.IR Pub Date : 2020-03-18
Yuan Shen; Shanduojiao Jiang; Muhammad Rizky Wellyanto; Ranjitha Kumar

When people talk about fashion, they care about the underlying meaning of fashion concepts, e.g., style. For example, people ask questions like what features make this dress smart. However, the product descriptions in today's fashion websites are full of domain-specific and low-level words. It is not clear to people how exactly those low-level descriptions can contribute to a style or any high-level fashion

• arXiv.cs.IR Pub Date : 2020-03-18
Jimmy Lin; Joel Mackenzie; Chris Kamphuis; Craig Macdonald; Antonio Mallia; Michał Siedlaczek; Andrew Trotman; Arjen de Vries

There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and building wrappers that

• arXiv.cs.IR Pub Date : 2020-03-18
Islam Elnabarawy; Wei Jiang; Donald C. Wunsch II

Collaborative filtering recommendation systems provide recommendations to users based on their own past preferences, as well as those of other users who share similar interests. The use of recommendation systems has grown widely in recent years, helping people choose which movies to watch, books to read, and items to buy. However, users are often concerned about their privacy when using such systems

• arXiv.cs.IR Pub Date : 2018-12-21
Jonathan Dumas; Bertrand Cornélusse

The key contribution of this paper is to propose a classification into two dimensions of the load forecasting studies to decide which forecasting tools to use in which case. This classification aims to provide a synthetic view of the relevant forecasting techniques and methodologies by forecasting problem. In addition, the key principles of the main techniques and methodologies used are summarized

• arXiv.cs.IR Pub Date : 2019-09-05
Xavier Bost (LIA); Serigne Gueye (LIA); Vincent Labatut (LIA); Martha Larson (DMIR); Georges Linarès (LIA); Damien Malinas (CNELIAS); Raphaël Roth (CNELIAS)

Today's popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive use of summaries to re-engage with the plot. Automatic

• arXiv.cs.IR Pub Date : 2019-11-22
Meng Chen; Ruixue Liu; Lei Shen; Shaozu Yuan; Jingyan Zhou; Youzheng Wu; Xiaodong He; Bowen Zhou

Human conversations are complicated and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models, which need a huge amount of real conversation data, have become more and more prevalent. In this paper, we construct a large-scale real scenario Chinese E-commerce conversation corpus, JDDC, with more than 1 million multi-turn

• arXiv.cs.IR Pub Date : 2020-03-12

We are living in the data age. Communications over scientific networks create new opportunities for researchers who aim to discover the hidden patterns in these huge repositories. This study utilizes network science to create a collaboration network of Iranian Scientific Institutions. A modularity-based approach is applied to find network communities. To reach a big picture of science production flow, analysis

• arXiv.cs.IR Pub Date : 2020-03-17
Nick Craswell; Bhaskar Mitra; Emine Yilmaz; Daniel Campos; Ellen M. Voorhees

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries

• arXiv.cs.IR Pub Date : 2020-03-16
Lenz Furrer (University of Zurich, Switzerland, and Swiss Institute of Bioinformatics, Switzerland); Joseph Cornelius (University of Zurich, Switzerland); Fabio Rinaldi (University of Zurich, Switzerland, Dalle Molle Institute for Artificial Intelligence Research, and Swiss Institute of Bioinformatics, Switzerland)

Motivation: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modeled as a sequence-labeling task, operating directly

• arXiv.cs.IR Pub Date : 2020-03-16
Antonia Saravanou; Giorgio Stefanoni; Edgar Meij

The volume of news content has increased significantly in recent years and systems to process and deliver this information in an automated fashion at scale are becoming increasingly prevalent. One critical component that is required in such systems is a method to automatically determine how notable a certain news story is, in order to prioritize these stories during delivery. One way to do so is to

• arXiv.cs.IR Pub Date : 2019-12-04
Jibril Frej; Didier Schwab; Jean-Pierre Chevallet

Over the past years, deep learning methods have enabled new state-of-the-art results in ad-hoc information retrieval. However, such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for

• arXiv.cs.IR Pub Date : 2020-03-13
Ikki Ohmukai; Yasunori Yamamoto; Maori Ito; Takashi Okumura

When public health authorities confirm a patient with a highly contagious disease, they release summaries of the patient's locations and travel information. However, due to privacy concerns, these releases do not include the detailed data and typically comprise only information about commercial facilities and public transportation used by the patients. We addressed this problem and

