-
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models arXiv.cs.IR Pub Date : 2021-01-14 Ronak Pradeep; Rodrigo Nogueira; Jimmy Lin
We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains. At the core, our design relies on pretrained sequence-to-sequence models within a standard multi-stage ranking architecture. "Expando" refers to the use of document expansion techniques to enrich keyword representations
-
$C^3DRec$: Cloud-Client Cooperative Deep Learning for Temporal Recommendation in the Post-GDPR Era arXiv.cs.IR Pub Date : 2021-01-13 Jialiang Han; Yun Ma
Mobile devices enable users to retrieve information at any time and any place. Considering the occasional requirements and fragmentation usage pattern of mobile users, temporal recommendation techniques are proposed to improve the efficiency of information retrieval on mobile devices by means of accurately recommending items via learning temporal interests with short-term user interaction behaviors
-
Eating Garlic Prevents COVID-19 Infection: Detecting Misinformation on the Arabic Content of Twitter arXiv.cs.IR Pub Date : 2021-01-09 Sarah Alqurashi; Btool Hamoui; Abdulaziz Alashaikh; Ahmad Alhindi; Eisa Alanazi
The rapid growth of social media content during the current pandemic provides useful tools for disseminating information which has also become a root for misinformation. Therefore, there is an urgent need for fact-checking and effective techniques for detecting misinformation in social media. In this work, we study the misinformation in the Arabic content of Twitter. We construct a large Arabic dataset
-
TrNews: Heterogeneous User-Interest Transfer Learning for News Recommendation arXiv.cs.IR Pub Date : 2021-01-12 Guangneng Hu; Qiang Yang
We investigate how to solve the cross-corpus news recommendation for unseen users in the future. This is a problem where traditional content-based recommendation techniques often fail. Luckily, in real-world recommendation services, some publisher (e.g., Daily news) may have accumulated a large corpus with lots of consumers which can be used for a newly deployed publisher (e.g., Political news). To
-
Learning Student Interest Trajectory for MOOCThread Recommendation arXiv.cs.IR Pub Date : 2021-01-10 Shalini Pandey; Andrew Lan; George Karypis; Jaideep Srivastava
In recent years, Massive Open Online Courses (MOOCs) have witnessed immense growth in popularity. Now, due to the recent Covid19 pandemic situation, it is important to push the limits of online education. Discussion forums are primary means of interaction among learners and instructors. However, with growing class size, students face the challenge of finding useful and informative discussion forums
-
Analysis of E-commerce Ranking Signals via Signal Temporal Logic arXiv.cs.IR Pub Date : 2021-01-14 Tommaso Dreossi; Giorgio Ballardin; Parth Gupta; Jan Bakus; Yu-Hsiang Lin; Vamsi Salaka (Amazon Search)
The timed position of documents retrieved by learning to rank models can be seen as signals. Signals carry useful information such as drop or rise of documents over time or user behaviors. In this work, we propose to use the logic formalism called Signal Temporal Logic (STL) to characterize document behaviors in ranking according to the specified formulas. Our analysis shows that interesting document
-
Knowledge-Enhanced Top-K Recommendation in Poincaré Ball arXiv.cs.IR Pub Date : 2021-01-13 Chen Ma; Liheng Ma; Yingxue Zhang; Haolun Wu; Xue Liu; Mark Coates
Personalized recommender systems are increasingly important as more content and services become available and users struggle to identify what might interest them. Thanks to their ability to provide rich information, knowledge graphs (KGs) are being incorporated to enhance the recommendation performance and interpretability. To effectively make use of the knowledge graph, we propose a recommendation
-
Heterogeneous Network Embedding for Deep Semantic Relevance Match in E-commerce Search arXiv.cs.IR Pub Date : 2021-01-13 Ziyang Liu; Zhaomeng Cheng; Yunjiang Jiang; Yue Shang; Wei Xiong; Sulong Xu; Bo Long; Di Jin
Result relevance prediction is an essential task of e-commerce search engines to boost the utility of search engines and ensure smooth user experience. The last few years witnessed a flurry of research on the use of Transformer-style models and deep text-match models to improve relevance. However, these two types of models ignored the inherent bipartite network structures that are ubiquitous in
-
Probabilistic Metric Learning with Adaptive Margin for Top-K Recommendation arXiv.cs.IR Pub Date : 2021-01-13 Chen Ma; Liheng Ma; Yingxue Zhang; Ruiming Tang; Xue Liu; Mark Coates
Personalized recommender systems are playing an increasingly important role as more content and services become available and users struggle to identify what might interest them. Although matrix factorization and deep learning based methods have proved effective in user preference modeling, they violate the triangle inequality and fail to capture fine-grained preference information. To tackle this
-
Discrete Knowledge Graph Embedding based on Discrete Optimization arXiv.cs.IR Pub Date : 2021-01-13 Yunqi Li; Shuyuan Xu; Bo Liu; Zuohui Fu; Shuchang Liu; Xu Chen; Yongfeng Zhang
This paper proposes a discrete knowledge graph (KG) embedding (DKGE) method, which projects KG entities and relations into the Hamming space based on a computationally tractable discrete optimization algorithm, to solve the formidable storage and computation cost challenges in traditional continuous graph embedding methods. The convergence of DKGE can be guaranteed theoretically. Extensive experiments
-
Distributed storage algorithms with optimal tradeoffs arXiv.cs.IR Pub Date : 2021-01-13 Michael Luby; Thomas Richardson
One of the primary objectives of a distributed storage system is to reliably store large amounts of source data for long durations using a large number $N$ of unreliable storage nodes, each with $c$ bits of storage capacity. Storage nodes fail randomly over time and are replaced with nodes of equal capacity initialized to zeroes, and thus bits are erased at some rate $e$. To maintain recoverability
-
LaDiff ULMFiT: A Layer Differentiated training approach for ULMFiT arXiv.cs.IR Pub Date : 2021-01-13 Mohammed Azhan; Mohammad Ahmad
In our paper, we present Deep Learning models with a layer differentiated training method which were used for the SHARED TASK@ CONSTRAINT 2021 sub-tasks COVID19 Fake News Detection in English and Hostile Post Detection in Hindi. We propose a Layer Differentiated training procedure for training a pre-trained ULMFiT (arXiv:1801.06146) model. We used special tokens to annotate specific parts of the tweets
-
On the Calibration and Uncertainty of Neural Learning to Rank Models arXiv.cs.IR Pub Date : 2021-01-12 Gustavo Penha; Claudia Hauff
According to the Probability Ranking Principle (PRP), ranking documents in decreasing order of their probability of relevance leads to an optimal document ranking for ad-hoc retrieval. The PRP holds when two conditions are met: [C1] the models are well calibrated, and, [C2] the probabilities of relevance are reported with certainty. We know however that deep neural networks (DNNs) are often not well
-
Neural News Recommendation with Negative Feedback arXiv.cs.IR Pub Date : 2021-01-12 Chuhan Wu; Fangzhao Wu; Yongfeng Huang; Xing Xie
News recommendation is important for online news services. Precise user interest modeling is critical for personalized news recommendation. Existing news recommendation methods usually rely on the implicit feedback of users like news clicks to model user interest. However, news click may not necessarily reflect user interests because users may click a news due to the attraction of its title but feel
-
AI- and HPC-enabled Lead Generation for SARS-CoV-2: Models and Processes to Extract Druglike Molecules Contained in Natural Language Text arXiv.cs.IR Pub Date : 2021-01-12 Zhi Hong; J. Gregory Pauloski; Logan Ward; Kyle Chard; Ben Blaiszik; Ian Foster
Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of coronavirus research. We report here on a project that leverages both human
-
Toward Effective Automated Content Analysis via Crowdsourcing arXiv.cs.IR Pub Date : 2021-01-12 Jiele Wu; Chau-Wai Wong; Xinyan Zhao; Xianpeng Liu
Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective features such as semantic connotation, online workers, known for optimizing their hourly earnings, tend to deteriorate in the quality of their responses as they
-
Measuring Recommender System Effects with Simulated Users arXiv.cs.IR Pub Date : 2021-01-12 Sirui Yao; Yoni Halpern; Nithum Thain; Xuezhi Wang; Kang Lee; Flavien Prost; Ed H. Chi; Jilin Chen; Alex Beutel
Imagine a food recommender system -- how would we check if it is causing and fostering unhealthy eating habits or merely reflecting users' interests? How much of a user's experience over time with a recommender is caused by the recommender system's choices and biases, and how much is based on the user's preferences and biases? Popularity bias and filter bubbles are two of the most well-studied
-
Locality Sensitive Hashing for Efficient Similar Polygon Retrieval arXiv.cs.IR Pub Date : 2021-01-12 Haim Kaplan; Jay Tenenbaum
Locality Sensitive Hashing (LSH) is an effective method of indexing a set of items to support efficient nearest neighbors queries in high-dimensional spaces. The basic idea of LSH is that similar items should produce hash collisions with higher probability than dissimilar items. We study LSH for (not necessarily convex) polygons, and use it to give efficient data structures for similar shape retrieval
-
Quantum Mathematics in Artificial Intelligence arXiv.cs.IR Pub Date : 2021-01-12 Dominic Widdows; Kirsty Kitto; Trevor Cohen
In the decade since 2010, successes in artificial intelligence have been at the forefront of computer science and technology, and vector space models have solidified a position at the forefront of artificial intelligence. At the same time, quantum computers have become much more powerful, and announcements of major advances are frequently in the news. The mathematical techniques underlying both these
-
Disentangled Self-Attentive Neural Networks for Click-Through Rate Prediction arXiv.cs.IR Pub Date : 2021-01-11 Yanqiao Zhu; Yichen Xu; Feng Yu; Qiang Liu; Shu Wu; Liang Wang
Click-through rate (CTR) prediction, which aims to predict the probability that a user will click on an item, is an essential task for many online applications. Due to the nature of data sparsity and high dimensionality in CTR prediction, a key to making effective prediction is to model high-order feature interactions among feature fields. To explicitly model high-order feature interactions
-
Transfer Learning and Augmentation for Word Sense Disambiguation arXiv.cs.IR Pub Date : 2021-01-10 Harsh Kohli
Many downstream NLP tasks have shown significant improvement through continual pre-training, transfer learning and multi-task learning. State-of-the-art approaches in Word Sense Disambiguation today benefit from some of these approaches in conjunction with information sources such as semantic relationships and gloss definitions contained within WordNet. Our work builds upon these systems and uses data
-
Towards Long-term Fairness in Recommendation arXiv.cs.IR Pub Date : 2021-01-10 Yingqiang Ge; Shuchang Liu; Ruoyuan Gao; Yikun Xian; Yunqi Li; Xiangyu Zhao; Changhua Pei; Fei Sun; Junfeng Ge; Wenwu Ou; Yongfeng Zhang
As Recommender Systems (RS) influence more and more people in their daily life, the issue of fairness in recommendation is becoming more and more important. Most of the prior approaches to fairness-aware recommendation have been situated in a static or one-shot setting, where the protected groups of items are fixed, and the model provides a one-time fairness solution based on fairness-constrained optimization
-
Context-Aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants arXiv.cs.IR Pub Date : 2021-01-09 Mohammad Aliannejadi; Hamed Zamani; Fabio Crestani; W. Bruce Croft
Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users' lives. This paper addresses two research problems that are vital for developing effective personal mobile assistants: target apps selection and
-
Generate Natural Language Explanations for Recommendation arXiv.cs.IR Pub Date : 2021-01-09 Hanxiong Chen; Xu Chen; Shaoyun Shi; Yongfeng Zhang
Providing personalized explanations for recommendations can help users to understand the underlying insight of the recommendation results, which is helpful to the effectiveness, transparency, persuasiveness and trustworthiness of recommender systems. Current explainable recommendation models mostly generate textual explanations based on pre-defined sentence templates. However, the expressiveness power
-
Selection of Optimal Parameters in the Fast K-Word Proximity Search Based on Multi-component Key Indexes arXiv.cs.IR Pub Date : 2021-01-09 Alexander B. Veretennikov
Proximity full-text search is commonly implemented in contemporary full-text search systems. Let us assume that the search query is a list of words. It is natural to consider a document as relevant if the queried words are near each other in the document. The proximity factor is even more significant for the case where the query consists of frequently occurring words. Proximity full-text search requires
-
An Unsupervised Normalization Algorithm for Noisy Text: A Case Study for Information Retrieval and Stance Detection arXiv.cs.IR Pub Date : 2021-01-09 Anurag Roy; Shalmoli Ghosh; Kripabandhu Ghosh; Saptarshi Ghosh
A large fraction of textual data available today contains various types of 'noise', such as OCR noise in digitized documents, noise due to informal writing style of users on microblogging sites, and so on. To enable tasks such as search/retrieval and classification over all the available data, we need robust algorithms for text normalization, i.e., for cleaning different kinds of noise in the text
-
Evaluating Deep Learning Approaches for Covid19 Fake News Detection arXiv.cs.IR Pub Date : 2021-01-11 Apurva Wani; Isha Joshi; Snehal Khandve; Vedangi Wagh; Raviraj Joshi
Social media platforms like Facebook, Twitter, and Instagram have enabled connection and communication on a large scale. It has revolutionized the rate at which information is shared and enhanced its reach. However, another side of the coin dictates an alarming story. These platforms have led to an increase in the creation and spread of fake news. The fake news has not only influenced people in the
-
Investigating the Vision Transformer Model for Image Retrieval Tasks arXiv.cs.IR Pub Date : 2021-01-11 Socratis Gkelios; Yiannis Boutalis; Savvas A. Chatzichristofis
This paper introduces a plug-and-play descriptor that can be effectively adopted for image retrieval tasks without prior initialization or preparation. The description method utilizes the recently proposed Vision Transformer network while it does not require any training data to adjust parameters. In image retrieval tasks, the use of Handcrafted global and local descriptors has been very successfully
-
Summaformers @ LaySumm 20, LongSumm 20 arXiv.cs.IR Pub Date : 2021-01-10 Sayar Ghosh Roy; Nikhil Pinnaparaju; Risubh Jain; Manish Gupta; Vasudeva Varma
Automatic text summarization has been widely studied as an important task in natural language processing. Traditionally, various feature engineering and machine learning based systems have been proposed for extractive as well as abstractive text summarization. Recently, deep learning based, specifically Transformer-based systems have been immensely popular. Summarization is a cognitively challenging
-
Leveraging Multilingual Transformers for Hate Speech Detection arXiv.cs.IR Pub Date : 2021-01-08 Sayar Ghosh Roy; Ujwal Narayan; Tathagata Raha; Zubair Abid; Vasudeva Varma
Detecting and classifying instances of hate in social media text has been a problem of interest in Natural Language Processing in recent years. Our work leverages state-of-the-art Transformer language models to identify hate speech in a multilingual setting. Capturing the intent of a post or a comment on social media involves careful evaluation of the language style, semantic content and additional
-
Application of Knowledge Graphs to Provide Side Information for Improved Recommendation Accuracy arXiv.cs.IR Pub Date : 2021-01-07 Yuhao Mao; Serguei A. Mokhov; Sudhir P. Mudur
Personalized recommendations are popular in these days of Internet driven activities, specifically shopping. Recommendation methods can be grouped into three major categories, content based filtering, collaborative filtering and machine learning enhanced. Information about products and preferences of different users are primarily used to infer preferences for a specific user. Inadequate information
-
Spatial Object Recommendation with Hints: When Spatial Granularity Matters arXiv.cs.IR Pub Date : 2021-01-08 Hui Luo; Jingbo Zhou; Zhifeng Bao; Shuangli Li; J. Shane Culpepper; Haochao Ying; Hao Liu; Hui Xiong
Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and thereby are heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different
-
Dynamic Graph Collaborative Filtering arXiv.cs.IR Pub Date : 2021-01-08 Xiaohan Li; Mengqi Zhang; Shu Wu; Zheng Liu; Liang Wang; Philip S. Yu
Dynamic recommendation is essential for modern recommender systems to provide real-time predictions based on sequential data. In real-world scenarios, the popularity of items and interests of users change over time. Based on this assumption, many previous works focus on interaction sequences and learn evolutionary embeddings of users and items. However, we argue that sequence-based models are not able
-
Multistage BiCross Encoder: Team GATE Entry for MLIA Multilingual Semantic Search Task 2 arXiv.cs.IR Pub Date : 2021-01-08 Iknoor Singh; Carolina Scarton; Kalina Bontcheva
The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' online. Thus, the accurate retrieval of reliable relevant data from millions of documents about COVID-19 has become urgently needed for the general public as well as for other stakeholders. The COVID-19 Multilingual Information Access (MLIA) initiative is a joint effort to ameliorate exchange of COVID-19 related information
-
Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies arXiv.cs.IR Pub Date : 2020-12-15 Carlos Badenes-Olmedo; Jose-Luis Redondo García; Oscar Corcho
With the ongoing growth in number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multi-lingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform thematic explorations on collections of texts in multiple
-
Towards Meaningful Statements in IR Evaluation. Mapping Evaluation Measures to Interval Scales arXiv.cs.IR Pub Date : 2021-01-07 Marco Ferrante; Nicola Ferro; Norbert Fuhr
Recently, it was shown that most popular IR measures are not interval-scaled, implying that decades of experimental IR research used potentially improper methods, which may have produced questionable results. However, it was unclear if and to what extent these findings apply to actual evaluations and this opened a debate in the community with researchers standing on opposite positions about whether
-
Metric Learning for Session-based Recommendations arXiv.cs.IR Pub Date : 2021-01-07 Bartłomiej Twardowski; Paweł Zawistowski; Szymon Zaborowski
Session-based recommenders, used for making predictions out of users' uninterrupted sequences of actions, are attractive for many applications. Here, for this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures dissimilarity between the provided sequence of users' events and the next action. We discuss and compare metric learning
-
Attitudes toward Open Access, Open Peer Review, and Altmetrics among Contributors to Spanish Scholarly Journals arXiv.cs.IR Pub Date : 2021-01-07 Francisco Segado-Boj; Juan Martin-Quevedo; Juan Jose Prieto-Gutierrez
This paper aims to gain a better understanding of the perspectives of contributors to Spanish academic journals regarding open access, open peer review, and altmetrics. It also explores how age, gender, professional experience, career history, and perception and use of social media influence authors' opinions toward these developments in scholarly publishing. A sample of contributors (n = 1254) to Spanish
-
Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity arXiv.cs.IR Pub Date : 2021-01-07 Ankush Chopra; Shruti Agrawal; Sohom Ghosh
Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic
-
Transformer-based approach towards music emotion recognition from lyrics arXiv.cs.IR Pub Date : 2021-01-06 Yudhik Agrawal; Ramaguru Guru Ravi Shanker; Vinoo Alluri
The task of identifying emotions from a given music track has been an active pursuit in the Music Information Retrieval (MIR) community for years. Music emotion recognition has typically relied on acoustic features, social tags, and other metadata to identify and classify music emotions. The role of lyrics in music emotion recognition remains under-appreciated in spite of several studies reporting
-
A Multilayer Correlated Topic Model arXiv.cs.IR Pub Date : 2021-01-02 Ye Tian
We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational expectation-maximization (EM) algorithm was derived to estimate the posterior and parameters in MCTM. We introduced two potential applications of MCTM, including the paragraph-level document
-
Investigating the efficacy of music version retrieval systems for setlist identification arXiv.cs.IR Pub Date : 2021-01-06 Furkan Yesiler; Emilio Molina; Joan Serrà; Emilia Gómez
The setlist identification (SLI) task addresses a music recognition use case where the goal is to retrieve the metadata and timestamps for all the tracks played in live music events. Due to various musical and non-musical changes in live performances, developing automatic SLI systems is still a challenging task that, despite its industrial relevance, has been under-explored in the academic literature
-
COVID-19: Comparative Analysis of Methods for Identifying Articles Related to Therapeutics and Vaccines without Using Labeled Data arXiv.cs.IR Pub Date : 2021-01-05 Mihir Parmar; Ashwin Karthik Ambalavanan; Hong Guan; Rishab Banerjee; Jitesh Pabla; Murthy Devarakonda
Here we proposed an approach to analyze text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to study six different transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that while a BERT model trained on search-engine results generally
-
SF-QA: Simple and Fair Evaluation Library for Open-domain Question Answering arXiv.cs.IR Pub Date : 2021-01-06 Xiaopeng Lu; Kyusong Lee; Tiancheng Zhao
Although open-domain question answering (QA) draws great attention in recent years, it requires large amounts of resources for building the full system and is often difficult to reproduce previous results due to complex configurations. In this paper, we introduce SF-QA: simple and fair evaluation framework for open-domain QA. SF-QA framework modularizes the pipeline open-domain QA system, which makes
-
Taxonomy Completion via Triplet Matching Network arXiv.cs.IR Pub Date : 2021-01-06 Jieyu Zhang; Xiangchen Song; Ying Zeng; Jiaze Chen; Jiaming Shen; Yuning Mao; Lei Li
Automatically constructing taxonomy finds many applications in e-commerce and web search. One critical challenge is that, as data and business scope grow in real applications, new concepts emerge and need to be added to the existing taxonomy. Previous approaches focus on the taxonomy expansion, i.e. finding an appropriate hypernym concept from the taxonomy for a new query concept. In this paper,
-
Contrastive Learning for Recommender System arXiv.cs.IR Pub Date : 2021-01-05 Zhuang Liu; Yunpu Ma; Yuanxin Ouyang; Zhang Xiong
Recommender systems, which analyze users' preference patterns to suggest potential targets, are indispensable in today's society. Collaborative Filtering (CF) is the most popular recommendation model. Specifically, Graph Neural Network (GNN) has become a new state-of-the-art for CF. In the GNN-based recommender system, message dropout is usually used to alleviate the selection bias in the user-item
-
Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization arXiv.cs.IR Pub Date : 2021-01-05 Jiamou Sun; Zhenchang Xing; Hao Guo; Deheng Ye; Xiaohong Li; Xiwei Xu; Liming Zhu
ExploitDB is one of the important public websites, which contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities have high- or critical-security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not even have CVEs. To assist in documenting CVEs for the ExploitDB posts, we
-
Presenting a Dataset for Collaborator Recommending Systems in Academic Social Network: a Case Study on ReseachGate arXiv.cs.IR Pub Date : 2020-12-29 Zahra Roozbahani; Jalal Rezaeenour; Roshan Shahrooei; Hanif Emamgholizadeh
Collaborator finding systems are a special type of expert finding models. There is a long-lasting challenge for research in the collaborator recommending research area, which is the lack of a structured dataset to be used by the researchers. We introduce two datasets to fill this gap. The first dataset is prepared for designing a consistent, collaborator finding system. The next one, called a co-author
-
Improving reference mining in patents with BERT arXiv.cs.IR Pub Date : 2021-01-04 Ken Voskuil; Suzan Verberne
References in patents to scientific literature provide relevant information for studying the relation between science and technological inventions. These references allow us to answer questions about the types of scientific work that leads to inventions. Most prior work analysing the citations between patents and scientific publications focussed on the front-page citations, which are well structured
-
Coreference Resolution in Research Papers from Multiple Domains arXiv.cs.IR Pub Date : 2021-01-04 Arthur Brack; Daniel Uwe Müller; Anett Hoppe; Ralph Ewerth
Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research
-
Scalable representation learning and retrieval for display advertising arXiv.cs.IR Pub Date : 2021-01-04 Olivier Koch; Amine Benhalloum; Guillaume Genthial; Denis Kuzin; Dmitry Parfenchik
Over the past decades, recommendation has become a critical component of many online services such as media streaming and e-commerce. Recent advances in algorithms, evaluation methods and datasets have led to continuous improvements of the state-of-the-art. However, much work remains to be done to make these methods scale to the size of the internet. Online advertising offers a unique testbed for recommendation
-
Recommending Accurate and Diverse Items Using Bilateral Branch Network arXiv.cs.IR Pub Date : 2021-01-04 Yile Liang; Tieyun Qian
Recommender systems have played a vital role in online platforms due to their ability to incorporate users' personal tastes. Beyond accuracy, diversity has been recognized as a key factor in recommendation to broaden users' horizons as well as to promote enterprises' sales. However, the trade-off between accuracy and diversity remains a big challenge, and the data and user biases have not been
-
An Elo-like System for Massive Multiplayer Competitions arXiv.cs.IR Pub Date : 2021-01-02 Aram Ebtekar; Paul Liu
Rating systems play an important role in competitive sports and games. They provide a measure of player skill, which incentivizes competitive performances and enables balanced match-ups. In this paper, we present a novel Bayesian rating system for contests with many participants. It is widely applicable to competition formats with discrete ranked matches, such as online programming competitions, obstacle
-
CRSLab: An Open-Source Toolkit for Building Conversational Recommender System arXiv.cs.IR Pub Date : 2021-01-04 Kun Zhou; Xiaolei Wang; Yuanhang Zhou; Chenzhan Shang; Yuan Cheng; Wayne Xin Zhao; Yaliang Li; Ji-Rong Wen
In recent years, conversational recommender system (CRS) has received much attention in the research community. However, existing studies on CRS vary in scenarios, goals and techniques, lacking unified, standardized implementation or comparison. To tackle this challenge, we propose an open-source CRS toolkit CRSLab, which provides a unified and extensible framework with highly-decoupled modules to
-
Searching Personalized $k$-wing in Large and Dynamic Bipartite Graphs arXiv.cs.IR Pub Date : 2021-01-04 Aman Abidi; Lu Chen; Rui Zhou; Chengfei Liu
There are extensive studies focusing on the application scenario that all the bipartite cohesive subgraphs need to be discovered in a bipartite graph. However, we observe that, for some applications, one is interested in finding bipartite cohesive subgraphs containing a specific vertex. In this paper, we study a new query dependent bipartite cohesive subgraph search problem based on $k$-wing model
-
A multi-modal approach towards mining social media data during natural disasters -- a case study of Hurricane Irma arXiv.cs.IR Pub Date : 2021-01-02 Somya D. Mohanty; Brown Biggers; Saed Sayedahmed; Nastaran Pourebrahim; Evan B. Goldstein; Rick Bunch; Guangqing Chi; Fereidoon Sadri; Tom P. McCoy; Arthur Cosby
Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use
-
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval arXiv.cs.IR Pub Date : 2021-01-02 Omar Khattab; Christopher Potts; Matei Zaharia
Multi-hop reasoning (i.e., reasoning across two or more documents) at scale is a key step toward NLP models that can exhibit broad world knowledge by leveraging large collections of documents. We propose Baleen, a system that improves the robustness and scalability of multi-hop reasoning over current approaches. Baleen introduces a per-hop condensed retrieval pipeline to mitigate the size of the search
-
Assessing Emoji Use in Modern Text Processing Tools arXiv.cs.IR Pub Date : 2021-01-02 Abu Awal Md Shoeb; Gerard de Melo
Emojis have become ubiquitous in digital communication, due to their visual appeal as well as their ability to vividly convey human emotion, among other factors. The growing prominence of emojis in social media and other instant messaging also leads to an increased need for systems and tools to operate on text containing emojis. In this study, we assess this support by considering test sets of tweets
-
Reader-Guided Passage Reranking for Open-Domain Question Answering arXiv.cs.IR Pub Date : 2021-01-01 Yuning Mao; Pengcheng He; Xiaodong Liu; Yelong Shen; Jianfeng Gao; Jiawei Han; Weizhu Chen
Current open-domain question answering (QA) systems often follow a Retriever-Reader (R2) architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer. In this paper, we propose a simple and effective passage reranking method, Reader-guIDEd Reranker (Rider), which does not involve any training and reranks the retrieved passages
-
De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of De-Identifiers arXiv.cs.IR Pub Date : 2021-01-01 Leibo Liu; Oscar Perez-Concha; Anthony Nguyen; Vicki Bennett; Louisa Jorm
Objective: Electronic Medical Records (EMRs) contain clinical narrative text that is of great potential value to medical researchers. However, this information is mixed with Protected Health Information (PHI) that presents risks to patient and clinician confidentiality. This paper presents an end-to-end de-identification framework to automatically remove PHI from hospital discharge summaries. Materials