• arXiv.cs.IR Pub Date : 2021-01-14
Ronak Pradeep; Rodrigo Nogueira; Jimmy Lin

We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains. At the core, our design relies on pretrained sequence-to-sequence models within a standard multi-stage ranking architecture. "Expando" refers to the use of document expansion techniques to enrich keyword representations

Updated: 2021-01-15
• arXiv.cs.IR Pub Date : 2021-01-13
Jialiang Han; Yun Ma

Mobile devices enable users to retrieve information at any time and any place. Considering the occasional requirements and fragmented usage patterns of mobile users, temporal recommendation techniques are proposed to improve the efficiency of information retrieval on mobile devices by accurately recommending items via learning temporal interests from short-term user interaction behaviors

Updated: 2021-01-15
• arXiv.cs.IR Pub Date : 2021-01-09
Sarah Alqurashi; Btool Hamoui; Abdulaziz Alashaikh; Ahmad Alhindi; Eisa Alanazi

The rapid growth of social media content during the current pandemic provides useful tools for disseminating information which has also become a root for misinformation. Therefore, there is an urgent need for fact-checking and effective techniques for detecting misinformation in social media. In this work, we study the misinformation in the Arabic content of Twitter. We construct a large Arabic dataset

Updated: 2021-01-15
• arXiv.cs.IR Pub Date : 2021-01-12
Guangneng Hu; Qiang Yang

We investigate how to solve the cross-corpus news recommendation for unseen users in the future. This is a problem where traditional content-based recommendation techniques often fail. Luckily, in real-world recommendation services, some publisher (e.g., Daily news) may have accumulated a large corpus with lots of consumers which can be used for a newly deployed publisher (e.g., Political news). To

Updated: 2021-01-15
• arXiv.cs.IR Pub Date : 2021-01-10
Shalini Pandey; Andrew Lan; George Karypis; Jaideep Srivastava

In recent years, Massive Open Online Courses (MOOCs) have witnessed immense growth in popularity. Now, due to the recent Covid19 pandemic situation, it is important to push the limits of online education. Discussion forums are primary means of interaction among learners and instructors. However, with growing class size, students face the challenge of finding useful and informative discussion forums

Updated: 2021-01-15
• arXiv.cs.IR Pub Date : 2021-01-14
Tommaso Dreossi (Amazon Search); Giorgio Ballardin (Amazon Search); Parth Gupta (Amazon Search); Jan Bakus (Amazon Search); Yu-Hsiang Lin (Amazon Search); Vamsi Salaka (Amazon Search)

The timed position of documents retrieved by learning to rank models can be seen as signals. Signals carry useful information such as drop or rise of documents over time or user behaviors. In this work, we propose to use the logic formalism called Signal Temporal Logic (STL) to characterize document behaviors in ranking according to the specified formulas. Our analysis shows that interesting document

Updated: 2021-01-15
• arXiv.cs.IR Pub Date : 2021-01-13
Chen Ma; Liheng Ma; Yingxue Zhang; Haolun Wu; Xue Liu; Mark Coates

Personalized recommender systems are increasingly important as more content and services become available and users struggle to identify what might interest them. Thanks to their ability to provide rich information, knowledge graphs (KGs) are being incorporated to enhance recommendation performance and interpretability. To effectively make use of the knowledge graph, we propose a recommendation

Updated: 2021-01-14
• arXiv.cs.IR Pub Date : 2021-01-13
Ziyang Liu; Zhaomeng Cheng; Yunjiang Jiang; Yue Shang; Wei Xiong; Sulong Xu; Bo Long; Di Jin

Result relevance prediction is an essential task of e-commerce search engines to boost the utility of search engines and ensure a smooth user experience. The last few years have witnessed a flurry of research on the use of Transformer-style models and deep text-match models to improve relevance. However, these two types of models ignored the inherent bipartite network structures that are ubiquitous in

Updated: 2021-01-14
• arXiv.cs.IR Pub Date : 2021-01-13
Chen Ma; Liheng Ma; Yingxue Zhang; Ruiming Tang; Xue Liu; Mark Coates

Personalized recommender systems are playing an increasingly important role as more content and services become available and users struggle to identify what might interest them. Although matrix factorization and deep learning based methods have proved effective in user preference modeling, they violate the triangle inequality and fail to capture fine-grained preference information. To tackle this

Updated: 2021-01-14
• arXiv.cs.IR Pub Date : 2021-01-13
Yunqi Li; Shuyuan Xu; Bo Liu; Zuohui Fu; Shuchang Liu; Xu Chen; Yongfeng Zhang

This paper proposes a discrete knowledge graph (KG) embedding (DKGE) method, which projects KG entities and relations into the Hamming space based on a computationally tractable discrete optimization algorithm, to solve the formidable storage and computation cost challenges in traditional continuous graph embedding methods. The convergence of DKGE can be guaranteed theoretically. Extensive experiments

Updated: 2021-01-14
• arXiv.cs.IR Pub Date : 2021-01-13
Michael Luby; Thomas Richardson

One of the primary objectives of a distributed storage system is to reliably store large amounts of source data for long durations using a large number $N$ of unreliable storage nodes, each with $c$ bits of storage capacity. Storage nodes fail randomly over time and are replaced with nodes of equal capacity initialized to zeroes, and thus bits are erased at some rate $e$. To maintain recoverability

Updated: 2021-01-14
• arXiv.cs.IR Pub Date : 2021-01-13

In our paper, we present Deep Learning models with a layer differentiated training method which were used for the SHARED TASK@ CONSTRAINT 2021 sub-tasks COVID19 Fake News Detection in English and Hostile Post Detection in Hindi. We propose a Layer Differentiated training procedure for training a pre-trained ULMFiT (arXiv:1801.06146) model. We used special tokens to annotate specific parts of the tweets

Updated: 2021-01-14
• arXiv.cs.IR Pub Date : 2021-01-12
Gustavo Penha; Claudia Hauff

According to the Probability Ranking Principle (PRP), ranking documents in decreasing order of their probability of relevance leads to an optimal document ranking for ad-hoc retrieval. The PRP holds when two conditions are met: [C1] the models are well calibrated, and, [C2] the probabilities of relevance are reported with certainty. We know however that deep neural networks (DNNs) are often not well

Updated: 2021-01-13
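
The ordering rule named in the preceding abstract (rank documents in decreasing order of their probability of relevance) can be sketched in a few lines. This is only an illustration of the Probability Ranking Principle itself, not of the paper's method; the function name `rank_by_probability` and the probability values are hypothetical, and the sketch simply assumes the estimates are well calibrated and certain, which is exactly what the abstract calls into question for deep neural networks.

```python
def rank_by_probability(relevance):
    """Return document ids sorted by decreasing estimated probability of relevance.

    `relevance` maps a document id to an estimated probability of relevance.
    The values are assumed to be calibrated and certain (conditions C1 and C2).
    """
    return sorted(relevance, key=relevance.get, reverse=True)


# Hypothetical estimates for three documents.
estimates = {"d1": 0.20, "d2": 0.90, "d3": 0.55}
ranking = rank_by_probability(estimates)
print(ranking)
```

If the estimates are miscalibrated or uncertain, this plain sort no longer has the optimality guarantee, which is the situation the abstract sets out to examine.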
• arXiv.cs.IR Pub Date : 2021-01-12
Chuhan Wu; Fangzhao Wu; Yongfeng Huang; Xing Xie

News recommendation is important for online news services. Precise user interest modeling is critical for personalized news recommendation. Existing news recommendation methods usually rely on the implicit feedback of users, such as news clicks, to model user interest. However, news clicks may not necessarily reflect user interests because users may click a news item due to the attraction of its title but feel

Updated: 2021-01-13
• arXiv.cs.IR Pub Date : 2021-01-12
Zhi Hong; J. Gregory Pauloski; Logan Ward; Kyle Chard; Ben Blaiszik; Ian Foster

Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of coronavirus research. We report here on a project that leverages both human

Updated: 2021-01-13
• arXiv.cs.IR Pub Date : 2021-01-12
Jiele Wu; Chau-Wai Wong; Xinyan Zhao; Xianpeng Liu

Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective features such as semantic connotation, online workers, known for optimizing their hourly earnings, tend to deteriorate in the quality of their responses as they

Updated: 2021-01-13
• arXiv.cs.IR Pub Date : 2021-01-12
Sirui Yao; Yoni Halpern; Nithum Thain; Xuezhi Wang; Kang Lee; Flavien Prost; Ed H. Chi; Jilin Chen; Alex Beutel

Imagine a food recommender system -- how would we check if it is causing and fostering unhealthy eating habits or merely reflecting users' interests? How much of a user's experience over time with a recommender is caused by the recommender system's choices and biases, and how much is based on the user's preferences and biases? Popularity bias and filter bubbles are two of the most well-studied

Updated: 2021-01-13
• arXiv.cs.IR Pub Date : 2021-01-12
Haim Kaplan; Jay Tenenbaum

Locality Sensitive Hashing (LSH) is an effective method of indexing a set of items to support efficient nearest neighbors queries in high-dimensional spaces. The basic idea of LSH is that similar items should produce hash collisions with higher probability than dissimilar items. We study LSH for (not necessarily convex) polygons, and use it to give efficient data structures for similar shape retrieval

Updated: 2021-01-13
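
The basic idea of LSH stated in the preceding abstract (similar items should collide with higher probability than dissimilar ones) can be illustrated with a toy example. The sketch below uses bit sampling over binary vectors under Hamming distance, a classic LSH family chosen here only for brevity; it is not the polygon scheme studied in the paper, and the helper names `make_bit_sampling_hash` and `collision_rate` are hypothetical.

```python
import random


def make_bit_sampling_hash(dim, k, rng):
    """Build one hash function that reads k randomly chosen bit positions."""
    positions = [rng.randrange(dim) for _ in range(k)]

    def h(bits):
        return tuple(bits[p] for p in positions)

    return h


def collision_rate(a, b, dim, k, trials, seed=0):
    """Fraction of independently drawn hash functions under which a and b collide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_bit_sampling_hash(dim, k, rng)
        if h(a) == h(b):
            hits += 1
    return hits / trials


dim = 16
base = [0] * dim
near = [1] + [0] * (dim - 1)      # differs from base in 1 bit
far = [1] * 12 + [0] * (dim - 12)  # differs from base in 12 bits

near_rate = collision_rate(base, near, dim, k=2, trials=2000)
far_rate = collision_rate(base, far, dim, k=2, trials=2000)
print(near_rate, far_rate)
```

For a single sampled bit the collision probability is one minus the normalized Hamming distance, so the near pair collides far more often than the far pair; an index would exploit this by bucketing items on their hash values and comparing a query only against items in the same bucket.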
• arXiv.cs.IR Pub Date : 2021-01-12
Dominic Widdows; Kirsty Kitto; Trevor Cohen

In the decade since 2010, successes in artificial intelligence have been at the forefront of computer science and technology, and vector space models have solidified a position at the forefront of artificial intelligence. At the same time, quantum computers have become much more powerful, and announcements of major advances are frequently in the news. The mathematical techniques underlying both these

Updated: 2021-01-13
• arXiv.cs.IR Pub Date : 2021-01-11
Yanqiao Zhu; Yichen Xu; Feng Yu; Qiang Liu; Shu Wu; Liang Wang

Click-through rate (CTR) prediction, which aims to predict the probability that a user will click on an item, is an essential task for many online applications. Due to the data sparsity and high dimensionality in CTR prediction, a key to making effective predictions is to model high-order feature interactions among feature fields. To explicitly model high-order feature interactions

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-10
Harsh Kohli

Many downstream NLP tasks have shown significant improvement through continual pre-training, transfer learning and multi-task learning. State-of-the-art approaches in Word Sense Disambiguation today benefit from some of these approaches in conjunction with information sources such as semantic relationships and gloss definitions contained within WordNet. Our work builds upon these systems and uses data

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-10
Yingqiang Ge; Shuchang Liu; Ruoyuan Gao; Yikun Xian; Yunqi Li; Xiangyu Zhao; Changhua Pei; Fei Sun; Junfeng Ge; Wenwu Ou; Yongfeng Zhang

As Recommender Systems (RS) influence more and more people in their daily life, the issue of fairness in recommendation is becoming more and more important. Most of the prior approaches to fairness-aware recommendation have been situated in a static or one-shot setting, where the protected groups of items are fixed, and the model provides a one-time fairness solution based on fairness-constrained optimization

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-09

Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users' lives. This paper addresses two research problems that are vital for developing effective personal mobile assistants: target apps selection and

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-09
Hanxiong Chen; Xu Chen; Shaoyun Shi; Yongfeng Zhang

Providing personalized explanations for recommendations can help users to understand the underlying insight of the recommendation results, which is helpful to the effectiveness, transparency, persuasiveness and trustworthiness of recommender systems. Current explainable recommendation models mostly generate textual explanations based on pre-defined sentence templates. However, the expressiveness power

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-09
Alexander B. Veretennikov

Proximity full-text search is commonly implemented in contemporary full-text search systems. Let us assume that the search query is a list of words. It is natural to consider a document as relevant if the queried words are near each other in the document. The proximity factor is even more significant for the case where the query consists of frequently occurring words. Proximity full-text search requires

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-09
Anurag Roy; Shalmoli Ghosh; Kripabandhu Ghosh; Saptarshi Ghosh

A large fraction of textual data available today contains various types of 'noise', such as OCR noise in digitized documents, noise due to informal writing style of users on microblogging sites, and so on. To enable tasks such as search/retrieval and classification over all the available data, we need robust algorithms for text normalization, i.e., for cleaning different kinds of noise in the text

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-11
Apurva Wani; Isha Joshi; Snehal Khandve; Vedangi Wagh; Raviraj Joshi

Social media platforms like Facebook, Twitter, and Instagram have enabled connection and communication on a large scale. It has revolutionized the rate at which information is shared and enhanced its reach. However, another side of the coin dictates an alarming story. These platforms have led to an increase in the creation and spread of fake news. The fake news has not only influenced people in the

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-11
Socratis Gkelios; Yiannis Boutalis; Savvas A. Chatzichristofis

This paper introduces a plug-and-play descriptor that can be effectively adopted for image retrieval tasks without prior initialization or preparation. The description method utilizes the recently proposed Vision Transformer network while it does not require any training data to adjust parameters. In image retrieval tasks, the use of Handcrafted global and local descriptors has been very successfully

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-10
Sayar Ghosh Roy; Nikhil Pinnaparaju; Risubh Jain; Manish Gupta; Vasudeva Varma

Automatic text summarization has been widely studied as an important task in natural language processing. Traditionally, various feature engineering and machine learning based systems have been proposed for extractive as well as abstractive text summarization. Recently, deep learning based, specifically Transformer-based systems have been immensely popular. Summarization is a cognitively challenging

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-08
Sayar Ghosh Roy; Ujwal Narayan; Tathagata Raha; Zubair Abid; Vasudeva Varma

Detecting and classifying instances of hate in social media text has been a problem of interest in Natural Language Processing in the recent years. Our work leverages state of the art Transformer language models to identify hate speech in a multilingual setting. Capturing the intent of a post or a comment on social media involves careful evaluation of the language style, semantic content and additional

Updated: 2021-01-12
• arXiv.cs.IR Pub Date : 2021-01-07
Yuhao Mao; Serguei A. Mokhov; Sudhir P. Mudur

Personalized recommendations are popular in these days of Internet driven activities, specifically shopping. Recommendation methods can be grouped into three major categories, content based filtering, collaborative filtering and machine learning enhanced. Information about products and preferences of different users are primarily used to infer preferences for a specific user. Inadequate information

Updated: 2021-01-11
• arXiv.cs.IR Pub Date : 2021-01-08
Hui Luo; Jingbo Zhou; Zhifeng Bao; Shuangli Li; J. Shane Culpepper; Haochao Ying; Hao Liu; Hui Xiong

Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and thereby are heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different

Updated: 2021-01-11
• arXiv.cs.IR Pub Date : 2021-01-08
Xiaohan Li; Mengqi Zhang; Shu Wu; Zheng Liu; Liang Wang; Philip S. Yu

Dynamic recommendation is essential for modern recommender systems to provide real-time predictions based on sequential data. In real-world scenarios, the popularity of items and interests of users change over time. Based on this assumption, many previous works focus on interaction sequences and learn evolutionary embeddings of users and items. However, we argue that sequence-based models are not able

Updated: 2021-01-11
• arXiv.cs.IR Pub Date : 2021-01-08
Iknoor Singh; Carolina Scarton; Kalina Bontcheva

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' online. Thus, the accurate retrieval of reliable relevant data from millions of documents about COVID-19 has become urgently needed for the general public as well as for other stakeholders. The COVID-19 Multilingual Information Access (MLIA) initiative is a joint effort to ameliorate exchange of COVID-19 related information

Updated: 2021-01-11
• arXiv.cs.IR Pub Date : 2020-12-15
Carlos Badenes-Olmedo; Jose-Luis Redondo García; Oscar Corcho

With the ongoing growth in number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multi-lingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform thematic explorations on collections of texts in multiple

Updated: 2021-01-11
• arXiv.cs.IR Pub Date : 2021-01-07
Marco Ferrante; Nicola Ferro; Norbert Fuhr

Recently, it was shown that most popular IR measures are not interval-scaled, implying that decades of experimental IR research used potentially improper methods, which may have produced questionable results. However, it was unclear if and to what extent these findings apply to actual evaluations and this opened a debate in the community with researchers standing on opposite positions about whether

Updated: 2021-01-08
• arXiv.cs.IR Pub Date : 2021-01-07
Bartłomiej Twardowski; Paweł Zawistowski; Szymon Zaborowski

Session-based recommenders, used for making predictions out of users' uninterrupted sequences of actions, are attractive for many applications. Here, for this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures dissimilarity between the provided sequence of users' events and the next action. We discuss and compare metric learning

Updated: 2021-01-08
• arXiv.cs.IR Pub Date : 2021-01-07
Francisco Segado-Boj; Juan Martin-Quevedo; Juan Jose Prieto-Gutierrez

This paper aims to gain a better understanding of the perspectives of contributors to Spanish academic journals regarding open access, open peer review, and altmetrics. It also explores how age, gender, professional experience, career history, and perception and use of social media influence authors' opinions toward these developments in scholarly publishing. A sample of contributors (n = 1254) to Spanish

Updated: 2021-01-08
• arXiv.cs.IR Pub Date : 2021-01-07
Ankush Chopra; Shruti Agrawal; Sohom Ghosh

Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic

Updated: 2021-01-08
• arXiv.cs.IR Pub Date : 2021-01-06
Yudhik Agrawal; Ramaguru Guru Ravi Shanker; Vinoo Alluri

The task of identifying emotions from a given music track has been an active pursuit in the Music Information Retrieval (MIR) community for years. Music emotion recognition has typically relied on acoustic features, social tags, and other metadata to identify and classify music emotions. The role of lyrics in music emotion recognition remains under-appreciated in spite of several studies reporting

Updated: 2021-01-07
• arXiv.cs.IR Pub Date : 2021-01-02
Ye Tian

We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational expectation-maximization (EM) algorithm was derived to estimate the posterior and parameters in MCTM. We introduced two potential applications of MCTM, including the paragraph-level document

Updated: 2021-01-07
• arXiv.cs.IR Pub Date : 2021-01-06
Furkan Yesiler; Emilio Molina; Joan Serrà; Emilia Gómez

The setlist identification (SLI) task addresses a music recognition use case where the goal is to retrieve the metadata and timestamps for all the tracks played in live music events. Due to various musical and non-musical changes in live performances, developing automatic SLI systems is still a challenging task that, despite its industrial relevance, has been under-explored in the academic literature

Updated: 2021-01-07
• arXiv.cs.IR Pub Date : 2021-01-05
Mihir Parmar; Ashwin Karthik Ambalavanan; Hong Guan; Rishab Banerjee; Jitesh Pabla; Murthy Devarakonda

Here we proposed an approach to analyze text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to study six different transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that while a BERT model trained on search-engine results generally

Updated: 2021-01-07
• arXiv.cs.IR Pub Date : 2021-01-06
Xiaopeng Lu; Kyusong Lee; Tiancheng Zhao

Although open-domain question answering (QA) has drawn great attention in recent years, it requires large amounts of resources for building the full system, and previous results are often difficult to reproduce due to complex configurations. In this paper, we introduce SF-QA: a simple and fair evaluation framework for open-domain QA. The SF-QA framework modularizes the pipeline open-domain QA system, which makes

Updated: 2021-01-07
• arXiv.cs.IR Pub Date : 2021-01-06
Jieyu Zhang; Xiangchen Song; Ying Zeng; Jiaze Chen; Jiaming Shen; Yuning Mao; Lei Li

Automatically constructing taxonomy finds many applications in e-commerce and web search. One critical challenge is as data and business scope grow in real applications, new concepts are emerging and needed to be added to the existing taxonomy. Previous approaches focus on the taxonomy expansion, i.e. finding an appropriate hypernym concept from the taxonomy for a new query concept. In this paper,

Updated: 2021-01-07
• arXiv.cs.IR Pub Date : 2021-01-05
Zhuang Liu; Yunpu Ma; Yuanxin Ouyang; Zhang Xiong

Recommender systems, which analyze users' preference patterns to suggest potential targets, are indispensable in today's society. Collaborative Filtering (CF) is the most popular recommendation model. Specifically, Graph Neural Network (GNN) has become a new state-of-the-art for CF. In the GNN-based recommender system, message dropout is usually used to alleviate the selection bias in the user-item

Updated: 2021-01-06
• arXiv.cs.IR Pub Date : 2021-01-05
Jiamou Sun; Zhenchang Xing; Hao Guo; Deheng Ye; Xiaohong Li; Xiwei Xu; Liming Zhu

ExploitDB is one of the important public websites, which contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities have high- or critical-security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not even have CVEs. To assist in documenting CVEs for the ExploitDB posts, we

Updated: 2021-01-06
• arXiv.cs.IR Pub Date : 2020-12-29
Zahra Roozbahani; Jalal Rezaeenour; Roshan Shahrooei; Hanif Emamgholizadeh

Collaborator finding systems are a special type of expert finding models. There is a long-lasting challenge for research in the collaborator recommending research area, which is the lack of a structured dataset to be used by the researchers. We introduce two datasets to fill this gap. The first dataset is prepared for designing a consistent, collaborator finding system. The next one, called a co-author

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-04
Ken Voskuil; Suzan Verberne

References in patents to scientific literature provide relevant information for studying the relation between science and technological inventions. These references allow us to answer questions about the types of scientific work that leads to inventions. Most prior work analysing the citations between patents and scientific publications focussed on the front-page citations, which are well structured

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-04
Arthur Brack; Daniel Uwe Müller; Anett Hoppe; Ralph Ewerth

Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-04
Olivier Koch; Amine Benhalloum; Guillaume Genthial; Denis Kuzin; Dmitry Parfenchik

Over the past decades, recommendation has become a critical component of many online services such as media streaming and e-commerce. Recent advances in algorithms, evaluation methods and datasets have led to continuous improvements of the state-of-the-art. However, much work remains to be done to make these methods scale to the size of the internet. Online advertising offers a unique testbed for recommendation

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-04
Yile Liang; Tieyun Qian

Recommender systems have played a vital role in online platforms due to their ability to incorporate users' personal tastes. Beyond accuracy, diversity has been recognized as a key factor in recommendation to broaden users' horizons as well as to promote enterprises' sales. However, the trade-off between accuracy and diversity remains a big challenge, and the data and user biases have not been

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-02
Aram Ebtekar; Paul Liu

Rating systems play an important role in competitive sports and games. They provide a measure of player skill, which incentivizes competitive performances and enables balanced match-ups. In this paper, we present a novel Bayesian rating system for contests with many participants. It is widely applicable to competition formats with discrete ranked matches, such as online programming competitions, obstacle

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-04
Kun Zhou; Xiaolei Wang; Yuanhang Zhou; Chenzhan Shang; Yuan Cheng; Wayne Xin Zhao; Yaliang Li; Ji-Rong Wen

In recent years, conversational recommender system (CRS) has received much attention in the research community. However, existing studies on CRS vary in scenarios, goals and techniques, lacking unified, standardized implementation or comparison. To tackle this challenge, we propose an open-source CRS toolkit CRSLab, which provides a unified and extensible framework with highly-decoupled modules to

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-04
Aman Abidi; Lu Chen; Rui Zhou; Chengfei Liu

There are extensive studies focusing on the application scenario that all the bipartite cohesive subgraphs need to be discovered in a bipartite graph. However, we observe that, for some applications, one is interested in finding bipartite cohesive subgraphs containing a specific vertex. In this paper, we study a new query dependent bipartite cohesive subgraph search problem based on $k$-wing model

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-02
Somya D. Mohanty; Brown Biggers; Saed Sayedahmed; Nastaran Pourebrahim; Evan B. Goldstein; Rick Bunch; Guangqing Chi; Fereidoon Sadri; Tom P. McCoy; Arthur Cosby

Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-02
Omar Khattab; Christopher Potts; Matei Zaharia

Multi-hop reasoning (i.e., reasoning across two or more documents) at scale is a key step toward NLP models that can exhibit broad world knowledge by leveraging large collections of documents. We propose Baleen, a system that improves the robustness and scalability of multi-hop reasoning over current approaches. Baleen introduces a per-hop condensed retrieval pipeline to mitigate the size of the search

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-02
Abu Awal Md Shoeb; Gerard de Melo

Emojis have become ubiquitous in digital communication, due to their visual appeal as well as their ability to vividly convey human emotion, among other factors. The growing prominence of emojis in social media and other instant messaging also leads to an increased need for systems and tools to operate on text containing emojis. In this study, we assess this support by considering test sets of tweets

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-01
Yuning Mao; Pengcheng He; Xiaodong Liu; Yelong Shen; Jianfeng Gao; Jiawei Han; Weizhu Chen

Current open-domain question answering (QA) systems often follow a Retriever-Reader (R2) architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer. In this paper, we propose a simple and effective passage reranking method, Reader-guIDEd Reranker (Rider), which does not involve any training and reranks the retrieved passages

Updated: 2021-01-05
• arXiv.cs.IR Pub Date : 2021-01-01
Leibo Liu; Oscar Perez-Concha; Anthony Nguyen; Vicki Bennett; Louisa Jorm

Objective: Electronic Medical Records (EMRs) contain clinical narrative text that is of great potential value to medical researchers. However, this information is mixed with Protected Health Information (PHI) that presents risks to patient and clinician confidentiality. This paper presents an end-to-end de-identification framework to automatically remove PHI from hospital discharge summaries. Materials

Updated: 2021-01-05
Contents have been reproduced by permission of the publishers.