• Data Min. Knowl. Discov. (IF 2.879) Pub Date : 2020-01-17
Florian Adriaens, Tijl De Bie, Aristides Gionis, Jefrey Lijffijt, Antonis Matakos, Polina Rozenshtein

Social networks often provide only a binary perspective on social ties: two individuals are either connected or not. While sometimes external information can be used to infer the strength of social ties, access to such information may be restricted or impractical to obtain. Sintos and Tsaparas (KDD 2014) first suggested inferring the strength of social ties from the topology of the network alone, by leveraging the Strong Triadic Closure (STC) property. The STC property states that if person A has strong social ties with persons B and C, B and C must be connected to each other as well (whether with a weak or strong tie). They exploited this property to formulate the inference of the strength of social ties as an NP-hard maximization problem, and proposed two approximation algorithms. We refine and improve this line of work, by developing a sequence of linear relaxations of the problem, which can be solved exactly in polynomial time. Usefully, these relaxations infer more fine-grained levels of tie strength (beyond strong and weak), which also allows one to avoid making arbitrary strong/weak strength assignments when the network topology provides inconclusive evidence. Moreover, these relaxations allow us to easily change the objective function to more sensible alternatives, instead of simply maximizing the number of strong edges. An extensive theoretical analysis leads to two efficient algorithmic approaches. Finally, our experimental results elucidate the strengths of the proposed approach, while at the same time questioning the validity of leveraging the STC property for edge strength inference in practice.
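
To make the STC property stated in this abstract concrete, here is a minimal Python sketch that lists the open triads violating it for a given strong/weak labeling. It is only an illustration of the property, not the authors' relaxation or either approximation algorithm; the function name `stc_violations` and the toy graph are hypothetical.

```python
from itertools import combinations


def stc_violations(edges, strong):
    """Return triads (a, b, c) where a-b and a-c are strong but b-c is absent.

    edges  : iterable of undirected edges (u, v)
    strong : iterable of edges labeled strong (all other edges count as weak)
    """
    # Build an adjacency map for the undirected graph.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    strong_set = {frozenset(e) for e in strong}
    violations = []
    for a, nbrs in adj.items():
        # Neighbours of a that are reached through a strong tie.
        strong_nbrs = sorted(n for n in nbrs if frozenset((a, n)) in strong_set)
        for b, c in combinations(strong_nbrs, 2):
            # STC requires b and c to be connected (weak or strong).
            if c not in adj.get(b, set()):
                violations.append((a, b, c))
    return violations


# Hypothetical toy graph: A has strong ties to B and C, but B-C is missing.
edges = [("A", "B"), ("A", "C"), ("B", "D")]
print(stc_violations(edges, strong=[("A", "B"), ("A", "C")]))
```

A labeling satisfies STC exactly when this list is empty; adding the edge B-C (with any strength) to the toy graph removes the violation.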

Updated: 2020-01-17
• arXiv.cs.SD Pub Date : 2020-01-15
Andrea Valenti; Antonio Carta; Davide Bacciu

We address the challenging open problem of learning an effective latent space for symbolic music data in generative music modeling. We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information concerning music genre and style. Throughout the paper, we show how Gaussian mixtures taking into account music metadata information can be used as an effective prior for the autoencoder latent space, introducing the first Music Adversarial Autoencoder (MusAE). The empirical analysis on a large-scale benchmark shows that our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders. It is also able to create realistic interpolations between two musical sequences, smoothly changing the dynamics of the different tracks. Experiments show that the model can organise its latent space according to low-level properties of the musical pieces, and embed into the latent variables the high-level genre information injected from the prior distribution to increase its overall performance. This allows us to perform changes to the generated pieces in a principled way.

Updated: 2020-01-17
• arXiv.cs.SD Pub Date : 2020-01-15
Huy Phan; Ian V. McLoughlin; Lam Pham; Oliver Y. Chén; Philipp Koch; Maarten De Vos; Alfred Mertins

Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. Most, if not all, existing speech enhancement GANs (SEGANs) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose two novel SEGAN frameworks, iterated SEGAN (ISEGAN) and deep SEGAN (DSEGAN). In the two proposed frameworks, the GAN architectures are composed of multiple generators that are chained to accomplish multiple-stage enhancement mapping which gradually refines the noisy input signals in stage-wise fashion. On the one hand, ISEGAN's generators share their parameters to learn an iterative enhancement mapping. On the other hand, DSEGAN's generators share a common architecture but their parameters are independent; as a result, different enhancement mappings are learned at different stages of the network. We empirically demonstrate favorable results obtained by the proposed ISEGAN and DSEGAN frameworks over the vanilla SEGAN. The source code is available at http://github.com/pquochuy/idsegan.

Updated: 2020-01-17
• arXiv.cs.SD Pub Date : 2020-01-16
Bohan Zhai; Tianren Gao; Flora Xue; Daniel Rothchild; Bichen Wu; Joseph E. Gonzalez; Kurt Keutzer

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs. Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.

Updated: 2020-01-17
• arXiv.cs.SD Pub Date : 2020-01-16
Chunyi Wang

A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed to address the low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features in speech emotion recognition. First, handcrafted and deep automatic features are extracted from existing Chinese and English emotional speech data. Then, the various features are fused within each language. Finally, the fused features of the different languages are fused again and used to train a classification model. Comparing the fused features with the unfused ones, the results show that the fused features significantly enhance the accuracy of the speech emotion recognition algorithm. The proposed solution is evaluated on two Chinese corpora and two English corpora, and is shown to provide more accurate predictions than the original solution. As a result of this study, the multi-feature and multi-lingual fusion algorithm can significantly improve speech emotion recognition accuracy when the dataset is small.

Updated: 2020-01-17
• arXiv.cs.MM Pub Date : 2020-01-12
Yiyan Chen; Li Tao; Xueting Wang; Toshihiko Yamasaki

Conventional video summarization approaches based on reinforcement learning have the problem that the reward can only be received after the whole summary is generated. Such kind of reward is sparse and it makes reinforcement learning hard to converge. Another problem is that labelling each frame is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework, which decomposes the whole task into several subtasks to enhance the summarization quality. This framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal only by a task-level binary label, which requires much fewer labels than conventional approaches. With the guide of the subgoal, the worker predicts the importance scores for video frames in the subtask by policy gradient according to both global reward and innovative defined sub-rewards to overcome the sparse problem. Experiments on two benchmark datasets show that our proposal has achieved the best performance, even better than supervised approaches.

Updated: 2020-01-17
• arXiv.cs.MM Pub Date : 2020-01-16
Viet Duong; Phu Pham; Ritwik Bose; Jiebo Luo

Recently, the emergence of the #MeToo trend on social media has empowered thousands of people to share their own sexual harassment experiences. This viral trend, in conjunction with the massive personal information and content available on Twitter, presents a promising opportunity to extract data driven insights to complement the ongoing survey based studies about sexual harassment in college. In this paper, we analyze the influence of the #MeToo trend on a pool of college followers. The results show that the majority of topics embedded in those #MeToo tweets detail sexual harassment stories, and there exists a significant correlation between the prevalence of this trend and official reports on several major geographical regions. Furthermore, we discover the outstanding sentiments of the #MeToo tweets using deep semantic meaning representations and their implications on the affected users experiencing different types of sexual harassment. We hope this study can raise further awareness regarding sexual misconduct in academia.

Updated: 2020-01-17
• arXiv.cs.MM Pub Date : 2019-07-22
Quang-Trung Luu; Sylvaine Kerboeuf; Alexandre Mouradian; Michel Kieffer

With network slicing in 5G networks, Mobile Network Operators can create various slices for Service Providers (SPs) to accommodate customized services. Usually, the various Service Function Chains (SFCs) belonging to a slice are deployed on a best-effort basis. Nothing ensures that the Infrastructure Provider (InP) will be able to allocate enough resources to cope with the increasing demands of some SP. Moreover, in many situations, slices have to be deployed over some geographical area: coverage as well as minimum per-user rate constraints must then be taken into account. This paper takes the InP perspective and proposes a slice resource provisioning approach to cope with multiple slice demands in terms of computing, storage, coverage, and rate constraints. The resource requirements of the various SFCs within a slice are aggregated within a graph of Slice Resource Demands (SRD). Infrastructure nodes and links must then be provisioned so as to satisfy all SRDs. This problem leads to a Mixed Integer Linear Programming formulation. A two-step approach is considered, with several variants, depending on whether the constraints of each slice to be provisioned are taken into account sequentially or jointly. Once provisioning has been performed, any slice deployment strategy may be considered on the reduced-size infrastructure graph on which resources have been provisioned. Simulation results demonstrate the effectiveness of the proposed approach compared to a more classical direct slice embedding approach.

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2020-01-15
Anant Khandelwal; Niraj Kumar

Wide usage of social media platforms has increased the risk of aggression, which results in mental stress and affects people's lives negatively, for example through psychological agony, fighting behavior, and disrespect to others. The majority of such conversations contain code-mixed languages [28]. Additionally, the way of expressing thoughts, or communication style, also changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). All of these have increased the complexity of the problem. To solve these problems, we have introduced a unified and robust multi-modal deep learning architecture which works for both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system takes the decision based on model averaging. We evaluated our system on the English Code-Mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all the previous approaches on both the English code-mixed dataset and the uni-lingual English dataset.

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2020-01-16
Antoine Gourru; Adrien Guille; Julien Velcin; Julien Jacques

We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g. citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identifying relevant keywords to describe document classes.
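
The two steps described in this abstract (a word-vector average per document, then an adjustment using pairwise similarities) can be sketched as follows. This is a hedged illustration only: the mixing rule, the weight `lam`, and the function names are assumptions made for the example, not the RLE formulation from the paper.

```python
def mean_vector(vectors):
    """Average a non-empty list of equal-length word vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def smooth_with_similarities(doc_vecs, sim, lam=0.5):
    """Pull each document vector toward a similarity-weighted mean of the others.

    doc_vecs : list of document vectors (e.g. word-vector averages)
    sim      : square matrix, sim[i][j] >= 0 is the similarity of docs i and j
    lam      : assumed mixing weight in [0, 1]; 0 keeps the plain average
    """
    smoothed = []
    for i, own in enumerate(doc_vecs):
        total = sum(sim[i])
        if total == 0:
            # No similar documents: keep the plain average unchanged.
            smoothed.append(list(own))
            continue
        neighbour_mean = [
            sum(sim[i][j] * doc_vecs[j][d] for j in range(len(doc_vecs))) / total
            for d in range(len(own))
        ]
        smoothed.append(
            [(1 - lam) * o + lam * n for o, n in zip(own, neighbour_mean)]
        )
    return smoothed


# Hypothetical toy data: two documents that cite each other.
docs = [mean_vector([[1.0, 0.0], [1.0, 0.0]]), mean_vector([[0.0, 1.0]])]
print(smooth_with_similarities(docs, sim=[[0, 1], [1, 0]], lam=0.5))
```

With `lam=0` the output is the plain average; larger values let linked documents share more of their representation.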

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2020-01-16
Matteo Allaix; Lukas Holzbaur; Tefjol Pllaha; Camilla Hollanti

In the classical private information retrieval (PIR) setup, a user wants to retrieve a file from a database or a distributed storage system (DSS) without revealing the file identity to the servers holding the data. In the quantum PIR (QPIR) setting, a user privately retrieves a classical file by downloading quantum systems from the servers. The QPIR problem has been treated by Song et al. in the case of replicated servers, both without collusion and with all but one server colluding. In this paper, the QPIR setting is extended to account for MDS-coded servers. The proposed protocol works for any [n,k]-MDS code and t-collusion with t = n - k. Similarly to the previous cases, the rates achieved are better than those known or conjectured in the classical counterparts.

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2020-01-16
Tong Zeng; Longfeng Wu; Sarah Bratt; Daniel E. Acuna

A citation is a well-established mechanism for connecting scientific artifacts. Citation networks are used by citation analysis for a variety of reasons, prominently to give credit to scientists' work. However, because of current citation practices, scientists tend to cite only publications, leaving out other types of artifacts such as datasets. Datasets then do not get appropriate credit even though they are increasingly reused and experimented with. We develop a network flow measure, called DataRank, aimed at solving this gap. DataRank assigns a relative value to each node in the network based on how citations flow through the graph, differentiating publication and dataset flow rates. We evaluate the quality of DataRank by estimating its accuracy at predicting the usage of real datasets: web visits to GenBank and downloads of Figshare datasets. We show that DataRank is better at predicting this usage compared to alternatives while offering additional interpretable outcomes. We discuss improvements to citation behavior and algorithms to properly track and assign credit to datasets.

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2020-01-16
Islam Samy; Mohamed A. Attia; Ravi Tandon; Loukas Lazos

In many applications, content accessed by users (movies, videos, news articles, etc.) can leak sensitive latent attributes, such as religious and political views, sexual orientation, ethnicity, gender, and others. To prevent such information leakage, the goal of classical PIR is to hide the identity of the content/message being accessed, which subsequently also hides the latent attributes. This solution, while private, can be too costly, particularly, when perfect (information-theoretic) privacy constraints are imposed. For instance, for a single database holding $K$ messages, privately retrieving one message is possible if and only if the user downloads the entire database of $K$ messages. Retrieving content privately, however, may not be necessary to perfectly hide the latent attributes. Motivated by the above, we formulate and study the problem of latent-variable private information retrieval (LV-PIR), which aims at allowing the user efficiently retrieve one out of $K$ messages (indexed by $\theta$) without revealing any information about the latent variable (modeled by $S$). We focus on the practically relevant setting of a single database and show that one can significantly reduce the download cost of LV-PIR (compared to the classical PIR) based on the correlation between $\theta$ and $S$. We present a general scheme for LV-PIR as a function of the statistical relationship between $\theta$ and $S$, and also provide new results on the capacity/download cost of LV-PIR. Several open problems and new directions are also discussed.

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2019-07-24
Muhammad Umer Anwaar; Dmytro Rybalko; Martin Kleinsteuber

Improved search quality enhances users' satisfaction, which directly impacts sales growth of an E-Commerce (E-Com) platform. Traditional Learning to Rank (LTR) algorithms require relevance judgments on products. In E-Com, getting such judgments poses an immense challenge. In the literature, it is proposed to employ user feedback (add-to-basket (AtB) clicks, orders etc) to generate relevance judgments. It is done in two steps: first, query-product pair data are aggregated from the logs and then order rate etc are calculated for each pair in the logs. In this paper, we advocate counterfactual risk minimization (CRM) approach which circumvents the need of relevance judgements, data aggregation and is better suited for learning from logged data, i.e. contextual bandit feedback. Due to unavailability of public E-Com LTR dataset, we provide \textit{Commercial dataset} from our platform. It contains more than 10 million AtB click logs and 1 million order logs from a catalogue of about 3.5 million products associated with 3060 queries. To the best of our knowledge, this is the first work which examines effectiveness of CRM approach in learning ranking model from real-world logged data. Our empirical evaluation shows that CRM approach learns effectively from logged data and beats strong baseline ranker ($\lambda$-MART) by a huge margin. Our method outperforms full-information loss (e.g. cross-entropy) on various deep neural network models. These findings show that by adopting CRM approach, E-Com platforms can improve product search quality without artificially mending the data to fit in the traditional LTR algorithms.

Updated: 2020-01-17
• arXiv.cs.IR Pub Date : 2019-12-06
Shuqi Xu; Qianming Zhang; Linyuan Lv; Manuel Sebastian Mariani

Over the past decade, many startups have sprung up, which create a huge demand for financial support from venture investors. However, due to the information asymmetry between investors and companies, the financing process is usually challenging and time-consuming, especially for the startups that have not yet obtained any investment. Because of this, effective data-driven techniques to automatically match startups with potentially relevant investors would be highly desirable. Here, we analyze 34,469 valid investment events collected from www.itjuzi.com and consider the cold-start problem of recommending investors for new startups. We address this problem by constructing different tripartite network representations of the data where nodes represent investors, companies, and companies' domains. First, we find that investors have strong domain preferences when investing, which motivates us to introduce virtual links between investors and investment domains in the tripartite network construction. Our analysis of the recommendation performance of diffusion-based algorithms applied to various network representations indicates that prospective investors for new startups are effectively revealed by integrating network diffusion processes with investors' domain preference.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-16
Yuan Liang; Hsuan-Wei Fan; Zhujun Fang; Leiying Miao; Wen Li; Xuan Zhang; Weibin Sun; Kun Wang; Lei He; Xiang Anthony Chen

Due to a lack of medical resources or oral health awareness, oral diseases are often left unexamined and untreated, affecting a large population worldwide. With the advent of low-cost, sensor-equipped smartphones, mobile apps offer a promising possibility for promoting oral health. However, to the best of our knowledge, no mobile health (mHealth) solutions can directly support a user to self-examine their oral health condition. This paper presents OralCam, the first interactive app that enables end-users' self-examination of five common oral conditions (diseases or early disease signals) by taking smartphone photos of one's oral cavity. OralCam allows a user to annotate additional information (e.g. living habits, pain, and bleeding) to augment the input image, and presents the output hierarchically, probabilistically and with visual explanations to help a lay user understand examination results. Developed on our in-house dataset that consists of 3,182 oral photos annotated by dental experts, our deep learning based framework achieved an average detection sensitivity of 0.787 over five conditions with high localization accuracy. In a week-long in-the-wild user study (N=18), most participants had no trouble using OralCam and interpreting the examination results. Two expert interviews further validate the feasibility of OralCam for promoting users' awareness of oral health.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-16
Chunggi Lee; Sanghoon Kim; Dongyun Han; Hongjun Yang; Young-Woo Park; Bum Chul Kwon; Sungahn Ko

Users may face challenges while designing graphical user interfaces, due to a lack of relevant experience and guidance. This paper aims to investigate the issues that users with no experience face during the design process, and how to resolve them. To this end, we conducted semi-structured interviews, based on which we built a GUI prototyping assistance tool called GUIComp. This tool can be connected to GUI design software as an extension, and it provides real-time, multi-faceted feedback on a user's current design. Additionally, we conducted two user studies, in which we asked participants to create mobile GUIs with or without GUIComp, and requested online workers to assess the created GUIs. The experimental results show that GUIComp facilitated iterative design and that the participants with GUIComp had a better user experience and produced more acceptable designs than those without it.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-16
A. Asadipour; K. Debattista; V. Patel; A. Chalmers

Computer-assisted multimodal training is an effective way of learning complex motor skills in various applications. In particular disciplines (e.g., healthcare), incompetency in performing dexterous hands-on examinations (clinical palpation) may result in misdiagnosis of symptoms, serious injuries or even death. Furthermore, a high quality clinical examination can help to exclude significant pathology, and reduce time and cost of diagnosis by eliminating the need for unnecessary medical imaging. Medical palpation is used regularly as an effective preliminary diagnosis method all around the world, but years of training are currently required to achieve competency. This paper focuses on a multimodal palpation training system to teach and improve clinical examination skills in relation to the abdomen. It is our aim to significantly shorten the palpation training duration by increasing the frequency of rehearsals as well as providing essential augmented feedback on how to perform various abdominal palpation techniques which has been captured and modelled from medical experts. Twenty-three first-year medical students divided into a control group (n=8), a semi-visually trained group (n=8), and a fully visually trained group (n=7) were invited to perform three palpation tasks (superficial, deep and liver). The medical students' performances were assessed using both computer-based and human-based methods, where a positive correlation was shown between the generated scores, r=.62, p(one-tailed)<.05. The visually trained group, which was provided with abstract visualisation of applied forces and their palmar locations during each palpation examination, significantly outperformed the control group (p<.05). Moreover, a positive trend was observed between groups when visual feedback was presented, J=132, z=2.62, r=0.55.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-16

Human bodies influence the owners' affect through posture, facial expressions, and movement. It remains unclear whether similar links between virtual bodies and affect exist. Such links could present design opportunities for virtual environments and advance our understanding of fundamental concepts of embodied VR. An initial outside-the-lab between-subjects study using commodity equipment presented 207 participants with seven avatar manipulations, related to posture, facial expression, and speed. We conducted a lab-based between-subjects study using high-end VR equipment with 41 subjects to clarify affect's impact on body ownership. The results show that some avatar manipulations can subtly influence affect. Study I found that facial manipulations emerged as most effective in this regard, particularly for positive affect. Also, body ownership showed a moderating influence on affect: in Study I body ownership varied with valence but not with arousal, and Study II showed body ownership to vary with positive but not with negative affect.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-11
Richard Savery; Ryan Rose; Gil Weinberg

As human-robot collaboration opportunities continue to expand, trust becomes ever more important for full engagement and utilization of robots. Affective trust, built on emotional relationship and interpersonal bonds is particularly critical as it is more resilient to mistakes and increases the willingness to collaborate. In this paper we present a novel model built on music-driven emotional prosody and gestures that encourages the perception of a robotic identity, designed to avoid uncanny valley. Symbolic musical phrases were generated and tagged with emotional information by human musicians. These phrases controlled a synthesis engine playing back pre-rendered audio samples generated through interpolation of phonemes and electronic instruments. Gestures were also driven by the symbolic phrases, encoding the emotion from the musical phrase to low degree-of-freedom movements. Through a user study we showed that our system was able to accurately portray a range of emotions to the user. We also showed with a significant result that our non-linguistic audio generation achieved an 8% higher mean of average trust than using a state-of-the-art text-to-speech system.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-14
Vivian Lai; Han Liu; Chenhao Tan

To support human decision making with machine learning models, we often need to elucidate patterns embedded in the models that are unsalient, unknown, or counterintuitive to humans. While existing approaches focus on explaining machine predictions with real-time assistance, we explore model-driven tutorials to help humans understand these patterns in a training phase. We consider both tutorials with guidelines from scientific papers, analogous to current practices of science communication, and automatically selected examples from training data with explanations. We use deceptive review detection as a testbed and conduct large-scale, randomized human-subject experiments to examine the effectiveness of such tutorials. We find that tutorials indeed improve human performance, with and without real-time assistance. In particular, although deep learning provides superior predictive performance than simple models, tutorials and explanations from simple models are more useful to humans. Our work suggests future directions for human-centered tutorials and explanations towards a synergy between humans and AI.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2020-01-16
Patrick Rodler

We challenge existing query-based ontology fault localization methods wrt. assumptions they make, criteria they optimize, and interaction means they use. We find that their efficiency depends largely on the behavior of the interacting expert, that performed calculations can be inefficient or imprecise, and that used optimization criteria are often not fully realistic. As a remedy, we suggest a novel (and simpler) interaction approach which overcomes all identified problems and, in comprehensive experiments on faulty real-world ontologies, enables a successful fault localization while requiring fewer expert interactions in 66 % of the cases, and always at least 80 % less expert waiting time, compared to existing methods.

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2019-07-19

Recently, brain-computer interface (BCI) systems developed based on steady-state visual evoked potential (SSVEP) have attracted much attention due to their high information transfer rate (ITR) and increasing number of targets. However, SSVEP-based methods can be improved in terms of their accuracy and target detection time. We propose a new method based on canonical correlation analysis (CCA) to integrate subject-specific models and subject-independent information and enhance BCI performance. We propose to use training data of other subjects to optimize hyperparameters for CCA-based model of a specific subject. An ensemble version of the proposed method is also developed for a fair comparison with ensemble task-related component analysis (TRCA). The proposed method is compared with TRCA and extended CCA methods. A publicly available, 35-subject SSVEP benchmark dataset is used for comparison studies and performance is quantified by classification accuracy and ITR. The ITR of the proposed method is higher than those of TRCA and extended CCA. The proposed method outperforms extended CCA in all conditions and TRCA for time windows greater than 0.3 s. The proposed method also outperforms TRCA when there are limited training blocks and electrodes. This study illustrates that adding subject-independent information to subject-specific models can improve performance of SSVEP-based BCIs.
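
Since this abstract reports performance as information transfer rate (ITR), a short sketch of how ITR is commonly computed in the BCI literature may help. It uses the widely used Wolpaw definition; the abstract does not state which ITR definition the authors used, so treat the formula choice, the function name, and the example numbers as assumptions.

```python
import math


def itr_bits_per_minute(n_targets, accuracy, seconds_per_selection):
    """Wolpaw-style ITR: bits per selection scaled to bits per minute.

    n_targets             : number of selectable targets (N >= 2)
    accuracy              : classification accuracy P in [0, 1]
    seconds_per_selection : time needed for one selection, in seconds
    """
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if seconds_per_selection <= 0:
        raise ValueError("seconds_per_selection must be positive")

    p = accuracy
    bits = math.log2(n_targets)
    # The two entropy terms vanish in the limits p -> 0 and p -> 1.
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits * 60.0 / seconds_per_selection


# Hypothetical numbers: 4 targets, 90% accuracy, 1.5 s per selection.
print(round(itr_bits_per_minute(4, 0.9, 1.5), 2))
```

Under this definition, shorter time windows raise ITR only if accuracy does not drop too much, which is why accuracy and detection time are reported together.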

Updated: 2020-01-17
• arXiv.cs.HC Pub Date : 2019-12-13
Daniel Karl I. Weidele; Justin D. Weisz; Eno Oduor; Michael Muller; Josh Andres; Alexander Gray; Dakuo Wang

Artificial Intelligence (AI) can now automate the algorithm selection, feature engineering, and hyperparameter tuning steps in a machine learning workflow. Commonly known as AutoML or AutoAI, these technologies aim to relieve data scientists from the tedious manual work. However, today's AutoAI systems often present only limited to no information about the process of how they select and generate model results. Thus, users often do not understand the process, neither do they trust the outputs. In this short paper, we provide a first user evaluation by 10 data scientists of an experimental system, AutoAIViz, that aims to visualize AutoAI's model generation process. We find that the proposed system helps users to complete the data science tasks, and increases their understanding, toward the goal of increasing trust in the AutoAI system.

Updated: 2020-01-17
• arXiv.cs.GR Pub Date : 2016-06-22
Barak Sober; David Levin

In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.

Updated: 2020-01-17
• arXiv.cs.GR Pub Date : 2018-11-16
Marta Nuñez-Garcia; Gabriel Bernardino; Francisco Alarcón; Gala Caixal; Lluís Mont; Oscar Camara; Constantine Butakoff

Two-dimensional representation of 3D anatomical structures is a simple and intuitive way for analysing patient information across populations and image modalities. It also allows convenient visualizations that can be included in clinical reports for a fast overview of the whole structure. While cardiac ventricles, especially the left ventricle, have an established standard representation (e.g. bull's eye plot), the 2D depiction of the left atrium (LA) is challenging due to its sub-structural complexity including the pulmonary veins (PV) and the left atrial appendage (LAA). Quasi-conformal flattening techniques, successfully applied to cardiac ventricles, require additional constraints in the case of the LA to place the PV and LAA in the same geometrical 2D location for different cases. Some registration-based methods have been proposed but 3D (or 2D) surface registration is time-consuming and prone to errors if the geometries are very different. We propose a novel atrial flattening methodology where a quasi-conformal 2D map of the LA is obtained quickly and without errors related to registration. In our approach, the LA is divided into 5 regions which are then mapped to their analogue two-dimensional regions. A dataset of 67 human left atria from magnetic resonance images (MRI) was studied to derive a population-based 2D LA template representing the averaged relative locations of the PVs and LAA. The clinical application of the proposed methodology is illustrated on different use cases including the integration of MRI and electroanatomical data.

Updated: 2020-01-17
• arXiv.cs.FL Pub Date : 2020-01-15
Lars Jaffke; Mateus de Oliveira Oliveira; Hans Raj Tiwary

It can be shown that each permutation group $G \sqsubseteq S_n$ can be embedded, in a well defined sense, in a connected graph with $O(n+|G|)$ vertices. Some groups, however, require far fewer vertices. For instance, $S_n$ itself can be embedded in the $n$-clique $K_n$, a connected graph with $n$ vertices. In this work, we show that the minimum size of a context-free grammar generating a finite permutation group $G \sqsubseteq S_n$ can be upper bounded by three structural parameters of connected graphs embedding $G$: the number of vertices, the treewidth, and the maximum degree. More precisely, we show that any permutation group $G \sqsubseteq S_n$ that can be embedded into a connected graph with $m$ vertices, treewidth $k$, and maximum degree $\Delta$, can also be generated by a context-free grammar of size $2^{O(k\Delta\log\Delta)}\cdot m^{O(k)}$. By combining our upper bound with a connection between the extension complexity of a permutation group and the grammar complexity of a formal language, we also get that these permutation groups can be represented by polytopes of extension complexity $2^{O(k \Delta\log \Delta)}\cdot m^{O(k)}$. The above upper bounds can be used to provide trade-offs between the index of permutation groups, and the number of vertices, treewidth and maximum degree of connected graphs embedding these groups. In particular, by combining our main result with a celebrated $2^{\Omega(n)}$ lower bound on the grammar complexity of the symmetric group $S_n$ we have that connected graphs of treewidth $o(n/\log n)$ and maximum degree $o(n/\log n)$ embedding subgroups of $S_n$ of index $2^{cn}$ for some small constant $c$ must have $n^{\omega(1)}$ vertices. This lower bound can be improved to exponential on graphs of treewidth $n^{\varepsilon}$ for $\varepsilon<1$ and maximum degree $o(n/\log n)$.

Updated: 2020-01-17
• arXiv.cs.FL Pub Date : 2020-01-16
Gerco van Heerdt; Tobias Kappé; Jurriaan Rot; Matteo Sammartino; Alexandra Silva

Automata learning is a popular technique used to automatically construct an automaton model from queries. Much research went into devising ad hoc adaptations of algorithms for different types of automata. The CALF project seeks to unify these using category theory in order to ease correctness proofs and guide the design of new algorithms. In this paper, we extend CALF to cover learning of algebraic structures that may not have a coalgebraic presentation. Furthermore, we provide a detailed algorithmic account of an abstract version of the popular L* algorithm, which was missing from CALF. We instantiate the abstract theory to a large class of Set functors, by which we recover for the first time practical tree automata learning algorithms from an abstract framework and at the same time obtain new algorithms to learn algebras of quotiented polynomial functors.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-15
Anant Khandelwal; Niraj Kumar

Wide usage of social media platforms has increased the risk of aggression, which causes mental stress and negatively affects people's lives through psychological agony, fighting behavior, and disrespect toward others. The majority of such conversations contain code-mixed languages [28]. Additionally, the communication style changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). All of these factors increase the complexity of the problem. To address them, we introduce a unified and robust multi-modal deep learning architecture that works for both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains a Deep Pyramid CNN, a Pooled BiLSTM, and a Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system makes its decision based on model averaging. We evaluated our system on the English code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all previous approaches on both the English code-mixed dataset and the uni-lingual English dataset.
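The final model-averaging step can be illustrated with a minimal sketch. It assumes each component model outputs a probability per class and that the averages are unweighted; the abstract does not state the weighting, and the three-class probabilities below are hypothetical.

```python
def average_predictions(model_probs):
    """Average per-class probabilities across models and pick the top class.

    model_probs: one list of class probabilities per model, all equal length.
    Returns (averaged probabilities, index of the highest-scoring class).
    """
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    # Unweighted mean over models (an assumption, not taken from the paper).
    averaged = [
        sum(p[c] for p in model_probs) / len(model_probs)
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, best


# Hypothetical outputs of three models over three classes.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
averaged, label = average_predictions(probs)
```

Here the first model alone would choose class 0, but the averaged scores favour class 1, which is the usual motivation for combining several architectures.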

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-15
Pinkesh Badjatiya; Manish Gupta; Vasudeva Varma

With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for structured datasets, but we aim at bias mitigation from unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias for any model and propose algorithms for identifying the set of words which the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalization provides an effective way to encode knowledge because the abstraction it provides not only generalizes content but also facilitates retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on general performance and in mitigating bias. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge-bases and can easily be extended to other tasks.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-15
Laura Ruis; Mitchell Stern; Julia Proskurnia; William Chan

We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation. The model consists of two phases that are executed iteratively, 1) an insertion phase and 2) a deletion phase. The insertion phase parameterizes a distribution of insertions on the current output hypothesis, while the deletion phase parameterizes a distribution of deletions over the current output hypothesis. The training method is a principled and simple algorithm, where the deletion model obtains its signal directly on-policy from the insertion model output. We demonstrate the effectiveness of our Insertion-Deletion Transformer on synthetic translation tasks, obtaining significant BLEU score improvement over an insertion-only model.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Silei Xu; Giovanni Campagna; Jian Li; Monica S. Lam

Virtual assistants today require every website to submit skills individually into their proprietary repositories. The skill consists of a fixed set of supported commands and the formal representation of each command. The assistants use the contributed data to create a proprietary linguistic interface, typically using an intent classifier. This paper proposes an open-source toolkit, called Schema2QA, that leverages the Schema.org markup found in many websites to automatically build skills. Schema2QA has several advantages: (1) Schema2QA handles compositional queries involving multiple fields automatically, such as "find the Italian restaurant around here with the most reviews", or "what W3C employees on LinkedIn went to Oxford"; (2) Schema2QA translates natural language into executable queries on the up-to-date data from the website; (3) natural language training can be applied to one domain at a time to handle multiple websites using the same Schema.org representations. We apply Schema2QA to two different domains, showing that the skills we built can answer useful queries with little manual effort. Our skills achieve an overall accuracy between 74% and 78%, and can answer questions that span three or more properties with 65% accuracy. We also show that a new domain can be supported by transferring knowledge. The open-source Schema2QA lets each website create and own its linguistic interface.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Haoran Chen; Jianmin Li; Xiaolin Hu

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework is the most popular paradigm for this task in recent years. However, there still exist some non-negligible problems in the decoder of a video captioning model. We make a thorough investigation into the decoder and adopt three techniques to improve the performance of the model. First of all, a combination of variational dropout and layer normalization is embedded into a recurrent unit to alleviate the problem of overfitting. Secondly, a new method is proposed to evaluate the performance of a model on a validation set so as to select the best checkpoint for testing. Finally, a new training strategy called \textit{professional learning} is proposed which develops the strong points of a captioning model and bypasses its weaknesses. It is demonstrated in the experiments on Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets that our model has achieved the best results evaluated by BLEU, CIDEr, METEOR and ROUGE-L metrics with significant gains of up to 11.7% on MSVD and 5% on MSR-VTT compared with the previous state-of-the-art models.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Trung Q. Tran

I introduce a simple but efficient method to solve one of the critical aspects of English grammar which is the relationship between active sentence and passive sentence. In fact, an active sentence and its corresponding passive sentence express the same meaning, but their structure is different. I utilized Prolog [4] along with Definite Clause Grammars (DCG) [5] for doing the conversion between active sentence and passive sentence. Some advanced techniques were also used such as Extra Arguments, Extra Goals, Lexicon, etc. I tried to solve a variety of cases of active and passive sentences such as 12 English tenses, modal verbs, negative form, etc. More details and my contributions will be presented in the following sections. The source code is available at https://github.com/tqtrunghnvn/ActiveAndPassive.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Kiet Van Nguyen; Khiem Vinh Tran; Son T. Luu; Anh Gia-Tuan Nguyen; Ngan Luu-Thuy Nguyen

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Jan Trienes; Dolf Trieschnigg; Christin Seifert; Djoerd Hiemstra

Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Antoine Gourru; Adrien Guille; Julien Velcin; Julien Jacques

We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g. citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identifying relevant keywords to describe document classes.
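The two steps described above (average the word vectors, then adjust the average using pairwise similarities) can be sketched as follows. The abstract does not give the actual RLE update rule, so the blending formula, the `weight` parameter and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
def average_vectors(words, embeddings):
    """Mean of the pretrained vectors of the words found in `embeddings`."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        raise ValueError("no known words in document")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def smooth_with_similarities(doc_vecs, sims, weight=0.5):
    """Blend each document vector with the similarity-weighted mean of all
    document vectors (hypothetical rule, not the paper's formula)."""
    n, dim = len(doc_vecs), len(doc_vecs[0])
    smoothed = []
    for i in range(n):
        total = sum(sims[i])
        if total == 0:
            # No linked documents: keep the plain average.
            smoothed.append(list(doc_vecs[i]))
            continue
        neighbour = [
            sum(sims[i][j] * doc_vecs[j][k] for j in range(n)) / total
            for k in range(dim)
        ]
        smoothed.append([
            (1 - weight) * doc_vecs[i][k] + weight * neighbour[k]
            for k in range(dim)
        ])
    return smoothed


# Toy data: two one-word documents that cite each other.
embeddings = {"graph": [1.0, 0.0], "music": [0.0, 1.0]}
docs = [["graph"], ["music"]]
sims = [[0.0, 1.0], [1.0, 0.0]]
doc_vecs = [average_vectors(d, embeddings) for d in docs]
smoothed = smooth_with_similarities(doc_vecs, sims)
```

With `weight=0.5`, the two linked documents are pulled to the midpoint of their original vectors, which shows how network proximity can move documents closer in the word embedding space.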

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-03
David Noever; Wes Regian; Matt Ciolino; Josh Kalin; Dom Hambrick; Kaye Blankenship

Small satellite constellations provide daily global coverage of the earth's landmass, but image enrichment relies on automating key tasks like change detection or feature searches. For example, to extract text annotations from raw pixels requires two dependent machine learning models, one to analyze the overhead image and the other to generate a descriptive caption. We evaluate seven models on the previously largest benchmark for satellite image captions. We extend the labeled image samples five-fold, then augment, correct and prune the vocabulary to approach a rough min-max (minimum word, maximum description). This outcome compares favorably to previous work with large pre-trained image models but offers a hundred-fold reduction in model size without sacrificing overall accuracy (when measured with log entropy loss). These smaller models provide new deployment opportunities, particularly when pushed to edge processors, on-board satellites, or distributed ground stations. To quantify a caption's descriptiveness, we introduce a novel multi-class confusion or error matrix to score both human-labeled test data and never-labeled images that include bounding box detection but lack full sentence captions. This work suggests future captioning strategies, particularly ones that can enrich the class coverage beyond land use applications and that lessen color-centered and adjacency adjectives ("green", "near", "between", etc.). Many modern language transformers present novel and exploitable models with world knowledge gleaned from training from their vast online corpus. One interesting, but easy example might learn the word association between wind and waves, thus enriching a beach scene with more than just color descriptions that otherwise might be accessed from raw pixels without text annotation.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-15
Shubham Agarwal; Raghav Goyal

This manuscript describes our approach for the Visual Dialog Challenge 2018. We use an ensemble of three discriminative models with different encoders and decoders for our final submission. Our best performing model on 'test-std' split achieves the NDCG score of 55.46 and the MRR value of 63.77, securing third position in the challenge.
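For readers unfamiliar with the two reported metrics, here is a minimal sketch of their textbook definitions. The challenge's exact evaluation protocol (for example, how the cutoff `k` is chosen) is not described in the abstract and may differ, and the sample ranks and relevance scores are hypothetical.

```python
import math


def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the ground-truth answer for each question."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def ndcg(relevances, k=None):
    """Normalized discounted cumulative gain.

    relevances: relevance scores listed in the order the model ranked them.
    k: optional cutoff; defaults to the full list.
    """
    rels = list(relevances)
    if k is None:
        k = len(rels)

    def dcg(values):
        # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), ...
        return sum(rel / math.log2(pos + 2)
                   for pos, rel in enumerate(values[:k]))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


mrr = mean_reciprocal_rank([1, 2, 4])   # (1 + 1/2 + 1/4) / 3
perfect = ndcg([1.0, 0.0, 0.0])         # best item ranked first
swapped = ndcg([0.0, 1.0])              # best item ranked second
```

MRR rewards placing the single ground-truth answer near the top, while NDCG gives partial credit for every relevant candidate according to its position.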

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-14
Vivian Lai; Han Liu; Chenhao Tan

To support human decision making with machine learning models, we often need to elucidate patterns embedded in the models that are unsalient, unknown, or counterintuitive to humans. While existing approaches focus on explaining machine predictions with real-time assistance, we explore model-driven tutorials to help humans understand these patterns in a training phase. We consider both tutorials with guidelines from scientific papers, analogous to current practices of science communication, and automatically selected examples from training data with explanations. We use deceptive review detection as a testbed and conduct large-scale, randomized human-subject experiments to examine the effectiveness of such tutorials. We find that tutorials indeed improve human performance, with and without real-time assistance. In particular, although deep learning provides better predictive performance than simple models, tutorials and explanations from simple models are more useful to humans. Our work suggests future directions for human-centered tutorials and explanations towards a synergy between humans and AI.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-15
Li Wang; Zechen Bai; Yonghua Zhang; Hongtao Lu

Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG) and recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed for the best use of recalled words. The SG branch can generate a recalled context, which can guide the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Chunyi Wang

A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed to address the low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features. First, handcrafted and deep automatic features are extracted from existing Chinese and English speech emotion data. Then, the various features are fused for each language separately. Finally, the fused features of the different languages are fused again and used to train a classification model. Comparing the fused features with the unfused ones, the results show that the fused features significantly improve the accuracy of the speech emotion recognition algorithm. The proposed solution is evaluated on two Chinese corpora and two English corpora, and is shown to provide more accurate predictions than the original solution. These results indicate that the multi-feature and multi-lingual fusion algorithm can significantly improve speech emotion recognition accuracy when the dataset is small.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Jiaju Du; Fanchao Qi; Maosong Sun; Zhiyuan Liu

Sememes, defined as the minimum semantic units of human languages in linguistics, have been proven useful in many NLP tasks. Since manual construction and update of sememe knowledge bases (KBs) are costly, the task of automatic sememe prediction has been proposed to assist sememe annotation. In this paper, we explore the approach of applying dictionary definitions to predicting sememes for unannotated words. We find that sememes of each word are usually semantically matched to different words in its dictionary definition, and we name this matching relationship local semantic correspondence. Accordingly, we propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes. We evaluate our model and baseline methods on a famous sememe KB HowNet and find that our model achieves state-of-the-art performance. Moreover, further quantitative analysis shows that our model can properly learn the local semantic correspondence between sememes and words in dictionary definitions, which explains the effectiveness of our model. The source codes of this paper can be obtained from https://github.com/thunlp/scorp.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2020-01-16
Viet Duong; Phu Pham; Ritwik Bose; Jiebo Luo

Recently, the emergence of the #MeToo trend on social media has empowered thousands of people to share their own sexual harassment experiences. This viral trend, in conjunction with the massive personal information and content available on Twitter, presents a promising opportunity to extract data driven insights to complement the ongoing survey based studies about sexual harassment in college. In this paper, we analyze the influence of the #MeToo trend on a pool of college followers. The results show that the majority of topics embedded in those #MeToo tweets detail sexual harassment stories, and there exists a significant correlation between the prevalence of this trend and official reports on several major geographical regions. Furthermore, we discover the outstanding sentiments of the #MeToo tweets using deep semantic meaning representations and their implications on the affected users experiencing different types of sexual harassment. We hope this study can raise further awareness regarding sexual misconduct in academia.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2018-12-31
Alexandra N. M. Darmon; Marya Bazzi; Sam D. Howison; Mason A. Porter

Whether enjoying the lucid prose of a favorite author or slogging through some other writer's cumbersome, heavy-set prattle (full of parentheses, em dashes, compound adjectives, and Oxford commas), readers will notice stylistic signatures not only in word choice and grammar, but also in punctuation itself. Indeed, visual sequences of punctuation from different authors produce marvelously different (and visually striking) sequences. Punctuation is a largely overlooked stylistic feature in "stylometry", the quantitative analysis of written text. In this paper, we examine punctuation sequences in a corpus of literary documents and ask the following questions: Are the properties of such sequences a distinctive feature of different authors? Is it possible to distinguish literary genres based on their punctuation sequences? Do the punctuation styles of authors evolve over time? Are we on to something interesting in trying to do stylometry without words, or are we full of sound and fury (signifying nothing)?
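A punctuation sequence of the kind discussed above can be extracted in a few lines. This is a minimal sketch: the set of marks (Python's `string.punctuation`) and the choice to count adjacent-mark transitions are assumptions, since the abstract does not list the marks or features the authors use.

```python
from collections import Counter
import string


def punctuation_sequence(text, marks=string.punctuation):
    """Return the punctuation marks of `text` in order, dropping all words."""
    return [ch for ch in text if ch in marks]


def transition_counts(seq):
    """Count how often each mark is immediately followed by each other mark."""
    return Counter(zip(seq, seq[1:]))


sample = "Wait, what? No, really: stop."
seq = punctuation_sequence(sample)
transitions = transition_counts(seq)
```

For the sample sentence the sequence is `, ? , : .`, and transition counts like these could serve as a word-free stylistic feature to compare across authors.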

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-07-09
Gavin Abercrombie; Riza Batista-Navarro

Parliamentary and legislative debate transcripts provide access to information concerning the opinions, positions and policy preferences of elected politicians. They attract attention from researchers from a wide variety of backgrounds, from political and social sciences to computer science. As a result, the problem of automatic sentiment and position-taking analysis has been tackled from different perspectives, using varying approaches and methods, and with relatively little collaboration or cross-pollination of ideas. The existing research is scattered across publications from various fields and venues. In this article we present the results of a systematic literature review of 61 studies, all of which address the automatic analysis of the sentiment and opinions expressed and positions taken by speakers in parliamentary (and other legislative) debates. In this review, we discuss the available research with regard to the aims and objectives of the researchers who work on these problems, the automatic analysis tasks they undertake, and the approaches and methods they use. We conclude by summarizing their findings, discussing the challenges of applying computational analysis to parliamentary debates, and suggesting possible avenues for further research.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-07-29
Weinan E; Yajun Zhou

We present a mathematical model to characterize the meaning of words with language-independent numerical fingerprints. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting external knowledge-base or thesaurus. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-08-30
Shuailiang Zhang; Hai Zhao; Yuwei Wu; Zhuosheng Zhang; Xi Zhou; Xiang Zhou

Multi-choice reading comprehension is a challenging task that requires selecting an answer from a set of candidate options given a passage and a question. Previous approaches usually calculate only a question-aware passage representation and ignore the passage-aware question representation when modeling the relationship between passage and question, and therefore cannot make the best use of the information shared between them. In this work, we propose the dual co-matching network (DCMN), which models the relationship among passage, question and answer options bidirectionally. Besides, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection, which finds the most salient supporting sentences to answer the question, and (ii) answer option interaction, which encodes the comparison information between answer options. DCMN integrated with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, and MCTest.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-09-04
Bas Hofstra; Vivek V. Kulkarni; Sebastian Munoz-Najar Galvez; Bryan He; Dan Jurafsky; Daniel A. McFarland

Prior work finds a diversity paradox: diversity breeds innovation, and yet, underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-population of ~1.2 million US doctoral recipients from 1977-2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: e.g., novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity's role in innovation and partly explains the underrepresentation of some groups in academia.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-09-05
Wenxuan Zhou; Hongtao Lin; Bill Yuchen Lin; Ziqi Wang; Junyi Du; Leonardo Neves; Xiang Ren

Deep neural models for relation extraction tend to be less reliable when perfectly labeled data is limited, despite their success in label-sufficient scenarios. Instead of seeking more instance-level labels from human annotators, here we propose to annotate frequent surface patterns to form labeling rules. These rules can be automatically mined from large text corpora and generalized via a soft rule matching mechanism. Prior works use labeling rules in an exact matching fashion, which inherently limits the coverage of sentence matching and results in low recall. In this paper, we present a neural approach to ground rules for RE, named NERO, which jointly learns a relation extraction module and a soft matching module. One can employ any neural relation extraction model as the instantiation for the RE module. The soft matching module learns to match rules with semantically similar sentences such that raw corpora can be automatically labeled and leveraged by the RE module (with much better coverage) as augmented supervision, in addition to the exactly matched sentences. Extensive experiments and analysis on two public and widely-used datasets demonstrate the effectiveness of the proposed NERO framework, compared with both rule-based and semi-supervised methods. Through user studies, we find that the time a human needs to annotate a rule and a sentence is similar (0.30 vs. 0.35 min per label). In particular, NERO's performance using 270 rules is comparable to the models trained using 3,000 labeled sentences, yielding a 9.5x speedup. Moreover, NERO can predict for unseen relations at test time and provide interpretable predictions. We release our code to the community for future research.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-10-08
Gustavo Aguilar; Yuan Ling; Yu Zhang; Benjamin Yao; Xing Fan; Chenlei Guo

Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it. We formulate two ways to distill such representations and various algorithms to conduct the distillation. We experiment with datasets from the GLUE benchmark and consistently show that adding knowledge distillation from internal representations is a more powerful method than only using soft-label distillation.
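The gap between matching soft labels and matching internal representations can be made concrete with a small sketch. The temperature-scaled softmax follows the classic soft-label formulation; the mean-squared-error term for hidden vectors is only an illustrative assumption and is not claimed to be either of the paper's two distillation objectives. All logits and hidden vectors are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's distribution against the teacher's
    soft labels."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def internal_mismatch(student_hidden, teacher_hidden):
    """Mean squared error between two hidden vectors of equal size
    (hypothetical stand-in for an internal-representation loss)."""
    return sum((s - t) ** 2
               for s, t in zip(student_hidden, teacher_hidden)) / len(teacher_hidden)


teacher_logits = [2.0, 0.5, -1.0]
loss_match = soft_label_loss([2.0, 0.5, -1.0], teacher_logits)
loss_mismatch = soft_label_loss([-1.0, 0.5, 2.0], teacher_logits)
hidden_gap = internal_mismatch([0.0, 0.0], [1.0, 1.0])
```

A student whose logits equal the teacher's reaches the minimum soft-label loss, yet `hidden_gap` can still be large, which is the internal mismatch the abstract argues should be distilled as well.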

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-11-05
Mahaveer Jain; Kjell Schubert; Jay Mahadeokar; Ching-Feng Yeh; Kaustubh Kalgaonkar; Anuroop Sriram; Christian Fuegen; Michael L. Seltzer

Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model. This greatly simplifies training and inference and hence makes RNN-T a desirable choice for ASR systems. In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed of the originally proposed RNN-T beam search algorithm. We evaluate our proposed system on an English video ASR dataset and show that neural RNN-T models can achieve comparable WER and better computational efficiency compared to a well-tuned hybrid ASR baseline.

Updated: 2020-01-17
• arXiv.cs.CL Pub Date : 2019-12-28
Da Ju; Kurt Shuster; Y-Lan Boureau; Jason Weston

As single-task accuracy on individual language and image tasks has improved substantially in the last few years, the long-term goal of a generally skilled agent that can both see and talk becomes more feasible to explore. In this work, we focus on leveraging individual language and image tasks, along with resources that incorporate both vision and language towards that objective. We design an architecture that combines state-of-the-art Transformer and ResNeXt modules fed into a novel attentive multimodal module to produce a combined model trained on many tasks. We provide a thorough analysis of the components of the model, and transfer performance when training on one, some, or all of the tasks. Our final models provide a single system that obtains good results on all vision and language tasks considered, and improves the state-of-the-art in image-grounded conversational applications.

Updated: 2020-01-17
• arXiv.cs.SD Pub Date : 2020-01-14
Yanpei Shi; Qiang Huang; Thomas Hain

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of processing speech enhancement and speaker recognition individually, the two modules are integrated into one framework by joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker-related features learned from context information in the time and frequency domains. To evaluate the speaker identification and verification performance of the proposed approach, we test it on VoxCeleb1, one of the most widely used benchmark datasets. Moreover, the robustness of our proposed approach is also tested on VoxCeleb1 data corrupted by three types of interference (general noise, music, and babble) at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines not using them in most acoustic conditions in our experiments.
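Corrupting clean speech at a chosen SNR level, as in the robustness tests above, is commonly done by rescaling the interference before adding it. The sketch below assumes that convention; it is not taken from the paper, and the function names and the list-of-samples signal representation are hypothetical.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Add noise to speech after scaling it so the mixture has the target SNR.

    Assumes both signals have the same length and non-zero power.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    if power(speech) == 0.0 or power(noise) == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    # Noise power needed for the requested ratio.
    wanted_noise_power = power(speech) / (10.0 ** (target_snr_db / 10.0))
    scale = math.sqrt(wanted_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Lower `target_snr_db` values produce harder conditions, since the interference is scaled up relative to the speech.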

Updated: 2020-01-16
• arXiv.cs.SD Pub Date : 2020-01-15
Alexander Schindler; Thomas Lidy; Sebastian Böck

Deep learning has become the state of the art in visual computing and is steadily gaining ground in the Music Information Retrieval (MIR) and audio retrieval domains. To bring attention to this topic, we propose an introductory tutorial on deep learning for MIR. Besides a general introduction to neural networks, the proposed tutorial covers a wide range of MIR-relevant deep learning approaches. Convolutional Neural Networks are currently a de facto standard for deep-learning-based audio retrieval. Recurrent Neural Networks have proven to be effective in onset detection tasks such as beat or audio-event detection. Siamese Networks have been shown to be effective in learning audio representations and distance functions specific to music similarity retrieval. We will incorporate both academic and industrial points of view into the tutorial. Accompanying the tutorial, we will create a GitHub repository for the content presented at the tutorial as well as references to state-of-the-art work and literature for further reading. This repository will remain public after the conference.

Updated: 2020-01-16
• arXiv.cs.SD Pub Date : 2020-01-11
Mingda Li; Weitong Ruan; Xinyue Liu; Luca Soldaini; Wael Hamza; Chengwei Su

In a modern spoken language understanding (SLU) system, the natural language understanding (NLU) module takes interpretations of an utterance from the automatic speech recognition (ASR) module as its input. The NLU module usually uses the first-best interpretation of a given utterance in downstream tasks such as domain and intent classification. However, the ASR module might misrecognize some utterances, and the first-best interpretation could be erroneous and noisy. Relying solely on the first-best interpretation could make the performance of downstream tasks suboptimal. To address this issue, we introduce a series of simple yet efficient models for improving the understanding of the semantics of the input utterances by collectively exploiting the n-best speech interpretations from the ASR module.

Updated: 2020-01-16
• arXiv.cs.MM Pub Date : 2020-01-15
Rabie Hachemi; Ikram Achar; Biasi Wiga; Mahfoud Sidi Ali Mebarek

Humans are capable of identifying a book just by looking at its cover, but how can computers do the same? In this paper, we explore different feature detectors and matching methods for book cover identification, and compare their performance in terms of both speed and accuracy. This will allow, for example, libraries to develop interactive services based on pictures of book covers. Only a single image of each book cover needs to be available in a database. Tests have been performed by taking into account different transformations of each book cover image. Encouraging results have been achieved.

Updated: 2020-01-16
• arXiv.cs.MM Pub Date : 2020-01-15
Linsen Song; Wayne Wu; Chen Qian; Ran He; Chen Change Loy

We present a method to edit target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic. It does not assume a person-specific rendering network, yet it is capable of translating arbitrary source audio into arbitrary video output. Instead of learning a highly heterogeneous and nonlinear mapping from audio to video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, thereby preserving the context of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent and photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.

Updated: 2020-01-16
• arXiv.cs.IR Pub Date : 2020-01-14
Rahul Radhakrishnan Iyer; Rohan Kohli; Shrimai Prabhumoye

With the rapid growth of e-commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping. However, there is still a big gap between the products that customers actually want to purchase and the relevance of the products suggested in response to a customer's query. In this paper, we propose a robust way of predicting relevance scores given a search query and a product, using techniques involving machine learning, natural language processing, and information retrieval. We compare conventional information retrieval models such as BM25 and Indri with deep learning models such as word2vec, sentence2vec, and paragraph2vec. We share some of our insights and findings from our experiments.
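For readers unfamiliar with BM25, one of the conventional baselines named above, the following is a minimal sketch of query-to-document scoring. It uses whitespace tokenization, the commonly cited defaults k1 = 1.5 and b = 0.75, and the non-negative IDF variant; all of these are assumptions for illustration and are not taken from the paper's experimental setup.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Hypothetical helper: documents and query are plain strings, split on
    whitespace after lowercasing. Assumes at least one non-empty document.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

Because the score depends only on term overlap, a product whose title shares no words with the query receives zero, which is one motivation for comparing against embedding-based models.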

Updated: 2020-01-16
• arXiv.cs.IR Pub Date : 2020-01-15
Nilavra Bhattacharya; Somnath Rakshit; Jacek Gwizdka; Paul Kogut

We propose an image-classification method to predict the perceived-relevance of text documents from eye-movements. An eye-tracking study was conducted where participants read short news articles, and rated them as relevant or irrelevant for answering a trigger question. We encode participants' eye-movement scanpaths as images, and then train a convolutional neural network classifier using these scanpath images. The trained classifier is used to predict participants' perceived-relevance of news articles from the corresponding scanpath images. This method is content-independent, as the classifier does not require knowledge of the screen-content, or the user's information-task. Even with little data, the image classifier can predict perceived-relevance with up to 80% accuracy. When compared to similar eye-tracking studies from the literature, this scanpath image classification method outperforms previously reported metrics by appreciable margins. We also attempt to interpret how the image classifier differentiates between scanpaths on relevant and irrelevant documents.

Updated: 2020-01-16
Contents have been reproduced by permission of the publishers.
