-
A data-driven approach for assessing biking safety in cities EPJ Data Sci. (IF 2.873) Pub Date : 2021-03-03 Sara Daraei; Konstantinos Pelechrinis; Daniele Quercia
With the focus that cities around the world have put on sustainable transportation during the past few years, biking has become one of the foci for local governments globally. Cities all over the world invest in biking infrastructure, including bike lanes, bike parking racks, shared (dockless) bike systems etc. However, one of the critical factors in converting city-dwellers to (regular) bike users/commuters
-
Understanding vehicular routing behavior with location-based service data EPJ Data Sci. (IF 2.873) Pub Date : 2021-02-26 Yanyan Xu, Riccardo Di Clemente, Marta C. González
Properly extracting patterns of individual mobility from high-resolution data sources, such as those collected by smartphone applications, offers important opportunities. Opportunities not offered by call detail records (CDRs), which offer resolutions triangulated from antennas, include the detection of route choices, travel modes and close encounters. Nowadays, there is not a standard and large
-
The mobility laws of location-based games EPJ Data Sci. (IF 2.873) Pub Date : 2021-02-15 Leonardo Tonetto, Eemil Lagerspetz, Aaron Yi Ding, Jörg Ott, Sasu Tarkoma, Petteri Nurmi
Mobility is a fundamental characteristic of human society that shapes various aspects of our everyday interactions. This pervasiveness of mobility makes it paramount to understand factors that govern human movement and how it varies across individuals. Currently, factors governing variations in personal mobility are understudied with existing research focusing on explaining the aggregate behaviour
-
Connecting and linking neurocognitive, digital phenotyping, physiologic, psychophysical, neuroimaging, genomic, & sensor data with survey data EPJ Data Sci. (IF 2.873) Pub Date : 2021-02-12 Charles E. Knott, Stephen Gomori, Mai Ngyuen, Susan Pedrazzani, Sridevi Sattaluri, Frank Mierzwa, Kim Chantala
Combining survey data with alternative data sources (e.g., wearable technology, apps, physiological, ecological monitoring, genomic, neurocognitive assessments, brain imaging, and psychophysical data) to paint a complete biobehavioral picture of trauma patients comes with many complex system challenges and solutions. Starting in emergency departments and incorporating these diverse, broad, and separate
-
Attention dynamics on the Chinese social media Sina Weibo during the COVID-19 pandemic EPJ Data Sci. (IF 2.873) Pub Date : 2021-02-03 Hao Cui, János Kertész
Understanding attention dynamics on social media during pandemics could help governments minimize the effects. We focus on how COVID-19 has influenced the attention dynamics on the biggest Chinese microblogging website Sina Weibo during the first four months of the pandemic. We study the real-time Hot Search List (HSL), which provides the ranking of the most popular 50 hashtags based on the amount
-
The rhythms of the night: increase in online night activity and emotional resilience during the spring 2020 Covid-19 lockdown EPJ Data Sci. (IF 2.873) Pub Date : 2021-02-01 Maria Castaldo, Tommaso Venturini, Paolo Frasca, Floriana Gargiulo
Context: The lockdown orders established in multiple countries in response to the Covid-19 pandemic are arguably one of the most widespread and deepest shocks experienced by societies in recent years. Studying their impact through the lens of social media offers an unprecedented opportunity to understand the susceptibility and the resilience of human activity patterns to large-scale exogenous shocks.
-
Dark Web Marketplaces and COVID-19: before the vaccine EPJ Data Sci. (IF 2.873) Pub Date : 2021-01-21 Alberto Bracci, Matthieu Nadini, Maxwell Aliapoulios, Damon McCoy, Ian Gray, Alexander Teytelboym, Angela Gallo, Andrea Baronchelli
The COVID-19 pandemic has reshaped the demand for goods and services worldwide. The combination of a public health emergency, economic distress, and misinformation-driven panic has pushed customers and vendors towards the shadow economy. In particular, dark web marketplaces (DWMs), commercial websites accessible via free software, have gained significant popularity. Here, we analyse 851,199 listings
-
Characteristics of human mobility patterns revealed by high-frequency cell-phone position data EPJ Data Sci. (IF 2.873) Pub Date : 2021-01-19 Chen Zhao, An Zeng, Chi Ho Yeung
Human mobility is an important characteristic of human behavior, but since tracking personalized position to high temporal and spatial resolution is difficult, most studies on human mobility patterns rely on sparsely sampled position data. In this work, we re-examined human mobility patterns via comprehensive cell-phone position data recorded at a high frequency up to every second. We constructed human
-
Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts EPJ Data Sci. (IF 2.873) Pub Date : 2021-01-19 Ryan J. Gallagher, Morgan R. Frank, Lewis Mitchell, Aaron J. Schwartz, Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement
-
Unraveling the hidden organisation of urban systems and their mobility flows EPJ Data Sci. (IF 2.873) Pub Date : 2021-01-15 Riccardo Gallotti, Giulia Bertagnolli, Manlio De Domenico
Increasing evidence suggests that cities are complex systems, with structural and dynamical features responsible for a broad spectrum of emerging phenomena. Here we use a unique data set of human flows and couple it with information on the underlying street network to study, simultaneously, the structural and functional organisation of 10 world megacities. We quantify the efficiency of flow exchange
-
Challenges when identifying migration from geo-located Twitter data EPJ Data Sci. (IF 2.873) Pub Date : 2021-01-07 Caitrin Armstrong, Ate Poorthuis, Matthew Zook, Derek Ruths, Thomas Soehl
Given the challenges in collecting up-to-date, comparable data on migrant populations, the potential of digital trace data to study migration and migrants has sparked considerable interest among researchers and policy makers. In this paper we assess the reliability of one such data source that is heavily used within the research community: geolocated tweets. We assess strategies used in previous work
-
Privacy preserving data visualizations EPJ Data Sci. (IF 2.873) Pub Date : 2021-01-07 Demetris Avraam, Rebecca Wilson, Oliver Butters, Thomas Burton, Christos Nicolaides, Elinor Jones, Andy Boyd, Paul Burton
Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants
-
Estimating tie strength in social networks using temporal communication data EPJ Data Sci. (IF 2.873) Pub Date : 2020-12-14 Javier Ureña-Carrion, Jari Saramäki, Mikko Kivelä
Even though the concept of tie strength is central in social network analysis, it is difficult to quantify how strong social ties are. One typical way of estimating tie strength in data-driven studies has been to simply count the total number or duration of contacts between two people. This, however, disregards many features that can be extracted from the rich data sets used for social network reconstruction
-
Quantifying the economic impact of disasters on businesses using human mobility data: a Bayesian causal inference approach EPJ Data Sci. (IF 2.873) Pub Date : 2020-12-03 Takahiro Yabe, Yunchang Zhang, Satish V. Ukkusuri
In recent years, extreme shocks, such as natural disasters, have been increasing in both frequency and intensity, causing significant economic loss to many cities around the world. Quantifying the economic cost to local businesses after extreme shocks is important for post-disaster assessment and pre-disaster planning. Conventionally, surveys have been the primary source of data used to quantify damages
-
A multi-layer approach to disinformation detection in US and Italian news spreading on Twitter EPJ Data Sci. (IF 2.873) Pub Date : 2020-11-23 Francesco Pierri, Carlo Piccardi, Stefano Ceri
We tackle the problem of classifying news articles pertaining to disinformation vs mainstream news by solely inspecting their diffusion mechanisms on Twitter. This approach is inherently simple compared to existing text-based approaches, as it makes it possible to bypass the multiple levels of complexity found in news content (e.g. grammar, syntax, style). As we employ a multi-layer representation
-
Scholarly migration within Mexico: analyzing internal migration among researchers using Scopus longitudinal bibliometric data EPJ Data Sci. (IF 2.873) Pub Date : 2020-11-05 Andrea Miranda-González, Samin Aref, Tom Theile, Emilio Zagheni
The migration of scholars is a major driver of innovation and of diffusion of knowledge. Although large-scale bibliometric data have been used to measure international migration of scholars, our understanding of internal migration among researchers is very limited. This is partly due to a lack of data aggregated at a suitable sub-national level. In this study, we analyze internal migration in Mexico
-
A network theory of inter-firm labor flows EPJ Data Sci. (IF 2.873) Pub Date : 2020-11-02 Eduardo López, Omar A. Guerrero, Robert L. Axtell
Using detailed administrative microdata for two countries, we build a modeling framework that yields new explanations for the origin of firm sizes, the firm contributions to unemployment, and the job-to-job mobility of workers between firms. Firms are organized as nodes in networks where connections represent low mobility barriers for workers. These labor flow networks are determined empirically, and
-
The great divide: drivers of polarization in the US public EPJ Data Sci. (IF 2.873) Pub Date : 2020-10-28 Lucas Böttcher, Hans Gersbach
Many democratic societies have become more politically polarized, with the U.S. being the main example. The origins of this phenomenon are still not well-understood and subject to debate. To provide insight into some of the mechanisms underlying political polarization, we develop a mathematical framework and employ Bayesian Markov chain Monte-Carlo (MCMC) and information-theoretic concepts to analyze
-
Human biases in body measurement estimation EPJ Data Sci. (IF 2.873) Pub Date : 2020-10-27 Kirill Martynov, Kiran Garimella, Robert West
Body measurements, including weight and height, are key indicators of health. Being able to visually assess body measurements reliably is a step towards increased awareness of overweight and obesity and is thus important for public health. Nevertheless it is currently not well understood how accurately humans can assess weight and height from images, and when and how they fail. To bridge this gap,
-
Susceptible-infected-spreading-based network embedding in static and temporal networks EPJ Data Sci. (IF 2.873) Pub Date : 2020-10-16 Xiu-Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang
Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be
-
Modeling and predicting evacuation flows during hurricane Irma EPJ Data Sci. (IF 2.873) Pub Date : 2020-09-29 Lingzi Hong, Vanessa Frias-Martinez
Evacuations are a common practice to mitigate the potential risks and damage caused by natural disasters. However, without proper coordination and management, evacuations can be inefficient and have a negative impact. Local governments and organizations need to have a better understanding of how the population responds to disasters and evacuation recommendations so as to enhance their disaster management
-
Early detection of influenza outbreak using time derivative of incidence EPJ Data Sci. (IF 2.873) Pub Date : 2020-09-11 Woo-Sik Son, Ji-Eun Park, Okyu Kwon
For mitigation strategies of an influenza outbreak, it can be helpful to understand the characteristics of regional and age-group-specific spread. In South Korea, however, there has been no official statistic related to it. In this study, we extract the time series of influenza incidence from the National Health Insurance Service claims database, which consists of all medical and prescription drug-claim
-
Estimating educational outcomes from students’ short texts on social media EPJ Data Sci. (IF 2.873) Pub Date : 2020-09-01 Ivan Smirnov
Digital traces have become an essential source of data in social sciences because they provide new insights into human behavior and allow studies to be conducted on a larger scale. One particular area of interest is the estimation of various users’ characteristics from their texts on social media. Although it has been established that basic categorical attributes could be effectively predicted from
-
Enriching feature engineering for short text samples by language time series analysis EPJ Data Sci. (IF 2.873) Pub Date : 2020-08-31 Yichen Tang; Kelly Blincoe; Andreas W. Kempa-Liehr
In this case study, we are extending feature engineering approaches for short text samples by integrating techniques which have been introduced in the context of time series classification and signal processing. The general idea of the presented feature engineering approach is to tokenize the text samples under consideration and map each token to a number, which measures a specific property of the
-
Estimating community feedback effect on topic choice in social media with predictive modeling EPJ Data Sci. (IF 2.873) Pub Date : 2020-08-31 David Ifeoluwa Adelani; Ryota Kobayashi; Ingmar Weber; Przemyslaw A. Grabowicz
Social media users post content on various topics. A defining feature of social media is that other users can provide feedback—called community feedback—to their content in the form of comments, replies, and retweets. We hypothesize that the amount of received feedback influences the choice of topics on which a social media user posts. However, it is challenging to test this hypothesis as user heterogeneity
-
A weighted travel time index based on data from Uber Movement EPJ Data Sci. (IF 2.873) Pub Date : 2020-08-08 Renato S. Vieira; Eduardo A. Haddad
In this paper, we combine data from Uber Movement and from a representative household travel survey to construct a weighted travel time index for the Metropolitan Region of São Paulo. The index is calculated based on the average travel time of Uber trips taken between each pair of traffic zones and in each hour between January 1, 2016 and December 31, 2018. The index is weighted based on trips reported
-
Mapping socioeconomic indicators using social media advertising data EPJ Data Sci. (IF 2.873) Pub Date : 2020-07-29 Masoomali Fatehkia; Isabelle Tingzon; Ardie Orden; Stephanie Sy; Vedran Sekara; Manuel Garcia-Herranz; Ingmar Weber
The United Nations Sustainable Development Goals (SDGs) are a global consensus on the world’s most pressing challenges. They come with a set of 232 indicators against which countries should regularly monitor their progress, ensuring that everyone is represented in up-to-date data that can be used to make decisions to improve people’s lives. However, existing data sources to measure progress on the
-
Efficient algorithm to compute Markov transitional probabilities for a desired PageRank EPJ Data Sci. (IF 2.873) Pub Date : 2020-07-29 Gábor Berend
We propose an efficient algorithm to learn the transition probabilities of a Markov chain in a way that its weighted PageRank scores meet some predefined target values. Our algorithm does not require any additional information about the nodes and the edges in the form of features, i.e., it solely considers the network topology for calibrating the transition probabilities of the Markov chain for obtaining
-
The Butterfly “Affect”: impact of development practices on cryptocurrency prices EPJ Data Sci. (IF 2.873) Pub Date : 2020-07-23 Silvia Bartolucci; Giuseppe Destefanis; Marco Ortu; Nicola Uras; Michele Marchesi; Roberto Tonelli
The network of developers in distributed ledgers and blockchains open source projects is essential to maintaining the platform: understanding the structure of their exchanges, analysing their activity and its quality (e.g. issues resolution times, politeness in comments) is important to determine how “healthy” and efficient a project is. The quality of a project affects the trust in the platform, and
-
Segregated interactions in urban and online space EPJ Data Sci. (IF 2.873) Pub Date : 2020-07-10 Xiaowen Dong; Alfredo J. Morales; Eaman Jahani; Esteban Moro; Bruno Lepri; Burcin Bozkaya; Carlos Sarraute; Yaneer Bar-Yam; Alex Pentland
Urban income segregation is a widespread phenomenon that challenges societies across the globe. Classical studies on segregation have largely focused on the geographic distribution of residential neighborhoods rather than on patterns of social behaviors and interactions. In this study, we analyze segregation in economic and social interactions by observing credit card transactions and Twitter mentions
-
Temporal social network reconstruction using wireless proximity sensors: model selection and consequences EPJ Data Sci. (IF 2.873) Pub Date : 2020-07-08 Sicheng Dai; Hélène Bouchet; Aurélie Nardy; Eric Fleury; Jean-Pierre Chevrot; Márton Karsai
The emerging technologies of wearable wireless devices open entirely new ways to record various aspects of human social interactions in a broad range of settings. Such technologies make it possible to log the temporal dynamics of face-to-face interactions by detecting the physical proximity of participants. However, despite the wide usage of this technology and the collected datasets, precise reconstruction methods
-
Which politicians receive abuse? Four factors illuminated in the UK general election 2019 EPJ Data Sci. (IF 2.873) Pub Date : 2020-07-02 Genevieve Gorrell; Mehmet E. Bakir; Ian Roberts; Mark A. Greenwood; Kalina Bontcheva
The 2019 UK general election took place against a background of rising online hostility levels toward politicians, and concerns about the impact of this on democracy, as a record number of politicians cited the abuse they had been receiving as a reason for not standing for re-election. We present a four-factor framework in understanding who receives online abuse and why. The four factors are prominence
-
Economic outcomes predicted by diversity in cities EPJ Data Sci. (IF 2.873) Pub Date : 2020-06-24 Shi Kai Chong; Mohsen Bahrami; Hao Chen; Selim Balcisoy; Burcin Bozkaya; Alex ‘Sandy’ Pentland
Much recent work has illuminated the growth, innovation, and prosperity of entire cities, but there is relatively less evidence concerning the growth and prosperity of individual neighborhoods. In this paper we show that diversity of amenities within a city neighborhood, computed from openly available points of interest on digital maps, accurately predicts human mobility (“flows”) between city neighborhoods
-
Hypernetwork science via high-order hypergraph walks EPJ Data Sci. (IF 2.873) Pub Date : 2020-06-10 Sinan G. Aksoy; Cliff Joslyn; Carlos Ortiz Marrero; Brenda Praggastis; Emilie Purvine
We propose high-order hypergraph walks as a framework to generalize graph-based network science techniques to hypergraphs. Edge incidence in hypergraphs is quantitative, yielding hypergraph walks with both length and width. Graph methods which then generalize to hypergraphs include connected component analyses, graph distance-based metrics such as closeness centrality, and motif-based measures such
-
Efficient modeling of higher-order dependencies in networks: from algorithm to application for anomaly detection EPJ Data Sci. (IF 2.873) Pub Date : 2020-06-09 Mandana Saebi; Jian Xu; Lance M. Kaplan; Bruno Ribeiro; Nitesh V. Chawla
Complex systems, represented as dynamic networks, are composed of components that influence each other via direct and/or indirect interactions. Recent research has shown the importance of using Higher-Order Networks (HONs) for modeling and analyzing such complex systems, as the typical Markovian assumption in developing the First Order Network (FON) can be limiting. This higher-order network representation
-
In search of art: rapid estimates of gallery and museum visits using Google Trends EPJ Data Sci. (IF 2.873) Pub Date : 2020-06-05 Federico Botta; Tobias Preis; Helen Susannah Moat
Measuring collective human behaviour has traditionally been a time-consuming and expensive process, impairing the speed at which data can be made available to decision makers in policy. Can data generated through widespread use of online services help provide faster insights? Here, we consider an example relating to policymaking for culture and the arts: publicly funded museums and galleries in the
-
PepMusic: motivational qualities of songs for daily activities EPJ Data Sci. (IF 2.873) Pub Date : 2020-05-24 Yongsung Kim; Luca Maria Aiello; Daniele Quercia
Music can motivate many daily activities as it can regulate mood, increase productivity and sports performance, and raise spirits. However, we know little about how to recommend songs that are motivational for people given their contexts and activities. As a first step towards dealing with this issue, we adopt a theory-driven approach and operationalize the Brunel Music Rating Inventory (BMRI) to identify
-
Public debate in the media matters: evidence from the European refugee crisis EPJ Data Sci. (IF 2.873) Pub Date : 2020-05-13 Caleb M. Koch; Izabela Moise; Dirk Helbing; Karsten Donnay
In this paper, we take a novel approach to study the empirical relationship between public debate in the media and asylum acceptance rates in Europe from 2002–2016. In theory, an asylum seeker should experience the same likelihood of being granted refugee status from each of the 20 European countries we study. Yet, in practice, acceptance rates vary widely for nearly every asylum country of origin
-
Comparative analysis of layered structures in empirical investor networks and cellphone communication networks EPJ Data Sci. (IF 2.873) Pub Date : 2020-05-07 Peng Wang; Jun-Chao Ma; Zhi-Qiang Jiang; Wei-Xing Zhou; Didier Sornette
Empirical investor networks (EIN) proposed by Ozsoylev et al. are assumed to capture the information spreading path among investors. Here, we perform a comparative analysis between the EIN and the cellphone communication networks (CN) to test whether EIN is an information exchanging network from the perspective of the layer structures of ego networks. We employ two clustering algorithms (k-means algorithm
-
News and the city: understanding online press consumption patterns through mobile data EPJ Data Sci. (IF 2.873) Pub Date : 2020-04-29 Salvatore Vilella; Daniela Paolotti; Giancarlo Ruffo; Leo Ferres
The ever-increasing mobile connectivity affects every aspect of our daily lives, including how and when we keep ourselves informed and consult news media. By studying a DPI (deep packet inspection) dataset, provided by one of the major Chilean telecommunication companies, we investigate how different cohorts of the population of Santiago de Chile consume news media content through their smartphones
-
Success and luck in creative careers EPJ Data Sci. (IF 2.873) Pub Date : 2020-04-28 Milán Janosov; Federico Battiston; Roberta Sinatra
Luck is considered a crucial ingredient to achieve impact in all creative domains, despite their diversity. For instance, in science, the movie industry, music, and art, the occurrence of the highest impact work and a hot streak within a creative career are very difficult to predict. Are there domains that are more prone to luck than others? Here, we provide new insights on the role of randomness in
-
Correction to: Gendered behavior as a disadvantage in open source software development EPJ Data Sci. (IF 2.873) Pub Date : 2019-09-19 Balazs Vedres, Orsolya Vasarhelyi
Following publication of the original article [1], we have been notified that one more affiliation of the corresponding author is missing. Currently, Balazs Vedres's affiliation is: 1 Oxford Internet Institute, University of Oxford, Oxford, United Kingdom. It should be: 1 Oxford Internet Institute, University of Oxford, Oxford, United Kingdom; 2 Department of Network and Data Science, Central European
-
A new set of cluster driven composite development indicators EPJ Data Sci. (IF 2.873) Pub Date : 2020-04-10 Anshul Verma; Orazio Angelini; Tiziana Di Matteo
Composite development indicators used in policy making often subjectively aggregate a restricted set of indicators. We show, using dimensionality reduction techniques, including Principal Component Analysis (PCA) and for the first time information filtering and hierarchical clustering, that these composite indicators miss key information on the relationship between different indicators. In particular
-
Fake news propagates differently from real news even at early stages of spreading EPJ Data Sci. (IF 2.873) Pub Date : 2020-04-03 Zilong Zhao; Jichang Zhao; Yukie Sano; Orr Levy; Hideki Takayasu; Misako Takayasu; Daqing Li; Junjie Wu; Shlomo Havlin
Social media can be a double-edged sword for society, either as a convenient channel exchanging ideas or as an unexpected conduit circulating fake news through a large population. While existing studies of fake news focus on theoretical modeling of propagation or identification methods based on machine learning, it is important to understand the realistic propagation mechanisms between theoretical
-
Measuring the effect of node aggregation on community detection EPJ Data Sci. (IF 2.873) Pub Date : 2020-03-11 Yérali Gandica; Adeline Decuyper; Christophe Cloquet; Isabelle Thomas; Jean-Charles Delvenne
The nodes of a complex network are often aggregated, whether deliberately or not, because of technical, ethical, or legal limitations or for privacy reasons. A common example is the geographic position: one may uncover communities in a network of places, or of individuals identified with their typical geographical position, and then aggregate these places into larger entities, such as municipalities, thus obtaining
-
Measuring and mitigating behavioural segregation using Call Detail Records EPJ Data Sci. (IF 2.873) Pub Date : 2020-03-06 Daniel Rhoads; Ivan Serrano; Javier Borge-Holthoefer; Albert Solé-Ribalta
The overwhelming amounts of data we generate in our daily routine and in social networks have been crucial for the understanding of various social and economic factors. The use of this data represents a low-cost alternative source of information in parallel to census data and surveys. Here, we advocate for such an approach to assess and alleviate the segregation of Syrian refugees in Turkey. Using a
-
The shocklet transform: a decomposition method for the identification of local, mechanism-driven dynamics in sociotechnical time series EPJ Data Sci. (IF 2.873) Pub Date : 2020-02-07 David Rushing Dewhurst; Thayer Alshaabi; Dilan Kiley; Michael V. Arnold; Joshua R. Minot; Christopher M. Danforth; Peter Sheridan Dodds
We introduce a qualitative, shape-based, timescale-independent time-domain transform used to extract local dynamics from sociotechnical time series—termed the Discrete Shocklet Transform (DST)—and an associated similarity search routine, the Shocklet Transform And Ranking (STAR) algorithm, that indicates time windows during which panels of time series display qualitatively-similar anomalous behavior
-
Novelty and influence of creative works, and quantifying patterns of advances based on probabilistic references networks EPJ Data Sci. (IF 2.873) Pub Date : 2020-01-30 Doheum Park; Juhan Nam; Juyong Park
Recent advances in the quantitative, computational methodology for the modeling and analysis of heterogeneous large-scale data are leading to new opportunities for understanding human behaviors and faculties, including creativity that drives creative enterprises such as science. While innovation is crucial for novel and influential achievements, quantifying these qualities in creative works remains
-
The individual dynamics of affective expression on social media EPJ Data Sci. (IF 2.873) Pub Date : 2020-01-09 Max Pellert; Simon Schweighofer; David Garcia
Understanding the temporal dynamics of affect is crucial for understanding human emotions in general. In this study, we empirically test a computational model of affective dynamics by analyzing a large-scale dataset of Facebook status updates using text analysis techniques. Our analyses support the central assumptions of our model: After stimulation, affective states, quantified as valence and
-
The higher education space: connecting degree programs from individuals’ choices EPJ Data Sci. (IF 2.873) Pub Date : 2019-12-30 Cristian Candia; Sara Encarnação; Flávio L. Pinheiro
Data on the applicants’ revealed preferences when entering higher education is used as a proxy to build the Higher Education Space (HES) of Portugal (2008–2015) and Chile (2006–2017). The HES is a network that connects pairs of degree programs according to their co-occurrence in the applicants’ preferences. We show that both HES network structures reveal the existence of positive assortment in features
-
What did you see? A study to measure personalization in Google’s search engine EPJ Data Sci. (IF 2.873) Pub Date : 2019-12-16 Tobias D. Krafft; Michael Gamer; Katharina A. Zweig
In this paper we present the results of the project “#Datenspende”, in which more than 4000 people contributed their search results for keywords connected to the German election campaign during the German election in 2017. Analyzing the donated result lists, we prove that the room for personalization of the search results is very small. Thus the opportunity for the effect mentioned in Eli Pariser’s
-
Following the footsteps of giants: modeling the mobility of historically notable individuals using Wikipedia EPJ Data Sci. (IF 2.873) Pub Date : 2019-12-12 Lorenzo Lucchini; Sara Tonelli; Bruno Lepri
The steady growth of digitized historical information is continuously stimulating new different approaches to the fields of Digital Humanities and Computational Social Science. In this work we use Natural Language Processing techniques to retrieve large amounts of historical information from Wikipedia. In particular, the pages of a set of historically notable individuals are processed to catch the
-
Gravity law in the Chinese highway freight transportation networks EPJ Data Sci. (IF 2.873) Pub Date : 2019-12-12 Li Wang; Jun-Chao Ma; Zhi-Qiang Jiang; Wanfeng Yan; Wei-Xing Zhou
The gravity law, which states that the flow between two nodes correlates positively with the strengths of the nodes and negatively with the distance between them, has been documented in many socioeconomic networks. However, such research on highway freight transportation networks (HFTNs) is rare. We construct the directed and undirected highway freight transportation networks between
-
Quantifying echo chamber effects in information spreading over political communication networks EPJ Data Sci. (IF 2.873) Pub Date : 2019-12-09 Wesley Cota; Silvio C. Ferreira; Romualdo Pastor-Satorras; Michele Starnini
Echo chambers in online social networks, in which users prefer to interact only with ideologically aligned peers, are believed to facilitate the spread of misinformation and contribute to radicalizing political discourse. In this paper, we gauge the effects of echo chambers on information spreading over political communication networks. Mining 12 million Twitter messages, we reconstruct a network
-
From individual to collective behaviours: exploring population heterogeneity of human mobility based on social media data EPJ Data Sci. (IF 2.873) Pub Date : 2019-11-14 Yuan Liao; Sonia Yeh; Gustavo S. Jeuken
This paper examines the population heterogeneity of travel behaviours from a combined perspective of individual actors and collective behaviours. We use a social media dataset of 652,945 geotagged tweets generated by 2,933 Swedish Twitter users covering an average time span of 3.6 years. No explicit geographical boundaries, such as national borders or administrative boundaries, are applied to the data
-
Mapping the physics research space: a machine learning approach EPJ Data Sci. (IF 2.873) Pub Date : 2019-11-06 Matteo Chinazzi; Bruno Gonçalves; Qian Zhang; Alessandro Vespignani
Scientific discoveries do not occur in a vacuum but rather by connecting existing pieces of knowledge in new and creative ways. Mapping the relation and structure of scientific knowledge is therefore central to our understanding of the dynamics of scientific production. Here we introduce a new machine learning approach to generating scientific knowledge maps that, starting from the
-
Assessing the risk of default propagation in interconnected sectoral financial networks EPJ Data Sci. (IF 2.873) Pub Date : 2019-11-04 Adrià Barja; Alejandro Martínez; Alex Arenas; Pablo Fleurquin; Jordi Nin; José J. Ramasco; Elena Tomás
Systemic risk of financial institutions and sectoral companies relies on their interdependencies. The interconnectivity of financial networks has proven crucial to understanding the propagation of default, as it plays a central role in assessing the impact of single default events on the full system. Here, we take advantage of complex network theory to shed light on the mechanisms behind default
-
Knowledge-based biomedical Data Science EPJ Data Sci. (IF 2.873) Pub Date : 2017-01-01 Lawrence E Hunter
Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data;" the "Knowledge" component has largely been the implicit assumption that
-
Success in books: predicting book sales before publication EPJ Data Sci. (IF 2.873) Pub Date : 2019-10-17 Xindi Wang; Burcu Yucesoy; Onur Varol; Tina Eliassi-Rad; Albert-László Barabási
Reading remains a preferred leisure activity, fueling an exceptionally competitive publishing market: among more than three million books published each year, only a tiny fraction are read widely. It is largely unpredictable, however, which books those will be and how many copies they will sell. Here we aim to unveil the features that affect the success of books by predicting a book’s sales prior to its
-
Complete trajectory reconstruction from sparse mobile phone data EPJ Data Sci. (IF 2.873) Pub Date : 2019-10-12 Guangshuo Chen; Aline Carneiro Viana; Marco Fiore; Carlos Sarraute
Mobile phone data are a popular source of positioning information in many recent studies that have largely improved our understanding of human mobility. These data consist of time-stamped and geo-referenced communication events recorded by network operators, on a per-subscriber basis. They allow for unprecedented tracking of populations of millions of individuals over long periods that span months
Contents have been reproduced by permission of the publishers.