Current journal: Big Data Research
  • Data-Driven Computational Social Science: A Survey
    Big Data Res. (IF 2.673) Pub Date : 2020-08-06
    Jun Zhang, Wei Wang, Feng Xia, Yu-Ru Lin, Hanghang Tong

    Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it the amalgamation of multiple disciplines, such as economics, political science, and sociology, etc. For centuries, scientists have conducted many studies to understand the mechanisms of the society. However, due to the limitations of traditional research

  • Anytime Frequent Itemset Mining of Transactional Data Streams
    Big Data Res. (IF 2.673) Pub Date : 2020-07-28
    Poonam Goyal, Jagat Sesh Challa, Shivin Shrivastava, Navneet Goyal

    Mining frequent itemsets from transactional data streams has become essential, with many applications such as stock market analysis, retail chain analysis, and web log analysis. Various algorithms have been proposed to efficiently mine single-port and multi-port transactional streams within the constraints of limited time and memory. However, all of them are budget algorithms

  • Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration
    Big Data Res. (IF 2.673) Pub Date : 2020-05-08
    A.M. Fernández, D. Gutiérrez-Avilés, A. Troncoso, F. Martínez–Álvarez

    The vast amount of data stored nowadays has turned big data analytics into a very trendy research field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm for cluster deployment and big data analytics. However, getting started is still a task that can take considerable time when done manually, due to the requisites that all nodes must fulfill. This work introduces

  • PatSeg: A Sequential Patent Segmentation Approach
    Big Data Res. (IF 2.673) Pub Date : 2020-05-04
    Maryam Habibi, Astrid Rheinlaender, Wolfgang Thielemann, Robert Adams, Peter Fischer, Sylvia Krolkiewicz, David Luis Wiegandt, Ulf Leser

    Patents are an important source of information in industry and academia. However, quickly grasping the essence of a given patent is difficult, as patents are typically very long and written in a rather inaccessible style. The essential information, especially the invention itself and the experimental part of the invention, is usually contained in the description section. However, in many patents the

  • Entity Resolution with Recursive Blocking
    Big Data Res. (IF 2.673) Pub Date : 2020-04-30
    Shao-Qing Yu

    Entity resolution is a well-known challenge in data management for the lack of unique identifiers of records and various errors hidden in the data, undermining the identifiability of entities they refer to. To reveal matching records, every record potentially needs to be compared with all other records in the database, which is computationally intractable even for moderately-sized databases. To circumvent

  • A Hierarchical Dimension Reduction Approach for Big Data with Application to Fault Diagnostics
    Big Data Res. (IF 2.673) Pub Date : 2019-08-22
    R. Krishnan, V.A. Samaranayake, S. Jagannathan

    About four zettabytes of data, which falls into the category of big data, is generated by complex manufacturing systems annually. Big data can be utilized to improve the efficiency of an aging manufacturing system, provided several challenges are handled. In this paper, a novel methodology is presented to detect faults in manufacturing systems while overcoming some of these challenges. Specifically

  • Parallelizing Computations of Full Disjunctions
    Big Data Res. (IF 2.673) Pub Date : 2019-07-12
    Matteo Paganelli, Domenico Beneventano, Francesco Guerra, Paolo Sottovia

    In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration, and knowledge extraction. One of the main limitations

  • HiePaCo: Scalable Hierarchical Exploration in Abstract Parallel Coordinates Under Budget Constraints
    Big Data Res. (IF 2.673) Pub Date : 2019-07-08
    Gaëlle Richer, Joris Sansen, Frédéric Lalanne, David Auber, Romain Bourqui

    In exploratory visualization systems, interactions allow users to manipulate a visual representation and thereby gain insight into its supporting data. The responsiveness of these interactions is crucial, but achieving it on common hardware becomes increasingly difficult with the ever-growing size of datasets. Moreover, the representation of a large dataset itself is challenging since screen space is limited

  • Interactive Visual Analytics for Sensemaking with Big Text
    Big Data Res. (IF 2.673) Pub Date : 2019-04-25
    Michelle Dowling, Nathan Wycoff, Brian Mayer, John Wenskovitch, Scotland Leman, Leanna House, Nicholas Polys, Chris North, Peter Hauck

    Analysts face many steep challenges when performing sensemaking tasks on collections of textual information larger than can be reasonably analyzed without computational assistance. To scale up such sensemaking tasks, new methods are needed to interactively integrate human cognitive sensemaking activity with machine learning. Towards that goal, we offer a human-in-the-loop computational model that mirrors

  • Bi-objective Traffic Optimization in Geo-distributed Data Flows
    Big Data Res. (IF 2.673) Pub Date : 2019-04-25
    Anna-Valentini Michailidou, Anastasios Gounaris

    Recently, there have been several proposals in the area of geo-distributed big data processing. In this work, we aim to address a limitation of the existing solutions, namely to optimize task allocation across geographically distributed data centers, in a way that both the total traffic and the running time of the whole processing in complex multi-stage flows are targeted. Apart from proposing concrete

  • Anomaly Detection and Repair for Accurate Predictions in Geo-distributed Big Data
    Big Data Res. (IF 2.673) Pub Date : 2019-04-17
    Roberto Corizzo, Michelangelo Ceci, Nathalie Japkowicz

    The increasing presence of geo-distributed sensor networks implies the generation of huge volumes of data from multiple geographical locations at an increasing rate. This raises important issues which become more challenging when the final goal is that of the analysis of the data for forecasting purposes or, more generally, for predictive tasks. This paper proposes a framework which supports predictive

  • Systematic Review of the Literature on Big Data in the Transportation Domain: Concepts and Applications
    Big Data Res. (IF 2.673) Pub Date : 2019-03-19
    Alex Neilson, Indratmo, Ben Daniel, Stevanus Tjandra

    Research in Big Data and analytics offers tremendous opportunities to utilize evidence in making decisions in many application domains. To what extent can the paradigms of Big Data and analytics be used in the domain of transport? This article reports on an outcome of a systematic review of published articles in the last five years that discuss Big Data concepts and applications in the transportation

  • Joint Contour Net Analysis for Feature Detection in Lattice Quantum Chromodynamics Data
    Big Data Res. (IF 2.673) Pub Date : 2019-03-01
    Dean P. Thomas, Rita Borgo, Robert S. Laramee, Simon J. Hands

    In this paper we demonstrate the use of multivariate topological algorithms to analyse and interpret Lattice Quantum Chromodynamics (QCD) data. Lattice QCD is a long established field of theoretical physics research in the pursuit of understanding the strong nuclear force. Complex computer simulations model interactions between quarks and gluons to test theories regarding the behaviour of matter in

  • Towards Hybrid Multi-Cloud Storage Systems: Understanding How to Perform Data Transfer
    Big Data Res. (IF 2.673) Pub Date : 2019-03-01
    Antonio Celesti, Antonino Galletta, Maria Fazio, Massimo Villari

    Nowadays, storing and retrieving information over the Cloud is critical for the survival and growth of organizations and people. In this context, the possibility to store a huge amount of data and files on remote third-party Cloud storage providers is becoming an even more concrete practice. Unfortunately, there is not any guarantee regarding data availability and reliability of such providers. In

  • Visual Exploration of Geolocated Time Series with Hybrid Indexing
    Big Data Res. (IF 2.673) Pub Date : 2019-02-07
    Georgios Chatzigeorgakidis, Kostas Patroumpas, Dimitrios Skoutas, Spiros Athanasiou, Spiros Skiadopoulos

    Geolocated time series are time series that correspond to specific locations. They can represent, for example, visitor check-ins at certain venues or readings of sensors installed at various places. The amount and significance of such time series have increased in many domains over the last years. However, although several works exist for time series visualization and visual analytics in general, there

  • Label-Aware Distributed Ensemble Learning: A Simplified Distributed Classifier Training Model for Big Data
    Big Data Res. (IF 2.673) Pub Date : 2018-11-16
    Shadi Khalifa, Patrick Martin, Rebecca Young

    Label-Aware Distributed Ensemble Learning (LADEL) is a programming model and an associated implementation for distributing any classifier training to handle Big Data. It only requires users to specify the training data source, the classification algorithm and the desired parallelization level. First, a distributed stratified sampling algorithm is proposed to generate stratified samples from large,

  • Forecasting Price Movements in Betting Exchanges Using Cartesian Genetic Programming and ANN
    Big Data Res. (IF 2.673) Pub Date : 2018-10-11
    Ivars Dzalbs, Tatiana Kalganova

    Since the introduction of betting exchanges in 2000, there has been increased interest in ways to monetize the new technology. Betting exchange markets are fairly similar to the financial markets in terms of their operation. Due to the lower market share and newer technology, there are very few tools available for automated trading for betting exchanges. The in-depth analysis of features available

  • Models and Practices in Urban Data Science at Scale
    Big Data Res. (IF 2.673) Pub Date : 2018-08-14
    Marco Balduini, Marco Brambilla, Emanuele Della Valle, Christian Marazzi, Tahereh Arabghalizi, Behnam Rahdari, Michele Vescovi

    Cities can be observed through a broad set of sensing technologies, spanning from physical sensors in the streets, to socio-economic reports, to other kinds of sources that are able to represent the behaviour of the citizens and visitors, such as mobile phone records, social media posts, and other digital traces. In this paper, we propose a conceptual framework for putting to use this variety of Big

  • Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
    Big Data Res. (IF 2.673) Pub Date : 2018-08-14
    Jacek Cała, Paolo Missier

    The value of knowledge assets generated by analytics processes using Data Science techniques tends to decay over time, as a consequence of changes in the elements the process depends on: external data sources, libraries, and system dependencies. For large-scale problems, refreshing those outcomes through greedy re-computation is both expensive and inefficient, as some changes have limited impact. In

  • From Homomorphisms to Embeddings: A Novel Approach for Mining Embedded Patterns from Large Tree Data
    Big Data Res. (IF 2.673) Pub Date : 2018-08-08
    Xiaoying Wu, Dimitri Theodoratos, Timos Sellis

    Many modern applications and systems represent and exchange data in tree-structured form and process and produce large tree datasets. Discovering informative patterns in large tree datasets is an important research area that has many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns allow for discovering useful

  • SMARTLET: A Dynamic Architecture for Real Time Face Recognition in Smartphone Using Cloudlets and Cloud
    Big Data Res. (IF 2.673) Pub Date : 2018-07-20
    Md Fazlay Rabbi Masum Billah, Muhammad Abdullah Adnan

    Face recognition on smartphones has become an important utility in a smart city for ensuring security by law enforcement. It has various applications such as capturing real-life events, tracking the movement of a celebrity, detecting wanted criminals, searching for missing children, and surveillance. Involving smartphones to do this job is challenging. Low computation power and limited battery life are the

  • Lossless Pruned Naive Bayes for Big Data Classifications
    Big Data Res. (IF 2.673) Pub Date : 2018-07-17
    Nanfei Sun, Bingjun Sun, Jian (Denny) Lin, Michael Yu-Chi Wu

    In a fast growing big data era, volume and varieties of data processed in Internet applications drastically increase. Real-world search engines commonly use text classifiers with thousands of classes to improve relevance or data quality. These large scale classification problems lead to severe runtime performance challenges, so practitioners often resort to fast approximation techniques. However, the

  • Fast Gaussian Process Regression for Big Data
    Big Data Res. (IF 2.673) Pub Date : 2018-07-04
    Sourish Das, Sasanka Roy, Rajiv Sambasivan

    Gaussian Processes are widely used for regression tasks. A known limitation in the application of Gaussian Processes to regression tasks is that the computation of the solution requires performing a matrix inversion. The solution also requires the storage of a large matrix in memory. These factors restrict the application of Gaussian Process regression to small and moderate size datasets. We present

  • Towards Sustainable Smart City by Particulate Matter Prediction Using Urban Big Data, Excluding Expensive Air Pollution Infrastructures
    Big Data Res. (IF 2.673) Pub Date : 2018-06-26
    Ali Reza Honarvar, Ashkan Sami

    The age of data and the new era of digitalization of cities have created a large volume of datasets and data flows associated with urban environments. It is vital to capture and analyze the data from various resources in smart cities. For instance, real-time air pollution data are remarkably important in controlling air pollution for urban sustainability and protecting

  • kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning
    Big Data Res. (IF 2.673) Pub Date : 2018-06-01
    Hossein Estiri, Behzad Abounia Omran, Shawn N. Murphy

    The majority of the clinical observation data stored in large-scale Electronic Health Record (EHR) research data networks are unlabeled. Unsupervised clustering can provide invaluable tools for studying patient sub-groups in these data. Many of the popular unsupervised clustering algorithms are dependent on identifying the number of clusters. Multiple statistical methods are available to approximate

  • Novel Approach to Predict Hospital Readmissions Using Feature Selection from Unstructured Data with Class Imbalance
    Big Data Res. (IF 2.673) Pub Date : 2018-06-01
    Arun Sundararaman, Srinivasan Valady Ramanathan, Ramprasad Thati

    Feature selection for predictive analytics continues to be a major challenge in the healthcare industry, particularly as it relates to readmission prediction. Several research works in mining healthcare data have focused on structured data for readmission prediction. Even within those works that are based on unstructured data, significant gaps exist in addressing class imbalance, context specific noise

  • evoStream – Evolutionary Stream Clustering Utilizing Idle Times
    Big Data Res. (IF 2.673) Pub Date : 2018-05-30
    Matthias Carnein, Heike Trautmann

    Clustering is an important field in data mining that aims to reveal hidden patterns in data sets. It is widely popular in marketing or medical applications and used to identify groups of similar objects. Clustering possibly unbounded and evolving data streams is of particular interest due to the widespread deployment of large and fast data sources such as sensors. The vast majority of stream clustering

  • A Novel Clustering Method Using Enhanced Grey Wolf Optimizer and MapReduce
    Big Data Res. (IF 2.673) Pub Date : 2018-05-21
    Ashish Kumar Tripathi, Kapil Sharma, Manju Bala

    With advances in technology, data size is increasing rapidly. For making intelligent decisions based on data, efficacious analytic methods are required. Data clustering, a prominent analytic method of data mining, is being efficiently employed in data analytics. To analyze massive data sets, improving the traditional methods is an urgent need today. In this paper, an efficient

  • Efficient In-Database Patient Similarity Analysis for Personalized Medical Decision Support Systems
    Big Data Res. (IF 2.673) Pub Date : 2018-05-16
    Araek Tashkandi, Ingmar Wiese, Lena Wiese

    Patient similarity analysis is a precondition to apply machine learning technology on medical data. In this sense, patient similarity analysis harnesses the information wealth of electronic medical records (EMRs) to support medical decision making. A pairwise similarity computation can be used as the basis for personalized health prediction. With n patients the amount of (n choose 2) similarity calculations

  • Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service
    Big Data Res. (IF 2.673) Pub Date : 2018-05-10
    Radwa Elshawi, Sherif Sakr, Domenico Talia, Paolo Trunfio

    Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. The intersection of these trends is what is, nowadays, called Big Data Science. Big Data Science requires scalable architectures for storing and processing data

  • A Dynamic Neural Network Architecture with Immunology Inspired Optimization for Weather Data Forecasting
    Big Data Res. (IF 2.673) Pub Date : 2018-05-08
    Abir Jaafar Hussain, Panos Liatsis, Mohammed Khalaf, Hissam Tawfik, Haya Al-Asker

    Recurrent neural networks are dynamical systems that provide for memory capabilities to recall past behaviour, which is necessary in the prediction of time series. In this paper, a novel neural network architecture inspired by the immune algorithm is presented and used in the forecasting of naturally occurring signals, including weather big data signals. Big Data Analysis is a major research frontier

  • Predicting Adverse Events After Surgery
    Big Data Res. (IF 2.673) Pub Date : 2018-04-27
    Senjuti Basu Roy, Moushumi Maria, Tina Wang, Anne Ehlers, David Flum

    Predicting the risk of adverse events (AEs) following a surgical procedure is of significant interest, as it may guide better resource utilization and an improved quality of care. Currently available comorbidity indices are largely inaccurate in predicting adverse events other than death, and off-the-shelf machine learning models do not typically account for the temporal sequence of events to enable

  • Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining
    Big Data Res. (IF 2.673) Pub Date : 2018-04-25
    Lakshmipadmaja D, B. Vishnuvardhan

    This study focuses on feature subset selection from high dimensionality databases and presents modification to the existing Random Subset Feature Selection (RSFS) algorithm for the random selection of feature subsets and for improving stability. A standard k-nearest-neighbor (kNN) classifier is used for classification. The RSFS algorithm is used for reducing the dimensionality of a data set by selecting

  • Revealing Physicians Referrals from Health Insurance Claims Data
    Big Data Res. (IF 2.673) Pub Date : 2018-04-25
    Vagner Figueredo de Santana, Ana Paula Appel, Luis Gregorio Moyano, Marcia Ito, Claudio Santos Pinhanez

    Health insurance companies in Brazil have their data about claims organized having the view only for service providers. In this way, they lose the view of physicians' activity and how physicians share patients. Partnership between physicians can be seen as fruitful, when they team up to help a patient, but could represent an issue as well, when a recommendation to visit another physician occurs only

  • Hybrid Bridge-Based Memetic Algorithms for Finding Bottlenecks in Complex Networks
    Big Data Res. (IF 2.673) Pub Date : 2018-04-16
    David Chalupa, Ken A. Hawick, James A. Walker

    We propose a memetic approach to find bottlenecks in complex networks based on searching for a graph partitioning with minimum conductance. Finding the optimum of this problem, also known in statistical mechanics as the Cheeger constant, is one of the most interesting NP-hard network optimisation problems. The existence of low conductance minima indicates bottlenecks in complex networks. However, the

  • A Direction Aware Particle Swarm Optimization with Sensitive Swarm Leader
    Big Data Res. (IF 2.673) Pub Date : 2018-03-28
    Krishn Kumar Mishra, Hemant Bisht, Tribhuvan Singh, Victor Chang

    This paper proposes a new Direction Aware Particle Swarm Optimization algorithm with Sensitive Swarm Leader (DAPSO-SSL). DAPSO-SSL maps the basic human nature of awareness, maturity, leader and followers relationship and leadership qualities to the popular PSO algorithm. It assigns these qualities to swarm leader and individual particles. In practical life, it is the moral responsibility of the leader

  • Scalable Machine Learning for Predicting At-Risk Profiles Upon Hospital Admission
    Big Data Res. (IF 2.673) Pub Date : 2018-03-05
    Pierre Genevès, Thomas Calmant, Nabil Layaïda, Marion Lepelley, Svetlana Artemova, Jean-Luc Bosson

    We show how the analysis of very large amounts of drug prescription data makes it possible to detect, on the day of hospital admission, patients at risk of developing complications during their hospital stay. We explore, for the first time, to what extent the volume and variety of big prescription data help in constructing predictive models for the automatic detection of at-risk profiles. Our methodology

  • Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Fabrizio Celli, Fabio Cumbo, Emanuel Weitschek

    DNA methylation is a well-studied genetic modification crucial to regulate the functioning of the genome. Its alterations play an important role in tumorigenesis and tumor-suppression. Thus, studying DNA methylation data may help biomarker discovery in cancer. Since public data on DNA methylation have become abundant – and considering the high number of methylated sites (features) present in the genome

  • An Investigation to Identify Factors that Lead to Delay in Healthcare Reimbursement Process: A Brazilian case
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Ricardo Gerhardt, João F. Valiati, José Vicente Canto dos Santos

    Healthcare reimbursement has had a tremendous impact on healthcare institutions and the economy. The healthcare reimbursement process consists of coding, billing, and payment based on the care provided to the patient. The rapid development of new medical treatments and procedures and changes in regulations and policies have been increasing the complexity of the reimbursement process, resulting in financial

  • Insights into Antidepressant Prescribing Using Open Health Data
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Brian Cleland, Jonathan Wallace, Raymond Bond, Michaela Black, Maurice Mulvenna, Deborah Rankin, Austin Tanney

    The growth of big data is transforming many economic sectors, including the medical and healthcare sector. Despite this, research into the practical application of data analytics to the development of health policy is still limited. In this study we examine how data science and machine learning methods can be applied to a variety of open health datasets, including GP prescribing data, disease prevalence

  • A Novel Adaptive Feature Extraction for Detection of Cardiac Arrhythmias Using Hybrid Technique MRDWT & MPNN Classifier from ECG Big Data
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Hari Mohan Rai, Kalyan Chatterjee

    The efficient automatic detection of cardiac arrhythmia using a hybrid technique from ECG big data has been proposed with novel feature extraction technique using Multiresolution Discrete Wavelet Transform (MRDWT) and Multilayer Probabilistic Neural Network (MPNN) classifier. Big Data of ECG signals have been selected from MIT–BIH arrhythmia database for detection of two types of arrhythmias LBBB (Left

  • Big Data Compliance for Innovative Clinical Models
    Big Data Res. (IF 2.673) Pub Date : 2018-02-05
    Massimiliano Giacalone, Carlo Cusatelli, Vito Santarcangelo

    In the healthcare sector, information is the most important aspect, and the human body in particular is the major source of data production: as a result, the new challenge for world healthcare is to take advantage of these huge amounts of data de-structured among themselves. In order to benefit from this advantage, technology offers a solution called Big Data Analysis that allows the management of

  • What Are Data? A Categorization of the Data Sensitivity Spectrum
    Big Data Res. (IF 2.673) Pub Date : 2017-12-02
    John M.M. Rumbold, Barbara K. Pierscionek

    The definition of data might at first glance seem prosaic, but formulating a definitive and useful definition is surprisingly difficult. This question is important because of the protection given to data in law and ethics. Healthcare data are universally considered sensitive (and confidential), so it might seem that the categorisation of less sensitive data is relatively unimportant for medical data

  • Community Detection Algorithm for Big Social Networks Using Hybrid Architecture
    Big Data Res. (IF 2.673) Pub Date : 2017-10-26
    Rahil Sharma, Suely Oliveira

    One of the most relevant and widely studied structural properties of networks is their community structure. Detecting communities is of great importance in social networks where systems are often represented as graphs. With the advent of web-based social networks like Twitter, Facebook and LinkedIn, community detection became even more difficult due to the massive network size, which can reach up to

  • Towards Visualizing Big Data with Large-Scale Edge Constraint Graph Drawing
    Big Data Res. (IF 2.673) Pub Date : 2017-10-23
    Ariyawat Chonbodeechalermroong, Rattikorn Hewett

    Visualization plays an important role in enabling understanding of big data. Graphs are crucial tools for visual analytics of big data networks such as social, biological, traffic and security networks. Graph drawing has been intensively researched to enhance aesthetic features (i.e., layouts, symmetry, cross-free edges). Early physics-inspired techniques have focused on synthetic abstract graphs whose

  • Big Data Model Simulation on a Graph Database for Surveillance in Wireless Multimedia Sensor Networks
    Big Data Res. (IF 2.673) Pub Date : 2017-10-19
    Cihan Küçükkeçeci, Adnan Yazıcı

    Sensors are present in various forms all around the world such as mobile phones, surveillance cameras, smart televisions, intelligent refrigerators and blood pressure monitors. Usually, most sensors are part of some other system with similar sensors that compose a network. One such network is composed of millions of sensors connected to the Internet, which is called the Internet of Things (IoT)

  • Generating High-Dimensional Datastreams for Change Detection
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Diego Carrera, Giacomo Boracchi

    A popular testbed for change-detection algorithms consists in detecting changes that have been synthetically injected in real-world datastreams. Unfortunately, most experimental practices in the literature lead to injecting changes whose magnitude is unknown and cannot be controlled. As a consequence, results are difficult to interpret, reproduce, and compare. Most importantly, controlling

  • Variations on the Clustering Algorithm BIRCH
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Boris Lorbeer, Ana Kosareva, Bersant Deva, Dženan Softić, Peter Ruppel, Axel Küpper

    Clustering algorithms are recently regaining attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms suffer from two drawbacks: they do not scale well with increasing dataset sizes and often require proper parametrization which is usually difficult to provide. A very important example is the cluster count, a parameter

  • Big Data for Context Aware Computing – Perspectives and Challenges
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Kalyan P. Subbu, Athanasios V. Vasilakos

    Big data has arrived. Myriad applications and systems generate data of humongous volume, variety and velocity, which traditional computing systems and databases are unable to manage. The proliferation of sensors in every possible device is also becoming one of the major generators of Big data. Of particular interest in this article is how context aware computing systems which derive context from data

  • A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Pietro Michiardi

    In today's world, large volumes of data are being continuously generated by many scientific applications, such as bioinformatics or networking. Since each monitored event is usually characterized by a variety of features, high-dimensional datasets have been continuously generated. To extract value from these complex collections of data, different exploratory data mining algorithms can be used to discover

  • Partial Rollback-based Scheduling on In-memory Transactional Data Grids
    Big Data Res. (IF 2.673) Pub Date : 2017-08-24
    Junwhan Kim

    In-memory transactional data grids, often referred to as NoSQL data grids, demand high concurrency for scalability and high performance in data-intensive applications. As an alternative concurrency control model, distributed transactional memory (DTM) promises to alleviate the difficulties of lock-based distributed synchronization. However, if a transaction aborts, DTM suffers from additional communication

  • Frequent Itemsets Mining for Big Data: A Comparative Analysis
    Big Data Res. (IF 2.673) Pub Date : 2017-08-24
    Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca Venturini

    Itemset mining is a well-known exploratory data mining technique used to discover interesting correlations hidden in a data collection. Since it supports different targeted analyses, it is profitably exploited in a wide range of different domains, ranging from network traffic data to medical records. With the increasing amount of generated data, different scalable algorithms have been developed, exploiting

  • BLADYG: A Graph Processing Framework for Large Dynamic Graphs
    Big Data Res. (IF 2.673) Pub Date : 2017-08-23
    Sabeur Aridhi, Alberto Montresor, Yannis Velegrakis

    Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, PowerGraph, GraphLab, and Trinity. However, these systems deal only with static graphs and do not consider

  • Random Forests for Big Data
    Big Data Res. (IF 2.673) Pub Date : 2017-08-23
    Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix

    Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include online data and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based on decision

  • Tensor Decomposition Based Approach for Training Extreme Learning Machines
    Big Data Res. (IF 2.673) Pub Date : 2017-08-23
    Nikhitha K. Nair, S. Asharaf

    Conventional Extreme Learning Machines utilize the Moore–Penrose generalized pseudo-inverse to solve the hidden layer activation matrix and perform analytical determination of output weights. Scalability is the major concern to be addressed in Extreme Learning Machines when dealing with large datasets. Motivated by these scalability concerns, this paper proposes a novel tensor decomposition based Extreme Learning

  • MapReduce Based Multilevel Consistent and Inconsistent Association Rule Detection from Big Data Using Interestingness Measures
    Big Data Res. (IF 2.673) Pub Date : 2017-08-23
    Dinesh J. Prajapati, Sanjay Garg, N.C. Chauhan

    Multilevel association rule mining in a distributed environment plays an important role in big data analysis for developing marketing strategies. Multilevel association rules provide more significant information than single-level rules and also reveal the conceptual hierarchy of knowledge in a hierarchical dataset. In this era of the internet, various online marketing sites and social networking sites are

  • Data Reduction Through Increased Data Utilization in Chemical Dynamics Simulations
    Big Data Res. (IF 2.673) Pub Date : 2017-08-23
    Misha Ahmadian, Yu Zhuang, William L. Hase, Yong Chen

    Many scientific applications involve heavy computational and analysis workloads on data and often require producing intermediate data for ongoing calculations. For instance, chemical dynamics simulations are known to be computationally heavy in many respects. There is a strong desire for a solution that minimizes expensive calculations by replacing them with light-weight

  • A Systematic Literature Review of the Data Replication Techniques in the Cloud Environments
    Big Data Res. (IF 2.673) Pub Date : 2017-07-25
    Bahareh Alami Milani, Nima Jafari Navimipour

    Cloud computing faces various challenges, one of which is using copied data. Data replication is an important technique for distributed mass data management. The general idea of data replication is to place several replicas of a specific file at different locations. Replication is one of the most broadly studied phenomena in the distributed

  • Hadoop MapReduce Performance on SSDs for Analyzing Social Networks
    Big Data Res. (IF 2.673) Pub Date : 2017-07-13
    M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas

    The advent of Solid State Drives (SSDs) stimulated a lot of research to investigate and exploit to the extent possible the potentials of the new drive. The focus of this work is on the investigation of the relative performance and benefits of SSDs versus hard disk drives (HDDs) when they are used as underlying storage for Hadoop's MapReduce. In particular, we depart from all earlier relevant works

  • Fast Deep Convolutional Face Detection in the Wild Exploiting Hard Sample Mining
    Big Data Res. (IF 2.673) Pub Date : 2017-07-12
    Danai Triantafyllidou, Paraskevi Nousi, Anastasios Tefas

    Face detection constitutes a key visual information analysis task in Machine Learning. The rise of Big Data has resulted in the accumulation of a massive volume of visual data which requires proper and fast analysis. Deep Learning methods are powerful approaches towards this task as training with large amounts of data exhibiting high variability has been shown to significantly enhance their effectiveness

Contents have been reproduced by permission of the publishers.