Current journal: Big Data Research
  • PXDedup: Deduplicating Massive Visually Identical JPEG Image Data
    Big Data Res. (IF 2.673) Pub Date : 2020-11-20
    Hengxiang Xie; Yuhui Deng; Hao Feng; Lei Si

    The explosive growth of data poses a major challenge for data storage and backup in data centers. Moreover, mobile phone technology has made images one of the main ways of presenting information. Most images are compressed to JPEG format, and image data accounts for a large part of the data growth. To reduce storage cost, data deduplication has been proposed and has now become

  • Knowledge Graph-Based Spatial-Aware User Community Preference Query Algorithm for LBSNs
    Big Data Res. (IF 2.673) Pub Date : 2020-11-16
    Yanjun Wang; Liang Zhu; Jiangtao Ma; Guangwu Hu; Jiangchuan Liu; Yaqiong Qiao

    User community preference in Location-Based Social Networks (LBSNs) can meet the diversified location demands of group LBSN users. Although location-based service recommendation for individuals and the personal spatial preference query problem have been well addressed by many studies, work on user group or user community preference queries is still under way, and most approaches consider only the spatial distance factor, which causes

  • Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework
    Big Data Res. (IF 2.673) Pub Date : 2020-11-12
    Zhong-Zhen Long; Guoxia Xu; Jiao Du; Hu Zhu; Taiyu Yan; Yu-Feng Yu

    Regarded as an important computing paradigm, cloud computing addresses big, distributed databases and rather simple computation. In this paradigm, data mining is one of the most important and fundamental problems. A large amount of data is generated by sensors and other intelligent devices. Data mining for these big data is crucial in various applications. K-means clustering is a typical technique

  • Efficient Mining of Hotspot Regional Patterns with Multi-Semantic Trajectories
    Big Data Res. (IF 2.673) Pub Date : 2020-10-14
    Zhen Zhang; Xiangguo Zhao; Yingchun Zhang; Jing Zhang; Haojie Nie; Youming Lou

    Due to the explosive growth in trajectory data with multi-semantic labels, mining hotspots based on sequential patterns extracted from multi-semantic trajectories is an emerging need in various applications of crowd behavior analysis, such as location-based advertising, business location, and urban planning. However, most existing regional pattern mining methods only focus on temporal continuity,

  • Richpedia: A Large-Scale, Comprehensive Multi-Modal Knowledge Graph
    Big Data Res. (IF 2.673) Pub Date : 2020-10-15
    Meng Wang; Haofen Wang; Guilin Qi; Qiushuo Zheng

    Large-scale knowledge graphs such as Wikidata and DBpedia have become a powerful asset for semantic search and question answering. However, most of the knowledge graph construction works focus on organizing and discovering textual knowledge in a structured representation, while paying little attention to the proliferation of visual resources on the Web. To consolidate this recent trend, in this paper

  • Incorporating Phase-Encoded Spectrum Masking into Speaker-Independent Monaural Source Separation
    Big Data Res. (IF 2.673) Pub Date : 2020-10-22
    Wen Zhang; Xiaoyong Li; Aolong Zhou; Kaijun Ren; Junqiang Song

    Typical speech separation systems usually operate in the time-frequency (T-F) domain by enhancing the magnitude response and leaving the phase response unaltered. Recent studies, however, suggest that phase is important for perceptual quality, leading many researchers to consider magnitude and phase spectrum enhancements. The complex-valued Fourier spectrum and real-valued shifted real spectrum (SRS)

  • WISE: Workload-Aware Partitioning for RDF Systems
    Big Data Res. (IF 2.673) Pub Date : 2020-10-27
    Xintong Guo; Hong Gao; Zhaonian Zou

    Many large-scale knowledge graphs in various domains have sprung up in recent years. They can no longer be managed on a single machine. Distributed RDF systems address the scalability issue using partitioning techniques. However, most of these systems are unaware of the query workload and employ static partitioning. As diverse and dynamic workloads keep emerging in the knowledge graph

  • JECI++: A Modified Joint Knowledge Graph Embedding Model for Concepts and Instances
    Big Data Res. (IF 2.673) Pub Date : 2020-10-23
    Peng Wang; Jing Zhou

    Concepts and instances are important parts of knowledge graphs, but most knowledge graph embedding models treat them equally as entities, which leads to inaccurate embeddings of concepts and instances. To solve this problem, we propose a novel knowledge graph embedding model called JECI++ to jointly embed concepts and instances. First, JECI++ simplifies hierarchical concepts based on subClassOf

  • Retrofitting Soft Rules for Knowledge Representation Learning
    Big Data Res. (IF 2.673) Pub Date : 2020-10-22
    Bo An; Xianpei Han; Cheng Fu; Le Sun

  • Data-Driven Computational Social Science: A Survey
    Big Data Res. (IF 2.673) Pub Date : 2020-08-06
    Jun Zhang, Wei Wang, Feng Xia, Yu-Ru Lin, Hanghang Tong

    Social science concerns issues involving individuals, relationships, and society as a whole. The complexity of research topics in social science makes it an amalgamation of multiple disciplines, such as economics, political science, and sociology. For centuries, scientists have conducted many studies to understand the mechanisms of society. However, due to the limitations of traditional research

  • Anytime Frequent Itemset Mining of Transactional Data Streams
    Big Data Res. (IF 2.673) Pub Date : 2020-07-28
    Poonam Goyal, Jagat Sesh Challa, Shivin Shrivastava, Navneet Goyal

    Mining frequent itemsets from transactional data streams has become essential, with many applications such as stock market analysis, retail chain analysis, and web log analysis. Various algorithms have been proposed to efficiently mine single-port and multi-port transactional streams within the constraints of limited time and memory. However, all of them are budget algorithms

  • Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration
    Big Data Res. (IF 2.673) Pub Date : 2020-05-08
    A.M. Fernández, D. Gutiérrez-Avilés, A. Troncoso, F. Martínez-Álvarez

    The vast amount of data stored nowadays has turned big data analytics into a very active research field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm for cluster deployment and big data analytics. However, getting started can still take considerable time when done manually, owing to the requirements that all nodes must fulfill. This work introduces

  • PatSeg: A Sequential Patent Segmentation Approach
    Big Data Res. (IF 2.673) Pub Date : 2020-05-04
    Maryam Habibi, Astrid Rheinlaender, Wolfgang Thielemann, Robert Adams, Peter Fischer, Sylvia Krolkiewicz, David Luis Wiegandt, Ulf Leser

    Patents are an important source of information in industry and academia. However, quickly grasping the essence of a given patent is difficult, as patents are typically very long and written in a rather inaccessible style. The essential information, especially the invention itself and the experimental part of the invention, is usually contained in the description section. However, in many patents the

  • Entity Resolution with Recursive Blocking
    Big Data Res. (IF 2.673) Pub Date : 2020-04-30
    Shao-Qing Yu

    Entity resolution is a well-known challenge in data management, owing to the lack of unique identifiers for records and the various errors hidden in the data, which undermine the identifiability of the entities they refer to. To reveal matching records, every record potentially needs to be compared with all other records in the database, which is computationally intractable even for moderately-sized databases. To circumvent

  • A Hierarchical Dimension Reduction Approach for Big Data with Application to Fault Diagnostics
    Big Data Res. (IF 2.673) Pub Date : 2019-08-22
    R. Krishnan, V.A. Samaranayake, S. Jagannathan

    About four zettabytes of data, which falls into the category of big data, is generated by complex manufacturing systems annually. Big data can be utilized to improve the efficiency of an aging manufacturing system, provided several challenges are handled. In this paper, a novel methodology is presented to detect faults in manufacturing systems while overcoming some of these challenges. Specifically

  • Parallelizing Computations of Full Disjunctions
    Big Data Res. (IF 2.673) Pub Date : 2019-07-12
    Matteo Paganelli, Domenico Beneventano, Francesco Guerra, Paolo Sottovia

    In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration, and knowledge extraction. One of the main limitations

  • HiePaCo: Scalable Hierarchical Exploration in Abstract Parallel Coordinates Under Budget Constraints
    Big Data Res. (IF 2.673) Pub Date : 2019-07-08
    Gaëlle Richer, Joris Sansen, Frédéric Lalanne, David Auber, Romain Bourqui

    In exploratory visualization systems, interactions allow users to manipulate a visual representation and thereby gain insight into its supporting data. The responsiveness of these interactions is crucial, but achieving it on common hardware becomes increasingly difficult with the ever-growing size of datasets. Moreover, the representation of a large dataset itself is challenging since screen space is limited

  • Interactive Visual Analytics for Sensemaking with Big Text
    Big Data Res. (IF 2.673) Pub Date : 2019-04-25
    Michelle Dowling, Nathan Wycoff, Brian Mayer, John Wenskovitch, Scotland Leman, Leanna House, Nicholas Polys, Chris North, Peter Hauck

    Analysts face many steep challenges when performing sensemaking tasks on collections of textual information larger than can be reasonably analyzed without computational assistance. To scale up such sensemaking tasks, new methods are needed to interactively integrate human cognitive sensemaking activity with machine learning. Towards that goal, we offer a human-in-the-loop computational model that mirrors

  • Bi-objective Traffic Optimization in Geo-distributed Data Flows
    Big Data Res. (IF 2.673) Pub Date : 2019-04-25
    Anna-Valentini Michailidou, Anastasios Gounaris

    Recently, there have been several proposals in the area of geo-distributed big data processing. In this work, we aim to address a limitation of the existing solutions, namely to optimize task allocation across geographically distributed data centers, in a way that both the total traffic and the running time of the whole processing in complex multi-stage flows are targeted. Apart from proposing concrete

  • Anomaly Detection and Repair for Accurate Predictions in Geo-distributed Big Data
    Big Data Res. (IF 2.673) Pub Date : 2019-04-17
    Roberto Corizzo, Michelangelo Ceci, Nathalie Japkowicz

    The increasing presence of geo-distributed sensor networks implies the generation of huge volumes of data from multiple geographical locations at an increasing rate. This raises important issues which become more challenging when the final goal is that of the analysis of the data for forecasting purposes or, more generally, for predictive tasks. This paper proposes a framework which supports predictive

  • Systematic Review of the Literature on Big Data in the Transportation Domain: Concepts and Applications
    Big Data Res. (IF 2.673) Pub Date : 2019-03-19
    Alex Neilson, Indratmo, Ben Daniel, Stevanus Tjandra

    Research in Big Data and analytics offers tremendous opportunities to utilize evidence in making decisions in many application domains. To what extent can the paradigms of Big Data and analytics be used in the domain of transport? This article reports on an outcome of a systematic review of published articles in the last five years that discuss Big Data concepts and applications in the transportation

  • Joint Contour Net Analysis for Feature Detection in Lattice Quantum Chromodynamics Data
    Big Data Res. (IF 2.673) Pub Date : 2019-03-01
    Dean P. Thomas, Rita Borgo, Robert S. Laramee, Simon J. Hands

    In this paper we demonstrate the use of multivariate topological algorithms to analyse and interpret Lattice Quantum Chromodynamics (QCD) data. Lattice QCD is a long established field of theoretical physics research in the pursuit of understanding the strong nuclear force. Complex computer simulations model interactions between quarks and gluons to test theories regarding the behaviour of matter in

  • Towards Hybrid Multi-Cloud Storage Systems: Understanding How to Perform Data Transfer
    Big Data Res. (IF 2.673) Pub Date : 2019-03-01
    Antonio Celesti, Antonino Galletta, Maria Fazio, Massimo Villari

    Nowadays, storing and retrieving information over the Cloud is critical for the survival and growth of organizations and people. In this context, the possibility to store a huge amount of data and files on remote third-party Cloud storage providers is becoming an even more concrete practice. Unfortunately, there is no guarantee regarding the data availability and reliability of such providers. In

  • Visual Exploration of Geolocated Time Series with Hybrid Indexing
    Big Data Res. (IF 2.673) Pub Date : 2019-02-07
    Georgios Chatzigeorgakidis, Kostas Patroumpas, Dimitrios Skoutas, Spiros Athanasiou, Spiros Skiadopoulos

    Geolocated time series are time series that correspond to specific locations. They can represent, for example, visitor check-ins at certain venues or readings of sensors installed at various places. The amount and significance of such time series have increased in many domains over the last years. However, although several works exist for time series visualization and visual analytics in general, there

  • Label-Aware Distributed Ensemble Learning: A Simplified Distributed Classifier Training Model for Big Data
    Big Data Res. (IF 2.673) Pub Date : 2018-11-16
    Shadi Khalifa, Patrick Martin, Rebecca Young

    Label-Aware Distributed Ensemble Learning (LADEL) is a programming model and an associated implementation for distributing any classifier training to handle Big Data. It only requires users to specify the training data source, the classification algorithm and the desired parallelization level. First, a distributed stratified sampling algorithm is proposed to generate stratified samples from large,

  • Forecasting Price Movements in Betting Exchanges Using Cartesian Genetic Programming and ANN
    Big Data Res. (IF 2.673) Pub Date : 2018-10-11
    Ivars Dzalbs, Tatiana Kalganova

    Since the introduction of betting exchanges in 2000, there has been increased interest in ways to monetize the new technology. Betting exchange markets are fairly similar to the financial markets in terms of their operation. Due to the lower market share and newer technology, there are very few tools available for automated trading on betting exchanges. The in-depth analysis of features available

  • Models and Practices in Urban Data Science at Scale
    Big Data Res. (IF 2.673) Pub Date : 2018-08-14
    Marco Balduini, Marco Brambilla, Emanuele Della Valle, Christian Marazzi, Tahereh Arabghalizi, Behnam Rahdari, Michele Vescovi

    Cities can be observed through a broad set of sensing technologies, ranging from physical sensors in the streets, to socio-economic reports, to other kinds of sources that are able to represent the behaviour of citizens and visitors, such as mobile phone records, social media posts, and other digital traces. In this paper, we propose a conceptual framework for putting to use this variety of Big

  • Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
    Big Data Res. (IF 2.673) Pub Date : 2018-08-14
    Jacek Cała, Paolo Missier

    The value of knowledge assets generated by analytics processes using Data Science techniques tends to decay over time, as a consequence of changes in the elements the process depends on: external data sources, libraries, and system dependencies. For large-scale problems, refreshing those outcomes through greedy re-computation is both expensive and inefficient, as some changes have limited impact. In

  • From Homomorphisms to Embeddings: A Novel Approach for Mining Embedded Patterns from Large Tree Data
    Big Data Res. (IF 2.673) Pub Date : 2018-08-08
    Xiaoying Wu, Dimitri Theodoratos, Timos Sellis

    Many modern applications and systems represent and exchange data in tree-structured form and process and produce large tree datasets. Discovering informative patterns in large tree datasets is an important research area that has many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns allow for discovering useful

  • SMARTLET: A Dynamic Architecture for Real Time Face Recognition in Smartphone Using Cloudlets and Cloud
    Big Data Res. (IF 2.673) Pub Date : 2018-07-20
    Md Fazlay Rabbi Masum Billah, Muhammad Abdullah Adnan

    Face recognition on smartphones has become an important utility in a smart city for ensuring security by law enforcement. It has various applications such as capturing real-life events, tracking the movement of a celebrity, detecting wanted criminals, searching for missing children, and surveillance. Involving smartphones to do this job is challenging. Low computation power and limited battery life are the

  • Lossless Pruned Naive Bayes for Big Data Classifications
    Big Data Res. (IF 2.673) Pub Date : 2018-07-17
    Nanfei Sun, Bingjun Sun, Jian (Denny) Lin, Michael Yu-Chi Wu

    In the fast-growing big data era, the volume and variety of data processed in Internet applications are increasing drastically. Real-world search engines commonly use text classifiers with thousands of classes to improve relevance or data quality. These large-scale classification problems lead to severe runtime performance challenges, so practitioners often resort to fast approximation techniques. However, the

  • Fast Gaussian Process Regression for Big Data
    Big Data Res. (IF 2.673) Pub Date : 2018-07-04
    Sourish Das, Sasanka Roy, Rajiv Sambasivan

    Gaussian Processes are widely used for regression tasks. A known limitation in the application of Gaussian Processes to regression tasks is that the computation of the solution requires performing a matrix inversion. The solution also requires the storage of a large matrix in memory. These factors restrict the application of Gaussian Process regression to small and moderate size datasets. We present

  • Towards Sustainable Smart City by Particulate Matter Prediction Using Urban Big Data, Excluding Expensive Air Pollution Infrastructures
    Big Data Res. (IF 2.673) Pub Date : 2018-06-26
    Ali Reza Honarvar, Ashkan Sami

    The age of data and the new era of city digitalization have created a large volume of datasets and data flows associated with urban environments. It is vital to capture and analyze the data from various resources in smart cities. For instance, real-time air pollution data are remarkably important in controlling air pollution for urban sustainability and protecting

  • kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning
    Big Data Res. (IF 2.673) Pub Date : 2018-06-01
    Hossein Estiri, Behzad Abounia Omran, Shawn N. Murphy

    The majority of the clinical observation data stored in large-scale Electronic Health Record (EHR) research data networks are unlabeled. Unsupervised clustering can provide invaluable tools for studying patient sub-groups in these data. Many of the popular unsupervised clustering algorithms are dependent on identifying the number of clusters. Multiple statistical methods are available to approximate

  • Novel Approach to Predict Hospital Readmissions Using Feature Selection from Unstructured Data with Class Imbalance
    Big Data Res. (IF 2.673) Pub Date : 2018-06-01
    Arun Sundararaman, Srinivasan Valady Ramanathan, Ramprasad Thati

    Feature selection for predictive analytics continues to be a major challenge in the healthcare industry, particularly as it relates to readmission prediction. Several research works in mining healthcare data have focused on structured data for readmission prediction. Even within those works that are based on unstructured data, significant gaps exist in addressing class imbalance, context specific noise

  • evoStream – Evolutionary Stream Clustering Utilizing Idle Times
    Big Data Res. (IF 2.673) Pub Date : 2018-05-30
    Matthias Carnein, Heike Trautmann

    Clustering is an important field in data mining that aims to reveal hidden patterns in data sets. It is widely popular in marketing or medical applications and used to identify groups of similar objects. Clustering possibly unbounded and evolving data streams is of particular interest due to the widespread deployment of large and fast data sources such as sensors. The vast majority of stream clustering

  • A Novel Clustering Method Using Enhanced Grey Wolf Optimizer and MapReduce
    Big Data Res. (IF 2.673) Pub Date : 2018-05-21
    Ashish Kumar Tripathi, Kapil Sharma, Manju Bala

    With the advancement of technology, data size is increasing rapidly. For making intelligent decisions based on data, efficacious analytic methods are required. Data clustering, a prominent analytic method of data mining, is being efficiently employed in data analytics. To analyze massive data sets, improving the traditional methods is an urgent need today. In this paper, an efficient

  • Efficient In-Database Patient Similarity Analysis for Personalized Medical Decision Support Systems
    Big Data Res. (IF 2.673) Pub Date : 2018-05-16
    Araek Tashkandi, Ingmar Wiese, Lena Wiese

    Patient similarity analysis is a precondition for applying machine learning technology to medical data. In this sense, patient similarity analysis harnesses the information wealth of electronic medical records (EMRs) to support medical decision making. A pairwise similarity computation can be used as the basis for personalized health prediction. With n patients the amount of $\binom{n}{2}$ similarity calculations

  • Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service
    Big Data Res. (IF 2.673) Pub Date : 2018-05-10
    Radwa Elshawi, Sherif Sakr, Domenico Talia, Paolo Trunfio

    Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. The intersection of these trends is what is, nowadays, called Big Data Science. Big Data Science requires scalable architectures for storing and processing data

  • A Dynamic Neural Network Architecture with Immunology Inspired Optimization for Weather Data Forecasting
    Big Data Res. (IF 2.673) Pub Date : 2018-05-08
    Abir Jaafar Hussain, Panos Liatsis, Mohammed Khalaf, Hissam Tawfik, Haya Al-Asker

    Recurrent neural networks are dynamical systems that provide for memory capabilities to recall past behaviour, which is necessary in the prediction of time series. In this paper, a novel neural network architecture inspired by the immune algorithm is presented and used in the forecasting of naturally occurring signals, including weather big data signals. Big Data Analysis is a major research frontier

  • Predicting Adverse Events After Surgery
    Big Data Res. (IF 2.673) Pub Date : 2018-04-27
    Senjuti Basu Roy, Moushumi Maria, Tina Wang, Anne Ehlers, David Flum

    Predicting the risk of adverse events (AEs) following a surgical procedure is of significant interest, as it may guide better resource utilization and an improved quality of care. Currently available comorbidity indices are largely inaccurate at predicting adverse events other than death, and off-the-shelf machine learning models do not typically account for the temporal sequence of events to enable

  • Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining
    Big Data Res. (IF 2.673) Pub Date : 2018-04-25
    Lakshmipadmaja D, B. Vishnuvardhan

    This study focuses on feature subset selection from high-dimensional databases and presents a modification to the existing Random Subset Feature Selection (RSFS) algorithm for the random selection of feature subsets and for improving stability. A standard k-nearest-neighbor (kNN) classifier is used for classification. The RSFS algorithm is used for reducing the dimensionality of a data set by selecting

  • Revealing Physicians Referrals from Health Insurance Claims Data
    Big Data Res. (IF 2.673) Pub Date : 2018-04-25
    Vagner Figueredo de Santana, Ana Paula Appel, Luis Gregorio Moyano, Marcia Ito, Claudio Santos Pinhanez

    Health insurance companies in Brazil have their data about claims organized having the view only for service providers. In this way, they lose the view of physicians' activity and how physicians share patients. Partnership between physicians can be seen as fruitful, when they team up to help a patient, but could represent an issue as well, when a recommendation to visit another physician occurs only

  • Hybrid Bridge-Based Memetic Algorithms for Finding Bottlenecks in Complex Networks
    Big Data Res. (IF 2.673) Pub Date : 2018-04-16
    David Chalupa, Ken A. Hawick, James A. Walker

    We propose a memetic approach to find bottlenecks in complex networks based on searching for a graph partitioning with minimum conductance. Finding the optimum of this problem, also known in statistical mechanics as the Cheeger constant, is one of the most interesting NP-hard network optimisation problems. The existence of low conductance minima indicates bottlenecks in complex networks. However, the

  • A Direction Aware Particle Swarm Optimization with Sensitive Swarm Leader
    Big Data Res. (IF 2.673) Pub Date : 2018-03-28
    Krishn Kumar Mishra, Hemant Bisht, Tribhuvan Singh, Victor Chang

    This paper proposes a new Direction Aware Particle Swarm Optimization algorithm with Sensitive Swarm Leader (DAPSO-SSL). DAPSO-SSL maps the basic human nature of awareness, maturity, leader and followers relationship and leadership qualities to the popular PSO algorithm. It assigns these qualities to swarm leader and individual particles. In practical life, it is the moral responsibility of the leader

  • Scalable Machine Learning for Predicting At-Risk Profiles Upon Hospital Admission
    Big Data Res. (IF 2.673) Pub Date : 2018-03-05
    Pierre Genevès, Thomas Calmant, Nabil Layaïda, Marion Lepelley, Svetlana Artemova, Jean-Luc Bosson

    We show how the analysis of very large amounts of drug prescription data makes it possible to detect, on the day of hospital admission, patients at risk of developing complications during their hospital stay. We explore, for the first time, to what extent the volume and variety of big prescription data help in constructing predictive models for the automatic detection of at-risk profiles. Our methodology

  • Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Fabrizio Celli, Fabio Cumbo, Emanuel Weitschek

    DNA methylation is a well-studied genetic modification crucial for regulating the functioning of the genome. Its alterations play an important role in tumorigenesis and tumor suppression. Thus, studying DNA methylation data may help biomarker discovery in cancer. Since public data on DNA methylation have become abundant – and considering the high number of methylated sites (features) present in the genome

  • An Investigation to Identify Factors that Lead to Delay in Healthcare Reimbursement Process: A Brazilian case
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Ricardo Gerhardt, João F. Valiati, José Vicente Canto dos Santos

    Healthcare reimbursement has had a tremendous impact on healthcare institutions and the economy. The healthcare reimbursement process consists of coding, billing, and payment based on the care provided to the patient. The rapid development of new medical treatments and procedures and changes in regulations and policies have been increasing the complexity of the reimbursement process, resulting in financial

  • Insights into Antidepressant Prescribing Using Open Health Data
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Brian Cleland, Jonathan Wallace, Raymond Bond, Michaela Black, Maurice Mulvenna, Deborah Rankin, Austin Tanney

    The growth of big data is transforming many economic sectors, including the medical and healthcare sector. Despite this, research into the practical application of data analytics to the development of health policy is still limited. In this study we examine how data science and machine learning methods can be applied to a variety of open health datasets, including GP prescribing data, disease prevalence

  • A Novel Adaptive Feature Extraction for Detection of Cardiac Arrhythmias Using Hybrid Technique MRDWT & MPNN Classifier from ECG Big Data
    Big Data Res. (IF 2.673) Pub Date : 2018-03-02
    Hari Mohan Rai, Kalyan Chatterjee

    The efficient automatic detection of cardiac arrhythmia using a hybrid technique from ECG big data has been proposed with novel feature extraction technique using Multiresolution Discrete Wavelet Transform (MRDWT) and Multilayer Probabilistic Neural Network (MPNN) classifier. Big Data of ECG signals have been selected from MIT–BIH arrhythmia database for detection of two types of arrhythmias LBBB (Left

  • Big Data Compliance for Innovative Clinical Models
    Big Data Res. (IF 2.673) Pub Date : 2018-02-05
    Massimiliano Giacalone, Carlo Cusatelli, Vito Santarcangelo

    In the healthcare sector, information is the most important aspect, and the human body in particular is the major source of data production: as a result, the new challenge for world healthcare is to take advantage of these huge amounts of data de-structured among themselves. In order to benefit from this advantage, technology offers a solution called Big Data Analysis that allows the management of

  • What Are Data? A Categorization of the Data Sensitivity Spectrum
    Big Data Res. (IF 2.673) Pub Date : 2017-12-02
    John M.M. Rumbold, Barbara K. Pierscionek

    The definition of data might at first glance seem prosaic, but formulating a definitive and useful definition is surprisingly difficult. This question is important because of the protection given to data in law and ethics. Healthcare data are universally considered sensitive (and confidential), so it might seem that the categorisation of less sensitive data is relatively unimportant for medical data

  • Community Detection Algorithm for Big Social Networks Using Hybrid Architecture
    Big Data Res. (IF 2.673) Pub Date : 2017-10-26
    Rahil Sharma, Suely Oliveira

    One of the most relevant and widely studied structural properties of networks is their community structure. Detecting communities is of great importance in social networks where systems are often represented as graphs. With the advent of web-based social networks like Twitter, Facebook and LinkedIn, community detection became even more difficult due to the massive network size, which can reach up to

  • Towards Visualizing Big Data with Large-Scale Edge Constraint Graph Drawing
    Big Data Res. (IF 2.673) Pub Date : 2017-10-23
    Ariyawat Chonbodeechalermroong, Rattikorn Hewett

    Visualization plays an important role in enabling understanding of big data. Graphs are crucial tools for visual analytics of big data networks such as social, biological, traffic and security networks. Graph drawing has been intensively researched to enhance aesthetic features (e.g., layouts, symmetry, cross-free edges). Early physics-inspired techniques have focused on synthetic abstract graphs whose

  • Big Data Model Simulation on a Graph Database for Surveillance in Wireless Multimedia Sensor Networks
    Big Data Res. (IF 2.673) Pub Date : 2017-10-19
    Cihan Küçükkeçeci, Adnan Yazıcı

    Sensors are present in various forms all around the world, such as mobile phones, surveillance cameras, smart televisions, intelligent refrigerators and blood pressure monitors. Usually, most sensors are part of some other system with similar sensors that compose a network. One such network, composed of millions of sensors connected to the Internet, is called the Internet of Things (IoT)

  • Generating High-Dimensional Datastreams for Change Detection
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Diego Carrera, Giacomo Boracchi

    A popular testbed for change-detection algorithms consists of detecting changes that have been synthetically injected into real-world datastreams. Unfortunately, most experimental practices in the literature lead to injecting changes whose magnitude is unknown and cannot be controlled. As a consequence, results are difficult to interpret, reproduce, and compare. Most importantly, controlling

  • Variations on the Clustering Algorithm BIRCH
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Boris Lorbeer, Ana Kosareva, Bersant Deva, Dženan Softić, Peter Ruppel, Axel Küpper

    Clustering algorithms have recently been regaining attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms suffer from two drawbacks: they do not scale well with increasing dataset sizes, and they often require proper parametrization, which is usually difficult to provide. A very important example is the cluster count, a parameter

  • Big Data for Context Aware Computing – Perspectives and Challenges
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Kalyan P. Subbu, Athanasios V. Vasilakos

    Big data has arrived. Myriad applications and systems generate data of humongous volume, variety and velocity, which traditional computing systems and databases are unable to manage. The proliferation of sensors in every possible device is also becoming one of the major generators of big data. Of particular interest in this article is how context aware computing systems which derive context from data

  • A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data
    Big Data Res. (IF 2.673) Pub Date : 2017-10-18
    Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Pietro Michiardi

    In today's world, large volumes of data are being continuously generated by many scientific applications, such as bioinformatics or networking. Since each monitored event is usually characterized by a variety of features, high-dimensional datasets have been continuously generated. To extract value from these complex collections of data, different exploratory data mining algorithms can be used to discover

  • Partial Rollback-based Scheduling on In-memory Transactional Data Grids
    Big Data Res. (IF 2.673) Pub Date : 2017-08-24
    Junwhan Kim

    In-memory transactional data grids, often referred to as NoSQL data grids, demand high concurrency for scalability and high performance in data-intensive applications. As an alternative concurrency control model, distributed transactional memory (DTM) promises to alleviate the difficulties of lock-based distributed synchronization. However, if a transaction aborts, DTM suffers from additional communication

Contents have been reproduced by permission of the publishers.