-
Development of a regional voice dataset and speaker classification based on machine learning J. Big Data Pub Date : 2021-03-02 Muhammad Ismail, Shahzad Memon, Lachhman Das Dhomeja, Shahid Munir Shah, Dostdar Hussain, Sabit Rahim, Imran Ali
At present, voice biometrics are commonly used for identification and authentication of users through their voice. Voice-based services such as mobile banking, access to personal devices, and logging into social networks are common examples of authenticating users through voice biometrics. In Pakistan, voice-based services are very common in the banking and mobile/cellular sectors; however, these services
-
An analysis of COVID-19 economic measures and attitudes: evidence from social media mining J. Big Data Pub Date : 2021-03-01 Dorota Domalewska
This paper explores the public perception of economic measures implemented as a reaction to the COVID-19 pandemic in Poland in March–June 2020. A mixed-method approach was used to analyse big data coming from tweets and Facebook posts related to the mitigation measures to provide evidence for longitudinal trends, correlations, theme classification and perception. The online discussion oscillated around
-
Annotating and detecting topics in social media forum and modelling the annotation to derive directions-a case study J. Big Data Pub Date : 2021-02-27 B. Athira, Josette Jones, Sumam Mary Idicula, Anand Kulanthaivel, Enming Zhang
The widespread influence of social media impacts every aspect of life, including the healthcare sector. Although medics and health professionals are the final decision makers, the advice and recommendations obtained from fellow patients are significant. In this context, the present paper explores the topics of discussion posted by breast cancer patients and survivors on online forums. The study examines
-
A survey on bandwidth-aware geo-distributed frameworks for big-data analytics J. Big Data Pub Date : 2021-02-25 Mohammed Bergui, Said Najah, Nikola S. Nikolov
In the era of global-scale services, organisations produce huge volumes of data, often distributed across multiple data centres, separated by vast geographical distances. While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercial applications and scientific research, they are not designed for running jobs across geo-distributed
-
An alternative approach to dimension reduction for pareto distributed data: a case study J. Big Data Pub Date : 2021-02-25 Marco Roccetti, Giovanni Delnevo, Luca Casini, Silvia Mirri
Deep learning models are tools for data analysis suitable for approximating (non-linear) relationships among variables for the best prediction of an outcome. While these models can be used to answer many important questions, their utility is still harshly criticized, as it is extremely challenging to identify which data descriptors are the most adequate to represent a given specific phenomenon of interest
-
Class center-based firefly algorithm for handling missing data J. Big Data Pub Date : 2021-02-23 Heru Nugroho, Nugraha Priya Utama, Kridanto Surendro
A significant advancement that occurs during the data cleaning stage is estimating missing data. Studies have shown that improper data handling leads to inaccurate analysis. Furthermore, most studies indicate the occurrence of missing data irrespective of the correlation between attributes. However, an adaptive search procedure helps to determine the estimates of the missing data when correlations
-
Detecting cybersecurity attacks across different network features and learners J. Big Data Pub Date : 2021-02-23 Joffrey L. Leevy, John Hancock, Richard Zuech, Taghi M. Khoshgoftaar
Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic
-
Bayesian multilevel model on maternal mortality in Ethiopia J. Big Data Pub Date : 2021-02-17 Shibiru Jabessa, Dabala Jabessa
Maternal mortality is one of the socio-economic problems and widely considered a serious indicator of the quality of a health. Ethiopia is considered to be one of the top six sub-Saharan countries with severe maternal mortality. The objective of this study was to investigate the effects of the Demographic and Socio-economic determinant factors of maternal mortality in Ethiopia. Data from the 2016 Ethiopia
-
Evaluation of recent advances in recommender systems on Arabic content J. Big Data Pub Date : 2021-02-17 Mehdi Srifi, Ahmed Oussous, Ayoub Ait Lahcen, Salma Mouline
Various recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, the majority of RSs from the literature were compared on English content. However, the research investigations about RSs when using contents in other languages such as Arabic are minimal. The researchers still neglect the field of Arabic RSs. Therefore, we aim through
-
An analytics model for TelecoVAS customers’ basket clustering using ensemble learning approach J. Big Data Pub Date : 2021-02-17 Mohammadsadegh Vahidi Farashah, Akbar Etebarian, Reza Azmi, Reza Ebrahimzadeh Dastjerdi
Value-Added Services at a Mobile Telecommunication company provide customers with a variety of services. Value-added services generate significant revenue annually for telecommunication companies. Providing solutions that can provide customers of a telecommunication company with relevant and engaging services has become a major challenge in this field. Numerous methods have been proposed so far to
-
Modeling and tracking Covid-19 cases using Big Data analytics on HPCC system platformm J. Big Data Pub Date : 2021-02-15 Flavio Villanustre, Arjuna Chala, Roger Dev, Lili Xu, Jesse Shaw LexisNexis, Borko Furht, Taghi Khoshgoftaar
This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December
-
Understanding quality of analytics trade-offs in an end-to-end machine learning-based classification system for building information modeling J. Big Data Pub Date : 2021-02-15 Minjung Ryu, Hong-Linh Truong, Matti Kannala
Optimizing quality trade-offs in an end-to-end big data science process is challenging, as not only do we need to deal with different types of software components, but also the domain knowledge has to be incorporated along the process. This paper focuses on methods for tackling quality trade-offs in a common data science process for classifying Building Information Modeling (BIM) elements, an important
-
A hybrid recommender system based-on link prediction for movie baskets analysis J. Big Data Pub Date : 2021-02-15 Mohammadsadegh Vahidi Farashah, Akbar Etebarian, Reza Azmi, Reza Ebrahimzadeh Dastjerdi
Over the past decade, recommendation systems have been one of the topics most sought after by researchers. Basket analysis of online systems’ customers and recommending attractive products (movies) to them are very important. Providing an attractive and favorite movie to the customer will increase the sales rate and ultimately improve the system. Various methods have been proposed so far to analyze
-
A sample decreasing threshold greedy-based algorithm for big data summarisation J. Big Data Pub Date : 2021-02-09 Teng Li, Hyo-Sang Shin, Antonios Tsourdos
As the scale of datasets used for big data applications expands rapidly, there have been increased efforts to develop faster algorithms. This paper addresses big data summarisation problems using the submodular maximisation approach and proposes an efficient algorithm for maximising general non-negative submodular objective functions subject to k-extendible system constraints. Leveraging a random sampling
-
Optimized hybrid investigative based dimensionality reduction methods for malaria vector using KNN classifier J. Big Data Pub Date : 2021-02-04 Micheal Olaolu Arowolo, Marion Olubunmi Adebiyi, Ayodele Ariyo Adebiyi, Oludayo Olugbara
RNA-Seq data are utilized for biological applications and decision making for the classification of genes. Much recent work has focused on reducing the dimension of RNA-Seq data. Dimensionality reduction approaches have been proposed in the transformation of these data. In this study, a novel optimized hybrid investigative approach is proposed. It combines an optimized genetic algorithm
-
Array databases: concepts, standards, implementations J. Big Data Pub Date : 2021-02-02 Peter Baumann, Dimitar Misev, Vlad Merticariu, Bang Pham Huu
Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all science and engineering domains where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly in silo solutions, with architectures that tend to
-
Improving the accuracy of text classification using stemming method, a case of non-formal Indonesian conversation J. Big Data Pub Date : 2021-01-29 Rianto, Achmad Benny Mutiara, Eri Prasetyo Wibowo, Paulus Insap Santosa
Background: Stemming has long been used in data pre-processing to retrieve information by tracking affixed words back to their roots. In an Indonesian setting, existing stemming methods have been observed, and the existing stemming methods are proven to result in a high level of accuracy. However, there are not many stemming methods for non-formal Indonesian text processing. This study introduces a new
-
A survey on generative adversarial networks for imbalance problems in computer vision tasks J. Big Data Pub Date : 2021-01-29 Vignesh Sampath, Iñaki Maurtua, Juan José Aguilar Martín, Aitor Gutierrez
Any computer vision application development starts off by acquiring images and data, then preprocessing and pattern recognition steps to perform a task. When the acquired images are highly imbalanced and not adequate, the desired task may not be achievable. Unfortunately, the occurrence of imbalance problems in acquired image datasets in certain complex real-world problems such as anomaly detection
-
A novel approach for learning ontology from relational database: from the construction to the evaluation J. Big Data Pub Date : 2021-01-28 Bilal Ben Mahria, Ilham Chaker, Azeddine Zahi
The aim of converting a relational database into an ontology is to provide applications that are based on the semantic representation of the data. Representing the data using ontologies has been shown to be a useful mechanism for managing and exchanging data. This is the reason why bridging the gap between relational databases and ontologies has attracted the interest of the ontology community from
-
CaReAl: capturing read alignments in a BAM file rapidly and conveniently J. Big Data Pub Date : 2021-01-26 Yoomi Park, Heewon Seo, Kyunghun Yoo, Ju Han Kim
Some of the variants detected by high-throughput sequencing (HTS) are often not reproducible. To minimize technically induced artifacts, secondary experimental validation is required, but this step is unnecessarily slow and expensive. Thus, developing a rapid and easy-to-use visualization tool is necessary to systematically review the statuses of sequence read alignments. Here, we developed a high-performance
-
A survey on data‐efficient algorithms in big data era J. Big Data Pub Date : 2021-01-26 Amina Adadi
The leading approaches in Machine Learning are notoriously data-hungry. Unfortunately, many application domains do not have access to big data because acquiring data involves a process that is expensive or time-consuming. This has triggered a serious debate in both the industrial and academic communities calling for more data-efficient models that harness the power of artificial learners while achieving
-
Composition of weighted finite transducers in MapReduce J. Big Data Pub Date : 2021-01-22 Bilal Elghadyry, Faissal Ouardi, Sébastien Verel
Weighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers constitutes a fundamental and common operation between these applications. The NP-hardness of the composition computation problem presents a challenge
-
CoPart: a context-based partitioning technique for big data J. Big Data Pub Date : 2021-01-19 Sara Migliorini, Alberto Belussi, Elisa Quintarelli, Damiano Carra
The MapReduce programming paradigm is frequently used in order to process and analyse a huge amount of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. The consequence is that the overall performances greatly depend on the way data are partitioned among the various computation nodes. The default partitioning technique, provided by systems
-
Designing a relational model to identify relationships between suspicious customers in anti-money laundering (AML) using social network analysis (SNA) J. Big Data Pub Date : 2021-01-14 Abdul Khalique Shaikh, Malik Al-Shamli, Amril Nazir
The stability of the economy and political system of any country highly depends on the policy of anti-money laundering (AML). If government policies are incapable of handling money laundering activities in an appropriate way, the control of the economy can be transferred to criminals. The current literature provides various technical solutions, such as clustering-based anomaly detection techniques
-
SDPSO: Spark Distributed PSO-based approach for feature selection and cancer disease prognosis J. Big Data Pub Date : 2021-01-13 Khawla Tadist, Fatiha Mrabti, Nikola S. Nikolov, Azeddine Zahi, Said Najah
The Dimensionality Curse is one of the most critical issues hindering faster evolution in several fields broadly, and in bioinformatics distinctively. To counter this curse, a conglomerate solution is needed. Among the renowned techniques that have proved effective, the scaling-based dimensionality reduction techniques are the most prevalent. To ensure improved performance and productivity, horizontal
-
Deep Learning applications for COVID-19 J. Big Data Pub Date : 2021-01-11 Connor Shorten, Taghi M. Khoshgoftaar, Borko Furht
This survey explores how Deep Learning has battled the COVID-19 pandemic and provides directions for future research on COVID-19. We cover Deep Learning applications in Natural Language Processing, Computer Vision, Life Sciences, and Epidemiology. We describe how each of these applications varies with the availability of big data and how learning tasks are constructed. We begin by evaluating the current
-
A novel multi-source information-fusion predictive framework based on deep neural networks for accuracy enhancement in stock market prediction J. Big Data Pub Date : 2021-01-09 Isaac Kofi Nti, Adebayo Felix Adekoya, Benjamin Asubam Weyori
The stock market is very unstable and volatile due to several factors such as public sentiments, economic factors and more. Several petabytes of data are generated every second from different sources, which affect the stock market. A fair and efficient fusion of these data sources (factors) into intelligence is expected to offer better prediction accuracy on the stock market. However, integrating
-
A method for extracting travel patterns using data polishing J. Big Data Pub Date : 2021-01-07 Mio Hosoe, Masashi Kuwano, Taku Moriyama
With recent developments in ICT, the interest in using large amounts of accumulated data for traffic policy planning has increased significantly. In recent years, data polishing has been proposed as a new method of big data analysis. Data polishing is a graphical clustering method, which can be used to extract patterns that are similar or related to each other by identifying the cluster structures
-
A GPU based multidimensional amplitude analysis to search for tetraquark candidates J. Big Data Pub Date : 2021-01-07 Nairit Sur, Leonardo Cristella, Adriano Di Florio, Vincenzo Mastrapasqua
The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists indulge in more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics where the analyses often depend on complex multidimensional unbinned
-
Multiclass emotion prediction using heart rate and virtual reality stimuli J. Big Data Pub Date : 2021-01-07 Aaron Frederick Bulagang, James Mountstephens, Jason Teo
Background: Emotion prediction is a method that recognizes the human emotion derived from the subject’s psychological data. The problem in question is the limited use of heart rate (HR) as the prediction feature through the use of common classifiers such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF) in emotion prediction. This paper aims to investigate whether HR signals
-
Sleep stage classification using extreme learning machine and particle swarm optimization for healthcare big data J. Big Data Pub Date : 2021-01-07 Nico Surantha, Tri Fennia Lesmana, Sani Muhamad Isa
Recent developments of portable sensor devices, cloud computing, and machine learning algorithms have led to the emergence of big data analytics in healthcare. The condition of the human body, e.g. the ECG signal, can be monitored regularly by means of a portable sensor device. The use of the machine learning algorithm would then provide an overview of a patient’s current health on a regular basis
-
Stable bagging feature selection on medical data J. Big Data Pub Date : 2021-01-07 Salem Alelyani
In the medical field, distinguishing genes that are relevant to a specific disease, let’s say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes has become a
-
The effect of driver variables on the estimation of bivariate probability density of peak loads in long-term horizon J. Big Data Pub Date : 2021-01-07 Zohreh Kaheh, Morteza Shabanzadeh
It is evident that developing more accurate forecasting methods is the pillar of building robust multi-energy systems (MES). In this context, long-term forecasting is also indispensable to have a robust expansion planning program for modern power systems. While very short-term and short-term forecasting are usually represented with point estimation, this approach is highly unreliable in medium-term
-
Resampling imbalanced data for network intrusion detection datasets J. Big Data Pub Date : 2021-01-06 Sikha Bagui, Kunqi Li
Machine learning plays an increasingly significant role in the building of Network Intrusion Detection Systems. However, machine learning models trained with imbalanced cybersecurity data cannot recognize minority data, hence attacks, effectively. One way to address this issue is to use resampling, which adjusts the ratio between the different classes, making the data more balanced. This research looks
-
Rating prediction of peer-to-peer accommodation through attributes and topics from customer review J. Big Data Pub Date : 2021-01-06 Athor Subroto, Marcel Christianis
This study aims to predict customers’ behavior in classifying their reviews as high rated or low rated using associated attributes and topics found in the review. Knowing customer reviewing action better can lead to a successful strategy implementation of the relevant parties related to this study such as policy to manage customer reviews by keeping their satisfaction high. We applied a big data approach
-
Designing a Permissioned Blockchain Network for the Halal Industry using Hyperledger Fabric with multiple channels and the raft consensus mechanism J. Big Data Pub Date : 2021-01-06 Isti Surjandari, Harman Yusuf, Enrico Laoh, Rayi Maulida
Halal Supply Chain Management requires assurance that the entire process of procurement, distribution, handling, and processing of materials, spare parts, livestock, work-in-process, or finished inventory is well documented and performed in accordance with Halal and Toyyib. Blockchain technology is one alternative solution that can improve Halal Supply Chain as it can integrate technology for information
-
Bayesian zero-inflated regression model with application to under-five child mortality J. Big Data Pub Date : 2021-01-06 Mekuanint Simeneh Workie, Abebaw Gedef Azene
Under-five mortality is defined as the likelihood that a child born alive dies between birth and the fifth birthday. Mortality under the age of five has been a main target of public health policies and may be a common indicator of mortality levels. Thus, this study aimed to assess under-five child mortality and to model, using a Bayesian zero-inflated regression model, the determinants of under-five
-
Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection J. Big Data Pub Date : 2021-01-06 Asmaa El Hannani, Rahhal Errattahi, Fatima Zahra Salmam, Thomas Hain, Hassan Ouahmane
Speech based human-machine interaction and natural language understanding applications have seen a rapid development and wide adoption over the last few decades. This has led to a proliferation of studies that investigate Error detection and classification in Automatic Speech Recognition (ASR) systems. However, different data sets and evaluation protocols are used, making direct comparisons of the
-
Analyzing Bangkok city taxi ride: reforming fares for profit sustainability using big data driven model J. Big Data Pub Date : 2021-01-06 Thananut Phiboonbanakit, Teerayut Horanont
With the trend toward the use of large-scale vehicle probe data, an urban-scale analysis can now provide useful information for taxi drivers and passengers. Unfortunately, traffic congestion has become a critical problem in urban cities. Road traffic congestion reduces productivity in transportation services, and the daily profit earned is consequently reduced. This is opposite to the cost of living
-
Assessing the accuracy of record linkages with Markov chain based Monte Carlo simulation approach J. Big Data Pub Date : 2021-01-06 Shovanur Haque, Kerrie Mengersen, Steven Stern
Record linkage is the process of finding matches and linking records from different data sources so that the linked records belong to the same entity. There is an increasing number of applications of record linkage in statistical, health, government and business organisations to link administrative, survey, population census and other files to create a complete set of information for more complete
-
Querying knowledge graphs in natural language J. Big Data Pub Date : 2021-01-06 Shiqi Liang, Kurt Stockinger, Tarcisio Mendes de Farias, Maria Anisimova, Manuel Gil
Knowledge graphs are a powerful concept for querying large amounts of data. These knowledge graphs are typically enormous and are often not easily accessible to end-users because they require specialized knowledge in query languages such as SPARQL. Moreover, end-users need a deep understanding of the structure of the underlying data models often based on the Resource Description Framework (RDF). This
-
A novel community detection based genetic algorithm for feature selection J. Big Data Pub Date : 2021-01-04 Mehrdad Rostami, Kamal Berahmand, Saman Forouzandeh
Feature selection is an essential data preprocessing stage in data mining. The core principle of feature selection is to pick a subset of possible features by excluding features with almost no predictive information as well as highly associated redundant features. In the past several years, a variety of meta-heuristic methods were introduced to eliminate redundant and irrelevant features
-
Analysis and best parameters selection for person recognition based on gait model using CNN algorithm and image augmentation J. Big Data Pub Date : 2021-01-03 Abeer Mohsin Saleh, Talal Hamoud
Person Recognition based on Gait Model (PRGM) and motion features is indeed a challenging and novel task due to its usages and to the critical issues of human pose variation, human body occlusion, camera view variation, etc. In this project, a deep convolutional neural network (CNN) was modified and adapted for person recognition with Image Augmentation (IA) technique depending on gait features
-
Correction to: Cooperative co‑evolution for feature selection in Big Data with random feature grouping J. Big Data Pub Date : 2020-12-28 A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell‑Dowland
An amendment to this paper has been published and can be accessed via the original article.
-
A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench J. Big Data Pub Date : 2020-12-14 N. Ahmed, Andre L. C. Barczak, Teo Susnjak, Mohammed A. Rashid
Big Data analytics for storing, processing, and analyzing large-scale datasets has become an essential tool for the industry. The advent of distributed computing frameworks such as Hadoop and Spark offers efficient solutions to analyze vast amounts of data. Due to the application programming interface (API) availability and its performance, Spark has become very popular, even more popular than the MapReduce
-
Arabic text summarization using deep learning approach J. Big Data Pub Date : 2020-12-11 Molham Al-Maleh, Said Desouki
Natural language processing has witnessed remarkable progress with the advent of deep learning techniques. Text summarization, along with other tasks like text translation and sentiment analysis, has used deep neural network models to enhance results. The new methods of text summarization are subject to a sequence-to-sequence framework of encoder–decoder model, which is composed of neural networks trained jointly
-
Predictors of outpatients’ no-show: big data analytics using apache spark J. Big Data Pub Date : 2020-12-09 Tahani Daghistani, Huda AlGhamdi, Riyad Alshammari, Raed H. AlHazme
Outpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature
-
Cooperative co-evolution for feature selection in Big Data with random feature grouping J. Big Data Pub Date : 2020-12-04 A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland
A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset
-
Flight delay prediction based on deep learning and Levenberg-Marquart algorithm J. Big Data Pub Date : 2020-11-26 Maryam Farshchian Yazdi, Seyed Reza Kamel, Seyyed Javad Mahdavi Chabok, Maryam Kheirabadi
Flight delay is inevitable and it plays an important role in both the profits and losses of airlines. An accurate estimation of flight delay is critical for airlines because the results can be applied to increase customer satisfaction and incomes of airline agencies. There has been much research on modeling and predicting flight delays, most of which has tried to predict the delay through
-
Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset J. Big Data Pub Date : 2020-11-25 Sydney M. Kasongo, Yanxia Sun
Computer networks intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are critical aspects that contribute to the success of an organization. Over the past years, IDSs and IPSs using different approaches have been developed and implemented to ensure that computer networks within enterprises are secure, reliable and available. In this paper, we focus on IDSs that are built using
-
A survey and analysis of intrusion detection models based on CSE-CIC-IDS2018 Big Data J. Big Data Pub Date : 2020-11-23 Joffrey L. Leevy, Taghi M. Khoshgoftaar
The exponential growth in computer networks and network applications worldwide has been matched by a surge in cyberattacks. For this reason, datasets such as CSE-CIC-IDS2018 were created to train predictive models on network-based intrusion detection. These datasets are not meant to serve as repositories for signature-based detection systems, but rather to promote research on anomaly-based detection
-
Big data actionable intelligence architecture J. Big Data Pub Date : 2020-11-23 Tian J. Ma, Rudy J. Garcia, Forest Danford, Laura Patrizi, Jennifer Galasso, Jason Loyd
The amount of data produced by sensors, social and digital media, and the Internet of Things (IoT) is rapidly increasing each day. Decision makers often need to sift through a sea of Big Data to utilize information from a variety of sources in order to determine a course of action. This can be a very difficult and time-consuming task. For each data source encountered, the information can be redundant
-
Automatic LIDAR building segmentation based on DGCNN and euclidean clustering J. Big Data Pub Date : 2020-11-17 Ahmad Gamal, Ari Wibisono, Satrio Bagus Wicaksono, Muhammad Alvin Abyan, Nur Hamid, Hanif Arif Wisesa, Wisnu Jatmiko, Ronny Ardhianto
There has been growing demand for 3D modeling from earth observations, especially for purposes of urban and regional planning and management. The results of 3D observations have slowly become the primary source of data in terms of policy determination and infrastructure planning. In this research, we presented an automatic building segmentation method that directly uses LIDAR data. Previous works have
-
Comparison of sort algorithms in Hadoop and PCJ J. Big Data Pub Date : 2020-11-16 Marek Nowicki
Sorting algorithms are among the most commonly used algorithms in computer science and modern software. Having efficient implementation of sorting is necessary for a wide spectrum of scientific applications. This paper describes the sorting algorithm written using the partitioned global address space (PGAS) model, implemented using the Parallel Computing in Java (PCJ) library. The iterative implementation
-
Utilizing technologies of fog computing in educational IoT systems: privacy, security, and agility perspective J. Big Data Pub Date : 2020-11-12 Amr Adel
Fog computing architecture refers to an architecture that is distributed over a geographical area. This architectural arrangement mainly focuses on physical and logical network elements, and software for the purpose of implementing a proper network. Fog computing architecture allows the users to have flexible communication and also ensures that the storage services are maintained efficiently
-
Deep learning accelerators: a case study with MAESTRO J. Big Data Pub Date : 2020-11-12 Hamidreza Bolhasani, Somayyeh Jafarali Jassbi
In recent years, deep learning has become one of the most important topics in computer science. Deep learning is a growing trend at the edge of technology and its applications are now seen in many aspects of our life such as object detection, speech recognition, natural language processing, etc. Currently, almost all major sciences and technologies are benefiting from the advantages of deep learning
-
A robust machine learning approach to SDG data segmentation J. Big Data Pub Date : 2020-11-11 Kassim S. Mwitondi, Isaac Munyakazi, Barnabas N. Gatsheni
In the light of the recent technological advances in computing and data explosion, the complex interactions of the Sustainable Development Goals (SDG) present both a challenge and an opportunity to researchers and decision makers across fields and sectors. The deep and wide socio-economic, cultural and technological variations across the globe entail a unified understanding of the SDG project. The
-
Support vector machine based feature extraction for gender recognition from objects using lasso classifier J. Big Data Pub Date : 2020-11-11 Damodara Krishna Kishore Galla, Babu Reddy Mukamalla, Rama Prakasha Reddy Chegireddy
Classifying object detection and gender recognition, two different categories, in a single section is a complicated task, and this approach helps provide artificial vision for blind people. In this paper, our method betters the vision sensation of blind persons by converting visualized data to audio data. Therefore this artificial intelligence model helps in detecting the
-
A data model for enhanced data comparability across multiple organizations J. Big Data Pub Date : 2020-11-10 Patrick Obilikwu, Emeka Ogbuju
Organizations may be related in terms of similar operational procedures, management, and supervisory agencies coordinating their operations. Supervisory agencies may be governmental or non-governmental but, in all cases, they perform oversight functions over the activities of the organizations under their control. Multiple organizations that are related in terms of oversight functions by their supervisory
-
Assessing data quality from the Clinical Practice Research Datalink: a methodological approach applied to the full blood count blood test J. Big Data Pub Date : 2020-11-10 Pradeep S. Virdee, Alice Fuller, Michael Jacobs, Tim Holt, Jacqueline Birks
A Full Blood Count (FBC) is a common blood test including 20 parameters, such as haemoglobin and platelets. FBCs from Electronic Health Record (EHR) databases provide a large sample of anonymised individual patient data and are increasingly used in research. We describe the quality of the FBC data in one EHR. The Test dataset from the Clinical Practice Research Datalink (CPRD) was accessed, which contains
Contents have been reproduced by permission of the publishers.