• Int. J. Gen. Syst. (IF 2.259) Pub Date : 2019-09-02
Prakash P. Shenoy

The main contribution of this paper is a new definition of the expected value of belief functions in the Dempster–Shafer (D–S) theory of evidence. Our definition shares many of the properties of the expectation operator in probability theory. Also, for Bayesian belief functions, our definition provides the same expected value as the probabilistic expectation operator. A traditional method of computing expected values of real-valued functions is to first transform a D–S belief function to a corresponding probability mass function, and then use the expectation operator for probability mass functions. Transforming a belief function to a probability function involves loss of information. Our expectation operator works directly with D–S belief functions. Another definition uses Choquet integration, which assumes belief functions are credal sets, i.e., convex sets of probability mass functions. Credal set semantics are incompatible with Dempster's combination rule, the centerpiece of the D–S theory. In general, our definition provides expected values different from those obtained by probabilistic expectation using, e.g., the pignistic transform or the plausibility transform of a belief function. Using our definition of expectation, we provide new definitions of variance, covariance, correlation, and other higher moments and describe their properties.
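The transform-then-expect baseline that the abstract contrasts with can be sketched directly. The snippet below is a minimal illustration (not Shenoy's operator, which avoids the transform entirely): it computes the expectation of a real-valued function under the pignistic transform of a belief function, with toy focal sets and masses chosen for illustration only.

```python
def pignistic_expectation(masses, f):
    """Expected value of f under the pignistic transform of a belief function.

    masses: dict mapping focal sets (frozensets) to masses summing to 1.
    f: dict mapping each singleton outcome to a real value.
    """
    betp = {x: 0.0 for x in set().union(*masses)}
    for focal, m in masses.items():
        share = m / len(focal)          # split each mass equally over its focal set
        for x in focal:
            betp[x] += share
    return sum(betp[x] * f[x] for x in betp)

# Bayesian belief function (all focal sets are singletons): the pignistic
# transform is the identity, so this reduces to ordinary expectation.
m = {frozenset({'a'}): 0.3, frozenset({'b'}): 0.7}
print(pignistic_expectation(m, {'a': 1.0, 'b': 2.0}))   # ≈ 1.7
```

For a Bayesian belief function the pignistic transform is the identity, so the result matches ordinary probabilistic expectation, mirroring the property the abstract states for the proposed operator; for non-Bayesian belief functions the two generally differ.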

Updated: 2020-01-26
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-20
Hoi-To Wai; Santiago Segarra; Asuman E. Ozdaglar; Anna Scaglione; Ali Jadbabaie

This paper considers a new framework to detect communities in a graph from the observation of signals at its nodes. We model the observed signals as noisy outputs of an unknown network process, represented as a graph filter that is excited by a set of unknown low-rank inputs/excitations. Application scenarios of this model include diffusion dynamics, pricing experiments, and opinion dynamics. Rather than learning the precise parameters of the graph itself, we aim at retrieving the community structure directly. The paper shows that communities can be detected by applying a spectral method to the covariance matrix of graph signals. Our analysis indicates that the community detection performance depends on an intrinsic ‘low-pass’ property of the graph filter. We also show that the performance can be improved via a low-rank matrix plus sparse decomposition method when the latent parameter vectors are known. Numerical results demonstrate that our approach is effective.
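The core pipeline of the spectral method can be illustrated on a toy graph. The sketch below is simplified: a hand-built two-community adjacency matrix and the low-pass filter $H = (I + L)^{-1}$ stand in for the paper's general graph-filter model, and the covariance is formed in closed form rather than estimated from noisy signals.

```python
import numpy as np

# Two planted communities: dense within each block, one weak link across.
A = np.zeros((6, 6))
for block in (range(3), range(3, 6)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[2, 3] = A[3, 2] = 0.2                      # weak inter-community edge

L = np.diag(A.sum(axis=1)) - A               # combinatorial Laplacian
H = np.linalg.inv(np.eye(6) + L)             # a 'low-pass' graph filter
C = H @ H.T                                  # covariance of y = H x, x white noise

# The top eigenvector of C is ~constant; the sign pattern of the second one
# recovers the communities (a Fiedler-vector argument).
w, V = np.linalg.eigh(C)
labels = (V[:, -2] > 0).astype(int)
print(labels)   # e.g. [0 0 0 1 1 1] or its complement
```

The point of the sketch is that the community structure survives the (unknown) filter: eigenvectors of the covariance coincide with Laplacian eigenvectors, so no graph parameters need to be learned.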

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-11
Puoya Tabaghi; Ivan Dokmanić; Martin Vetterli

Euclidean distance matrices (EDMs) are a major tool for localization from distances, with applications ranging from protein structure determination to global positioning and manifold learning. They are, however, static objects which serve to localize points from a snapshot of distances. If the objects move, one expects to do better by modeling the motion. In this paper, we introduce Kinetic Euclidean Distance Matrices (KEDMs), a new kind of time-dependent distance matrices that incorporate motion. The entries of a KEDM are functions of time: the squared time-varying distances. We study two smooth trajectory models, polynomial and bandlimited trajectories, and show that these trajectories can be reconstructed from incomplete, noisy distance observations scattered over multiple time instants. Our main contribution is a semidefinite relaxation, inspired by similar strategies for static EDMs. Similarly to the static case, the relaxation is followed by a spectral factorization step; however, because spectral factorization of polynomial matrices is more challenging than for constant matrices, we propose a new factorization method that uses anchor measurements. Extensive numerical experiments show that KEDMs and the new semidefinite relaxation accurately reconstruct trajectories from noisy, incomplete distance data and that, in fact, motion improves rather than degrades localization if properly modeled. This makes KEDMs a promising tool for problems in the geometry of dynamic point sets.
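For background, the static EDM localization that KEDMs generalize is classical multidimensional scaling. A minimal sketch (with hypothetical points) recovers a configuration from a matrix of squared distances, up to the inherent rigid-motion ambiguity.

```python
import numpy as np

def points_from_edm(D):
    """Classical MDS: recover a point set (up to rigid motion) from a
    matrix D of squared pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ D @ J                     # Gram matrix of centered points
    w, V = np.linalg.eigh(G)
    w = np.clip(w, 0, None)                  # clip tiny negative eigenvalues
    return V * np.sqrt(w)                    # rows are the recovered points

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Y = points_from_edm(D)
D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
print(np.allclose(D, D2))    # True: all pairwise distances are preserved
```

In the kinetic setting the entries of `D` become polynomials or bandlimited functions of time, which is what the paper's semidefinite relaxation handles.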

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
Fangzhou Wang; Hongbin Li

We consider a hybrid active-passive radar system that employs a wireless source as a passive illuminator of opportunity (IO) and a co-channel active radar transmitter operating in the same frequency band to seek spectral efficiency. The hybrid system can take advantage of the strengths of passive radar (e.g., energy efficiency, bi-/multi-static configuration, and spatial diversity) as well as those of active radar (dedicated transmitter, flexible transmit beam steering, waveform optimized for sensing, etc.). To mitigate the mutual interference and location-induced timing uncertainty between the radar and communication signals, we propose two designs for the joint optimization of the radar waveform and receive filters. The first is a max-min (MM) criterion that optimizes a worst-case performance metric over a timing uncertainty interval; the second is a weighted-sum (WS) criterion that forms a weighted sum of the performance metric at each delay within the delay uncertainty interval. Both design criteria result in nonconvex constrained optimization problems that are solved by sequential convex programming methods. When the timing uncertainty vanishes, the two designs become identical and admit a simpler solution. Numerical results are presented to demonstrate the performance of the proposed hybrid schemes in comparison with conventional active-only and passive-only radar systems.

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-20
Péter Kovács; Sándor Fridli; Ferenc Schipp

In this paper we develop an adaptive transform-domain technique based on rational function systems. Such techniques are of general importance in several areas of signal theory, including filter design, transfer function approximation, system identification, and control theory. The construction of the proposed method is discussed in the framework of a general mathematical model called variable projection. First, we generalize this method by adding dimension-type free parameters. Then we deal with the optimization problem of the free parameters. To this end, based on the well-known particle swarm optimization (PSO) algorithm, we develop the multi-dimensional hyperbolic PSO algorithm, designed especially for the rational transforms in question. As a result, the system along with its dimension is dynamically optimized during the process. The main motivation was to increase adaptivity while keeping the computational complexity manageable. We note that the proposed method is of a general nature. As a case study, the problem of electrocardiogram (ECG) signal compression is discussed. By means of comparison tests performed on the PhysioNet MIT-BIH Arrhythmia database, we demonstrate that our method outperforms other transformation techniques.
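The optimization backbone referenced here is particle swarm optimization. The sketch below implements the plain textbook variant (not the paper's multi-dimensional hyperbolic extension) on a toy quadratic objective; all hyperparameters and the search box are illustrative.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain PSO: each particle tracks its personal best; the swarm tracks
    a global best; velocities blend inertia and attraction to both."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pval = np.apply_along_axis(f, 1, x)
    gbest = pbest[pval.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pval
        pbest[improved], pval[improved] = x[improved], fx[improved]
        gbest = pbest[pval.argmin()].copy()
    return gbest, pval.min()

best, val = pso(lambda z: ((z - 1.0) ** 2).sum(), dim=2)
print(best, val)   # best near [1, 1], value near 0
```

The paper's variant additionally treats the system dimension itself as a free parameter, so the swarm explores spaces of varying dimensionality; this sketch fixes the dimension for clarity.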

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
Jing Bai; Yongchao Wang; Qingjiang Shi

This paper presents an efficient quadratic programming (QP) decoder via the alternating direction method of multipliers (ADMM) technique, called QP-ADMM, for binary low-density parity-check (LDPC) codes. Its main contents are as follows: first, we relax the maximum likelihood (ML) decoding problem to a non-convex quadratic program. Then, we develop an ADMM algorithm for solving the formulated non-convex QP decoding model. In the proposed QP-ADMM decoder, complex Euclidean projections onto the check polytope are eliminated, and the variables in each update step can be solved analytically in parallel. Moreover, it is proved that the proposed ADMM algorithm converges to a stationary point of the non-convex QP problem under the assumption of sequence convergence. We also verify that the proposed decoder satisfies the favorable property of the all-zeros assumption. Furthermore, by exploiting the inner structure of the QP model, the complexity of the proposed algorithm in each iteration is shown to be linear in the LDPC code length. Simulation results demonstrate the effectiveness of the proposed QP-ADMM decoder.

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
Ives Rey-Otero; Jeremias Sulam; Michael Elad

Over the past decade, the celebrated sparse representation model has achieved impressive results in various signal and image processing tasks. A convolutional version of this model, termed convolutional sparse coding (CSC), has been recently reintroduced and extensively studied. CSC offers a natural remedy to a limitation of typical sparsity-enforcing approaches, which handle global and high-dimensional signals by local, patch-based processing. While the classic field of sparse representations has been able to cater for the diverse challenges of different signal processing tasks by considering a wide range of problem formulations, almost all available algorithms that deploy the CSC model consider the same $\ell_1$-$\ell_2$ problem form. As we argue in this paper, this CSC pursuit formulation is too restrictive, as it fails to explicitly exploit some local characteristics of the signal. This work expands the range of formulations for the CSC model by proposing two convex alternatives that merge global norms with local penalties and constraints. The main contribution of this work is the derivation of efficient and provably convergent algorithms to solve these new sparse coding formulations.
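The pursuit formulation the paper argues is restrictive is the standard lasso-type problem, typically solved by iterative soft thresholding. A minimal ISTA sketch on a random synthetic dictionary (all sizes and the regularization weight are illustrative; this is the generic pursuit, not the paper's convolutional or proposed variants):

```python
import numpy as np

def ista(D, y, lam, iters=1000):
    """ISTA for min_a 0.5*||y - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        z = a - D.T @ (D @ a - y) / L        # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(30, 60)) / np.sqrt(30)  # random dictionary
a0 = np.zeros(60)
a0[[3, 17, 42]] = [1.5, -2.0, 1.0]           # sparse ground truth
y = D @ a0                                   # noiseless measurements
a = ista(D, y, lam=0.01)
print(np.flatnonzero(np.abs(a) > 0.1))       # large entries should sit on the true support
```

In the convolutional setting `D` is a banded, structured matrix built from shifted local filters; the paper's alternatives replace the single global $\ell_1$ norm with mixed global/local penalties.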

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
Brian P. Day; Aaron Evers; Daniel E. Hack

In continuous wave (CW) radar systems, multiple signal copies impinge on the receiver simultaneously. Often, undesired multipath and direct-path copies are many times stronger than potential targets. When applying matched-filter signal processing techniques, the undesired signal components can mask weaker targets and degrade the performance of post-processing techniques, such as target indication or estimation. In this manuscript, we propose a method of rejecting multipath-scattered returns over a continuous region in range and Doppler. We explore the computational cost of this method and additionally propose an approximate method of rejection that leverages the well-known discrete prolate spheroidal sequences (DPSS), typically referred to as Slepian sequences, to gain a computational advantage. The proposed method is shown to decrease the effective noise floor when applying matched-filtering techniques, as well as to increase target signal-to-interference-plus-noise ratio (SINR) outside of an undesired multipath region. Comparisons with traditional CW multipath removal are presented in terms of rejection performance and run-time.
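The Slepian sequences mentioned above can be generated from the standard tridiagonal formulation (eigenvectors of a tridiagonal matrix that commutes with the spectral concentration operator). A short sketch follows; the window length and time-half-bandwidth product are chosen arbitrarily for illustration.

```python
import numpy as np

def dpss_tapers(M, NW, K):
    """First K discrete prolate spheroidal (Slepian) sequences of length M,
    via the standard tridiagonal formulation."""
    W = NW / M
    t = np.arange(M)
    diag = ((M - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * W)
    off = t[1:] * (M - t[1:]) / 2.0
    T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    w, V = np.linalg.eigh(T)
    return V[:, ::-1][:, :K].T               # K most concentrated sequences

tapers = dpss_tapers(128, 4, 6)
# Eigenvectors of a symmetric matrix: orthonormal by construction.
print(np.allclose(tapers @ tapers.T, np.eye(6)))   # True

# Fraction of spectral energy inside the band |f| <= NW/M, close to 1
# for the leading tapers -- the concentration property the method exploits.
F = np.fft.rfftfreq(8 * 128)
S = np.abs(np.fft.rfft(tapers, n=8 * 128, axis=1)) ** 2
print(S[:, F <= 4 / 128].sum(axis=1) / S.sum(axis=1))
```

The near-total in-band concentration of a few orthogonal sequences is what makes them an efficient basis for representing, and rejecting, returns over a continuous range-Doppler region.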

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
Zahra Sabetsarvestani; Francesco Renna; Franz Kiraly; Miguel Rodrigues

In this paper, we propose an algorithm for source separation with side information, where one observes the linear superposition of two source signals plus two additional signals that are correlated with the mixed ones. Our algorithm is based on two ingredients: first, we learn a Gaussian mixture model (GMM) for the joint distribution of a source signal and the corresponding correlated side information signal; second, we separate the signals using standard, computationally efficient conditional mean estimators. The paper also puts forth new recovery guarantees for this source separation algorithm. In particular, under the assumption that the signals can be perfectly described by a GMM, we characterize necessary and sufficient conditions for reliable source separation in the asymptotic low-noise regime as a function of the geometry of the underlying signals and their interaction. It is shown that, provided we observe a certain number of linear measurements from the mixture, the sources can be reliably separated if the subspaces spanned by the innovation components of the source signals with respect to the side information signals have zero intersection; otherwise, they cannot. Our proposed framework, which provides a new way to incorporate side information to aid the solution of source separation problems where the decoder has access to linear projections of superimposed sources and side information, is also employed in a real-world art investigation application involving the separation of mixtures of X-ray images. The simulation results showcase the superiority of our algorithm against other state-of-the-art algorithms.
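The second ingredient, the conditional mean estimator, is easy to illustrate for a single Gaussian component (the full method uses GMMs and two mixed sources). A sketch with a synthetic scalar source and correlated side information, where the model coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Joint Gaussian model for (x, y): y is side information correlated with x.
n = 2000
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)       # Var(y)=1, Cov(x,y)=0.8

# MMSE (conditional mean) estimator for a single Gaussian component:
#   E[x | y] = mu_x + C_xy / C_yy * (y - mu_y)
C = np.cov(x, y)
xhat = x.mean() + C[0, 1] / C[1, 1] * (y - y.mean())

mse = ((x - xhat) ** 2).mean()
print(mse)          # close to the theoretical MMSE 1 - 0.8**2 = 0.36
```

With a GMM, the estimator becomes a responsibility-weighted combination of such per-component conditional means, which remains closed-form and computationally cheap.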

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
Leibo Liu; Guiqiang Peng; Pan Wang; Sheng Zhou; Qiushi Wei; Shouyi Yin; Shaojun Wei

Minimum mean square error (MMSE) detection is increasingly relevant for massive multiple-input multiple-output (MIMO) systems, but it suffers from high computational complexity and low parallelism because of the increasing number of users and antennas in such systems. This paper proposes a recursive conjugate gradient (RCG) method to iteratively estimate signals. First, a recursive conjugate gradient detection algorithm is proposed that achieves high parallelism and low complexity through iteration. Second, a quadrant-certain-based initialization method that improves detection accuracy without added complexity is proposed. Third, an approximated log-likelihood ratio (LLR) computation method is proposed to simplify the calculation. The analyses show that, compared with related methods, the proposed RCG algorithm reduces computational complexity and exploits the potential parallelism. RCG is mathematically demonstrated to achieve low approximation error. Based on the RCG method, an architecture is proposed for a 128 × 8 64-QAM massive MIMO system. First, a parallel processing element array with single-sided input is adopted; this array eliminates the throughput limitation. Second, a deeply pipelined user-level method based on the recursive conjugate gradient method is proposed. Third, an approximated architecture is proposed to compute the soft output. The architecture is verified on an FPGA and fabricated on 1.87 × 1.87 mm$^2$ silicon with TSMC 65 nm CMOS technology. The chip achieves 2.69 Mbps/mW energy efficiency (throughput/power) and 1.09 Mbps/kG area efficiency (throughput/area), which are 2.39 to 10.60× and 1.15 to 8.81× those of the normalized state-of-the-art designs, respectively.
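The conjugate gradient core (without the paper's recursion, initialization, or hardware mapping) applied to the MMSE system $(H^H H + \sigma^2 I)\hat{s} = H^H y$ can be sketched as follows, with a synthetic 128 × 8 channel and QPSK symbols standing in for 64-QAM:

```python
import numpy as np

def cg(A, b, iters=8):
    """Plain conjugate gradient for a Hermitian positive definite system."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
H = (rng.normal(size=(128, 8)) + 1j * rng.normal(size=(128, 8))) / np.sqrt(2)
s = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=8) / np.sqrt(2)
sigma2 = 0.01
noise = np.sqrt(sigma2 / 2) * (rng.normal(size=128) + 1j * rng.normal(size=128))
y = H @ s + noise

A = H.conj().T @ H + sigma2 * np.eye(8)      # regularized Gram matrix
shat = cg(A, H.conj().T @ y)                 # iterative MMSE estimate
print(np.abs(shat - s).max())                # small: CG is near-exact here
```

Avoiding the explicit 8 × 8 matrix inverse is what gives CG-style detectors their low complexity and parallelism; the paper's RCG adds recursion across iterations and users on top of this core.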

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-11-20
Saeid Sedighi; Bhavani Shankar Mysore Rama Rao; Björn Ottersten

Co-array-based Direction of Arrival (DoA) estimation using Sparse Linear Arrays (SLAs) has recently gained considerable interest in array processing thanks to its capability of providing enhanced degrees of freedom. Although the literature presents a variety of estimators in this context, none of them are proven to be statistically efficient. This work introduces a novel estimator for co-array-based DoA estimation employing the Weighted Least Squares (WLS) method. An analytical expression for the large-sample performance of the proposed estimator is derived. Then, an optimal weighting is obtained so that the asymptotic performance of the proposed WLS estimator coincides with the Cramér–Rao Bound (CRB), thereby ensuring asymptotic statistical efficiency of the resulting WLS estimator. This implies that the proposed WLS estimator performs significantly better than existing methods. Numerical simulations are provided to validate the analytical derivations and corroborate the improved performance.

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2020-01-06
An Liu; Lixiang Lian; Vincent Lau; Guanying Liu; Min-Jian Zhao

Due to the high resolution of angles of arrival (AoAs) provided by the massive MIMO base station in 5G wireless systems, it is promising to integrate 5G-based localization technology into autonomous driving to improve the accuracy and robustness of vehicle localization. In this paper, we investigate the problem of 5G cloud-assisted cooperative localization for vehicle platoons. Existing 5G-based localization algorithms focus on single-user localization and are not efficient for the localization of vehicle platoons, where the positions of the vehicles are highly correlated. To the best of our knowledge, cloud-assisted cooperative localization tailored to vehicle platoons has not been studied before. To address this challenging problem, we first propose a Gamma-Markov-Group-Sparse (GMGS) model to capture the joint distribution of the vehicle positions in a vehicle platoon. Then we formulate the vehicle platoon cooperative localization as a sparse Bayesian inference (SBI) problem. Existing standard SBI algorithms, such as variational Bayesian inference (VBI) and approximate message passing (AMP), cannot be applied to our platoon localization problem due to the complicated GMGS prior and the ill-conditioned measurement matrix. As such, we propose a novel turbo vehicle platoon cooperative localization (Turbo-VPCL) algorithm to fully exploit the correlations of the vehicle positions (as captured by the GMGS prior) under the ill-conditioned measurement matrix. Simulation results verify that the proposed Turbo-VPCL can achieve significant gains over state-of-the-art SBI algorithms.

Updated: 2020-01-24
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-11
Ali Ahmed

This paper considers the blind deconvolution of multiple modulated signals/filters and an arbitrary filter/signal. Multiple inputs $\boldsymbol{s}_1, \boldsymbol{s}_2, \ldots, \boldsymbol{s}_N =: [\boldsymbol{s}_n]$ are modulated (pointwise multiplied) with random sign sequences $\boldsymbol{r}_1, \boldsymbol{r}_2, \ldots, \boldsymbol{r}_N =: [\boldsymbol{r}_n]$, respectively, and the resultant inputs $(\boldsymbol{s}_n \odot \boldsymbol{r}_n) \in \mathbb{C}^Q, \ n \in [N]$ are convolved against an arbitrary input $\boldsymbol{h} \in \mathbb{C}^M$ to yield the measurements $\boldsymbol{y}_n = (\boldsymbol{s}_n\odot \boldsymbol{r}_n)\circledast \boldsymbol{h}, \ n \in [N] := \{1,2,\ldots,N\}$, where $\odot$ and $\circledast$ denote pointwise multiplication and circular convolution, respectively. Given $[\boldsymbol{y}_n]$, we want to recover the unknowns $[\boldsymbol{s}_n]$ and $\boldsymbol{h}$. We make a structural assumption that the unknowns $[\boldsymbol{s}_n]$ are members of a known $K$-dimensional (not necessarily random) subspace, and prove that the unknowns can be recovered from sufficiently many observations using a regularized gradient descent algorithm whenever the modulated inputs $\boldsymbol{s}_n \odot \boldsymbol{r}_n$ are long enough, i.e., $Q \gtrsim KN+M$ (to within logarithmic factors, and signal dispersion/coherence parameters). Under the bilinear model, this is the first result on multichannel ($N\geq 1$) blind deconvolution with provable recovery guarantees under near-optimal (in the $N=1$ case) sample complexity estimates, and comparatively lenient structural assumptions on the convolved inputs. A neat conclusion of this result is that modulation of a bandlimited signal protects it against an unknown convolutive distortion. We discuss the applications of this result in passive imaging, wireless communication in unknown environments, and image deblurring. A thorough numerical investigation of the theoretical results is also presented using phase transitions, image deblurring experiments, and noise stability plots.
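The measurement model $\boldsymbol{y}_n = (\boldsymbol{s}_n \odot \boldsymbol{r}_n) \circledast \boldsymbol{h}$ is easy to simulate. The sketch below (synthetic data; the recovery algorithm itself is omitted) forms one channel's measurements and checks the Fourier-domain identity that makes circular convolution cheap to evaluate:

```python
import numpy as np

def cconv(a, b):
    """Circular convolution of equal-length sequences via the FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

rng = np.random.default_rng(0)
Q = 64
s = rng.normal(size=Q)                 # one input signal
r = rng.choice([-1.0, 1.0], size=Q)    # random sign (modulation) sequence
h = rng.normal(size=Q)                 # unknown filter (zero-padded to length Q)

y = cconv(s * r, h)                    # the measurement model y = (s ⊙ r) ⊛ h

# Sanity check: circular convolution diagonalizes in the Fourier domain.
lhs = np.fft.fft(y)
rhs = np.fft.fft(s * r) * np.fft.fft(h)
print(np.allclose(lhs, rhs))           # True
```

In the paper, $N$ such channels share the same $\boldsymbol{h}$, and the random signs $\boldsymbol{r}_n$ plus the known subspace constraint on $[\boldsymbol{s}_n]$ are what make the bilinear inverse problem well posed.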

Updated: 2020-01-17
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-20
Matthew R. O’Shaughnessy; Mark A. Davenport; Christopher J. Rozell

Many signal processing applications require estimation of time-varying sparse signals, potentially with the knowledge of an imperfect dynamics model. In this paper, we propose an algorithm for dynamic filtering of time-varying sparse signals based on the sparse Bayesian learning (SBL) framework. The key idea underlying the algorithm, termed SBL-DF, is the incorporation of a signal prediction generated from a dynamics model and estimates of previous time steps into the hyperpriors of the SBL probability model. The proposed algorithm is online, robust to imperfect dynamics models (due to the propagation of dynamics information through higher-order statistics), robust to certain undesirable dictionary properties such as coherence (due to properties of the SBL framework), allows the use of arbitrary dynamics models, and requires the tuning of fewer parameters than many other dynamic filtering algorithms do. We also extend the fast marginal likelihood SBL inference procedure to the informative hyperprior setting to create a particularly efficient version of the SBL-DF algorithm. Numerical simulations show that SBL-DF converges much faster and to more accurate solutions than standard SBL and other dynamic filtering algorithms. In particular, we show that SBL-DF outperforms state-of-the-art algorithms when the dictionary contains the challenging coherence and column-scaling structure found in many practical applications.

Updated: 2020-01-17
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-20
Amir Masoud Molaei; Bijan Zakeri; Seyed Mehdi Hosseini Andargoli

In recent years, research on source localization has increasingly addressed the problem of mixed far-field sources (FFSs) and near-field sources (NFSs). The main assumption in existing work is that the signals are uncorrelated; therefore, those methods cannot be used in multipath environments. The present paper provides a method called the components separation algorithm (CSA) for the localization of multiple mixed FFSs and NFSs, including uncorrelated, lowly correlated, and coherent signals. First, by constructing one special cumulant matrix and using a MUSIC-based technique, the noncoherent DOA vector (NDOAV) is extracted. By constructing another special cumulant matrix, and with respect to the NDOAV, an estimate of the range, as well as a signal classification, is obtained for noncoherent sources. Then, by estimating their kurtosis, the noncoherent component, and consequently the coherent component, of the second cumulant matrix are obtained. Finally, by introducing a novel approach based on squaring, projection, spatial smoothing, array interpolation transform, and coherent component restoring, the parameters of the coherent signals in each coherent group are estimated separately. The CSA prevents severe loss of aperture. Furthermore, it does not require any pairing. The simulation results validate its satisfactory performance in terms of estimation accuracy, resolution, computational complexity, and classification, as well as its robustness against lowly correlated sources.

Updated: 2020-01-17
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-20
Xiaodan Shao; Xiaoming Chen; Rundong Jia

Grant-free random access is a promising protocol to support massive access in beyond-fifth-generation (B5G) cellular Internet-of-Things (IoT) with sporadic traffic. Specifically, in each coherence interval, the base station (BS) performs joint activity detection and channel estimation (JADCE) before data transmission. Due to the deployment of a large-scale antenna array and the existence of a huge number of IoT devices, JADCE usually has high computational complexity and needs long pilot sequences. To address these challenges, this paper proposes a dimension reduction method, which projects the original device state matrix to a low-dimensional space by exploiting its sparse and low-rank structure. Then, we develop an optimized design framework with a coupled full-column-rank constraint for JADCE to reduce the size of the search space. However, the resulting problem is non-convex and highly intractable, and the conventional convex relaxation approaches are inapplicable. To this end, we propose a logarithmic smoothing method for the non-smooth objective function and transform the matrix of interest into a positive semidefinite matrix, followed by a Riemannian trust-region algorithm to solve the problem in the complex field. Simulation results show that the proposed algorithm is efficient for large-scale JADCE problems and requires shorter pilot sequences than state-of-the-art algorithms that exploit only the sparsity of the device state matrix.

Updated: 2020-01-17
• Acta Inform. (IF 1.042) Pub Date : 2020-01-13
Jan A. Bergstra, Alban Ponse

We consider several novel congruences on the signature of meadows with the aim of surveying different notions of fractions. In particular, we suggest a notion of “true fraction”.

Updated: 2020-01-13
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-11
Daniyal Amir Awan; Renato L. G. Cavalcante; Slawomir Stanczak

Learning of the cell-load in radio access networks (RANs) has to be performed within a short time period. Therefore, we propose a learning framework that is robust against uncertainties resulting from the need for learning based on a relatively small training set. To this end, we incorporate prior knowledge about the cell-load in the learning framework. For example, an inherent property of the cell-load is that it is monotonic in downlink (data) rates. To obtain additional prior knowledge we first study the feasible rate region, i.e., the set of all vectors of user rates that can be supported by the network. We prove that the feasible rate region is compact. Moreover, we show the existence of a Lipschitz function that maps feasible rate vectors to cell-load vectors. With these results in hand, we present a learning technique that guarantees a minimum approximation error in the worst-case scenario by using prior knowledge and a small training sample set. Simulations in the network simulator NS3 demonstrate that the proposed method exhibits better robustness and accuracy than standard learning techniques, especially for small training sample sets.

Updated: 2020-01-10
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-11
Ahmed Douik; Xing Liu; Tarig Ballal; Tareq Y. Al-Naffouri; Babak Hassibi

In the past few years, Global Navigation Satellite System (GNSS) based attitude determination has been widely used thanks to its high accuracy, low cost, and real-time performance. This paper presents a novel 3-D GNSS attitude determination method based on Riemannian optimization techniques. The paper first exploits the antenna geometry and baseline lengths to reformulate the 3-D GNSS attitude determination problem as an optimization over a non-convex set. Since the solution set is a manifold, in this manuscript we formulate the problem as an optimization over a Riemannian manifold. The study of the geometry of the manifold allows the design of efficient first- and second-order Riemannian algorithms to solve the 3-D GNSS attitude determination problem. Despite the non-convexity of the problem, the proposed algorithms are guaranteed to globally converge to a critical point of the optimization problem. To assess the performance of the proposed framework, numerical simulations are provided for the most challenging attitude determination cases: the unaided, single-epoch, and single-frequency scenarios. Numerical results reveal that the proposed algorithms largely outperform state-of-the-art methods for various system configurations with lower complexity than generic non-convex solvers, e.g., interior point methods.
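The first-order Riemannian machinery used here can be illustrated on the simplest matrix manifold, the unit sphere. The sketch below is a generic illustration (not the paper's attitude manifold or objective): it maximizes a quadratic form by projecting the Euclidean gradient onto the tangent space and retracting back to the sphere.

```python
import numpy as np

def sphere_gd(A, iters=500, step=0.1, seed=0):
    """Riemannian gradient ascent of f(x) = x^T A x on the unit sphere."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        g = 2 * A @ x                     # Euclidean gradient of f
        g_tan = g - (x @ g) * x           # projection onto the tangent space at x
        x = x + step * g_tan              # step along the tangent direction
        x /= np.linalg.norm(x)            # retraction back to the sphere
    return x

A = np.diag([3.0, 1.0, 0.5])
x = sphere_gd(A)
print(np.abs(x))   # approximately [1, 0, 0]: the leading eigenvector of A
```

The same project-step-retract pattern carries over to richer manifolds (rotations, products of spheres) once the tangent projection and retraction are specialized, which is what the paper's first- and second-order algorithms do for the attitude manifold.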

Updated: 2020-01-10
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-11
Yasuhiro Takano; Hsuan-Jung Su; Yoshiaki Shiraishi; Masakatu Morii

Spatial–temporal (ST) subspace-based channel estimation techniques formulated with the $\ell_2$ minimum mean square error (MMSE) criterion alleviate the multi-access interference (MAI) problem when the signals of interest exhibit a low-rank property. However, the conventional $\ell_2$ ST subspace-based methods suffer from mean squared error (MSE) deterioration in unknown interference channels, due to the difficulty of separating the signals of interest from channel covariance matrices (CCMs) contaminated with unknown interference. As a solution to this problem, we propose a new $\ell_1$-regularized ST channel estimation algorithm that applies the expectation-maximization (EM) algorithm to iteratively examine the signal subspace and the corresponding sparse supports. The new algorithm updates the CCM independently of the slot-dependent $\ell_1$ regularization, which enables it to correctly perform sparse independent component analysis (ICA) with a reasonable complexity order. Simulation results verify that the proposed technique significantly improves MSE performance in unknown interference MIMO channels and hence solves the BER floor problems from which conventional receivers suffer.

Updated: 2020-01-10
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-12-13
Maja Taseska; Toon van Waterschoot; Emanuël A. P. Habets; Ronen Talmon

Frameworks for efficient and accurate data processing often rely on a suitable representation of measurements that capture phenomena of interest. Typically, such representations are high-dimensional vectors obtained by a transformation of raw sensor signals, such as a time-frequency transform or a lag map. In this work, we focus on representation learning approaches that consider the measurements as the nodes of a weighted graph, with edge weights computed by a given kernel. If the kernel is chosen properly, the eigenvectors of the resulting graph affinity matrix provide suitable representation coordinates for the measurements. Consequently, tasks such as regression, classification, and filtering can be done more efficiently than in the original domain of the data. In this paper, we address the problem of representation learning from measurements that, besides the phenomenon of interest, contain undesired sources of variability. We propose data-driven kernels to learn representations that accurately parametrize the phenomenon of interest while reducing variations due to other sources of variability. This is a non-linear filtering problem, which we approach under the assumption that certain geometric information about the undesired variables can be extracted from the measurements, e.g., using an auxiliary sensor. The applicability of the proposed kernels is demonstrated in toy problems and in a real signal processing task.
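The kernel-eigenvector construction described above is easy to sketch for a plain Gaussian kernel (without the paper's data-driven modification), on synthetic measurements sampled from a circle; the kernel bandwidth heuristic and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Measurements on a 1-D manifold (a circle) embedded in 2-D, plus noise.
theta = np.sort(rng.uniform(0, 2 * np.pi, 200))
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(200, 2))

# Gaussian kernel affinity matrix and its row-normalized (random-walk) form.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
eps = np.median(d2)                  # a common bandwidth heuristic
W = np.exp(-d2 / eps)
P = W / W.sum(axis=1, keepdims=True)

# Leading non-trivial eigenvectors serve as representation coordinates.
w, V = np.linalg.eig(P)
order = np.argsort(-w.real)
psi = V[:, order[1:3]].real          # skip the constant top eigenvector
print(psi.shape)                     # (200, 2): new coordinates per sample
```

The paper's contribution replaces the fixed Gaussian kernel with a data-driven one, so that the resulting eigenvector coordinates parametrize the phenomenon of interest while suppressing nuisance variability.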

Updated: 2020-01-10
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-10-02
Xuejing Zhang; Xiang-Gen Xia; Zishu He; Xuepan Zhang

This paper presents two secure transmission algorithms for millimeter-wave wireless communication that are computationally attractive and have analytical solutions. In the proposed algorithms, we consider a phased-array transmission structure and focus on phase shift keying (PSK) modulation. It is found that the traditional constellation synthesis problem can be solved with the aid of polygon construction in the complex plane. A detailed analysis is then carried out and an analytical procedure is developed to obtain a qualified phase solution. For a given synthesis task, it is shown that there exist infinitely many weight vector solutions under a mild condition. Based on this result, we propose the first secure transmission algorithm, which varies the transmitting weight vector at the symbol rate, producing exact phases at the intended receiver and randomness at undesired eavesdroppers. To improve security without significantly degrading symbol detection reliability for the target receiver, the second secure transmission algorithm is devised by allowing a relaxed symbol region for the intended receiver. Compared to the first algorithm, the second one incorporates an additional random phase rotation of the transmitting weight vector, bringing extra disturbance to undesired eavesdroppers. Different from existing works that are only feasible for single-path mmWave channels, our proposed algorithms are applicable to more general multi-path channels. Moreover, all the antennas are active in the proposed algorithms, and no on-off switching circuit is needed. Simulations demonstrate the effectiveness of the proposed algorithms under various situations.

Updated: 2020-01-10
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-11-20
Geethu Joseph; Chandra R. Murthy

Dictionary learning (DL) is a well-researched problem, where the goal is to learn a dictionary from a finite set of noisy training signals, such that the training data admits a sparse representation over the dictionary. While several solutions are available in the literature, relatively little is known about their convergence and optimality properties. In this paper, we make progress on this problem by analyzing a Bayesian algorithm for DL. Specifically, we cast the DL problem into the sparse Bayesian learning (SBL) framework by imposing a hierarchical Gaussian prior on the sparse vectors. This allows us to simultaneously learn the dictionary as well as the parameters of the prior on the sparse vectors using the expectation-maximization algorithm. The dictionary update step turns out to be a non-convex optimization problem, and we present two solutions, namely, an alternating minimization (AM) procedure and an Armijo line search (ALS) method. We analytically show that the ALS procedure is globally convergent, and establish the stability of the solution by characterizing its limit points. Further, we prove the convergence and stability of the overall DL-SBL algorithm, and show that the minima of the cost function of the overall algorithm are achieved at sparse solutions. As a concrete example, we consider the application of the SBL-based DL algorithm to image denoising, and demonstrate the efficacy of the algorithm relative to existing DL algorithms.

Updated: 2020-01-10
• IEEE Trans. Signal Process. (IF 5.230) Pub Date : 2019-11-28
Paolo Braca; Augusto Aubry; Leonardo Maria Millefiori; Antonio De Maio; Stefano Marano

Covariance matrix estimation is a crucial task in adaptive signal processing applied to several surveillance systems, including radar and sonar. In this paper we propose a dynamic learning strategy to track both the covariance matrix of data and its structure (class). We assume that, given the class, the posterior distribution of the covariance is described through a mixture of inverse Wishart distributions, while the class evolves according to a Markov chain. Hence, we devise a novel and general filtering strategy, called multi-class inverse Wishart mixture filter, able to capitalize on previous observations so as to accurately track and estimate the covariance. Some case studies are provided to highlight the effectiveness of the proposed technique, which is shown to outperform alternative methods in terms of both covariance estimation accuracy and probability of correct model selection. Specifically, the proposed filter is compared with class-clairvoyant covariance estimators, e.g., the maximum likelihood and the knowledge-based recursive least square filter, and with the model order selection method based on the Bayesian information criterion.

Updated: 2020-01-10
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-08-22
Chuli Hu; Jie Li; Changjiang Xiao; Ke Wang; Nengcheng Chen

When facing a specific emergent geographical environment observation task (GeoTask), people need reliable and comprehensive disaster information in the shortest possible time. The lack of effective cognition of multi-sensor collaborative observation capability is a hindrance to performance. Adopting the GIS object-field concept as the underlying framework, we propose a sensor observation capability object field (SOCO-Field) with the sensor observation capability particle (SOC-Particle) as its core. SOCO-Field integrates SOC-Objects and GeoField for the discovery and association of sensors. An SOC-Particle exists at every location point in the geospatial environment, and SOC-Particles in spatially continuous areas can further aggregate into SOC-Particle clusters to represent single- or multi-sensor-associated observation capability information. SOCO-Field includes three basic association behaviours and four further association behaviours for deriving associated observation capability, in which the dynamic GeoField is the influential factor. An experiment on flood monitoring in the lower reaches of the Jinsha River Basin is conducted. The sensor planner can view any sensor combination's associated observation capability under a specific association mode and can effectively dispatch multiple sensors for collaborative observation thanks to the effective modelling of associated sensor observation capability information (SOCInfo).

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-08-27
Antoni Moore; Ben Daniel; Greg Leonard; Holger Regenbrecht; Judy Rodda; Lewis Baker; Raki Ryan; Steven Mills

Augmented Reality (AR) sandtables facilitate the shaping of sand to form a surface that is transformed into a digital terrain map, which is projected back onto the sand. Although the technology is mature, there are still few instances of sandtables being used in surface analysis. Fundamentally, there has been no reported formal assessment of how well sandtables perform in an educational context compared with other conventional learning environments. We compared learning outcomes from using an AR sandtable versus a conventional 3D GIS to convey key concepts in terrain and hydrological analyses, via usability and knowledge testing. Overall results from students at a research-intensive New Zealand university reveal faster task performance and greater learning satisfaction when using the sandtable to undertake experimental tasks. Effectiveness and knowledge quiz results revealed no significant difference between the technologies, though there was a trend towards more accurate answers with the 3D GIS tasks. In terms of student learning, the sandtable integrated core concepts (especially morphometry) more effectively, though the two technologies were otherwise similar. We conclude that sandtables have high potential in geospatial teaching, fostering an accessible and engaging means of introducing terrain and hydrological concepts prior to undertaking a more accurate and precise surface analysis with 3D GIS.

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-09-03
Maryam Barzegar; Abbas Rajabifard; Mohsen Kalantari; Behnam Atazadeh

In the cadastral system of Victoria, the legal extent of properties is defined by a wide range of property boundaries referencing physical objects. These boundary types are typically represented in 2D plans. However, to address the challenges of 2D plans in complex buildings, there has been a growing trend towards the adoption of 3D models. These 3D models have been mainly used for visualization purposes, with no spatial query developed for identifying property boundaries in 3D models. Among 3D models, the Building Information Modelling (BIM) environment provides the potential capabilities for modelling property boundaries. In this paper, a spatial query approach predicated on topological relationships between a legal space and physical objects is developed to identify four types of property boundaries. A BIM model based on a case study in Victoria is used for implementation of the developed approach and the retrieved boundaries related to several properties with different levels of structural complexities are represented. In addition, various challenges during this process such as the impact of different design methods, and issues related to balconies and doors are discussed. The importance of this study is highlighted by a common scenario related to querying property boundaries.

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-09-05
Wei Wei; Weidong Yang; Weibin Yao; Heyang Xu

The shortest-path algorithm is one of the most important algorithms in geographical information systems. Bellman's principle of optimization (BPO) is implicit in the shortest-path problem; that is, any involved node must lie on a simple path between the source and destination nodes. Unfortunately, BPO has never been explicitly used to exclude irrelevant nodes in existing methods, potentially leading to unnecessary searches among irrelevant nodes. To address this problem, we propose a BPO-based shortest-path acceleration algorithm (BSPA). In BSPA, a high-level graph is built to locate the necessary nodes and is used to partition the graph and divide a given task into independent subtasks. This allows the speed of any existing method to be improved using parallel computing. In a test using random graphs, at most 1.209% of the nodes, on average, need to be involved in the calculation. When compared with existing algorithms on real-world road networks, BSPA shows faster preprocessing and query times, being respectively 118 and 463 times faster in the best case; in the worst case, it remains slightly faster.
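The idea of excluding nodes that cannot lie on any source-destination path can be sketched coarsely as follows (a simplified stand-in for BSPA's high-level graph, with hypothetical helper names): keep only nodes that are reachable from s and can also reach t, then run Dijkstra on the pruned graph.

```python
import heapq

def dijkstra(adj, s, t):
    # adj: {node: [(neighbor, weight), ...]}; every node appears as a key
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d
        if d > dist.get(u, float('inf')):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float('inf')

def reachable(adj, s):
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v, _ in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def prune_then_search(adj, s, t):
    # A node can lie on some s-t path only if s reaches it and it reaches t
    fwd = reachable(adj, s)
    radj = {u: [] for u in adj}
    for u in adj:
        for v, w in adj[u]:
            radj[v].append((u, w))
    bwd = reachable(radj, t)
    keep = fwd & bwd
    sub = {u: [(v, w) for v, w in adj[u] if v in keep] for u in keep}
    return dijkstra(sub, s, t)

adj = {'s': [('a', 1), ('x', 1)], 'a': [('t', 2)], 'x': [], 't': []}
```

Here node `x` is reachable from `s` but cannot reach `t`, so the pruning removes it before the search.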

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-10-02
Motti Zohar; Tali Erickson-Gini

As early as the fourth century BCE, the Nabateans established the Incense Road to facilitate the transport of aromatic substances (frankincense and myrrh) from the Arabian Peninsula to the Mediterranean basin. An important part of this road was the segment between Petra and Gaza. Although studied before, the exact route of parts of this segment is still vague, since evidence of Roman milestones is scarce and significant portions of the landscape have changed dramatically in modern times, essentially wiping out the tracks of ancient roads. In this study, we use Geographic Information Systems (GIS) and Least Cost Path (LCP) analyses to reconstruct the original path of the Incense Road and to verify the factors influencing its establishment. The implemented analyses support the archeological evidence of two travel phases between Petra and Oboda (Avdat): during the first phase, the Nabateans used the Darb es-Sultan route; during the second phase, from the first century BCE onwards, they passed through the Ramon Crater. This is the first time such a reconstruction has been made for the southern Levant. Slope degree and distance to water resources were found to be the dominant factors in reconstructing the exact path of the Incense Road.
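LCP analysis of this kind typically runs a shortest-path search over a cost raster derived from the terrain. A minimal sketch, with a made-up elevation grid and a simplistic slope-penalty step cost (not the study's actual cost model):

```python
import heapq

def least_cost_path(elev, start, goal):
    # Dijkstra over a 4-connected grid; each move costs 1 plus the elevation
    # change, so flat routes are preferred over steep ones (toy slope penalty)
    rows, cols = len(elev), len(elev[0])
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float('inf')):
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1.0 + abs(elev[nr][nc] - elev[r][c])
                if nd < dist.get((nr, nc), float('inf')):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return float('inf')

# Toy elevation grid: the cheapest route detours around the peak (value 9)
elev = [[0, 0, 5],
        [0, 9, 5],
        [0, 0, 0]]
```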

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-10-09
Wenwen Li; Michael Batty; Michael F. Goodchild

(2020). Real-time GIS for smart cities. International Journal of Geographical Information Science: Vol. 34, No. 2, pp. 311-324.

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-03-17
Balamurugan Soundararaj; James Cheshire; Paul Longley

The accurate measurement of human activity with high spatial and temporal granularity is crucial for understanding the structure and function of the built environment. With increasing mobile device ownership, the Wi-Fi 'probe requests' generated by mobile devices can act as a cheap, scalable, and real-time source of data for establishing such measures. The two major challenges in using these probe requests to estimate human activity are: filtering the noise generated by the uncertain field of measurement, and clustering together the anonymised probe requests generated by the same device without compromising user privacy. In this paper, we demonstrate that we can overcome these challenges by using class intervals and a novel graph-based technique for filtering and clustering the probe requests, which, in turn, enables us to reliably measure real-time pedestrian footfall on retail high streets.
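One simple way to cluster anonymised probe requests, sketched below with hypothetical thresholds (a simplification of the paper's graph-based technique), is to link requests whose sequence numbers and timestamps are close and take connected components of the resulting graph:

```python
def cluster_probes(probes, max_seq_gap=5, max_dt=2.0):
    # probes: list of (sequence_number, timestamp) tuples from randomised MACs.
    # Link two probes when both their sequence-number gap and time gap are small,
    # then return connected components as device clusters.
    n = len(probes)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            si, ti = probes[i]
            sj, tj = probes[j]
            if abs(si - sj) <= max_seq_gap and abs(ti - tj) <= max_dt:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        comp, stack = [], [i]
        seen.add(i)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters

probes = [(100, 0.0), (102, 0.5), (104, 1.2),   # likely one device
          (500, 0.1), (503, 0.9)]               # likely another
clusters = cluster_probes(probes)
```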

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-06-21
Samuel Stehle; Rob Kitchin

City dashboards have become a common smart city technology, emerging as a key means of sharing and visualising urban data for the benefit of the public and city administrations. Operating as the front end of many cities' data stores, dashboards display and benchmark indicators relating to city operations, characteristics, and trends, presented through interactive visual representations of spatial and temporal patterns. Many dashboards collect, archive, and present data gathered in real time, as well as more traditional time-sliced administrative data. In this paper, we evaluate the techniques that dashboards employ to present real-time data to users. Our analysis identifies two factors that shape and differentiate real-time visual analytic tools: the dynamic nature of the data, that is, how they are refreshed and how their real-time character is communicated to the user; and how the tool enables archival comparison. We assess dashboard design according to the strategies used to address specific challenges associated with each factor, specifically change blindness and temporal pattern detection. We conclude by proposing effective techniques for city dashboard design.

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-04-30
Yongha Park; Jerry Mount; Luyu Liu; Ningchuan Xiao; Harvey J. Miller

Public transit vehicles such as buses operate within shared transportation networks subject to dynamic conditions and disruptions such as traffic congestion. The operational delays caused by these conditions can propagate downstream through scheduled transit routes, affecting system performance beyond the initial delay. This paper develops an approach to measuring and assessing vehicle delay propagation in public transit systems. We fuse data on scheduled bus service with real-time vehicle location data to measure the originating, cascading, and recovery locations of delay events across space and time. We integrate the resulting patterns to construct stop-specific delay propagation networks. We also analyze the spatiotemporal patterns of propagating delays using parameters such as 1) transit-line-based network distance, 2) total propagating delay size, and 3) distance decay. We apply our methodology using publicly available schedule and real-time location data from the Central Ohio Transit Authority (COTA) public bus system in Columbus, Ohio, USA. We find that delay initiation is spatially and temporally uneven, concentrating at specific stops downtown and at specific suburban locations. Core stops play a critical role in propagating delays to a wide range of connected stops, eventually having a disproportionate impact on the on-time performance of the bus system.
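The originate/recover structure of a delay event can be sketched from scheduled versus observed stop times; the function, threshold, and sample trip below are illustrative, not the paper's definitions:

```python
def delay_events(scheduled, observed, threshold=60):
    # scheduled/observed: per-stop times in seconds along one trip.
    # Flag the stop where a delay (lateness above threshold) originates and
    # the stop where the vehicle recovers back under the threshold.
    delays = [o - s for s, o in zip(scheduled, observed)]
    events = []
    delayed = False
    for i, d in enumerate(delays):
        if d > threshold and not delayed:
            events.append(('originate', i))
            delayed = True
        elif d <= threshold and delayed:
            events.append(('recover', i))
            delayed = False
    return delays, events

scheduled = [0, 300, 600, 900, 1200]
observed  = [0, 420, 780, 960, 1210]
delays, events = delay_events(scheduled, observed)
```

Aggregating such events across trips, and linking originating stops to the downstream stops their delay reaches, would yield the stop-specific propagation networks described above.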

Updated: 2020-01-08
• Int. J. Geograph. Inform. Sci. (IF 3.545) Pub Date : 2019-07-15
Iranga Subasinghe; Silvia Nittel; Michael Cressey; Melissa Landon; Prashanta Bajracharya

Natural disasters such as flooding, wildfires, and mudslides are rare events, but they affect citizens at unpredictable times, and their impact on human life can be significant. Citizens located close to events can provide detailed, real-time data streams capturing their event response. Instead of visualizing individual updates, an integrated spatiotemporal map yields 'big picture' event information. We investigate whether information from affected citizens is sufficient to generate a map of an unfolding natural disaster. We built the Citizen Disaster Reaction Multi-Agent Simulation (CDR-MAS), a multi-agent system that simulates the reaction of citizens to a natural disaster in an urban region. We propose an rkNN classification algorithm to aggregate the update streams into a series of colored Voronoi event maps. We simulated the 2018 Montecito Creek mudslide, customizing the CDR-MAS with the local environment to systematically generate stream data sets. Our experimental evaluation showed that event mapping based on citizen update streams is significantly influenced by the amount of citizen participation and movement. Compared with a baseline of 100% participation, the event region was predicted with 40% accuracy at 40% citizen participation, showing that citizen update streams can provide timely information in a smart city.

Updated: 2020-01-08
• VLDB J. (IF 1.973) Pub Date : 2020-01-01
Jiawei Jiang, Fangcheng Fu, Tong Yang, Yingxia Shao, Bin Cui

Distributed machine learning (ML) has been extensively studied to meet the explosive growth of training data. A wide range of machine learning models are trained by a family of first-order optimization algorithms, i.e., stochastic gradient descent (SGD). The core operation of SGD is the calculation of gradients. When executing SGD in a distributed environment, the workers need to exchange local gradients over the network. To reduce the communication cost, a category of quantization-based compression algorithms is used to transform the gradients into a binary format, at the expense of a small loss in precision. Although the existing approaches work well for dense gradients, we find that they are ill-suited for the many cases where the gradients are sparse and nonuniformly distributed. In this paper, we ask: is there a compression framework that can efficiently handle sparse and nonuniform gradients? We propose a general compression framework, called SKCompress, to compress both the gradient values and the gradient keys of sparse gradients. Our first contribution is a sketch-based method that compresses the gradient values. A sketch is a class of algorithms that approximates the distribution of a data stream with a probabilistic data structure. We first use a quantile sketch to generate splits, sort gradient values into buckets, and encode them with the bucket indexes. Our second contribution is a new sketch algorithm, namely MinMaxSketch, which compresses the bucket indexes. MinMaxSketch builds a set of hash tables and resolves hash collisions with a MinMax strategy. Since the bucket indexes are nonuniform, we further adopt Huffman coding to compress MinMaxSketch. To compress the keys of sparse gradients, the third contribution of this paper is a delta-binary encoding method that calculates the increments of the gradient keys and encodes them in binary format. An adaptive prefix is proposed to assign different sizes to different gradient keys, saving further space. We also theoretically discuss the correctness and the error bounds of our proposed methods. To the best of our knowledge, this is the first effort to utilize data sketches to compress gradients in ML. We implement a prototype system in a real cluster of our industrial partner Tencent Inc. and show that our method is up to 12× faster than existing methods.
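Two of the ingredients, quantile-bucket quantization of gradient values and delta encoding of sorted gradient keys, can be sketched as follows (a toy version with illustrative names; the MinMaxSketch, Huffman coding, and adaptive-prefix steps are omitted, and exact quantiles stand in for a quantile sketch):

```python
import numpy as np

def compress(keys, values, n_buckets=4):
    # Splits from empirical quantiles (a stand-in for a streaming quantile sketch)
    splits = np.quantile(values, np.linspace(0, 1, n_buckets + 1)[1:-1])
    bucket = np.searchsorted(splits, values)        # bucket index per value
    centers = [values[bucket == b].mean() if (bucket == b).any() else 0.0
               for b in range(n_buckets)]           # decode table
    deltas = np.diff(keys, prepend=0)               # increments of sorted keys
    return bucket, centers, deltas

def decompress(bucket, centers, deltas):
    keys = np.cumsum(deltas)                        # undo delta encoding
    values = np.array([centers[b] for b in bucket]) # bucket index -> center
    return keys, values

keys = np.array([3, 17, 18, 1024, 4096])            # sorted sparse indices
vals = np.array([-0.9, -0.1, 0.05, 0.12, 1.3])
bucket, centers, deltas = compress(keys, vals)
k2, v2 = decompress(bucket, centers, deltas)
```

Keys are recovered exactly while values are recovered up to the bucket-center quantization error, mirroring the lossless-keys/lossy-values split described above.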

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-12-20
Silu Huang, Liqi Xu, Jialin Liu, Aaron J. Elmore, Aditya Parameswaran

Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce OrpheusDB, a dataset version control system that "bolts on" versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database "for free." We develop and evaluate multiple data models for representing versioned data, as well as a lightweight partitioning scheme, LyreSplit, to further optimize the models for reduced query latencies. With LyreSplit, OrpheusDB is on average 10³× faster at finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LyreSplit can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-12-14
Jianbin Qin, Chuan Xiao, Sheng Hu, Jie Zhang, Wei Wang, Yoshiharu Ishikawa, Koji Tsuda, Kunihiko Sadakane

Query autocompletion is an important feature, sparing users many keystrokes from typing the entire query. In this paper, we study the problem of query autocompletion that tolerates errors in users' input using edit distance constraints. Previous approaches index the data strings in a trie and continuously maintain all the prefixes of data strings whose edit distances from the query string are within the given threshold. The major inherent drawback of these approaches is that the number of such prefixes is huge for the first few characters of the query string and is exponential in the alphabet size. This results in slow query response even if the entire query approximately matches only a few prefixes. We propose a novel neighborhood-generation-based method to process error-tolerant query autocompletion. Our proposed method maintains only a small set of active nodes, thus saving both space and time in processing the query. We also study efficient duplicate removal, a core problem in fetching query answers, and extend our method to support top-k queries. Optimization techniques are proposed to reduce the index size. The efficiency of our method is demonstrated through extensive experiments on real datasets.
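A minimal sketch of the active-node idea (brute-force for clarity, not the paper's incremental neighborhood generation): for a typed query and threshold tau, the active nodes are exactly the trie nodes whose labels are within edit distance tau of the query; completions are then found only below those nodes.

```python
class Trie:
    def __init__(self, words):
        self.children = {}      # node id -> {char: child node id}
        self.label = {0: ''}    # node id -> the string it represents
        nxt = 1
        for w in words:
            u = 0
            for ch in w:
                kids = self.children.setdefault(u, {})
                if ch not in kids:
                    kids[ch] = nxt
                    self.label[nxt] = self.label[u] + ch
                    nxt += 1
                u = kids[ch]

def active_nodes(trie, query, tau):
    # Brute-force edit distance from each node's label to the query; a real
    # implementation would maintain this set incrementally per keystroke
    def ed(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[-1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]
    return {u for u, lab in trie.label.items() if ed(lab, query) <= tau}

trie = Trie(['data', 'date', 'dog'])
```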

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-10-26
Fan Zhang, Xuemin Lin, Ying Zhang, Lu Qin, Wenjie Zhang

In this paper, we investigate the problem of the (k,r)-core, which aims to find cohesive subgraphs of social networks considering both user engagement and similarity. In particular, we adopt the popular concept of the k-core to guarantee the engagement of the users (vertices) in a group (subgraph), where each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we consider the pairwise similarity among users based on their attributes. Efficient algorithms are proposed to enumerate all maximal (k,r)-cores and to find the maximum (k,r)-core; both problems are shown to be NP-hard. Effective pruning techniques substantially reduce the search space of the two algorithms. A novel (k,k')-core-based upper bound on the (k,r)-core size enhances the performance of the maximum (k,r)-core computation. We also devise effective search orders for the two algorithms, with different search priorities for vertices. Besides, we study the diversified (k,r)-core search problem, which finds l maximal (k,r)-cores that together cover the most vertices. These maximal (k,r)-cores are distinctive and informationally rich. An efficient algorithm is proposed with a guaranteed approximation ratio. We design a tight upper bound to prune unpromising partial (k,r)-cores. A new search order is designed to speed up the search. Initial candidates of large size are generated to further enhance the pruning power. Comprehensive experiments on real-life data demonstrate that the maximal (k,r)-cores enable us to find interesting cohesive subgraphs, and that the performance of the three mining algorithms is effectively improved by all the proposed techniques.
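The engagement half of the (k,r)-core is the classic k-core, computed by the standard peeling algorithm sketched below; the pairwise-similarity constraint r, which makes the full problem NP-hard, is omitted here.

```python
def k_core(adj, k):
    # adj: {vertex: [neighbors]}. Repeatedly remove vertices of degree < k;
    # the survivors form the (unique, maximal) k-core.
    deg = {u: len(vs) for u, vs in adj.items()}
    queue = [u for u, d in deg.items() if d < k]
    removed = set()
    while queue:
        u = queue.pop()
        if u in removed:
            continue
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                deg[v] -= 1
                if deg[v] < k:
                    queue.append(v)
    return {u for u in adj if u not in removed}

# Triangle 1-2-3 plus a pendant vertex 4: the 2-core drops vertex 4
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
core = k_core(adj, 2)
```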

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-09-28
Tianming Zhang, Yunjun Gao, Lu Chen, Wei Guo, Shiliang Pu, Baihua Zheng, Christian S. Jensen

Reachability computation is a fundamental graph functionality with a wide range of applications. In spite of this, little work has yet been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks such as communication networks, social networks, and transportation schedule networks. Moreover, we are faced with increasingly large real-world temporal networks that may be distributed across multiple data centers. This state of affairs motivates the paper's study of efficient reachability queries on distributed temporal graphs. We propose an efficient index, called Temporal Vertex Labeling (TVL), which is a labeling scheme for distributed temporal graphs. We also present algorithms that exploit TVL to achieve efficient support for distributed reachability querying over temporal graphs in Pregel-like systems. The algorithms exploit several optimizations that hinge upon non-trivial lemmas. Extensive experiments using massive real and synthetic temporal graphs provide detailed insight into the efficiency and scalability of the proposed methods, covering both index construction and query processing. Compared with the state-of-the-art methods, the TVL-based query algorithms achieve up to an order of magnitude speedup with lower index construction overhead.
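Temporal reachability differs from static reachability in that edge timestamps along a path must respect time order. A sketch under the common non-decreasing-timestamp semantics (illustrative only; TVL itself is a labeling index and is not reproduced here):

```python
from collections import deque

def temporal_reachable(edges, src):
    # edges: list of (u, v, t); returns earliest arrival time per reachable node
    # along paths whose edge timestamps are non-decreasing
    adj = {}
    for u, v, t in edges:
        adj.setdefault(u, []).append((v, t))
    arrival = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v, t in adj.get(u, []):
            # the edge can only be taken at time t >= our arrival time at u
            if t >= arrival[u] and t < arrival.get(v, float('inf')):
                arrival[v] = t
                q.append(v)
    return arrival

# 'a' reaches 'd' only because the b-route arrives at 'c' by time 2 <= 3;
# the direct a->c edge at time 5 would be too late for the c->d edge at time 3
edges = [('a', 'b', 1), ('b', 'c', 2), ('a', 'c', 5), ('c', 'd', 3)]
arrival = temporal_reachable(edges, 'a')
```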

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-10-17
Weilong Ren, Xiang Lian, Kambiz Ghazinour

Nowadays, efficient and effective processing over massive stream data has attracted much attention from the database community, as it is useful in many real applications such as sensor data monitoring, network intrusion detection, and so on. In practice, due to the malfunction of sensing devices or imperfect data collection techniques, real-world stream data may often contain missing or incomplete attributes. In this paper, we formalize and tackle a novel and important problem, named the skyline query over incomplete data stream (Sky-iDS), which retrieves skyline objects (in the presence of missing attributes) with high confidence from an incomplete data stream. To tackle the Sky-iDS problem, we design efficient approaches to impute the missing attributes of objects from the incomplete data stream via differential dependency (DD) rules. We propose effective pruning strategies to reduce the search space of the Sky-iDS problem, devise cost-model-based index structures to facilitate the data imputation and skyline computation at the same time, and integrate our proposed techniques into an efficient Sky-iDS query answering algorithm. Extensive experiments confirm the efficiency and effectiveness of our Sky-iDS processing approach over both real and synthetic data sets.
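For reference, the skyline operator itself reduces to pairwise dominance checks. The sketch below handles complete tuples only and assumes both dimensions are minimized; the paper's DD-based imputation and confidence handling are not shown.

```python
def dominates(a, b):
    # a dominates b if a is no worse in every dimension and strictly better
    # in at least one (minimizing all dimensions, assumed for illustration)
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    # a point is in the skyline iff no other point dominates it
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
sky = skyline(pts)
```

Here (3, 3) is dominated by (2, 2), and (5, 5) by every other point, so neither survives.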

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-10-04
Xuelian Lin, Jiahao Jiang, Shuai Ma, Yimeng Zuo, Chunming Hu

Various mobile devices have been used to collect, store, and transmit tremendous amounts of trajectory data, and raw trajectory data seriously wastes storage, network bandwidth, and computing resources. To address this issue, one-pass line simplification (LS) algorithms have been developed, which compress the data points in a trajectory into a set of continuous line segments. However, these algorithms adopt the perpendicular Euclidean distance; none of them uses the synchronous Euclidean distance (SED), so they cannot support spatiotemporal queries. To this end, we develop two one-pass, error-bounded trajectory simplification algorithms (CISED-S and CISED-W) using SED, based on a novel spatiotemporal cone intersection technique. Using four real-life trajectory datasets, we experimentally show that our approaches are both efficient and effective. In terms of running time, algorithms CISED-S and CISED-W are on average 3 times faster than SQUISH-E (the fastest existing LS algorithm using SED). In terms of compression ratio, CISED-S is close to, and CISED-W is on average 19.6% better than, DPSED (the existing sub-optimal LS algorithm using SED with the best compression ratios), and they are 21.1% and 42.4% better than SQUISH-E on average, respectively.
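The synchronous Euclidean distance compares a point with the position the object would occupy on the approximating segment at the same timestamp, rather than with the geometrically nearest point. A small sketch, with the point format (x, y, t) assumed for illustration:

```python
import math

def sed(p, seg_start, seg_end):
    # Distance from point p to its time-synchronized position on the segment
    (x, y, t) = p
    (x1, y1, t1) = seg_start
    (x2, y2, t2) = seg_end
    r = 0.0 if t2 == t1 else (t - t1) / (t2 - t1)    # time ratio along segment
    sx = x1 + r * (x2 - x1)                          # synchronized position
    sy = y1 + r * (y2 - y1)
    return math.hypot(x - sx, y - sy)
```

The point (2, 0, 5) lies exactly on the segment from (0, 0, 0) to (10, 0, 10) spatially, so its perpendicular distance is 0, yet its SED is 3 because at time 5 the object should already be at (5, 0); this temporal deviation is what perpendicular-distance simplification ignores.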

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-09-25
Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing cannot be sorted while keeping similar data series close to each other in the sorted order. To address this problem, we present Coconut, the first data series index based on sortable summarizations and the first efficient solution for indexing and querying streaming series. The first innovation in Coconut is an inverted, sortable data series summarization that organizes data series based on a z-order curve, keeping similar series close to each other in the sorted order. As a result, Coconut is able to use bulk loading and updating techniques that rely on sorting to quickly build and maintain a contiguous index using large sequential disk I/Os. We then explore prefix-based and median-based splitting policies for bottom-up bulk loading, showing that median-based splitting outperforms the state of the art, ensuring that all nodes are densely populated. Finally, we explore the impact of sortable summarizations on variable-sized window queries, showing that they can be supported in the presence of updates through efficient merging of temporal partitions. Overall, we show analytically and empirically that Coconut dominates the state-of-the-art data series indexes in terms of construction speed, query speed, and storage costs.
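Sortable summarization rests on interleaving per-segment summary symbols into one z-order key, so that lexicographic sorting of the keys keeps series with similar segment summaries adjacent. A toy two-symbol Morton interleave, with 8-bit symbols assumed (real summarizations have more segments and different symbol widths):

```python
def morton2(a, b):
    # Interleave the bits of two 8-bit symbols into one 16-bit z-order key:
    # a's bits land in the even positions, b's bits in the odd positions
    code = 0
    for i in range(8):
        code |= ((a >> i) & 1) << (2 * i)
        code |= ((b >> i) & 1) << (2 * i + 1)
    return code
```

Sorting series by such keys lets an index be bulk-loaded with large sequential writes, which is the property Coconut exploits.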

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-10-11
Geoff Langdale, Daniel Lemire

JavaScript Object Notation (JSON) is a ubiquitous data exchange format on the web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data, so we are motivated to make JSON parsing as fast as possible. Despite the maturity of the JSON parsing problem, we show that substantial speedups are possible. We present the first standard-compliant JSON parser to process gigabytes of data per second on a single core, using commodity processors. It uses a quarter or fewer of the instructions of a state-of-the-art reference parser like RapidJSON. Unlike other validating parsers, our software (simdjson) makes extensive use of single-instruction-multiple-data (SIMD) instructions. To ensure reproducibility, simdjson is freely available as open-source software under a liberal license.

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-10-08
Runhui Wang, Sibo Wang, Xiaofang Zhou

Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) π(s,t) measures the importance of node t with respect to node s. In this work, we study the single-source PPR query, which takes a source node s as input and outputs the PPR values of all nodes in G with respect to s. The single-source PPR query finds many important applications, e.g., community detection and recommendation. Deriving exact answers to single-source PPR queries is prohibitively expensive, so most existing work focuses on approximate solutions. Nevertheless, existing approximate solutions are still inefficient, and it is challenging to compute single-source PPR queries efficiently for online applications. This motivates us to devise efficient parallel algorithms running on shared-memory multi-core systems. In this work, we present how to efficiently parallelize the state-of-the-art index-based solution FORA, and we theoretically analyze the complexity of the parallel algorithms. We prove that our proposed algorithm achieves a time complexity of O(W/P + log²n), where W is the time complexity of the sequential FORA algorithm, P is the number of processors used, and n is the number of nodes in the graph. FORA includes a forward push phase and a random walk phase, and we present optimization techniques for both phases, including effective maintenance of active nodes, improved efficiency of memory access, and cache-aware scheduling. Extensive experimental evaluation demonstrates that our solution achieves up to a 37× speedup on 40 cores and is 3.3× faster than alternatives on 40 cores. Moreover, the forward push alone can be used for local graph clustering, and our parallel algorithm for forward push is 4.8× faster than existing parallel alternatives.
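The forward push primitive that FORA (and its parallelization) builds on can be sketched as a sequential local-push loop; the values of alpha and rmax and the example graph below are illustrative, and dangling nodes are not handled in this sketch.

```python
def forward_push(adj, s, alpha=0.2, rmax=1e-4):
    # reserve: lower-bound PPR estimates; residue: mass still to be distributed.
    # Invariant: reserves + residues always sum to 1 (the initial mass at s).
    reserve = {}
    residue = {s: 1.0}
    frontier = [s]
    while frontier:
        u = frontier.pop()
        r = residue.get(u, 0.0)
        deg = len(adj[u])
        if deg == 0 or r <= rmax * deg:     # below threshold: nothing to push
            continue
        reserve[u] = reserve.get(u, 0.0) + alpha * r   # keep alpha-fraction
        residue[u] = 0.0
        share = (1 - alpha) * r / deg                  # spread rest to neighbors
        for v in adj[u]:
            residue[v] = residue.get(v, 0.0) + share
            if residue[v] > rmax * len(adj[v]):
                frontier.append(v)
    return reserve, residue

# Small strongly connected toy graph: 0 <-> 1, 0 <-> 2
adj = {0: [1, 2], 1: [0], 2: [0]}
reserve, residue = forward_push(adj, 0)
```

The residues left at termination are what FORA then refines with random walks; the parallel version described above pushes from many active nodes at once.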

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-19
Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, Wolfgang Lehner

Abstract The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. In prior work, we proposed general dynamic Yannakakis (GDyn), a general framework for dynamically processing acyclic conjunctive queries with $$\theta$$-joins in the presence of data updates. Whereas traditional approaches face a trade-off between materialization of subresults (to avoid inefficient recomputation) and recomputation of subresults (to avoid the potentially large space overhead of materialization), GDyn avoids this trade-off: it maintains a succinct data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In this paper, we consolidate and extend the development of GDyn. First, we give a full formal proof of GDyn's correctness and complexity. Second, we present a novel algorithm for computing GDyn query plans. Finally, we instantiate GDyn to the case where all $$\theta$$-joins are inequalities and present an extended experimental comparison against state-of-the-art engines. Our approach consistently outperforms the competing systems, with improvements of multiple orders of magnitude in both time and memory consumption.
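For intuition only: the materialization-versus-recomputation tension described above already shows up for a single join count under insertions. The toy below (not the GDyn data structure) maintains the size of a two-way join in O(1) per update by keeping per-key multiplicities rather than either materializing join tuples or recomputing the join.

```python
from collections import Counter

class JoinCounter:
    """Incrementally maintain |R(a,b) JOIN S(b,c)| on b under insertions.

    Toy delta maintenance: per-key multiplicities, O(1) per inserted tuple,
    no materialized join result and no recomputation.
    """
    def __init__(self):
        self.r_keys = Counter()   # multiplicity of each join key in R
        self.s_keys = Counter()   # multiplicity of each join key in S
        self.join_count = 0

    def insert_r(self, b):
        self.join_count += self.s_keys[b]  # new R-tuple pairs with these S-tuples
        self.r_keys[b] += 1

    def insert_s(self, b):
        self.join_count += self.r_keys[b]
        self.s_keys[b] += 1

jc = JoinCounter()
for b in (1, 2):
    jc.insert_r(b)
for b in (1, 1, 3):
    jc.insert_s(b)
print(jc.join_count)  # 2: the R-tuple with key 1 joins both S-tuples with key 1
```

GDyn generalizes far beyond this (acyclic multi-way queries, $$\theta$$-joins, enumeration of full results), but the principle of maintaining a small summary from which answers are derivable is the same.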

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-19
Dingming Wu, Hao Zhou, Jieming Shi, Nikos Mamoulis

Abstract RDF data are traditionally accessed using structured query languages such as SPARQL. However, this requires users to understand the language as well as the RDF schema. Keyword search on RDF data aims at relieving users from these requirements; users only input a set of keywords, and the goal is to find small RDF subgraphs that contain all keywords. At the same time, popular RDF knowledge bases also include spatial and temporal semantics, which opens the road to spatiotemporal search operations. In this work, we propose and study novel keyword-based search queries with spatial semantics on RDF data, namely kSP queries. The objective of the kSP query is to find RDF subgraphs that contain the query keywords and are rooted at spatial entities close to the query location. To add temporal semantics to the kSP query, we propose the kSPT query, which incorporates temporal information in one of two ways: by considering the temporal difference between each keyword-matched vertex and the query timestamp, or by using a temporal range to filter keyword-matched vertices. The novelty of kSP and kSPT queries is that they are spatiotemporally aware and do not rely on structured query languages. We design an efficient approach containing two pruning techniques and a data preprocessing technique for processing kSP queries, and we extend and improve this approach with four optimizations to evaluate kSPT queries. Extensive empirical studies on two real datasets demonstrate the superior and robust performance of our proposals compared to baseline methods.

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-19
Xuedi Qin, Yuyu Luo, Nan Tang, Guoliang Li

Data visualization is crucial in today's data-driven business world and is widely used to support decision making that directly affects the revenues of many companies. However, given the volume, velocity, and veracity of modern data, there is an emerging need for database expertise to make data visualization efficient and effective. In response to this need, this article surveys techniques along three lines. (1) Visualization specifications define how users can state their requirements for generating visualizations. (2) Efficient visualization approaches process the data according to a given specification and produce visualizations that remain scalable at interactive speed. (3) Visualization recommendation auto-completes an incomplete specification or discovers further interesting visualizations based on a reference visualization.

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-15
Matthaios Olma, Manos Karpathiotakis, Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-13
Protiva Rahman, Lilong Jiang, Arnab Nandi

Abstract Interactive query interfaces have become a popular tool for ad hoc data analysis and exploration. Compared with traditional systems that are optimized for throughput or batched performance, these systems focus more on user-centric interactivity. This poses a new class of performance challenges to the backend, which are further exacerbated by the advent of new interaction modes (e.g., touch, gesture) and query interface paradigms (e.g., sliders, maps). There is, thus, a need to clearly articulate the evaluation space for interactive systems. In this paper, we extensively survey the literature to guide the development and evaluation of interactive data systems. We highlight unique characteristics of interactive workloads, discuss confounding factors when conducting user studies, and catalog popular metrics for evaluation. We further delineate certain behaviors not captured by these metrics and propose complementary ones to provide a complete picture of interactivity. We demonstrate how to analyze and employ user behavior for system enhancements through three case studies. Our survey and case studies motivate the need for behavior-driven evaluation and optimizations when building interactive interfaces.

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-11
Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, Xuemin Lin

In the original article, Table 1 was published with incorrect figures. The correct Table 1 is given below.

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-08
Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro

Abstract Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists of making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy used to repair conflicting values and are specialized toward specific classes of constraints. In this paper, we develop a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints and under different strategies for selecting preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs, and the chase. In Llunatic, various repairing strategies can be slotted in without changing the underlying implementation. Furthermore, Llunatic is the first DBMS-based data repairing system. We report experimental results that confirm its good scalability and show that various instantiations of the framework yield repairs of good quality.
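To make the "preferred values" idea concrete: the toy below repairs violations of a single functional dependency by rewriting each conflicting group to its most frequent right-hand-side value. This is one pluggable strategy in miniature, not Llunatic's chase machinery; the function name and the majority-vote policy are illustrative assumptions.

```python
from collections import Counter, defaultdict

def repair_fd(rows, lhs, rhs):
    """Repair violations of the FD lhs -> rhs by majority vote per group.

    rows: list of dicts; lhs, rhs: lists of attribute names.
    Returns (repaired rows, number of changed cells). Illustrative only:
    one hard-coded 'preferred value' strategy for a single FD.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    repaired = [dict(row) for row in rows]
    changes = 0
    for idxs in groups.values():
        vals = Counter(tuple(rows[i][a] for a in rhs) for i in idxs)
        if len(vals) > 1:                          # the FD is violated here
            preferred = vals.most_common(1)[0][0]  # most frequent rhs value wins
            for i in idxs:
                for a, v in zip(rhs, preferred):
                    if repaired[i][a] != v:
                        repaired[i][a] = v
                        changes += 1
    return repaired, changes
```

Swapping in a different policy (source authority, recency, user input) only changes how `preferred` is chosen, which is the kind of pluggability the framework formalizes with preference labels.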

Updated: 2020-01-06
• VLDB J. (IF 1.973) Pub Date : 2019-11-04
Fragkiskos D. Malliaros, Christos Giatsidis, Apostolos N. Papadopoulos, Michalis Vazirgiannis

Abstract The core decomposition of networks has attracted significant attention due to its numerous applications to real-life problems. Simply stated, the core decomposition of a network (graph) assigns to each graph node v an integer c(v) (the core number), capturing how well v is connected with respect to its neighbors. This concept is strongly related to graph degeneracy, which has a long history in graph theory. Although the concept is extremely simple, there is enormous interest in it from diverse application domains, mainly because core decomposition can be used to analyze a network in a simple and concise manner by quantifying the significance of graph nodes. Consequently, a considerable body of research either proposes efficient algorithmic techniques under different settings and graph types or applies the concept to other problems and scientific areas. Based on this large interest in the topic, this survey provides an in-depth discussion of core decomposition, focusing mainly on: (i) the basic theory and fundamental concepts, (ii) the algorithmic techniques proposed for computing it efficiently under different settings, and (iii) the applications that can benefit significantly from it.
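The peeling algorithm behind core decomposition fits in a few lines: repeatedly remove a minimum-degree node, and assign it the running maximum of the degrees observed at removal time. The sketch below is the textbook heap-based version for undirected graphs given as adjacency lists, O((n+m) log n); linear-time bucket variants exist but are longer.

```python
import heapq

def core_numbers(adj):
    """Core decomposition by peeling (textbook sketch, undirected graph).

    adj: dict node -> list of neighbors. Returns dict node -> core number.
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed = set()
    core = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                 # stale heap entry, skip
        k = max(k, d)                # core number never decreases while peeling
        core[v] = k
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return core

# A triangle {1,2,3} with a pendant node 4 attached to 3:
# node 4 lands in the 1-core shell, the triangle in the 2-core.
print(core_numbers({1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}))
```

The stale-entry check is the usual lazy-deletion trick for heaps: instead of decreasing keys in place, we push fresh entries and discard outdated ones on pop.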

Updated: 2020-01-06
• Acta Inform. (IF 1.042) Pub Date : 2019-12-20
Nahal Mirzaie, Fathiyeh Faghih, Swen Jacobs, Borzoo Bonakdarpour

Self-stabilization in distributed systems is a technique to guarantee convergence to a set of legitimate states without external intervention when a transient fault or bad initialization occurs. Recently, there has been a surge of efforts in designing techniques for automated synthesis of self-stabilizing algorithms that are correct by construction. Most of these techniques, however, are not parameterized, meaning that they can only synthesize a solution for a fixed and predetermined number of processes. In this paper, we report a breakthrough in parameterized synthesis of self-stabilizing algorithms in symmetric networks, including ring, line, mesh, and torus. First, we develop cutoffs that guarantee (1) closure in legitimate states, and (2) deadlock-freedom outside the legitimate states. We also develop a sufficient condition for convergence in self-stabilizing systems. Since some of our cutoffs grow with the size of the local state space of processes, scalability of the synthesis procedure is still a problem. We address this problem by introducing a novel SMT-based technique for counterexample-guided synthesis of self-stabilizing algorithms in symmetric networks. We have fully implemented our technique and successfully synthesized solutions to maximal matching, three coloring, and maximal independent set problems for ring and line topologies.
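For readers new to the area, the classic instance of what is being synthesized here is Dijkstra's K-state token ring: from any initial state, the ring converges to configurations with exactly one privileged process (mutual exclusion) and then stays in them. The simulation below illustrates that convergence-plus-closure behavior; it is the textbook algorithm, not one of the paper's synthesized solutions.

```python
import random

def privileged(x):
    """Indices of privileged processes in Dijkstra's K-state token ring."""
    privs = [0] if x[0] == x[-1] else []
    privs += [i for i in range(1, len(x)) if x[i] != x[i - 1]]
    return privs

def step(x, K, i):
    """Let privileged process i make its move."""
    x = list(x)
    if i == 0:
        x[0] = (x[0] + 1) % K     # process 0 advances its counter mod K
    else:
        x[i] = x[i - 1]           # others copy their left neighbor
    return x

def stabilize(x, K, rng, max_steps=10_000):
    """Run a random central-daemon scheduler until exactly one privilege."""
    for t in range(max_steps):
        privs = privileged(x)
        if len(privs) == 1:
            return x, t           # legitimate state: mutual exclusion holds
        x = step(x, K, rng.choice(privs))
    raise RuntimeError("did not stabilize")

x, steps = stabilize([3, 1, 4, 1, 5], K=7, rng=random.Random(0))
print(len(privileged(x)))  # 1
```

Convergence and closure are guaranteed here for K at least the ring size; the paper's contribution is synthesizing such algorithms automatically for a parameterized number of processes rather than verifying one hand-written solution.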

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-09
Joost Engelfriet, Kazuhiro Inaba, Sebastian Maneth

Abstract Compositions of tree-walking tree transducers form a hierarchy with respect to the number of transducers in the composition. As the main technical result, it is proved that any such composition can be realized as a linear-bounded composition, which means that the sizes of the intermediate results can be chosen to be at most linear in the size of the output tree. This has consequences for the expressiveness and complexity of the translations in the hierarchy. First, if the computed translation is a function of linear size increase, i.e., the size of the output tree is at most linear in the size of the input tree, then it can be realized by just one, deterministic, tree-walking tree transducer. For compositions of deterministic transducers it is decidable whether or not the translation is of linear size increase. Second, every composition of deterministic transducers can be computed in deterministic linear time on a RAM and in deterministic linear space on a Turing machine, measured in the sum of the sizes of the input and output tree. Similarly, every composition of nondeterministic transducers can be computed in simultaneous polynomial time and linear space on a nondeterministic Turing machine. Their output tree languages are deterministic context-sensitive, i.e., can be recognized in deterministic linear space on a Turing machine. The membership problem for compositions of nondeterministic translations is decidable in nondeterministic polynomial time and deterministic linear space. All the above results also hold for compositions of macro tree transducers. The membership problem for the composition of a nondeterministic and a deterministic tree-walking tree translation (for a nondeterministic IO macro tree translation) is log-space reducible to a context-free language, whereas the membership problem for the composition of a deterministic and a nondeterministic tree-walking tree translation (for a nondeterministic OI macro tree translation) is possibly NP-complete.

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-07
Bernd Finkbeiner, Christopher Hahn, Philip Lukert, Marvin Stenger, Leander Tentrup

Abstract We study the reactive synthesis problem for hyperproperties given as formulas of the temporal logic HyperLTL. Hyperproperties generalize trace properties, i.e., sets of traces, to sets of sets of traces. Typical examples are information-flow policies like noninterference, which stipulate that no sensitive data must leak into the public domain. Such properties cannot be expressed in standard linear or branching-time temporal logics like LTL, CTL, or $$\hbox {CTL}^*$$. Furthermore, HyperLTL subsumes many classical extensions of the LTL realizability problem, including realizability under incomplete information, distributed synthesis, and fault-tolerant synthesis. We show that, while the synthesis problem is undecidable for full HyperLTL, it remains decidable for the $$\exists ^*$$, $$\exists ^*\forall ^1$$, and the $${{ linear }}\;\forall ^*$$ fragments. Beyond these fragments, the synthesis problem immediately becomes undecidable. For universal HyperLTL, we present a semi-decision procedure that constructs implementations and counterexamples up to a given bound. We report encouraging experimental results obtained with a prototype implementation on example specifications with hyperproperties like symmetric responses, secrecy, and information flow.

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-06
Alessandro Abate, Iury Bessa, Lucas Cordeiro, Cristina David, Pascal Kesseli, Daniel Kroening, Elizabeth Polgreen

We present a sound and automated approach to synthesizing safe, digital controllers for physical plants represented as time-invariant models. Models are linear differential equations with inputs, evolving over a continuous state space. The synthesis precisely accounts for the effects of finite-precision arithmetic introduced by the controller. The approach uses counterexample-guided inductive synthesis: an inductive generalization phase produces a controller that is known to stabilize the model but that may not be safe for all initial conditions of the model. Safety is then verified via bounded model checking: if the verification step fails, a counterexample is provided to the inductive generalization, and the process further iterates until a safe controller is obtained. We demonstrate the practical value of this approach by automatically synthesizing safe controllers for physical plant models from the digital control literature.
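The counterexample-guided loop itself is compact enough to sketch. The toy below synthesizes a stabilizing feedback gain for the scalar discrete-time plant x ← 2x by alternating inductive generalization against collected counterexamples with exhaustive verification. Everything here (the candidate pool, the plant, the names `cegis` and `stabilizes`) is an illustrative assumption, not the paper's bit-precise controller synthesizer.

```python
def cegis(candidates, domain, spec, max_iters=100):
    """Counterexample-guided inductive synthesis, in miniature.

    Find c in candidates with spec(c, x) true for every x in domain.
    """
    examples = []
    for _ in range(max_iters):
        # Inductive generalization: pick any candidate consistent with the
        # counterexamples collected so far.
        pool = [c for c in candidates if all(spec(c, x) for x in examples)]
        if not pool:
            return None                               # no candidate can work
        c = pool[0]
        # Verification: hunt for an input that breaks the candidate.
        cex = next((x for x in domain if not spec(c, x)), None)
        if cex is None:
            return c                                  # verified on all inputs
        examples.append(cex)                          # refine and repeat
    raise RuntimeError("iteration budget exhausted")

def stabilizes(k, x0, steps=20):
    """Does gain k shrink the state of the toy closed loop x <- (2 - k) * x?"""
    x = x0
    for _ in range(steps):
        x = (2 - k) * x
    return abs(x) < abs(x0)

print(cegis([0, 1, 2, 3, 4], [-2, -1, 1, 2], stabilizes))  # 2
```

In the paper's setting, the generalization step proposes a digital controller, the verification step is bounded model checking of the closed-loop model under finite-precision arithmetic, and the counterexamples are initial conditions for which safety fails.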

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-05
Jörg Endrullis, Jan Willem Klop, Rena Bakhshi

Abstract Although finite state transducers are very natural and simple devices, surprisingly little is known about the transducibility relation they induce on streams (infinite words). We collect some intriguing problems that have remained unsolved for several years. The transducibility relation arising from finite state transduction induces a partial order of stream degrees, which we call Transducer degrees, analogous to the well-known Turing degrees or degrees of unsolvability. We show that there are pairs of degrees without supremum and without infimum. The former result is somewhat surprising, since every finite set of degrees has a supremum if we strengthen the machine model to Turing machines, but also if we weaken it to Mealy machines.

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-05
Elizabeth Firman, Shahar Maoz, Jan Oliver Ringert

Reactive synthesis for the GR(1) fragment of LTL has been implemented and studied in many works. In this work we present and evaluate a list of heuristics to potentially reduce running times for GR(1) synthesis and related algorithms. The list includes several heuristics for controlled-predecessor computation and BDDs, early detection of fixed-points and unrealizability, fixed-point recycling, and several heuristics for unrealizable-core computations. We have implemented the heuristics and integrated them in our synthesis environment Spectra Tools, a set of tools for writing specifications and running synthesis and related analyses. We evaluate the presented heuristics on SYNTECH15, a total of 78 specifications of 6 autonomous Lego robots, and on SYNTECH17, a total of 149 specifications of 5 autonomous Lego robots, all written by third-year undergraduate computer science students in two project classes we have taught, as well as on benchmarks from the literature. The evaluation investigates not only the potential of the suggested heuristics to improve computation times, but also the difference between existing benchmarks and the robot specifications in terms of the effectiveness of the heuristics. Our evaluation shows positive results for applying all the heuristics together, which become more significant for specifications with slower original running times, and shows differences in effectiveness across different sets of specifications. Furthermore, a comparison between Spectra, with all the presented heuristics, and two existing tools, RATSY and Slugs, over two well-known benchmarks shows that Spectra outperforms both on most of the specifications; the larger the specification, the faster Spectra becomes relative to the two other tools.

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-02
Arnab Bhattacharyya, Ashutosh Gupta, Lakshmanan Kuppusamy, Somya Mani, Ankit Shukla, Mandayam Srivas, Mukund Thattai

Abstract Vesicle traffic systems (VTSs) transport cargo among the intracellular compartments of eukaryotic cells. The compartments are viewed as nodes labeled by their chemical identity, and the transport vesicles are similarly viewed as labeled edges between the nodes. Several interesting questions about VTSs translate to combinatorial search and synthesis problems. We present novel encodings of properties of vesicle traffic systems in Boolean satisfiability (SAT), satisfiability modulo theories (SMT), and quantified Boolean formulas (QBF). We have implemented the presented encodings in a tool that uses these solvers to search for networks satisfying transport-consistency conditions. In our numerical experiments, we show that our tool can search for networks of sizes relevant to real cellular systems. Our work illustrates the potential of novel biological applications of SAT solving technology.

Updated: 2020-01-04
• Acta Inform. (IF 1.042) Pub Date : 2019-12-02
Maciej Gazda, Wan Fokkink, Vittorio Massaro

Abstract A basic sanity property of a process semantics is that it constitutes a congruence with respect to standard process operators. This issue has been traditionally addressed by developing, for a specific process semantics, a syntactic format for operational semantics specifications. We suggest a novel, orthogonal approach, which focuses on a specific process operator and determines a class of congruence relations for this operator. To this end, we impose syntactic restrictions on Hennessy–Milner logic, so that a process semantics whose modal characterization satisfies those criteria is guaranteed to be a congruence with respect to the operator in question. We investigate alternative composition, action prefix, projection, encapsulation, renaming, and parallel composition with communication, in the context of both concrete and weak process semantics.

Updated: 2020-01-04
Contents have been reproduced by permission of the publishers.
