-
Rejoinder: Fitting a folded normal distribution without EM Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Iain L. MacDonald
Iain L. MacDonald. Source: Annals of Applied Statistics, Volume 14, Number 4, 2101.
-
Response to ‘Fitting a folded normal distribution without EM’ Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Sungkyu Jung; Mark Foskey; J. S. Marron
Sungkyu Jung, Mark Foskey, J. S. Marron. Source: Annals of Applied Statistics, Volume 14, Number 4, 2099–2100.
-
Letter to the Editor: Fitting a folded normal distribution without EM Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Iain L. MacDonald
The problem of fitting a folded normal distribution by maximum likelihood has been described as ‘not straightforward’, and alternatives such as EM proposed. We suggest here that it is in fact straightforward to fit such a distribution by direct numerical maximization of the likelihood. We demonstrate this in an example. The relevant R code is included.
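Direct numerical maximization of the folded normal likelihood, as described above, can be sketched in Python. The letter's own code is in R and is not reproduced here. This is a minimal sketch assuming NumPy and SciPy, and the names `folded_normal_negloglik` and `fit_folded_normal` are hypothetical. If Y ~ N(μ, σ²), the density of X = |Y| is f(x) = σ⁻¹[φ((x − μ)/σ) + φ((x + μ)/σ)] for x ≥ 0. The sketch optimizes log σ so that the search is unconstrained.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def folded_normal_negloglik(params, x):
    """Negative log-likelihood of the folded normal; params = (mu, log_sigma)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # Density of |Y| at x >= 0: normal density at x plus its mirror image at -x.
    dens = norm.pdf(x, loc=mu, scale=sigma) + norm.pdf(x, loc=-mu, scale=sigma)
    return -np.sum(np.log(dens))


def fit_folded_normal(x):
    """Fit (mu, sigma) by direct numerical maximization of the likelihood."""
    x = np.asarray(x, dtype=float)
    # Crude starting values: mean and standard deviation of the folded data.
    start = np.array([x.mean(), np.log(x.std())])
    res = minimize(folded_normal_negloglik, start, args=(x,), method="Nelder-Mead")
    mu_hat, log_sigma_hat = res.x
    # mu is identified only up to sign, so report |mu|.
    return abs(mu_hat), float(np.exp(log_sigma_hat)), res


rng = np.random.default_rng(0)
data = np.abs(rng.normal(2.0, 1.5, size=5000))  # simulated folded data
mu_hat, sigma_hat, result = fit_folded_normal(data)
print(mu_hat, sigma_hat)
```

The likelihood is unchanged when μ is replaced by −μ, so the sign of the fitted μ carries no information.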
-
Mitigating unobserved spatial confounding when estimating the effect of supermarket access on cardiovascular disease deaths Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Patrick M. Schnell; Georgia Papadogeorgou
Confounding by unmeasured spatial variables has received some attention in the spatial statistics and causal inference literatures, but concepts and approaches have remained largely separated. In this paper we aim to bridge these distinct strands of statistics by considering unmeasured spatial confounding within a causal inference framework and estimating effects using outcome regression tools popular
-
Region-referenced spectral power dynamics of EEG signals: A hierarchical modeling approach Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Qian Li; John Shamshoian; Damla Şentürk; Catherine Sugar; Shafali Jeste; Charlotte DiStefano; Donatello Telesca
Functional brain imaging through electroencephalography (EEG) relies upon the analysis and interpretation of high-dimensional, spatially organized time series. We propose to represent time-localized frequency domain characterizations of EEG data as region-referenced functional data. This representation is coupled with a hierarchical regression modeling approach to multivariate functional observations
-
Modelling the sound production of narwhals using a point process framework with memory effects Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Aleksander Søltoft-Jensen; Mads Peter Heide-Jørgensen; Susanne Ditlevsen
Obtaining an adequate description of the behaviour of narwhals in a pristine environment is important both for understanding natural behaviour and for providing the means to detect potential changes in behaviour directly or indirectly caused by human activity. Based on Acousonde™ data from five narwhals in Scoresby Sound, this paper aims at modelling buzzing and calling rates of East
-
Structured discrepancy in Bayesian model calibration for ChemCam on the Mars Curiosity rover Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 K. Sham Bhat; Kary Myers; Earl Lawrence; James Colgan; Elizabeth Judge
The Mars rover Curiosity carries an instrument called ChemCam to determine the composition of the soil and rocks via laser-induced breakdown spectroscopy (LIBS). Los Alamos National Laboratory has developed a simulation capability that can predict spectra from ChemCam, but there are major-scale differences between the prediction and observation. This presents a challenge when using Bayesian model calibration
-
Nonparametric Bayesian multiarmed bandits for single-cell experiment design Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Federico Camerlenghi; Bianca Dumitrascu; Federico Ferrari; Barbara E. Engelhardt; Stefano Favaro
The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper we introduce a simple, computationally efficient and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large-scale experiment for the collection of scRNA-seq
-
Hawkes binomial topic model with applications to coupled conflict-Twitter data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 George Mohler; Erin McGrath; Cody Buntain; Gary LaFree
We consider the problem of modeling and clustering heterogeneous event data arising from coupled conflict event and social media data sets. In this setting conflict events trigger responses on social media, and, at the same time, signals of grievance detected in social media may serve as leading indicators for subsequent conflict events. For this purpose we introduce the Hawkes Binomial Topic Model
-
Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Naomi E. Hannaford; Sarah E. Heaps; Tom M. W. Nye; Tom A. Williams; T. Martin Embley
Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree’s root
-
Integrative statistical methods for exposure mixtures and health Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Brian J. Reich; Yawen Guan; Denis Fourches; Joshua L. Warren; Stefanie E. Sarnat; Howard H. Chang
Humans are concurrently exposed to chemically, structurally and toxicologically diverse chemicals. A critical challenge for environmental epidemiology is to quantify the risk of adverse health outcomes resulting from exposures to such chemical mixtures and to identify which mixture constituents may be driving etiologic associations. A variety of statistical methods have been proposed to address these
-
Bayesian inference for multistrain epidemics with application to ESCHERICHIA COLI O157:H7 in feedlot cattle Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Panayiota Touloupou; Bärbel Finkenstädt; Thomas E. Besser; Nigel P. French; Simon E. F. Spencer
For most pathogens, testing procedures can be used to distinguish between different strains with which individuals are infected. Due to the growing availability of such data, multistrain models have increased in popularity over the past few years. Quantifying the interactions between different strains of a pathogen is crucial in order to obtain a more complete understanding of the transmission process
-
Bayesian profiling multiple imputation for missing hemoglobin values in electronic health records Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Yajuan Si; Mari Palta; Maureen Smith
Electronic health records (EHRs) are increasingly used for clinical and comparative effectiveness research but suffer from missing data. Motivated by health services research on diabetes care, we seek to increase the quality of EHRs by focusing on missing values of longitudinal glycosylated hemoglobin (A1c), a key risk factor for diabetes complications and adverse events. Under the framework of multiple
-
A Bayesian time-varying effect model for behavioral mHealth data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Matthew D. Koslovsky; Emily T. Hébert; Michael S. Businelle; Marina Vannucci
The integration of mobile health (mHealth) devices into behavioral health research has fundamentally changed the way researchers and interventionalists are able to collect data as well as deploy and evaluate intervention strategies. In these studies, researchers often collect intensive longitudinal data (ILD) using ecological momentary assessment methods which aim to capture psychological, emotional
-
RNDClone: Tumor subclone reconstruction based on integrating DNA and RNA sequence data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Tianjian Zhou; Subhajit Sengupta; Peter Müller; Yuan Ji
A tumor cell population consists of genetically heterogeneous subpopulations, known as subclones. Bulk sequencing data obtained with high-throughput sequencing technology provide total and variant DNA and RNA read counts for many nucleotide loci as a mixture of signals from different subclones. We present RNDClone as a tool to deconvolute the mixture and reconstruct the subclones with distinct DNA genotypes
-
Mixture of hidden Markov models for accelerometer data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Marie Du Roy de Chaumaray; Matthieu Marbac; Fabien Navarro
Motivated by the analysis of accelerometer data taken across a population of individuals, we introduce a specific finite mixture of hidden Markov models with particular characteristics that adapt well to the specific nature of this type of longitudinal data. Our model allows for the computation of statistics that characterize the physical activity of a subject (e.g., the mean time spent at different
-
The statistical performance of matching-adjusted indirect comparisons: Estimating treatment effects with aggregate external control data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 David Cheng; Rajeev Ayyagari; James Signorovitch
Indirect comparisons of treatment-specific outcomes across separate studies often inform decision making in the absence of head-to-head randomized comparisons. Differences in baseline characteristics between study populations may introduce confounding bias in such comparisons. Matching-adjusted indirect comparison (MAIC) (Pharmacoeconomics 28 (2010) 935–945) has been used to adjust for differences
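The weighting step of MAIC is commonly presented as a method-of-moments fit. Individual patient data from one trial are reweighted with w_i = exp(x̃_iᵀα), where x̃_i are the covariates centred at the aggregate means published for the other trial, and α is chosen so that the weighted covariate means match those aggregate means. The following is a minimal sketch assuming NumPy and SciPy. The function name `maic_weights` is hypothetical, and the sketch is not the statistical-performance analysis of this paper.

```python
import numpy as np
from scipy.optimize import minimize


def maic_weights(x_ipd, target_means):
    """Method-of-moments MAIC weights, normalized to sum to 1.

    x_ipd: (n, p) individual-level covariates from the trial with patient data.
    target_means: length-p aggregate covariate means from the comparator trial.
    """
    # Centre the individual covariates at the comparator's aggregate means.
    xc = np.asarray(x_ipd, dtype=float) - np.asarray(target_means, dtype=float)

    def objective(alpha):
        # Convex in alpha; its gradient vanishes exactly when the
        # weighted means of the centred covariates are zero.
        return np.sum(np.exp(xc @ alpha))

    def gradient(alpha):
        return xc.T @ np.exp(xc @ alpha)

    res = minimize(objective, np.zeros(xc.shape[1]), jac=gradient, method="BFGS")
    w = np.exp(xc @ res.x)
    return w / w.sum()


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))    # simulated individual patient covariates
target = np.array([0.3, -0.2])   # hypothetical published aggregate means
w = maic_weights(X, target)
print(w @ X)                     # weighted means, close to target
```

The target means must lie inside the convex hull of the individual-level covariates. Otherwise, no positive weights can reproduce them and the minimization diverges.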
-
A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Trambak Banerjee; Bhaswar B. Bhattacharya; Gourab Mukherjee
An important problem in contemporary immunology studies based on single-cell protein expression data is to determine whether cellular expressions are remodeled postinfection by a pathogen. One natural approach for detecting such changes is to use nonparametric two-sample statistical tests. However, in single-cell studies direct application of these tests is often inadequate, because single-cell level
-
Effective model calibration via sensible variable identification and adjustment with application to composite fuselage simulation Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Yan Wang; Xiaowei Yue; Rui Tuo; Jeffrey H. Hunt; Jianjun Shi
Estimation of model parameters of computer simulators, also known as calibration, is an important topic in many engineering applications. In this paper we consider the calibration of computer model parameters with the help of engineering design knowledge. We introduce the concept of sensible (calibration) variables. Sensible variables are model parameters that are sensitive in the engineering modeling
-
Identifying main effects and interactions among exposures using Gaussian processes Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Federico Ferrari; David B. Dunson
This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instead on selection of main effects and interactions. For interpretability we decompose the expected health outcome into a linear main effect, pairwise
-
Classification from only positive and unlabeled functional data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Yoshikazu Terada; Issei Ogasawara; Ken Nakata
In various fields, data recorded continuously over a time interval, and curve data such as spectral data, have become common. These kinds of data can be interpreted as functional data. In this paper we study binary classification from only positive and unlabeled functional data (PU classification for functional data). Our first contribution is to present a simple classification algorithm for this
-
Mining events with declassified diplomatic documents Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Yuanjun Gao; Jack Goetz; Matthew Connelly; Rahul Mazumder
Since 1973, the U.S. State Department has been using electronic record systems to preserve classified communications. Recently, approximately 1.9 million of these records from 1973–77 have been made available by the U.S. National Archives. While some of these communication streams have periods witnessing an acceleration in the rate of transmission, others do not show any notable patterns in communication
-
Feature selection for data integration with mixed multiview data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Yulia Baker; Tiffany M. Tang; Genevera I. Allen
Data integration methods that analyze multiple sources of data simultaneously can often provide more holistic insights than can separate inquiries of each data source. Motivated by the advantages of data integration in the era of “big data,” we investigate feature selection for high-dimensional multiview data with mixed data types (e.g., continuous, binary, count-valued). This heterogeneity of multiview
-
Data fusion model for speciated nitrogen to identify environmental drivers and improve estimation of nitrogen in lakes Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Erin M. Schliep; Sarah M. Collins; Shirley Rojas-Salazar; Noah R. Lottig; Emily H. Stanley
Concentrations of nitrogen provide a critical metric for understanding ecosystem function and water quality in lakes. However, varying approaches for quantifying nitrogen concentrations may bias the comparison of water quality across lakes and regions. Different measurements of total nitrogen exist based on its composition (e.g., organic versus inorganic, dissolved versus particulate), which we refer
-
Monotonic effects of characteristics on returns Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Jared D. Fisher; David W. Puelz; Carlos M. Carvalho
This paper considers the problem of modeling a firm’s expected return as a nonlinear function of its observable characteristics. We investigate whether theoretically motivated monotonicity constraints on characteristics and nonstationarity of the conditional expectation function provide statistical and economic benefit. We present an interpretable model that has similar out-of-sample performance to
-
Measuring timeliness of annual reports filing by jump additive models Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Yicheng Kang
Foreign private issuers (FPIs) are required by the Securities and Exchange Commission (SEC) to file Form 20-F as comprehensive annual reports. In an effort to increase the usefulness of 20-Fs, the SEC recently enacted a regulation to accelerate the deadline for 20-F filing from six months to four months after the fiscal year-end. The rationale is that the shortened reporting lag would improve the informational
-
Hierarchical multidimensional scaling for the comparison of musical performance styles Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-12-19 Anna K. Yanchenko; Peter D. Hoff
Quantification of stylistic differences between musical artists is of academic interest to the music community and is also useful for other applications, such as music information retrieval and recommendation systems. Information about stylistic differences can be obtained by comparing the performances of different artists across common musical pieces. In this article we develop a statistical methodology
-
Inferring a consensus problem list using penalized multistage models for ordered data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Philip S. Boonstra; John C. Krauss
A patient’s medical problem list describes his or her current health status and aids in the coordination and transfer of care between providers. Because a problem list is generated once and then subsequently modified or updated, what is not usually observable is the provider effect. That is, to what extent does a patient’s problem in the electronic medical record actually reflect a consensus communication
-
Log-contrast regression with functional compositional predictors: Linking preterm infants’ gut microbiome trajectories to neurobehavioral outcome Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Zhe Sun; Wanli Xu; Xiaomei Cong; Gen Li; Kun Chen
The neonatal intensive care unit (NICU) experience is known to be one of the most crucial factors driving preterm infants’ neurodevelopmental and health outcomes. It is hypothesized that stressful early-life experience of very preterm neonates imprints the gut microbiome through regulation of the so-called brain-gut axis, and, consequently, certain microbiome markers are predictive of later infant
-
Identifying overlapping terrorist cells from the Noordin Top actor–event network Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Saverio Ranciati; Veronica Vinciotti; Ernst C. Wit
Actor–event data are common in sociological settings, whereby one registers the pattern of attendance of a group of social actors at a number of events. We focus on 79 members of the Noordin Top terrorist network, who were monitored attending 45 events. The attendance or nonattendance of the terrorists at events defines the social fabric, such as group coherence and social communities. The aim of the
-
Adaptive log-linear zero-inflated generalized Poisson autoregressive model with applications to crime counts Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Xiaofei Xu; Ying Chen; Cathy W. S. Chen; Xiancheng Lin
This research proposes a comprehensive ALG model (Adaptive Log-linear zero-inflated Generalized Poisson integer-valued GARCH) to describe the dynamics of integer-valued time series of crime incidents with the features of autocorrelation, heteroscedasticity, overdispersion and an excessive number of zero observations. The proposed ALG model captures time-varying nonlinear dependence and simultaneously
-
A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Matthew D. Koslovsky; Kristi L. Hoffman; Carrie R. Daniel; Marina Vannucci
One of the major research questions regarding human microbiome studies is the feasibility of designing interventions that modulate the composition of the microbiome to promote health and to cure disease. This requires extensive understanding of the modulating factors of the microbiome, such as dietary intake, as well as the relation between microbial composition and phenotypic outcomes, such as body
-
A Bayesian hierarchical model for evaluating forensic footwear evidence Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Neil A. Spencer; Jared S. Murray
When a latent shoeprint is discovered at a crime scene, forensic analysts inspect it for distinctive patterns of wear such as scratches and holes (known as accidentals) on the source shoe’s sole. If its accidentals correspond to those of a suspect’s shoe, the print can be used as forensic evidence to place the suspect at the crime scene. The strength of this evidence depends on the random match probability—the
-
Causal inference from observational studies with clustered interference, with application to a cholera vaccine study Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Brian G. Barkley; Michael G. Hudgens; John D. Clemens; Mohammad Ali; Michael E. Emch
Understanding the population-level effects of vaccines has important public health policy implications. Inferring vaccine effects from an observational study is challenging because participants are not randomized to vaccine (i.e., treatment). Observational studies of infectious diseases present the additional challenge that vaccinating one participant may affect another participant’s outcome, that
-
Doubly robust treatment effect estimation with missing attributes Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Imke Mayer; Erik Sverdrup; Tobias Gauss; Jean-Denis Moyer; Stefan Wager; Julie Josse
Missing attributes are ubiquitous in causal inference, as they are in most applied statistical work. In this paper we consider various sets of assumptions under which causal inference is possible despite missing attributes and discuss corresponding approaches to average treatment effect estimation, including generalized propensity score methods and multiple imputation. Across an extensive simulation
-
Quantifying time-varying sources in magnetoencephalography—A discrete approach Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Zhigang Yao; Zengyan Fan; Masahito Hayashi; William F. Eddy
We study the distribution of brain sources using magnetoencephalography (MEG), an advanced brain imaging technique that measures the magnetic fields outside the human head produced by electrical activity inside the brain. Common time-varying source localization methods assume that the source current has a time-varying structure and solve the MEG inverse problem by mainly estimating the source
-
Spatiotemporal probabilistic wind vector forecasting over Saudi Arabia Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Amanda Lenzi; Marc G. Genton
Saudi Arabia has recently begun promoting renewable energy as a potential alternative to fossil fuels for domestic power generation. In order to efficiently connect wind energy to the existing power grids, reliable wind forecasts and an accurate way of quantifying the uncertainties of these forecasts are required. Motivated by a data set of hourly wind speeds from 28 stations in Saudi Arabia, we build
-
Climate extreme event attribution using multivariate peaks-over-thresholds modeling and counterfactual theory Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Anna Kiriliouk; Philippe Naveau
Numerical climate models are complex and combine a large number of physical processes. They are key tools in quantifying the relative contribution of potential anthropogenic causes (e.g., the current increase in greenhouse gases) to high-impact atmospheric variables like heavy rainfall. These so-called climate extreme event attribution problems are particularly challenging in a multivariate context
-
The Jensen effect and functional single index models: Estimating the ecological implications of nonlinear reaction norms Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Zi Ye; Giles Hooker; Stephen P. Ellner
This paper develops tools to characterize how species are affected by environmental variability, based on a functional single index model relating a response such as growth rate to environmental conditions. In ecology the curvature of such responses is used, via Jensen’s inequality, to determine whether environmental variability is harmful or beneficial, and differing nonlinear responses to environmental
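Jensen's inequality states that for a concave response f, the mean response in a fluctuating environment, E[f(X)], is at most f(E[X]). For a convex response, it is at least f(E[X]). The sign of the gap E[f(X)] − f(E[X]) therefore indicates whether variability is harmful or beneficial. The sketch below is a toy illustration using hypothetical temperature-response curves and a two-point environment. It is not the functional single index model of the paper.

```python
import math


def jensen_gap(response, env_values, probs):
    """Return E[f(X)] - f(E[X]) for a discrete environment distribution."""
    mean_env = sum(p * x for p, x in zip(probs, env_values))
    mean_response = sum(p * response(x) for p, x in zip(probs, env_values))
    return mean_response - response(mean_env)


# Hypothetical growth-rate curves as functions of temperature.
def concave_growth(t):
    return 10 - 0.1 * (t - 20) ** 2


def convex_growth(t):
    return math.exp(0.1 * t)


env = [15, 25]      # environment alternates between two temperatures
probs = [0.5, 0.5]  # with equal probability, so the mean temperature is 20

print(jensen_gap(concave_growth, env, probs))  # negative: variability lowers mean growth
print(jensen_gap(convex_growth, env, probs))   # positive: variability raises mean growth
```

With the concave curve, both temperatures give a growth rate of 7.5, while the mean temperature would give 10. The fluctuation therefore costs 2.5 units of mean growth.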
-
PTEM: A popularity-based topical expertise model for community question answering Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Hohyun Jung; Jae-Gil Lee; Namgil Lee; Sung-Ho Kim
Community Question Answering (CQA) websites are widely used for sharing knowledge: users can ask questions, post answers and evaluate answers. So far, the evaluation of answers has been explained by the contents of answers through the investigation of users’ topics of interest and expertise levels. In this paper we focus on modeling the user’s evaluation behavior, in that users can see the answerer’s
-
Does terrorism trigger online hate speech? On the association of events and time series Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Erik Scharwächter; Emmanuel Müller
Hate speech is ubiquitous on the Web. Recently, the offline causes that contribute to online hate speech have received increasing attention. A recurring question is whether the occurrence of extreme events offline systematically triggers bursts of hate speech online, indicated by peaks in the volume of hateful social media posts. Formally, this question translates into measuring the association between
-
A novel change-point approach for the detection of gas emission sources using remotely contained concentration data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Idris Eckley; Claudia Kirch; Silke Weber
Motivated by an example from remote sensing of gas emission sources, we derive two novel change-point procedures for multivariate time series where, in contrast to the classical change-point literature, the changes are not required to be aligned in the different components of the time series. Instead, the change points are described by a functional relationship where the precise shape depends on unknown
-
A semiparametric mixture method for local false discovery rate estimation from multiple studies Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Seok-Oh Jeong; Dongseok Choi; Woncheol Jang
Antineutrophil cytoplasmic antibody associated vasculitis (AAV) is extremely heterogeneous in clinical presentation and involves multiple organ systems. While the clinical presentation of AAV is diverse, we hypothesized that all AAV share common pathways and tested the hypothesis based on three different microarray studies of peripheral leukocytes, sinus and orbital inflammation disease. For the hypothesis
-
Size estimation of key populations in the HIV epidemic in eSwatini using incomplete and misaligned capture-recapture data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Abhirup Datta; Andrew Pita; Amrita Rao; Bhekie Sithole; Zandile Mnisi; Stefan Baral
In 2020, our understanding of the distributions of HIV risks in the most burdened settings, including eSwatini, remains limited. In part, this is driven by the limited availability of data on the size and burden of the populations at the greatest risk for HIV. Given pervasive social and healthcare stigmas, size estimates for these populations often rely on the multiplier method, a variant of the capture-recapture
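The basic multiplier method is usually written as N̂ = M / p̂. Here M is a count from one source (for example, the number of unique objects distributed to the population), and p̂ is the proportion of a separate survey sample reporting that they are counted in that source. The following is a minimal standard-library sketch. The function name `multiplier_estimate` is hypothetical. The delta-method standard error assumes that M is known exactly and that p̂ is a simple binomial proportion. Neither assumption is taken from the paper, which addresses incomplete and misaligned data.

```python
import math


def multiplier_estimate(m_count, n_reporting, n_surveyed):
    """Basic multiplier-method size estimate N = M / p with a delta-method SE.

    Assumes M is known exactly and p is a simple binomial proportion.
    """
    if not 0 < n_reporting <= n_surveyed:
        raise ValueError("need 0 < n_reporting <= n_surveyed")
    p_hat = n_reporting / n_surveyed
    n_hat = m_count / p_hat
    # Var(N) ~ (dN/dp)^2 Var(p) = (M / p^2)^2 * p(1 - p) / n = M^2 (1 - p) / (n p^3)
    se = m_count * math.sqrt((1 - p_hat) / (n_surveyed * p_hat ** 3))
    return n_hat, se


print(multiplier_estimate(500, 100, 400))
```

For example, suppose 500 unique objects were distributed and 100 of 400 surveyed people report receiving one. Then p̂ = 0.25 and N̂ = 2000, with a standard error of about 173.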
-
Active matrix factorization for surveys Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Chelsea Zhang; Sean J. Taylor; Curtiss Cobb; Jasjeet Sekhon
Amid historically low response rates, survey researchers seek ways to reduce respondent burden while measuring desired concepts with precision. We propose to ask fewer questions of respondents and impute missing responses via probabilistic matrix factorization. A variance-minimizing active learning criterion chooses the most informative questions per respondent. In simulations of our matrix sampling
-
Optimal EMG placement for a robotic prosthesis controller with sequential, adaptive functional estimation (SAFE) Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Jonathan Stallrich; Md Nazmul Islam; Ana-Maria Staicu; Dustin Crouch; Lizhi Pan; He Huang
Robotic hand prostheses require a controller to decode muscle contraction information, such as electromyogram (EMG) signals, into the user’s desired hand movement. State-of-the-art decoders demand extensive training, require data from a large number of EMG sensors and are prone to poor predictions. Biomechanical models of a single movement degree-of-freedom tell us that relatively few muscles, and
-
Statistical methods for analysis of combined categorical biomarker data from multiple studies Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Chao Cheng; Molin Wang
In the analysis of pooled data from multiple studies involving a biomarker exposure, the biomarker measurements can vary across laboratories and usually require calibration to a reference assay prior to pooling. Previous research treats the measurements from a reference laboratory as the gold standard, even though measurements in the reference laboratory are not necessarily closer to the underlying
-
Markov decision processes with dynamic transition probabilities: An analysis of shooting strategies in basketball Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Nathan Sandholtz; Luke Bornn
In this paper we model basketball plays as episodes from team-specific nonstationary Markov decision processes (MDPs) with shot clock dependent transition probabilities. Bayesian hierarchical models are employed in the modeling and parametrization of the transition probabilities to borrow strength across players and through time. To enable computational feasibility, we combine lineup-specific MDPs
-
Efficiency in lung transplant allocation strategies Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Jingjing Zou; David J. Lederer; Daniel Rabinowitz
Currently in the United States, donor lungs are allocated to transplant candidates according to each candidate’s lung allocation score (LAS). The LAS is an ad hoc ranking system for prioritizing patients for transplantation. The goal of this study is to develop a framework for improving patients’ life expectancies over the LAS based on a comprehensive modeling of the lung transplantation waiting list
-
Statistical methods for replicability assessment Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-09-18 Kenneth Hung; William Fithian
Large-scale replication studies like the Reproducibility Project: Psychology (RP:P) provide invaluable systematic data on scientific replicability, but most analyses and interpretations of the data fail to agree on the definition of “replicability” and disentangle the inexorable consequences of known selection bias from competing explanations. We discuss three concrete definitions of replicability
-
Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples and multiple imputation Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Mark J. Giganti; Pamela A. Shaw; Guanhua Chen; Sally S. Bebawy; Megan M. Turner; Timothy R. Sterling; Bryan E. Shepherd
Data from electronic health records (EHR) are prone to errors which are often correlated across multiple variables. The error structure is further complicated when analysis variables are derived as functions of two or more error-prone variables. Such errors can substantially impact estimates, yet we are unaware of methods that simultaneously account for errors in covariates and time-to-event outcomes
-
Analyses of preventive care measures with incomplete historical data in electronic medical records: An example from colorectal cancer screening Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Yingye Zheng; Douglas A. Corley; Chyke Doubeni; Ethan Halm; Susan M. Shortreed; William E. Barlow; Ann Zauber; Tor Devin Tosteson; Jessica Chubak
The calculation of quality of care measures based on electronic medical records (EMRs) may be inaccurate because of incomplete capture of past services. We evaluate the influence of different statistical approaches for calculating the proportion of patients who are up-to-date for a preventive service, using the example of colorectal cancer (CRC) screening. We propose an extension of traditional mixture
-
A random effects stochastic block model for joint community detection in multiple networks with applications to neuroimaging Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Subhadeep Paul; Yuguo Chen
To analyze data from multisubject experiments in neuroimaging studies, we develop a modeling framework for joint community detection in a group of related networks that can be considered as a sample from a population of networks. The proposed random effects stochastic block model facilitates the study of group differences and subject-specific variations in the community structure. The model proposes
-
Early identification of an impending rockslide location via a spatially-aided Gaussian mixture model Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Shuo Zhou; Howard Bondell; Antoinette Tordesillas; Benjamin I. P. Rubinstein; James Bailey
Movement of soil and rocks in an unstable slope under gravitational forces is an example of a complex system that is highly dynamic in space and time. A typical failure in such systems is a landslide. Fundamental studies of granular media failure combined with a complex network analysis of radar monitoring data show that distinct partitions emerge in the kinematic field in the early stages of the prefailure
-
Generalized accelerated recurrence time model in the presence of a dependent terminal event Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Bo Wei; Zhumin Zhang; HuiChuan J. Lai; Limin Peng
Recurrent events are commonly encountered in longitudinal studies. The observation of recurrent events is often stopped by a dependent terminal event in practice. For this data scenario, we propose two sensible adaptations of the generalized accelerated recurrence time (GART) model (J. Amer. Statist. Assoc. 111 (2016) 145–156) to provide useful alternative analyses that can offer physical interpretations
-
Seasonal warranty prediction based on recurrent event data Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Qianqian Shan; Yili Hong; William Q. Meeker
Warranty return data from repairable systems, such as home appliances, lawn mowers, computers and automobiles, result in recurrent event data. The nonhomogeneous Poisson process (NHPP) model is widely used to describe such data. Seasonality in the repair frequencies and other sources of variability, however, complicate the modeling of recurrent event data. Not much work has been done to address the seasonality
-
A global-local approach for detecting hotspots in multiple-response regression Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Hélène Ruffieux; Anthony C. Davison; Jörg Hager; Jamie Inshaw; Benjamin P. Fairfax; Sylvia Richardson; Leonardo Bottolo
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional
-
Focused model selection for linear mixed models with an application to whale ecology Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Céline Cunen; Lars Walløe; Nils Lid Hjort
A central point of disagreement, in certain long-standing discussions about a particular whaling dataset in the Scientific Committee of the International Whaling Commission, has directly involved model selection issues for linear mixed effect models. The biological question under discussion is associated with a clearly defined parameter of primary interest, a focus parameter, which makes model selection
-
A causal exposure response function with local adjustment for confounding: Estimating health effects of exposure to low levels of ambient fine particulate matter Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Georgia Papadogeorgou; Francesca Dominici
In the last two decades ambient levels of air pollution have declined substantially. At the same time the Clean Air Act mandates that the National Ambient Air Quality Standards (NAAQS) must be routinely assessed to protect populations based on the latest science. Therefore, researchers should continue to address the following question: is exposure to levels of air pollution below the NAAQS harmful
-
Evidence factors in a case-control study with application to the effect of flexible sigmoidoscopy screening on colorectal cancer Ann. Appl. Stat. (IF 1.675) Pub Date : 2020-06-29 Bikram Karmakar; Chyke A. Doubeni; Dylan S. Small
As in any observational study, in a case-control study a primary concern is potential unmeasured confounders. Bias due to unmeasured confounders can result in a false discovery of an apparent treatment effect when there is none. Replication of an observational study, which tries to provide multiple analyses of the data where the biases affecting each analysis are thought to be different, is one way
Contents have been reproduced by permission of the publishers.