• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-23
Mohammad Hossein Panahi; Mostafa Mohseni; Razieh Bidhendi Yarandi; Fahimeh Ramezani Tehrani

Antidepressants are widely prescribed to manage low back pain. A number of systematic reviews and meta-analyses have investigated the efficacy of these treatments, but their methodological quality has not yet been assessed. This study aims to evaluate the methodological quality of systematic reviews and meta-analyses investigating the effect of antidepressants on low back pain. A systematic search was conducted in the PubMed, EMBASE, Medline, and Cochrane Library databases up to November 2018. The 16-item Assessment of Multiple Systematic Reviews (AMSTAR2) scale was used to assess the methodological quality of the studies. Systematic reviews and meta-analyses of the effects of antidepressant treatment on low back pain published in English were included. There was no limitation on the type of antidepressant drug, clinical setting, or study population, while non-systematic, qualitative, and narrative reviews were excluded. A total of 25 systematic reviews and meta-analyses, published between 1992 and 2017, were evaluated. The AMSTAR2 results showed that 11 (44%), 9 (36%) and 5 (20%) of the included studies were of high, moderate and low quality, respectively. Thirteen (52%) of the studies assessed risk of bias and 2 (20%) of the meta-analyses considered publication bias. In addition, 16 (64%) of the included reviews provided a satisfactory explanation for any heterogeneity observed in the results. Although the trend of publishing high-quality papers on the effect of antidepressants on low back pain has recently increased, performing more high-quality systematic reviews and meta-analyses in this field, with precise subgroups by type of pain, drug class and dosage, may provide clearer and more reliable evidence to help clinicians and policymakers.

Updated: 2020-01-23
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-21
Raechel A. Damarell; Suzanne Lewis; Camilla Trenerry; Jennifer J. Tieman

Integrated care is an increasingly important principle for organising healthcare. Integrated care models show promise in reducing resource wastage and service fragmentation whilst improving the accessibility, patient-centredness and quality of care for patients. Those needing reliable access to the growing research evidence base for integrated care can be frustrated by search challenges reflective of the topic’s complexity. The aim of this study is to report the empirical development and validation of two search filters for rapid and effective retrieval of integrated care evidence in PubMed: one filter optimised for recall and the other for precision. An Expert Advisory Group comprising international integrated care experts guided the study. A gold standard test set of citations was formed by screening Handbook Integrated Care chapter references for relevance. This set was divided into a Term Identification Set (20%) for determining candidate terms using frequency analysis; a Filter Development Set (40%) for testing the performance of term combinations; and a Filter Validation Set (40%) reserved for confirming final filter performance. In developing the high-recall filter, recall was steadily increased while maintaining precision at ≥50%. Similarly, the high-precision filter sought to maximise precision while keeping recall ≥50%. For each term combination tested, an approximation of precision was obtained by reviewing the first 100 citations retrieved in Medline for relevance. The gold standard set comprised 534 citations. The search filter optimised for recall (‘Broad Integrated Care Search’) achieved 86.0–88.3% recall with correspondingly low precision (47–53%). The search filter optimised for precise searching (‘Narrow Integrated Care Search’) demonstrated precision of 73–95%, with recall reduced to between 55.9 and 59.8%. These filters are now available as one-click URL hyperlinks on the website of the International Foundation for Integrated Care.
The Broad and Narrow Integrated Care Search filters provide potential users, such as policy makers and researchers, seamless, reliable and ongoing access to integrated care evidence for decision making. These filters were developed according to a rigorous and transparent methodology designed to circumvent the challenges of information retrieval posed by this complex, multifaceted topic.
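The recall/precision trade-off that drove the filter development can be sketched in a few lines of Python. The citation identifiers and counts below are hypothetical, invented for illustration; the study's actual gold standard comprised 534 citations split 20/40/40.

```python
# Minimal sketch of filter evaluation against a gold standard set.
# All identifiers and counts are hypothetical.

def recall_precision(retrieved: set, relevant: set) -> tuple:
    """Return (recall, precision) of a search result against a gold set."""
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical example: 10 relevant citations; a broad filter retrieves
# 16 records, 9 of which are relevant.
relevant = set(range(10))
retrieved = set(range(9)) | {100, 101, 102, 103, 104, 105, 106}
r, p = recall_precision(retrieved, relevant)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.90 precision=0.56
```

A broad filter pushes recall up at the cost of precision; a narrow filter does the reverse, which is exactly the trade-off the two published filters embody.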

Updated: 2020-01-22
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-21
Lot Nyirenda; Meghan Bruce Kumar; Sally Theobald; Malabika Sarker; Musonda Simwinga; Moses Kumwenda; Cheryl Johnson; Karin Hatzold; Elizabeth L. Corbett; Euphemia Sibanda; Miriam Taegtmeyer

Qualitative research networks (QRNs) bring together researchers from diverse contexts working on multi-country studies. The networks may themselves form a consortium or may contribute to a wider research agenda within a consortium with colleagues from other disciplines. The purpose of a QRN is to ensure robust methods and processes that enable comparisons across contexts. Under the Self-Testing Africa (STAR) initiative and the REACHOUT project on community health systems, QRNs were established to bring together researchers across countries, coordinate multi-country qualitative research, and ensure robust methods and processes allowing comparisons across contexts. QRNs face both practical challenges in facilitating this iterative exchange process across sites and conceptual challenges in interpreting findings between contexts. This paper distils key lessons and reflections from both QRN experiences on how to conduct trustworthy qualitative research across different contexts, with examples from Bangladesh, Ethiopia, Kenya, Indonesia, Malawi, Mozambique, Zambia and Zimbabwe. The process of generating evidence for this paper followed a thematic analysis method: themes initially identified were refined during several rounds of discussion in an iterative process until final themes were agreed upon in a joint learning process. Four guiding principles emerged from our analysis: a) explicit communication strategies that sustain dialogue and build trust and collective reflexivity; b) translation of contextually embedded concepts; c) setting parameters for contextualizing; and d) supporting empirical and conceptual generalisability. Under each guiding principle, we describe how credibility, dependability, confirmability and transferability can be enhanced, and we share good practices for other researchers to consider. Qualitative research is often context-specific, with tools designed to explore local experiences and understandings.
Without efforts to synthesise and systematically share findings, common understandings, experiences and lessons are missed. The logistical and conceptual challenges of qualitative research across multiple partners and contexts must be actively managed, including a shared commitment by partners to continuous ‘joint learning’. Clarity and agreement on concepts, common methods and timelines at an early stage are critical to ensure alignment and focus in intercountry qualitative research and analysis processes. Building good relationships and trust among network participants enhances the quality of qualitative research findings.

Updated: 2020-01-22
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-16
S. D. Walter; H. Han; G. H. Guyatt; D. Bassler; N. Bhatnagar; V. Gloy; S. Schandelmaier; M. Briel

Randomised trial protocols may incorporate interim analyses, with the potential to stop the study for futility if early data show insufficient promise of a treatment benefit. Previously, we have shown that this approach will theoretically lead to mis-estimation of the treatment effect. We now wished to ascertain the importance of this phenomenon in practice. We reviewed the methods and results in a set of trials that had stopped for futility, identified through an extensive literature search. We recorded clinical areas, interventions, study design, outcomes, trial setting, sponsorship, planned and actual treatment effects, sample sizes, power, and whether there was a data safety monitoring board or a published protocol. We identified: whether interim analyses were pre-specified, and how many analyses actually occurred; what pre-specified criteria might define futility; whether a futility analysis formed the basis for stopping; who made the decision to stop; and the conditional power of each study, i.e. the probability of statistically significant results if the study were to continue to its complete sample size. We identified 52 eligible trials, covering many clinical areas. Most trials had multiple centres and tested drugs, and 40% were industry sponsored. In 75% of the trials, at least one interim analysis was planned a priori; a majority had only one interim analysis, typically at about half the target total sample size. A majority of trials did not pre-define a stopping rule, and a variety of reasons were given for stopping. Few studies calculated and reported low conditional power to justify the early stop. When conditional power could be calculated, it was typically low, especially under the current trend hypothesis. However, under the original design hypothesis, a few studies had relatively high conditional power. Data collection often continued after the interim analysis.
Although other factors will typically be involved, we conclude that, from the perspective of conditional power, stopping early for futility was probably reasonable in most cases, but documentation of the basis for stopping was often missing or vague. Interpretation of truncated trials would be enhanced by improved reporting of stopping protocols, and of their actual execution.

Updated: 2020-01-17
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-17
Heather Munthe-Kaas; Heid Nøkleby; Simon Lewin; Claire Glenton

Systematic reviews are a key input to health and social welfare decisions. Studies included in systematic reviews often vary with respect to contextual factors that may affect how transferable review findings are to the review context. However, many review authors do not consider the transferability of review findings until the end of the review process, for example when assessing confidence in the evidence using GRADE or GRADE-CERQual. This paper describes the TRANSFER Approach, a novel approach for supporting collaboration between review authors and stakeholders from the beginning of the review process to systematically and transparently consider factors that may influence the transferability of systematic review findings. We developed the TRANSFER Approach in three stages: (1) discussions with stakeholders to identify current practices and needs regarding the use of methods to consider transferability, (2) a systematic search for and mapping of 25 existing checklists related to transferability, and (3) use of the results of stage two to develop a structured conversation format, which was applied in three systematic review processes. None of the identified checklists provided detailed guidance for review authors on how to assess transferability in systematic reviews in collaboration with decision makers. The content analysis uncovered seven categories of factors to consider when discussing transferability. We used these to develop a structured conversation guide for discussing potential transferability factors with stakeholders at the beginning of the review process. Through feedback and trial and error, the TRANSFER Approach has developed beyond the initial conversation guide and is now made up of seven stages, which are described in this article.
The TRANSFER Approach supports review authors in collaborating with decision makers to ensure an informed consideration, from the beginning of the review process, of the transferability of the review findings to the review context. Further testing of TRANSFER is needed.

Updated: 2020-01-17
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-14
Thomas Woodcock; Yewande Adeleke; Christine Goeschel; Peter Pronovost; Mary Dixon-Woods

The design and execution of measurement in quality improvement (QI) initiatives is often poor. Better guidance on “what good looks like” might help to mitigate some of the problems. We report a consensus-building process that sought to identify which features are important to include in QI measurement plans. We conducted a three-stage consensus-building approach: (1) identifying the list of features of measurement plans that were potential candidates for inclusion based on literature review and the study team’s experience; (2) a two-round modified Delphi exercise with a panel of experts to establish consensus on the importance of these features; and (3) a small in-person consensus group meeting to finalise the list of features. A list of 104 candidate questions was generated. A panel of 19 experts in the Delphi reviewed these questions and produced consensus on retaining 46 questions in the first round and on a further 22 in the second round. Thematic analysis of open text responses from the panellists suggested a number of areas of debate that were explicitly considered by the consensus group. The exercise yielded 74 questions (71% of 104) on which there was consensus in five categories of measurement relating to: design, data collection and management, analysis, action, and embedding. This study offers a consensus-based view on the features of a good measurement plan for a QI project in healthcare. The results may be of use to QI teams, funders and evaluators, but are likely to require further development and testing to ensure feasibility and usefulness.

Updated: 2020-01-15
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-14
Gang Yu; Xian Zeng; Shaoqing Ni; Zheng Jia; Weihong Chen; Xudong Lu; Jiye An; Huilong Duan; Qiang Shu; Haomin Li

Drug safety in children is a major concern; however, there is still a lack of methods for quantitatively measuring, let alone improving, drug safety in children under different clinical conditions. To assess pediatric drug safety under different clinical conditions, a computational method based on Electronic Medical Record (EMR) datasets was proposed. In this study, a computational method was designed to extract the significant drug-diagnosis associations (based on a Bonferroni-adjusted hypergeometric P-value < 0.05) from drug and diagnosis co-occurrences in EMR datasets. This allows differences between pediatric and adult drug use to be compared across EMR datasets. The drug-diagnosis associations were further used to generate drug clusters under specific clinical conditions using unsupervised clustering. A 5-layer quantitative pediatric drug safety level was proposed based on the drug safety statement in the pediatric labeling of each drug. The drug safety levels under different pediatric clinical conditions were then calculated. Two EMR datasets, from a 1900-bed children’s hospital and a 2000-bed general hospital, were used to test this method. The comparison between the children’s hospital and the general hospital showed unique features of pediatric drug use and identified the drug treatment gap between children and adults. In total, 591 drugs were used in the children’s hospital; 18 drug clusters associated with certain clinical conditions were generated based on our method; and the quantitative drug safety levels of each drug cluster (under different clinical conditions) were calculated, analyzed, and visualized. With this method, quantitative drug safety levels under certain clinical conditions in pediatric patients can be evaluated and compared. Given longitudinal data, improvements can also be measured. This method has the potential to be used in many population-level, health data-based drug safety studies.
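The co-occurrence test described above can be illustrated with a pure-Python sketch of a hypergeometric tail probability under Bonferroni correction. The counts and the number of tested pairs below are hypothetical, and the paper's exact contingency construction may differ.

```python
# Sketch of a drug-diagnosis co-occurrence test: hypergeometric upper-tail
# probability with a Bonferroni-adjusted significance threshold.
# All counts are hypothetical.
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n):
    N admissions in total, K with the diagnosis, n receiving the drug,
    k receiving the drug AND carrying the diagnosis."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / comb(N, n)

# Hypothetical counts: 1000 admissions, 100 with the diagnosis,
# 50 drug exposures, 20 co-occurrences; 500 drug-diagnosis pairs tested.
p = hypergeom_sf(20, 1000, 100, 50)
n_tests = 500
significant = p * n_tests < 0.05  # Bonferroni-adjusted threshold
print(p, significant)
```

With an expected co-occurrence of only 5 under independence, 20 observed co-occurrences yield a tiny P-value, so the pair would survive the Bonferroni adjustment and count as a significant drug-diagnosis association.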

Updated: 2020-01-15
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-13
Hannah Harrison; Simon J. Griffin; Isla Kuhn; Juliet A. Usher-Smith

Systematic reviews are vital to the pursuit of evidence-based medicine within healthcare. Screening titles and abstracts (T&Ab) for inclusion in a systematic review is an intensive, and often collaborative, step. The use of appropriate tools is therefore important. In this study, we identified and evaluated the usability of software tools that support T&Ab screening for systematic reviews within healthcare research. We identified software tools using three search methods: a web-based search; a search of the online “systematic review toolbox”; and screening of references in existing literature. We included tools that were accessible and available for testing at the time of the study (December 2018), did not require specific computing infrastructure and provided basic screening functionality for systematic reviews. Key properties of each software tool were identified using a feature analysis adapted for this purpose. This analysis included a weighting developed by a group of medical researchers, thereby prioritising the most relevant features. The highest-scoring tools from the feature analysis were then included in a user survey, in which we further investigated the suitability of the tools for supporting T&Ab screening amongst systematic reviewers working in medical research. Fifteen tools met our inclusion criteria. They varied considerably in cost, scope and intended user community. Six of the identified tools (Abstrackr, Colandr, Covidence, DRAGON, EPPI-Reviewer and Rayyan) scored higher than 75% in the feature analysis and were included in the user survey. Of these, Covidence and Rayyan were the most popular with the survey respondents. Their usability scored highly across a range of metrics, with all surveyed researchers (n = 6) stating that they would be likely (or very likely) to use these tools in the future.
Based on this study, we would recommend Covidence and Rayyan to systematic reviewers looking for suitable and easy to use tools to support T&Ab screening within healthcare research. These two tools consistently demonstrated good alignment with user requirements. We acknowledge, however, the role of some of the other tools we considered in providing more specialist features that may be of great importance to many researchers.

Updated: 2020-01-13
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-10
Augusto César Ferreira De Moraes; Marcus Vinícius Nascimento-Ferreira; Claudia Lucia de Moraes Forjaz; Juan Carlos Aristizabal; Leticia Azzaretti; Walter Viana Nascimento Junior; Maria L. Miguel-Berges; Estela Skapino; Carlos Delgado; Luis A. Moreno; Heráclito Barbosa Carvalho

Multicenter studies from Europe and the United States have developed specifically standardized questionnaires for assessing and comparing sedentary behavior, but these cannot be directly applied to South American countries. The aim of this study was to assess the reliability and validity of the South American Youth Cardiovascular and Environmental (SAYCARE) sedentary behavior questionnaire. Children and adolescents from seven South American cities were involved in the test-retest reliability (children: n = 55; adolescents: n = 106) and concurrent validity (children: n = 93; adolescents: n = 94) studies. The SAYCARE sedentary behavior questionnaire was administered twice with a two-week interval; the behaviors were parent-reported for children and self-reported for adolescents. Questions covered time spent watching television, using a computer, playing console games, passive playing (children only) and studying (adolescents only) over the past week. Accelerometers were worn for at least 3 days, including at least one weekend day. We compared values of accelerometer-measured sedentary time by quartiles of reported sedentary behavior time and their sum. The reliability of sedentary behavior time was moderate for children (rho ≥ 0.45 and k ≥ 0.40) and adolescents (rho ≥ 0.30). Comparisons between the questionnaire and accelerometer showed low overall agreement, with the questionnaire systematically underreporting sedentary time in children (by at least 332.6 ± 138.5 min/day) and adolescents (by at least 399.7 ± 105.0 min/day). The SAYCARE sedentary behavior questionnaire has acceptable reliability in children and adolescents. However, the findings of the current study indicate that the SAYCARE questionnaire is not a surrogate for total sedentary time.

Updated: 2020-01-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-10
Rachel G. Curtis; Timothy Olds; Ronald Plotnikoff; Corneel Vandelanotte; Sarah Edney; Jillian Ryan; Carol Maher

This study examined the criterion validity of the online Active Australia Survey, using accelerometry as the criterion, and whether self-report bias was related to level of activity, age, sex, education, body mass index and health-related quality of life. The online Active Australia Survey was validated against the GENEActiv accelerometer as a direct measure of activity. Participants (n = 344) wore an accelerometer for 7 days, completed the Active Australia Survey, and reported their health and demographic characteristics. A Spearman rank correlation coefficient was used to examine the association between minutes of moderate-to-vigorous physical activity recorded on the Active Australia Survey and the GENEActiv accelerometer. A Bland-Altman plot illustrated self-report bias (the difference between methods). Linear mixed effects modelling was used to examine whether participant factors predicted self-report bias. The association between moderate-to-vigorous physical activity reported on the online Active Australia Survey and the accelerometer was significant (rs = .27, p < .001). Participants reported 4 fewer minutes per day on the Active Australia Survey than were recorded by accelerometry (95% limits of agreement −104 to 96 min), but the difference was not significant (t(343) = −1.40, p = .16). Self-report bias was negatively associated with minutes of accelerometer-recorded moderate-to-vigorous physical activity and positively associated with mental health-related quality of life. The online Active Australia Survey showed limited criterion validity against accelerometry. Self-report bias was related to activity level and mental health-related quality of life. Caution is recommended when interpreting studies using the online Active Australia Survey.
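The Bland-Altman quantities reported above (mean difference, or bias, and its 95% limits of agreement) can be computed directly. This is a minimal sketch; the paired minutes below are hypothetical, not the study's data.

```python
# Bland-Altman agreement between a self-report and a criterion measure:
# bias = mean of paired differences; limits of agreement = bias ± 1.96 SD.
# The data are hypothetical.
from statistics import mean, stdev

def bland_altman(self_report, criterion):
    """Return (bias, lower_loa, upper_loa) for paired measurements."""
    diffs = [a - b for a, b in zip(self_report, criterion)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical daily MVPA minutes: questionnaire vs accelerometer.
survey = [30, 45, 10, 60, 25, 50]
accel = [35, 40, 20, 55, 30, 45]
bias, lo, hi = bland_altman(survey, accel)
print(f"bias={bias:.1f} min, 95% limits of agreement [{lo:.1f}, {hi:.1f}]")
```

Wide limits of agreement around a near-zero bias, as reported in the study (−104 to 96 min), indicate that individual self-reports can deviate substantially from the criterion even when the average difference is small.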

Updated: 2020-01-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-07
Nigel Stallard; Susan Todd; Elizabeth G. Ryan; Simon Gates

There is a growing interest in the use of Bayesian adaptive designs in late-phase clinical trials. This includes the use of stopping rules based on Bayesian analyses in which the frequentist type I error rate is controlled, as in frequentist group-sequential designs. This paper presents a practical comparison of Bayesian and frequentist group-sequential tests. Focussing on the setting in which data can be summarised by normally distributed test statistics, we evaluate and compare boundary values and operating characteristics. Although Bayesian and frequentist group-sequential approaches are based on fundamentally different paradigms, in a single-arm trial or a two-arm comparative trial with a prior distribution specified for the treatment difference, Bayesian and frequentist group-sequential tests can have identical stopping rules if particular critical values with which the posterior probability is compared, or particular spending function values, are chosen. If the Bayesian critical values at different looks are restricted to be equal, O’Brien and Fleming’s design corresponds to a Bayesian design with an exceptionally informative negative prior, Pocock’s design to a Bayesian design with a non-informative prior, and frequentist designs with a linear alpha spending function are very similar to Bayesian designs with slightly informative priors. This contrasts with the setting of a comparative trial with independent prior distributions specified for treatment effects in different groups. In this case Bayesian and frequentist group-sequential tests cannot have the same stopping rule, as the Bayesian stopping rule depends on the observed means in the two groups and not just on their difference. In this setting the Bayesian test can only be guaranteed to control the type I error for a specified range of values of the control group treatment effect.
Comparison of frequentist and Bayesian designs can encourage careful thought about design parameters and help to ensure appropriate design choices are made.
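The correspondence described for the non-informative-prior case can be checked numerically: for a normally distributed test statistic z and a flat prior on the treatment difference, the posterior probability of benefit is Φ(z), so the Bayesian rule "stop if the posterior probability exceeds 1 − α" reproduces the frequentist rule "stop if z exceeds z_{1−α}". A minimal sketch under those assumptions (the interim value is hypothetical, not from the paper):

```python
# Numerical check: with a flat (non-informative) prior, the flat-prior
# posterior probability of a positive treatment effect equals Phi(z),
# so Bayesian and frequentist single-look stopping rules coincide.
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

alpha = 0.025
z_crit = 1.959964  # z_{0.975}, the frequentist critical value

z_observed = 2.3  # hypothetical interim test statistic
posterior_prob_benefit = phi(z_observed)  # flat-prior P(delta > 0 | data)

bayes_stop = posterior_prob_benefit > 1 - alpha
freq_stop = z_observed > z_crit
print(bayes_stop, freq_stop)  # the two rules agree
```

Because Φ is strictly increasing, the two rules agree for every value of z, not just the one checked here; informative priors shift the posterior threshold and hence the implied frequentist boundary.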

Updated: 2020-01-07
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-06
Maria Jose Monsalves; Ananta Shrikant Bangdiwala; Alex Thabane; Shrikant Ishver Bangdiwala

Researchers have been utilizing linear mixed models (LMMs) for different hierarchical study designs and under different names, which emphasizes the need for a standard in reporting such models [1, 2]. Mixed effects models, multilevel data, contextual analysis, hierarchical studies, longitudinal studies, panel data and repeated-measures designs are some of the different names used when referring to study designs and/or analytical tools for correlated data. In addition, there is usually no distinction made between having a data structure that is multilevel, and having a research question that requires a multilevel analysis. There are multiple excellent tutorials on multilevel analyses [3,4,5]. However, there is inconsistency in how the results of LMMs are reported in the literature [6]. Casals et al. conducted a systematic review of how various LMMs were reported in the medical literature, and found that important aspects were not reported in most cases [6]. As an example, a cohort study of children that selects a sample of schools, then selects students within schools, and conducts multiple measurements over time in the same students, would be a 3-level dataset: with school as the highest level (Level 3), student as a lower level (Level 2), and time-point as the lowest level (Level 1). Repeated measurements of a variable over time within a student are likely to be similar, i.e. positively correlated. Also, values of a variable measured on students of a particular school may be more similar to each other than to the values of the same variable measured on students from different schools, i.e. they are also likely to be positively correlated. These within-level correlations reduce the overall information in the data. Considering the correlations typically leads to larger estimates of variances and consequently lower power if sample sizes are not increased at the design stage. 
At the analysis stage, incorporating random effects into a regression model is one way to acknowledge the variation among upper-level units. Random intercepts and random slopes help to attribute the variation in values of the outcome variable to the relevant levels and independent variables. A standardized checklist for the reporting of multilevel data and the presentation of linear mixed models will promote adequate reporting of correlated data analyses. In this manuscript, we propose LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models), a systematic approach for the presentation of studies with correlated data from multilevel study designs, with an accompanying checklist for standardizing the reporting of results from linear mixed models. These models are quite complex, and the intention of this manuscript is not to be a statistical tutorial, but to mention aspects of the study design and analysis methods that we propose should be addressed in a publication. We present the basics of a linear mixed model simply to introduce the terminology and to help understand the proposed reporting recommendations.

The linear mixed model

Written as an equation, the ‘null’ (no covariate) linear mixed model for a 2-level hierarchical study is: $$Y_{ij} = \mu + \tau_i + \varepsilon_{ij},$$ where i = 1, …, m indexes the upper-level units, j = 1, …, n_i indexes the base-level units in the ith upper-level unit, μ denotes the overall mean of the dependent random variable Y, τ_i is the random intercept effect of the ith upper-level unit, and ε_{ij} is the random error of the jth lower-level unit in the ith upper-level unit.
We assume Normal distributions for the random effects, such that $${\tau}_i\sim N\left(0,{\sigma}_I^2\right)$$ and $${\varepsilon}_{ij}\sim N\left(0,{\sigma}_E^2\right)$$, where $${\sigma}_I^2$$ is the component of variation due to variability among upper-level units, and $${\sigma}_E^2$$ is the residual component of variation due to variability among lower-level units. We assume that these two random effects are independent of each other. By acknowledging multiple sources of variability and then attributing the variation to the appropriate level, the multilevel model can more accurately and precisely estimate the effects of all variables included in the model [7]. Variance components are used to calculate the “intra-level” or intraclass correlation coefficient (ICC), a statistic that quantifies the degree to which data at the lower level are correlated. The ICC, also referred to as the variance partition coefficient (VPC), is calculated by the following proportion, $$ICC= VPC=\frac{\sigma_I^2}{\sigma_I^2+{\sigma}_E^2},$$ which helps answer the question: of the total variation in the outcome variable, how much is accounted for by the variation among the upper-level units? As the term ICC is often mistaken for an estimate of a correlation coefficient, we will use the more appropriate term VPC. A VPC close to 0 suggests that little to no variation in the outcome is attributable to variation among upper-level units, so most of the variation in the outcome is among the lower-level units and thus there is little correlation among them. On the other hand, a VPC close to 1 suggests that most of the variation in the outcome is attributable to variation among upper-level units, so little variation is to be found among the lower-level units; thus, there is high correlation among them. Calculating the VPC can help determine the presence of correlation at the lower level and the need to account for it in the analyses. 
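As an illustration of the null 2-level model and the VPC, the two variance components can be estimated for a balanced design by one-way ANOVA, a standard method-of-moments estimator (not one prescribed by this paper; the data below are hypothetical).

```python
# Estimating sigma2_I (between upper-level units) and sigma2_E (residual)
# for the null 2-level model Y_ij = mu + tau_i + eps_ij, then the VPC.
# ANOVA estimators for a balanced design; the data are hypothetical.
from statistics import mean

def variance_components(groups):
    """ANOVA estimators for a balanced one-way random-effects model.
    `groups` is a list of equal-length lists of lower-level observations."""
    m = len(groups)     # number of upper-level units
    n = len(groups[0])  # observations per unit
    grand = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group mean squares.
    msb = n * sum((gm - grand) ** 2 for gm in group_means) / (m - 1)
    msw = sum((v - gm) ** 2
              for g, gm in zip(groups, group_means) for v in g) / (m * (n - 1))
    sigma2_E = msw
    sigma2_I = max((msb - msw) / n, 0.0)  # truncate at zero
    return sigma2_I, sigma2_E

def vpc(sigma2_I, sigma2_E):
    """Variance partition coefficient: share of variance at the upper level."""
    return sigma2_I / (sigma2_I + sigma2_E)

# Hypothetical balanced data: 3 schools, 4 pupils each.
data = [[10, 12, 11, 13], [20, 19, 21, 22], [15, 14, 16, 15]]
s2_i, s2_e = variance_components(data)
print(f"sigma2_I={s2_i:.2f}, sigma2_E={s2_e:.2f}, VPC={vpc(s2_i, s2_e):.2f}")
```

In this example most of the variation sits between schools, so the VPC is close to 1 and ignoring the clustering would be clearly inappropriate; likelihood-based mixed-model software generalises this estimation to unbalanced designs and models with covariates.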
Interpretation of the magnitude of the ICC/VPC is context dependent. In hierarchical data structures with more than 2 levels (see the multilevel diagram in Fig. 1), the VPC can be calculated for outcomes measured on units of each lower level, with the numerator being the variation in outcome between units on all levels above [8]. For the example in Fig. 1, if we have the following ‘null’ model for the observation at time t on the jth pupil from the ith school, $$Y_{ijt} = \mu + S_i + P_{ij} + \varepsilon_{ijt},$$ then VPC1 quantifies the correlation among all the values between and within pupils nested within schools and is given by $${VPC}_1 = \frac{\sigma_I^2}{\sigma_I^2 + \sigma_J^2 + \sigma_E^2},$$ while VPC2 quantifies the correlation among the repeated measurements within pupils nested within schools and is given by $${VPC}_2 = \frac{\sigma_I^2 + \sigma_J^2}{\sigma_I^2 + \sigma_J^2 + \sigma_E^2},$$ where $$\sigma_I^2$$ is the component of variation due to variability among schools, $$\sigma_J^2$$ is the component of variation due to variability among pupils nested within schools, and $$\sigma_E^2$$ is the component of residual variation due to variability in the repeated measurements within pupils.

Fig. 1 Schematic example of the ‘multilevel diagram’ for a 3-level hierarchical study of students nested in schools and repeated measurements over time in students, in (a) table format and (b) flowchart format.

Understanding the implications that correlations among observations may have on the design and analyses of research studies is essential. At the design stage, if the contribution to the VPC for a particular level (the variance component) is small, it implies that there is little variation among units at that level; from an efficiency and power standpoint, it is therefore more advantageous to sample more units from higher levels.
These important statistical considerations in planning sample sizes at the different levels are accounted for with the variance inflation factor (VIF), also called the ‘design effect’. For a given level k, the VIF is $${VIF}_k=1+\left({m}_k-1\right){VPC}_k,$$ where $${m}_k$$ is the average number of units in a member of the kth level. At the analysis stage, depending on the study design, linear mixed models can include random effects to account for correlation in space or in a social group (clustering), in time (repeated measures), or both. Table 1 presents example linear mixed models with dependent variable Y in hypothetical 2-level and 3-level study designs, with a single independent variable X. If the data were from a 1-level study design, the model would have no random effects (other than the residual error): $${Y}_j={\beta}_0+{\beta}_1{X}_j+{\varepsilon}_j,$$ where $${\varepsilon}_j\sim N\left(0,{\sigma}_E^2\right)$$.
Table 1 Example simple linear mixed models in 2-level and 3-level study designs.
The random effects applied in the simple linear mixed models in Table 1 are assumed to have Normal distributions and to be independent from the error distribution. If there is more than one random effect, one must also specify whether they are independent amongst themselves, and if not, specify the covariance structure amongst the random effects. The statistical literature is confusing and contradictory as to whether to consider effects as fixed or as random [9]. Many textbooks state that level effects must be considered as fixed effects if all possible members of that level were studied, and as random effects if the members of that level are a sample from some population. Others state that fixed effects are to be used if the specific member effects are of interest, and random effects if not. The Hausman test for the difference between the within-level and between-level regression coefficients is sometimes used as a test for deciding whether to use a random or fixed coefficient model [10].
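The design effect can be folded into a small planning helper. The cluster size and VPC below are hypothetical values for illustration only:

```python
def design_effect(m_k: float, vpc_k: float) -> float:
    """Variance inflation factor at level k: VIF_k = 1 + (m_k - 1) * VPC_k."""
    return 1.0 + (m_k - 1.0) * vpc_k

# Hypothetical: an average of 20 pupils per school and a school-level VPC of 0.10.
deff = design_effect(20, 0.10)   # 1 + 19 * 0.10 = 2.9
# 1000 sampled pupils then carry the information of about 1000 / 2.9 ≈ 345
# independent observations, which is why sampling more schools is often more
# efficient than sampling more pupils per school.
n_effective = 1000 / deff
```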
We are not stating a position on this argument, but insist that one must acknowledge the hierarchical study design, not ignore the correlations, and justify the random intercepts and random slopes used.
Multilevel data versus multilevel research question
The first step in analyzing multilevel data is to decide if the research question is a multilevel question. The design of a study may be hierarchical and thus have correlated data, but the research question may be one that does not require multilevel analyses. For example, in a clustered study design, research questions where the dependent variable is at the highest level will not require multilevel analyses, since the members of the highest level are uncorrelated. In this case, the variation amongst members of the upper level is the only variance component, and a fixed effects model analyzed by ordinary least squares (OLS) is appropriate [10]. In a repeated measures study design, if the dependent variable in the research question is at a single time point, it is not a multilevel question as there are no repeated measures. Also, if the dependent variable is the time to the occurrence of an event (survival data), the research question is no longer multilevel, unless there is additional hierarchical structure, as in ‘frailty’ models. In a hierarchical clustered study with 2 or more levels, any research question using any lower-level variable as the dependent variable will require a multilevel analysis. The next step is to consider using the multilevel diagram [8] as presented in Fig. 1. The multilevel diagram allows visualizing the levels of a study, the structure of the levels, and the variables collected at each level. Variables collected at levels higher than the dependent variable are usually called contextual variables. The diagram readily allows one to see if the dependent variable for a particular research question requires a multilevel analysis.
Another important consideration is ‘aggregated’ or ‘collapsed’ variables, which are variables derived by summarizing the values of observations from lower levels. For example, if years of education is available at the individual level for each adult in a household, the variable ‘highest education level of the household’ is an aggregated variable at the household level. If we have the sex and the grade-points for each student in multiple schools, the proportion of boys per school and the school-wide average grade-point are school-level aggregated variables. Note that for a research question to be multilevel, the crucial decision is whether the dependent variable is at a lower level. One can have independent variables at a different (lower) level, but if the dependent variable is at the highest level, it is not a multilevel research question. For example, in a repeated measures design, the outcome at the end of treatment for a given person (e.g. treatment success) is measured only once, but may depend on values of a variable measured at different time points (e.g. hypertension at baseline and at times t1 and t2 prior to end of treatment).
How to report descriptive analyses
With a hierarchical study design, a correct multilevel descriptive analysis should include analyses of the outcomes of interest at all relevant levels and the distribution of the variables at all levels. This step will also help the researcher uncover irregularities in the data, such as unusual patterns of missingness, heteroscedasticity, or unusual shapes of distributions. It is also helpful in understanding which variables are correlated and how to possibly consider them in the modeling. The choice of summary statistics to use, as with non-multilevel descriptive statistical analysis, will depend on the type of variable. When presenting summary statistics (e.g.
means for continuous variables, proportions for categorical variables) of variables collected at lower levels, measures of variability and confidence intervals must account for the variance inflation factors (VIFs). When presenting plots, univariate and bivariate graphs should allow comparison of variables measured at the same level. With clustered data, plots of lower-level variables should identify membership in upper-level groups. With longitudinal data, plots of repeated measurements over time should identify points that come from the same subject (e.g. ‘spaghetti plots’) rather than summaries over time that obscure the fact that some of the same subjects are included across the summaries [11].
How to report modeling analyses
Descriptive bivariate analyses that assess the significance of correlation and association measures should adjust for the correlation in the observations. Once the focus shifts to the dependent variable of interest, the correlation among the observations of the dependent variable at each level must be studied and presented. Variance decomposition must be performed, and the VPCs or ICCs should be reported. An initial ‘null’ multilevel model with no independent variables is strongly encouraged. The modeling, variable selection, and arriving at a ‘final’ model is a process that each investigator may carry out according to their own preferences, and is therefore not addressed here. Note that adding dummy (indicator) variables as fixed effects for members of a higher level is not exactly equivalent to adding random intercept effects for members of a higher level. While both approaches do have the effect of explaining some of the variability in the outcome, only the latter decomposes the residual variance into components. For the ‘final’ model, in addition to reporting the results for the fixed effects, one must report either the variance components or the VPCs or ICCs. It may be of special interest to report these for the ‘null’ model (i.e.
with no independent variables), as well as for the final model (and other ‘intermediate’ models), so that the reader may understand the impact of explanatory variables on the variance components. Note also that if random intercepts and random slopes are included in the models, the estimated correlation structure among the random effects should also be presented. Finally, measures of model fit, such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or, for logistic regression models, the area under the receiver operating characteristic (ROC) curve (AUC), may be useful for readers.
Example of reporting multilevel data structure and analyses: the Chilean dental study
We use a 3-level study that measured the presence of caries in temporary dentition in 2275 children from 40 pre-schools in 13 districts (comunas) of the Metropolitan Region (around the capital of Santiago) in Chile, to illustrate what and how to present results from a multilevel analysis. All the districts in the Metropolitan Region were classified according to the United Nations Development Program (UNDP) Human Development Index (HDI) [12], then stratified into 5 groups: Very High, High, Middle, Low, and Very Low. Estimation of the necessary number of children and pre-schools to include took into account the expected ICCs and VIFs based on the literature. Thirteen districts were randomly selected across the strata. Within a district, educational establishments (pre-schools) were categorized as private (paid), private (subsidized) or public, and approached for participation. All selected districts participated, but the private pre-schools of the highest HDI district refused to participate; thus that district only had public (municipal) pre-schools participating. All eligible children of a school were invited to participate, and refusal (by parents) rates were less than 1%.
The study was approved by the Comité de Ética de Investigación en Seres Humanos (ethics committee) of the Facultad de Medicina of the Universidad de Chile. Table 2 displays the multilevel diagram for this study. The research question was: ‘Which factors are related to the presence of caries in temporary dentition in children of different districts of the Metropolitan Region?’ The prevalence of caries in temporary dentition in a group can be calculated from the presence of caries in temporary teeth at the individual level. We note that our dependent variable is at a lower level, while the independent variables of interest are from various levels.
Table 2 Example multilevel diagram in table format for The Chilean Dental Study.
Table 3 presents the results of three different random-intercept logistic regression models: the ‘null’ model, an ‘intermediate’ model, and a ‘final’ model, fitted using maximum likelihood. Usually only a final model is presented, but we illustrate how the other models can help in understanding changes in the VPC when one introduces independent variables from different levels in multilevel models. See model equations in the Additional file 1.
Table 3 Random-intercept logistic regression models for the presence of caries in The Chilean Dental Study.
The effect estimates and 95% confidence intervals (CIs) do account for the correlation among the observations; at the bottom of the table of results, one presents the corresponding intraclass correlation coefficients and the model fit criteria. We first note that in the intermediate model, which only includes district-level and school-level covariates, the district-level variables of HDI and rurality, and the type of school, are statistically significant: the higher the human development index of the district, the lower the probability of caries among the children, while children in private (paid) pre-schools have lower probability of caries.
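For random-intercept logistic models like those in Table 3, one common way to compute ICCs (not necessarily the exact method used in this study) is the latent-variable approach, which fixes the level-1 residual variance at π²/3, the variance of the standard logistic distribution. A small helper, with hypothetical district- and school-level variance components:

```python
import math

def icc_logistic(*random_intercept_vars: float):
    """Latent-scale ICCs for a random-intercept logistic model.
    Residual variance on the latent scale is pi^2 / 3 (standard logistic).
    Returns cumulative ICCs from the top level down."""
    resid = math.pi ** 2 / 3
    total = sum(random_intercept_vars) + resid
    iccs, cum = [], 0.0
    for v in random_intercept_vars:
        cum += v
        iccs.append(cum / total)
    return iccs

# Hypothetical variance components: 0.17 (district), 0.29 (school within district)
icc_district, icc_school = icc_logistic(0.17, 0.29)
```

As in the paper's Table 3, the school-level ICC (which accumulates district and school variation) is necessarily at least as large as the district-level ICC.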
In the final model, which now includes child-level covariates, the odds ratios (ORs) for school type and rural location are no longer significant. The sex and age of the child are significant, while family income and access to health care were not significantly associated with caries presence. Secondary school education of the main caretaker was associated with a higher likelihood of caries. It could be that district-level factors like HDI account for the effect of child-level socioeconomic factors. From the ‘null’ model, we note that the correlation of the presence of caries among children from the same district is not negligible (ICC = 0.0495), but also that this correlation is more than doubled (ICC = 0.1278) among children within the same school. When we consider district-level and school-level covariates, the ICCs for district and for school within district are reduced. The ICC for district is not reduced further when we add child-level covariates in the ‘final’ model. However, the correlation in the presence of caries among children within the same school is reduced when child-level covariates are included in the model. The final model, as expected, has a much better fit than the intermediate model (much lower AIC), since it incorporates child-level covariates, which explain much of the child-level variation in the presence of caries.
The objective of this manuscript is to recommend how to report and present multilevel data and the results of linear mixed models. The need for such a checklist has been previously established by Casals et al. [6], who conducted a systematic review of the quality of the presentation of results and information from LMMs in the field of clinical medicine. Their extensive and systematic review of indexed medical journals included longitudinal studies, repeated measurements and multilevel design studies, from various medical disciplines.
They found that “most of the useful information about generalized linear mixed models was not reported in most cases” [6]. Less than 10% reported the variance estimates of random effects. Aspects that apply to all modeling, such as covariate selection, estimation method, and goodness of fit, were also not universally reported. They conclude that “it is important to consider the use of minimal rules as standardized guidelines when presenting generalized linear mixed model results in medical journals” [6]. This manuscript is limited in that it is not intended to be a tutorial on statistical methods for analyzing correlated data; many such tutorials exist, and we do not review the complex statistical considerations behind all the aspects that are important in LMMs. We provided a real-data example using a mixed effects logistic regression analysis of a 3-level study to illustrate how such analyses could be reported following our recommendations. Table 4 presents a checklist of items that we recommend for reporting multilevel data and modelling results, where items are either suggested (S), expected (E) or necessary (N). The checklist was developed by the authors based on their experience in conducting and presenting multilevel data analyses. We thus welcome comments from users of the proposed checklist and from journal editors, and would welcome extending the recommended checklist to other multilevel models. Checklists such as PRISMA [13], STROBE [14], CONSORT [15] and others have improved the quality of reporting of scientific medical research studies in abstracts and full manuscripts [16]. More recently, reporting guidelines for models have been proposed [17, 18]. The proposed LEVEL checklist is modeled on the STROBE guidelines, modified for multilevel studies.
Table 4 LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models) checklist of items for reports of multilevel data and modelling analyses.
A standardized checklist for the reporting of multilevel data and the presentation of linear mixed models will promote adequate reporting of correlated data analyses, and ensure that appropriate statistics are contained and explained thoroughly in manuscripts. The implementation of our checklist of items to report when presenting results of a multilevel analysis is intended to increase the transparency, completeness, and quality of reporting. The datasets used and/or analysed during the current study are available from the first author (maria.monsalves@uss.cl) on reasonable request.
Abbreviations
AIC: Akaike Information Criterion; AUC: Area under the receiver operating characteristic curve; BIC: Bayesian Information Criterion; CI: Confidence interval; CONSORT: CONsolidated Standards Of Reporting Trials; E: Expected; HDI: Human Development Index; ICC: Intraclass correlation coefficient; LEVEL: Logical Explanations & Visualizations of Estimates in Linear mixed models; LMM: Linear mixed model; N: Necessary; OLS: Ordinary least squares; OR: Odds ratio; PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses; ROC: Receiver operating characteristic; S: Suggested; STROBE: Strengthening The Reporting of OBservational studies in Epidemiology; UNDP: United Nations Development Program; VIF: Variance inflation factor; VPC: Variance partition coefficient
References
1. Basagaña X, Pedersen M, Barrera-Gómez J, et al. Analysis of multicentre epidemiological studies: contrasting fixed or random effects modelling and meta-analysis. Int J Epidemiol. 2018;47(4):1343–54.
2. Greenland S. Principles of multilevel modelling. Int J Epidemiol. 2000;29:158–67.
3. Merlo J.
Multilevel analytical approaches in social epidemiology: measures of health variation compared with traditional measures of association. J Epidemiol Community Health. 2003;57:550–2.
4. Merlo J, Chaix B, Yang M, et al. A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health. 2005;59:443–9.
5. Merlo J, Chaix B, Ohlsson H, et al. A brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel logistic regression to investigate contextual phenomena. J Epidemiol Community Health. 2006;60:290–7.
6. Casals M, Girabent-Farrés M, Carrasco JL. Methodological quality and reporting of generalized linear mixed models in clinical medicine (2000–2012): a systematic review. PLoS One. 2014;9(11):e112653.
7. Clubine-Ito C. Multilevel modeling for historical data. Hist Methods J Quant Interdiscip Hist. 2010;37(1):5–22.
8. Bangdiwala SI. The multilevel diagram. Int J Inj Control Saf Promot. 2012;19(4):388–90.
9. Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press; 2016. p. 245–6.
10. Snijders TAB, Berkhof J. Diagnostic checks for multilevel models. In: de Leeuw J, Meijer E, editors. Handbook of multilevel analysis. New York: Springer Science; 2008. p. 141–75.
11. Diggle PJ, Liang K-Y, Zeger SL. Analysis of longitudinal data. Oxford: Clarendon Press; 1994. p. 33–54.
12. PNUD. Las trayectorias del Desarrollo Humano en las comunas de Chile (1994–2003) [The trajectories of human development in the communes of Chile (1994–2003)]. Programa Naciones Unidas para el Desarrollo; 2014. https://catalogo.ministeriodesarrollosocial.gob.cl/cgi-bin/koha/opac-detail.pl?biblionumber=7779 (Accessed 2 Dec 2019).
13. Moher D, Liberati A, Tetzlaff J, The PRISMA Group, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
14. von Elm E, Altman DG, Egger M, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7.
15. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Open Med. 2010;4(1):e60.
16. Jin Y, Sanger N, Shams I, et al. Does the medical literature remain inadequately described despite having reporting guidelines for 21 years? – a systematic review of reviews: an update. J Multidiscip Healthc. 2018;11:495–510.
17. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162:55–63.
18. Moher D, Schulz KF, Simera I, Altman DG. Guidance for developers of health research reporting guidelines. PLoS Med. 2010;7:e1000217.
Acknowledgements
None.
Funding
There was no funding for the work of this manuscript.
Affiliations
Facultad de Medicina y Ciencia, Universidad San Sebastián, Santiago, Chile: Maria Jose Monsalves. University of Minnesota, Minneapolis, MN, USA: Ananta Shrikant Bangdiwala. McMaster University, Hamilton, ON, Canada: Alex Thabane and Shrikant Ishver Bangdiwala. Population Health Research Institute, Hamilton, ON, Canada: Shrikant Ishver Bangdiwala. Institute for Social and Health Sciences, University of South Africa, Johannesburg, South Africa: Shrikant Ishver Bangdiwala. Department of Health Research Methods, Evidence and Impact, Population Health Research Institute, McMaster University Faculty of Health Sciences, 237 Barton Street East, DBCVSRI Building, #C2-210, Hamilton, Ontario, L8L 2X2, Canada: Shrikant Ishver Bangdiwala.
Contributions
SIB and MJM conceptualized the manuscript and led the writing. All authors (MJM, ASB, AT, SIB) reviewed and commented on all drafts of the manuscript. MJM facilitated the data from a previously done study for purposes of illustrating the proposed methodology. All authors (MJM, ASB, AT, SIB) have read and approved the final manuscript.
Corresponding author
Correspondence to Shrikant Ishver Bangdiwala.
Ethics approval and consent to participate
For the example illustrating the proposed methodology, data from The Chilean Dental Study were used. That study was approved by the Comité de Ética de Investigación en Seres Humanos [Ethics Committee for Research on Human Beings] of the Facultad de Medicina [School of Medicine] of the Universidad de Chile [University of Chile]. In that study, educational establishments were approached for consent to participate. All eligible children of a participating school were invited to participate.
Written informed consent was obtained from parents, and verbal assent was obtained from children. As mentioned in the manuscript, refusal (by parents) rates were less than 1%.
Consent for publication
The manuscript does not use any details, images, or videos relating to an individual person; thus, informed consent for publication is not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1. Model equations for the example mixed effects logistic regression models used for The Chilean Dental Study. Three model equations are provided: 1. ‘Null’ logistic regression model – no independent variables. 2. ‘Intermediate’ logistic regression model – with selected district- and school-level independent variables. 3. ‘Final’ logistic regression model – with selected district-, school- and child-level independent variables.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Cite this article
Monsalves, M.J., Bangdiwala, A.S., Thabane, A. et al. LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models): recommendations for reporting multilevel data and analyses.
BMC Med Res Methodol 20, 3 (2020). https://doi.org/10.1186/s12874-019-0876-8
Received: 23 July 2019. Accepted: 22 November 2019. Published: 06 January 2020.
Keywords: Multilevel models; Reporting guidelines; Variance partition coefficients; Multilevel diagram

Updated: 2020-01-06
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2020-01-03
Noa’a Shimoni; Siripanth Nippita; Paula M. Castaño

Researchers and clinicians use text messages to collect data, with the advantage of real-time capture when compared with standard data collection methods. This article reviews project setup and management for successfully collecting patient-reported data through text messages. We review our experience enrolling over 2600 participants in six clinical trials that used text messages to relay information or collect data. We also reviewed the literature on text messages used for repeated data collection. We classify recommendations according to common themes: the text message, the data submitted and the phone used. We present lessons learned and discuss how to create text message content, select a data collection platform with practical features, manage the data thoughtfully and consistently, and work with patients, participants and their phones to protect privacy. Researchers and clinicians should design text messages to include short, simple prompts and answer choices. They should decide whether and when to send reminders if participants do not respond, and set parameters regarding when and how often to contact patients for missing data. Data collection platforms send, receive, and store messages. They can validate responses and send error messages. Researchers should develop a protocol to append and correct data in order to improve consistency with data handling. At the time of enrollment, researchers should ensure that participants can receive and respond to messages. Researchers should address privacy concerns and plan for service interruptions by obtaining alternate participant contact information and providing participants with a backup data collection method. Careful planning and execution can reward clinicians and investigators with complete, timely and accurate data sets.

Updated: 2020-01-04
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-31
Shannon Wongvibulsin; Katherine C. Wu; Scott L. Zeger

Clinical research and medical practice can be advanced through the prediction of an individual’s health state, trajectory, and responses to treatments. However, the majority of current clinical risk prediction models are based on regression approaches or machine learning algorithms that are static, rather than dynamic. To benefit from the increasing emergence of large, heterogeneous data sets, such as electronic health records (EHRs), novel tools to support improved clinical decision making through methods for individual-level risk prediction that can handle multiple variables, their interactions, and time-varying values are necessary. We introduce a novel dynamic approach to clinical risk prediction for survival, longitudinal, and multivariate (SLAM) outcomes, called random forest for SLAM data analysis (RF-SLAM). RF-SLAM is a continuous-time, random forest method for survival analysis that combines the strengths of existing statistical and machine learning methods to produce individualized Bayes estimates of piecewise-constant hazard rates. We also present a method-agnostic approach for time-varying evaluation of model performance. We derive and illustrate the method by predicting sudden cardiac arrest (SCA) in the Left Ventricular Structural (LV) Predictors of Sudden Cardiac Death (SCD) Registry. We demonstrate superior performance relative to standard random forest methods for survival data. We illustrate the importance of the number of preceding heart failure hospitalizations as a time-dependent predictor in SCA risk assessment. RF-SLAM is a novel statistical and machine learning method that improves risk prediction by incorporating time-varying information and accommodating a large number of predictors, their interactions, and missing values. 
RF-SLAM is designed to easily extend to simultaneous predictions of multiple, possibly competing, events and/or repeated measurements of discrete or continuous variables over time. Trial registration: LV Structural Predictors of SCD Registry (clinicaltrials.gov, NCT01076660), retrospectively registered 25 February 2010.

Updated: 2019-12-31
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-28
Allison Tong; Anneliese Synnot; Sally Crowe; Sophie Hill; Andrea Matus; Nicole Scholes-Robertson; Sandy Oliver; Katherine Cowan; Mona Nasser; Soumyadeep Bhaumik; Talia Gutman; Amanda Baumgart; Jonathan C. Craig

Research priority setting with stakeholders can help direct the limited resources for health research toward priority areas of need. Ensuring transparency of the priority setting process can strengthen legitimacy and credibility for influencing the research agenda. This study aims to develop a reporting guideline for priority setting of health research. We searched electronic databases and relevant websites for sources (frameworks, guidelines, or models for conducting, appraising, reporting or evaluating health research priority setting, and reviews (including systematic reviews)), and primary studies of research priority setting to July 2019. We inductively developed a list of reporting items and piloted the preliminary guideline with a diverse range of 30 priority setting studies from the records retrieved. From 21,556 records, we included 26 sources for the candidate REPRISE framework and 455 primary research studies. The REporting guideline for PRIority SEtting of health research (REPRISE) has 31 reporting items that cover 10 domains: context and scope, governance and team, framework for priority setting, stakeholders/participants, identification and collection of priorities, prioritization of research topics, output, evaluation and feedback, translation and implementation, and funding and conflict of interest. Each reporting item includes a descriptor and examples. The REPRISE guideline can facilitate comprehensive reporting of studies of research priority setting. Improved transparency in research priority setting may strengthen the acceptability and implementation of the research priorities identified, so that efforts and funding are invested in generating evidence that is of importance to all stakeholders. Not applicable.

Updated: 2019-12-30
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-30
Kulandaipalayam Natarajan Sindhu; Manikandan Srinivasan; Sathyapriya Subramaniam; Anita Shirley David; Venkata Raghava Mohan; Jacob John; Gagandeep Kang

Cohort studies are pivotal in understanding the natural history of a disease and thereby determining its incidence. The conduct of large-scale community-based cohort studies is challenging with respect to money, manpower and time. Further, the attrition inherent to cohort studies can affect the power, and thereby the study’s validity. Our objective was to estimate the percentage of participant withdrawal and to subsequently understand the reasons for it in the Vellore Typhoid Surveillance (VTS) cohort. The VTS study, a prospective community-based pediatric cohort, was established in a semi-urban settlement of Vellore to estimate the incidence rate of typhoid fever. Active weekly surveillance identified children with fever, and blood cultures were performed for fevers of ≥3 days. Reasons for participant drop-out in the cohort were documented. Nine focus group discussions (FGDs), each with 5 to 7 parents/primary caregivers of former as well as current participants, were conducted separately to understand the reasons for consent withdrawal as well as the aspects of the study that current participants perceived as good. Descriptive and interpretative accounts of the themes that emerged from the FGDs were produced. Of the 5639 children in the VTS cohort, 404 (7.2%) withdrew consent during the 12-month surveillance. Of these, 50% dropped out due to migration from the study area; 18.1% because their parents were unhappy with the blood draws for blood culture; and 14.4% did not clearly put forth a reason for consent withdrawal. Being from an orthodox background, high socio-economic status and a joint family were associated with a decision to drop out. Frequent and voluminous blood draws, male field research assistants (FRAs) making weekly home visits, the perception that inquiring about fever made their child fall sick, and the perception that the study clinic did not initiate antibiotics immediately were the important themes that emerged from the FGDs conducted among drop-outs.
Our study showed that specific beliefs and behaviours within the community influenced the drop-out rate of the VTS cohort. Existing background characteristics and perceptions, along with attrition data from previous cohort studies in the specific community, should be considered when implementing large-scale cohort studies.

Updated: 2019-12-30
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-30
Hanna E. Tervonen; Stuart Purdie; Nicola Creighton

Aboriginal people are known to be under-recorded in routinely collected datasets in Australia. This study examined methods for enhancing the reporting of cancer incidence among Aboriginal people using linked data methodologies. Invasive cancers diagnosed in New South Wales (NSW), Australia, in 2010–2014 were identified from the NSW Cancer Registry (NSWCR). The NSWCR data were linked to the NSW Admitted Patient Data Collection, the NSW Emergency Department Data Collection and the Australian Coordinating Register Cause of Death Unit Record File. The following methods for enhancing the identification of Aboriginal people were used: ‘ever-reported’, ‘reported on most recent record’, ‘weight of evidence’ and ‘multi-stage median’. The impact of these methods on the number of cancer cases and age-standardised cancer incidence rates (ASR) among Aboriginal people was explored. Of the 204,948 cases of invasive cancer, 2703 (1.3%) were recorded as Aboriginal on the NSWCR. This increased with enhancement methods to 4184 (2.0%, ‘ever’), 3257 (1.6%, ‘most recent’), 3580 (1.7%, ‘weight of evidence’) and 3583 (1.7%, ‘multi-stage median’). Enhancement was generally greater in relative terms for males, people aged 25–34 years, people with cancers of localised or unknown degree of spread, people living in urban areas and areas with less socio-economic disadvantage. All enhancement methods increased ASRs for Aboriginal people. The weight of evidence method increased the overall ASR by 42% for males (894.1 per 100,000, 95% CI 844.5–945.4) and 27% for females (642.7 per 100,000, 95% CI 607.9–678.7). The greatest relative increases were observed for melanoma and prostate cancer incidence (126% and 63%, respectively). ASRs for prostate and breast cancer increased from below to above the ASRs of non-Aboriginal people with enhancement of Aboriginal status. All data linkage methods increased the number of cancer cases and ASRs for Aboriginal people.
Enhancement varied by demographic and cancer characteristics. We considered the weight of evidence method to be most suitable for population-level reporting of cancer incidence among Aboriginal people. The impact of enhancement on disparities in cancer outcomes between Aboriginal and non-Aboriginal people should be further examined.
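The ‘ever-reported’ rule is the simplest of the four enhancement methods: a person is flagged as Aboriginal if any linked record ever says so. A minimal sketch (person IDs and the record layout are invented for illustration; the ‘weight of evidence’ and ‘multi-stage median’ methods would instead weigh the degree of agreement across sources):

```python
# 'Ever-reported' enhancement: flag a person as Aboriginal if ANY
# linked dataset ever recorded them as such. Records are hypothetical.
def ever_reported(linked_records):
    """linked_records: dict person_id -> list of status flags
    (True/False/None) drawn from the registry and linked datasets."""
    return {pid: any(flag is True for flag in flags)
            for pid, flags in linked_records.items()}

records = {
    "p1": [False, True, None],   # registry missed status, hospital data has it
    "p2": [False, False, False],
    "p3": [None, None, True],    # only the death record carries the status
}
enhanced = ever_reported(records)
print(enhanced)  # {'p1': True, 'p2': False, 'p3': True}
```

Because a single positive report flips the flag, this rule yields the largest case counts of the four methods, consistent with the ‘ever’ column above.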

Updated: 2019-12-30
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-26
Ugochukwu N. Udogwu; Andrea Howe; Katherine Frey; Marckenley Isaac; Daniel Connelly; Dimitrius Marinos; Mitchell Baker; Renan C. Castillo; Gerard P. Slobogean; Robert V. O’Toole; Nathan N. O’Hara

This study aimed to address the current limitations of the use of composite endpoints in orthopaedic trauma research by quantifying the relative importance of clinical outcomes common to orthopaedic trauma patients and using those values to develop a patient-centered composite endpoint weighting technique. A Best-Worst Scaling choice experiment was administered to 396 adult surgically-treated fracture patients. Respondents were presented with ten choice sets, each consisting of three out of ten plausible clinical outcomes. Hierarchical Bayesian modeling was used to determine the utilities associated with the outcomes. Death was the outcome of greatest importance (mean utility = − 8.91), followed by above-knee amputation (− 7.66), below-knee amputation (− 6.97), severe pain (− 5.90), deep surgical site infection (SSI) (− 5.69), bone healing complications (− 5.20), and moderate pain (− 4.59). Mild pain (− 3.30) and superficial SSI (− 3.29), on the other hand, were the outcomes of least importance to respondents. This study revealed that the relative importance patients assign to clinical outcomes follows a logical gradient, with distinct and quantifiable preferences for each possible component outcome. These findings were incorporated into a novel composite endpoint weighting technique.

Updated: 2019-12-27
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-18
Jasper Frese; Annalice Gode; Gerhard Heinrichs; Armin Will; Arndt-Peter Schulz

Subsequent to a three-month pilot phase recruiting patients for the newly established BFCC (Baltic Fracture Competence Centre) transnational fracture registry, a validation of data quality needed to be carried out using a standardized method. A literature search identified the method of “adaptive monitoring” as fulfilling the requirements of the registry, and it was applied. It consists of a three-step audit process: first, scoring of the overall data quality; second, source data verification of a sample whose size is relative to the scoring result; and finally, feedback to the registry on measures to improve data quality. Statistical methods for scoring data quality and visualising discrepancies between registry data and source data were developed and applied. Initially, the data quality of the registry scored as medium. During source data verification, missing items in the registry, which caused the medium data quality, turned out to be absent in the source as well. A subsequent adaptation of the score evaluated the registry’s data quality as good. It was suggested to add variables to some items in order to improve the accuracy of the registry. The application of the method of adaptive monitoring has previously been published only by Jacke et al., with a similar improvement of the scoring result following the audit process. Displaying data from the registry in graphs helped to find missing items and to discover issues with data formats. Graphically comparing the degree of agreement between the registry and source data made it possible to discover systematic faults. The method of adaptive monitoring gives a substantiated guideline for systematically evaluating and monitoring a registry’s data quality and is currently second to none. The resulting transparency of the registry’s data quality could be helpful in annual reports, as published by most major registries. As the method has rarely been applied, further applications in established registries would be desirable.

Updated: 2019-12-19
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-17
Bradley C. Johnston; Pablo Alonso-Coello; Malgorzata M. Bala; Dena Zeraatkar; Montserrat Rabassa; Claudia Valli; Catherine Marshall; Regina El Dib; Robin W. M. Vernooij; Per O. Vandvik; Gordon H. Guyatt

Following publication of the original article [1], the authors reported a change in the ‘Competing interests’ section as described below.

Updated: 2019-12-18
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-16
Jocelyn Kuhn; Radley Christopher Sheldrick; Sarabeth Broder-Fingert; Andrea Chu; Lisa Fortuna; Megan Jordan; Dana Rubin; Emily Feinberg

The Multiphase Optimization Strategy (MOST) is designed to maximize the impact of clinical healthcare interventions, which are typically multicomponent and increasingly complex. MOST often relies on factorial experiments to identify which components of an intervention are most effective, efficient, and scalable. When assigning participants to conditions in factorial experiments, researchers must be careful to select the assignment procedure that will result in balanced sample sizes and equivalence of covariates across conditions while maintaining unpredictability. In the context of a MOST optimization trial with a 2x2x2x2 factorial design, we used computer simulation to empirically test five subject allocation procedures: simple randomization, stratified randomization with permuted blocks, maximum tolerated imbalance (MTI), minimal sufficient balance (MSB), and minimization. We compared these methods across the 16 study cells with respect to sample size balance, equivalence on key covariates, and unpredictability. Leveraging an existing dataset to compare these procedures, we conducted 250 computerized simulations using bootstrap samples of 304 participants. Simple randomization, the most unpredictable procedure, generated poor sample balance and equivalence of covariates across the 16 study cells. Stratified randomization with permuted blocks performed well on stratified variables but resulted in poor equivalence on other covariates and poor balance. MTI, MSB, and minimization had higher complexity and cost. MTI resulted in balance close to pre-specified thresholds and a higher degree of unpredictability, but poor equivalence of covariates. MSB had 19.7% deterministic allocations, poor sample balance and improved equivalence on only a few covariates. Minimization was most successful in achieving balanced sample sizes and equivalence across a large number of covariates, but resulted in 34% deterministic allocations. 
Small differences in proportion of correct guesses were found across the procedures. Based on the computer simulation results and priorities within the study context, minimization with a random element was selected for the planned research study. Minimization with a random element, as well as computer simulation to make an informed randomization procedure choice, are utilized infrequently in randomized experiments but represent important technical advances that researchers implementing multi-arm and factorial studies should consider.
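The selected procedure, minimization with a random element, can be sketched as follows: assign the next participant to whichever arm best balances the covariate marginals, but with some probability pick an arm at random to preserve unpredictability. The arm labels, covariates and random-element probability below are illustrative, not the trial's actual configuration:

```python
import random

# Minimization with a random element (illustrative sketch).
# counts[arm][covariate][level] tracks how many participants with each
# covariate level are already in each arm; the next participant goes to
# the arm with the lowest imbalance score, except that with probability
# p_random the arm is chosen at random to keep allocation unpredictable.
def minimize_assign(arms, counts, participant, p_random=0.3, rng=random):
    if rng.random() < p_random:
        return rng.choice(arms)
    # Imbalance score: total count of matching covariate levels per arm.
    scores = {
        arm: sum(counts[arm][cov].get(level, 0)
                 for cov, level in participant.items())
        for arm in arms
    }
    best = min(scores.values())
    return rng.choice([a for a, s in scores.items() if s == best])

arms = ["A", "B"]
counts = {a: {"sex": {}, "age_group": {}} for a in arms}
rng = random.Random(42)
for person in [{"sex": "F", "age_group": "old"},
               {"sex": "F", "age_group": "young"},
               {"sex": "M", "age_group": "old"}]:
    arm = minimize_assign(arms, counts, person, rng=rng)
    for cov, level in person.items():
        counts[arm][cov][level] = counts[arm][cov].get(level, 0) + 1
print(counts)
```

With `p_random = 0` every allocation on an untied score is deterministic (the 34% figure reported above reflects how often ties and the random element did not intervene in pure minimization); raising `p_random` trades balance for unpredictability.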

Updated: 2019-12-17
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-12
Dmitri Nepogodiev

Trainee research collaboratives (TRCs) have pioneered high-quality, prospective ‘snap-shot’ surgical cohort studies in the UK. Outcomes After Kidney injury in Surgery (OAKS) was the first TRC cohort study to attempt to collect one-year follow-up data. The aims of this study were to evaluate one-year follow-up and data completion rates, and to identify factors associated with improved follow-up rates. In this multicentre study, patients undergoing major gastrointestinal surgery were prospectively identified and followed up at one year after surgery for six clinical outcomes. The primary outcome for this report was the follow-up rate for mortality at 1 year. The secondary outcome was the data completeness rate in those patients who were followed up. An electronic survey was disseminated to investigators to identify strategies associated with improved follow-up. Of the 173 centres that collected baseline data, 126 centres registered to participate in one-year follow-up. Overall, 62.3% (3482/5585) of patients were followed up at 1 year; in centres registered to collect one-year outcomes, the follow-up rate was 82.6% (3482/4213). There were no differences in sex, comorbidity, operative urgency, or 7-day postoperative AKI rate between patients who were lost to follow-up and those who were successfully followed up. In centres registered to collect one-year follow-up outcomes, overall data completeness was 83.1%, with 57.9% (73/126) of centres having ≥95% data completeness. Factors associated with the likelihood of achieving ≥95% data completeness were the total number of patients to be followed up (77.4% in centres with < 15 patients, 59.0% with 15–29 patients, 51.4% with 30–59 patients, and 36.8% with > 60 patients, p = 0.030), and central versus local storage of patient identifiers (72.5% vs 48.0%, respectively, p = 0.006). TRC methodology can be used to follow up patients identified in prospective cohort studies at one year.
Follow-up rates are maximized by central storage of patient identifiers.

Updated: 2019-12-13
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-12
Manuella Lech Cantuaria; Victoria Blanes-Vidal

The Internet has been broadly employed as a facilitator for epidemiological surveys, as a more economical and practical alternative to traditional survey modes. A current trend in survey research is to combine Web-based surveys with other survey modes by offering participants the possibility of choosing their preferred response method (i.e. a mixed-mode approach). However, studies have also demonstrated that different survey modes may produce different responses to the same questions, posing potential challenges to the use of mixed-mode approaches. In this paper, we implemented a statistical comparison between mixed-mode survey responses collected via mail (i.e. paper) and Web methods, obtained from a cross-sectional study in non-urban areas of Denmark. Responses provided by mail and Web participants were compared in terms of: 1) the impact of reminder letters in increasing response rates; 2) differences in socio-demographic characteristics between response groups; and 3) differences in the likelihood of reporting health symptoms and negative attitudes towards environmental stressors. Comparisons were mainly performed by two-sample t-tests, Pearson’s chi-squared tests and multinomial logistic regression models. Among 3104 contacted households, 1066 residents decided to participate in the study. Of those, 971 chose to respond via mail, whereas 275 preferred the Web method. The majority of socio-demographic characteristics differed statistically between these two groups of respondents. The use of mailed surveys increased the likelihood of reporting health symptoms and negative attitudes towards environmental stressors, even after controlling for demographic characteristics. Furthermore, reminder letters were more effective at increasing responses to Web surveys than to mail surveys.
Our main findings suggest that mail and Web surveys may produce different responses to the same questions posed to participants but, at the same time, may reach different groups of respondents, given that the overall characteristics of the two groups differ considerably. Therefore, the trade-off between using mixed-mode surveys to increase response rates and incurring undesirable measurement differences should be carefully considered in future survey studies.
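The mode comparisons above rest largely on Pearson's chi-squared test. A minimal stdlib sketch of the 2×2 statistic (the table counts are invented for illustration, not the study's data):

```python
# Pearson's chi-squared statistic for a 2x2 table (no continuity
# correction). Counts below are invented: rows = survey mode
# (mail, Web), columns = symptom reported (yes, no).
def chi2_2x2(table):
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

stat = chi2_2x2([[300, 671], [60, 215]])
print(round(stat, 2))
# Compare against the 5% critical value for 1 degree of freedom (3.84):
print(stat > 3.84)  # True: reporting rates differ by mode in this toy table
```

In practice a library routine (e.g. `scipy.stats.chi2_contingency`) would also return the p-value and handle larger tables.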

Updated: 2019-12-13
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-11
Ana Penedones; Carlos Alves; Francisco Batel-Marques

This scoping review aims to identify, review and characterize the published recommendations for conducting and/or reporting a systematic review in the area of medical interventions. A search was carried out in the PubMed, EMBASE and Cochrane Library databases, using systematic review search filters. The search comprised all recommendations to conduct and/or report a systematic review. Data on methods were extracted from each recommendation, and a descriptive analysis was performed. Eighty-three recommendations were identified. Approximately 60% of the retrieved references were published in the last 6 years. Recommendations to both conduct and report a systematic review were issued in 47% of the studies. The guidance presented in each recommendation varied. Almost 96% of the recommendations offer guidance on the methods section of a systematic review. The need for and timing of updates was addressed in only 29% of recommendations. Forty percent of recommendations applied their methods to any subject related to medical interventions. Half of the studies did not specify the design of studies to be included in a systematic review. The several published recommendations to conduct and/or report a systematic review thus offer differing guidance. Further research on the impact of such heterogeneity could improve the quality of systematic reviews.

Updated: 2019-12-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-11
Anna Graves; Deirdre McLaughlin; Janni Leung; Jennifer Powers

Updated: 2019-12-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-11
Amy C. W. Tan; Lindy Clemson; Lynette Mackenzie; Catherine Sherrington; Chris Roberts; Anne Tiedemann; Constance D. Pond; Fiona White; Judy M. Simpson

Falls are common among older people, and General Practitioners (GPs) could play an important role in implementing strategies to manage fall risk. Despite this, fall prevention is not a routine activity in general practice settings. The iSOLVE cluster randomised controlled trial aimed to evaluate implementation of a fall prevention decision tool in general practice. This paper describes the strategies used and reflects on the enablers and barriers relevant to successful recruitment of general practices, GPs and their patients. Recruitment was conducted within the geographical area of a Primary Health Network in Northern Sydney, Australia. General practices and GPs were engaged via online surveys, mailed invitations to participate, educational workshops, practitioner networks and promotional practice visits. Patients 65 years or older were recruited via mailed invitations incorporating the practice letterhead and the name(s) of participating GP(s). Observations of recruitment strategies, results and enabling factors were recorded in field notes as descriptive and narrative data, and analysed using mixed methods. It took 19 months to complete recruitment of 27 general practices, 75 GPs and 560 patients. The multiple strategies used to engage general practices and GPs were collectively useful in reaching the targeted sample size. Practice visits were valuable in engaging GPs and staff, establishing interest in fall prevention and commitment to the trial. A mix of small, medium and large practices was recruited. While some practices were recruited in full, others had only half or fewer of their GPs recruited. The importance of preventing falls in older patients, the simplicity of the research design, the provision of resources and logistic facilitation of patient recruitment appealed to GPs. Recruitment of older patients was successfully achieved by mailed invitations, a strategy familiar to practice staff and patients.
Patient response rates were above the expected 10% for most practices. Many practices (n = 17) achieved the targeted number of 20 or more patients. Recruitment in general practice settings can be successfully achieved through multiple recruitment strategies, effective communication and rapport building, ensuring the research topic and design suit general practice needs, and using familiar communication strategies to engage patients. The trial was prospectively registered on 29 April 2015 with the Australian New Zealand Clinical Trial Registry www.anzctr.org.au (trial ID: ACTRN12615000401550).

Updated: 2019-12-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-09
Hafizur Rahman Chowdhury; Abraham D. Flaxman; Jonathan C. Joseph; Riley H. Hazard; Nurul Alam; Ian Douglas Riley; Alan D. Lopez

Verbal autopsy (VA) is increasingly being considered as a cost-effective method to improve cause of death information in countries with low-quality vital registration. VA algorithms that use empirical data have an advantage over expert-derived algorithms in that they use responses to the VA instrument as a reference instead of physician opinion. It is unclear how stable these data-driven algorithms, such as the Tariff 2.0 method, are to cultural and epidemiological variations in the populations where they might be employed. VAs were conducted in three sites as part of the Improving Methods to Measure Comparable Mortality by Cause (IMMCMC) study: Bohol, Philippines; Chandpur and Comilla Districts, Bangladesh; and Central and Eastern Highlands Provinces, Papua New Guinea. Similar diagnostic criteria and cause lists as the Population Health Metrics Research Consortium (PHMRC) study were used to identify gold standard (GS) deaths. We assessed changes in Tariffs by examining the proportion of Tariffs that changed significantly after the addition of the IMMCMC dataset to the PHMRC dataset. The IMMCMC study added 3512 deaths to the GS VA database (2491 adults, 320 children, and 701 neonates). Chance-corrected cause-specific mortality fractions for Tariff improved with the addition of the IMMCMC dataset for adults (+ 5.0%), children (+ 5.8%), and neonates (+ 1.5%). Overall, 97.2% of Tariffs did not change significantly after the addition of the IMMCMC dataset. Tariffs generally remained consistent after adding the IMMCMC dataset. Population-level performance of the Tariff method for diagnosing VAs improved marginally for all age groups in the combined dataset. These findings suggest that the cause-symptom relationships of Tariff 2.0 might well be robust across different population settings in developing countries. Increasing the total number of GS deaths improves the validity of Tariff and provides a foundation for the validation of other empirical algorithms.
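The core idea of the Tariff method can be sketched in a few lines (this is an illustration of the scoring principle only, not Tariff 2.0 itself, which derives its tariffs empirically as robust measures of how strongly each symptom distinguishes each cause; the causes, symptoms and tariff values below are invented):

```python
# Illustrative Tariff-style scoring: each cause-symptom pair carries a
# tariff; a death's score for a cause is the sum of tariffs over its
# endorsed symptoms, and the predicted cause is the highest-scoring one.
# Tariff values are invented for this sketch.
TARIFFS = {
    "pneumonia": {"cough": 4.0, "fever": 2.5, "chest_pain": 1.0},
    "stroke":    {"paralysis": 5.0, "slurred_speech": 3.0, "fever": -1.0},
}

def tariff_predict(endorsed_symptoms):
    scores = {
        cause: sum(t for sym, t in tariffs.items() if sym in endorsed_symptoms)
        for cause, tariffs in TARIFFS.items()
    }
    return max(scores, key=scores.get), scores

cause, scores = tariff_predict({"cough", "fever"})
print(cause, scores)  # pneumonia scores 6.5, stroke scores -1.0
```

The stability question studied above amounts to asking whether the entries of this tariff matrix change materially when new gold-standard deaths are added to the training data.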

Updated: 2019-12-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-09
Bart Hiemstra; Frederik Keus; Jørn Wetterslev; Christian Gluud; Iwan C. C. van der Horst

All clinical research benefits from transparency and validity. The transparency and validity of studies may be increased by prospective registration of protocols and by publication of statistical analysis plans (SAPs) before data have been accessed, to distinguish data-driven analyses from pre-planned analyses. As in clinical trials, recommendations for SAPs for observational studies may increase the transparency and validity of findings. We appraised the applicability of recently developed guidelines for the content of SAPs for clinical trials to SAPs for observational studies. Of the 32 items recommended for a SAP for a clinical trial, 30 items (94%) were identically applicable to a SAP for our observational study. Power estimations and adjustments for multiplicity are equally important in observational studies and clinical trials, as both types of studies usually address multiple hypotheses. Only two clinical trial items (6%), regarding issues of randomisation and the definition of adherence to the intervention, did not seem applicable to observational studies. We suggest including one new item, specifically applicable to observational studies, describing how adjustment for possible confounders will be handled in the analyses. With only a few amendments, the guidelines for the SAP of a clinical trial can be applied to a SAP for an observational study. We suggest that SAPs should be equally required for observational studies and clinical trials to increase their transparency and validity.

Updated: 2019-12-11
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-09
Siaka Koné; Bassirou Bonfoh; Daouda Dao; Inza Koné; Günther Fink

In low-income settings, key outcomes such as biomarkers or clinical assessments are often missing for a substantial proportion of the study population. The aim of this study was to assess the extent to which Heckman-type selection models can create unbiased estimates in such settings. We first introduce the basic Heckman model, and then use simulation models to compare its performance to alternative approaches used in the literature for missing outcome data, including complete case analysis (CCA), multiple imputation by chained equations (MICE) and pattern imputation with delta adjustment (PIDA). Lastly, we use a large population-representative dataset on antenatal supplementation (AS) and birth outcomes from Côte d’Ivoire to illustrate the empirical relevance of this method. All models performed well when data were missing at random. When missingness in the outcome data was related to unobserved determinants of the outcome, large and systematic biases were found for CCA and MICE, while Heckman-style selection models yielded unbiased estimates. Using Heckman-type selection models to correct for missingness in our empirical application, we found supplementation effect sizes that were very close to those reported in the most recent systematic review of clinical AS trials. Missingness in health outcomes can lead to substantial bias. Heckman-selection models can correct for this selection bias and yield unbiased estimates, even when the proportion of missing data is substantial.
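The key quantity in Heckman's two-step correction is the inverse Mills ratio λ(v) = φ(v)/Φ(v), evaluated at the linear index from a first-stage probit of the selection (response) indicator and then included as an extra regressor in the outcome model. A minimal stdlib computation (the probit coefficient γ here is an assumed value for illustration, not an estimate):

```python
import math

# Inverse Mills ratio lambda(v) = phi(v) / Phi(v), the selection
# correction term in Heckman's two-step estimator. In the full
# procedure v = z * gamma comes from a first-stage probit; gamma
# below is assumed for illustration.
def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)

def norm_cdf(v):
    return 0.5 * (1 + math.erf(v / math.sqrt(2)))

def inverse_mills(v):
    return norm_pdf(v) / norm_cdf(v)

gamma = 0.8  # assumed probit coefficient on the selection covariate z
for z in (-1.0, 0.0, 1.0):
    print(z, round(inverse_mills(gamma * z), 3))
```

The ratio is large when selection into the observed sample is unlikely and shrinks as selection becomes near-certain; including it as a regressor absorbs the part of the outcome error correlated with selection, which is why the approach remains unbiased when data are missing not at random.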

Updated: 2019-12-09
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-04
C. M. W. Gaasterland; M. C. Jansen van der Weide; K. C. B. Roes; J. H. van der Lee

Goal Attainment Scaling (GAS) is an instrument intended to evaluate the effect of an intervention by assessing change in daily life activities on an individual basis. However, GAS has not been validated adequately in a randomised controlled trial (RCT) setting. In this paper we propose a conceptual validation plan for GAS in the setting of rare disease drug trials, and describe a hypothetical trial where GAS could be validated. We used the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) taxonomy to deduce which measurement properties of GAS can be evaluated, and how. As individual GAS scores cannot be interpreted outside the context of an RCT, the validation of GAS needs to be done at the trial as well as the individual level. The procedure of GAS consists of three steps. For goal selection (step 1) and definition of levels of attainment (step 2), face validity may be assessed by clinical experts. For the evaluation of goal attainment (step 3), the inter- and intra-rater reliability can be evaluated at an individual level. Construct validity may be evaluated by comparison with change scores on other instruments measuring in the same domain as particular goals, if available, and by testing hypotheses about differences between groups. A difference in mean GAS scores between a group receiving an efficacious intervention and a control group is an indication of well-chosen goals, and corroborates the construct validity of GAS at the trial level. Responsiveness of GAS cannot be evaluated due to the nature of the construct being assessed. GAS may be useful as an instrument to assess functional change as an outcome measure in heterogeneous chronic rare diseases, but it can only be interpreted and validated when used in RCTs with blinded outcome assessment. This proposed theoretical validation plan can be used as a starting point to validate GAS in specific conditions.

Updated: 2019-12-05
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-05
Christina Daskalopoulou; Martin Prince; Artemis Koukounari; Josep Maria Haro; Demosthenes B. Panagiotakos; A. Matthew Prina

In the absence of a consensus on the definition and measurement of healthy ageing, we created a healthy ageing index consistent with the functional ability framework provided by the World Health Organization. To create this index, we employed items of functional ability and intrinsic capacity. The current study aims to establish the predictive validity and discrimination properties of this healthy ageing index in Latin American settings that are part of the 10/66 cohort. The population-based cohort studies included 12,865 people ≥65 years old in catchment areas of Cuba, the Dominican Republic, Venezuela, Mexico and Peru. We employed latent variable modelling to estimate the healthy ageing score of each participant and grouped participants according to the quintiles of the healthy ageing score distribution. Cox proportional hazards models for mortality and sub-hazard (competing risks) models for incident dependence (i.e. needing care) were fitted per area after a median follow-up of 3.9 and 3.7 years, respectively. Results were pooled via fixed-effects meta-analysis, and our findings were compared with those obtained from self-rated health. Participants with the lowest level of healthy ageing, compared to participants with the highest level, had an increased risk of mortality and incident dependence, even after adjusting for sociodemographic and health conditions (HR: 3.25, 95%CI: 2.63–4.02; sub-HR: 5.21, 95%CI: 4.02–6.75). Healthy ageing scores, compared to self-rated health, had higher population attributable fractions (PAFs) for mortality (43.6% vs 19.3%) and incident dependence (58.6% vs 17.0%), and better discriminative power (Harrell’s c-statistic: mortality 0.74 vs 0.72; incident dependence 0.76 vs 0.70). These results provide evidence that our healthy ageing index, having demonstrated predictive and discriminative properties, could be a valuable tool for prevention strategies.
Further research in other cultural settings will assist in moving from a theoretical conceptualisation of healthy ageing to a more practical one.
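The discrimination comparison above uses Harrell's c-statistic: among usable pairs of subjects, the proportion in which the model assigns the higher risk to the subject who fails earlier. A minimal sketch for right-censored data (times, event flags and risk scores are invented; tied event times are simply skipped here, which a full implementation would handle more carefully):

```python
# Harrell's concordance index for right-censored survival data
# (illustrative; inputs below are invented). A pair is usable if the
# earlier time is an observed event; it is concordant if that earlier
# failure has the higher predicted risk.
def harrell_c(times, events, risks):
    concordant = ties = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # order the pair so subject a has the earlier time
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if times[a] == times[b] or not events[a]:
                continue  # not usable: tied times, or earlier time censored
            usable += 1
            if risks[a] > risks[b]:
                concordant += 1
            elif risks[a] == risks[b]:
                ties += 1
    return (concordant + 0.5 * ties) / usable

times  = [2, 5, 3, 8, 6]
events = [1, 0, 1, 1, 0]   # 1 = event observed, 0 = censored
risks  = [0.5, 0.3, 0.7, 0.2, 0.4]
print(round(harrell_c(times, events, risks), 2))  # 0.86: 6 of 7 usable pairs
```

A value of 0.5 indicates no discrimination and 1.0 perfect discrimination, which is the scale on which the 0.74-vs-0.72 and 0.76-vs-0.70 comparisons above are read.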

Updated: 2019-12-05
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-05
Christina Daskalopoulou; Kia-Chong Chua; Artemis Koukounari; Francisco Félix Caballero; Martin Prince; A. Matthew Prina

Our population is ageing, and by 2050 more than one out of five people will be 60 years or older, 80% of whom will be living in a low- and middle-income country. Living longer does not entail living healthier; however, there is no widely accepted measure of healthy ageing, hampering policy and research. The World Health Organization defines healthy ageing as the process of developing and maintaining the functional ability that enables well-being in older age. We aimed to create a healthy ageing index (HAI) in a subset of six low- and middle-income countries, part of the 10/66 study, by using items of functional ability and intrinsic capacity. The study sample included residents 65 years old and over (n = 12,865) from catchment area sites in Cuba, the Dominican Republic, Peru, Venezuela, Mexico and Puerto Rico. Items were collected by interviewing participants or key informants between 2003 and 2010. Two-stage factor analysis was employed, and we compared one-factor, second-order and bifactor models. The psychometric properties of the index, including reliability, replicability, unidimensionality and concurrent convergent validity, as well as measurement invariance by ethnic group and gender, were further examined in the best-fitting model. The bifactor model displayed superior model fit statistics, supporting that a general factor underlies the various items but that other subdomain factors are also needed. The HAI indicated excellent reliability (ω = 0.96, ωH = 0.84), replicability (H = 0.96), some support for unidimensionality (Explained Common Variance = 0.65) and some concurrent convergent validity with self-rated health. Scalar measurement invariance by ethnic group and gender was supported. A HAI with excellent psychometric properties was created by using items of functional ability and intrinsic capacity in a subset of six low- and middle-income countries. Further research is needed to explore sub-population differences and to validate this index in other cultural settings.

Updated: 2019-12-05
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-05
Di Shu; Jessica G. Young; Sengwee Toh

Multi-center studies can generate robust and generalizable evidence, but privacy considerations and legal restrictions often make it challenging or impossible to pool individual-level data across data-contributing sites. With binary outcomes, privacy-protecting distributed algorithms to conduct logistic regression analyses have been developed. However, the risk ratio often provides a more transparent interpretation of the exposure-outcome association than the odds ratio. Modified Poisson regression has been proposed to directly estimate adjusted risk ratios and produce confidence intervals with the correct nominal coverage when individual-level data are available. There are currently no distributed regression algorithms to estimate adjusted risk ratios while avoiding pooling of individual-level data in multi-center studies. By leveraging the Newton-Raphson procedure, we adapted the modified Poisson regression method to estimate multivariable-adjusted risk ratios using only summary-level information in multi-center studies. We developed and tested the proposed method using both simulated and real-world data examples. We compared its results with the results from the corresponding pooled individual-level data analysis. Our proposed method produced the same adjusted risk ratio estimates and standard errors as the corresponding pooled individual-level data analysis without pooling individual-level data across data-contributing sites. We developed and validated a distributed modified Poisson regression algorithm for valid and privacy-protecting estimation of adjusted risk ratios and confidence intervals in multi-center studies. This method allows computation of a more interpretable measure of association for binary outcomes, along with valid construction of confidence intervals, without sharing of individual-level data.
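The summary-level exchange behind the proposed algorithm can be sketched for a toy model with an intercept and one binary exposure (data and site composition invented; the published method also constructs the robust sandwich variance, omitted here). Each site evaluates its score vector and information matrix at the current coefficients; the coordinator sums them and takes the Newton-Raphson step, which is algebraically identical to fitting the pooled individual-level data:

```python
import math

# Distributed Newton-Raphson for modified Poisson regression (sketch).
# Model: log risk = b0 + b1 * x. Each site shares only its score vector
# and information (Hessian) matrix per iteration, never row-level data.
def site_summaries(data, b):
    s = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in data:
        mu = math.exp(b[0] + b[1] * x)   # fitted risk for this row
        r = y - mu
        s[0] += r
        s[1] += x * r
        H[0][0] += mu
        H[0][1] += x * mu
        H[1][1] += x * x * mu
    H[1][0] = H[0][1]
    return s, H

def solve2(H, s):
    # solve the 2x2 system H * step = s
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * s[0] - H[0][1] * s[1]) / det,
            (H[0][0] * s[1] - H[1][0] * s[0]) / det]

def fit(sites, iters=25):
    b = [0.0, 0.0]
    for _ in range(iters):
        s = [0.0, 0.0]
        H = [[0.0, 0.0], [0.0, 0.0]]
        for data in sites:               # each site sends only (s, H)
            s_k, H_k = site_summaries(data, b)
            s = [s[0] + s_k[0], s[1] + s_k[1]]
            H = [[H[i][j] + H_k[i][j] for j in range(2)] for i in range(2)]
        b = [b[0] + step for b[0], step in []] if False else [
            b[0] + solve2(H, s)[0], b[1] + solve2(H, s)[1]]
    return b

site1 = [(0, 0)] * 30 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 15
site2 = [(0, 0)] * 25 + [(0, 1)] * 5 + [(1, 0)] * 20 + [(1, 1)] * 10
b_dist = fit([site1, site2])
b_pool = fit([site1 + site2])            # pooled individual-level fit
print([round(v, 6) for v in b_dist])
print(all(abs(b_dist[i] - b_pool[i]) < 1e-9 for i in range(2)))
```

Here `exp(b1)` recovers the adjusted risk ratio, and only a 2-vector and a 2×2 matrix leave each site per iteration, which is the privacy-protecting property the study relies on.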

Updated: 2019-12-05
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-05
Arvind Oemrawsingh; Nikki van Leeuwen; Esmee Venema; Martien Limburg; Frank-Erik de Leeuw; Markus P. Wijffels; Aafke J. de Groot; Pieter H. E. Hilkens; Jan A. Hazelzet; Diederik W. J. Dippel; Carla H. Bakker; Helene R. Voogdt-Pruis; Hester F. Lingsma

Patient-Reported Outcome Measures (PROMs) have been proposed for benchmarking health care quality across hospitals, which requires extensive case-mix adjustment. The current study’s aim was to develop and compare case-mix models for mortality, a functional outcome, and a patient-reported outcome measure (PROM) in ischemic stroke care. Data from ischemic stroke patients, admitted to four stroke centers in the Netherlands between 2014 and 2016 with available outcome information (N = 1022), was analyzed. Case-mix adjustment models were developed for mortality, modified Rankin Scale (mRS) scores and EQ-5D index scores with respectively binary logistic, proportional odds and linear regression models with stepwise backward selection. Predictive ability of these models was determined with R-squared (R2) and area-under-the-receiver-operating-characteristic-curve (AUC) statistics. Age, NIHSS score on admission, and heart failure were the only common predictors across all three case-mix adjustment models. Specific predictors for the EQ-5D index score were sex (β = 0.041), socio-economic status (β = − 0.019) and nationality (β = − 0.074). R2-values for the regression models for mortality (5 predictors), mRS score (9 predictors) and EQ-5D utility score (12 predictors), were respectively R2 = 0.44, R2 = 0.42 and R2 = 0.37. The set of case-mix adjustment variables for the EQ-5D at three months differed considerably from the set for clinical outcomes in stroke care. The case-mix adjustment variables that were specific to this PROM were sex, socio-economic status and nationality. These variables should be considered in future attempts to risk-adjust for PROMs during benchmarking of hospitals.
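As an aside on the AUC statistic used to assess these case-mix models: it equals the Mann-Whitney probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal sketch with made-up predicted risks (not the study's data):

```python
import numpy as np

def auc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen event gets a higher
    predicted risk than a randomly chosen non-event (ties count 1/2)."""
    y_true = np.asarray(y_true)
    pos = np.asarray(y_score)[y_true == 1]
    neg = np.asarray(y_score)[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical predicted mortality risks from a case-mix model
y = [1, 0, 1, 0, 0, 1, 0, 0]
risk = [0.80, 0.30, 0.65, 0.20, 0.72, 0.70, 0.10, 0.40]
print(auc(y, risk))  # 13 of 15 case/non-case pairs correctly ordered
```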

Updated: 2019-12-05
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-05
Michael G. Smith; Maryam Witte; Sarah Rocha; Mathias Basner

Questionnaires are valuable data collection instruments in public health research, and can serve to pre-screen respondents for suitability in future studies. Survey non-response leads to reduced effective sample sizes and can decrease representativeness of the study population, so high response rates are needed to minimize the risk of bias. Here we present results on the success of different postal questionnaire strategies at effecting response, and the effectiveness of these strategies at recruiting participants for a field study on the effects of aircraft noise on sleep. In total, we mailed 17 rounds of 240 questionnaires (total n = 4080) to randomly selected households around Atlanta International Airport. Mailing rounds varied in the length of the questionnaire (11, 26 or 55 questions), survey incentive (gift card or $2 cash), number of follow-up waves (0, 2 or 3), incentive for participating in a 5-night in-home sleep study ($100, $150 or $200), and address personalization. We received completed questionnaires from 407 respondents (response rate 11.4%). Personalizing the address, enclosing a $2 cash incentive with the initial questionnaire mailing, and repeated follow-up mailings were effective at increasing the response rate. Despite the increased expense of these approaches per household mailed, the higher response rates meant that they were more cost-effective overall for obtaining an equivalent number of responses. Interest in participating in the field study decreased with age, but was unaffected by the mailing strategies or cash incentives for field study participation. The likelihood that a respondent would participate in the field study was unaffected by survey incentive, survey length, number of follow-up waves, field study incentive, age or sex.
Pre-issued cash incentives and sending follow-up waves could maximize the representativeness and numbers of people from which to recruit, and may be an effective strategy for improving recruitment into field studies.

Updated: 2019-12-05
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-04
Kaat Goorts; Charlotte Vanovenberghe; Charlotte Lambreghts; Eline Bruneel; Dorina Rusu; Marc Du Bois; Sofie Vandenbroeck; Lode Godderis

In the original publication of this article [1] the author Marc Du Bois was omitted. In this correction article the author and the corresponding details are provided. The publisher apologizes to the readers and authors for the inconvenience.

Updated: 2019-12-04
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-03
Ellen B. M. Elsman; Valerija Tadić; Carel F. W. Peeters; Ger H. M. B. van Rens; Jugnoo S. Rahi; Ruth M. A. van Nispen

To assess cross-cultural validity between Dutch and English versions of the FVQ_CYP, a patient-reported outcome measure developed in the United Kingdom (UK) for children and adolescents with (severe) visual impairment or blindness (VI for brevity) to measure functional vision. The 36-item FVQ_CYP was translated and adapted into Dutch using standard guidelines. The questionnaire was administered to Dutch children and adolescents aged 7–17 years (N = 253) with impaired vision (no restrictions regarding acuity). Data were compared to existing UK data of children and adolescents aged 10–15 years (N = 91) with VI (acuity LogMAR worse than 0.48). As with the original UK FVQ_CYP validation, a rating scale model (RSM) was applied to the Dutch data. Minor adaptations were needed during the translation rounds. Significant differences in item responses were found between the Dutch and UK data. Item response theory assumptions were met, but fit to the RSM was unsatisfactory. Therefore, the psychometric properties of the Dutch FVQ_CYP were analysed irrespective of the original model and criteria used. A graded response model led to the removal of 12 items due to missing data, low information, overlapping content and limited relevance to Dutch children. Fit indices for the remaining 24 items were adequate. Differences in population characteristics, distribution of responses, non-invariance at the model level and small sample sizes challenged the cross-cultural validation process. However, the Dutch adapted FVQ_CYP showed high measurement precision and broad coverage of items measuring children’s functional vision. The underlying reasons for differences between countries in instrument performance are discussed with implications for future studies.

Updated: 2019-12-03
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-03
Maeregu W. Arisido; Laura Antolini; Davide P. Bernasconi; Maria G. Valsecchi; Paola Rebora

The recent progress in medical research generates increasing interest in the use of longitudinal biomarkers for characterizing the occurrence of an outcome. The present work is motivated by a study whose objective was to explore the potential of long pentraxin 3 (PTX3) as a prognostic marker of acute graft-versus-host disease (GvHD) after haematopoietic stem cell transplantation. The time-varying covariate Cox model has commonly been used, despite its limiting assumptions that marker values are constant in time and measured without error. The joint model has been developed as a viable alternative; however, the approach is computationally intensive and requires additional strong assumptions, and the impact of misspecifying them has not been sufficiently studied. We conducted an extensive simulation to clarify the assumptions relevant to the understanding of joint models and to assess their robustness under key model misspecifications. Further, we characterized the extent of bias introduced by the limiting assumptions of the time-varying covariate Cox model and compared its performance with that of a joint model in various contexts. We then present results of the two approaches applied to evaluate the potential of PTX3 as a prognostic marker of GvHD after haematopoietic stem cell transplantation. Overall, we illustrate that a joint model provides an unbiased estimate of the association between a longitudinal marker and the hazard of an event in the presence of measurement error, showing improvement over the time-varying Cox model. However, the joint model is severely biased when the baseline hazard or the shape of the longitudinal trajectories is misspecified. Both the Cox model and the correctly specified joint model indicated PTX3 as a potential prognostic marker of GvHD, with the joint model providing a higher hazard ratio estimate. Joint models are beneficial for investigating the capability of a longitudinal marker to characterize a time-to-event endpoint. However, these benefits depend strictly on the correct specification of the longitudinal marker trajectory and the baseline hazard function, indicating that careful consideration of the assumptions is needed to avoid biased estimates.
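A much simpler linear-regression analogue conveys why an error-prone marker attenuates the estimated association, the same phenomenon that biases the time-varying Cox model here. The simulation below is purely illustrative (it is not the joint model, and the coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True marker value and an outcome that depends on it
x_true = rng.normal(0, 1, n)
y = 0.8 * x_true + rng.normal(0, 1, n)

# Observed marker = truth + measurement error (error variance = 1)
x_obs = x_true + rng.normal(0, 1, n)

slope_true = np.polyfit(x_true, y, 1)[0]  # uses the error-free marker
slope_obs = np.polyfit(x_obs, y, 1)[0]    # uses the noisy measurement

# Classical attenuation: the naive slope shrinks toward zero by the
# reliability ratio var(x) / (var(x) + var(error)) = 1/2 here
print(slope_true, slope_obs)
```

A joint model counters this by treating the observed values as noisy realizations of a smooth subject-specific trajectory, which is exactly why its validity hinges on specifying that trajectory correctly.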

Updated: 2019-12-03
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-03
Myra B. McGuinness; Jessica Kasza; Amalia Karahalios; Robyn H. Guymer; Robert P. Finger; Julie A. Simpson

Attrition due to death and non-attendance are common sources of bias in studies of age-related diseases. A simulation study is presented to compare two methods for estimating the survivor average causal effect (SACE) of a binary exposure (sex-specific dietary iron intake) on a binary outcome (age-related macular degeneration, AMD) in this setting. A dataset of 10,000 participants was simulated 1200 times under each scenario with outcome data missing dependent on measured and unmeasured covariates and survival. Scenarios differed by the magnitude and direction of effect of an unmeasured confounder on both survival and the outcome, and whether participants who died following a protective exposure would also die if they had not received the exposure (validity of the monotonicity assumption). The performance of a marginal structural model (MSM, weighting for exposure, survival and missing data) was compared to a sensitivity approach for estimating the SACE. As an illustrative example, the SACE of iron intake on AMD was estimated using data from 39,918 participants of the Melbourne Collaborative Cohort Study. The MSM approach tended to underestimate the true magnitude of effect when the unmeasured confounder had opposing directions of effect on survival and the outcome. Overestimation was observed when the unmeasured confounder had the same direction of effect on survival and the outcome. Violation of the monotonicity assumption did not increase bias. The estimates were similar between the MSM approach and the sensitivity approach assessed at the sensitivity parameter of 1 (assuming no survival bias). In the illustrative example, high iron intake was found to be protective of AMD (adjusted OR 0.57, 95% CI 0.40–0.82) using complete case analysis via traditional logistic regression. The adjusted SACE odds ratio did not differ substantially from the complete case estimate, ranging from 0.54 to 0.58 for each of the SACE methods. 
On average, MSMs with weighting for exposure, missing data and survival produced biased estimates of the SACE in the presence of an unmeasured survival-outcome confounder. The direction and magnitude of effect of unmeasured survival-outcome confounders should be considered when assessing exposure-outcome associations in the presence of attrition due to death.
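The weighting at the heart of an MSM can be sketched with inverse-probability-of-exposure weights alone (omitting the survival and missing-data weights the study also uses). The example below simulates a single measured confounder and fits the propensity model with a small hand-rolled Newton-Raphson; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

def expit(z):
    return 1 / (1 + np.exp(-z))

# Measured confounder L affects both exposure A and binary outcome Y
L = rng.normal(0, 1, n)
A = rng.binomial(1, expit(1.0 * L))
Y = rng.binomial(1, expit(-1.0 + 0.7 * A + 1.0 * L))

def fit_logistic(X, y, iters=50):
    """Tiny Newton-Raphson logistic regression (illustrative, no library)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        W = p * (1 - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
    return beta

# Propensity score for exposure given the measured confounder
X = np.column_stack([np.ones(n), L])
ps = expit(X @ fit_logistic(X, A))

# Inverse-probability weights build a pseudo-population in which A is
# independent of L, so the weighted contrast is unconfounded by L
w = A / ps + (1 - A) / (1 - ps)

naive = Y[A == 1].mean() - Y[A == 0].mean()              # confounded contrast
ipw = (np.sum(w * A * Y) / np.sum(w * A)
       - np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A)))  # weighted contrast
print(naive, ipw)
```

The weighted contrast moves well below the naive one because it removes the confounding by L; the paper's point is that an analogous unmeasured survivor-outcome confounder cannot be weighted away, which is where the sensitivity approach comes in.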

Updated: 2019-12-03
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-12-02
Jose Francisco Meneses-Echavez; Indira Rodriguez-Prieto; Mark Elkins; Javier Martínez-Torres; Lien Nguyen; Julia Bidonde

Exercise is an effective therapeutic intervention for cancer survivors. Concerns about the completeness of reporting of exercise interventions have been raised in the literature, but without any formal analysis. This study aimed to evaluate the completeness of reporting of exercise interventions for cancer survivors in a large sample of randomized clinical trials (RCTs). We developed a pre-defined protocol. We searched MEDLINE, EMBASE, and CENTRAL for exercise trials in oncology between 2010 and 2017. Pairs of independent researchers screened the records, extracted study characteristics, and assessed 16 items on the TIDieR checklist (i.e., the 12 items, with item 5 divided into two and item 8 divided into four). For each of these items, the percentage of interventions in the included studies that reported the item was calculated. We included 131 RCTs reporting 138 interventions in the analysis. Breast cancer was the most common type of cancer (69, 50%), and aerobic exercise was the most studied exercise modality (43, 30%) followed by combined aerobic and resistance training (40, 28%). Completeness of reporting ranged from 42 to 96% among the TIDieR items; none of the items was fully reported. ‘Intervention length’ was the most reported item across interventions (133, 96%), followed by ‘rationale’ (131, 95%), whereas ‘provider’ (58, 42%) and ‘how well (planned)’ (63, 46%) were the two least reported items. Half of the TIDieR items were completely reported in 50 to 70% of the interventions, and only four items were reported in more than 80% of the interventions (Items 2 and 8a to c). The seven items deemed to be core for replication (Items 3 to 9) exhibited a mean reporting of 71%, ranging from 42 to 96%. Exercise training interventions for cancer survivors are incompletely reported across RCTs published between 2010 and 2017. The reporting of information about the provider, materials, and modifications require urgent improvements. 
Stronger reporting will enhance usability of trial reports by both healthcare providers and survivors, and will help to reduce research waste.

Updated: 2019-12-02
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-11-29
Matthew Krouwel; Kate Jolly; Sheila Greenfield

Within qualitative research, in-person interviews have the reputation of being the highest standard of interviewer-participant encounter. However, there are other approaches to interviewing, such as telephone and e-mail, which may be appropriate for a variety of reasons such as cost, time and privacy. Although there has been much discussion of the relative values of different interview methods, little research has been conducted to assess what differentiates them using quantifiable measures. None of this research has addressed the video call, which is the interview mode most like the in-person interview. This study uses quantifiable measures generated by the interview to explore the relative value of in-person and video call interview modes. Interview data gathered by a qualitative research study exploring the views of people with irritable bowel syndrome (IBS) about hypnotherapy for their condition were used. In-person and video call interviews using the same topic guide were compared on measures of length (time and word count), the proportion of time the interviewer was dominant, the number of topics generated (codes), and the number of individual statements on which those topics were based. Both interview methods produced a similar number of words, and a similar number of topics (codes) were discussed; however, the number of statements upon which the variety of topics was based was notably larger for the in-person interviews. These findings suggest that, in this study, in-person interviews were marginally superior to video calls in that interviewees said more, although this was on a similar range of topics. However, the difference is sufficiently modest that time and budget constraints may justify the use of some video call interviews within a qualitative research study.

Updated: 2019-11-30
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-02
Laura Sheard; Claire Marsh

Longitudinal qualitative research is starting to be used in applied health research, having been popular in social research for several decades. There is potential for a large volume of complex data to be captured over a span of months or years across several different methods. How to analyse this volume of data, with its inherent complexity, represents a problem for health researchers. There has previously been a dearth of methodological literature describing an appropriate analytic process that can be readily employed. We document a worked example of the Pen Portrait analytic process, using the qualitative dataset for which the process was originally developed. Pen Portraits are recommended as a way in which longitudinal health research data can be concentrated into a focused account. The four stages of undertaking a pen portrait are: 1) understand and define what to focus on; 2) design a basic structure; 3) populate the content; 4) interpretation. Instructive commentary and guidance are given throughout, with consistent reference to the original study for which Pen Portraits were devised. The Pen Portrait analytic process was developed by the authors, borne out of a need to effectively integrate multiple qualitative methods collected over time. Pen Portraits are intended to be adaptable and flexible, in order to meet the differing analytic needs of qualitative longitudinal health studies. The Pen Portrait analytic process provides a useful framework to enable researchers to conduct a robust analysis of multiple sources of qualitative data collected over time.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-05
Ognjen Barcot; Matija Boric; Tina Poklepovic Pericic; Marija Cavar; Svjetlana Dosenovic; Ivana Vuka; Livia Puljak

Assessing the risk of bias (RoB) in included studies is one of the key methodological aspects of systematic reviews. Cochrane systematic reviews appraise the RoB of randomised controlled trials (RCTs) with the Cochrane RoB tool. Detailed instructions for using the Cochrane RoB tool are provided in the Cochrane Handbook for Systematic Reviews of Interventions (the Cochrane Handbook). The purpose of this study was to analyse whether Cochrane authors make adequate judgments about the RoB for random sequence generation of RCTs included in Cochrane reviews. We extracted authors’ judgments (high, low or unclear RoB) and supports for judgments (comments accompanying judgments which explain the rationale for a judgment) for random sequence generation of included RCTs from the RoB tables of Cochrane reviews using automated data scraping. We categorised all supporting comments, analysed the number and type of various supporting comments and assessed the adequacy of RoB judgments for randomisation in line with recommendations from the Cochrane Handbook. We analysed 10,103 RCTs that were included in 704 Cochrane reviews. For 5,706 RCTs, randomisation was not described; for the remaining RCTs, it was indicated that randomisation was performed using computer/software/internet (N = 2,850), a random number table (N = 883), a mechanical method (N = 359), or that it was incomplete/inappropriate (N = 305). Overall, 1,220/10,103 trials (12%) did not have a RoB judgment in line with the Cochrane Handbook guidance about randomisation. The highest proportion of misjudgements was found for trials with high RoB (28%), followed by those with low (20%) or unclear (3%) RoB. Therefore, one in eight judgments for the analysed domain in Cochrane reviews was not in line with the Cochrane Handbook, rising to one in four when the judgment was "high risk".
Authors of Cochrane reviews often make judgments about the RoB related to random sequence generation that are not in line with instructions given in the Cochrane Handbook, which compromises the reliability of the systematic reviews. Our results can help authors of both Cochrane and non-Cochrane reviews which use Cochrane RoB tool to avoid making common mistakes when assessing RoB in included trials.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-07
Simone Dahrouge; Catherine Deri Armstrong; William Hogg; Jatinderpreet Singh; Clare Liddy

Participants in voluntary research present a different demographic profile than those who choose not to participate, affecting the generalizability of many studies. Efforts to evaluate these differences have faced challenges, as little information is available from non-participants. Leveraging data from a recent randomized controlled trial that used health administrative databases in a jurisdiction with universal medical coverage, we sought to compare the quality of care provided by participating and non-participating physicians prior to the program’s implementation in order to assess whether participating physicians provided a higher baseline quality of care. We conducted clustered regression analyses of baseline data from provincial health administrative databases. Participants included all family physicians who were eligible to participate in the Improved Delivery of Cardiovascular Care (IDOCC) project, a quality improvement project rolled out in a geographically defined region in Ontario (Canada) between 2008 and 2011. We assessed 14 performance indicators representing measures of access, continuity, and recommended care for cancer screening and chronic disease management. In unadjusted and patient-adjusted models, patients of IDOCC-participating physicians had higher continuity scores at the provider (Odds Ratio (OR) [95% confidence interval]: 1.06 [1.03–1.09]) and practice (1.06 [1.04–1.08]) level, lower risk of emergency room visits (Rate Ratio (RR): 0.93 [0.88–0.97]) and hospitalizations (RR: 0.87 [0.77–0.99]), and were more likely to have received recommended diabetes tests (OR: 1.25 [1.06–1.49]) and cancer screening for cervical cancer (OR: 1.32 [1.08–1.61]) and breast cancer (OR: 1.32 [1.19–1.46]) than patients of non-participating physicians. Some indicators remained statistically significant in the model after adjusting for provider factors. Our study demonstrated a participation bias for several quality indicators.
Physician characteristics can explain some of these differences. Other underlying physician or practice attributes also influence interest in participating in quality improvement initiatives and existing quality levels. The standard for addressing participation bias by controlling for basic physician and practice level variables is inadequate for ensuring that results are generalizable to primary care providers and practices.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-07
Michelle S. Fitts; Taeha Condon; John Gilroy; Katrina Bird; Erica Bleakley; Lauren Matheson; Jennifer Fleming; Alan R. Clough; Adrian Esterman; Paul Maruff; India Bohanna

Hospitals are common recruitment sites for injury and disability studies. However, the clinical and rehabilitation environment can create unique challenges for researchers to recruit participant populations. While there is growing injury and disability focused research involving Indigenous people to understand the types of services and supports required by this population to enhance their recovery experiences, there is limited knowledge of researchers’ experiences implementing recruitment processes in the tertiary hospital environment. This paper reflects on the specific challenges of recruiting Indigenous patients following a traumatic brain injury from two tertiary hospitals in Northern Australia. Between July 2016 and April 2018, research staff recruited eligible patients from one hospital in Queensland and one hospital in the Northern Territory. Qualitative records summarising research staff contact with patients, family members and clinical hospital staff were documented. These qualitative records, in addition to field trip notes and researcher reflections were reviewed to summarise the main challenges in gaining access to patients who fit the eligibility criteria. During the recruitment process, there were five main challenges encountered: (1) Patients discharging against medical advice from hospital; (2) Discharge prior to formal emergence from Post Traumatic Amnesia as per the Westmead Post Trauma Amnesia Scale; (3) Patients under adult guardianship orders; (4) Narrow participant eligibility criteria and (5) Coordinating around patient commitments and treatment. Details of how the recruitment processes were modified throughout the recruitment phase of the study to ensure greater access to patients that met the criteria are described. Based on our recruitment experiences, several recommendations are proposed for future TBI studies with Indigenous Australians. 
In addition to treatment, Indigenous TBI patients have a wide range of needs that must be addressed while in hospital. Patient engagement and data collection processes should be flexible to respond to patient needs and the hospital environment. Employment of a centralized recruiter at each hospital site may help to minimise the challenges researchers need to navigate in the hospital environment. To improve recruitment processes in hospitals, it is essential for researchers examining other health or injury outcomes to describe their recruitment experiences.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-09
Tania Huria; Suetonia C. Palmer; Suzanne Pitama; Lutz Beckert; Cameron Lacey; Shaun Ewen; Linda Tuhiwai Smith

Research reporting guidelines are increasingly commonplace and have been shown to improve the quality of published health research and health outcomes. Despite severe health inequities among Indigenous Peoples and the potential for research to address the causes, there is an extended legacy of health research exploiting Indigenous Peoples. This paper describes the development of the CONSolIDated critERia for strengthening the reporting of health research involving Indigenous Peoples (CONSIDER) statement. A collaborative prioritization process was conducted based on national and international statements and guidelines about Indigenous health research from the following nations (Peoples): Australia (Aboriginal and Torres Strait Islanders), Canada (First Nations Peoples, Métis), Hawaii (Native Hawaiian), New Zealand (Māori), Taiwan (Taiwan Indigenous Tribes), United States of America (First Nations Peoples) and Northern Scandinavian countries (Sami). A review of seven research guidelines was completed, and meta-synthesis was used to construct a reporting guideline checklist for transparent and comprehensive reporting of research involving Indigenous Peoples. A list of 88 possible checklist items was generated, reconciled, and categorized. Eight research domains and 17 criteria for the reporting of research involving Indigenous Peoples were identified. The research reporting domains were: (i) governance; (ii) relationships; (iii) prioritization; (iv) methodologies; (v) participation; (vi) capacity; (vii) analysis and findings; and (viii) dissemination. The CONSIDER statement is a collaborative synthesis and prioritization of national and international research statements and guidelines. The CONSIDER statement provides a checklist for the reporting of health research involving Indigenous Peoples to strengthen research praxis and advance Indigenous health outcomes.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-09
Xiaoxue Chen; Abiy Agiro; Ann S. Martin; Ann M. Lucas; Kevin Haynes

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-14
Layla Parast; Megan Mathews; Mark W. Friedberg

Dynamic risk models, which incorporate disease-free survival and repeated measurements over time, might yield more accurate predictions of future health status compared to static models. The objective of this study was to develop and apply a dynamic prediction model to estimate the risk of developing type 2 diabetes mellitus. Both a static prediction model and a dynamic landmark model were used to provide predictions over a 2-year horizon, updated at 1, 2, and 3 years post-baseline; i.e., predicting diabetes-free survival to 2 years from baseline, and to 3, 4, and 5 years post-baseline given that the patient had already survived 1, 2, and 3 years post-baseline, respectively. Prediction accuracy was evaluated at each time point using robust non-parametric procedures. Data from 2057 participants of the Diabetes Prevention Program (DPP) study (1027 in the metformin arm, 1030 in the placebo arm) were analyzed. The dynamic landmark model demonstrated good prediction accuracy, with area under the curve (AUC) estimates ranging from 0.645 to 0.752 and Brier score estimates ranging from 0.088 to 0.135. Relative to a static risk model, the dynamic landmark model did not significantly differ in terms of AUC but had significantly lower (i.e., better) Brier score estimates for predictions at 1, 2, and 3 years post-baseline (e.g. 0.167 versus 0.099; difference −0.068, 95% CI −0.083 to −0.053, at 3 years in the placebo group). Dynamic prediction models based on longitudinal, repeated risk factor measurements have the potential to improve the accuracy of future health status predictions.
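The Brier score reported here is simply the mean squared difference between predicted risk and the observed binary outcome (the study additionally adjusts for censoring with robust non-parametric procedures, which is omitted in this sketch). Hypothetical numbers:

```python
import numpy as np

def brier(y_obs, p_hat):
    """Brier score: mean squared error of probabilistic predictions.
    0 is perfect; an uninformative constant prediction at the event
    rate p scores p*(1-p). Censoring adjustments are omitted here."""
    y_obs, p_hat = np.asarray(y_obs, float), np.asarray(p_hat, float)
    return np.mean((y_obs - p_hat) ** 2)

# Hypothetical 2-year diabetes-onset indicators and model-based risks
y = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
p_static = np.full(10, y.mean())        # constant "static" guess at the rate
p_dynamic = np.array([0.1, 0.2, 0.7, 0.1, 0.6, 0.2, 0.1, 0.8, 0.2, 0.1])

print(brier(y, p_static), brier(y, p_dynamic))
```

The constant prediction scores 0.3 × 0.7 = 0.21, while the individualized risks score 0.045: the same direction of improvement (lower Brier score for the dynamic model) the study reports, though with made-up numbers.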

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-16
Ray Pawson

The paper opens with a brief history of two of the major intellectual components of the recent utilitarian turn in clinical research, namely ‘pragmatic trials’ and ‘implementation science’. The two schools of thought developed independently, and the paper scrutinises their mutual compatibilities and incompatibilities, asking: i) what do the leading advocates of pragmatic trials assume about the transfer of research findings to real-world practice, and ii) what role can and should pragmatic trials play in the evaluation of implementation science strategies. The paper utilises ‘explication de texte’: i) providing a close reading of the inferential logics contained in major published expositions of the two paradigms, and ii) interrogating the conclusions of a pragmatic trial of an intervention providing guidelines on retinal screening aimed at family practitioners. The paper is in two parts. Part 1 unearths some significant incommensurability: the pragmatic trial literature retains an antiquated view of knowledge transfer and is overly optimistic about the wide applicability of the findings of pragmatic trials to ‘real world’ conditions. Part 2 of the paper outlines an empirical strategy to better penetrate the mechanisms of knowledge transfer and to tackle the issue of the generalisability of research findings in implementation science. Pragmatism, classically, is about problem solving and the melding of perspectives. The core research requirement in implementation science is a fundamental shift from the narrow shoulders of pragmatic trials to a model of explanation building based upon a multi-case, multi-method body of evidence.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-19
Patricia Luhn; Deborah Kuk; Gillis Carrigan; Nathan Nussbaum; Rachael Sorg; Rebecca Rohrer; Melisa G. Tucker; Brandon Arnieri; Michael D. Taylor; Neal J. Meropol

The use of real-world data to generate evidence requires careful assessment and validation of critical variables before drawing clinical conclusions. Prospective clinical trial data suggest that anatomic origin of colon cancer impacts prognosis and treatment effectiveness. As an initial step in validating this observation in routine clinical settings, we explored the feasibility and accuracy of obtaining information on tumor sidedness from electronic health record (EHR) billing codes. A total of 9,403 patients with metastatic colorectal cancer (mCRC) were selected from the Flatiron Health database, which is derived from de-identified EHR data; this study analyzed a random sample of 200 of these patients. Tumor site data derived from International Classification of Diseases (ICD) codes were compared with data abstracted from unstructured documents in the EHR (e.g. surgical and pathology notes). Concordance was determined via observed agreement and Cohen’s kappa coefficient (κ). Accuracy of ICD codes for each tumor site (left, right, transverse) was determined by calculating the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and corresponding 95% confidence intervals, using abstracted data as the gold standard. Study patients had similar characteristics and side of colon distribution compared with the full mCRC dataset. The observed agreement between the ICD codes and abstracted data for tumor site for all sampled patients was 0.58 (κ = 0.41). When restricting to the 62% of patients with a side-specific ICD code, the observed agreement was 0.84 (κ = 0.79). The specificity (92–98%) of structured data for tumor location was high, with lower sensitivity (49–63%), PPV (64–92%) and NPV (72–97%). Demographic and clinical characteristics were similar between patients with specific and non-specific side of colon ICD codes.
ICD codes are a highly reliable indicator of tumor location when the specific location code is entered in the EHR. However, non-specific side of colon ICD codes are present for a sizable minority of patients, and structured data alone may not be adequate to support testing of some research hypotheses. Careful assessment of key variables is required before determining the need for clinical abstraction to supplement structured data in generating real-world evidence from EHRs.
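The concordance measures used above (observed agreement and Cohen's kappa) can be sketched as follows. The counts are purely illustrative, not the study's data; the category labels mirror the left/right/transverse tumor sites.

```python
from collections import Counter

def agreement_and_kappa(pairs):
    """Observed agreement and Cohen's kappa for paired categorical
    ratings, e.g. ICD-derived vs. chart-abstracted tumor side."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement computed from the marginal category frequencies.
    a_freq = Counter(a for a, _ in pairs)
    b_freq = Counter(b for _, b in pairs)
    expected = sum(a_freq[c] * b_freq.get(c, 0) for c in a_freq) / n ** 2
    return observed, (observed - expected) / (1 - expected)

# Hypothetical sample of 100 patients (ICD side, abstracted side).
pairs = ([("left", "left")] * 50 + [("right", "right")] * 30
         + [("left", "right")] * 10 + [("transverse", "left")] * 10)
obs, k = agreement_and_kappa(pairs)
print(round(obs, 2), round(k, 2))  # 0.8 0.62
```

Kappa discounts the agreement expected by chance from the marginals, which is why it is lower than raw agreement, as in the study's 0.58 agreement versus κ = 0.41.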

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-20
D. L. Katz; M. C. Karlsen; M. Chung; M. M. Shams-White; L. W. Green; J. Fielding; A. Saito; W. Willett

Current methods for assessing strength of evidence prioritize the contributions of randomized controlled trials (RCTs). The objective of this study was to characterize strength of evidence (SOE) tools in recent use, to identify their application to lifestyle interventions for improved longevity, vitality, or successful aging, and to assess the implications of the findings. The search strategy was created in PubMed and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline, supplemented by manual searching. Systematic reviews and meta-analyses of intervention trials or observational studies relevant to lifestyle intervention were included if they used a specified SOE tool. Data were collected for each SOE tool. The conditions necessary for assigning the highest SOE grading, and the treatment of prospective cohort studies within each SOE rating framework, were summarized. The expert panel convened to discuss the implications of the findings for assessing evidence in the domain of lifestyle medicine. A total of 15 unique tools were identified. Ten were tools developed and used by governmental agencies or other equivalent professional bodies and were applicable in a variety of settings. Of these 10, four require consistent results from RCTs of high quality to award the highest rating of evidence. Most SOE tools include prospective cohort studies only to note their secondary contribution to overall SOE as compared to RCTs. We developed a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), to illustrate the feasibility of a tool based on the specific contributions of diverse research methods to understanding the lifetime effects of health behaviors. Assessment of evidence relevant to lifestyle medicine requires a potential adaptation of SOE approaches when outcomes and/or exposures obviate exclusive or preferential reliance on RCTs. 
This systematic review was registered with the International Prospective Register of Systematic Reviews, PROSPERO [CRD42018082148].

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-20
Helen Eke; Astrid Janssens; Johnny Downs; Richard M. Lynn; Cornelius Ani; Tamsin Ford

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-20
Chiara Trevisiol; Michela Cinquini; Aline S. C. Fabricio; Massimo Gion; Anne W. S. Rutjes

The use of systematic review methods is widely recognized as essential in developing recommendations for clinical practice guidelines and in establishing their trustworthiness. The objective of this study was to assess the use of systematic search methods by authors of guidelines published in the oncology field. We analyzed 590 guidance documents identified in PubMed, NGC, GIN and web sites for guidelines in 2009–2015 in oncology. The main outcome measure was the incidence of guidance documents supported by a systematic search of the literature. In addition to descriptive analyses, logistic regression was used to evaluate whether adequate search methods were explained by guideline characteristics. Of the 590 guidance documents included in the study, 305 (51.7%) declared the use of systematic search methods, but only 168 (28.5%) applied methods meeting minimum quality standards and provided sufficient details to allow classification. 164 (27.8%) guidance documents did not report any use of literature evaluation. Guidance documents produced by a Government Agency in North America (OR 2.16, 95% CI 1.16–4.17) and those with a focused scope (OR 2.35, 95% CI 0.97–5.56) were positively associated with the use of systematic search methods. We found no association between year of publication and use of systematic search methods. A relatively small number of guidance documents were informed by scientific evidence identified through adequate systematic search methods. We observed substantial room for improvement in the methods applied and their reporting, especially in documents with a broad focus, or those produced by professional societies or independent expert panels in continents other than North America.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-08-29
Janet Ige; Lynn Gibbons; Issy Bray; Selena Gray

Loneliness and social isolation are major determinants of mental wellbeing, especially among older adults. The effectiveness of interventions to address loneliness and social isolation among older adults has been questioned due to the lack of transparency in identifying and recruiting populations at risk. This paper aims to systematically review the methods used to identify and recruit older people at risk of loneliness and social isolation into research studies that seek to address loneliness and social isolation. In total, 751 studies were identified from a structured search of eleven electronic databases, combined with hand searching of the reference bibliographies of identified studies for grey literature. Studies conducted between January 1995 and December 2017 were eligible provided they recruited community-living individuals aged 50 and above at risk of social isolation or loneliness into an intervention study. A total of 22 studies were deemed eligible for inclusion. Findings from these studies showed that the most common strategies for inviting people to participate in intervention studies were public-facing methods, including mass media and local newspaper advertisements. The majority of participants identified this way were self-referred, and in many cases self-identified as lonely. In most cases, there was no standardised tool for defining loneliness or social isolation. However, studies that recruited via referral by recognised agencies reported higher rates of eligibility and enrolment. Referrals from primary care were used in only a few studies. Studies that included agency referral, either alone or in combination with multiple forms of recruitment, showed more promising recruitment rates than those that relied only on public-facing methods. Further research is needed to establish the cost-effectiveness of multiple forms of referral. 
Findings from this study demonstrate the need for transparency in writing up the methods used to approach, assess and enrol older adults at risk of becoming socially isolated. None of the intervention studies included in this review justified their recruitment strategies. The ability of researchers to share best practice relies greatly on the transparency of research.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-09-02
Shannon Cope; Dieter Ayers; Jie Zhang; Katharine Batt; Jeroen P. Jansen

Long-term clinical outcomes are necessary to assess the cost-effectiveness of new treatments over a lifetime horizon. Without long-term clinical trial data, current practice to extrapolate survival beyond the trial period involves fitting alternative parametric models to the observed survival. Choosing the most appropriate model is based on how well each model fits the observed data. Supplementing trial data with feedback from experts may improve the plausibility of survival extrapolations. We demonstrate the feasibility of formally integrating long-term survival estimates from experts with empirical clinical trial data to provide more credible extrapolated survival curves. The case study concerned long-term survival with tisagenlecleucel (a chimeric antigen receptor T-cell [CAR-T] therapy) in relapsed or refractory B-cell pediatric and young adult acute lymphoblastic leukemia (r/r pALL), with evidence from the phase II ELIANA trial. Seven pediatric oncologists and hematologists experienced with CAR-T therapies were recruited. Relevant evidence regarding r/r pALL and tisagenlecleucel provided a common basis for expert judgments. Survival rates and the related uncertainty at 2, 3, 4, and 5 years were elicited from experts using a web-based application adapted from the Sheffield Elicitation Framework. Estimates from each expert were combined with observed data using time-to-event parametric models that accounted for the experts' uncertainty, producing an overall distribution of survival over time. These results were validated against longer-term follow-up (median duration 24.2 months) from ELIANA obtained after the elicitation. Extrapolated survival curves based on the ELIANA trial without expert information were highly uncertain, differing substantially depending on the model choice. Survival estimates between 2 and 5 years from individual experts varied, with a fair amount of uncertainty. However, incorporating expert estimates improved the precision of the extrapolated survival curves. 
Predictions from a Gompertz model, which experts believed was most appropriate, suggested that more than half of the ELIANA patients treated with tisagenlecleucel will survive up to 5 years. Expert estimates at 24 months were validated by longer follow-up. This study provides an example of how expert opinion can be elicited and synthesized with observed survival data using a transparent and formal procedure, capturing expert uncertainty, and ensuring projected long-term survival is clinically plausible.
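As a rough illustration of the extrapolation step, a Gompertz survival function with a negative shape parameter plateaus at a nonzero long-term survival fraction, which is the qualitative behaviour consistent with "more than half of patients surviving to 5 years". The parameter values below are assumptions for illustration, not the values fitted to ELIANA.

```python
import math

def gompertz_survival(t, rate, shape):
    """Gompertz survival S(t) = exp(-(rate/shape) * (exp(shape*t) - 1)).
    With shape < 0, S(t) plateaus at exp(rate/shape) as t grows,
    i.e. a long-term survivor fraction."""
    return math.exp(-(rate / shape) * (math.exp(shape * t) - 1.0))

# Hypothetical parameters on a per-year time scale.
rate, shape = 0.3, -0.5
five_year = gompertz_survival(5.0, rate, shape)
plateau = math.exp(rate / shape)  # limit as t -> infinity
```

With these assumed parameters the curve levels off near exp(-0.6) ≈ 0.55, so just over half of patients survive long term. This plateau behaviour is why the shape of the chosen parametric family, and not only its fit to the observed window, drives lifetime cost-effectiveness results.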

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-09-02
Michail Belias; Maroeska M. Rovers; Johannes B. Reitsma; Thomas P. A. Debray; Joanna IntHout

Individual participant data meta-analysis (IPD-MA) is considered the gold standard for investigating subgroup effects. Frequently used regression-based approaches to detect subgroups in IPD-MA are: meta-regression, per-subgroup meta-analysis (PS-MA), meta-analysis of interaction terms (MA-IT), naive one-stage IPD-MA (ignoring potential study-level confounding), and centred one-stage IPD-MA (accounting for potential study-level confounding). Clear guidance on these analyses is lacking, and clinical researchers may use approaches with suboptimal efficiency to investigate subgroup effects in an IPD setting. Therefore, our aim is to review and compare the aforementioned methods and to recommend which should be preferred. We conducted a simulation study in which we generated IPD of randomised trials and varied the magnitude of the subgroup effect (0, 25, 50% relative reduction), between-study treatment effect heterogeneity (none, medium, large), ecological bias (none, quantitative, qualitative), sample size (50, 100, 200), and number of trials (5, 10) for binary, continuous and time-to-event outcomes. For each scenario, we assessed the power, false positive rate (FPR) and bias of the aforementioned five approaches. Naive and centred IPD-MA yielded the highest power, whilst preserving an acceptable FPR around the nominal 5% in all scenarios. Centred IPD-MA showed slightly less biased estimates than naive IPD-MA. Similar results were obtained for MA-IT, except when analysing binary outcomes (where it yielded less power and FPR < 5%). PS-MA showed similar power to MA-IT in non-heterogeneous scenarios, but its power collapsed as heterogeneity increased, and decreased even more in the presence of ecological bias. PS-MA suffered from too high FPRs in non-heterogeneous settings and showed biased estimates in all scenarios. Meta-regression showed poor power (< 20%) in all scenarios and completely biased results in settings with qualitative ecological bias. 
Our results indicate that subgroup detection in IPD-MA requires careful modelling. Naive and centred IPD-MA performed equally well, but due to less bias of the estimates in the presence of ecological bias, we recommend the latter.
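The within-trial centring that distinguishes the centred from the naive one-stage model can be sketched as below. The data and variable names are illustrative; the idea is that the centred interaction carries only within-trial information, while the trial-mean terms absorb across-trial (ecological) differences.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Illustrative IPD from three trials: arm indicator and a
# participant-level effect modifier x.
ipd = pd.DataFrame({
    "trial": np.repeat([1, 2, 3], 100),
    "treat": rng.integers(0, 2, 300),
    "x": rng.normal(size=300),
})

# Centre the effect modifier within each trial, then split the
# treatment-covariate interaction into within- and across-trial parts.
ipd["x_mean"] = ipd.groupby("trial")["x"].transform("mean")
ipd["x_centred"] = ipd["x"] - ipd["x_mean"]
ipd["inter_within"] = ipd["treat"] * ipd["x_centred"]
ipd["inter_across"] = ipd["treat"] * ipd["x_mean"]
```

In the centred one-stage model both interaction terms enter the regression, but only the coefficient on the within-trial term is interpreted as the subgroup effect, which is what shields it from ecological bias.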

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-09-05
José M. Ramos-Rincón; Héctor Pinargote-Celorio; Isabel Belinchón-Romero; Gregorio González-Alcaide

This article describes a bibliometric review of the scientific production, geographical distribution, collaboration, impact, and subject area focus of pneumonia research indexed on the Web of Science over a 15-year period. We searched the Web of Science database using the Medical Subject Heading (MeSH) term “Pneumonia” from January 1, 2001 to December 31, 2015. We studied only original articles and reviews, analyzing descriptive indicators by five-year periods and scientific production by country, adjusting for population, economic, and research-related parameters. A total of 22,694 references were retrieved. The number of publications increased steadily over time, from 981 publications in 2001 to 1977 publications in 2015 (R2 = 0.956). The most productive country was the USA (38.49%), followed by the UK (7.18%) and Japan (5.46%). Research production from China increased by more than 1000%. By geographical area, North America (42.08%) and Europe (40.79%) were most dominant. Scientific production in low- and middle-income countries more than tripled, although their overall contribution to the field remained limited (< 15%). Overall, 18.8% of papers were the result of an international collaboration, although this proportion was much higher in sub-Saharan Africa (46.08%) and South Asia (23.43%). According to the specific MeSH terms used, articles focused mainly on “Pneumonia, Bacterial” (19.99%), followed by “Pneumonia, Pneumococcal” (7.02%) and “Pneumonia, Ventilator-Associated” (6.79%). Pneumonia research increased steadily over the 15-year study period, with Europe and North America leading scientific production. About a fifth of all papers reflected international collaborations, and these were most evident in papers from sub-Saharan Africa and South Asia.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-09-05
Curt Hagquist

An overarching objective in research comparing different sample groups is to ensure that the reported differences in outcomes are not affected by differences between groups in the functioning of the measurement instruments; that is, the items have to work in the same way for the different sample groups being compared. Lack of invariance across sample groups is commonly called Differential Item Functioning (DIF). The DIF of an item can be accounted for by resolving (splitting) the item into group-specific items, rather than deleting the item. Resolving improves fit, retains the reliability and content provided by the item, and compensates for the DIF in the estimation of person parameters on the scale of the instrument. However, it destroys the invariance of the item’s parameter value among the groups. Whether or not a DIF item should be resolved depends on whether the source of the DIF is relevant or irrelevant to the content of the variable. The present paper shows how external information can be used to investigate whether the gender DIF found in the item “Stomach ache” in a psychosomatic symptoms scale used among adolescents may reflect abdominal pain due to a biological factor, the girls’ menstrual periods. Swedish data from the international Health Behaviour in School-aged Children (HBSC) study collected in 2005/06, 2009/10 and 2013/14 were used, comprising a total of 18,983 students in grades 5, 7 and 9. A composite measure of eight items of psychosomatic problems was analysed for DIF with respect to gender and menstrual periods using the Rasch model. The results support the hypothesis that the source of the gender DIF for the item “Stomach ache” is a gender-specific biological factor. In that case the DIF should be resolved if the psychosomatic measure is not intended to tap information about abdominal pain caused by a gender-specific biological factor. 
In contrast, if the measure is intended to tap such information, the DIF should not be resolved. The conceptualisation of the measure governs whether the item showing DIF should be resolved or not.

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-09-11
John Ferguson; Neil O’Leary; Fabrizio Maturo; Salim Yusuf; Martin O’Donnell

Population attributable fractions (PAF) measure the proportion of disease prevalence that would be avoided in a hypothetical population, similar to the population of interest, but in which a particular risk factor is eliminated. They are extensively used in epidemiology to quantify and compare disease burden due to various risk factors, and directly influence public policy regarding possible health interventions. In contrast to individual-specific metrics such as relative risks and odds ratios, attributable fractions depend jointly on both risk factor prevalence and relative risk. The relative contributions of these two components are important, and usually need to be presented in summary tables alongside the attributable fraction calculation. However, representing PAF in an accessible graphical format that captures both prevalence and relative risk may assist interpretation. Taylor-series approximations to PAF in terms of risk factor prevalence and log-odds ratio are derived that facilitate simultaneous representation of PAF, risk factor prevalence and risk-factor/disease log-odds ratios on a single co-ordinate axis. Methods are developed for binary, multi-category and continuous exposure variables. The methods are demonstrated using INTERSTROKE, a large international case-control dataset focused on risk factors for stroke. The described methods could be used as a complement to tables summarizing prevalence, odds ratios and PAF, and may convey the same information in a more intuitive and visually appealing manner. The suggested nomogram can also be used to visually estimate the effects of health interventions that only partially reduce risk factor prevalence. Finally, in the binary risk factor case, the approximations can also be used to quickly convert logistic regression coefficients for a risk factor into approximate PAFs.
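For the binary-exposure case, the exact PAF that the paper's Taylor-series approximations build on is Levin's formula. The sketch below shows it together with the rough logistic-coefficient conversion mentioned in the abstract; it assumes a rare outcome, so that the odds ratio approximates the relative risk, and the numbers are hypothetical.

```python
import math

def paf_binary(prevalence, relative_risk):
    """Levin's formula for a binary risk factor:
    PAF = p*(RR - 1) / (1 + p*(RR - 1))."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Example: 30% exposure prevalence, relative risk 2.0.
print(round(paf_binary(0.3, 2.0), 3))  # 0.231

# Rough conversion of a logistic regression coefficient to a PAF,
# valid when the outcome is rare so that OR ~= RR.
beta = 0.69  # hypothetical log-odds ratio for the risk factor
approx_paf = paf_binary(0.3, math.exp(beta))
```

The formula makes the joint dependence explicit: halving either the prevalence or the excess risk (RR − 1) shrinks the numerator, which is exactly the trade-off the proposed nomogram is designed to display.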

Updated: 2019-11-28
• BMC Med. Res. Methodol. (IF 2.509) Pub Date : 2019-09-18
Camille Aupiais; Corinne Alberti; Thomas Schmitz; Olivier Baud; Moreno Ursino; Sarah Zohar

When conducting a Phase-III trial, regulatory agencies and investigators may want reliable information about rare but serious safety outcomes during the trial. Bayesian non-inferiority approaches have been developed, but they commonly rely on historical placebo-controlled data to define the margin, depend on a single final analysis, and provide no recommendation for defining the prespecified decision threshold. In this study, we propose a non-inferiority Bayesian approach for the sequential monitoring of rare dichotomous safety events that incorporates experts’ opinions on margins. A Bayesian decision criterion was constructed to monitor four safety events during a non-inferiority trial conducted on pregnant women at risk of premature delivery. Based on expert elicitation, margins were built using mixtures of beta distributions that preserve the experts’ variability. Non-informative and informative prior distributions and several decision thresholds were evaluated through an extensive sensitivity analysis. The parameters were selected so as to keep two rates of misclassification under prespecified rates: that of trials wrongly concluding an unacceptable excess in the experimental arm, and vice versa. The opinions of 44 experts were elicited regarding each event’s non-inferiority margin and relative severity. In the illustrative trial, the maximal misclassification rates were adapted to the events’ severity. Using those maximal rates, several priors gave good results, and one of them was retained for all events. Each event was associated with a specific decision threshold choice, allowing for the consideration of differences in prevalence, margins and severity. Our decision rule has been applied to a simulated dataset. 
In settings where evidence is lacking and where some rare but serious safety events have to be monitored during non-inferiority trials, we propose a methodology that avoids an arbitrary margin choice and helps in the decision making at each interim analysis. This decision rule is parametrized to consider the rarity and the relative severity of the events and requires a strong collaboration between physicians and the trial statisticians for the benefit of all. This Bayesian approach could be applied as a complement to the frequentist analysis, so both Data Safety Monitoring Boards and investigators can benefit from such an approach.
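A minimal conjugate sketch of the sequential-monitoring idea is shown below. It assumes a fixed point-valued margin and a simple beta-binomial model, rather than the elicited beta-mixture margins and decision criteria the authors develop; the counts and thresholds are hypothetical.

```python
from scipy.stats import beta

def prob_rate_exceeds(events, n, margin, a=1.0, b=1.0):
    """Posterior P(event rate > margin) under a Beta(a, b) prior and a
    binomial likelihood: the posterior is Beta(a + events, b + n - events)."""
    return float(1.0 - beta.cdf(margin, a + events, b + n - events))

# Hypothetical interim look: 2 events among 100 patients, 5% margin.
p_exceed = prob_rate_exceeds(2, 100, 0.05)
# Flag an unacceptable excess if the posterior probability crosses a
# prespecified decision threshold, e.g. 0.95.
stop = p_exceed > 0.95
```

At each interim analysis the posterior is updated with the accumulated events and compared against the threshold, which is the mechanism the paper calibrates per event to control the two misclassification rates.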

Updated: 2019-11-28
Contents have been reproduced by permission of the publishers.
