Current journal: BMC Medical Informatics and Decision Making
  • Inverse-probability weighting and multiple imputation for evaluating selection bias in the estimation of childhood obesity prevalence using data from electronic health records
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-20
    Carmen Sayon-Orea; Conchi Moreno-Iribas; Josu Delfrade; Manuela Sanchez-Echenique; Pilar Amiano; Eva Ardanaz; Javier Gorricho; Garbiñe Basterra; Marian Nuin; Marcela Guevara

    Height and weight data from electronic health records are increasingly being used to estimate the prevalence of childhood obesity. Here, we aim to assess the selection bias due to missing weight and height data from electronic health records in children older than five. This was a cohort study of 10,811 children born in Navarra (Spain) between 2002 and 2003 who were still living in this region in December 2016. We examined the differences between measured and non-measured children older than 5 years considering weight-associated variables (sex, rural or urban residence, family income and weight status at 2–5 years). These variables were used to calculate stabilized weights for inverse-probability weighting and to conduct multiple imputation for the missing data. We calculated complete-data prevalence and prevalence adjusted for the missing data using inverse-probability weighting and multiple imputation for ages 6 to 14 and for the age groups 6 to 9 and 10 to 14. For 6–9 years, the complete-data, inverse-probability weighting and multiple imputation age-adjusted obesity prevalences were 13.18% (95% CI: 12.54–13.85), 13.22% (95% CI: 12.57–13.89) and 13.02% (95% CI: 12.38–13.66), and for 10–14 years 8.61% (95% CI: 8.06–9.18), 8.62% (95% CI: 8.06–9.20) and 8.24% (95% CI: 7.70–8.78), respectively. At the ages at which well-child visits are scheduled, and for the 6 to 9 and 10 to 14 age groups, weight status estimates are similar using complete data, multiple imputation and inverse-probability weighting. Readily available electronic health record data may be a tool to monitor the weight status in children.

    Updated: 2020-01-21
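The stabilized weights mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single categorical stratum per child and estimates the probability of being measured from raw stratum frequencies; the study itself may well have used a regression model over several covariates, so the function names and the toy data here are hypothetical.

```python
from collections import defaultdict


def stabilized_weights(records):
    """records: list of (stratum, measured) pairs, with measured a bool.

    Returns one weight per record: P(measured) / P(measured | stratum)
    for measured records and 0.0 for unmeasured ones.
    """
    total = len(records)
    p_marginal = sum(1 for _, measured in records if measured) / total

    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_measured, n_total]
    for stratum, measured in records:
        counts[stratum][1] += 1
        if measured:
            counts[stratum][0] += 1

    weights = []
    for stratum, measured in records:
        if not measured:
            weights.append(0.0)  # unmeasured children contribute nothing
            continue
        p_conditional = counts[stratum][0] / counts[stratum][1]
        weights.append(p_marginal / p_conditional)
    return weights


def weighted_prevalence(outcomes, weights):
    """Weighted proportion of positive outcomes (None = not measured)."""
    numerator = sum(w for y, w in zip(outcomes, weights) if y)
    return numerator / sum(weights)


# Hypothetical toy cohort: every urban child measured, half of rural children.
records = [("urban", True)] * 4 + [("rural", True)] * 2 + [("rural", False)] * 2
obese = [True, False, False, False, True, False, None, None]
weights = stabilized_weights(records)
print(weighted_prevalence(obese, weights))
```

In this toy cohort the under-measured rural stratum is weighted up, so the weighted prevalence differs from the complete-case proportion; the stabilized weights still sum to the number of measured children.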
  • Assessing stroke severity using electronic health record data: a machine learning approach
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-08
    Emily Kogan; Kathryn Twyman; Jesse Heap; Dejan Milentijevic; Jennifer H. Lin; Mark Alberts

    Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include the severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data. NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of the 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) were held out for independent validation of model performance and the remaining patients (n = 6116, 86%) were used for training the model. Several machine learning models were evaluated, and parameters optimized using cross-validation on the training set. The model with optimal performance, a random forest model, was ultimately evaluated on the holdout set. Leveraging machine learning we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores to the NLP-extracted NIHSS scores on the holdout data set yielded an R2 (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5. Machine learning models built on EHR data can be used to determine proxies for stroke severity. 
    This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.

    Updated: 2020-01-09
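The three agreement measures reported in the abstract above (coefficient of determination, Pearson correlation and root-mean-squared error) can be computed as in the following sketch. The function name and sample values are hypothetical; the formulas are the standard definitions, not code from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, Pearson r and RMSE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)

    return {
        "r2": 1.0 - ss_res / ss_tot,              # coefficient of determination
        "pearson_r": cov / math.sqrt(ss_tot * ss_pred),
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical scores: predictions are perfectly correlated but shrunken.
print(regression_metrics([0, 2, 4], [1, 2, 3]))
```

The toy example shows why both R² and R are worth reporting: predictions can be perfectly correlated with the observed scores and still have an R² below 1 when they are systematically compressed.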
  • Predicting diabetes clinical outcomes using longitudinal risk factor trajectories
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-08
    Gyorgy J. Simon; Kevin A. Peterson; M. Regina Castro; Michael S. Steinbach; Vipin Kumar; Pedro J. Caraballo

    The ubiquity of electronic health records (EHR) offers an opportunity to observe trajectories of laboratory results and vital signs over long periods of time. This study assessed the value of risk factor trajectories available in the electronic health record to predict incident type 2 diabetes. Analysis was based on a large 13-year retrospective cohort of 71,545 adult, non-diabetic patients with baseline in 2005 and median follow-up time of 8 years. The trajectories of fasting plasma glucose, lipids, BMI and blood pressure were computed over three time frames (2000–2001, 2002–2003, 2004) before baseline. A novel method, Cumulative Exposure (CE), was developed and evaluated using Cox proportional hazards regression to assess risk of incident type 2 diabetes. We used the Framingham Diabetes Risk Scoring (FDRS) Model as control. The new model outperformed the FDRS Model (.802 vs .660; p-values <2e-16). Cumulative exposure measured over different periods showed that even short episodes of hyperglycemia increase the risk of developing diabetes. Returning to normoglycemia moderates the risk, but does not fully eliminate it. The longer an individual maintains glycemic control after a hyperglycemic episode, the lower the subsequent risk of diabetes. Incorporating risk factor trajectories substantially increases the ability of clinical decision support risk models to predict onset of type 2 diabetes and provides information about how risk changes over time.

    Updated: 2020-01-08
  • Exploring the usefulness of Lexis diagrams for quality improvement
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-08
    Sara Dahlin

    Visualization is important to aid practitioners in understanding local care processes and drive quality improvement (QI). Important aspects include timely feedback and ability to plot data over time. Moreover, the complexity of care also needs to be understood, as it affects the variation of care processes. However, there is a lack of QI methods visualizing multiple, related factors such as diagnosis date, death date, and cause of death to unravel their complexity, which is necessary to understand processes related to survival data. Lexis diagrams visualize individual patient processes as lines and mark additional factors such as key events. This study explores the potential of Lexis diagrams to support QI through survival data analysis, focusing on feedback, timeliness, and complexity, in a gynecological cancer setting in Sweden. Lexis diagrams were produced based on data from a gynecological cancer quality registry (4481 patients). The usefulness of Lexis diagrams was explored through iterative data identification and analysis through semi-structured dialogues between the researcher and domain experts (clinically active care process owners) during five meetings. Visualizations were produced and adapted by the researcher between meetings, based on the dialogues, to ensure clinical relevance, resulting in three relevant types of visualizations. Domain experts identified different uses depending on diagnosis group and data visualization. Key results include timely feedback through close-to-real-time visualizations, supporting discussion and understanding of trends and hypothesis-building. Visualization of care process complexity facilitated evaluation of given care. Combined visualization of individual and population levels increased patient focus and may possibly also function to motivate practitioners and management. 
    Lexis diagrams can aid understanding of survival data, triggering important dialogues between care givers and supporting care quality improvement and new perspectives, and can therefore complement survival curves in quality improvement.

    Updated: 2020-01-08
  • Can mobile health apps replace GPs? A scoping review of comparisons between mobile apps and GP tasks
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-06
    Apichai Wattanapisit; Chin Hai Teo; Sanhapan Wattanapisit; Emylia Teoh; Wing Jun Woo; Chirk Jenn Ng

    Mobile health applications (mHealth apps) are increasingly being used to perform tasks that are conventionally performed by general practitioners (GPs), such as those involved in promoting health, preventing disease, diagnosis, treatment, monitoring, and support for health services. This raises an important question: can mobile apps replace GPs? This study aimed to systematically search for and identify mobile apps that can perform GP tasks. A scoping review was carried out. The Google Play Store and Apple App Store were searched for mobile apps, using search terms derived from the UK Royal College of General Practitioners (RCGP) guideline on GPs’ core capabilities and competencies. A manual search was also performed to identify additional apps. The final analysis included 17 apps from the Google Play Store and Apple App Store, and 21 apps identified by the manual search. mHealth apps were found to have the potential to replace GPs for tasks such as recording medical history and making diagnoses; performing some physical examinations; supporting clinical decision making and management; assisting in urgent, long-term, and disease-specific care; and health promotion. In contrast, mHealth apps were unable to perform medical procedures, appropriately utilise other professionals, and coordinate a team-based approach. This scoping review highlights the functions of mHealth apps that can potentially replace GP tasks. Future research should focus on assessing the performance and quality of mHealth apps in comparison with that of real doctors.

    Updated: 2020-01-07
  • Development of a targeted client communication intervention to women using an electronic maternal and child health registry: a qualitative study
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-06
    Binyam Bogale; Kjersti Mørkrid; Brian O’Donnell; Buthaina Ghanem; Itimad Abu Ward; Khadija Abu Khader; Mervett Isbeih; Michael Frost; Mohammad Baniode; Taghreed Hijaz; Tamara Awwad; Yousef Rabah; J. Frederik Frøen

    Targeted client communication (TCC) using text messages can inform, motivate and remind pregnant and postpartum women of timely utilization of care. The mixed results on the effectiveness of TCC interventions point to the importance of theory-based interventions that are co-designed with users. The aim of this paper is to describe the planning, development, and evaluation of a theory-led TCC intervention, tailored to pregnant and postpartum women and automated from the Palestinian electronic maternal and child health registry. We used the Health Belief Model to develop interview guides to explore women’s perceptions of antenatal care (ANC), with a focus on high-risk pregnancy conditions (anemia, hypertensive disorders in pregnancy, gestational diabetes mellitus, and fetal growth restriction), and untimely ANC attendance, issues predefined by a national expert panel as being of high interest. We performed 18 in-depth interviews with women, and eight with healthcare providers in public primary healthcare clinics in the West Bank and Gaza. Building on the results of the in-depth interviews, we used concepts from the Model of Actionable Feedback, social nudging and Enhanced Active Choice to compose the TCC content to be sent as text messages. We assessed the acceptability and understandability of the draft text messages through unstructured interviews with local health promotion experts, healthcare providers, and pregnant women. We found low awareness of the importance of timely attendance to ANC, and of the benefits of ANC for pregnancy outcomes. We identified knowledge gaps and beliefs in the domains of low awareness of susceptibility to, and severity of, anemia, hypertension, and diabetes complications in pregnancy. To increase the utilization of ANC and bridge the identified gaps, we iteratively composed actionable text messages with users, using recommended message framing models. 
    We developed algorithms to trigger tailored text messages with higher intensity for women with a higher risk profile documented in the electronic health registry. We developed an optimized TCC intervention underpinned by behavior change theory and concepts, and co-designed with users following an iterative process. The electronic maternal and child health registry can serve as a unique platform for TCC interventions using text messages.

    Updated: 2020-01-06
  • Digital health Systems in Kenyan Public Hospitals: a mixed-methods survey
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-06
    Naomi Muinga; Steve Magare; Jonathan Monda; Mike English; Hamish Fraser; John Powell; Chris Paton

    As healthcare facilities in Low- and Middle-Income Countries adopt digital health systems to improve hospital administration and patient care, it is important to understand the adoption process and assess the systems’ capabilities. This survey aimed to provide decision-makers with information on the digital health systems landscape and to support the rapidly developing digital health community in Kenya and the region by sharing knowledge. We conducted a survey of County Health Records Information Officers (CHRIOs) to determine the extent to which digital health systems in public hospitals that serve as internship training centres in Kenya are adopted. We conducted site visits and interviewed hospital administrators and end users who were at the facility on the day of the visit. We also interviewed digital health system vendors to understand the adoption process from their perspective. Semi-structured interview guides adapted from the literature were used. We identified emergent themes using a thematic analysis from the data. We obtained information from 39 CHRIOs, 58 hospital managers and system users, and 9 digital health system vendors through semi-structured interviews and completed questionnaires. From the survey, all facilities mentioned purchased a digital health system primarily for administrative purposes. Radiology and laboratory management systems were commonly standalone systems and there were varying levels of interoperability within facilities that had multiple systems. We only saw one in-patient clinical module in use. Users reported on issues such as system usability, inadequate training, infrastructure and system support. Vendors reported the availability of a wide range of modules, but implementation was constrained by funding, prioritisation of services, users’ lack of confidence in new technologies and lack of appropriate data sharing policies. 
    Public hospitals in Kenya are increasingly purchasing systems to support administrative functions and this study highlights challenges faced by hospital users and vendors. Significant work is required to ensure interoperability of systems within hospitals and with other government services. Additional studies on clinical usability and the workflow fit of digital health systems are required to ensure efficient system implementation. However, this requires support from key stakeholders including the government, international donors and regional health informatics organisations.

    Updated: 2020-01-06
  • Novel prognostication of patients with spinal and pelvic chondrosarcoma using deep survival neural networks
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-06
    Sung Mo Ryu; Sung Wook Seo; Sun-Ho Lee

    We used the Surveillance, Epidemiology, and End Results (SEER) database to develop and validate deep survival neural network machine learning (ML) algorithms to predict survival following a spino-pelvic chondrosarcoma diagnosis. The SEER 18 registries were used to apply the Risk Estimate Distance Survival Neural Network (RED_SNN) in the model. Our model was evaluated at each time window with receiver operating characteristic curves and areas under the curves (AUCs), as was the concordance index (c-index). The subjects (n = 1088) were separated into training (80%, n = 870) and test sets (20%, n = 218). The training data were randomly sorted into training and validation sets using 5-fold cross validation. The median c-index of the five validation sets was 0.84 (95% confidence interval 0.79–0.87). The median AUC of the five validation subsets was 0.84. This model was evaluated with the previously separated test set. The c-index was 0.82 and the mean AUC of the 30 different time windows was 0.85 (standard deviation 0.02). According to the estimated survival probability (by 62 months), we divided the test group into five subgroups. The survival curves of the subgroups showed statistically significant separation (p < 0.001). This study is the first to analyze population-level data using artificial neural network ML algorithms for the role and outcomes of surgical resection and radiation therapy in spino-pelvic chondrosarcoma.

    Updated: 2020-01-06
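The concordance index (c-index) used to evaluate the survival model in the abstract above can be sketched as follows. This is a plain implementation of Harrell's c-index for right-censored data, given as an assumption about the general technique rather than the authors' evaluation code; tied survival times are simply skipped, and all names and data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score = higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:  # subject i failed before j was last seen
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    return concordant / comparable


# Hypothetical data: the third subject is censored at time 6.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.2, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of predicted risks against observed survival times.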
  • A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2020-01-06
    André M. Carrington; Paul W. Fieguth; Hammad Qazi; Andreas Holzinger; Helen H. Chen; Franz Mayr; Douglas G. Manuel

    In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. Only part of the ROC curve and AUC are informative however when they are used with imbalanced data. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives. We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data—as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided. Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures. The concordant partial area under the ROC curve was proposed and unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set but this paper focuses on imbalanced data with low prevalence. 
    Future work with our proposed measures may: demonstrate their value for imbalanced data with high prevalence, compare them to other measures not based on areas; and combine them with other ROC measures and techniques.

    Updated: 2020-01-06
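The abstract above relies on the equality between the whole AUC (a continuous area) and the whole c statistic (a discrete count over positive-negative pairs). The sketch below checks that whole-measure equality on toy data; it does not implement the paper's partial measures, and the function names and scores are hypothetical.

```python
def c_statistic(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_auc_trapezoid(scores, labels):
    """Area under the empirical ROC curve using the trapezoidal rule."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each step adds one ROC point (FPR, TPR).
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 1, 0, 1, 0, 0]
print(c_statistic(scores, labels), roc_auc_trapezoid(scores, labels))
```

Both functions return the same value on this example, which is the kind of agreement the abstract reports as preserved by its partial counterparts.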
  • AliClu - Temporal sequence alignment for clustering longitudinal clinical data
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Kishan Rama; Helena Canhão; Alexandra M. Carvalho; Susana Vinga

    Patient stratification is a critical task in clinical decision making since it can allow physicians to choose treatments in a personalized way. Given the increasing availability of electronic medical records (EMRs) with longitudinal data, one crucial problem is how to efficiently cluster the patients based on the temporal information from medical appointments. In this work, we propose applying the Temporal Needleman-Wunsch (TNW) algorithm to align discrete sequences with the transition time information between symbols. These symbols may correspond to a patient’s current therapy, their overall health status, or any other discrete state. The transition time information represents the duration of each of those states. The obtained TNW pairwise scores are then used to perform hierarchical clustering. To find the best number of clusters and assess their stability, a resampling technique is applied. We propose the AliClu, a novel tool for clustering temporal clinical data based on the TNW algorithm coupled with clustering validity assessments through bootstrapping. The AliClu was applied for the analysis of the rheumatoid arthritis EMRs obtained from the Portuguese database of rheumatologic patient visits (Reuma.pt). In particular, the AliClu was used for the analysis of therapy switches, which were coded as letters corresponding to biologic drugs and included their durations before each change occurred. The obtained optimized clusters allow one to stratify the patients based on their temporal therapy profiles and to support the identification of common features for those groups. The AliClu is a promising computational strategy to analyse longitudinal patient data by providing validated clusters and by unravelling the patterns that exist in clinical outcomes. Patient stratification is performed in an automatic or semi-automatic way, allowing one to tune the alignment, clustering, and validation parameters. 
    The AliClu is freely available at https://github.com/sysbiomed/AliClu.

    Updated: 2019-12-31
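The Temporal Needleman-Wunsch algorithm in the abstract above extends classic global sequence alignment with transition-time information. As a point of reference, here is a minimal sketch of the classic (non-temporal) Needleman-Wunsch score only; the scoring parameters and the letter-coded therapy sequences are hypothetical, and the temporal penalty used by AliClu is not reproduced.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (classic NW)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


# Hypothetical therapy sequences, one letter per drug.
print(needleman_wunsch_score("ABC", "AC"))
```

Pairwise scores of this kind can fill a similarity matrix, which is then handed to a hierarchical clustering routine, as the abstract describes for the temporal variant.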
  • Acceptability of Internet-based interventions for problem gambling: a qualitative study of focus groups with clients and clinicians
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Sherald Sanchez; Farah Jindani; Jing Shi; Mark van der Maas; Sylvia Hagopian; Robert Murray; Nigel Turner

    Although Internet-based interventions (IBIs) have been around for two decades, uptake has been slow. Increasing the acceptability of IBIs among end users may increase uptake. In this study, we explored the factors that shape acceptability of IBIs for problem gambling from the perspective of clients and clinicians. Findings from this qualitative study of focus groups informed the design and implementation of an IBI for problem gambling. Using a semi-structured interview guide, we conducted three focus groups with clients experiencing gambling problems (total n = 13) and two with clinicians providing problem gambling treatment (total n = 21). Focus groups were audio recorded, transcribed verbatim, and analyzed using a two-part inductive-deductive approach to thematic analysis. Although both user groups reported similar experiences, each group also had unique concerns. Clinician perspectives were more homogeneous, reflecting healthcare professionals who share the same practice and values. Clinicians were more concerned about issues relating to the dissemination of IBIs into clinical settings, including the development of policies and protocols and the implications of IBIs for the therapeutic relationship. In comparison, client narratives were more heterogeneous, describing diverse experiences and individual preferences, such as the availability of services on a 24-h basis. There was consensus among clients and clinicians on common factors influencing acceptability: access, usability, high-quality technology, privacy and security, and the value of professional guidance. Acceptability is an important factor in the overall effectiveness of IBIs. Gaining an understanding of how end users perceive IBIs and why they choose to use IBIs can be instrumental in the successful and meaningful design, implementation, and evaluation of IBIs.

    Updated: 2019-12-31
  • Responsible data sharing in a big data-driven translational research platform: lessons learned
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    S. Kalkman; M. Mostert; N. Udo-Beauvisage; J. J. van Delden; G. J. van Thiel

    To foster responsible data sharing in health research, ethical governance complementary to the EU General Data Protection Regulation is necessary. A governance framework for Big Data-driven research platforms will at least need to consider the conditions as specified a priori for individual datasets. We aim to identify and analyze these conditions for the Innovative Medicines Initiative’s (IMI) BigData@Heart platform. We performed a unique descriptive case study into the conditions for data sharing as specified for datasets participating in BigData@Heart. Principal investigators of 56 participating databases were contacted via e-mail with the request to send any kind of documentation that possibly specified the conditions for data sharing. Documents were qualitatively reviewed for conditions pertaining to data sharing and data access. Qualitative content analysis of 55 relevant documents revealed overlap on the conditions: (1) only to share health data for scientific research, (2) in anonymized/coded form, (3) after approval from a designated review committee, and (4) while observing all appropriate measures for data security and in compliance with the applicable laws and regulations. Despite considerable overlap, prespecified conditions give rise to challenges for data sharing. At the same time, these challenges inform our thinking about the design of an ethical governance framework for data sharing platforms. We urge current data sharing initiatives to concentrate on: (1) the scope of the research questions that may be addressed, (2) how to deal with varying levels of de-identification, (3) determining when and how review committees should come into play, (4) aligning what policies and regulations mean by “data sharing” and (5) how to deal with datasets that have no system in place for data sharing.

    Updated: 2019-12-30
  • A Bayesian decision support sequential model for severity of illness predictors and intensive care admissions in pneumonia
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Amado Alejandro Baez; Laila Cochon; Jose Maria Nicolas

    Community-acquired pneumonia (CAP) is one of the leading causes of morbidity and mortality in the USA. Our objective was to assess the predictive value for critical illness and disposition of a sequential Bayesian model that integrates lactate and procalcitonin (PCT) for pneumonia. Sensitivity and specificity of lactate and PCT were obtained from pooled meta-analysis data. Likelihood ratios were calculated and inserted in a Bayesian/Fagan nomogram to calculate post-test probabilities. Bayesian Diagnostic Gains (BDG) were analyzed by comparing pre- and post-test probabilities. To assess the value of integrating both PCT and lactate in severity-of-illness prediction, we built a model that combined CURB-65 with PCT as the pre-test markers and later integrated the lactate likelihood ratio values to generate a combined CURB-65 + procalcitonin + lactate sequential value. The BDG model integrated CURB-65 scores combined with procalcitonin (LR+ and LR-) for intermediate and high pre-test probability with lactate positive likelihood ratios. For the PCT LR+, this generated, in the intermediate subgroup, a post-test probability of 93% (95% CI: 91–96%) for a positive test and 17% (95% CI: 15–20%) for a negative test; in the high-risk subgroup, the post-test probability was 97% (95% CI: 95–98%) for a positive test and 33% (95% CI: 31–36%) for a negative test. ANOVA for CURB-65 alone vs CURB-65 and PCT (LR+) vs CURB-65, PCT (LR+) and lactate showed a statistically significant difference (P value = 0.013). The sequential combination of CURB-65 plus PCT with lactate yielded statistically significant results, demonstrating a greater predictive value for severity of illness and thus for ICU-level care.

    Updated: 2019-12-30
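The Fagan nomogram step in the abstract above is a graphical form of Bayes' rule on the odds scale: post-test odds equal pre-test odds multiplied by the likelihood ratio. The sketch below shows that arithmetic and how several tests can be chained, assuming the tests are conditionally independent. The function names, pre-test probability and likelihood ratios are hypothetical and are not the study's values.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply one likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def sequential_post_test(pre_test_probability, likelihood_ratios):
    """Chain several tests: each post-test probability is the next pre-test.

    Assumes the tests are conditionally independent given disease status.
    """
    probability = pre_test_probability
    for lr in likelihood_ratios:
        probability = post_test_probability(probability, lr)
    return probability


# Hypothetical numbers: 50% pre-test probability, then two positive tests.
print(sequential_post_test(0.5, [3.0, 2.0]))
```

The gain attributed to each added marker is the difference between the probability before and after its likelihood ratio is applied.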
  • Forecasting one-day-forward wellness conditions for community-dwelling elderly with single lead short electrocardiogram signals
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Xiaomao Fan; Yang Zhao; Hailiang Wang; Kwok Leung Tsui

    The accelerated growth of the elderly population is creating a heavy burden on the healthcare system in many developed countries and regions. Electrocardiogram (ECG) analysis has been recognized as an effective approach to cardiovascular disease diagnosis and is widely utilized for monitoring personalized health conditions. In this study, we present a novel approach to forecasting one-day-forward wellness conditions for community-dwelling elderly by analyzing single-lead short ECG signals acquired from a station-based monitoring device. More specifically, the exponentially weighted moving-average (EWMA) method is first employed to eliminate the high-frequency noise from the original signals. Then, the Fisher-Yates normalization approach is used to adjust the self-evaluated wellness score distribution, since the scores among different individuals are skewed. Finally, both deep learning-based and traditional machine learning-based methods are utilized for building wellness forecasting models. The experimental results show that the deep learning-based methods achieve the best forecasting performance, with a forecasting accuracy and F value of 93.21% and 91.98%, respectively. The deep learning-based methods, which do not require hand-crafted feature engineering, have superior wellness forecasting performance over the competing traditional machine learning-based methods. The approach developed in this paper is effective in wellness forecasting for community-dwelling elderly, and can provide insights into implementing a cost-effective approach to informing healthcare providers about the health conditions of the elderly in advance and taking timely interventions to reduce the risk of malignant events.

    Updated: 2019-12-30
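The EWMA denoising step named in the abstract above can be sketched in a few lines. This is the textbook recursive form, with a smoothing factor `alpha` chosen arbitrarily for illustration; the study's actual parameter and signal handling are not given in the abstract, so treat the names and values as assumptions.

```python
def ewma(signal, alpha=0.3):
    """Exponentially weighted moving average of a sequence of samples.

    Each output is alpha * current sample + (1 - alpha) * previous output,
    so smaller alpha values suppress more high-frequency variation.
    """
    if not signal:
        return []
    smoothed = [signal[0]]  # seed the recursion with the first sample
    for sample in signal[1:]:
        smoothed.append(alpha * sample + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical step signal smoothed with alpha = 0.5.
print(ewma([0.0, 10.0, 10.0], alpha=0.5))
```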
  • DeepFHR: intelligent prediction of fetal Acidemia using fetal heart rate signals based on convolutional neural network
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Zhidong Zhao; Yanjun Deng; Yang Zhang; Yefei Zhang; Xiaohong Zhang; Lihuan Shao

    Fetal heart rate (FHR) monitoring is a screening tool used by obstetricians to evaluate the fetal state. Because of the complexity and non-linearity of FHR signals, visual interpretation using common guidelines usually results in significant subjective inter-observer and intra-observer variability. Therefore, computer-aided diagnosis (CAD) systems based on advanced artificial intelligence (AI) technology have recently been developed to assist obstetricians in making objective medical decisions. In this work, we present an 8-layer deep convolutional neural network (CNN) framework to automatically predict fetal acidemia. After signal preprocessing, the input 2-dimensional (2D) images are obtained using the continuous wavelet transform (CWT), which provides a better way to observe and capture the hidden characteristic information of the FHR signals in both the time and frequency domains. Unlike conventional machine learning (ML) approaches, this work does not require complex feature engineering, i.e., feature extraction and selection. In fact, the 2D CNN model can self-learn useful features from the input data without losing informative features, which represents the tremendous advantage of deep learning (DL) over ML. Based on the open-access test database (CTU-UHB), after comprehensive experimentation, we achieved better classification performance using the optimal CNN configuration than other state-of-the-art methods: averaged over ten-fold cross-validation, the accuracy, sensitivity, specificity, quality index (defined as the geometric mean of the sensitivity and specificity), and area under the curve were 98.34, 98.22, 94.87, 96.53 and 97.82%, respectively. Once the proposed CNN model is successfully trained, the corresponding CAD system can serve as an effective tool to predict fetal asphyxia objectively and accurately.

    Updated: 2019-12-30
  • Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Xingyu Zhang; M. Fernanda Bellolio; Pau Medrano-Gracia; Konrad Werys; Sheng Yang; Prashant Mahajan

    To examine the association between the medical imaging utilization and information related to patients’ socioeconomic, demographic and clinical factors during the patients’ ED visits; and to develop predictive models using these associated factors including natural language elements to predict the medical imaging utilization at pediatric ED. Pediatric patients’ data from the 2012–2016 United States National Hospital Ambulatory Medical Care Survey was included to build the models to predict the use of imaging in children presenting to the ED. Multivariable logistic regression models were built with structured variables such as temperature, heart rate, age, and unstructured variables such as reason for visit, free text nursing notes and combined data available at triage. NLP techniques were used to extract information from the unstructured data. Of the 27,665 pediatric ED visits included in the study, 8394 (30.3%) received medical imaging in the ED, including 6922 (25.0%) who had an X-ray and 1367 (4.9%) who had a computed tomography (CT) scan. In the predictive model including only structured variables, the c-statistic was 0.71 (95% CI: 0.70–0.71) for any imaging use, 0.69 (95% CI: 0.68–0.70) for X-ray, and 0.77 (95% CI: 0.76–0.78) for CT. Models including only unstructured information had c-statistics of 0.81 (95% CI: 0.81–0.82) for any imaging use, 0.82 (95% CI: 0.82–0.83) for X-ray, and 0.85 (95% CI: 0.83–0.86) for CT scans. When both structured variables and free text variables were included, the c-statistics reached 0.82 (95% CI: 0.82–0.83) for any imaging use, 0.83 (95% CI: 0.83–0.84) for X-ray, and 0.87 (95% CI: 0.86–0.88) for CT. Both CT and X-rays are commonly used in the pediatric ED with one third of the visits receiving at least one. Patients’ socioeconomic, demographic and clinical factors presented at ED triage period were associated with the medical imaging utilization. 
Predictive models combining structured and unstructured variables available at triage performed better than models using structured or unstructured variables alone, suggesting the potential for use of NLP in determining resource utilization.

    Updated: 2019-12-30
  • The implementation of natural language processing to extract index lesions from breast magnetic resonance imaging reports
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-30
    Yi Liu; Qing Liu; Chao Han; Xiaodong Zhang; Xiaoying Wang

    There are often multiple lesions in breast magnetic resonance imaging (MRI) reports and radiologists usually focus on describing the index lesion that is most crucial to clinicians in determining the management and prognosis of patients. Natural language processing (NLP) has been used for information extraction from mammography reports. However, few studies have investigated NLP in breast MRI data based on free-form text. The objective of the current study was to assess the validity of our NLP program to accurately extract index lesions and their corresponding imaging features from free-form text of breast MRI reports. This cross-sectional study examined 1633 free-form text reports of breast MRIs from 2014 to 2017. First, the NLP system was used to extract 9 features from all the lesions in the reports according to the Breast Imaging Reporting and Data System (BI-RADS) descriptors. Second, the index lesion was defined as the lesion with the largest number of imaging features. Third, we extracted the values of each imaging feature and the BI-RADS category from each index lesion. To evaluate the accuracy of our system, 478 reports were manually reviewed by two individuals. The time taken to extract data by NLP was compared with that by reviewers. The NLP system extracted 889 lesions from 478 reports. The mean number of imaging features per lesion was 6.5 ± 2.1 (range: 3–9; 95% CI: 6.362–6.638). The mean number of imaging features per index lesion was 8.0 ± 1.1 (range: 5–9; 95% CI: 7.901–8.099). The NLP system demonstrated a recall of 100.0% and a precision of 99.6% for correct identification of the index lesion. The recall and precision of NLP to correctly extract the value of imaging features from the index lesions were 91.0 and 92.6%, respectively. The recall and precision for the correct identification of the BI-RADS categories were 96.6 and 94.8%, respectively. 
NLP generated the total results in less than 1 s, whereas the manual reviewers averaged 4.47 min and 4.56 min per report. Our NLP method successfully extracted the index lesion and its corresponding information from free-form text.
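
The index-lesion rule described above (the lesion with the largest number of extracted imaging features) and the precision/recall evaluation can be sketched in Python as follows. This is only an illustration: the `Lesion` class, the feature names, and the tie-breaking rule (first lesion wins) are assumptions and are not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Lesion:
    lesion_id: str
    # Hypothetical mapping: BI-RADS descriptor name -> extracted value.
    features: dict = field(default_factory=dict)


def pick_index_lesion(lesions):
    """Return the lesion with the most extracted features.

    Ties are broken by report order (an assumption; max() keeps the first maximum).
    """
    return max(lesions, key=lambda lesion: len(lesion.features))


def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Standard precision and recall from raw counts."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


report = [
    Lesion("lesion-1", {"shape": "oval", "margin": "circumscribed"}),
    Lesion("lesion-2", {"shape": "irregular", "margin": "spiculated", "size": "2.1 cm"}),
]
index_lesion = pick_index_lesion(report)
print(index_lesion.lesion_id)  # lesion-2
```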

    Updated: 2019-12-30
  • Family member information extraction via neural sequence labeling models with different tag schemes
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-27
    Hong-Jie Dai

    Family history information (FHI) described in unstructured electronic health records (EHRs) is a valuable information source for patient care and scientific research. Since FHI is usually described as free text, the entire process of FHI extraction consists of various steps including section segmentation, family member and clinical observation extraction, and relation discovery between the extracted members and their observations. The extraction step involves the recognition of FHI concepts along with their properties, such as the family side attribute of the family member concept. This study focuses on the extraction step and formulates it as a sequence labeling problem. We employed a neural sequence labeling model along with different tag schemes to distinguish family members and their observations. Corresponding to different tag schemes, the identified entities were aggregated and processed by different algorithms to determine the required properties. We studied the effectiveness of encoding required properties in the tag schemes by evaluating their performance on the dataset released by the BioCreative/OHNLP challenge 2018. The proposed side scheme, along with the developed features and neural network architecture, achieved an overall F1-score of 0.849 on the test set, which ranked second in the FHI entity recognition subtask. The developed neural network-based models performed significantly better than conditional random fields models. However, our error analysis revealed two challenging issues with the current approach. One is that some properties required cross-sentence inferences. The other is that the current model is not able to distinguish between narratives describing the family members of the patient and those specifying the relatives of the patient’s family members.
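
To illustrate what encoding a property in the tag scheme can look like, the Python sketch below decodes a BIO-tagged sentence in which the family side is part of the tag label. The tag names (`FamilyMember:Maternal`, `Observation`) and the example sentence are hypothetical and are not the schemes used in the paper.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts (a stray I- tag is treated as a B- tag).
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            current_tokens.append(token)
        else:  # "O": outside any entity
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


tokens = ["Her", "maternal", "grandmother", "had", "breast", "cancer", "."]
tags = ["O", "B-FamilyMember:Maternal", "I-FamilyMember:Maternal", "O",
        "B-Observation", "I-Observation", "O"]
entities = decode_bio(tokens, tags)
print(entities)
```

With the side carried in the label, the property can be read directly from the decoded entity instead of being inferred in a separate step.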

    Updated: 2019-12-27
  • Selected articles from the BioCreative/OHNLP challenge 2018
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-27
    Sijia Liu; Yanshan Wang; Hongfang Liu

    The wide adoption of electronic health records (EHRs) has led to an improvement in healthcare quality by electronically documenting a patient’s medical conditions, thoughts and actions among the care providers [1]. Those EHR data, with the vast majority being free-texts (e.g., clinical notes, discharge summaries, radiology reports, and pathology reports), have been utilized for primary and secondary purposes, such as documentation need in care process, clinical decision support, outcome improvement, biomedical research and epidemiologic monitoring of the nation’s health. The application of natural language processing (NLP) methods and resources to clinical and biomedical text has received growing attention over the past years, but progress has been limited by difficulties to access shared tools and resources, partially caused by patient privacy and data confidentiality constraints. Efforts to increase sharing and interoperability of the few existing resources are needed to facilitate the progress observed in the general NLP domain. Towards this goal, we organized the BioCreative/OHNLP Challenge 2018 workshop (https://sites.google.com/view/ohnlp2018/home) to promote community efforts on methodological advancements and data curation mechanisms in clinical NLP. The challenge consists of two independent clinical NLP tasks: 1) Family History Extraction; and 2) Clinical Semantic Textual Similarity. The top performing teams were invited to present their solutions during the BioCreative/OHNLP Challenge 2018 workshop in conjunction with the 9th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB) (http://acm-bcb.org/2018/) on August 30th, 2018. This supplement collects the system descriptions of top-performing solutions of the tasks. As a risk factor of many diseases, family history information (FHI) captures shared genetic variations among family members [2]. 
Information such as the age, gender, and degree of relatives is also taken into account when assigning risk for a large number of common diseases. The fact that many care process models use FHI highlights the importance of FHI in the decision-making process of diagnosis and treatment. However, extracting accurate and complete FHI from clinical texts remains challenging as a clinical NLP problem due to the lack of standardized evaluation mechanisms and publicly available language resources. To curate a corpus that can be made publicly available without losing semantic power for potential information extraction systems, we first collected the clinical narrative from family history sections of clinical notes at Mayo Clinic Rochester, the content of which is highly relevant to FHI. A team of annotators annotated the original corpus with clinical observations, family member mentions and protected health information. Afterwards, the protected health information was replaced with synthetic yet meaningful strings, and the clinical observations and family member mentions were shuffled across the corpus to further protect patient privacy. Leveraging the synthetic corpus with FHI, we organized this shared task to encourage the community to propose and develop family history extraction (FHE) systems [3]. The task comprises two subtasks. Subtask 1 focuses on identifying family member entities and clinical observations (diseases), and Subtask 2 requires extracting the associations of living status, side of the family and clinical observations with family members. Subtask 2 is an end-to-end task that builds on the result of Subtask 1. A total of 5 teams made 14 submissions for the official evaluation, and the descriptions of 2 teams are included in this supplement. The solution proposed by Dai focused on the extraction step and formulated it as a sequence labeling task. 
A neural sequence labeling model along with different tag schemes to distinguish family members and FHI-related observations was developed. Corresponding to different tag schemes, the identified entities were aggregated and processed by different algorithms to determine the required properties. The effectiveness of encoding required properties in the tag schemes was evaluated on the task corpus. The developed neural network-based models performed significantly better than the conditional random fields models. Shi et al. explored two jointly learned models for the two subtasks. For the entity extraction subtask, the Bidirectional Long Short Term Memory (Bi-LSTM) and Conditional Random Field (CRF) models are used to recognize FHI-related entities using word embeddings and part-of-speech (POS) embeddings as inputs. For the relation extraction subtask, they trained a Bi-LSTM to classify the relations. The two models are jointly trained with a customized loss function that combines the losses from the two subtasks. On top of the results from machine learning models, they used heuristic rules and post-processing to handle entity properties such as side of family and living status. The frequent use of copy-and-paste, templates, and smart phrases has resulted in redundant text in clinical notes, which may reduce EHR data quality and add to the cognitive burden of tracking complex records in clinical practice. Therefore, there is a growing need for tools that can aggregate data from diverse sources, minimize data redundancy, and organize and present the EHR data in a user-friendly way to reduce physicians’ cognitive burden. One technique for automatically reducing redundancy in free-text EHRs is to compute semantic similarity between clinical text snippets and remove highly similar snippets. Semantic textual similarity (STS) is a common task in the general English domain to assess the degree to which the underlying semantics of two segments of text are equivalent to each other. 
The assessment is usually performed using ordinal scaled output ranging from complete semantic equivalence to complete semantic dissimilarity. The STS task has been held annually since 2012 to encourage and support research in this area. However, these series of STS tasks used texts in the general English domain and no STS shared task focuses on the text data in the clinical domain. To motivate the biomedical informatics and NLP communities to study STS in the clinical domain, we initiated the ClinicalSTS task to provide a venue for evaluation of the state-of-the-art algorithms and models. ClinicalSTS provides paired clinical text snippets for each participant. The corpus, named MedSTS, consists of deidentified clinical sentences from narrative clinical notes [4]. The participating systems were asked to return a numerical score indicating the degree of semantic similarity between the pair of two sentences. Performance is measured by the Pearson correlation coefficient between the predicted similarity scores and human judgments. The scores fall on an ordinal scale, ranging from 0 to 5 where 0 means that the two clinical text snippets are completely dissimilar (i.e., no overlap in their meanings) and 5 means that the two snippets have complete semantic equivalence. Xiong et al. proposed a novel framework based on a gated network to fuse distributed representation and one-hot representation of sentence pairs. Some current state-of-the-art distributed representation models, including Convolutional Neural Network (CNN), Bi-LSTM and Bidirectional Encoder Representations from Transformers (BERT), were used in their system. Compared with the systems only using distributed representation or one-hot representation, their proposed method achieved higher performance. Among all distributed representations, BERT performed best. 
Further analysis indicates that the distributed representation and one-hot representation are complementary to each other and can be fused by a gated network. Chen et al. described both their participating systems and improvements made after the challenge. They applied sentence embeddings pre-trained on PubMed abstracts and MIMIC-III clinical notes and updated the Random Forest and the Encoder Network. During the challenge task, no end-to-end deep learning models performed better than machine learning models that take manually-crafted features. In contrast, with the sentence embeddings pre-trained on biomedical corpora, the Encoder Network now achieves higher performance than the original best model. The ensembled model taking the improved versions of the Random Forest and Encoder Network as inputs further improves the performance. Deep learning models with sentence embeddings pre-trained on biomedical corpora achieve the highest performance on the test set. Error analysis indicates that end-to-end deep learning models and traditional machine learning models with manually-crafted features can complement each other, which suggests that a combination of these models can better find similar sentences in practice.
References:
1. Blumenthal D. Implementation of the Federal Health Information Technology Initiative. N Engl J Med. 2011;365:2426–31. https://doi.org/10.1056/NEJMsr1112158.
2. McCarthy JJ, Mendelsohn BA. Family history. In: Precision medicine: a guide to genomics in clinical practice. New York: McGraw-Hill Education; 2016.
3. Liu S, Rastegar-Mojarad M, Wang Y, et al. Overview of the BioCreative/OHNLP 2018 family history extraction task. In: BioCreative/OHNLP 2018 Workshop Proceedings; 2018.
4. Wang Y, Afzal N, Fu S, et al. MedSTS: a resource for clinical semantic textual similarity. Lang Resour Eval. 2018. https://doi.org/10.1007/s10579-018-9431-1.
Funding: Publication of this supplement was funded by the National Institutes of Health, National Institute of General Medical Sciences R01-GM102282, National Library of Medicine R01LM11934 and National Center for Advancing Translational Sciences U01TR02062.
Acknowledgements: The guest editors would like to acknowledge all the authors, anonymous reviewers and the journal BMC Medical Informatics and Decision Making for their contributions to this supplement. We also would like to thank Drs. Cathy Wu, Zhiyong Lu and Lynette Hirschman for their support in organizing the challenge and the workshop.
About this supplement: This article has been published as part of BMC Medical Informatics and Decision Making Volume 19 Supplement 10, 2019: Selected Articles from the BioCreative/OHNLP Challenge 2018. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-19-supplement-10.
Affiliations: Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA (Sijia Liu, Yanshan Wang and Hongfang Liu).
Contributions: SL conceptualized, designed, and drafted the Task 1 section. YW conceptualized, designed, and drafted the Task 2 section. HL conceptualized and designed the challenge, and provided essential editorial support on the manuscript. All authors read and approved the final manuscript.
Corresponding author: Correspondence to Sijia Liu.
Competing interests: The authors declare that they have no competing interests.
Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Cite this article: Liu, S., Wang, Y. & Liu, H. Selected articles from the BioCreative/OHNLP challenge 2018. BMC Med Inform Decis Mak 19, 262 (2019). Published 27 December 2019. https://doi.org/10.1186/s12911-019-0994-6
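
The ClinicalSTS evaluation described in this editorial scores systems by the Pearson correlation coefficient between predicted similarity scores and human judgments on a 0 to 5 scale. A minimal, dependency-free Python sketch of that metric is given below; the example scores are made up for illustration and are not challenge data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical human judgments and system predictions for five sentence pairs.
gold = [0.0, 1.5, 3.0, 4.0, 5.0]
predicted = [0.4, 1.0, 2.8, 4.5, 4.9]
print(f"Pearson r = {pearson(gold, predicted):.3f}")
```

Because the coefficient is invariant to linear rescaling of the predictions, a system is rewarded for ranking and spacing pairs consistently with the annotators rather than for matching the absolute 0 to 5 values.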

    Updated: 2019-12-27
  • Family history information extraction via deep joint learning
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-27
    Xue Shi; Dehuan Jiang; Yuanhang Huang; Xiaolong Wang; Qingcai Chen; Jun Yan; Buzhou Tang

    Family history (FH) information, including family members, side of family of family members (i.e., maternal or paternal), living status of family members, observations (diseases) of family members, etc., is very important in the decision-making process of disorder diagnosis and treatment. However, FH information cannot be used directly by computers as it is always embedded in unstructured text in electronic health records (EHRs). Natural language processing (NLP) is therefore needed to extract FH information from clinical text. In the BioCreative/OHNLP2018 challenge, there is a task regarding FH extraction (i.e., task1), including two subtasks: (1) entity identification, identifying family members and their observations (diseases) mentioned in clinical text; (2) family history extraction, extracting side of family of family members, living status of family members, and observations of family members. For this task, we propose a system based on deep joint learning methods to extract FH information. Our system achieves the highest F1-scores of 0.8901 on subtask1 and 0.6359 on subtask2, respectively.

    Updated: 2019-12-27
  • Improving clinical named entity recognition in Chinese using the graphical and phonetic feature
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-23
    Yifei Wang; Sophia Ananiadou; Jun’ichi Tsujii

    Clinical Named Entity Recognition aims to find the names of diseases, body parts and other related terms in a given text. Because the Chinese language is quite different from English, a machine cannot simply obtain the graphical and phonetic information from Chinese characters, so the method for Chinese should be different from that for English. Chinese characters carry abundant information in their graphical features, and recent research on Chinese word embedding tries to use graphical information as subword units. This paper uses both graphical and phonetic features to improve Chinese Clinical Named Entity Recognition based on the presence of phono-semantic characters. This paper proposed three different embedding models and tested them on the annotated data. The data were divided into two sections to explore the effect of the proportion of phono-semantic characters. The model using primary radical and pinyin can improve Clinical Named Entity Recognition in Chinese and achieves an F-measure of 0.712. A higher proportion of phono-semantic characters does not give a better result. The paper shows that combining graphical and phonetic features can improve Clinical Named Entity Recognition in Chinese.

    Updated: 2019-12-23
  • Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-23
    Rebecka Weegar; Alicia Pérez; Arantza Casillas; Maite Oronoz

    Text mining and natural language processing of clinical text, such as notes from electronic health records, requires specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain specific challenges such as limited access to in-domain tools and data sets. A bi-directional Long Short-Term Memory network is applied to clinical notes in Spanish and Swedish for the task of medical named entity recognition. Several types of embeddings, both generated from in-domain and out-of-domain text corpora, and a number of generation and combination strategies for embeddings have been evaluated in order to investigate different input representations and the influence of domain on the final results. For Spanish, a micro averaged F1-score of 75.25 was obtained and for Swedish, the corresponding score was 76.04. The best results for both languages were achieved using embeddings generated from in-domain corpora extracted from electronic health records, but embeddings generated from related domains were also found to be beneficial. A recurrent neural network with in-domain embeddings improved the medical named entity recognition compared to shallow learning methods, showing this combination to be suitable for entity recognition in clinical text for both languages.

    Updated: 2019-12-23
  • Muscle fatigue detection and treatment system driven by internet of things
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-23
    Bin Ma; Chunxiao Li; Zhaolong Wu; Yulong Huang; Ada Chaeli van der Zijp-Tan; Shaobo Tan; Dongqi Li; Ada Fong; Chandan Basetty; Glen M. Borchert; Ryan Benton; Bin Wu; Jingshan Huang

    The Internet of things is fast becoming the norm in everyday life, and the increasing integration of the Internet into medical treatment is of high utility to both clinical doctors and patients. Among the many health-related problems encountered in daily life, muscle fatigue is a common one. To facilitate muscle fatigue detection, a pulse width modulation (PWM) and ESP8266-based fatigue detection and recovery system is introduced in this paper to help alleviate muscle fatigue. The ESP8266 is employed as the main controller and communicator, and PWM technology is employed to achieve adaptive muscle recovery. Muscle fatigue can be detected by surface electromyography signals and monitored in real-time via a wireless network. With the help of the proposed system, human muscle fatigue status can be monitored in real-time, and the recovery vibration motor status can be optimized according to muscle activity state. Experiments show that environmental factors have little effect on the response time and accuracy of the system. The response time is stable between 1 and 2 s, and, as indicated by the consistent change of digital value, our system clearly diminishes muscle fatigue. Additionally, the experimental results show that the proposed system requires minimal power and is both sensitive and stable.

    Updated: 2019-12-23
  • Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-23
    Robinette Renner; Shengyu Li; Yulong Huang; Ada Chaeli van der Zijp-Tan; Shaobo Tan; Dongqi Li; Mohan Vamsi Kasukurthi; Ryan Benton; Glen M. Borchert; Jingshan Huang; Guoqian Jiang

    The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, which is a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, which is the current method for creating the CDE mappings, is error-prone and time-consuming; this creates a significant barrier for researchers who utilize CDEs. In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train and obtain the appropriate weights of six attributes in CDEs. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong. For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved. Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address the overfitting problem by selecting CDEs randomly and adjusting the ratio of training and verification samples. 
Experimental results on real-world use cases have proven the effectiveness and efficiency of our proposed methodology in mapping CDEs with BRIDG classes, both those CDEs seen before as well as new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality.
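
The final steps described above (a weighted similarity between a CDE and each BRIDG class, followed by a ranked candidate list) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the abstract does not name the six attributes or the per-attribute similarity measure, so the two attribute names, the fixed weights (which the paper learns with an ANN), the token-level Jaccard similarity, and the example classes are all hypothetical.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Token-level Jaccard similarity (a stand-in for the paper's measure)."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def weighted_similarity(cde: dict, bridg_class: dict, weights: dict) -> float:
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * jaccard(cde.get(attr, ""), bridg_class.get(attr, ""))
        for attr, weight in weights.items()
    )


def rank_candidates(cde: dict, classes: dict, weights: dict, top_k: int = 3):
    """Return the top_k (class name, score) pairs, best match first."""
    scored = [(name, weighted_similarity(cde, cls, weights)) for name, cls in classes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical weights; in the paper these are obtained by training on 1284 mapped CDEs.
weights = {"name": 0.6, "definition": 0.4}
cde = {"name": "patient birth date", "definition": "date of birth of the patient"}
classes = {
    "Person": {"name": "person birth date", "definition": "the date of birth of a person"},
    "Specimen": {"name": "specimen collection date", "definition": "date the specimen was collected"},
}
candidates = rank_candidates(cde, classes, weights)
print(candidates[0][0])  # Person
```

Returning a ranked list rather than a single class matches the semi-automated design, in which a domain expert makes the final choice among the candidates.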

    Updated: 2019-12-23
  • Comparing different supervised machine learning algorithms for disease prediction
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-21
    Shahadat Uddin; Arif Khan; Md Ekramul Hossain; Mohammad Ali Moni

    Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (i.e., Scopus and PubMed) were searched for different types of search items. We selected 48 articles in total for the comparison among variants of supervised machine learning algorithms for disease prediction. We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which had the highest accuracy in 41% of the studies in which it was considered. This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can aid researchers in selecting an appropriate supervised machine learning algorithm for their studies.

    Updated: 2019-12-22
  • Does a Mobile app improve patients’ knowledge of stroke risk factors and health-related quality of life in patients with stroke? A randomized controlled trial
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-21
    Yi-No Kang; Hsiu-Nien Shen; Chia-Yun Lin; Glyn Elwyn; Szu-Chi Huang; Tsung-Fu Wu; Wen-Hsuan Hou

    We developed a stroke health-education mobile app (SHEMA) and examined its effectiveness in improving knowledge of stroke risk factors and health-related quality of life (HRQOL) in patients with stroke. We recruited 76 stroke patients and randomly assigned them to either the SHEMA intervention (n = 38) or usual care where a stroke health-education booklet was provided (n = 38). Knowledge of stroke risk factors and HRQOL were assessed using the stroke-knowledge questionnaire and European Quality of Life–Five Dimensions (EQ-5D) questionnaire, respectively. Sixty-three patients completed a post-test survey (the SHEMA intervention, n = 30; traditional stroke health-education, n = 33). Our trial found that patients’ mean knowledge score of stroke risk factors improved after the SHEMA intervention (Mean difference = 2.83; t = 3.44; p = .002), and patients’ knowledge also improved after traditional stroke health-education (Mean difference = 2.79; t = 3.68; p = .001). However, patients who received the SHEMA intervention did not show significantly greater changes in stroke knowledge or HRQOL than those who received traditional stroke health-education. Both the SHEMA intervention and traditional stroke health-education can improve patients’ knowledge of stroke risk factors, but the SHEMA was not superior to traditional stroke health-education. Trial registration: NCT02591511, verification date 2015-10-01.

    Updated: 2019-12-22
  • Selection process for botulinum toxin injections in patients with chronic-stage hemiplegic stroke: a qualitative study
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Sawako Arai; Yuko Fukase; Akira Okii; Yoshimi Suzukamo; Toshimitsu Suga

    Botulinum toxin (BT) injection is a new treatment for spasticity with hemiplegia after stroke. How a patient decides to receive BT injections after becoming aware of the treatment remains unclear. In this exploratory qualitative study, we aimed to investigate patients’ decision-making about treatment strategies in collaboration with family and health professionals and to identify conflicts in patients’ feelings about BT treatment. The study included six patients with stroke sequelae. Data were collected using comprehensive interviews and were analyzed using the grounded theory approach and trajectory equifinality modeling. After patients learned about BT treatment, they clearly exhibited the following two concurrent perceptions: “the restriction of one’s life due to disabilities” and “the ability to do certain things despite one’s disabilities.” Some patients reported a “fear of not being able to maintain the status quo owing to the side effects of BT.” To alleviate this fear, timely support from family members was offered, and patients overcame anxiety through creative thinking. However, there were also expressions that revealed patients’ difficulties dealing with negative events. These factors influenced the patients’ development of “expectations of BT” or “hesitations about BT.” To establish treatment strategies in collaboration with patients, healthcare professionals should show supportive attitudes and have discussions with patients and their family members to help patients resolve their conflicts and should establish treatment strategies that maintain the positive aspects of patients’ lives.

    Updated: 2019-12-20
  • Evaluating global and local sequence alignment methods for comparing patient medical records
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Ming Huang; Nilay D. Shah; Lixia Yao

    Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients of similar disease trajectory for more relevant and precise prognosis, diagnosis and treatment of patients. We tested two cutting-edge global sequence alignment methods, namely dynamic time warping (DTW) and Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data, to objectively evaluate these sequence alignment algorithms. For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had superior similarity scores than reference alignments while the rest 33 DTW alignments and 69 NWA alignments had the same similarity scores as reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than NWA alignments with the rest 34 cases having the equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than reference alignments while the rest DTWL alignments and SWA alignments received the same coverage and similarity scores as reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than SWA alignments. Thirty DTWL alignments had the equal coverage but better similarity scores than SWA. DTWL and SWA received the equal coverage and similarity scores for the rest 44 cases. DTW, NWA, DTWL and SWA outperformed the reference alignments. 
    DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies.
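
    As a rough illustration of the two global methods named in this abstract, the sketch below computes a dynamic time warping (DTW) distance and a Needleman-Wunsch (NWA) score for two short sequences of medical events. The 0/1 mismatch cost, the match/mismatch/gap scores, and the example event names are assumptions made for illustration; the abstract does not state the scoring schemes the authors used.

```python
from typing import Sequence


def dtw_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Classic DTW distance with an assumed 0/1 cost per aligned event pair."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # An event may be matched, or one sequence may "stretch" an event.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def needleman_wunsch_score(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> int:
    """Global alignment score; the scoring values are illustrative assumptions."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            s[i][j] = max(diag, s[i - 1][j] + gap, s[i][j - 1] + gap)
    return s[n][m]


# Hypothetical daily event sequences for two patients.
patient_a = ["fever", "cough", "cough", "xray"]
patient_b = ["fever", "cough", "xray"]
dtw = dtw_distance(patient_a, patient_b)
nwa = needleman_wunsch_score(patient_a, patient_b)
```

    In this toy case DTW absorbs the repeated event by warping, while the Needleman-Wunsch score pays a gap penalty for it, which is one simple way the two methods can rank the same pair of records differently.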

    Updated: 2019-12-19
  • Heterogeneous information network based clustering for precision traditional Chinese medicine
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Xintian Chen; Chunyang Ruan; Yanchun Zhang; Huijuan Chen

    Traditional Chinese medicine (TCM) is a highly important complement to modern medicine and is widely practiced in China and in many other countries. The work of Chinese medicine is subject to the two factors of the inheritance and development of clinical experience of famous Chinese medicine practitioners and the difficulty in improving the service capacity of basic Chinese medicine practitioners. Heterogeneous information networks (HINs) are a kind of graphical model for integrating and modeling real-world information. Through HINs, we can integrate and model the large-scale heterogeneous TCM data into structured graph data and use this as a basis for analysis. Mining categorizations from TCM data is an important task for precision medicine. In this paper, we propose a novel structured learning model to solve the problem of formula regularity, a pivotal task in prescription optimization. We integrate clustering with ranking in a heterogeneous information network. The results from experiments on the Pharmacopoeia of the People’s Republic of China (ChP) demonstrate the effectiveness and accuracy of the proposed model for discovering useful categorizations of formulas. We use heterogeneous information networks to model TCM data and propose a TCM-HIN. Combining the heterogeneous graph with the probability graph, we proposed the TCM-Clus algorithm, which combines clustering with ranking and classifies traditional Chinese medicine prescriptions. The results of the categorizations can help Chinese medicine practitioners to make clinical decision.

    Updated: 2019-12-19
  • Fast read alignment with incorporation of known genomic variants
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Hongzhe Guo; Bo Liu; Dengfeng Guan; Yilei Fu; Yadong Wang

    Many genetic variants have been reported from sequencing projects due to decreasing experimental costs. Compared to the current typical paradigm, read mapping incorporating existing variants can improve the performance of subsequent analysis. This method is supposed to map sequencing reads efficiently to a graphical index with a reference genome and known variation to increase alignment quality and variant calling accuracy. However, storing and indexing various types of variation require costly RAM space. Aligning reads to a graph model-based index including the whole set of variants is ultimately an NP-hard problem in theory. Here, we propose a variation-aware read alignment algorithm (VARA), which generates the alignment between read and multiple genomic sequences simultaneously utilizing the schema of the Landau-Vishkin algorithm. VARA dynamically extracts regional variants to construct a pseudo tree-based structure on-the-fly for seed extension without loading the whole genome variation into memory space. We developed the novel high-throughput sequencing read aligner deBGA-VARA by integrating VARA into deBGA. The deBGA-VARA is benchmarked both on simulated reads and the NA12878 sequencing dataset. The experimental results demonstrate that read alignment incorporating genetic variation knowledge can achieve high sensitivity and accuracy. Due to its efficiency, VARA provides a promising solution for further improvement of variant calling while maintaining small memory footprints. The deBGA-VARA is available at: https://github.com/hitbc/deBGA-VARA.

    Updated: 2019-12-19
  • Statistical and spectral analysis of ECG signal towards achieving non-invasive blood glucose monitoring
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Igbe Tobore; Jingzhen Li; Abhishek Kandwal; Liu Yuhang; Zedong Nie; Lei Wang

    Globally, the cases of diabetes mellitus (diabetes) have increased in the past three decades, and it is recorded as one of the leading cause of death. This epidemic is a metabolic condition where the body cannot regulate blood glucose, thereby leading to abnormally high blood sugar. Genetic condition plays a significant role to determine a person susceptibility to the condition, a sedentary lifestyle and an unhealthy diet are behaviour that supports the current global epidemic. The complication that arises from diabetes includes loss of vision, peripheral neuropathy, cardiovascular complications and so on. Victims of this condition require constant monitoring of blood glucose which is done by the pricking of the finger. This procedure is painful, inconvenient and can lead to disease infection. Therefore, it is important to find a way to measure blood glucose non-invasively to minimize or eliminate the disadvantages encountered with the usual monitoring of blood glucose. In this paper, we performed two experiments on 16 participants while electrocardiogram (ECG) data was continuously captured. In the first experiment, participants are required to consume 75 g of anhydrous glucose solution (oral glucose tolerance test) and the second experiment, no glucose solution was taken. We explored statistical and spectral analysis on HRV, HR, R-H, P-H, PRQ, QRS, QT, QTC and ST segments derived from ECG signal to investigate which segments should be considered for the possibility of achieving non-invasive blood glucose monitoring. In the statistical analysis, we examined the pattern of the data with the boxplot technique to reveal the change in the statistical properties of the data. Power spectral density estimation was adopted for the spectral analysis to show the frequency distribution of the data. 
    The HRV segment obtained a statistical score of 81% for a decreasing pattern, and the HR segment had the same statistical score for an increasing pattern among the participants in the first quartile, median and mean properties. While the ST segment has a statistical score of 81% for a decreasing pattern in the third quartile, the QT segment has 81% for an increasing pattern for the median. From a total change score of 6, ST, QT, PRQ, P-H, HR and HRV obtained 4, 5, 4, 5 and 6 respectively. For spectral analysis, the HRV and HR segments scored 81 and 75% respectively. ST, QT, PRQ have 75, 62 and 68% respectively. The results obtained demonstrate that HR, HRV, PRQ, QT and ST segments under a normal, healthy condition are affected by glucose and should be considered for modelling a system to achieve the possibility of non-invasive blood glucose measurement with ECG.

    Updated: 2019-12-19
  • Incorporating medical code descriptions for diagnosis prediction in healthcare
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Fenglong Ma; Yaqing Wang; Houping Xiao; Ye Yuan; Radha Chitta; Jing Zhou; Jing Gao

    Diagnosis prediction aims to predict the future health status of patients according to their historical electronic health records (EHR), which is an important yet challenging task in healthcare informatics. Existing diagnosis prediction approaches mainly employ recurrent neural networks (RNN) with attention mechanisms to make predictions. However, these approaches ignore the importance of code descriptions, i.e., the medical definitions of diagnosis codes. We believe that taking diagnosis code descriptions into account can help the state-of-the-art models not only to learn meaningful code representations, but also to improve the predictive performance, especially when the EHR data are insufficient. We propose a simple, but general diagnosis prediction framework, which includes two basic components: diagnosis code embedding and predictive model. To learn the interpretable code embeddings, we apply convolutional neural networks (CNN) to model medical descriptions of diagnosis codes extracted from online medical websites. The learned medical embedding matrix is used to embed the input visits into vector representations, which are fed into the predictive models. Any existing diagnosis prediction approach (referred to as the base model) can be cast into the proposed framework as the predictive model (called the enhanced model). We conduct experiments on two real medical datasets: the MIMIC-III dataset and the Heart Failure claim dataset. Experimental results show that the enhanced diagnosis prediction approaches significantly improve the prediction performance. Moreover, we validate the effectiveness of the proposed framework with insufficient EHR data. Finally, we visualize the learned medical code embeddings to show the interpretability of the proposed framework. Given the historical visit records of a patient, the proposed framework is able to predict the next visit information by incorporating medical code descriptions.

    Updated: 2019-12-19
  • EEG-based image classification via a region-level stacked bi-directional deep learning framework
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Ahmed Fares; Sheng-hua Zhong; Jianmin Jiang

    As a physiological signal, EEG data cannot be subjectively changed or hidden. Compared with other physiological signals, EEG signals are directly related to human cortical activities with excellent temporal resolution. After the rapid development of machine learning and artificial intelligence, the analysis and calculation of EEGs has made great progress, leading to a significant boost in performances for content understanding and pattern recognition of brain activities across the areas of both neural science and computer vision. While such an enormous advance has attracted wide range of interests among relevant research communities, EEG-based classification of brain activities evoked by images still demands efforts for further improvement with respect to its accuracy, generalization, and interpretation, yet some characters of human brains have been relatively unexplored. We propose a region-level stacked bi-directional deep learning framework for EEG-based image classification. Inspired by the hemispheric lateralization of human brains, we propose to extract additional information at regional level to strengthen and emphasize the differences between two hemispheres. The stacked bi-directional long short-term memories are used to capture the dynamic correlations hidden from both the past and the future to the current state in EEG sequences. Extensive experiments are carried out and our results demonstrate the effectiveness of our proposed framework. Compared with the existing state-of-the-arts, our framework achieves outstanding performances in EEG-based classification of brain activities evoked by images. In addition, we find that the signals of Gamma band are not only useful for achieving good performances for EEG-based image classification, but also play a significant role in capturing relationships between the neural activations and the specific emotional states. 
    Our proposed framework provides an improved solution to the following problem: given an image used to stimulate brain activities, identify which class the stimulus image comes from by analyzing the EEG signals. The region-level information is extracted to preserve and emphasize the hemispheric lateralization for neural functions or cognitive processes of human brains. Further, stacked bi-directional LSTMs are used to capture the dynamic correlations hidden in EEG data. Extensive experiments on a standard EEG-based image classification dataset validate that our framework outperforms the existing state-of-the-art methods under various contexts and experimental setups.

    Updated: 2019-12-19
  • MultiSourcDSim: an integrated approach for exploring disease similarity
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Lei Deng; Danyi Ye; Junmin Zhao; Jingpu Zhang

    A collection of disease-associated data contributes to study the association between diseases. Discovering closely related diseases plays a crucial role in revealing their common pathogenic mechanisms. This might further imply treatment that can be appropriated from one disease to another. During the past decades, a number of approaches for calculating disease similarity have been developed. However, most of them are designed to take advantage of single or few data sources, which results in their low accuracy. In this paper, we propose a novel method, called MultiSourcDSim, to calculate disease similarity by integrating multiple data sources, namely, gene-disease associations, GO biological process-disease associations and symptom-disease associations. Firstly, we establish three disease similarity networks according to the three disease-related data sources respectively. Secondly, the representation of each node is obtained by integrating the three small disease similarity networks. In the end, the learned representations are applied to calculate the similarity between diseases. Our approach shows the best performance compared to the other three popular methods. Besides, the similarity network built by MultiSourcDSim suggests that our method can also uncover the latent relationships between diseases. MultiSourcDSim is an efficient approach to predict similarity between diseases.

    Updated: 2019-12-19
  • Inter/intra-frame constrained vascular segmentation in X-ray angiographic image sequence
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Shuang Song; Chenbing Du; Ying Chen; Danni Ai; Hong Song; Yong Huang; Yongtian Wang; Jian Yang

    Automatic vascular segmentation in X-ray angiographic image sequence is of crucial interest, for instance, for better quantifying coronary arteries in diagnostic and interventional procedures. A novel inter/intra-frame constrained vascular segmentation method is proposed to automatically segment vessels in coronary X-ray angiographic image sequence. First, a morphological filter operator is applied to remove structures undergoing the respiratory motion from the original image sequence. Second, an inter-frame constrained robust principal component analysis (RPCA) is utilized to remove the quasi-static structures from the image sequence. Third, an intra-frame constrained RPCA is employed to smooth the final extracted vascular sequence. Fourth, a multi-feature fusion is designed to improve the vascular contrast and the final vascular segmentation is realized by thresholding-based method. Experiments are conducted on 22 clinical X-ray angiographic image sequences. The global and local contrast-to-noise ratio of the proposed method are 6.6344 and 4.2882, respectively. And the precision, sensitivity and F1 value are 0.7378, 0.7960 and 0.7658, respectively. It demonstrates that our method is effective and robust for vascular segmentation from image sequence. The proposed method is effective to remove non-vascular structures, reduce motion artefacts and other non-uniform illumination caused noises. Also, the proposed method is online which can just process one image per time without re-optimizing the model.
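
    The reported F1 value can be checked against the reported precision and sensitivity, since F1 is the harmonic mean of the two (sensitivity being the same quantity as recall). The helper name below is hypothetical and the snippet is only a consistency check on the figures quoted in the abstract, not part of the authors' method.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Precision and sensitivity as quoted in the abstract.
reported_f1 = round(f1_score(0.7378, 0.7960), 4)
```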

    Updated: 2019-12-19
  • Expression alteration of microRNAs in Nucleus Accumbens is associated with chronic stress and antidepressant treatment in rats
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Weichen Song; Yifeng Shen; Yanhua Zhang; Sufang Peng; Ran Zhang; Ailing Ning; Huafang Li; Xia Li; Guan Ning Lin; Shunying Yu

    Nucleus Accumbens (NAc) is a vital brain region for the process of reward and stress, whereas microRNA plays a crucial role in depression pathology. However, the abnormality of NAc miRNA expression during stress-induced depression and antidepressant treatment, as well as its biological significance, are still unknown. We performed small RNA-sequencing in the NAc of rats from three groups: control, chronic unpredictable mild stress (CUMS), and CUMS with an antidepressant, Escitalopram. We applied an integrative pipeline for analyzing the miRNA expression alteration in different model groups, including differential expression analysis, co-expression analysis, as well as a subsequent pathway/network analysis to discover both the miRNA alteration pattern and its biological significance. A total of 423 miRNAs were included in the analysis. 18/8 differentially expressed (DE) miRNAs (adjusted p < 0.05, |log2FC| > 1) were observed in controls vs. depression/depression vs. treatment, 2 of which are overlapping. 78% (14/18) of these miRNAs showed opposite trends of alteration in stress and treatment. Two miRNAs, miR-10b-5p and miR-214-3p, appeared to be hubs in the regulation networks and were also among the top findings in both differential analyses. Using co-expression analysis, we found a functional module that strongly correlated with stress (R = 0.96, P = 0.003), and another functional module with a moderate correlation with anhedonia (R = 0.89, P = 0.02). We also found that predicted targets of these miRNAs were significantly enriched in the Ras signaling pathway, which is associated with depression, anhedonia, and antidepressant treatment. Escitalopram treatment can significantly reverse NAc miRNA abnormality induced by chronic stress. However, novel miRNA alteration that is absent in stress pathology also emerges, which means that antidepressant treatment is unlikely to bring miRNA expression back to the same level as the controls.
    Also, the Ras-signaling pathway may be involved in explaining the depression disease etiology, the clinical symptom, and treatment response of stress-induced depression.

    Updated: 2019-12-19
  • Automated segmentation of cardiomyocyte Z-disks from high-throughput scanning electron microscopy data
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-19
    Afshin Khadangi; Eric Hanssen; Vijay Rajagopal

    With the advent of new high-throughput electron microscopy techniques such as serial block-face scanning electron microscopy (SBF-SEM) and focused ion-beam scanning electron microscopy (FIB-SEM) biomedical scientists can study sub-cellular structural mechanisms of heart disease at high resolution and high volume. Among several key components that determine healthy contractile function in cardiomyocytes are Z-disks or Z-lines, which are located at the lateral borders of the sarcomere, the fundamental unit of striated muscle. Z-disks play the important role of anchoring contractile proteins within the cell that make the heartbeat. Changes to their organization can affect the force with which the cardiomyocyte contracts and may also affect signaling pathways that regulate cardiomyocyte health and function. Compared to other components in the cell, such as mitochondria, Z-disks appear as very thin linear structures in microscopy data with limited difference in contrast to the remaining components of the cell. In this paper, we propose to generate a 3D model of Z-disks within single adult cardiac cells from an automated segmentation of a large serial-block-face scanning electron microscopy (SBF-SEM) dataset. The proposed fully automated segmentation scheme is comprised of three main modules including “pre-processing”, “segmentation” and “refinement”. We represent a simple, yet effective model to perform segmentation and refinement steps. Contrast stretching, and Gaussian kernels are used to pre-process the dataset, and well-known “Sobel operators” are used in the segmentation module. We have validated our model by comparing segmentation results with ground-truth annotated Z-disks in terms of pixel-wise accuracy. The results show that our model correctly detects Z-disks with 90.56% accuracy. 
    We also compare and contrast the accuracy of the proposed algorithm in segmenting a FIB-SEM dataset against the accuracy of segmentations from a machine learning program called Ilastik and discuss the advantages and disadvantages that these two approaches have. Our validation results demonstrate the robustness and reliability of our algorithm and model both in terms of validation metrics and in terms of a comparison with a 3D visualisation of Z-disks obtained using immunofluorescence based confocal imaging.

    Updated: 2019-12-19
  • Towards early detection of adverse drug reactions: combining pre-clinical drug structures and post-market safety reports
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-18
    Ruoqi Liu; Ping Zhang

    Adverse drug reaction (ADR) is a major burden for patients and healthcare industry. Early and accurate detection of potential ADRs can help to improve drug safety and reduce financial costs. Post-market spontaneous reports of ADRs remain a cornerstone of pharmacovigilance and a series of drug safety signal detection methods play an important role in providing drug safety insights. However, existing methods require sufficient case reports to generate signals, limiting their usages for newly approved drugs with few (or even no) reports. In this study, we propose a label propagation framework to enhance drug safety signals by combining drug chemical structures with FDA Adverse Event Reporting System (FAERS). First, we compute original drug safety signals via common signal detection algorithms. Then, we construct a drug similarity network based on chemical structures. Finally, we generate enhanced drug safety signals by propagating original signals on the drug similarity network. Our proposed framework enriches post-market safety reports with pre-clinical drug similarity network, effectively alleviating issues of insufficient cases for newly approved drugs. We apply the label propagation framework to four popular signal detection algorithms (PRR, ROR, MGPS, BCPNN) and find that our proposed framework generates more accurate drug safety signals than the corresponding baselines. In addition, our framework identifies potential ADRs for newly approved drugs, thus paving the way for early detection of ADRs. The proposed label propagation framework combines pre-clinical drug structures with post-market safety reports, generates enhanced drug safety signals, and can potentially help to accurately detect ADRs ahead of time. The source code for this paper is available at: https://github.com/ruoqi-liu/LP-SDA.
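
    The three steps described in this abstract (compute original signals, build a drug similarity network, propagate signals over it) can be sketched generically as follows. The PRR formula is the standard 2x2-table definition; the propagation rule, the `alpha` mixing weight, the iteration count, and the toy similarity matrix are all assumptions for illustration, since the abstract does not specify the authors' exact update rule. Their actual implementation is in the repository linked above.

```python
from typing import List


def prr(a: float, b: float, c: float, d: float) -> float:
    """Proportional reporting ratio from a 2x2 report table.

    a: reports with the drug and the event, b: the drug without the event,
    c: other drugs with the event,          d: other drugs without the event.
    """
    return (a / (a + b)) / (c / (c + d))


def propagate(
    similarity: List[List[float]],
    signals: List[float],
    alpha: float = 0.5,
    iterations: int = 50,
) -> List[float]:
    """Generic label propagation (assumed form, not the paper's exact rule).

    Each step mixes a drug's neighbours' current signals (row-normalised
    similarity) with its own original signal: f <- alpha * W f + (1 - alpha) * y.
    """
    n = len(signals)
    norm = []
    for row in similarity:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])
    current = list(signals)
    for _ in range(iterations):
        current = [
            alpha * sum(norm[i][j] * current[j] for j in range(n))
            + (1 - alpha) * signals[i]
            for i in range(n)
        ]
    return current


# Toy network: drugs 0 and 1 are structurally similar, drug 2 is isolated.
toy_similarity = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_signals = [1.0, 0.0, 0.0]  # only drug 0 has an original signal
enhanced = propagate(toy_similarity, toy_signals)
```

    In the toy network, drug 1 has no original signal but inherits part of drug 0's signal through their similarity, which mirrors the abstract's motivation of supporting newly approved drugs with few or no reports.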

    Updated: 2019-12-19
  • Promoting healthy teenage behaviour across three European countries through the use of a novel smartphone technology platform, PEGASO fit for future: study protocol of a quasi-experimental, controlled, multi-Centre trial
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-17
    Elisa Puigdomenech; Anne Martin; Alexandra Lang; Fulvio Adorni; Santiago Felipe Gomez; Brian McKinstry; Federica Prinelli; Laura Condon; Rajeeb Rashid; Maurizio Caon; Sarah Atkinson; Claudio L. Lafortuna; Valentina Ciociola; Janet Hanley; Lucy McCloughan; Conxa Castell; Mireia Espallargues

    Behaviour change interventions targeting physical activity, diet, sleep and sedentary behaviour of teenagers show promise when delivered through smartphones. However, to date there is no evidence of effectiveness of multicomponent smartphone-based interventions. Utilising a user-centred design approach, we developed a theory-based, multi-dimensional system, PEGASO Fit For Future (PEGASO F4F), which exploits sophisticated game mechanics involving smartphone applications, a smartphone game and activity sensors to motivate teenagers to take an active role in adopting and maintaining a healthy lifestyle. This paper describes the study protocol to assess the feasibility, usability and effectiveness (knowledge/awareness and behavioural change in lifestyle) of the PEGASO system. We are conducting a quasi-experimental controlled cluster trial in 4 sites in Spain, Italy, and UK (England, Scotland) over 6 months. We plan to recruit 525, in a 2:1 basis, teenagers aged 13–16 years from secondary schools. The intervention group is provided with the PEGASO system whereas the comparison group continues their usual educational routine. Outcomes include feasibility, acceptance, and usability of the PEGASO system as well as between and within group changes in motivation, self-reported diet, physical activity, sedentary and sleeping behaviour, anthropometric measures and knowledge about a healthy lifestyle. PEGASO F4F will provide evidence into the cross-cultural similarities and differences in the feasibility, acceptability and usability of a multi-dimensional smartphone based behaviour change intervention for teenagers. The study will explore facilitating factors, challenges and barriers of engaging teenagers to adapt and maintain a healthy lifestyle when using smartphone technology. 
    Positive results from this ICT-based multi-component intervention may have significant implications both at the clinical level, by improving teenagers’ health, and at the public health level, since it can present an influential tool against the development of chronic disease during adulthood. https://clinicaltrials.gov Registration number: NCT02930148, registered 4 October 2016.

    Updated: 2019-12-18
  • A temporal visualization of chronic obstructive pulmonary disease progression using deep learning and unstructured clinical notes
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-17
    Chunlei Tang; Joseph M. Plasek; Haohan Zhang; Min-Jeoung Kang; Haokai Sheng; Yun Xiong; David W. Bates; Li Zhou

    Chronic obstructive pulmonary disease (COPD) is a progressive lung disease that is classified into stages based on disease severity. We aimed to characterize the time to progression prior to death in patients with COPD and to generate a temporal visualization that describes signs and symptoms during different stages of COPD progression. We present a two-step approach for visualizing COPD progression at the level of unstructured clinical notes. We included 15,500 COPD patients who both received care within Partners Healthcare’s network and died between 2011 and 2017. We first propose a four-layer deep learning model that utilizes a specially configured recurrent neural network to capture irregular time lapse segments. Using those irregular time lapse segments, we created a temporal visualization (the COPD atlas) to demonstrate COPD progression, which consisted of representative sentences at each time window prior to death based on a fraction of theme words produced by a latent Dirichlet allocation model. We evaluated our approach on an annotated corpus of COPD patients’ unstructured pulmonary, radiology, and cardiology notes. Experiments compared to the baselines showed that our proposed approach improved interpretability as well as the accuracy of estimating COPD progression. Our experiments demonstrated that the proposed deep-learning approach to handling temporal variation in COPD progression is feasible and can be used to generate a graphical representation of disease progression using information extracted from clinical notes.

    Updated: 2019-12-17
  • Representation learning for clinical time series prediction tasks in electronic health records
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-17
    Tong Ruan; Liqi Lei; Yangming Zhou; Jie Zhai; Le Zhang; Ping He; Ju Gao

    Electronic health records (EHRs) provide possibilities to improve patient care and facilitate clinical research. However, there are many challenges faced by the applications of EHRs, such as temporality, high dimensionality, sparseness, noise, random error and systematic bias. In particular, temporal information is difficult to effectively use by traditional machine learning methods while the sequential information of EHRs is very useful. In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode inhospital records of each patient into a low dimensional dense vector. Based on EHR data collected from Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine, we experimentally evaluate our proposed RNN-DAE method on both mortality prediction task and comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms existing methods. In addition, we apply the “Deep Feature” represented by our proposed RNN-DAE method to track similar patients with t-SNE, which also achieves some interesting observations. We propose an effective unsupervised RNN-DAE method to summarize patient sequential information in EHR data. Our proposed RNN-DAE method is useful on both mortality prediction task and comorbidity prediction task.

    Updated: 2019-12-17
  • A low-cost vision system based on the analysis of motor features for recognition and severity rating of Parkinson’s Disease
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-12
    Domenico Buongiorno; Ilaria Bortone; Giacomo Donato Cascarano; Gianpaolo Francesco Trotta; Antonio Brunetti; Vitoantonio Bevilacqua

    Assessment and rating of Parkinson’s Disease (PD) are commonly based on the medical observation of several clinical manifestations, including the analysis of motor activities. In particular, medical specialists refer to the MDS-UPDRS (Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale), which is the most widely used clinical scale for PD rating. However, clinical scales rely on the observation of some subtle motor phenomena that are either difficult to capture with human eyes or could be misclassified. This limitation motivated several researchers to develop intelligent systems based on machine learning algorithms able to automatically recognize PD. Nevertheless, most of the previous studies investigated the classification between healthy subjects and PD patients without considering the automatic rating of different levels of severity. In this context, we implemented a simple and low-cost clinical tool that can extract postural and kinematic features with the Microsoft Kinect v2 sensor in order to classify and rate PD. Thirty participants were enrolled for the purpose of the present study: sixteen PD patients rated according to MDS-UPDRS and fourteen healthy paired subjects. In order to investigate the motor abilities of the upper and lower body, we acquired and analyzed three main motor tasks: (1) gait, (2) finger tapping, and (3) foot tapping. After preliminary feature selection, different classifiers based on Support Vector Machine (SVM) and Artificial Neural Networks (ANN) were trained and evaluated for the best solution. Concerning the gait analysis, results showed that the ANN classifier performed the best by reaching 89.4% accuracy with only nine features in diagnosing PD and 95.0% accuracy with only six features in rating PD severity. Regarding the finger and foot tapping analysis, results showed that an SVM using the extracted features was able to classify healthy subjects versus PD patients with good performance, reaching 87.1% accuracy. The results of the classification between mild and moderate PD patients indicated that the foot tapping features were the most representative ones to discriminate (81.0% accuracy). The results of this study have shown how a low-cost vision-based system can automatically detect subtle phenomena that characterize PD. Our findings suggest that the proposed tool can support medical specialists in the assessment and rating of PD patients in a real clinical scenario.

    Updated: 2019-12-13
  • A comparison between two semantic deep learning frameworks for the autosomal dominant polycystic kidney disease segmentation based on magnetic resonance images
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-12
    Vitoantonio Bevilacqua; Antonio Brunetti; Giacomo Donato Cascarano; Andrea Guerriero; Francesco Pesce; Marco Moschetta; Loreto Gesualdo

    The automatic segmentation of kidneys in medical images is not a trivial task when the subjects undergoing the medical examination are affected by Autosomal Dominant Polycystic Kidney Disease (ADPKD). Several works dealing with the segmentation of Computed Tomography images from pathological subjects have been proposed, but they involve a highly invasive examination or require interaction by the user to perform the segmentation. In this work, we propose a fully automated approach for the segmentation of Magnetic Resonance images, both reducing the invasiveness of the acquisition device and not requiring any interaction by the users for the segmentation of the images. Two different approaches are proposed based on Deep Learning architectures using Convolutional Neural Networks (CNN) for the semantic segmentation of images, without needing to extract any hand-crafted features. In detail, the first approach performs the automatic segmentation of images without any procedure for pre-processing the input. Conversely, the second approach performs a two-step classification strategy: a first CNN automatically detects Regions Of Interest (ROIs); a subsequent classifier performs the semantic segmentation on the ROIs previously extracted. Results show that even though the detection of ROIs shows an overall high number of false positives, the subsequent semantic segmentation on the extracted ROIs achieves high performance in terms of mean Accuracy. However, the segmentation of the entire images input to the network remains the most accurate and reliable approach, showing better performance than the two-step approach. The obtained results show that both the investigated approaches are reliable for the semantic segmentation of polycystic kidneys, since both strategies reach an Accuracy higher than 85%. Also, both the investigated methodologies show performance comparable and consistent with other approaches found in the literature working on images from different sources, while reducing both the invasiveness of the analyses and the interaction needed by the users for performing the segmentation task.

    Updated: 2019-12-13
  • An adaptive term proximity based rocchio’s model for clinical decision support retrieval
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-12
    Min Pan; Yue Zhang; Qiang Zhu; Bo Sun; Tingting He; Xingpeng Jiang

    To better help doctors make decisions in the clinical setting, research is needed to connect electronic health records (EHRs) with the biomedical literature. Pseudo Relevance Feedback (PRF) is a classical query modification technique that has been shown to be effective in many retrieval models and is thus suitable for handling the terse language and clinical jargon in EHRs. Previous work has introduced a set of constraints (axioms) for the traditional PRF model. However, most methods do not consider both the importance of a candidate term in the feedback document and the co-occurrence relationship between a candidate term and a query term. Intuitively, terms that have a higher degree of co-occurrence with a query term are more likely to be related to the query topic. In this paper, we incorporate the original HAL model into Rocchio’s model and propose a new concept of term proximity feedback weight. A HAL-based Rocchio’s model for query expansion, called HRoc, is proposed. Meanwhile, we design three normalization methods to better incorporate proximity information into query expansion. Finally, we introduce an adaptive parameter to replace the length of the sliding window of the HAL model, so that the window size can be selected according to document length. On the 2016 TREC Clinical Support medicine dataset, experimental results demonstrate that the proposed HRoc and HRoc_AP models are superior to other advanced models, such as the PRoc2 and TF-PRF methods, on various evaluation metrics. Compared with the PRoc2 and TF-PRF models, the MAP of our model is increased by 8.5% and 12.24% respectively, while the F1 score is increased by 7.86% and 9.88% respectively. The proposed HRoc model can effectively enhance the precision and recall of information retrieval and gives more precise results than other models. Furthermore, after introducing the self-adaptive parameter, the advanced HRoc_AP model uses fewer hyper-parameters than other models while achieving equivalent performance, which greatly improves the efficiency and applicability of the model and thus helps clinicians retrieve clinical support documents effectively.

    Updated: 2019-12-13
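
The proximity-weighted feedback idea in the HRoc entry above can be illustrated with a rough sketch. This is not the authors' model: the symmetric HAL-style window weighting, the log-damped combination of term frequency and proximity, the fixed window size, and the names `hal_cooccurrence` and `expand_query` are all assumptions made for illustration. Only the general Rocchio form (original query weight plus a weighted feedback contribution) is standard.

```python
import math
from collections import Counter, defaultdict


def hal_cooccurrence(tokens, window):
    """HAL-style co-occurrence: pairs that sit closer together inside the
    sliding window receive a larger weight (window - distance + 1)."""
    weights = defaultdict(float)
    for i, left in enumerate(tokens):
        for dist in range(1, window + 1):
            j = i + dist
            if j >= len(tokens):
                break
            w = window - dist + 1
            weights[(left, tokens[j])] += w
            weights[(tokens[j], left)] += w  # symmetric variant (assumption)
    return weights


def expand_query(query, feedback_docs, window=5, alpha=1.0, beta=0.75):
    """Rocchio-style expansion where each feedback term is scored by its
    relative frequency times a log-damped proximity to the query terms.
    The combination rule is hypothetical, not taken from the paper."""
    query_terms = set(query)
    feedback = defaultdict(float)
    for doc in feedback_docs:
        tf = Counter(doc)
        hal = hal_cooccurrence(doc, window)
        for term, freq in tf.items():
            proximity = sum(hal.get((term, q), 0.0) for q in query_terms)
            feedback[term] += (freq / len(doc)) * math.log1p(proximity)
    # Rocchio: alpha * original query weights + beta * mean feedback score
    weights = {t: alpha * c for t, c in Counter(query).items()}
    for term, score in feedback.items():
        weights[term] = weights.get(term, 0.0) + beta * score / len(feedback_docs)
    return weights
```

With this scoring, a term that never appears within the window of a query term contributes nothing to the expanded query, which mirrors the intuition stated in the abstract that terms co-occurring closely with query terms are more likely to be on topic.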
  • Biometric handwriting analysis to support Parkinson’s Disease assessment and grading
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-12
    Giacomo Donato Cascarano; Claudio Loconsole; Antonio Brunetti; Antonio Lattarulo; Domenico Buongiorno; Giacomo Losavio; Eugenio Di Sciascio; Vitoantonio Bevilacqua

    Handwriting impairment is one of the major symptoms in Parkinson’s Disease (PD) patients. The computer-aided analysis of handwriting allows for the identification of promising patterns that might be useful in PD detection and rating. In this study, we propose an innovative set of features extracted from geometrical, dynamical and muscle activation signals acquired during handwriting tasks, and evaluate the contribution of such features in detecting and rating PD by means of artificial neural networks. Eleven healthy subjects and twenty-one PD patients were enrolled in this study. Each subject was asked to write three different patterns on a graphic tablet while wearing the Myo Armband, which was used to collect the muscle activation signals of the main forearm muscles. We then extracted several features related to the written pattern, the movement of the pen, the pressure exerted with the pen, and the muscle activations. The computed features were used to classify healthy subjects versus PD patients and to discriminate mild PD patients from moderate PD patients by using an artificial neural network (ANN). After the training and evaluation of different ANN topologies, the obtained results showed that the proposed features have high relevance in PD detection and rating. In particular, we found that our approach can both detect and rate (mild and moderate) PD with a classification accuracy higher than 90%. In this paper we have investigated the representativeness of a set of proposed features related to handwriting tasks in PD detection and rating. In particular, we used an ANN to classify healthy subjects and PD patients (PD detection), and to classify mild and moderate PD patients (PD rating). The implemented and tested methods showed promising results, as shown by the high level of accuracy, sensitivity and specificity. Such results suggest the usability of the proposed setup in clinical settings to support medical decisions about Parkinson’s Disease.

    Updated: 2019-12-13
  • Implementation of machine learning algorithms to create diabetic patient re-admission profiles
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-12
    Mohamed Alloghani; Ahmed Aljaaf; Abir Hussain; Thar Baker; Jamila Mustafina; Dhiya Al-Jumeily; Mohammed Khalaf

    Machine learning is a branch of Artificial Intelligence that is concerned with the design and development of algorithms, and it enables today’s computers to have the property of learning. Machine learning is gradually growing and becoming a critical approach in many domains such as health, education, and business. In this paper, we applied machine learning to the diabetes dataset with the aim of recognizing patterns and combinations of factors that characterize or explain re-admission among diabetes patients. The classifiers used include Linear Discriminant Analysis, Random Forest, k-Nearest Neighbor, Naïve Bayes, J48 and Support Vector Machine. Of the 100,000 cases, 78,363 were diabetic and over 47% were readmitted. Based on the classes that the models produced, diabetic patients who are more likely to be readmitted are either women, or Caucasians, or outpatients, or those who undergo less rigorous lab procedures or treatment procedures, or those who receive less medication and are thus discharged without proper improvement or administration of insulin despite having tested positive for HbA1c. Diabetic patients who do not undergo rigorous lab assessment, diagnosis and medication are more likely to be readmitted when discharged without improvement and without receiving insulin, especially if they are women, Caucasians, or both.

    Updated: 2019-12-13
  • Representation learning in intraoperative vital signs for heart failure risk prediction
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-09
    Yuwen Chen; Baolian Qi

    The probability of heart failure during the perioperative period is 2% on average, and it is as high as 17% when accompanied by cardiovascular diseases in China. It has been the most significant cause of postoperative death of patients. Although patient management during an operation relies on a flow of information, the large volume of clinical information can make it difficult for medical staff to identify the information relevant to patient care. There are major practical and technical barriers to understanding perioperative complications. In this work, we present three machine learning methods to estimate the risk of heart failure, which convert intraoperative vital signs monitoring data into different modal representations (statistical learning representation, text learning representation, image learning representation). Firstly, we extracted features of the vital signs monitoring data of surgical patients by statistical analysis. Secondly, the vital signs data are converted into text information by Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX), and then a Latent Dirichlet Allocation (LDA) model is used to extract text topics of patients for heart failure prediction. Thirdly, the vital signs monitoring time series data of the surgical patient are converted into a grid image by using the grid representation, and then a convolutional neural network is directly used to identify the grid image for heart failure prediction. We evaluated the proposed methods on the monitoring data of real patients during the perioperative period. The results of our experiment demonstrate that the Gradient Boosting Decision Tree (GBDT) classifier achieves the best results in the prediction of heart failure using the statistical feature representation. The sensitivity, specificity and area under the curve (AUC) of the best method reach 83%, 85% and 84%, respectively. The experimental results demonstrate that a representation learning model of the vital signs monitoring data of intraoperative patients can effectively capture the physiological characteristics of postoperative heart failure.

    Updated: 2019-12-11
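
The text representation step in the entry above (PAA followed by SAX) can be sketched in a few lines. This is a minimal illustration of the standard technique, not the authors' pipeline: the four-letter alphabet, the number of segments, and the sample heart-rate values are assumptions. The breakpoints are the quartiles of the standard normal distribution, which is the usual SAX choice for an alphabet of size four.

```python
import statistics

# Quartile breakpoints of N(0, 1): four equiprobable regions, one per symbol.
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each of `segments` chunks.
    Assumes segments <= len(series) so that no chunk is empty."""
    n = len(series)
    means = []
    for k in range(segments):
        chunk = series[k * n // segments:(k + 1) * n // segments]
        means.append(sum(chunk) / len(chunk))
    return means


def sax(series, segments):
    """Symbolic Aggregate Approximation: z-normalize, reduce with PAA,
    then map each segment mean to a letter via the breakpoints."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        normalized = [0.0] * len(series)  # constant signal: no variation
    else:
        normalized = [(x - mean) / std for x in series]
    word = ""
    for value in paa(normalized, segments):
        index = sum(value > b for b in BREAKPOINTS)
        word += ALPHABET[index]
    return word


# Made-up heart-rate trace rising over time, compressed to a 4-letter word.
heart_rate = [60, 62, 61, 63, 80, 82, 81, 83, 100, 102, 101, 103, 120, 122, 121, 123]
print(sax(heart_rate, 4))  # abcd
```

Words produced this way can be treated like tokens in a document, which is what makes a topic model such as LDA applicable to the monitoring data.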
  • Using machine learning models to improve stroke risk level classification methods of China national stroke screening
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-10
    Xuemeng Li; Di Bian; Jinghui Yu; Mei Li; Dongsheng Zhao

    With its high incidence, high prevalence and high mortality, stroke has brought a heavy burden to families and society in China. In 2009, the Ministry of Health of China launched the China national stroke screening and intervention program, which screens for stroke and its risk factors and conducts interventions in the high-risk population among people aged above 40 years all over China. In this program, stroke risk factors include hypertension, diabetes, dyslipidemia, smoking, lack of exercise, apparent overweight and family history of stroke. People with more than two risk factors, or with a history of stroke or transient ischemic attack (TIA), are considered high-risk. However, this criterion cannot classify stroke risk levels for people with unknown values in the risk factor fields. Missing stroke risk levels result in reduced efficiency of stroke interventions and inaccuracies in statistical results at the national level. In this paper, we use 2017 national stroke screening data to develop stroke risk classification models based on machine learning algorithms to improve the classification efficiency. Firstly, we construct training and test sets and handle the imbalanced training set using oversampling and undersampling methods. Then, we develop a logistic regression model, Naïve Bayesian model, Bayesian network model, decision tree model, neural network model, random forest model, bagged decision tree model, voting model and boosting model with decision trees to classify stroke risk levels. The recall of the boosting model with decision trees is the highest (99.94%), and the precision of the random forest model is the highest (97.33%). Using the random forest model (recall: 98.44%), the recall will be increased by about 2.8% compared with the method currently used, and several thousand more people at high risk of stroke can be identified each year. Models developed in this paper can improve the current screening method in that they avoid the impact of unknown values and avoid unnecessary rescreening and intervention expenditures. The national stroke screening program can choose classification models according to practical needs.

    Updated: 2019-12-11
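
The limitation of the screening criterion described in the entry above can be made concrete with a small sketch. It encodes the rule as stated in the abstract (more than two risk factors, or a history of stroke or TIA) and returns `None` when unknown values prevent a decision. The field names and the three-valued handling are assumptions for illustration and do not come from the screening program's specification.

```python
from typing import Dict, Optional

# Hypothetical field names for the seven risk factors listed in the abstract.
RISK_FACTORS = (
    "hypertension",
    "diabetes",
    "dyslipidemia",
    "smoking",
    "lack_of_exercise",
    "overweight",
    "family_history_of_stroke",
)


def rule_based_risk(record: Dict[str, Optional[bool]]) -> Optional[bool]:
    """Return True (high-risk), False (not high-risk) or None (undetermined).
    A missing or None field is treated as an unknown value."""
    history = record.get("history_of_stroke_or_tia")
    values = [record.get(name) for name in RISK_FACTORS]
    present = sum(v is True for v in values)
    unknown = sum(v is None for v in values)
    if history is True or present > 2:
        return True  # rule satisfied regardless of the unknown fields
    if history is None or present + unknown > 2:
        return None  # unknown fields could still push the person over the threshold
    return False
```

For example, a person with two confirmed risk factors and one unknown field cannot be classified by the rule alone, which is the gap the abstract proposes to close with learned classifiers.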
  • Improving reference prioritisation with PICO recognition
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Austin J. Brockmeier; Meizhi Ju; Piotr Przybyła; Sophia Ananiadou

    Machine learning can assist with multiple tasks during systematic reviews to facilitate the rapid retrieval of relevant references during screening and to identify and extract information relevant to the study characteristics, which include the PICO elements of patient/population, intervention, comparator, and outcomes. The latter requires techniques for identifying and categorising fragments of text, known as named entity recognition. A publicly available corpus of PICO annotations on biomedical abstracts is used to train a named entity recognition model, which is implemented as a recurrent neural network. This model is then applied to a separate collection of abstracts for references from systematic reviews within biomedical and health domains. The occurrences of words tagged in the context of specific PICO contexts are used as additional features for a relevancy classification model. Simulations of the machine learning-assisted screening are used to evaluate the work saved by the relevancy model with and without the PICO features. Chi-squared and statistical significance of positive predicted values are used to identify words that are more indicative of relevancy within PICO contexts. Inclusion of PICO features improves the performance metric on 15 of the 20 collections, with substantial gains on certain systematic reviews. Examples of words whose PICO context are more precise can explain this increase. Words within PICO tagged segments in abstracts are predictive features for determining inclusion. Combining PICO annotation model into the relevancy classification pipeline is a promising approach. The annotations may be useful on their own to aid users in pinpointing necessary information for data extraction, or to facilitate semantic search.

    Updated: 2019-12-06
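
The entry above describes using words tagged within PICO contexts as additional features for a relevancy classifier. One simple way to build such features is sketched below; the tag names (`POP`, `INT`, `CMP`, `OUT`, `NONE`), the `TAG:word` feature naming and the function `pico_features` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pico_features(tokens, tags):
    """Bag-of-words counts plus extra counts for words that fall inside a
    PICO-tagged span. `tags` holds one tag per token; "NONE" marks tokens
    outside any PICO span."""
    features = Counter(tokens)  # plain bag-of-words features
    for token, tag in zip(tokens, tags):
        if tag != "NONE":
            features[f"{tag}:{token}"] += 1  # context-specific feature
    return features


tokens = ["adults", "with", "asthma", "received", "placebo"]
tags = ["POP", "POP", "POP", "NONE", "CMP"]
features = pico_features(tokens, tags)
```

Keeping the plain word count alongside the tagged count lets a downstream classifier learn that a word such as "placebo" may matter more when it appears as a comparator than when it appears elsewhere in the abstract.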
  • Deterrence approach on the compliance with electronic medical records privacy policy: the moderating role of computer monitoring
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-04
    Kuang-Ming Kuo; Paul C. Talley; Tain-Junn Cheng

    This study explored the possible antecedents that will motivate hospital employees’ compliance with privacy policy related to electronic medical records (EMR) from a deterrence perspective. Further, we also investigated the moderating effect of computer monitoring on relationships among the antecedents and the level of hospital employees’ compliance intention. Data was collected from a large Taiwanese medical center using survey methodology. A total of 303 responses was analyzed via hierarchical regression analysis. The results revealed that sanction severity and sanction certainty significantly predict hospital employees’ compliance intention, respectively. Further, our study found external computer monitoring significantly moderates the relationship between sanction certainty and compliance intention. Based on our findings, the study suggests that healthcare facilities should take proactive countermeasures, such as computer monitoring, to better protect the privacy of EMR in addition to stated privacy policy. However, the extent of computer monitoring should be kept to minimum requirements as stated by relevant regulations.

    Updated: 2019-12-05
  • Proof-of-concept study: Homomorphically encrypted data can support real-time learning in personalized cancer medicine
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-04
    Silvia Paddock; Hamed Abedtash; Jacqueline Zummo; Samuel Thomas

    The successful introduction of homomorphic encryption (HE) in clinical research holds promise for improving acceptance of data-sharing protocols, increasing sample sizes, and accelerating learning from real-world data (RWD). A well-scoped use case for HE would pave the way for more widespread adoption in healthcare applications. Determining the efficacy of targeted cancer treatments used off-label for a variety of genetically defined conditions is an excellent candidate for introduction of HE-based learning systems because of a significant unmet need to share and combine confidential data, the use of relatively simple algorithms, and an opportunity to reach large numbers of willing study participants. We used published literature to estimate the numbers of patients who might be eligible to receive treatments approved for other indications based on molecular profiles. We then estimated the sample size and number of variables that would be required for a successful system to detect exceptional responses with sufficient power. We generated an appropriately sized, simulated dataset (n = 5000) and used an established HE algorithm to detect exceptional responses and calculate total drug exposure, while the data remained encrypted. Our results demonstrated the feasibility of using an HE-based system to identify exceptional responders and perform calculations on patient data during a hypothetical 3-year study. Although homomorphically encrypted computations are time consuming, the required basic computations (i.e., addition) do not pose a critical bottleneck to the analysis. In this proof-of-concept study, based on simulated data, we demonstrate that identifying exceptional responders to targeted cancer treatments represents a valuable and feasible use case. Past solutions to either completely anonymize data or restrict access through stringent data use agreements have limited the utility of abundant and valuable data. 
Because of its privacy protections, we believe that an HE-based learning system for real-world cancer treatment would entice thousands more patients to voluntarily contribute data through participation in research studies beyond the currently available secondary data populated from hospital electronic health records and administrative claims. Forming collaborations between technical experts, physicians, patient advocates, payers, and researchers, and testing the system on existing RWD are critical next steps to making HE-based learning a reality in healthcare.

    Updated: 2019-12-05
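
The entry above notes that the required computations (addition) can be carried out while the data remain encrypted. The abstract does not name the scheme used, so the sketch below swaps in textbook Paillier encryption, which is additively homomorphic, purely as an illustration. The tiny primes make it insecure, and the dose values are made up.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no big exponentiation is needed.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Sum made-up per-visit doses without decrypting the individual values.
n, private_key = keygen()
total = encrypt(n, 0)
for dose in (120, 80, 200):
    total = add_encrypted(n, total, encrypt(n, dose))
print(decrypt(n, private_key, total))  # 400
```

Only the holder of the private key can read the total, while the party doing the aggregation sees ciphertexts throughout.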
  • A study of deep learning methods for de-identification of clinical notes in cross-institute settings
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Xi Yang; Tianchen Lyu; Qian Li; Chih-Yin Lee; Jiang Bian; William R. Hogan; Yonghui Wu

    De-identification is a critical technology to facilitate the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested great effort in developing methods and corpora for de-identification of clinical notes. These annotated corpora are valuable resources for developing automated systems to de-identify clinical text at local hospitals. However, existing studies often used training and test data collected from the same institution. Few studies have explored automated de-identification in cross-institute settings. The goal of this study is to examine deep learning-based de-identification methods in a cross-institute setting, identify the bottlenecks, and provide potential solutions. We created a de-identification corpus using a total of 500 clinical notes from the University of Florida (UF) Health, developed deep learning-based de-identification models using the 2014 i2b2/UTHealth corpus, and evaluated the performance using the UF corpus. We compared five different word embeddings trained from general English text, clinical text, and biomedical literature, explored lexical and linguistic features, and compared two strategies to customize the deep learning models using UF notes and resources. Pre-trained word embeddings using a general English corpus achieved better performance than embeddings from de-identified clinical text and biomedical literature. The performance of deep learning models trained using only the i2b2 corpus dropped significantly (strict and relaxed F1 scores dropped from 0.9547 and 0.9646 to 0.8568 and 0.8958) when applied to another corpus annotated at UF Health. Linguistic features could further improve the performance of de-identification in cross-institute settings. After customizing the models using UF notes and resources, the best model achieved strict and relaxed F1 scores of 0.9288 and 0.9584, respectively. It is necessary to customize de-identification models using local clinical text and other resources when they are applied in cross-institute settings. Fine-tuning is a potential solution to re-use pre-trained parameters and reduce the training time needed to customize deep learning-based de-identification models trained using a clinical corpus from a different institution.

    Updated: 2019-12-05
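
The entry above reports both strict and relaxed F1 scores. The difference between the two can be illustrated with a small sketch. The definitions used here are assumptions: strict requires identical span boundaries and label, relaxed accepts any character overlap with the same label. The official i2b2 evaluation may define the relaxed criterion differently, and the example spans are made up.

```python
def f1_score(gold, predicted, relaxed=False):
    """F1 over entity spans given as (start, end, label), end exclusive."""

    def matches(a, b):
        if a[2] != b[2]:
            return False
        if relaxed:
            return a[0] < b[1] and b[0] < a[1]  # any character overlap
        return a[0] == b[0] and a[1] == b[1]  # exact boundaries

    matched_pred = sum(any(matches(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(matches(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = {(0, 10, "NAME"), (20, 30, "DATE")}
predicted = {(0, 10, "NAME"), (22, 30, "DATE")}  # second span starts late
print(f1_score(gold, predicted))                # 0.5
print(f1_score(gold, predicted, relaxed=True))  # 1.0
```

A gap between the two scores, as in the example, signals that a system finds the right entities but places some boundaries slightly off.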
  • Editorial: The second international workshop on health natural language processing (HealthNLP 2019)
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Yanshan Wang; Hua Xu; Ozlem Uzuner

    In the past few decades, growing adoption of electronic health record (EHR) systems has made massive clinical narrative data available electronically. Natural language processing (NLP) technologies that can unlock information from narrative text have received great attention in the medical domain. Many clinical NLP methods and systems have been developed and have shown promising results in various tasks. These methods and tools have also been successfully applied to facilitate clinical research, as well as to support healthcare applications. Recent advancements in artificial intelligence (AI), particularly deep learning-based neural networks, have achieved state-of-the-art performance on diverse NLP tasks in the general domain, indicating great opportunities for solving real-world medical problems. At the same time, the amount of health information available online has exploded through use of social media, community forums, and health-related websites. These present additional challenges and opportunities for further development of new NLP methodologies and applications. The goal of this workshop was to provide a unique platform to bring together researchers and practitioners working with health-related free text, and to facilitate close interaction among students, scholars, and industry professionals on health NLP challenges worldwide. We successfully organized the first international workshop on Health Natural Language Processing (HealthNLP 2018) in June 2018 in New York City, USA [1]. We continued and held the HealthNLP 2019 workshop on June 10th, 2019, in Beijing, China, in conjunction with the IEEE International Conference on Healthcare Informatics (ICHI 2019). The workshop attracted submissions in the form of research papers, poster abstracts, and demonstration papers. All submissions were subjected to rigorous peer review, with at least two peer reviews and at least one review by a senior member of the program committee. 
Selected papers and abstracts were featured as oral / poster presentations at the workshop. We selected and invited eight high-quality submissions to extend their workshop abstracts for this journal supplement. The main focus of the included papers is information extraction from clinical documents using deep learning-based approaches. Heo et al. [2] proposed a hybrid ranking method that combines a co-occurrence approach considering both direct and indirect entity pair relationship with specialized word embeddings for measuring the relatedness of two entities. They evaluated the proposed ranking method with other well-known methods such as co-occurrence, Word2Vec, COALS, and random indexing by calculating top entities related to Alzheimer’s disease. Furthermore, they conducted analysis of gene, pathway, and gene-phenotype relationships and found that the proposed method could find more hidden relationships than the traditional methods. Li and Hou [3] integrated the attention mechanism into a neural network, and proposed an improved clinical named entity recognition method for Chinese electronic medical records called BiLSTM-Att-CRF. Medical dictionaries and part-of-speech (POS) features were also introduced. They evaluated the proposed model on China Conference on Knowledge Graph and Semantic Computing (CCKS) 2017 and 2018 Chinese EMRs corpora, and found the model achieved better performance than other widely-used models. Their work preliminarily confirmed the validity of attention mechanism in extracting information from clinical documents. In Xu et al.’s study [4], they adopted Bidirectional Long Short-Term Memory (BiLSTM) networks and Conditional Random Fields (CRF) to simultaneously identify named entity attributes, and to relate medical concepts to their attributes. 
Their approach achieved higher accuracy than the traditional systems that tackle the two tasks separately on three medical concept-attribute detection tasks: disease-modifier, medication-signature, and lab test-value. They provide a simple yet unified solution to concept-attribute detection without using external data or knowledge bases, and thus streamline practical clinical NLP systems. De-identification of clinical notes is one of the most crucial prerequisites for utilizing clinical notes in other downstream biomedical informatics studies. Yang et al. [5] explored de-identification in cross-institute settings using deep learning-based approaches: fine-tuning and pre-training. They pre-trained LSTM-CRF de-identification models on the i2b2 corpus and fine-tuned them on notes from University of Florida (UF) Health. They demonstrated that fine-tuning pre-trained models with a small local corpus (i.e., notes from UF Health) could significantly enhance the performance. Wang et al. [6] developed and evaluated a rule-based NLP system to capture information on stage, histology, tumor grade and therapies in lung cancer patients using various clinical narrative documents including clinical notes, pathology reports and surgery reports. Their evaluation of the system showed promising precision and recall for stage, histology, grade, and therapies. They used convolutional neural networks (CNN) in the error analysis, and found that CNN and the proposed NLP system could identify more true labels than the reference standard. Li et al. [7] developed a disease classification algorithm for accurately recognizing rare diseases from symptom description documents. They leveraged a knowledge graph in representing documents and compared with LSTM models. 
On two Chinese disease classification data sets, the proposed algorithm delivered robust performance on rare diseases, outperforming a wide range of baselines, including resampling, deep learning, and feature selection methods. A lack of publicly available clinical corpus resources has become a bottleneck for wide adoption of NLP applications in the clinical domain. Sun et al. [8] presented a Chinese clinical corpus and a novel annotation scheme for chemical disease semantic extraction. The corpus is chronic disease specific and targeted at combination therapy related mining from biomedical abstracts in Chinese. The result analysis of the corpus verified its quality for the chemical-treat-disease relation identification task. The annotated corpus would be a useful resource for developing clinical relation extraction methods and tools. In conclusion, the papers included in this special issue highlight the current research trends in the health-related NLP field. With the successful applications of deep learning methods in the general domain, researchers have attempted to apply these methods to medical NLP tasks and have achieved promising results. We envision that these studies will have a significant impact on NLP methodologies, tools, and applications in the healthcare domain.

    Abbreviations: AI: artificial intelligence; BiLSTM: Bidirectional Long Short-Term Memory; CCKS: China Conference on Knowledge Graph and Semantic Computing; CNN: convolutional neural networks; CRF: Conditional Random Fields; EHR: electronic health record; HealthNLP: the workshop on Health Natural Language Processing; ICHI: the IEEE International Conference on Healthcare Informatics; NLP: natural language processing; POS: part-of-speech.

    References
    1. Vydiswaran VGV, Zhang Y, Wang Y, et al. Special issue of BMC medical informatics and decision making on health natural language processing. BMC Med Inform Decis Mak. 2019;19:76. https://doi.org/10.1186/s12911-019-0777-0.
    2. Go Eun Heo, Qing Xie and Min Song. A Hybrid Semantic Relatedness Algorithm by Entity Co-Occurrence and Specialized Word Embedding. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0934-5.
    3. Luqi Li and Li Hou. Combined Attention Mechanism for Named Entity Recognition in Chinese Electronic Medical Records. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0933-6.
    4. Jun Xu, Zhiheng Li, Qiang Wei, Yonghui Wu, Yang Xiang, Hee-Jin Lee, Yaoyun Zhang, Stephen Wu and Hua Xu. Applying a Deep Learning-Based Sequence Labeling Approach to Detect Attributes of Medical Concepts in Clinical Text. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0937-2.
    5. Xi Yang, Tianchen Lyu, Chih-Yin Lee, Jiang Bian, William Hogan and Yonghui Wu. A Study of Deep Learning Methods for De-identification of Clinical Notes at Cross Institute Settings. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0935-4.
    6. Liwei Wang, Lei Luo, Yanshan Wang, Jason A. Wampfler, Ping Yang and Hongfang Liu. Information Extraction for Populating Lung Cancer Clinical Research Data. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0931-8.
    7. Xuedong Li, Yue Wang, Dongwu Wang, Walter Yuan, Dezhong Peng and Qiaozhu Mei. Improving Rare Disease Classification Using Imperfect Knowledge Graph. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0938-1.
    8. Yueping Sun, Li Hou, Lu Qin, Jiao Li and Qing Qian. RCorp: a Resource for Chemical Disease Semantic Extraction in Chinese. BMC Med Inform Decis Mak. 2019;19(Supplement 5). https://doi.org/10.1186/s12911-019-0936-3.

    About this supplement: This article has been published as part of BMC Medical Informatics and Decision Making Volume 19 Supplement 5, 2019: Selected articles from the second International Workshop on Health Natural Language Processing (HealthNLP 2019). The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-19-supplement-5.

    Funding: Not applicable.

    Affiliations: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA (Yanshan Wang); School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA (Hua Xu); Information Sciences and Technology, George Mason University, Fairfax, VA, USA (Ozlem Uzuner).

    Contributions: YW, HX, and OU drafted the manuscript. All authors read and reviewed the final manuscript. All authors read and approved the final manuscript.

    Corresponding author: Correspondence to Hua Xu.

    Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare that they have no competing interests.

    Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

    Cite this article Wang, Y., Xu, H. & Uzuner, O. Editorial: The second international workshop on health natural language processing (HealthNLP 2019). 
BMC Med Inform Decis Mak 19, 233 (2019) doi:10.1186/s12911-019-0930-9 Download citation Published 05 December 2019 DOI https://doi.org/10.1186/s12911-019-0930-9 Keywords Natural language processing NLP Healthcare Electronic health records EHR Artificial intelligence

    Updated: 2019-12-05
  • RCorp: a resource for chemical disease semantic extraction in Chinese
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Yueping Sun; Li Hou; Lu Qin; Yan Liu; Jiao Li; Qing Qian

    Robustly identifying synergistic drug combinations calls for high-throughput screening, and machine learning-based tools that automatically identify relations in published papers would be of great help. To support chemical disease semantic relation extraction, especially for chronic diseases, we manually annotated RCorp, a chronic disease specific corpus for combination therapy discovery in Chinese. We extracted abstracts from a Chinese medical literature server and followed the annotation framework of the BioCreative CDR corpus, modifying the guidelines to cover combination therapy related relations. An annotation tool was incorporated into the standard annotation process. The resulting RCorp consists of 339 Chinese biomedical articles with 2367 annotated chemicals, 2113 diseases, 237 symptoms, 164 chemical-induce-disease relations, 163 chemical-induce-symptom relations, and 805 chemical-treat-disease relations. Each annotation includes both the mention text span and a normalized concept identifier. Inter-annotator agreement, measured by F score, is 0.883 for chemical entities and 0.791 for disease entities; for chemical-treat-disease relations the F score is 0.788 after unifying the entity mentions. The result analysis of the corpus verified its quality for the combination therapy related knowledge discovery task. The annotated corpus should be a useful resource for building entity recognition and relation extraction tools. In the future, an evaluation based on the corpus will be held.
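
The agreement figures above are F scores computed between annotators. The sketch below shows one common way to compute such a score, treating one annotator as the reference and the other as the prediction. The exact-match criterion on (start, end, type) and the name `agreement_f_score` are assumptions made for illustration; the abstract does not state how spans were matched.

```python
from typing import Set, Tuple

# Hypothetical span format: (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def agreement_f_score(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """F score between two annotators, using exact span-and-type matches.

    Annotator A is treated as the reference and annotator B as the
    prediction; with exact matching the score is symmetric in A and B.
    """
    if not annotator_a and not annotator_b:
        return 1.0  # both annotated nothing: trivially in agreement
    matched = len(annotator_a & annotator_b)
    if matched == 0:
        return 0.0
    precision = matched / len(annotator_b)
    recall = matched / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


a = {(0, 3, "Chemical"), (10, 14, "Disease"), (20, 25, "Chemical")}
b = {(0, 3, "Chemical"), (10, 14, "Disease"), (30, 33, "Disease")}
print(round(agreement_f_score(a, b), 3))
```

A relaxed criterion (for example, overlapping spans count as matches) would give higher scores on the same annotations, so the matching rule should be reported alongside the number.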

    Updated: 2019-12-05
  • An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Luqi Li; Jie Zhao; Li Hou; Yunkai Zhai; Jinming Shi; Fangfang Cui

    Clinical named entity recognition (CNER) is important for medical information mining and for building high-quality knowledge graphs. Chinese electronic medical records (EMRs) differ from general natural-language text and contain many specialized and uncommon clinical terms, so CNER on Chinese EMRs remains difficult. With a small training corpus, it is important to eliminate semantic interference and improve the model's ability to learn internal features on its own. From the perspective of deep learning, we integrated an attention mechanism into a neural network and proposed an improved CNER method for Chinese EMRs, called BiLSTM-Att-CRF, which captures more useful contextual information and avoids the loss of information caused by long-distance dependencies. Medical dictionaries and part-of-speech (POS) features were also introduced to improve the performance of the model. On the China Conference on Knowledge Graph and Semantic Computing (CCKS) 2017 and 2018 Chinese EMR corpora, our BiLSTM-Att-CRF model outperformed other widely used models without additional features (F1-measure of 85.4% on CCKS 2018 and 90.29% on CCKS 2017) and achieved its best performance with POS and dictionary features (F1-measure of 86.11% on CCKS 2018 and 90.48% on CCKS 2017). In particular, the BiLSTM-Att-CRF model had a significant effect on improving recall. Our work provides preliminary evidence that the attention mechanism is effective at discovering key information and mining text features, which may provide useful ideas for future research on CNER for Chinese EMRs. In the future, we will explore deeper applications of the attention mechanism in neural networks.

    Updated: 2019-12-05
  • Applying a deep learning-based sequence labeling approach to detect attributes of medical concepts in clinical text
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Jun Xu; Zhiheng Li; Qiang Wei; Yonghui Wu; Yang Xiang; Hee-Jin Lee; Yaoyun Zhang; Stephen Wu; Hua Xu

    To detect attributes of medical concepts in clinical text, a traditional method often consists of two steps: named entity recognition of attributes, followed by relation classification between medical concepts and attributes. Here we present a novel solution in which attribute detection for a given concept is converted into a sequence labeling problem, so that attribute entity recognition and relation classification are done simultaneously in one step. A neural architecture combining bidirectional Long Short-Term Memory networks and Conditional Random Fields (Bi-LSTM-CRF) was adopted to detect various medical concept-attribute pairs efficiently. We then compared our deep learning-based sequence labeling approach with traditional two-step systems on three attribute detection tasks: disease-modifier, medication-signature, and lab test-value. Our results show that the proposed method achieved higher accuracy than the traditional methods on all three medical concept-attribute detection tasks. This study demonstrates the efficacy of our sequence labeling approach using Bi-LSTM-CRF on the attribute detection task, indicating its potential to speed up practical clinical NLP applications.
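
To make the reformulation concrete, the sketch below builds one training instance for a single target concept: the concept is marked as an input feature, and only the attributes linked to that concept receive BIO labels. The function name `to_sequence_labels`, the end-exclusive token spans, and the `DOSE`/`FREQ` tag names are illustrative assumptions, not the paper's actual encoding.

```python
from typing import List, Tuple

# Hypothetical attribute format: (start token, end token exclusive, type).
AttributeSpan = Tuple[int, int, str]


def to_sequence_labels(
    num_tokens: int,
    concept_span: Tuple[int, int],
    linked_attributes: List[AttributeSpan],
) -> Tuple[List[int], List[str]]:
    """Encode one (sentence, target concept) pair for sequence labeling.

    Returns a 0/1 flag per token marking the target concept (model input)
    and a BIO label per token covering only attributes of that concept
    (model output). Attributes of other concepts stay "O", which is how
    a single labeling pass also decides the concept-attribute relation.
    """
    concept_flags = [0] * num_tokens
    for i in range(concept_span[0], concept_span[1]):
        concept_flags[i] = 1

    labels = ["O"] * num_tokens
    for start, end, attr_type in linked_attributes:
        labels[start] = f"B-{attr_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{attr_type}"
    return concept_flags, labels


tokens = "aspirin 81 mg daily and metoprolol 25 mg".split()
# Target concept: "aspirin"; its dose is "81 mg" and its frequency "daily".
flags, labels = to_sequence_labels(
    len(tokens), (0, 1), [(1, 3, "DOSE"), (3, 4, "FREQ")]
)
for token, flag, label in zip(tokens, flags, labels):
    print(token, flag, label)
```

In this example the dose "25 mg" belongs to "metoprolol", so it stays unlabeled for the "aspirin" instance; a second instance with "metoprolol" as the target would label it.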

    Updated: 2019-12-05
  • TestIME: an application for evaluating the efficiency of Chinese input method engines in electronic medical record entry task
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Feihong Yang; Haihong Guo; Jiao Li

    With the wide application of Electronic Medical Record (EMR) systems, typing clinical information into the EMR system has become part of doctors’ daily work. A Chinese Input Method Engine (IME) is essential for doctors to convert pinyin to Chinese characters, and an efficient IME would improve doctors’ healthcare work. We developed a tool (called TestIME) to evaluate the efficiency of current IMEs in doctors’ working scenarios. TestIME consists of four major function modules: 1) test task assignment, to ensure that participants use different IMEs to complete the same test task in a random order; 2) automatic IME switching, to switch input method engines without changing the experimental settings; 3) participant behavior monitoring, to record participants’ keystrokes and timestamps during typing; 4) a questionnaire, to collect participants’ subjective data. In addition, we designed a preliminary experiment to demonstrate the usability of TestIME. We selected three sentences each from an EMR corpus and a news corpus as test texts, and recruited four participants from a medical school to complete text entry tasks using TestIME. TestIME generated 72 files that record participants’ detailed keyboard behavior while transcribing the test texts, and 4 questionnaires that reflect participants’ psychological states. These profiles can be downloaded from TestIME in a structured format (CSV) for further analysis. In the given healthcare text input scenario, TestIME is able to record doctors’ keyboard behavior, frequently used Chinese terms, IME usability feedback, etc. These user profiles are important for improving current IME tools for doctors and, in turn, healthcare service.

    Updated: 2019-12-05
  • Improving rare disease classification using imperfect knowledge graph
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Xuedong Li; Yue Wang; Dongwu Wang; Walter Yuan; Dezhong Peng; Qiaozhu Mei

    Accurately recognizing rare diseases based on symptom descriptions is an important task in patient triage, early risk stratification, and targeted therapies. However, due to the very nature of rare diseases, the lack of historical data poses a great challenge to machine learning-based approaches. On the other hand, medical knowledge in automatically constructed knowledge graphs (KGs) has the potential to compensate for the lack of labeled training examples. This work aims to develop a rare disease classification algorithm that makes effective use of a knowledge graph, even when the graph is imperfect. We develop a text classification algorithm that represents a document as a combination of a “bag of words” and a “bag of knowledge terms,” where a “knowledge term” is a term shared between the document and the subgraph of the KG relevant to the disease classification task. We use two Chinese disease diagnosis corpora to evaluate the algorithm. The first, HaoDaiFu, contains 51,374 chief complaints categorized into 805 diseases. The second, ChinaRe, contains 86,663 patient descriptions categorized into 44 disease categories. On both evaluation data sets, the proposed algorithm delivers robust performance and outperforms a wide range of baselines, including resampling, deep learning, and feature selection approaches. Both a classification-based metric (macro-averaged F1 score) and a ranking-based metric (mean reciprocal rank) are used in the evaluation. Medical knowledge in large-scale knowledge graphs can be effectively leveraged to improve rare disease classification models, even when the knowledge graph is incomplete.
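
The document representation described above can be sketched as follows. This is a minimal illustration only: it assumes the document is already tokenized, that the relevant KG subgraph has been reduced to a plain set of terms, and that raw counts are used as feature values. The function name `represent` and the `w:`/`k:` feature prefixes are hypothetical, and the paper's actual weighting and subgraph selection are not described in the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def represent(doc_tokens: Iterable[str], kg_terms: Set[str]) -> Dict[str, int]:
    """Combine a bag of words with a bag of knowledge terms.

    A knowledge term is a token that also appears in the set of terms
    drawn from the task-relevant part of the knowledge graph. The two
    bags are kept in separate feature namespaces so a classifier can
    weight them independently.
    """
    tokens = list(doc_tokens)
    bag_of_words = Counter(tokens)
    bag_of_knowledge = Counter(t for t in tokens if t in kg_terms)

    features = {f"w:{term}": count for term, count in bag_of_words.items()}
    features.update(
        {f"k:{term}": count for term, count in bag_of_knowledge.items()}
    )
    return features


# "rash" is in the KG but not in the document, so it adds no feature.
print(represent(["fever", "cough", "fever", "tired"], {"fever", "rash"}))
```

Because a KG term that never appears in the document contributes nothing, missing or spurious edges in the graph only add or remove a few features rather than breaking the representation, which is one plausible reading of why an imperfect graph can still help.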

    Updated: 2019-12-05
  • Natural language processing for populating lung cancer clinical research data
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Liwei Wang; Lei Luo; Yanshan Wang; Jason Wampfler; Ping Yang; Hongfang Liu

    Lung cancer is the second most common cancer in both men and women, and the wide adoption of electronic health records (EHRs) offers the potential to accelerate cohort-related epidemiological studies using informatics approaches. Since manual extraction from large volumes of text is time consuming and labor intensive, some efforts have emerged to automatically extract information from text for lung cancer patients using natural language processing (NLP), an artificial intelligence technique. In this study, using an existing cohort of 2311 lung cancer patients with manually ascertained information about stage, histology, tumor grade, and therapies (chemotherapy, radiotherapy and surgery), we developed and evaluated an NLP system to extract these variables automatically for the same patients from clinical narratives, including clinical notes, pathology reports and surgery reports. Evaluation showed promising results: recalls for stage, histology, tumor grade, and therapies were 89, 98, 78, and 100%, respectively, and precisions were 70, 88, 90, and 100%, respectively. This study demonstrated the feasibility and accuracy of automatically extracting pre-defined information from clinical narratives for lung cancer research.

    Updated: 2019-12-05
  • Combining entity co-occurrence with specialized word embeddings to measure entity relation in Alzheimer’s disease
    BMC Med. Inform. Decis. Mak. (IF 2.067) Pub Date : 2019-12-05
    Go Eun Heo; Qing Xie; Min Song; Jeong-Hoon Lee

    Extracting useful information from biomedical literature plays an important role in the development of modern medicine. In natural language processing, there have been rigorous attempts to find meaningful relationships between entities automatically using co-occurrence-based methods. It has become increasingly important to understand whether a relationship exists, and if so how strong it is, between any two entities extracted from a large number of texts. One of the defining methods is to measure semantic similarity and relatedness between two entities. We propose a hybrid ranking method for measuring the relatedness of two entities that combines a co-occurrence approach, considering both direct and indirect entity pair relationships, with specialized word embeddings. We compare the proposed ranking method with other well-known methods, such as co-occurrence, Word2Vec, COALS (Correlated Occurrence Analog to Lexical Semantics), and random indexing, by calculating the top-ranked entities related to Alzheimer’s disease. In addition, we analyze gene, pathway, and gene–phenotype relationships. Overall, the proposed method tends to find more hidden relationships than the other methods. It is able to select more useful related entities that not only co-occur frequently with the target entity but also have more indirect relations to it. In pathway analysis, the proposed method shows superior performance at identifying (functional) cross clustering and higher-level pathways. In phenotype analysis, the proposed method has an advantage in identifying the common genotype related to phenotypes in the biological literature.

    Updated: 2019-12-05
Contents have been reproduced by permission of the publishers.