Current journal: Bioinformatics
  • Secure multiparty computation for privacy-preserving drug discovery
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Ma R, Li Y, Li C, et al.

    Motivation: Quantitative structure-activity relationship (QSAR) and drug-target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical institutions can lead to better performance in both QSAR and DTI prediction. However, drug-related data privacy and intellectual property issues have become a noticeable hindrance to inter-institutional collaboration in drug discovery. Results: We have developed two novel algorithms under secure multiparty computation (MPC), QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance, and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect real drug discovery scenarios, we have demonstrated that DTIMPC achieves a significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature, and scales feasibly to growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery. Availability and implementation: The source code of QSARMPC and DTIMPC is available on GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
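As a rough illustration of the kind of building block that secure multiparty computation rests on, the sketch below shows additive secret sharing: each private value is split into random shares, the parties add their shares locally, and only the sum is ever reconstructed. This is a generic textbook construction, not the QSARMPC or DTIMPC protocol; the modulus `PRIME` and the function names are assumptions made for this example.

```python
import secrets

# Field modulus; an illustrative choice, not taken from the paper.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares; no party sees a or b."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Two institutions hold private values 42 and 100; three parties compute.
a = share(42, 3)
b = share(100, 3)
total = reconstruct(add_shared(a, b))
```

Any single share is a uniformly random number, so it reveals nothing about the underlying value on its own; only the combined result is disclosed.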
  • Curation and Annotation of Planarian Gene Expression Patterns with Segmented Reference Morphologies
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Roy J, Cheung E, Bhatti J, et al.

    Motivation: Morphological and genetic spatial data from functional experiments based on genetic, surgical, and pharmacological perturbations are being produced at an extraordinary pace in developmental and regenerative biology. However, our ability to extract knowledge from these large datasets is hindered by the lack of formalization methods and tools able to unambiguously describe, centralize, and interpret them. Formalizing spatial phenotypes and gene expression patterns is especially challenging in organisms with highly variable morphologies such as planarian worms, which, owing to their extraordinary regenerative capability, can experimentally result in phenotypes with almost any combination of body regions or parts. Results: Here we present a computational methodology and mathematical formalism to encode and curate the morphological outcomes and gene expression patterns in planaria. Worm morphologies are encoded with mathematical graphs based on anatomical ontology terms to automatically generate reference morphologies. Gene expression patterns are registered to these standard reference morphologies, which can then be annotated automatically with anatomical ontology terms by analyzing the spatial expression patterns and their textual descriptions. This methodology enables the curation and annotation of complex experimental morphologies together with their gene expression patterns in a centralized standardized dataset, paving the way for the extraction of knowledge and reverse-engineering of the much sought-after mechanistic models in planaria and other regenerative organisms. Availability: We implemented this methodology in a user-friendly graphical software tool, PlanGexQ, freely available together with the data in the manuscript at https://lobolab.umbc.edu/plangexq.

    Updated: 2020-01-17
  • The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Drysdale R, Cook C, Petryszak R, et al.

    Motivation: Life science research in academia, industry, agriculture, and the health sector depends critically on free and open data resources. ELIXIR (www.elixir-europe.org), the European Research Infrastructure for life sciences data, has identified a set of Core Data Resources within Europe that are of most fundamental importance for the long-term preservation of biological data. We explore characteristics of their usage, impact and assured funding horizon to assess their value and importance as an infrastructure, to understand sustainability of the infrastructure, and to demonstrate a model for assessing Core Data Resources worldwide. Results: The nineteen resources currently designated ELIXIR Core Data Resources form a data infrastructure in Europe which is a subset of the worldwide open life science data infrastructure. We show that, from 2014 to 2018, data managed by the Core Data Resources more than tripled while staff numbers increased by less than a tenth. Additionally, support for the Core Data Resources is precarious: together they have assured funding for less than a third of current staff after four years. Our findings demonstrate the importance of the ELIXIR Core Data Resources as repositories for research data and knowledge, while also demonstrating the uncertain nature of the funding environment for this infrastructure. ELIXIR is working towards longer-term support for the Core Data Resources and, through the Global Biodata Coalition, aims to ensure support for the worldwide life science data resource infrastructure of which the ELIXIR Core Data Resources are a subset. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
  • Targeted Realignment of LC-MS Profiles by Neighbor-wise Compound-specific Graphical Time Warping with Misalignment Detection
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Wu C, Wang Y, Wang Y, et al.

    Motivation: Liquid chromatography-mass spectrometry (LC-MS) is a standard method for proteomics and metabolomics analysis of biological samples. Unfortunately, the retention time (RT) of the same compound varies between samples, and these shifts must be corrected (aligned) during data processing. Classic alignment methods, such as that in the popular XCMS package, often assume a single time-warping function for each sample. Thus, the potentially varying RT drift for compounds with different masses in a sample is neglected in these methods. Moreover, the systematic change in RT drift across run order is often not considered by alignment algorithms. Therefore, these methods cannot effectively correct all misalignments. For a large-scale experiment involving many samples, the existence of misalignment becomes inevitable and concerning. Results: Here we describe an integrated reference-free profile alignment method, neighbor-wise compound-specific Graphical Time Warping (ncGTW), that can detect misaligned features and align profiles by leveraging expected RT drift structures and compound-specific warping functions. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on warping functions of neighboring samples. Validated with both realistic synthetic data and internal quality control samples, ncGTW applied to two large-scale metabolomics LC-MS datasets identifies many misaligned features and successfully realigns them. These features would otherwise be discarded or left uncorrected by existing methods. The ncGTW software tool is currently developed as a plug-in to detect and realign misaligned features present in standard XCMS output. Availability and implementation: An R package of ncGTW is freely available at Bioconductor and https://github.com/ChiungTingWu/ncGTW. A detailed user’s manual and a vignette are provided within the package. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
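For readers unfamiliar with time warping, the sketch below shows classic pairwise dynamic time warping (DTW) on two toy elution profiles. It is the textbook algorithm, included only to illustrate what a warping-based alignment cost is; it is not ncGTW, which additionally uses compound-specific warping functions and constraints between neighboring samples. The profile values are made up.

```python
def dtw_distance(x, y):
    """Classic DTW cost between two 1-D profiles (absolute difference)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: step in x, step in y, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same peak, eluting one time step later in the second sample.
reference = [0, 0, 1, 5, 1, 0, 0]
shifted = [0, 0, 0, 1, 5, 1, 0]

pointwise = sum(abs(a - b) for a, b in zip(reference, shifted))
warped = dtw_distance(reference, shifted)
```

Comparing `pointwise` with `warped` shows how a retention-time shift that looks like a large difference sample-by-sample disappears once the time axis is allowed to stretch.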
  • Inference of Gene Regulatory Networks Based on Nonlinear Ordinary Differential Equations
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Ma B, Fang M, Jiao X, et al.

    Motivation: Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological processes of transcription and translation. In some cases, the topology of GRNs is not known and has to be inferred from gene expression data. Most existing GRN reconstruction algorithms are applied to either time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks. Results: In this paper, we propose a method for inferring GRNs from time-series data and steady-state data jointly. We make use of a nonlinear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. A comparison of performance on datasets of different scales shows that our method remains robust and accurate at a low computational complexity. Availability and implementation: The proposed method is written in Python and is available at https://github.com/lab319/GRNs_nonlinear_ODEs. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
  • BMDx: a graphical Shiny application to perform Benchmark Dose analysis for transcriptomics data
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Serra A, Saarimäki L, Fratello M, et al.

    Motivation: The analysis of dose-dependent effects on gene expression is gaining attention in the field of toxicogenomics. Currently available computational methods are usually limited to specific omics platforms or biological annotations and are able to analyse only one experiment at a time. Results: We developed the software BMDx with a graphical user interface for the Benchmark Dose (BMD) analysis of transcriptomics data. We implemented an approach based on the fitting of multiple models and the selection of the optimal model based on the Akaike Information Criterion (AIC). The BMDx tool takes as input a gene expression matrix and a phenotype table, and computes the BMD, its related values, and IC50/EC50 estimations. It reports interactive tables and plots that the user can investigate for further details of the fitting, dose effects, and functional enrichment. BMDx allows a fast and convenient comparison of the BMD values of a transcriptomics experiment at different time points, and an effortless way to interpret the results. Furthermore, BMDx allows multiple experiments to be analysed and compared at once. Availability: BMDx is implemented as an R/Shiny application and is available at https://github.com/Greco-Lab/BMDx/. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
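The fit-several-models-then-pick-by-AIC step described in the BMDx abstract can be sketched as follows. This is only an illustration under stated assumptions: polynomial fits stand in for whatever dose-response model families BMDx actually fits, the dose and response values are invented, and the AIC is the common least-squares form n·ln(RSS/n) + 2k with k taken as the number of fitted coefficients.

```python
import numpy as np


def aic_least_squares(y, y_hat, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * n_params


# Hypothetical expression values of one gene across eight doses.
dose = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
noise = np.array([0.05, -0.03, 0.02, -0.04, 0.03, -0.02, 0.04, -0.05])
response = 2.0 + 0.5 * dose**2 + noise

# Fit candidate models of increasing complexity and score each by AIC.
aic = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(dose, response, degree)
    fitted = np.polyval(coeffs, dose)
    aic[degree] = aic_least_squares(response, fitted, degree + 1)

# The model with the lowest AIC balances goodness of fit and complexity.
best_degree = min(aic, key=aic.get)
```

The penalty term 2k is what keeps the most flexible candidate from winning automatically: an extra coefficient has to reduce the residual error enough to pay for itself.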
  • 3D-Cell-Annotator: an open-source active surface tool for single cell segmentation in 3D microscopy images
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Tasnadi E, Toth T, Kovacs M, et al.

    Summary: Segmentation of single cells in microscopy images is one of the major challenges in computational biology. It is the first step of most bioimage analysis tasks, and essential to create training sets for more advanced deep learning approaches. Here, we propose 3D-Cell-Annotator to solve this task using 3D active surfaces together with shape descriptors as prior information in a semi-automated fashion. The software uses the convenient 3D interface of the widely used Medical Imaging Interaction Toolkit (MITK). Results on 3D biological structures (e.g. spheroids, organoids, embryos) show that the precision of the segmentation reaches the level of a human expert. Availability and implementation: 3D-Cell-Annotator is implemented in CUDA/C++ as a patch for the segmentation module of MITK. The 3D-Cell-Annotator-enabled MITK distribution can be downloaded at www.3D-cell-annotator.org. It works under 64-bit Windows systems and recent Linux distributions, even on a consumer-level laptop with a CUDA-enabled video card using recent NVIDIA drivers. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
  • Simulating trees with millions of species
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Louca S, Schwartz R.

    Motivation: The birth-death model constitutes the theoretical backbone of most phylogenetic tools for reconstructing speciation/extinction dynamics over time. Performing simulations of reconstructed trees (linking extant taxa) under the birth-death model in backward time, conditioned on the number of species sampled at present day and, in some cases, a specific time interval since the most recent common ancestor (MRCA), is needed for assessing the performance of reconstruction tools, for parametric bootstrapping and for detecting data outliers. The few simulation tools that exist scale poorly to large modern phylogenies, which can comprise thousands or even millions of tips (and rising). Results: Here I present efficient software for simulating reconstructed phylogenies under time-dependent birth-death models in backward time, conditioned on the number of sampled species and (optionally) on the time since the MRCA. On large trees, my software is 1,000–10,000 times faster than existing tools. Availability: The presented software is incorporated into the R package “castor”, which is available on The Comprehensive R Archive Network (CRAN).

    Updated: 2020-01-17
  • RxnBLAST: Molecular Scaffold and Reactive Chemical Environment Feature Extractor for Biochemical Reactions
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Cheng X, Sun D, Zhang D, et al.

    Motivation: Molecular scaffolds are useful in medicinal chemistry to describe, discuss, and visualize series of chemical compounds, biochemical transformations, and associated biological properties. Results: Here, we present RxnBLAST as a web-based tool for analyzing scaffold transformations and reactive chemical environment features in bioreactions. RxnBLAST extracts chemical features from bioreactions including atom–atom mapping, reaction centers, rules, and functional groups to help understand chemical compositions and reaction patterns. Core-to-Core is proposed, which can be utilized in scaffold networks and for constructing a reaction space, as well as providing guidance for subsequent biosynthesis efforts. Supplementary information: RxnBLAST is available at http://design.rxnfinder.org/rxnblast/

    Updated: 2020-01-17
  • Visualization of circular RNAs and their internal splicing events from transcriptomic data
    Bioinformatics (IF 4.531) Pub Date : 2020-01-17
    Zheng Y, Zhao F, Mathelier A.

    Summary: Circular RNAs have been shown to have unique compositions and splicing events distinct from canonical mRNAs. However, there is no visualization tool designed for the exploration of complex splicing patterns in circRNA transcriptomes. Here, we present CIRI-vis, a Java command line tool for quantifying and visualizing circRNAs by integrating the alignments and junctions of circular transcripts. CIRI-vis can be applied to visualize the internal structure and isoform abundance of circRNAs and perform circRNA transcriptome comparison across multiple samples. Availability: https://sourceforge.net/projects/ciri/files/CIRI-vis. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-17
  • Single-sample landscape entropy reveals the imminent phase transition during disease progression
    Bioinformatics (IF 4.531) Pub Date : 2020-01-14
    Liu R, Chen P, Chen L.

    Bioinformatics (2009) doi:10.1093/bioinformatics/btz758

    Updated: 2020-01-14
  • ClinOmicsTrailbc: a visual analytics tool for breast cancer treatment stratification
    Bioinformatics (IF 4.531) Pub Date : 2019-04-30
    Schneider L, Kehl T, Thedinga K, et al.

    Motivation: Breast cancer is the second leading cause of cancer death among women. Tumors, even of the same histopathological subtype, exhibit a high genotypic diversity that impedes therapy stratification and that hence must be accounted for in the treatment decision-making process. Results: Here, we present ClinOmicsTrailbc, a comprehensive visual analytics tool for breast cancer decision support that provides a holistic assessment of standard-of-care targeted drugs, candidates for drug repositioning and immunotherapeutic approaches. To this end, our tool analyzes and visualizes clinical markers and (epi-)genomics and transcriptomics datasets to identify and evaluate the tumor’s main driver mutations, the tumor mutational burden, activity patterns of core cancer-relevant pathways, drug-specific biomarkers, the status of molecular drug targets and pharmacogenomic influences. In order to demonstrate ClinOmicsTrailbc’s rich functionality, we present three case studies highlighting various ways in which ClinOmicsTrailbc can support breast cancer precision medicine. ClinOmicsTrailbc is a powerful integrated visual analytics tool for breast cancer research in general and for therapy stratification in particular, helping oncologists find the best possible treatment options for their breast cancer patients based on actionable, evidence-based results. Availability and implementation: ClinOmicsTrailbc can be freely accessed at https://clinomicstrail.bioinf.uni-sb.de. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Deep learning of the back-splicing code for circular RNA formation
    Bioinformatics (IF 4.531) Pub Date : 2019-05-11
    Wang J, Wang L, Wren J.

    Motivation: Circular RNAs (circRNAs) are a new class of endogenous RNAs in animals and plants. During pre-RNA splicing, the 5′ and 3′ termini of exon(s) can be covalently ligated to form circRNAs through back-splicing (head-to-tail splicing). CircRNAs can be conserved across species, show tissue- and developmental stage-specific expression patterns, and may be associated with human disease. However, the mechanism of circRNA formation is still unclear, although some sequence features have been shown to affect back-splicing. Results: In this study, by applying state-of-the-art machine learning techniques, we have developed the first deep learning model, DeepCirCode, to predict back-splicing for human circRNA formation. DeepCirCode utilizes a convolutional neural network (CNN) with nucleotide sequence as the input, and shows superior performance over conventional machine learning algorithms such as support vector machine and random forest. Relevant features learnt by DeepCirCode are represented as sequence motifs, some of which match known human motifs involved in RNA splicing, transcription or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly. The findings provide new insight into the back-splicing code for circRNA formation. Availability and implementation: All the datasets and source code for model construction are available at https://github.com/BioDataLearning/DeepCirCode. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Identification of disease-associated loci using machine learning for genotype and network data integration
    Bioinformatics (IF 4.531) Pub Date : 2019-05-09
    Leal L, David A, Jarvelin M, et al.

    Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Novel machine learning algorithms for merging genotype data with other omics data are much needed, as they could enhance the prioritization of weak loci. Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. Availability and implementation: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Accurate peptide fragmentation predictions allow data driven approaches to replace and improve upon proteomics search engine scoring functions
    Bioinformatics (IF 4.531) Pub Date : 2019-05-11
    C. Silva A, Bouwmeester R, Martens L, et al.

    Motivation: The use of post-processing tools to maximize the information gained from a proteomics search engine is widely accepted and used by the community, with the most notable example being Percolator, a semi-supervised machine learning model which learns a new scoring function for a given dataset. The usage of such tools is however bound to the search engine’s scoring scheme, which doesn’t always make full use of the intensity information present in a spectrum. We aim to show how this tool can be applied in such a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP. MS2PIP predicts fragment ion peak intensities. Results: We show how comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach extracts more information from the data (compared with simpler intensity-based metrics, like peak counting or explained intensities summing) while maintaining control of statistics such as the false discovery rate. Availability and implementation: All of the code is available online at https://github.com/compomics/ms2rescore. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Incorporating spatial–anatomical similarity into the VGWAS framework for AD biomarker detection
    Bioinformatics (IF 4.531) Pub Date : 2019-05-16
    Huang M, Yu Y, Yang W, et al.

    Motivation: The detection of potential biomarkers of Alzheimer’s disease (AD) is crucial for its early prediction, diagnosis and treatment. Voxel-wise genome-wide association study (VGWAS) is a commonly used method in imaging genomics and is usually applied to detect AD biomarkers in imaging and genetic data. However, existing VGWAS methods entail large computational cost and disregard spatial correlations within imaging data. A novel method is proposed to solve these issues. Results: We introduce a novel method to incorporate spatial correlations into a VGWAS framework for the detection of potential AD biomarkers. To consider the characteristics of AD, we first present a modification of a simple linear iterative clustering method for spatial grouping in an anatomically meaningful manner. Second, we propose a spatial–anatomical similarity matrix to incorporate correlations among voxels. Finally, we detect the potential AD biomarkers from imaging and genetic data by using a fast VGWAS method and test our method on 708 subjects obtained from an Alzheimer’s Disease Neuroimaging Initiative dataset. Results show that our method can successfully detect some new risk genes and clusters of AD. The detected imaging and genetic biomarkers are used as predictors to classify AD/normal control subjects, and a high accuracy of AD/normal control classification is achieved. To the best of our knowledge, the association between imaging and genetic data has yet to be systematically investigated while building statistical models for classifying AD subjects to create a link between imaging genetics and AD. Therefore, our method may provide a new way to gain insights into the underlying pathological mechanism of AD. Availability and implementation: https://github.com/Meiyan88/SASM-VGWAS.

    Updated: 2020-01-13
  • ARMBIS: accurate and robust matching of brain image sequences from multiple modal imaging techniques
    Bioinformatics (IF 4.531) Pub Date : 2019-05-22
    Shen Q, Xiao G, Zheng Y, et al.

    Motivation: Study of brain images of rodents is the most straightforward way to understand brain functions and the neural basis of physiological functions. An important step in brain image analysis is to precisely assign signal labels to specified brain regions through matching brain images to standardized brain reference atlases. However, no significant effort has been made to match different types of brain images to atlas images due to the influence of artifacts introduced during slice preparation, relatively low resolution of images and large structural variations in individual brains. Results: In this study, we develop a novel image sequence matching procedure, termed accurate and robust matching brain image sequences (ARMBIS), to match brain image sequences to established atlas image sequences. First, for a given query image sequence a scaling factor is estimated to match a reference image sequence by a curve fitting algorithm based on geometric features. Then, the texture features as well as the scale and rotation invariant shape features are extracted, and a dynamic programming-based procedure is designed to select optimal image subsequences. Finally, a hierarchical decision approach is employed to find the best matched subsequence using regional textures. Our simulation studies show that ARMBIS is effective and robust to image deformations such as linear or non-linear scaling, 2D or 3D rotations, tissue tear and tissue loss. We demonstrate the superior performance of ARMBIS on three types of brain images including magnetic resonance imaging, mCherry with 4′,6-diamidino-2-phenylindole (DAPI) staining and green fluorescent protein without DAPI staining images. Availability and implementation: The R software package is freely available at https://www.synapse.org/#!Synapse:syn18638510/wiki/591054 for Not-For-Profit Institutions. If you are a For-Profit Institution, please contact the corresponding author. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach
    Bioinformatics (IF 4.531) Pub Date : 2019-05-22
    Morais C, Santos M, Lima K, et al.

    Motivation: Data splitting is a fundamental step for building classification models with spectral data, especially in biomedical applications. This approach is performed following pre-processing and prior to model construction, and consists of dividing the samples into at least training and test sets; herein, the training set is used for model construction and the test set for model validation. Some of the most-used methodologies for data splitting are the random selection (RS) and the Kennard-Stone (KS) algorithms; the former is based on a random splitting process and the latter on the calculation of the Euclidean distance between the samples. We propose an algorithm called the Morais-Lima-Martin (MLM) algorithm, as an alternative method to improve data splitting in classification models. MLM modifies the KS algorithm by adding a random-mutation factor. Results: RS, KS and MLM performance are compared in simulated and six real-world biospectroscopic applications using principal component analysis with linear discriminant analysis (PCA-LDA). MLM generated a better predictive performance in comparison with RS and KS algorithms, in particular regarding sensitivity and specificity values. Classification is found to be better balanced using MLM. RS showed the poorest predictive response, followed by KS, which showed good accuracy towards prediction but relatively unbalanced sensitivities and specificities. These findings demonstrate the potential of this new MLM algorithm as a sample selection method for classification applications in comparison with other regular methods often applied in this type of data. Availability and implementation: MLM algorithm is freely available for MATLAB at https://doi.org/10.6084/m9.figshare.7393517.v1.

    Updated: 2020-01-13
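The idea in the entry above can be sketched in two steps: a standard Kennard-Stone selection based on Euclidean distances, followed by a random exchange of a small fraction of samples between the training and test sets. The Kennard-Stone part follows the usual textbook description; the abstract does not specify how the random-mutation factor works, so the swap used here (and its `fraction` parameter) is an assumption for illustration and should not be read as the authors' MLM implementation.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Select n_train row indices of X by the Kennard-Stone algorithm."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # For each candidate, distance to its nearest already-selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


def random_mutation(train, test, fraction, rng):
    """Swap a random fraction of samples between the sets (assumed scheme)."""
    train, test = list(train), list(test)
    n_swap = int(round(fraction * min(len(train), len(test))))
    train_pos = rng.choice(len(train), size=n_swap, replace=False)
    test_pos = rng.choice(len(test), size=n_swap, replace=False)
    for a, b in zip(train_pos, test_pos):
        train[a], test[b] = test[b], train[a]
    return train, test


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))            # 20 hypothetical spectra, 3 features
train, test = kennard_stone(X, 14)      # deterministic 70/30 split
train_m, test_m = random_mutation(train, test, 0.1, rng)
```

Pure Kennard-Stone always places the most extreme samples in the training set; exchanging a few samples at random is one way to let some of that variability reach the test set as well.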
  • deepDR: a network-based deep learning approach to in silico drug repositioning
    Bioinformatics (IF 4.531) Pub Date : 2019-05-22
    Zeng X, Zhu S, Liu X, et al.

    Motivation: Traditional drug discovery and development are often time-consuming and high risk. Repurposing/repositioning of approved drugs offers a relatively low-cost and high-efficiency approach toward rapid development of efficacious treatments. The emergence of large-scale, heterogeneous biological networks has offered unprecedented opportunities for developing in silico drug repositioning approaches. However, capturing highly non-linear, heterogeneous network structures by most existing approaches for drug repositioning has been challenging. Results: In this study, we developed a network-based deep-learning approach, termed deepDR, for in silico drug repurposing by integrating 10 networks: one drug–disease, one drug-side-effect, one drug–target and seven drug–drug networks. Specifically, deepDR learns high-level features of drugs from the heterogeneous networks by a multi-modal deep autoencoder. Then the learned low-dimensional representation of drugs together with clinically reported drug–disease pairs are encoded and decoded collectively via a variational autoencoder to infer new candidate indications for approved drugs, i.e. diseases for which they were not originally approved. We found that deepDR achieved high performance [the area under receiver operating characteristic curve (AUROC) = 0.908], outperforming conventional network-based or machine learning-based approaches. Importantly, deepDR-predicted drug–disease associations were validated by the ClinicalTrials.gov database (AUROC = 0.826) and we showcased several novel deepDR-predicted approved drugs for Alzheimer’s disease (e.g. risperidone and aripiprazole) and Parkinson’s disease (e.g. methylphenidate and pergolide). Availability and implementation: Source code and data can be downloaded from https://github.com/ChengF-Lab/deepDR. Supplementary information: Supplementary data are available online at Bioinformatics.

    Updated: 2020-01-13
  • ReSimNet: drug response similarity prediction using Siamese neural networks
    Bioinformatics (IF 4.531) Pub Date : 2019-05-22
    Jeon M, Park D, Lee J, et al.

    Motivation: Traditional drug discovery approaches identify a target for a disease and find a compound that binds to the target. In this approach, structures of compounds are considered the most important features because it is assumed that similar structures will bind to the same target. Therefore, structural analogs of the drugs that bind to the target are selected as drug candidates. However, compounds that are not structural analogs may still achieve the desired response. A new drug discovery method based on drug response, which can complement the structure-based methods, is needed.
    Results: We implemented Siamese neural networks called ReSimNet that take two chemical compounds as input and predict the CMap score of the two compounds, which we use to measure their transcriptional response similarity. ReSimNet learns the embedding vector of a chemical compound in a transcriptional response space. ReSimNet is trained to minimize the difference between the cosine similarity of the embedding vectors of the two compounds and the CMap score of the two compounds. ReSimNet can find pairs of compounds that are similar in response even though they may have dissimilar structures. In our quantitative evaluation, ReSimNet outperformed the baseline machine learning models. The ReSimNet ensemble model achieves a Pearson correlation of 0.518 and a precision@1% of 0.989. In addition, in the qualitative analysis, we tested ReSimNet on the ZINC15 database and showed that ReSimNet successfully identifies chemical compounds that are relevant to a prototype drug whose mechanism of action is known.
    Availability and implementation: The source code and the pre-trained weights of ReSimNet are available at https://github.com/dmis-lab/ReSimNet.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
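    The training objective described in the ReSimNet abstract (matching the cosine similarity of two compound embeddings to their CMap score) can be illustrated with a minimal sketch. The squared-error form of the loss and the function names `cosine_similarity` and `pair_loss` are assumptions made here for illustration; the abstract only says the difference is minimized, and the actual ReSimNet code may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(emb_a, emb_b, cmap_score):
    """Squared gap between embedding similarity and the target CMap score.

    The squared-error form is an assumption; the abstract does not
    specify which distance is minimized.
    """
    return (cosine_similarity(emb_a, emb_b) - cmap_score) ** 2
```

    In a real Siamese setup the two embeddings would come from the same network applied to each compound, and this per-pair loss would be averaged over a training batch.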
  • NRStitcher: non-rigid stitching of terapixel-scale volumetric images
    Bioinformatics (IF 4.531) Pub Date : 2019-05-22
    Miettinen A, Oikonomidis I, Bonnin A, et al.

    Summary: In modern microscopy, the field of view is often increased by obtaining an image mosaic, where multiple sub-images are taken side-by-side and combined post-acquisition. Mosaic imaging often leads to long imaging times that can increase the probability of sample deformation during the acquisition due to, e.g. changes in the environment, damage caused by the radiation used to probe the sample or biologically induced deterioration. Here we propose a technique, based on local phase correlation, to detect the deformations and construct an artifact-free image mosaic from deformed sub-images. The implementation of the method supports distributed computing and can be used to generate teravoxel-size mosaics. We demonstrate its capabilities by assembling a 5.6 teravoxel tomographic image mosaic of microvasculature in whole mouse brain. The method is compared to existing rigid stitching implementations designed for very large datasets, and is observed to create artifact-free image mosaics in comparable runtime with the same hardware resources.
    Availability and implementation: The stitching software and C++/Python source code are available at GitHub (https://github.com/arttumiettinen/pi2) along with an example dataset and user instructions.

    Updated: 2020-01-13
  • CNet: a multi-omics approach to detecting clinically associated, combinatory genomic signatures
    Bioinformatics (IF 4.531) Pub Date : 2019-05-28
    Jia P, Pei G, Zhao Z, et al.

    Motivation: Genome-wide multi-omics profiling of complex diseases provides valuable resources and opportunities to discover associations between various measures of genes and diseases. Currently, a pressing challenge is how to effectively detect functional genes associated with or causing phenotypic outcomes. We developed CNet to identify groups of genomic signatures whose combinatory effect is significantly associated with clinical and phenotypical outcomes.
    Results: CNet builds on a generalized sequential feedforward method, augmented by a down-sampling bootstrap strategy to reduce random hitchhiking signatures. It further applies a dynamic trimming procedure to remove relatively less informative signatures at every step. CNet can manage heterogeneous genomic signature profiles simultaneously and select the best signature to represent a specific gene. To handle various forms of clinical and phenotypical measurements, we introduced four models for continuous, categorical and censored data. We tested CNet using drug-response data, multidimensional cancer genomics data and genome-wide association study data for multiple traits. Our results demonstrated that in various scenarios, CNet could effectively identify signatures that are associated with the outcomes. In addition, we applied CNet to identify likely disease-causing chains involving somatic mutations, pathway activities and patient outcomes. With appropriate settings, CNet can be applied in many biological conditions.
    Availability and implementation: CNet can be downloaded at https://github.com/bsml320/CNet.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Assessing concordance among human, in silico predictions and functional assays on genetic variant classification
    Bioinformatics (IF 4.531) Pub Date : 2019-05-29
    Luo J, Zhou T, You X, et al.

    Motivation: A variety of in silico tools have been developed and are frequently used to aid high-throughput rapid variant classification, but their performance varies, and their ability to classify variants of uncertain significance was not systematically assessed previously due to a lack of validation data. This has changed recently with advances in functional assays, where the functional impact of genetic changes can be measured at single-nucleotide resolution using the saturation genome editing (SGE) assay.
    Results: We demonstrated that the neural network model AIVAR (Artificial Intelligent VARiant classifier) was highly comparable to human experts on multiple verified datasets. Although highly accurate on known variants, AIVAR together with CADD and PhyloP showed non-significant concordance with SGE function scores. Moreover, our results indicated that a neural network model trained on functional assay data may not produce accurate predictions on known variants.
    Availability and implementation: All source code of AIVAR is deposited and freely available at https://github.com/TopGene/AIvar.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Smart computational exploration of stochastic gene regulatory network models using human-in-the-loop semi-supervised learning
    Bioinformatics (IF 4.531) Pub Date : 2019-05-29
    Wrede F, Hellander A, Wren J.

    Motivation: Discrete stochastic models of gene regulatory networks are indispensable tools for biological inquiry since they allow the modeler to predict how molecular interactions give rise to nonlinear system output. Model exploration with the objective of generating qualitative hypotheses about the workings of a pathway is usually the first step in the modeling process. It involves simulating the gene network model under a very large range of conditions, due to the large uncertainty in interactions and kinetic parameters. This makes model exploration highly computationally demanding. Furthermore, with no prior information about the model behavior, labor-intensive manual inspection of very large amounts of simulation results becomes necessary. This limits systematic computational exploration to simplistic models.
    Results: We have developed an interactive, smart workflow for model exploration based on semi-supervised learning and human-in-the-loop labeling of data. The workflow lets a modeler rapidly discover ranges of interesting behaviors predicted by the model. Because similar simulation outputs lie close to each other in a feature space, the modeler can focus on informing the system about which behaviors are more interesting than others by labeling, rather than analyzing simulation results with custom scripts and workflows. This results in a large reduction in time-consuming manual work by the modeler early in a modeling project, which can substantially reduce the time needed to go from an initial model to testable predictions and downstream analysis.
    Availability and implementation: A Python package is available at https://github.com/Wrede/mio.git.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Path2Surv: Pathway/gene set-based survival analysis using multiple kernel learning
    Bioinformatics (IF 4.531) Pub Date : 2019-05-30
    Dereli O, Oğuz C, Gönen M, et al.

    Motivation: Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that conjointly performs these two steps using multiple kernel learning.
    Results: We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained comparable predictive performance against survival support vector machine (SVM) using significantly fewer gene expression features (i.e. less than 10% of what survival RF and survival SVM used).
    Availability and implementation: Our implementations of survival SVM and Path2Surv algorithms in R are available at https://github.com/mehmetgonen/path2surv together with the scripts that replicate the reported experiments.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • deconvSeq: deconvolution of cell mixture distribution in sequencing data
    Bioinformatics (IF 4.531) Pub Date : 2019-05-30
    Du R, Carey V, Weiss S, et al.

    Motivation: Although single-cell sequencing is becoming more widely available, many tissue samples such as intracranial aneurysms are both fibrous and minute, and therefore not easily dissociated into single cells. Accounting for the cell type heterogeneity in such tissues therefore requires a computational method. We present a computational deconvolution method, deconvSeq, for sequencing data (RNA and bisulfite) obtained from bulk tissue. This method can also be applied to single-cell RNA sequencing data.
    Results: DeconvSeq utilizes a generalized linear model to model effects of tissue type on feature quantification, which is specific to the data structure of the sequencing type used. Estimated model coefficients can then be used to predict the cell type mixture within a tissue. Predicted cell type mixtures were validated against actual cell counts in whole blood samples. Using this method, we obtained a mean correlation of 0.998 (95% CI 0.995–0.999) from the RNA sequencing data of 35 whole blood samples and 0.95 (95% CI 0.91–0.98) from the reduced representation bisulfite sequencing data from 35 whole blood samples. Using symmetric balances to obtain the correlation between compositional parts, we found that the lowest correlation occurred for monocytes for both RNA and bisulfite sequencing. Comparison with other methods of decomposition such as deconRNAseq, CIBERSORT, MuSiC and EpiDISH showed that deconvSeq is able to achieve good prediction using mean correlation with far fewer genes or CpG sites in the signature set.
    Availability and implementation: Software implementing deconvSeq is available at https://github.com/rosedu1/deconvSeq.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Full-length de novo viral quasispecies assembly through variation graph construction
    Bioinformatics (IF 4.531) Pub Date : 2019-05-30
    Baaijens J, Van der Roest B, Köster J, et al.

    Motivation: Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent (‘de novo’) approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs.
    Results: We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is optimal in terms of compatibility with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal-length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers.
    Availability and implementation: Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures
    Bioinformatics (IF 4.531) Pub Date : 2019-06-04
    Pagès G, Grudinin S, Valencia A.

    Motivation: Thanks to recent advances in structural biology, 3D structures of various proteins are now solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions.
    Results: We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB.
    Availability and implementation: The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
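    The "6D Veronese mapping of 3D vectors" mentioned in the DeepSymmetry abstract can be illustrated with the standard degree-2 Veronese embedding. This is only a sketch: the abstract does not give the exact scaling or component order used by DeepSymmetry, so the form below and the names `veronese6` and `dot` are assumptions. The useful property is that an axis direction v and its opposite -v map to the same 6D point, so a regression target built this way does not depend on the arbitrary sign of the axis.

```python
import math


def veronese6(v):
    """Map a 3D vector to 6D using all degree-2 monomials of its components.

    The sqrt(2) factors on the mixed terms make the 6D dot product equal
    to the squared 3D dot product. This scaling is a common convention,
    assumed here rather than taken from the DeepSymmetry code.
    """
    x, y, z = v
    s = math.sqrt(2.0)
    return (x * x, y * y, z * z, s * x * y, s * x * z, s * y * z)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))
```

    For unit vectors u and v, `dot(veronese6(u), veronese6(v))` equals the squared cosine of the angle between them, which is 1 for both parallel and antiparallel axes.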
  • AutoDock CrankPep: combining folding and docking to predict protein–peptide complexes
    Bioinformatics (IF 4.531) Pub Date : 2019-06-04
    Zhang Y, Sanner M, Valencia A.

    Motivation: Protein–peptide interactions mediate a wide variety of cellular and biological functions. Methods for predicting these interactions have garnered a lot of interest over the past few years, as witnessed by the rapidly growing number of peptide-based therapeutic molecules currently in clinical trials. The size and flexibility of peptides have been shown to be challenging for existing automated docking software programs.
    Results: Here we present AutoDock CrankPep, or ADCP for short, a novel approach to dock flexible peptides into rigid receptors. ADCP folds a peptide in the potential field created by the protein to predict the protein–peptide complex. We show that it outperforms leading peptide docking methods on two protein–peptide datasets commonly used for benchmarking docking methods: LEADS-PEP and peptiDB, comprising peptides of up to 15 amino acids in length. Beyond these datasets, ADCP reliably docked a set of protein–peptide complexes containing peptides ranging in length from 16 to 20 amino acids. The robust performance of ADCP on these longer peptides enables accurate modeling of peptide-mediated protein–protein interactions and interactions with disordered proteins.
    Availability and implementation: ADCP is distributed under the LGPL 2.0 open source license and is available at http://adcp.scripps.edu. The source code is available at https://github.com/ccsb-scripps/ADCP.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • MTTFsite: cross-cell type TF binding site prediction by using multi-task learning
    Bioinformatics (IF 4.531) Pub Date : 2019-06-04
    Zhou J, Lu Q, Gui L, et al.

    Motivation: The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS predictions require large amounts of labeled data. However, many TFs of certain cell types either do not have sufficient labeled data or do not have any labeled data.
    Results: In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data by leveraging labeled data available across cell types. The proposed MTTFsite contains a shared CNN to learn common features for all cell types and a private CNN for each cell type to learn private features. The common features are intended to help predict TFBSs for all cell types, especially those that lack labeled data. MTTFsite is evaluated on 241 cell type TF pairs and compared with a baseline method without any multi-task learning model and a fully shared multi-task model that uses only a shared CNN and does not use private CNNs. For cell types with insufficient labeled data, results show that MTTFsite performs better than the baseline method and the fully shared model on more than 89% of pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model on more than 80% and 93% of pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone can achieve good performance. When MTTFsite is combined with histone modification features, a significant 5.7% performance improvement is obtained.
    Availability and implementation: The resource and executable code are freely available at http://hlt.hitsz.edu.cn/MTTFsite/ and http://www.hitsz-hlt.com:8080/MTTFsite/.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms
    Bioinformatics (IF 4.531) Pub Date : 2019-06-04
    Zyla J, Marczyk M, Domaszewska T, et al.

    Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies.
    Results: We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility.
    Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO, KEGGandMetacoreDzPathwaysGEO R package and GEO repository.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • New algorithms for detecting multi-effect and multi-way epistatic interactions
    Bioinformatics (IF 4.531) Pub Date : 2019-06-06
    Ansarifar J, Wang L, Hancock J.

    Motivation: Epistasis, the phenomenon of genetic interactions, plays a central role in many scientific discoveries. However, due to the combinatorial nature of the problem, it is extremely challenging to decipher the exact combinations of genes that trigger the epistatic effects. Many existing methods only focus on two-way interactions. Some of the most effective methods use machine learning techniques, but many were designed for special case-and-control studies or suffer from overfitting. We propose three new algorithms for multi-effect and multi-way epistasis detection, with one guaranteeing global optimality and the other two being local optimization oriented heuristics.
    Results: The computational performance of the proposed heuristic algorithm was compared with several state-of-the-art methods using a yeast dataset. Results suggested that searching for the global optimal solution could be extremely time consuming, but the proposed heuristic algorithm was much more effective and efficient than others at finding a close-to-optimal solution. Moreover, it was able to provide biological insight on the exact configurations of epistases, besides achieving a higher prediction accuracy than the state-of-the-art methods.
    Availability and implementation: The data source is publicly available and details are provided in the text.

    Updated: 2020-01-13
  • A scalable method for parameter identification in kinetic models of metabolism using steady-state data
    Bioinformatics (IF 4.531) Pub Date : 2019-06-14
    Srinivasan S, Cluett W, Mahadevan R, et al.

    Motivation: In kinetic models of metabolism, the parameter values determine the dynamic behaviour predicted by these models. Estimating parameters from in vivo experimental data requires the parameters to be structurally identifiable, and the data to be informative enough to estimate these parameters. Existing methods to determine the structural identifiability of parameters in kinetic models of metabolism can only be applied to models of small metabolic networks due to their computational complexity. Additionally, a priori experimental design, a necessity to obtain informative data for parameter estimation, also does not account for using steady-state data to estimate parameters in kinetic models.
    Results: Here, we present a scalable methodology to structurally identify parameters for each flux in a kinetic model of metabolism based on the availability of steady-state data. In doing so, we also address the issue of determining the number and nature of experiments for generating steady-state data to estimate these parameters. By using a small metabolic network as an example, we show that most parameters in fluxes expressed by mechanistic enzyme kinetic rate laws can be identified using steady-state data, and the steady-state data required for their estimation can be obtained from selective experiments involving both substrate and enzyme level perturbations. The methodology can be used in combination with other identifiability and experimental design algorithms that use dynamic data to determine the most informative experiments requiring the least resources to perform.
    Availability and implementation: https://github.com/LMSE/ident.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Graphlet Laplacians for topology-function and topology-disease relationships
    Bioinformatics (IF 4.531) Pub Date : 2019-06-13
    Windels S, Malod-Dognin N, Pržulj N, et al.

    Motivation: Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be ‘adjacent’ if they simultaneously touch a given graphlet.
    Results: We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying Graphlet Laplacian-based spectral embedding, we visually demonstrate that Graphlet Laplacians capture biological functions. This result is quantified by applying Graphlet Laplacian-based spectral clustering, which uncovers clusters enriched in biological functions dependent on the underlying graphlet. We explain the complementarity of biological functions captured by different Graphlet Laplacians by showing that they capture different local topologies. Finally, diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer-related genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks.
    Availability and implementation: http://www0.cs.ucl.ac.uk/staff/natasa/graphlet-laplacian/index.html.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
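    The idea of treating two nodes as ‘adjacent’ when they simultaneously touch a given graphlet can be illustrated for the simplest non-trivial case, the triangle. The sketch below counts, for every node pair, the triangles containing both nodes and then forms an unnormalised Laplacian L = D - A from those counts. The function names are hypothetical, only the triangle graphlet is handled, and any normalisation used in the paper's own definition is deliberately left out.

```python
from itertools import combinations


def triangle_adjacency(neighbors):
    """Count, for each node pair, the triangles that contain both nodes.

    `neighbors` maps each node to the set of its neighbours in an
    undirected graph. Two linked nodes share one triangle per common
    neighbour; unlinked nodes share none.
    """
    nodes = sorted(neighbors)
    counts = {u: {v: 0 for v in nodes} for u in nodes}
    for u, v in combinations(nodes, 2):
        if v in neighbors[u]:
            shared = len(neighbors[u] & neighbors[v])
            counts[u][v] = counts[v][u] = shared
    return nodes, counts


def laplacian(nodes, counts):
    """Unnormalised Laplacian L = D - A as nested lists, rows ordered like `nodes`."""
    rows = []
    for u in nodes:
        degree = sum(counts[u].values())
        rows.append([degree if u == v else -counts[u][v] for v in nodes])
    return rows
```

    For a triangle a-b-c with an extra node d attached to c, d touches no triangle, so its row and column in the resulting matrix are all zeros even though it has an ordinary edge in the network.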
  • DECENT: differential expression with capture efficiency adjustmeNT for single-cell RNA-seq data
    Bioinformatics (IF 4.531) Pub Date : 2019-06-14
    Ye C, Speed T, Salim A, et al.

    Motivation: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments.
    Results: We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The improvement is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model.
    Availability and implementation: The method is implemented as an R package publicly available from https://github.com/cz-ye/DECENT.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network
    Bioinformatics (IF 4.531) Pub Date : 2019-06-14
    Shi Q, Chen W, Huang S, et al.

    Motivation: Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been proven to improve prediction performance. However, how to simultaneously model the local and global interactions to further improve domain boundary prediction is still a challenging problem.
    Results: This article employs a hybrid deep learning method that combines convolutional neural network and gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction.
    Availability and implementation: The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • KinomeX: a web application for predicting kinome-wide polypharmacology effect of small molecules
    Bioinformatics (IF 4.531) Pub Date : 2019-06-22
    Li Z, Li X, Liu X, et al.

    Motivation: Large-scale kinome-wide virtual profiling of small molecules is a daunting task for experimental and traditional in silico drug design approaches. Recent advances in deep learning algorithms have brought about new opportunities for advancing this process.
    Results: KinomeX is an online platform to predict the kinome-wide polypharmacology effect of small molecules based solely on their chemical structures. The prediction is made by a multi-task deep neural network model trained with over 140 000 bioactivity data points for 391 kinases. Extensive computational and experimental validations have been performed. Overall, KinomeX enables users to create a comprehensive kinome interaction network for designing novel chemical modulators, and is of practical value for exploring previously less studied or untargeted kinases.
    Availability and implementation: KinomeX is available at: https://kinome.dddc.ac.cn.
    Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • A fully computational and reasonable representation for karyotypes
    Bioinformatics (IF 4.531) Pub Date : 2019-06-22
    Warrender J, Moorman A, Lord P, et al.

    Summary: The human karyotype has been used as a mechanism for describing and detecting gross abnormalities in the genome for many decades. It is used both for routine diagnostic purposes and for research to further our understanding of the causes of disease. Despite these important applications there has been no rigorous computational representation of the karyotype; rather an informal, string-based representation is used, making it hard to check, organize and search data of this form. In this article, we describe our use of OWL, the Web Ontology Language, to generate a fully computational representation of the karyotype; the development of this ontology represents a significant advance from the traditional bioinformatics use for tagging and navigation and has necessitated the development of a new ontology development environment called Tawny-OWL.
    Availability and implementation: The Karyotype Ontology and associated Tawny-OWL source code are available on GitHub at https://github.com/jaydchan/tawny-karyotype, under a LGPL License, Version 3.0.

    Updated: 2020-01-13
  • CaPSSA: visual evaluation of cancer biomarker genes for patient stratification and survival analysis using mutation and expression data
    Bioinformatics (IF 4.531) Pub Date : 2019-06-22
    Jang Y, Seo J, Jang I, et al.

    Summary: Predictive biomarkers for patient stratification play critical roles in realizing the paradigm of precision medicine. Molecular characteristics such as somatic mutations and expression signatures represent the primary source of putative biomarker genes for patient stratification. However, evaluation of such candidate biomarkers is still cumbersome and requires multistep procedures, especially when using massive public omics data. Here, we present an interactive web application that dynamically divides patients from large cohorts (e.g. The Cancer Genome Atlas, TCGA) into two groups according to the mutation, copy number variation or gene expression of query genes. It further lets users examine the prognostic value of the resulting patient groups based on survival analysis and their association with clinical features as well as previously annotated molecular subtypes, aided by rich, interactive visualization. Importantly, we also support custom omics data with clinical information. Availability and implementation: CaPSSA (Cancer Patient Stratification and Survival Analysis) runs in a web browser and is freely available without restrictions at http://www.kobic.re.kr/capssa/. The source code is available at https://github.com/yjjang/capssa. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Interactive exploration of heterogeneous biological networks with Biomine Explorer
    Bioinformatics (IF 4.531) Pub Date : 2019-06-24
    Podpečan V, Ramšak Ž, Gruden K, et al.

    Summary: Biomine Explorer is a web application that enables interactive exploration of large heterogeneous biological networks constructed from selected publicly available biological knowledge sources. It is built on top of Biomine, a system which integrates cross-references from several biological databases into a large heterogeneous probabilistic network. Biomine Explorer offers user-friendly interfaces for search, visualization, exploration and manipulation, as well as public and private storage of discovered subnetworks with permanent links suitable for inclusion in scientific publications. A JSON-based web API for network search queries is also available for advanced users. Availability and implementation: Biomine Explorer is implemented as a web application, which is publicly available at https://biomine.ijs.si. Registration is not required, but registered users can benefit from additional features such as private network repositories.

    Updated: 2020-01-13
  • TCR3d: The T cell receptor structural repertoire database
    Bioinformatics (IF 4.531) Pub Date : 2019-06-26
    Gowthaman R, Pierce B, Wren J.

    Summary: T cell receptors (TCRs) are critical molecules of the adaptive immune system, capable of recognizing diverse antigens, including peptides, lipids and small molecules, and represent a rapidly growing class of therapeutics. Determining the structural and mechanistic basis of TCR targeting of antigens is a major challenge, as each individual has a vast and diverse repertoire of TCRs. Despite shared general recognition modes, diversity in TCR sequence and recognition represents a challenge to the predictive modeling and computational techniques being developed to predict antigen specificity and the mechanistic basis of TCR targeting. To this end, we have developed the TCR3d database, a resource containing all known TCR structures, with a particular focus on antigen recognition. TCR3d provides key information on antigen binding mode, interface features, loop sequences and germline gene usage. Users can interactively view TCR complex structures, search sequences of interest against known structures and sequences, and download curated datasets of structurally characterized TCR complexes. This database is updated on a weekly basis, and can serve the community as a centralized resource for those studying T cell receptors and their recognition. Availability and implementation: The TCR3d database is available at https://tcr3d.ibbr.umd.edu/.

    Updated: 2020-01-13
  • Hydra image processor: 5-D GPU image analysis library with MATLAB and python wrappers
    Bioinformatics (IF 4.531) Pub Date : 2019-06-26
    Wait E, Winter M, Cohen A, et al.

    Summary: Light microscopes can now capture data in five dimensions at very high frame rates, producing terabytes of data per experiment. Five-dimensional data has three spatial dimensions (x, y, z), multiple channels (λ) and time (t). Current tools are prohibitively time consuming and do not efficiently utilize available hardware. The hydra image processor (HIP) is a new library providing hardware-accelerated image processing accessible from interpreted languages including MATLAB and Python. HIP automatically distributes data/computation across system and video RAM, allowing hardware-accelerated processing of arbitrarily large images. HIP also partitions compute tasks optimally across multiple GPUs. HIP includes a new kernel renormalization that reduces the boundary effects associated with widely used padding approaches. Availability and implementation: HIP is free and open source software released under the BSD 3-Clause License. Source code and compiled binary files will be maintained on http://www.hydraimageprocessor.com. A comprehensive description of all MATLAB and Python interfaces and user documents are provided. HIP includes GPU-accelerated support for most common image processing operations in 2-D and 3-D and is easily extensible. HIP uses the NVIDIA CUDA interface to access the GPU. CUDA is well supported on Windows and Linux, with macOS support planned for the future.

    Updated: 2020-01-13
  • Comprehensive study of the exposome and omic data using rexposome Bioconductor Packages
    Bioinformatics (IF 4.531) Pub Date : 2019-06-27
    Hernandez-Ferrer C, Wellenius G, Tamayo I, et al.

    Summary: Genomics has dramatically improved our understanding of the molecular origins of certain human diseases. Nonetheless, our health is also influenced by the cumulative impact of exposures experienced across the life course (termed the ‘exposome’). The study of the high-dimensional exposome offers a new paradigm for investigating environmental contributions to disease etiology. However, there is a lack of bioinformatics tools for managing, visualizing and analyzing the exposome. The analysis should include both association with health outcomes and integration with omic layers. We provide a generic framework called the rexposome project, developed in the R/Bioconductor architecture, that includes object-oriented classes and methods to leverage high-dimensional exposome data in disease association studies, including its integration with a variety of high-throughput data types. The usefulness of the package is illustrated by analyzing a real dataset including exposome data, three health outcomes related to respiratory diseases and its integration with the transcriptome and methylome. Availability and implementation: The rexposome project is available at https://isglobal-brge.github.io/rexposome/. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • JUCHMME: a Java Utility for Class Hidden Markov Models and Extensions for biological sequence analysis
    Bioinformatics (IF 4.531) Pub Date : 2019-06-28
    Tamposis I, Tsirigos K, Theodoropoulou M, et al.

    Summary: JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. Availability and implementation: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • RTNduals: an R/Bioconductor package for analysis of co-regulation and inference of dual regulons
    Bioinformatics (IF 4.531) Pub Date : 2019-06-28
    Chagas V, Groeneveld C, Oliveira K, et al.

    Motivation: Transcription factors (TFs) are key regulators of gene expression, and can activate or repress multiple target genes, forming regulatory units, or regulons. Understanding the downstream effects of these regulators includes evaluating how TFs cooperate or compete within regulatory networks. Here we present RTNduals, an R/Bioconductor package that implements a general method for analyzing pairs of regulons. Results: RTNduals identifies a dual regulon when the number of targets shared between a pair of regulators is statistically significant. The package extends the RTN (Reconstruction of Transcriptional Networks) package, and uses RTN transcriptional networks to identify significant co-regulatory associations between regulons. The Supplementary Information reports two case studies for TFs using the METABRIC and TCGA breast cancer cohorts. Availability and implementation: RTNduals is written in the R language, and is available from the Bioconductor project at http://bioconductor.org/packages/RTNduals/. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Benchmarking fold detection by DaliLite v.5
    Bioinformatics (IF 4.531) Pub Date : 2019-07-02
    Holm L, Elofsson A.

    Motivation: Protein structure comparison plays a fundamental role in understanding the evolutionary relationships between proteins. Here, we release a new version of the DaliLite standalone software. The novelties are a hierarchical search of the structure database organized into sequence-based clusters, and remote access to our knowledge base of structural neighbors. The detection of fold, superfamily and family level similarities by DaliLite and state-of-the-art competitors was benchmarked against a manually curated structural classification. Results: Database search strategies were evaluated using Fmax with query-specific thresholds. DaliLite and DeepAlign outperformed TM-score based methods at all levels of the benchmark, and DaliLite outperformed DeepAlign at fold level. Hierarchical and knowledge-based searches came close to the performance of systematic pairwise comparison. The knowledge-based search was four times as efficient as the hierarchical search. The knowledge-based search dynamically adjusts the depth of the search, enabling a trade-off between speed and recall. Availability and implementation: http://ekhidna2.biocenter.helsinki.fi/dali/README.v5.html. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • CytoGPS: a web-enabled karyotype analysis tool for cytogenetics
    Bioinformatics (IF 4.531) Pub Date : 2019-07-02
    Abrams Z, Zhang L, Abruzzo L, et al.

    Summary: Karyotype data are the most common form of genetic data that is regularly used clinically. They are collected as part of the standard of care in many diseases, particularly in pediatric and cancer medicine contexts. Karyotypes are represented in a unique text-based format, with a syntax defined by the International System for human Cytogenetic Nomenclature (ISCN). While human-readable, ISCN is not intrinsically machine-readable. This limitation has prevented the full use of complex karyotype data in discovery science use cases. To enhance the utility and value of karyotype data, we developed a tool named CytoGPS. CytoGPS first parses ISCN karyotypes into a machine-readable format. It then converts the ISCN karyotype into a binary Loss-Gain-Fusion (LGF) model, which represents all cytogenetic abnormalities as combinations of loss, gain, or fusion events, in a format that is analyzable using modern computational methods. Such data is then made available for comprehensive ‘downstream’ analyses that previously were not feasible. Availability and implementation: Freely available at http://cytogps.org.

    Updated: 2020-01-13
  • The NEW ESID online database network
    Bioinformatics (IF 4.531) Pub Date : 2019-07-02
    Scheible R, Rusch S, Guzman D, et al.

    Summary: Primary Immunodeficiencies (PIDs) belong to the group of rare diseases. The European Society for Immunodeficiencies (ESID) operates an international research database application for continuous long-term documentation of patient data. The system is a web application which runs in a standard browser and is therefore easy to access from any location. Technically, the system is based on Grails backed by MariaDB, with high-standard security features to comply with the demands of a modern research platform. Availability and implementation: The ESID Online Database is accessible via the official website: https://esid.org/Working-Parties/Registry-Working-Party/ESID-Registry. A demo system is available via: https://cci-esid-reg-demo-app.uniklinik-freiburg.de/EERS with user demouser and password Demo-2019.

    Updated: 2020-01-13
  • ValTrendsDB: bringing Protein Data Bank validation information closer to the user
    Bioinformatics (IF 4.531) Pub Date : 2019-07-02
    Horský V, Bendová V, Toušek D, et al.

    Summary: Structures in PDB tend to contain errors. This is a very serious issue for authors who rely on such potentially problematic data. The community of structural biologists develops validation methods as countermeasures, which are also included in the PDB deposition system. But how are these validation efforts influencing the structure quality of subsequently published data? Which quality aspects are improving, and which remain problematic? We developed ValTrendsDB, a database that provides the results of an extensive exploratory analysis of relationships between quality criteria, size and metadata of biomacromolecules. Key input data are sourced from PDB. The discovered trends are presented via precomputed information-rich plots. ValTrendsDB also supports the visualization of a set of user-defined structures on top of general quality trends. Therefore, ValTrendsDB enables users to see the quality of structures published by a selected author, laboratory or journal, discover quality outliers, etc. ValTrendsDB is updated weekly. Availability and implementation: Freely accessible at http://ncbr.muni.cz/ValTrendsDB. The web interface was implemented in JavaScript. The database was implemented in C++. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • VeriNA3d: an R package for nucleic acids data mining
    Bioinformatics (IF 4.531) Pub Date : 2019-07-09
    Gallego D, Darré L, Dans P, et al.

    Summary: veriNA3d is an R package for the analysis of nucleic acid structural data, with an emphasis on complex RNA structures. In addition to single-structure analyses, veriNA3d also implements functions to handle whole datasets of mmCIF/PDB structures that can be retrieved from public or local repositories. Our package aims to fill a gap in the data mining of nucleic acid structures by enabling flexible, high-throughput analysis of structural databases. Availability and implementation: http://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • HDX-Viewer: interactive 3D visualization of hydrogen–deuterium exchange data
    Bioinformatics (IF 4.531) Pub Date : 2019-07-09
    Bouyssié D, Lesne J, Locard-Paulet M, et al.

    Summary: With the advent of fully automated sample preparation robots for Hydrogen–Deuterium eXchange coupled to Mass Spectrometry (HDX-MS), this method has become paramount for ligand binding or epitope mapping screening, both in academic research and in the biopharmaceutical industry. However, bridging the gap between commercial HDX-MS software (for raw data interpretation) and molecular viewers (to map experimental results onto a 3D structure for biological interpretation) remains laborious and requires simple but sometimes limiting coding skills. We addressed this bottleneck by developing HDX-Viewer, an open-source web-based application that facilitates and speeds up HDX-MS data analysis. This user-friendly application automatically incorporates HDX-MS data from a custom template or commercial HDX-MS software into PDB files, and uploads them to an online 3D molecular viewer, thereby facilitating their visualization and biological interpretation. Availability and implementation: The HDX-Viewer web application is released under the CeCILL (http://www.cecill.info) and GNU LGPL licenses and can be found at https://masstools.ipbs.fr/hdx-viewer. The source code is available at https://github.com/david-bouyssie/hdx-viewer.

    Updated: 2020-01-13
  • MCMCtreeR: functions to prepare MCMCtree analyses and visualize posterior ages on trees
    Bioinformatics (IF 4.531) Pub Date : 2019-07-11
    Puttick M, Schwartz R.

    Summary: The fossil record is incomplete, so molecular divergence time analysis is a crucial tool in estimating evolutionary timescales. MCMCtree, contained in the PAML software, provides Bayesian methods to estimate divergence times of genomic-sized sequences. Here, I present MCMCtreeR, a flexible R package to prepare time priors for MCMCtree analysis and plot time-scaled phylogenies. The package provides functions to refine parameters and visualize time-calibrated node prior distributions so that these priors accurately reflect confidence in known, usually fossil, time information. After the parameters have been chosen, the package produces output files ready for MCMCtree analysis. Following analysis, the package has tools to compare prior and posterior calibrated node age distributions and produce plots of the time-scaled phylogenies. The plotting functions allow for the inclusion of age uncertainty on time-scaled phylogenies, including the display of full posterior distributions on nodes. Options also allow for the inclusion of the geological timescale, and these plotting functions are applicable with posterior age estimates from any Bayesian divergence time estimation software. Availability and implementation: MCMCtreeR is an R package available on CRAN (https://CRAN.R-project.org/package=MCMCtreeR). MCMCtreeR depends on the R packages ape, sn and stats4.

    Updated: 2020-01-13
  • MEpurity: estimating tumor purity using DNA methylation data
    Bioinformatics (IF 4.531) Pub Date : 2019-07-11
    Liu B, Yang X, Wang T, et al.

    Motivation: Tumor purity is a fundamental property of each cancer sample and affects downstream investigations. Current tumor purity estimation methods either require a matched normal sample or report moderately high tumor purity even on normal samples. It is critical to develop a novel computational approach that estimates tumor purity with sufficient precision from tumor-only samples. Results: In this study, we developed MEpurity, a beta mixture model-based algorithm, to estimate tumor purity from tumor-only Illumina Infinium 450k methylation microarray data. We applied MEpurity to both The Cancer Genome Atlas (TCGA) cancer data and cancer cell line data, demonstrating that MEpurity reports low tumor purity on normal samples and results comparable to other state-of-the-art methods on tumor samples. Availability and implementation: MEpurity is a C++ program which is available at https://github.com/xjtu-omics/MEpurity. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • DDT - Drug Discovery Tool: a fast and intuitive graphics user interface for docking and molecular dynamics analysis
    Bioinformatics (IF 4.531) Pub Date : 2019-07-15
    Aureli S, Di Marino D, Raniolo S, et al.

    Motivation: The ligand/protein binding interaction is typically investigated by docking and molecular dynamics (MD) simulations. In particular, docking-based virtual screening (VS) is used to select the best ligands from a database of thousands of compounds, while MD calculations assess the energy stability of the ligand/protein binding complexes. Considering the broad use of these techniques, there is great demand for a single software tool that allows a combined and fast analysis of VS and MD results. With this in mind, we have developed the Drug Discovery Tool (DDT), an intuitive graphics user interface able to provide structural data and physico-chemical information on the ligand/protein interaction. Results: DDT is designed as a plugin for the Visual Molecular Dynamics (VMD) software and is able to manage a large number of ligand/protein complexes obtained from AutoDock4 (AD4) docking calculations and MD simulations. DDT delivers four main outcomes: i) ligand ranking based on an energy score; ii) ligand ranking based on a cluster analysis of ligand conformations; iii) identification of the amino acids forming the most frequent interactions with the ligands; iv) a plot of the ligands’ center-of-mass coordinates in Cartesian space. The flexibility of the software allows saving the best ligand/protein complexes using a number of user-defined options. Availability and implementation: DDT_site_1 (alternative DDT_site_2); the DDT tutorial movie is available here. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • VariantQC: a visual quality control report for variant evaluation
    Bioinformatics (IF 4.531) Pub Date : 2019-07-16
    Yan M, Ferguson B, Bimber B, et al.

    Summary: Large-scale genomic studies produce millions of sequence variants, generating datasets far too massive for manual inspection. To ensure variant and genotype data are consistent and accurate, it is necessary to evaluate variants prior to downstream analysis using quality control (QC) reports. Variant call format (VCF) files are the standard format for representing variant data; however, generating summary statistics from these files is not always straightforward. While tools to summarize variant data exist, they generally produce simple text file tables, which still require additional processing and interpretation. VariantQC fills this gap as a user-friendly, interactive visual QC report that generates and concisely summarizes statistics from VCF files. The report aggregates and summarizes variants by dataset, chromosome, sample and filter type. The VariantQC report is useful for high-level dataset summary and quality control, and helps flag outliers. Furthermore, VariantQC operates on VCF files, so it can be easily integrated into many existing variant pipelines. Availability and implementation: DISCVRSeq's VariantQC tool is freely available as a Java program, with the compiled JAR and source code available from https://github.com/BimberLab/DISCVRSeq/. Documentation and example reports are available at https://bimberlab.github.io/DISCVRSeq/.

    Updated: 2020-01-13
  • PTM-Logo: a program for generation of sequence logos based on position-specific background amino-acid probabilities
    Bioinformatics (IF 4.531) Pub Date : 2019-07-18
    Saethang T, Hodge K, Yang C, et al.

    Summary: Identification of the amino-acid motifs in proteins that are targeted for post-translational modifications (PTMs) is of great importance in understanding regulatory networks. Information about targeted motifs can be derived from mass spectrometry data that identify peptides containing specific PTMs such as phosphorylation, ubiquitylation and acetylation. Comparison of input data against a standardized ‘background’ set allows identification of over- and under-represented amino acids surrounding the modified site. Conventionally, calculation of targeted motifs assumes a random background distribution of amino acids surrounding the modified position. However, we show that probabilities of amino acids depend on (i) the type of the modification and (ii) their positions relative to the modified site. Thus, software that identifies such over- and under-represented amino acids should make appropriate adjustments for these effects. Here we present a new program, PTM-Logo, that generates representations of these amino acid preferences (‘logos’) based on position-specific amino-acid probability backgrounds calculated either from user-input data or curated databases. Availability and implementation: PTM-Logo is freely available online at http://sysbio.chula.ac.th/PTMLogo/ or https://hpcwebapps.cit.nih.gov/PTMLogo/. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • Genetic association testing using the GENESIS R/Bioconductor package
    Bioinformatics (IF 4.531) Pub Date : 2019-07-22
    Gogarten S, Sofer T, Chen H, et al.

    Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment. Availability and implementation: https://bioconductor.org/packages/GENESIS; vignettes included. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • sAOP: linking chemical stressors to adverse outcomes pathway networks
    Bioinformatics (IF 4.531) Pub Date : 2019-07-22
    Aguayo-Orozco A, Audouze K, Siggaard T, et al.

    Motivation: The adverse outcome pathway (AOP) is a toxicological concept proposed to provide a mechanistic representation of biological perturbation over different layers of biological organization. Although AOPs are by definition chemical-agnostic, many chemical stressors can putatively interfere with one or several AOPs, and such information would be relevant for regulatory decision-making. Results: Building on the recent development of AOP networks, which aim to facilitate the identification of interactions among AOPs, we developed a stressor-AOP network (sAOP). Using the ‘cytotoxicity burst’ (CTB) approach, we mapped bioactive compounds from the ToxCast data to a list of AOPs reported in the AOP-Wiki database. With this analysis, a variety of relevant connections between chemicals and AOP components can be identified, suggesting multiple effects not observed in the simplified ‘one-biological perturbation to one-adverse outcome’ model. The results may assist in the prioritization of chemicals to assess risk-based evaluations in the context of human health. Availability and implementation: sAOP is available at http://saop.cpr.ku.dk. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • diSTruct v1.0: generating biomolecular structures from distance constraints
    Bioinformatics (IF 4.531) Pub Date : 2019-07-22
    Taubert O, Reinartz I, Meyerhenke H, et al.

    Summary: The distance geometry problem is often encountered in molecular biology and the life sciences at large, as a host of experimental methods produce ambiguous and noisy distance data. In this note, we present diSTruct, an adaptation of the generic MaxEnt-Stress graph drawing algorithm to the domain of biological macromolecules. diSTruct is fast, provides reliable structural models even from incomplete or noisy distance data and integrates access to graph analysis tools. Availability and implementation: diSTruct is written in C++, Cython and Python 3. It is available from https://github.com/KIT-MBS/distruct.git or in the Python package index under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
  • SquiggleKit: a toolkit for manipulating nanopore signal data
    Bioinformatics (IF 4.531) Pub Date : 2019-07-23
    Ferguson J, Smith M, Birol I.

    Summary: The management of raw nanopore sequencing data poses a challenge that must be overcome to facilitate the creation of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualization and signal processing. Availability and implementation: SquiggleKit is cross-platform and freely available from GitHub (https://github.com/Psy-Fer/SquiggleKit). Detailed documentation can be found at https://psy-fer.github.io/SquiggleKitDocs/. All tools have been designed to operate in Python 2.7+, with minimal additional libraries. Supplementary information: Supplementary data are available at Bioinformatics online.

    Updated: 2020-01-13
Contents have been reproduced by permission of the publishers.