Abstract
In the post-genomic era of big data in biology, computational approaches to integrate multiple heterogeneous data sets become increasingly important. Despite the availability of large amounts of omics data, the prioritisation of genes relevant for a specific functional pathway based on genetic screening experiments remains a challenging task. Here, we introduce netprioR, a probabilistic generative model for semi-supervised integrative prioritisation of hit genes. The model integrates multiple network data sets representing gene–gene similarities and prior knowledge about gene functions from the literature with gene-based covariates, such as phenotypes measured in genetic perturbation screens, for example, by RNA interference or CRISPR/Cas9. We evaluate netprioR on simulated data and show that the model outperforms current state-of-the-art methods in many scenarios and is on par otherwise. In an application to real biological data, we integrate 22 network data sets, 1784 prior knowledge class labels and 3840 RNA interference phenotypes in order to prioritise novel regulators of Notch signalling in Drosophila melanogaster. The biological relevance of our predictions is evaluated using in silico and in vivo experiments. An efficient implementation of netprioR is available as an R package at http://bioconductor.org/packages/netprioR.
1 Introduction
Identifying the set of genes, or their protein products, that operate together in order to execute a certain function within the cell or that are relevant for a specific disease has been a challenging task in the post-genomic era for molecular and computational biologists alike. In particular, the prioritisation of prospective candidate genes, so-called hits, from preliminary genetic screening experiments for follow-up analyses is critical (Moreau and Tranchevent 2012). With the development and widespread application of high-throughput techniques, such as yeast two-hybrid (Giot et al., 2003; Formstecher et al., 2005) and co-affinity purification (co-AP)/MS (Friedman et al., 2011; Guruharsha et al., 2011) screens for protein–protein interactions, gene expression profiling (Chintapalli, Wang & Dow, 2007; Horan et al., 2008; Graveley et al., 2011), or genetic interaction (Costanzo et al. 2010) screening, we have seen a steady increase of publicly available genome-wide interaction data sets. These data sets are often represented as functional linkage networks, where nodes represent genes and weighted edges represent the degree of evidence for co-functionality (Mostafavi et al. 2008).
Semi-supervised graph-based learning for gene prioritisation. Over the last decade, the availability of vast amounts of network data organised in organism-centric databases (Yu et al. 2008) fuelled the development of new integrative approaches and the adaptation of numerous algorithms from graph theory for network-based gene function prediction (Tsuda, Shin & Schölkopf, 2005; Aerts et al., 2006; Mostafavi et al., 2008; Chen et al., 2009; Kato, Kashima & Sugiyama, 2009). The common underlying principle of these methods is the consistency assumption (Zhu, Ghahramani, and Lafferty 2003), i.e. similar genes are likely to have a similar function. This principle has also been termed guilt by association (Mostafavi et al. 2008). It allows the use of interactions between genes to predict functions for uncharacterised genes by associating them with genes of known function. Guilt-by-association approaches typically take a set of seed genes with known function distributed across the network and score uncharacterised genes according to their proximity to the seed genes. It has been shown that approaches integrating multiple network data sets outperform predictions based on single data sets (Tsuda, Shin & Schölkopf, 2005; Mostafavi et al., 2008), likely because of their noisy, incomplete, and in part complementary nature.
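As an illustration of the guilt-by-association principle only, the following minimal sketch scores unlabelled genes by repeatedly averaging the scores of their network neighbours while keeping seed genes fixed, in the spirit of harmonic label propagation (Zhu, Ghahramani, and Lafferty 2003). It is not the netprioR model; the function name, the toy network and the label coding (+1 for hit, -1 for non-hit) are assumptions made for this example.

```python
def propagate_labels(adjacency, seeds, n_iter=200):
    """Score genes by iterated neighbour averaging with clamped seed genes.

    adjacency: dict gene -> dict neighbour -> edge weight (symmetric).
    seeds:     dict gene -> known label (+1.0 hit, -1.0 non-hit).
    """
    scores = {gene: seeds.get(gene, 0.0) for gene in adjacency}
    for _ in range(n_iter):
        updated = {}
        for gene, neighbours in adjacency.items():
            if gene in seeds:  # seed genes keep their known label
                updated[gene] = seeds[gene]
                continue
            total = sum(neighbours.values())
            if total == 0:  # isolated gene: no evidence either way
                updated[gene] = 0.0
            else:  # weighted mean of the neighbours' current scores
                updated[gene] = sum(w * scores[nb] for nb, w in neighbours.items()) / total
        scores = updated
    return scores


# Toy chain a - b - c - d: "a" is a known hit, "d" a known non-hit.
network = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = propagate_labels(network, {"a": 1.0, "d": -1.0})
```

In this toy chain, the uncharacterised gene adjacent to the hit seed receives a positive score and the one adjacent to the non-hit seed a negative score, which is the proximity effect described above.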
Current state-of-the-art approaches, such as TSS (Tsuda, Shin, and Schölkopf 2005) and GeneMANIA (Mostafavi et al. 2008), integrate two types of input data: functional linkage networks and prior knowledge class labels for seed genes. GeneMANIA is based on the seminal work of Zhu et al. (Zhu, Ghahramani, and Lafferty 2003) and was developed with a focus on fast predictions. Suitable to be used as an online server, GeneMANIA weights multiple network data sets prior to the learning task using a regularised regression model inspired by kernel-target alignment (Cristianini et al. 2002), sacrificing maximum accuracy for speed. TSS, in contrast, estimates network weights and prioritisation labels simultaneously by solving a convex optimisation problem. However, neither approach allows for the integration of additional gene-based features that are not easily transformed into gene–gene similarities or class labels, such as perturbation screen phenotypes. To the best of our knowledge, the first guilt-by-association method that implements this feature is LMgraph (Vembu and Morris 2015), a recent extension to GeneMANIA, which combines functional linkage networks and gene-based features using a weighting scheme in conjunction with linear classifiers.
Integrative prioritisation of perturbation screen hits. Perturbation experiments have become a common approach to screen in a genome-wide fashion for candidate genes involved in a certain biological function. Scalable screening technologies include gene knock-downs using RNA interference (RNAi) and, more recently, gene knock-outs using CRISPR/Cas9. Typical workflows for the analysis of screening data have been shown to produce high numbers of false positive and false negative candidate genes when prioritising for follow-up analyses. For RNAi screens, this limitation is often attributed to sequence-based siRNA off-target effects (Rämö et al., 2014; Schmich et al., 2015). In organisms like Drosophila melanogaster, however, where this problem is not likely to be caused by miRNA-like siRNA off-target effects, it has been proposed to integrate network data and prior knowledge about the biological systems for improved hit prioritisation of the screen (Wang, Tu, and Sun 2009).
Following this paradigm we developed netprioR, a probabilistic graphical model inspired by work from Kato and co-workers (Kato, Kashima, and Sugiyama 2009) that integrates multiple network data sets, prior knowledge on gene functions, and phenotype data or any other type of additional covariates (Figure 1). The output of netprioR is a list of prioritised hit genes ranked by predicted labels, as well as estimated weights for the integrated networks reflecting the importance of each data set. We demonstrate on simulated data that prioritisations from netprioR outperform current state-of-the-art methods with respect to different metrics. In addition, we prioritise novel regulators of Notch signalling in Drosophila melanogaster, integrating 22 network data sets, prior knowledge labels for true positive and true negative Notch regulators from the literature, as well as perturbation screen phenotypes from a recent RNAi screen (Saj et al. 2010).
2 Results
We present the probabilistic graphical model of netprioR, provide a comparative evaluation of its performance on simulated data and, integrating multiple data sets for Drosophila melanogaster, we prioritise novel regulators of Notch signalling.
2.1 The netprioR model for integrative hit prioritisation
Let N be the number of genes. Let
The random effects R are modelled by a Gaussian Markov Random Field (GMRF) (Rue and Held 2005) with N × N precision matrix
Without loss of generality, we separate Y into the vector of labelled genes
Expectation Maximisation (EM) algorithm for netprioR. Update rules for the E-step and M-step are derived in section 4.1.
2.2 Comparative performance evaluation on simulated data
Data simulation. In order to evaluate the performance of netprioR, we generated simulated data with known ground truth labels and phenotypes for each gene, as well as gene–gene networks according to the schema depicted in Figure 3. We simulated 1000 genes and split them into equally sized classes with labels hit and non-hit (Figure 3B). We simulated two kinds of gene–gene networks: For low-noise networks, 80% of all interactions, i.e. gene–gene similarities, lie within the same class and are simulated as a scale-free network with preferential attachment proportional to node degrees, whereas for high-noise networks, interactions do not obey the class structure, such that the model is provided with networks of varying degrees of information content during hit prioritisation (Figure 3A). Univariate phenotypes for hits and non-hits were sampled from
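The following sketch illustrates one way such networks could be grown; it is not the simulation code used in the study. Each new gene attaches to existing genes with probability proportional to their (smoothed) degree, and a hypothetical parameter `within_class_fraction` controls how often the partner is drawn from the same class. Setting it to 0.8 mimics the low-noise setting, while 0.5 with equally sized classes is used here as an assumed stand-in for networks that ignore the class structure.

```python
import random


def simulate_network(labels, within_class_fraction, edges_per_node=2, seed=0):
    """Grow an undirected network by class-aware preferential attachment."""
    rng = random.Random(seed)
    degree = [1] * len(labels)  # +1 smoothing so that new genes can be chosen
    edges = set()
    for new in range(1, len(labels)):
        for _ in range(edges_per_node):
            want_same = rng.random() < within_class_fraction
            candidates = [j for j in range(new)
                          if (labels[j] == labels[new]) == want_same]
            if not candidates:  # only possible while one class is still absent
                candidates = list(range(new))
            # attachment probability proportional to the smoothed degree
            target = rng.choices(candidates,
                                 weights=[degree[j] for j in candidates])[0]
            if (target, new) not in edges:  # skip duplicate edges
                edges.add((target, new))
                degree[target] += 1
                degree[new] += 1
    return edges


def within_class_share(edges, labels):
    """Fraction of edges that connect two genes of the same class."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)


labels = [i % 2 for i in range(1000)]  # 1000 genes, two equally sized classes
low_noise = simulate_network(labels, within_class_fraction=0.8, seed=1)
high_noise = simulate_network(labels, within_class_fraction=0.5, seed=2)
```

Calling `within_class_share` on the two edge sets gives a quick check that the low-noise network concentrates its interactions within classes while the high-noise network does not.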
Benchmark methods. The performance of netprioR was compared to the competing methods LMgraph (Vembu and Morris 2015; web https://github.com/morrislab/lmgraph) and TSS (Tsuda, Shin, and Schölkopf 2005), as well as the baseline of prioritising hits solely based on the phenotypic measurements (termed phenotype-only). We used default parameters for TSS (
Evaluation of prioritisation performance. Performance was evaluated based on how well each model could correctly separate unlabelled genes into hits and non-hits, measured by the area under the receiver operating characteristic curve (AUC). We focused on simulated data with 10% of class labels available, as this resembled the setting we found in real data from Drosophila melanogaster (see Section 2.3), and present the results of the comparative analysis in Figure 4. We observed netprioR to outperform or be on par with both competing models, as well as the baseline of phenotype-only prioritisation in the classification task of separating hit from non-hit labelled genes (Figure 4A). As expected, all methods showed an increase in performance with increasing phenotypic effect size and fewer high-noise networks. However, in contrast to netprioR, TSS fell below the phenotype-only baseline when a single high-noise similarity network was added and the phenotype effect size was 1. This effect was observable even for smaller effect sizes when we increased the number of high-noise networks. The performance of netprioR never dropped below that of the baseline phenotype-only method, indicating that our integrative approach is favourable even if only noisy network data is available. This observation was confirmed in a larger simulation with 20 networks and 10% labelled data (Supplementary Figure 1), more closely matching our real-world application in Drosophila melanogaster. The performance evaluation for 1%, 2%, and 5% of labelled data available for learning was similar, except for the case with 1% labelled data and no high-noise networks, where all methods performed equally well (Supplementary Figure 2). We also investigated the distributions of estimates for the phenotype fixed effect β and found that, as expected, for increasing effect size netprioR inferred increasing values of β (Supplementary Figure 3).
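For readers less familiar with the metric, the AUC can be computed directly as the probability that a randomly chosen hit receives a higher score than a randomly chosen non-hit, with ties counted as one half. The sketch below is a plain implementation of this definition; the function name and the example scores are illustrative and not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of hits and non-hits.

    scores: predicted scores, higher means more hit-like.
    labels: ground truth, 1 for hit and 0 for non-hit.
    """
    hits = [s for s, y in zip(scores, labels) if y == 1]
    non_hits = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for h in hits:
        for n in non_hits:
            if h > n:
                wins += 1.0
            elif h == n:  # ties contribute one half
                wins += 0.5
    return wins / (len(hits) * len(non_hits))


perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The quadratic pairwise loop is chosen for clarity; for large gene sets a rank-based formulation gives the same value more efficiently.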
Evaluation of inferred network weights. We compared the inferred weights for low-noise and high-noise data sources for netprioR, TSS, and LMgraph (Figure 4B). netprioR put most of the probability mass on low-noise networks, whereas TSS and LMgraph weighted high- and low-noise networks very similarly. This observation may explain the more steeply degrading performance of TSS for an increasing number of high-noise networks compared to netprioR, and it emphasises the strength of netprioR in distinguishing high-noise from low-noise network data sources for a specific prioritisation task.
Evaluation of model robustness. Both the estimated prioritisation rankings of unobserved genes and the network weights are very robust with respect to restarts of the EM algorithm. In order to demonstrate this, we performed 100 restarts on the same input data, each time sampling initial parameters from the prior and fitting the netprioR model. Then, we investigated both the inferred network weights and the inferred labels Y′ and observed that the coefficient of variation (CV) in both predicted variables is very low (network weights: CV < 0.01, Y′: median CV = 0.11), as illustrated in Supplementary Figure 5. We also investigated netprioR’s robustness with respect to the hyperparameters τ, σ, a and b and found high median concordance (>0.9) between prioritisation rankings across all hyperparameters, indicating netprioR’s ability to recover the same ranking consistently (Supplementary Figure 6).
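The coefficient of variation used here is the standard deviation of an estimate across restarts divided by the magnitude of its mean. A minimal sketch, with made-up weight estimates standing in for the output of repeated model fits:

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    return statistics.stdev(values) / abs(statistics.mean(values))


# Hypothetical weight estimates for one network from five EM restarts.
weight_estimates = [0.131, 0.132, 0.132, 0.133, 0.132]
cv = coefficient_of_variation(weight_estimates)
```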
Evaluation of runtime. netprioR’s robust integration of networks from data sources with varying degree of noise using a probabilistic model comes at the price of increased runtime compared to TSS and LMgraph. The runtime of netprioR is dominated by the convergence of the EM algorithm and depends strongly on the number of networks and available labels. The computational bottleneck in each iteration of the EM algorithm is the computation of the expectation
because typically U ≫ L. Using a conjugate gradient method to solve the linear equation system, the asymptotic runtime is
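To make the role of the conjugate gradient method concrete, the sketch below solves a generic symmetric positive definite system A x = b using only matrix-vector products, which is what makes the method attractive for sparse precision matrices. The matrix here is a small stand-in chosen for illustration, not the system that arises in netprioR, and all names are assumptions of this example.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given x -> A x."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the start value x = 0
    p = list(r)              # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Sparse matrix stored row-wise as {row: {column: value}}.
A = {0: {0: 4.0, 1: 1.0}, 1: {0: 1.0, 1: 3.0}}


def sparse_matvec(v):
    """Multiply the sparse matrix A with a dense vector v."""
    return [sum(val * v[j] for j, val in A[i].items()) for i in range(len(v))]


solution = conjugate_gradient(sparse_matvec, [1.0, 2.0])
```

Each iteration costs one sparse matrix-vector product, so the work per iteration grows with the number of non-zero entries rather than with the square of the number of genes.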
2.3 Prioritisation of novel regulators of Notch signalling in fly
Integrated datasets. We applied netprioR to Notch signalling in Drosophila melanogaster. Quantitative phenotypes were obtained from an in vitro RNA interference (RNAi) screen for regulators of Notch signalling by Saj and co-workers (Saj et al. 2010). In our application, we did not consider the direction of regulation and defined the phenotype as
Interaction | Source | Conf. | #Nodes | #Edges |
---|---|---|---|---|
Protein-Protein | BIND | | 591 | 1210 |
Protein-Protein | BioGRID | | 1095 | 4606 |
Protein-Protein | Curagen | high | 4395 | 8854 |
Protein-Protein | Curagen | low | 5210 | 29,336 |
Protein-Protein | DPIM | high | 1830 | 5508 |
Protein-Protein | DPIM | low | 4205 | 68,814 |
Protein-Protein | Finley | | 2236 | 12,380 |
Protein-Protein | Flybase | | 4127 | 31,626 |
Protein-Protein | Hybrigenics | | 1269 | 3648 |
Protein-Protein | IntAct | | 5163 | 30,224 |
Protein-Protein | Perrimon | | 252 | 762 |
Co-Expression | modEncode-FlyAtlas | high | 8513 | 31,124 |
Co-Expression | modEncode-FlyAtlas | low | 13,616 | 265,538 |
Genetic | Flybase | | 2889 | 14,288 |
Interolog | Human | | 5688 | 134,176 |
Interolog | Worm | | 1837 | 6924 |
Interolog | Yeast | | 2810 | 160,984 |
RNA-Gene | MinoTar | | 2527 | 9410 |
RNA-Gene | modENCODE | | 1162 | 6288 |
RNA-Gene | TargetScanFly | | 11,925 | 210,536 |
TF-Gene | modENCODE | | 12,313 | 313,620 |
TF-Gene | REDfly | | 180 | 494 |
Robustness evaluation by sub-sampling. In the absence of a ground truth of Notch regulators and given the fact that we already included most of available prior knowledge in the integrative prioritisation, we evaluated the robustness of netprioR prioritisations with respect to missing phenotypes, as well as the predictive performance of the model by sub-sampling the data. First, we investigated the robustness of the model and constructed ten data sets from measured phenotypes, each containing 80% of all measurements. Integrating the full set of available network data and prior knowledge about labels, we fitted ten models and evaluated the pairwise stability between prioritisation ranks, as well as the robustness of inferred relative weights for each network data set. The average pairwise rank correlation between prioritisations was 0.45 and the average pairwise rank biased overlap (Webber, Moffat, and Zobel 2010) (rbo) of the top of the ranked lists was 0.75 (Figure 5B). This result indicates that top prioritised genes are highly stable with respect to missing phenotypic data, while the overall ranking is only moderately stable.
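Rank-biased overlap compares two ranked lists with geometrically decaying weights, so that agreement at the top counts more than agreement further down. The sketch below implements the truncated sum of the definition by Webber, Moffat, and Zobel (2010) without the extrapolation term. The persistence parameter `p` and the evaluation depth are assumptions of this example, since the values used in the analysis are not stated here, and both lists are assumed to be free of duplicates.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated rank-biased overlap of two duplicate-free ranked lists.

    Computes (1 - p) * sum_{d=1..depth} p**(d - 1) * A_d, where A_d is the
    fraction of items shared by the two top-d prefixes.
    """
    if depth is None:
        depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        total += p ** (d - 1) * agreement
    return (1 - p) * total


same = rank_biased_overlap(["g1", "g2", "g3"], ["g1", "g2", "g3"], p=0.5)
swapped_tail = rank_biased_overlap(["g1", "g2", "g3"], ["g1", "g3", "g2"], p=0.5)
```

Because the sum is truncated, two identical lists of depth k score 1 - p^k rather than exactly 1; the extrapolated variant described by Webber and co-workers removes this gap.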
Network weight estimates were robust across phenotype subsamples with an average standard error of the mean (SEM) of 0.45 (Figure 5C). Networks spanning only a small number of genes (e.g. 180 REDfly transcription factor (TF)–gene interactions and 252 Perrimon protein–protein interactions (PPI)) exhibited, as expected, higher variation. Among the highest weighted network data sets were the PPI networks from Perrimon (13.2% relative weight) and BioGRID (9.5%), as well as the RNA–gene interaction networks from modENCODE (11.4%) and MinoTar (8.9%). Low confidence (LC) filtered networks (Curagen_LC, DPIM_LC, modENCODE-FlyAtlas_LC) were assigned low weights, notably always lower than the respective high confidence (HC) counterparts (e.g. DPIM PPI 3.4% and 0.9%). This observation indicates that netprioR successfully down-weights noisy network data sets.
Next, we split the available prior knowledge about known true positive (TP) and true negative (TN) genes into ten non-overlapping subsets and evaluated the model performance in a ten-fold cross-validation setting. In each iteration, we used 90% of available prior knowledge to predict labels for the remaining 10% and evaluated whether the TPs of the left-out set were enriched at the top of the ranked prioritisation gene lists (Figure 5D). We computed the running overlap from ranks 1 to 500 of each prioritisation with the left-out TPs and the corresponding Benjamini-Hochberg corrected q-values from hypergeometric tests for enrichment. Comparing the performance of netprioR to prioritising based on phenotype-only, we observed that our integrative model yielded much stronger overlaps (10.1% versus 3.9% at rank 100) and enrichments (
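The running enrichment analysis described in this paragraph can be sketched as follows: for every rank cut-off, count how many left-out true positives fall within the top of the list, compute a hypergeometric upper-tail p-value for that overlap, and adjust the resulting p-values with the Benjamini-Hochberg procedure. The gene names and list sizes below are toy values, and the helper names are assumptions of this example rather than functions of the netprioR package.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(k, upper + 1))
    return favourable / comb(population, draws)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
left_out_tps = {"g1", "g3", "g9"}

pvalues = []
for rank in range(1, len(ranked_genes) + 1):
    overlap = len(left_out_tps.intersection(ranked_genes[:rank]))
    pvalues.append(hypergeom_upper_tail(overlap, len(ranked_genes),
                                        len(left_out_tps), rank))
qvalues = benjamini_hochberg(pvalues)
```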
Prioritisation of novel Notch regulators. For the prioritisation of Notch regulators, we fit the full netprioR model using all available network data sets, phenotype data and prior knowledge. The resulting list of the top 50 prioritised novel regulators, i.e. high-scoring unlabelled genes prioritised by the model, is shown in Supplementary Table 1. The genes shrb (rank 1) and AGO1 (rank 3) were both previously deemed hits for potential follow-up analysis based on their strong phenotype in the study by Saj and co-workers (Saj et al. 2010) and are again picked up by netprioR. However, genes that did not generate a strong phenotype in the original screen were also prioritised highly, for example, Fadd, which was ranked 35. Literature search revealed that Fadd indeed modulates Notch signalling (Zhang et al. 2014). Furthermore, we compared a set of novel in vivo validated Notch regulators, hand-selected by an expert based on data from the study by Saj et al. (Saj et al. 2010) (Supplementary Table 2), to the prioritisation from netprioR. The in vivo phenotypes were obtained following the protocols described in (Saj et al. 2010). We found that the distribution of ranks of in vivo validated regulators in the netprioR prioritisation was positively skewed towards lower ranks (Figure 6A), indicating high prioritisation of true positive Notch regulators. Looking at the top 500 genes prioritised by netprioR, we observed an overlap of 13.2% with the set of validated regulators, which is a highly significant enrichment yielding a
3 Discussion
We developed a probabilistic, generative model for integrative gene prioritisation and devised an EM algorithm for parameter inference. In contrast to many other methods in the field (Tsuda, Shin & Schölkopf, 2005; Mostafavi et al., 2008; Kato, Kashima & Sugiyama, 2009), netprioR allows for the integration of gene-based covariates, such as phenotypic readouts from perturbation screens, in addition to multiple network data sets and prior knowledge in the form of known true positive and true negative hits. Our comparative study to assess the performance of netprioR has several limitations. Like any simulation study, it is driven by the way the data is simulated. While we have not drawn the data from the netprioR model itself to avoid over-optimism, other simulation settings may result in other performance differences. In addition, we have only used the default parameters of all competing methods, and future, more extensive simulation studies that include systematic parameter optimisation should address this limitation.
Robustness to high-noise networks. netprioR showed superior performance in comparison to competing methods (Tsuda, Shin & Schölkopf, 2005; Vembu and Morris, 2015) in a simulation study (Figure 3 and Figure 4), in particular in cases with an increased number of high-noise network data sources. This result is particularly promising, as it permits integrating highly heterogeneous data sets of different quality without having to guess a priori which data sets to include for a certain prioritisation task. However, for a fixed number of prior knowledge gene labels and fixed phenotypic effect size, the accuracy of the prioritisation did, as expected, show a decreasing trend with increasing number of high-noise networks (panels in Figure 4A). This is a consequence of the fact that netprioR shrinks the weights of high-noise networks, but does not set them to zero. Hence, a possible extension of netprioR, in order to yield a sparsifying estimation of network weights Wk, could be to use a prior distribution on Wk similar to the Laplace distribution as for the Bayesian Lasso (Park and Casella 2008).
Novel regulators of Notch signalling. Integrating 22 different network data sets, 3840 perturbation screen phenotypes and 1784 labels for a priori known true positive and true negative genes, we successfully prioritised novel regulators of Notch signalling in Drosophila melanogaster. While we provided evidence from in vivo experiments for the biological relevance of several prioritised genes, netprioR generated many more directly testable hypotheses for potential regulators. The list of top 50 prioritised genes (Supplementary Table 1) contained numerous ribosomal proteins. Due to strong phenotypic effects (Saj et al. 2010) and true positive hit labels (Guruharsha, Kankel, and Artavanis-Tsakonas 2012) for ribosomal proteins, this result was to be expected. However, the exact role of ribosomal proteins in Notch signalling remains unclear as in vivo perturbation experiments for validation are typically prohibitive due to toxic effects (Saj et al. 2010).
Wide applicability. With increasing amounts of publicly available omics data, netprioR will be a valuable tool for the identification of novel hit genes from perturbation screens based on RNAi or CRISPR/Cas9. Its availability as an R/Bioconductor package allows for smooth integration into existing data analysis pipelines. Apart from prioritisation of perturbation screen hits, netprioR could also be applied to disease gene prioritisation tasks, such as the prediction of driver genes in cancer (Moreau & Tranchevent, 2012; Leiserson et al., 2015; Dimitrakopoulos et al., 2018). Similar network data sets can be used for this task, and rich sources of a priori known driver genes, such as COSMIC (Forbes et al. 2015), are readily available. As additional covariates, one could integrate mutation profiles or gene expression measurements, for instance from exome sequencing experiments.
Computational complexity. A limitation of netprioR is the high computational cost proportional to m, the sum of the joint number of interactions and number of genes for all networks. In order to reduce m, pre-processing steps, such as constructing the n-nearest-neighbour network from each data source prior to integration, will lead to higher sparsity in Q and consequently to decreased runtime (Mostafavi et al. 2008). Nevertheless, the application to Notch signalling in Drosophila melanogaster, integrating as many as 22 network data sets in under two hours, showed that the current implementation of netprioR is in fact suitable for genome-scale problems.
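A sketch of the pre-processing idea mentioned above: for each gene, keep only its n strongest interactions and drop the rest, which reduces the number of non-zero entries per network. Keeping an edge if either endpoint selects it (a symmetric union) is an assumption of this example, as are the function name and the toy weights; the input is assumed to be symmetric and free of self-loops.

```python
def knn_sparsify(adjacency, n_neighbours):
    """Keep, for every gene, only its n strongest interactions.

    adjacency: dict gene -> dict neighbour -> weight (symmetric, no self-loops).
    An edge survives if at least one of its endpoints ranks it among its
    top n, so the returned network stays symmetric.
    """
    kept = set()
    for gene, neighbours in adjacency.items():
        top = sorted(neighbours, key=neighbours.get, reverse=True)[:n_neighbours]
        for nb in top:
            kept.add(frozenset((gene, nb)))
    sparse = {gene: {} for gene in adjacency}
    for edge in kept:
        u, v = tuple(edge)
        sparse[u][v] = adjacency[u][v]
        sparse[v][u] = adjacency[v][u]
    return sparse


dense = {
    "a": {"b": 0.9, "c": 0.1, "d": 0.2},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"a": 0.1, "b": 0.8, "d": 0.3},
    "d": {"a": 0.2, "c": 0.3},
}
sparse = knn_sparsify(dense, n_neighbours=1)
```

In the toy network, the weak interactions a–c and a–d are removed while the strongest interaction of every gene is retained.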
4 Materials and methods
4.1 Model inference
Let the set of hidden data be
Including the priors for the model parameters and substituting the normal and gamma distributions (Eqs. (1)–(2)), the logarithm of the posterior probability of the parameters is given by
Substituting and removing constant terms,
and after re-arranging
We aim to find the maximum a posteriori (MAP) estimate of parameters in our latent variable model and to predict the missing labels YU. For this purpose, we use the Expectation Maximisation (EM) algorithm, which iteratively maximises the expected hidden log-likelihood of the data with respect to the logarithm of the posterior distribution of
In each iteration of the EM algorithm, 𝒬 is maximised with respect to
M-step. In the M-step, we obtain a new estimate of the parameters by maximising 𝒬 with respect to the previous estimate of the parameters
The new parameter estimates
are obtained by setting the derivative of
to obtain
and similarly, for the fixed effects,
yields
E-step. In the E-step, we compute the expected values of the hidden data
We re-order genes, such that, without loss of generality, labelled genes appear before unlabelled genes and partition H, R, S and Q, such that
Consequently, for the subset of labelled genes
The conditional distribution of RL given HL is constructed by completing the squares for the joint distribution
It can be seen that
The covariance matrix of the Gaussian distribution of J is the inverse of the precision matrix M. It is computed as
We note that the exponent in a general Gaussian distribution
In order to find an expression for the conditional distribution
which is equivalent to the form in Eq. (10). Then, we can derive the mean and covariance of the conditional Gaussian
Similarly, the conditional Gaussian distribution of RU given RL is given by
This result is fundamental in the field of Gaussian Markov Random Fields (GMRFs), where unobserved vertices are typically conditioned on observed vertices (Rue and Held 2005). By the law of total expectation, we can compute the conditional expectation for the random effect of genes with unobserved labels, RU, as
and likewise the conditional expectation of the unobserved labels, YU, as
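For reference, the standard conditioning identity for Gaussian Markov Random Fields that this derivation relies on (Rue and Held 2005) can be stated as follows. The block notation is an assumption of this note: a zero-mean random effect vector R with precision matrix Q is partitioned into unlabelled (U) and labelled (L) genes. This is the general textbook result and is given only as an aid to reading; it is not a reproduction of the numbered equations of this section.

```latex
\begin{aligned}
R = \begin{pmatrix} R_U \\ R_L \end{pmatrix}
  &\sim \mathcal{N}\!\left(0,\; Q^{-1}\right),
\qquad
Q = \begin{pmatrix} Q_{UU} & Q_{UL} \\ Q_{LU} & Q_{LL} \end{pmatrix}, \\[4pt]
R_U \mid R_L
  &\sim \mathcal{N}\!\left(-\,Q_{UU}^{-1} Q_{UL}\, R_L,\; Q_{UU}^{-1}\right).
\end{aligned}
```

The conditional mean requires only the solution of a linear system in the sparse block Q_UU, which is why no dense covariance matrix has to be formed explicitly.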
4.2 Implementation
We iterate the E-step and M-step of the EM algorithm until the difference in the expected hidden log likelihood of two consecutive iterations is smaller than
4.3 Construction and normalisation of gene–gene similarity networks from omics data
While the construction of similarity networks in the case of protein–protein or genetic interactions is straightforward (interaction = similarity), for other data sets, different measures of similarity exist. For co-expression networks, for instance, where each gene is associated with an expression profile over multiple conditions, a common approach is to use thresholded pairwise correlation between genes as a proxy for similarity (Stuart et al. 2003). Interolog data are predicted interactions which are based on experimental evidence for interactions between orthologous genes or proteins in other species. These interactions typically span a multitude of different omics data sets from additional databases. In this study, we used 22 distinct gene–gene similarity networks from the DroID database (Yu et al. 2008) with highly heterogeneous numbers of genes and interactions. Therefore, we normalised the network data sets by scaling each interaction by the Frobenius norm of the adjacency matrix of the corresponding network. This step makes netprioR’s weight estimates Wk comparable between network data sets.
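The normalisation step amounts to dividing every entry of an adjacency matrix by the square root of the sum of its squared entries. A minimal sketch with a toy three-gene network (the function name and the example matrix are illustrative):

```python
from math import sqrt


def frobenius_normalise(adjacency):
    """Scale an adjacency matrix (list of rows) to unit Frobenius norm."""
    norm = sqrt(sum(w * w for row in adjacency for w in row))
    if norm == 0:
        raise ValueError("network without interactions cannot be normalised")
    return [[w / norm for w in row] for row in adjacency]


# Toy network with interactions 1-2 and 2-3; its Frobenius norm is 2.
toy = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
normalised = frobenius_normalise(toy)
```

After this scaling, networks with very different numbers of interactions have the same overall magnitude, so a larger inferred weight is not merely a reflection of network size.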
Author contributions: Conceived and designed the experiments: FS GM NB. Analysed the data: FS. Contributed reagents/materials/analysis tools: JK GM. Wrote the paper: FS NB.
References
Aerts, S., D. Lambrechts, S. Maity, P. Van Loo, B. Coessens, F. De Smet, L.-C. Tranchevent, B. De Moor, P. Marynen, B. Hassan, P. Carmeliet and Y. Moreau (2006): “Gene prioritization through genomic data fusion,” Nat. Biotechnol., 24, 537–544. doi:10.1038/nbt1203
Bishop, C. M. (2006): Pattern recognition and machine learning (information science and statistics), Springer-Verlag New York, Inc., Secaucus, NJ, USA.
Chen, J., E. E. Bardes, B. J. Aronow and A. G. Jegga (2009): “ToppGene Suite for gene list enrichment analysis and candidate gene prioritization,” Nucleic Acids Res., 37, W305–W311. doi:10.1093/nar/gkp427
Chintapalli, V. R., J. Wang and J. A. T. Dow (2007): “Using FlyAtlas to identify better Drosophila melanogaster models of human disease,” Nat. Genet., 39, 715–720. doi:10.1038/ng2049
Costanzo, M., A. Baryshnikova, J. Bellay, Y. Kim, E. D. Spear, C. S. Sevier, H. Ding, J. L. Y. Koh, K. Toufighi, S. Mostafavi, J. Prinz, R. P. St Onge, B. VanderSluis, T. Makhnevych, F. J. Vizeacoumar, S. Alizadeh, S. Bahr, R. L. Brost, Y. Chen, M. Cokol, R. Deshpande, Z. Li, Z.-Y. Lin, W. Liang, M. Marback, J. Paw, B.-J. San Luis, E. Shuteriqi, A. H. Y. Tong, N. van Dyk, I. M. Wallace, J. A. Whitney, M. T. Weirauch, G. Zhong, H. Zhu, W. A. Houry, M. Brudno, S. Ragibizadeh, B. Papp, C. Pál, F. P. Roth, G. Giaever, C. Nislow, O. G. Troyanskaya, H. Bussey, G. D. Bader, A.-C. Gingras, Q. D. Morris, P. M. Kim, C. A. Kaiser, C. L. Myers, B. J. Andrews and C. Boone (2010): “The genetic landscape of a cell,” Science, 327, 425–431. doi:10.1126/science.1180823
Cristianini, N., J. Kandola, A. Elisseeff and J. Shawe-Taylor (2002): “On kernel-target alignment.” In: Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press. pp. 367–373.
Dimitrakopoulos, C., S. K. Hindupur, L. Häfliger, J. Behr, H. Montazeri, M. N. Hall and N. Beerenwinkel (2018): “Network-based integration of multi-omics data for prioritizing cancer genes,” Bioinformatics, 34, 2441–2448.10.1093/bioinformatics/bty148Search in Google Scholar PubMed PubMed Central
Forbes, S. A., D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok, M. Jia, T. De, J. W. Teague, M. R. Stratton, U. McDermott and P. J. Campbell (2015): “COSMIC: exploring the world’s knowledge of somatic mutations in human cancer,” Nucleic Acids Res., 43, D805–D811.10.1093/nar/gku1075Search in Google Scholar PubMed PubMed Central
Formstecher, E., S. Aresta, V. Collura, A. Hamburger, A. Meil, A. Trehin, C. Reverdy, V. Betin, S. Maire, C. Brun, B. Jacq, M. Arpin, Y. Bellaiche, S. Bellusci, P. Benaroch, M. Bornens, R. Chanet, P. Chavrier, O. Delattre, V. Doye, R. Fehon, G. Faye, T. Galli, J.-A. Girault, B. Goud, J. de Gunzburg, L. Johannes, M.-P. Junier, V. Mirouse, A. Mukherjee, D. Papadopoulo, F. Perez, A. Plessis, C. Rossé, S. Saule, D. Stoppa-Lyonnet, A. Vincent, M. White, P. Legrain, J. Wojcik, J. Camonis and L. Daviet (2005): “Protein interaction mapping: a Drosophila case study,” Genome Research, 15, 376–384.10.1101/gr.2659105Search in Google Scholar PubMed PubMed Central
Friedman, A. A., G. Tucker, R. Singh, D. Yan, A. Vinayagam, Y. Hu, R. Binari, P. Hong, X. Sun, M. Porto, S. Pacifico, T. Murali, R. L. Finley, J. M. Asara, B. Berger and N. Perrimon (2011): “Proteomic and functional genomic landscape of receptor tyrosine kinase and ras to extracellular signal-regulated kinase signaling,” Sci Signal, 4, rs10–rs10.10.1126/scisignal.2002029Search in Google Scholar PubMed PubMed Central
Giot, L., J. S. Bader, C. Brouwer, A. Chaudhuri, B. Kuang, Y. Li, Y. L. Hao, C. E. Ooi, B. Godwin, E. Vitols, G. Vijayadamodar, P. Pochart, H. Machineni, M. Welsh, Y. Kong, B. Zerhusen, R. Malcolm, Z. Varrone, A. Collis, M. Minto, S. Burgess, L. McDaniel, E. Stimpson, F. Spriggs, J. Williams, K. Neurath, N. Ioime, M. Agee, E. Voss, K. Furtak, R. Renzulli, N. Aanensen, S. Carrolla, E. Bickelhaupt, Y. Lazovatsky, A. DaSilva, J. Zhong, C. A. Stanyon, R. L. Finley, K. P. White, M. Braverman, T. Jarvie, S. Gold, M. Leach, J. Knight, R. A. Shimkets, M. P. McKenna, J. Chant and J. M. Rothberg (2003): “A protein interaction map of Drosophila melanogaster,” Science, 302, 1727–1736.10.1126/science.1090289Search in Google Scholar PubMed
Graveley, B. R., A. N. Brooks, J. W. Carlson, M. O. Duff, J. M. Landolin, L. Yang, C. G. Artieri, M. J. van Baren, N. Boley, B. W. Booth, J. B. Brown, L. Cherbas, C. A. Davis, A. Dobin, R. Li, W. Lin, J. H. Malone, N. R. Mattiuzzo, D. Miller, D. Sturgill, B. B. Tuch, C. Zaleski, D. Zhang, M. Blanchette, S. Dudoit, B. Eads, R. E. Green, A. Hammonds, L. Jiang, P. Kapranov, L. Langton, N. Perrimon, J. E. Sandler, K. H. Wan, A. Willingham, Y. Zhang, Y. Zou, J. Andrews, P. J. Bickel, S. E. Brenner, M. R. Brent, P. Cherbas, T. R. Gingeras, R. A. Hoskins, T. C. Kaufman, B. Oliver and S. E. Celniker (2011): “The developmental transcriptome of Drosophila melanogaster,” Nature, 471, 473–479.10.1038/nature09715Search in Google Scholar PubMed PubMed Central
Guruharsha, K. G., J.-F. Rual, B. Zhai, J. Mintseris, P. Vaidya, N. Vaidya, C. Beekman, C. Wong, D. Y. Rhee, O. Cenaj, E. McKillip, S. Shah, M. Stapleton, K. H. Wan, C. Yu, B. Parsa, J. W. Carlson, X. Chen, B. Kapadia, K. VijayRaghavan, S. P. Gygi, S. E. Celniker, R. A. Obar and S. Artavanis-Tsakonas (2011): “A protein complex network of Drosophila melanogaster,” Cell, 147, 690–703. DOI: 10.1016/j.cell.2011.08.047.
Guruharsha, K. G., M. W. Kankel and S. Artavanis-Tsakonas (2012): “The Notch signalling system: recent insights into the complexity of a conserved pathway,” Nat. Rev. Genet., 13, 654–666. DOI: 10.1038/nrg3272.
Horan, K., C. Jang, J. Bailey-Serres, R. Mittler, C. Shelton, J. F. Harper, J.-K. Zhu, J. C. Cushman, M. Gollery and T. Girke (2008): “Annotating genes of known and unknown function by large-scale coexpression analysis,” Plant Physiol., 147, 41–57. DOI: 10.1104/pp.108.117366.
Kato, T., H. Kashima and M. Sugiyama (2009): “Robust label propagation on multiple networks,” IEEE Trans. Neural Netw., 20, 35–44. DOI: 10.1109/TNN.2008.2003354.
Leiserson, M. D. M., F. Vandin, H.-T. Wu, J. R. Dobson, J. V. Eldridge, J. L. Thomas, A. Papoutsaki, Y. Kim, B. Niu, M. McLellan, M. S. Lawrence, A. Gonzalez-Perez, D. Tamborero, Y. Cheng, G. A. Ryslik, N. Lopez-Bigas, G. Getz, L. Ding and B. J. Raphael (2015): “Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes,” Nat. Genet., 47, 106–114. DOI: 10.1038/ng.3168.
Moreau, Y. and L.-C. Tranchevent (2012): “Computational tools for prioritizing candidate genes: boosting disease gene discovery,” Nat. Rev. Genet., 13, 523–536. DOI: 10.1038/nrg3253.
Mostafavi, S., D. Ray, D. Warde-Farley, C. Grouios and Q. Morris (2008): “GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function,” Genome Biology, 9(Suppl 1), S4. DOI: 10.1186/gb-2008-9-s1-s4.
Park, T. and G. Casella (2008): “The Bayesian lasso,” J. Am. Stat. Assoc., 103, 681–686. DOI: 10.1198/016214508000000337.
Rämö, P., A. Drewek, C. Arrieumerlou, N. Beerenwinkel, H. Ben-Tekaya, B. Cardel, A. Casanova, R. Conde-Alvarez, P. Cossart, G. Csúcs, S. Eicher, M. Emmenlauer, U. Greber, W.-D. Hardt, A. Helenius, C. Kasper, A. Kaufmann, S. Kreibich, A. Kühbacher, P. Kunszt, S. H. Low, J. Mercer, D. Mudrak, S. Muntwiler, L. Pelkmans, J. Pizarro-Cerdá, M. Podvinec, E. Pujadas, B. Rinn, V. Rouilly, F. Schmich, J. Siebourg-Polster, B. Snijder, M. Stebler, G. Studer, E. Szczurek, M. Truttmann, C. von Mering, A. Vonderheit, A. Yakimovich, P. Bühlmann and C. Dehio (2014): “Simultaneous analysis of large-scale RNAi screens for pathogen entry,” BMC Genomics, 15, 1162. DOI: 10.1186/1471-2164-15-1162.
Rue, H. and L. Held (2005): Gaussian Markov random fields: theory and applications. Boca Raton: Chapman & Hall/CRC. DOI: 10.1201/9780203492024.
Saj, A., Z. Arziman, D. Stempfle, W. van Belle, U. Sauder, T. Horn, M. Dürrenberger, R. Paro, M. Boutros and G. Merdes (2010): “A combined ex vivo and in vivo RNAi screen for notch regulators in Drosophila reveals an extensive notch interaction network,” Dev. Cell, 18, 862–876. DOI: 10.1016/j.devcel.2010.03.013.
Schmich, F., E. Szczurek, S. Kreibich, S. Dilling, D. Andritschke, A. Casanova, S. H. Low, S. Eicher, S. Muntwiler, M. Emmenlauer, P. Rämö, R. Conde-Alvarez, C. von Mering, W.-D. Hardt, C. Dehio and N. Beerenwinkel (2015): “gespeR: a statistical model for deconvoluting off-target-confounded RNA interference screens,” Genome Biology, 16, 220. DOI: 10.1186/s13059-015-0783-1.
Stuart, J. M., E. Segal, D. Koller and S. K. Kim (2003): “A gene-coexpression network for global discovery of conserved genetic modules,” Science, 302, 249–255. DOI: 10.1126/science.1087447.
Tsuda, K., H. Shin and B. Schölkopf (2005): “Fast protein classification with multiple networks,” Bioinformatics, 21(Suppl 2), ii59–65. DOI: 10.1093/bioinformatics/bti1110.
Vembu, S. and Q. Morris (2015): “An Efficient Algorithm to Integrate Network and Attribute Data for Gene Function Prediction,” In: Proceedings of the Pacific Symposium on Biocomputing. pp. 388–399.
Wang, L., Z. Tu and F. Sun (2009): “A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila,” BMC Genomics, 10, 220. DOI: 10.1186/1471-2164-10-220.
Webber, W., A. Moffat and J. Zobel (2010): “A similarity measure for indefinite rankings,” ACM TOIS, 28. DOI: 10.1145/1852102.1852106.
Yu, J., S. Pacifico, G. Liu and R. L. Finley (2008): “DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions,” BMC Genomics, 9, 461. DOI: 10.1186/1471-2164-9-461.
Zhang, X., X. Dong, H. Wang, J. Li, B. Yang, J. Zhang and Z.-C. Hua (2014): “FADD regulates thymocyte development at the β-selection checkpoint by modulating Notch signaling,” Cell Death Dis, 5, e1273. DOI: 10.1038/cddis.2014.198.
Zhu, X., Z. Ghahramani and J. Lafferty (2003): “Semi-supervised learning using Gaussian fields and harmonic functions,” ICML, 912–919.
Supplementary Material
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0033).
©2019 Walter de Gruyter GmbH, Berlin/Boston