Integrative analyses in omics data: Machine learning perspective

Review Article. Article ID: mibe000244. DOI: 10.3205/mibe000244. URN: urn:nbn:de:0183-mibe0002442

German title: Integrative Analysen von Omics-Daten: Perspektive des maschinellen Lernens

Miray Unlu Yazici
Department of Bioengineering, Faculty of Engineering, Abdullah Gül University, Kayseri, Turkey

Burcu Bakir-Gungor
Department of Computer Engineering, Faculty of Engineering, Abdullah Gül University, Kayseri, Turkey

Malik Yousef (malik.yousef@gmail.com)
Department of Information Systems, Zefat Academic College, Zefat, Israel; Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel. Postal address: Zefat Academic College, Jerusalem St 11, Zefat, 1320611, Israel

Publisher: German Medical Science GMS Publishing House, Düsseldorf

Published: 2023-07-04. Language: English. Journal: GMS Medizinische Informatik, Biometrie und Epidemiologie (GMS Med Inform Biom Epidemiol), ISSN 1860-9171, volume 19, article 05. Subject: 610 Medical; Omics.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.

Developments in high-throughput technologies have enabled the production of an immense amount of knowledge at the multi-omics level. 
Because complex diseases are influenced by multiple factors, single-omics datasets may not be sufficient to unveil the molecular mechanisms of heterogeneous diseases. A comprehensive and systematic overview is needed to explain disease hallmarks in sufficient depth. The use of multi-omics datasets has led to the development of a variety of tools and platforms. Machine learning models are used in many of these tools to tackle the complexity of disorders and to identify new biomolecular signatures and potential markers. These approaches rest on training models to make predictions about, and to classify, the given data. In this review, we describe current machine learning-based approaches and available implementations. We discuss challenges in elucidating the mechanisms of disease onset and progression, as well as future developments in the field of medicine. We also address the importance of interpreting model output in light of the corresponding biological knowledge.

Overview of omics data types

The collective characterization and quantification of biomolecules with advanced technologies has given rise to the study of the genome, transcriptome, epigenome, metabolome, and other layers. Omics research began with genomics, which has contributed to early diagnosis and targeted treatments by clarifying the mechanisms of diseases. The effects of genetic variation on phenotype are analyzed with different methods and databases, such as genome-wide association studies (GWAS) and the Gene Expression Omnibus (GEO). Transcriptomics data publicly available in GEO and the Sequence Read Archive (SRA) enable the identification of novel transcripts and the quantification of transcript expression in RNA-level studies. The PRoteomics IDEntifications database (PRIDE) and ProteomicsDB profile mass spectrometry-based proteome changes. 
Furthermore, whole exome sequencing (WES) studies focus on the protein-coding regions of genes to identify genetic variants affecting disease mechanisms. The Genome Aggregation Database (gnomAD) provides whole genome and exome sequencing data from large-scale sequencing projects. The interactome describes the wiring of molecular interactions in cells. Interactome databases such as IntAct, BioGRID, and STRING are used to understand the dynamic interplay of molecules when developing novel therapeutic strategies. For instance, a protein's links to neighboring proteins can provide a basis for inferring its role in signaling pathways and for identifying the molecular targets of specific drugs.

Genetic changes rewire cellular networks in complex diseases. Multi-omics data obtained from the same set of samples can shed light on the mechanisms underlying disease heterogeneity by detecting more coherent signatures and relevant interactions along the flow of genetic information. The publicly available repositories The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/), International Cancer Genome Consortium (ICGC, https://icgc.org/), and Cancer Cell Line Encyclopedia (CCLE, https://portals.broadinstitute.org/ccle) provide several types of multi-omics data in cancer. 
While the Therapeutically Applicable Research To Generate Effective Treatments (TARGET, https://ocg.cancer.gov/programs/target) database includes pediatric cancer-related omics data at the biologic</PlainText></TextGroup>al level, datasets from human, model, and non-model organisms can be accessed from the rep<TextGroup><PlainText>ositor</PlainText></TextGroup>y Omics Discovery Index (OmicsDI, <Hyperlink href="https://www.omicsdi.org/">https://www.omicsdi.org/</Hyperlink>).</Pgraph></TextBlock> <TextBlock linked="yes" name="Machine learning perspective in omics data analysis"> <MainHeadline>Machine learning perspective in omics data analysis</MainHeadline><Pgraph>Machine learning models recognize patterns in high-dimensional omics data and make predictions from them, and they can capture these patterns more accurately than traditional mathematical models. Supervised learning models are trained with labeled data, and the evaluated model is then used for prediction. Unsupervised learning models identify hidden patterns in unlabeled data.</Pgraph><Pgraph>Unsupervised learning mostly covers dimension reduction techniques and association analyses. Clustering-based unsupervised integration methods are used to identify disease and molecular subtypes and to group features. The Similarity Network Fusion approach (unsupervised) creates a sample-sample similarity matrix for each omics data type and merges the matrices <TextLink reference="10"></TextLink>. Network-based unsupervised integration approaches are based on statistical models and functional interactions of features. In the constructed network, edges represent the predicted relationships between different signatures (nodes) such as genes, CpGs, and proteins <TextLink reference="11"></TextLink>.</Pgraph><Pgraph>Several approaches fall under the umbrella term of supervised learning. Support vector machine (SVM) algorithms classify samples by finding separating hyperplanes in the feature space. 
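</Pgraph><Pgraph>The hyperplane idea can be illustrated with a minimal Python sketch that fits a linear SVM to a small synthetic expression matrix and reads off the hyperplane parameters. It assumes NumPy and scikit-learn are installed. The matrix size, the class labels, and the five shifted genes are invented for illustration and are not taken from any study cited in this review.</Pgraph>

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic expression matrix: 40 samples x 200 genes (invented values).
n_samples, n_genes = 40, 200
X = rng.normal(size=(n_samples, n_genes))
y = np.array([0] * 20 + [1] * 20)   # 0 = control, 1 = case (assumed labels)

# Shift the first five genes in the case group so the classes differ.
X[y == 1, :5] += 3.0

# A linear SVM searches for the hyperplane w.x + b = 0 with the widest margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]        # one weight per gene
b = clf.intercept_[0]   # offset of the hyperplane

# A sample is assigned to a class by the side of the hyperplane it falls on.
scores = X @ w + b
predicted = (scores > 0).astype(int)

# Genes with the largest absolute weights contribute most to the separation.
top_genes = np.argsort(-np.abs(w))[:5]
```

<Pgraph>In this sketch the predictions are made on the training samples only, so they say nothing about generalization; a real analysis would evaluate the model on held-out samples.</Pgraph><Pgraph>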
Meta-analytic SVM integrates multiple omics datasets and detects potential biomarkers across them <TextLink reference="12"></TextLink>. The k-Nearest Neighbor (kNN) algorithm is a distance-based method: it computes the distance from an unknown sample to the labeled samples and predicts its class from the classes of its k nearest neighbors. The kNN Graph (kNN-G), which is widely used in single cell analysis, detects communities or clusters of related cells based on, for example, gene expression data and RNA-Seq profiles <TextLink reference="13"></TextLink>. The random forest algorithm builds randomized decision trees and uses bootstrap aggregation for class prediction in classification tasks. Random forest combined with recursive feature elimination and permutation-based feature selection, which provides a significance label for each selected feature, is used in omics data analysis for disease diagnosis <TextLink reference="14"></TextLink>. </Pgraph><Pgraph>Most feature selection methods in ML analyze omics data using statistics and computer science alone; these so-called fully data-driven approaches disregard biological domain knowledge. Domain knowledge such as disease-gene associations, drug-disease associations, and protein-protein interactions is referred to as pre-existing biological knowledge. In the following part, we discuss studies that use pre-existing knowledge, fully data-driven approaches, or a combination of both. </Pgraph><SubHeadline>Integrative approach by utilizing pre-existing biological knowledge</SubHeadline><Pgraph>Biological systems are massively complex and heterogeneous in nature. To understand processes in complex organisms holistically, it is imperative to interpret the biological data generated in massive volumes by high-throughput technologies. Integrating omics data types and exploiting the flow of information among them has helped researchers address open questions in medicine and biology. 
Constructing a framework for multi-dimensional biological data integration, for example with clustering and machine learning approaches, can provide a comprehensive understanding of the biological mechanisms under study.</Pgraph><Pgraph>The cost of generating omics data limits the number of available samples, which is a challenge because omics datasets typically contain far more features than samples. The curse of dimensionality, reported by Bellman et al., describes this kind of obstacle for data in high-dimensional spaces. The dimensionality of the gene or other biological features, together with their functional metrics, is crucial for prediction, optimization, and performance in machine learning (ML).</Pgraph><Pgraph>The assessment of gene expression to unveil the relationship between genotype and phenotype has led scientists to develop novel methodologies such as DNA microar<TextGroup><PlainText>r</PlainText></TextGroup>ay and RNA-seq. Earlier conventional studies applied standard ML and clustering procedures <TextLink reference="15"></TextLink>, <TextLink reference="16"></TextLink>, <TextLink reference="17"></TextLink> for biomarker discovery <TextLink reference="18"></TextLink>. The immense amount of available biological knowledge has shifted studies from purely data-oriented to integration-based approaches. The advanced tools, platforms, and software developed by bioinformaticians have incorporated biological knowledge into their knowledge bases and improved the analysis of biological processes. 
Examples of biological knowledge organized in databases are miRTarBase <TextLink reference="19"></TextLink>, which identifies miRNA-target interactions; Gene Ontology (GO) <TextLink reference="20"></TextLink>, which describes the attributes of genes; KEGG pathways, which provide molecular interaction networks <TextLink reference="21"></TextLink>; and DisGeNET <TextLink reference="22"></TextLink>, which targets disease-gene associations.</Pgraph><Pgraph>Conventional feature selection algorithms typically used in gene expression analysis rely on statistical and machine learning models. Improving the models by integrating biological knowledge can contribute to better performance. Current approaches used in gene expression analyses are reviewed in <TextLink reference="23"></TextLink>. The authors surveyed clustering methods with several distance measures, such as Euclidean and Manhattan distances and Kendall and Pearson correlations. Integrative gene selection approaches combine biological background information from external sources <TextLink reference="24"></TextLink> with statistics to identify informative genes. In this context, studies aim to improve both classification performance and the biological relevance of the significant genes. Gene Ontology (GO), one of the most extensively used external sources, captures domain knowledge and yields computable gene knowledge by defining classes of gene functions. The Gene Ontology Consortium summarized the studies incorporating GO into statistical analysis to reveal GO terms associated with given genes <TextLink reference="25"></TextLink>. Liang et al. presented an enrichment analysis of differentially expressed genes that captures significant KEGG pathways with a modified Fisher’s exact test <TextLink reference="12"></TextLink>. Another study conducted by Wang et al. 
introduced the over-representation analysis of circRNAs via the external biological database DisGeNET to find their potential molecular functions in neurodegenerative diseases <TextLink reference="26"></TextLink>. CrowdGO improved gene functional annotation with model-informed methods: calculated GO term semantic similarities are evaluated with a machine learning model to enhance the performance of consensus results <TextLink reference="27"></TextLink>. Another study, by Kumar et al., combined GO and KEGG terms for comprehensive enrichment analysis and visualized them with network topology-based approaches <TextLink reference="28"></TextLink>. In contrast to approaches based on a single knowledge base, Perscheid et al. introduced a novel method that integrates knowledge from curated databases with conventional gene selection approaches. The presented framework achieved better classification accuracy <TextLink reference="29"></TextLink>.</Pgraph><Pgraph>Yousef and others recently introduced machine learning approaches based on grouping, scoring, and modeling (G-S-M) for gene expression analysis with biological information, and proposed various tools that follow this approach. For instance, maTE <TextLink reference="30"></TextLink> adopts a biological grouping approach by integrating microRNAs (miRNAs). GEO datasets and miRTarBase are given as input, and a random forest (RF) model is trained with the group information to model miRNA and mRNA regulation. cogNet <TextLink reference="31"></TextLink> ranks active subnetworks and suggests significant pathways by using biological information from KEGG pathways. Another proposed tool, miRcorrNet <TextLink reference="32"></TextLink>, identifies miRNA-mRNA regulatory modules via correlation analysis of expression profiles. The miRNA and mRNA profiles of the target disease are retrieved from TCGA, and a fully data-driven analysis is performed via the G-S-M approach. 
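</Pgraph><Pgraph>The grouping, scoring, and modeling steps can be illustrated with a minimal Python sketch. It is not the implementation of maTE, cogNet, miRcorrNet, or any other cited tool. It assumes NumPy and scikit-learn are installed, uses a synthetic expression matrix, and takes a hypothetical dictionary of gene groups (standing in, for example, for the target genes of each miRNA). Each group is scored by the cross-validated accuracy of a random forest trained on that group's genes, the groups are ranked, and a final model is trained on the genes of the top-ranked groups.</Pgraph>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Synthetic expression matrix: 60 samples x 100 genes (invented values).
X = rng.normal(size=(60, 100))
y = np.array([0] * 30 + [1] * 30)
X[y == 1, 0:10] += 2.0            # only genes 0-9 carry class signal

# Grouping: hypothetical groups, e.g. the target genes of each miRNA.
groups = {
    "group_A": list(range(0, 10)),    # the informative genes
    "group_B": list(range(10, 30)),
    "group_C": list(range(30, 60)),
    "group_D": list(range(60, 100)),
}

def score_group(gene_idx):
    """Scoring: mean cross-validated accuracy of an RF on one group's genes."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(rf, X[:, gene_idx], y, cv=5).mean()

scores = {name: score_group(idx) for name, idx in groups.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# Modeling: train the final classifier on the genes of the top-ranked groups.
top_k = 2   # assumed cut-off, chosen only for illustration
selected = sorted({g for name in ranked[:top_k] for g in groups[name]})
final_model = RandomForestClassifier(n_estimators=200, random_state=0)
final_model.fit(X[:, selected], y)
```

<Pgraph>For brevity, the groups are scored here on the full dataset. In a real analysis the scoring and ranking would be repeated inside each training split, so that the test samples do not influence which groups are selected.</Pgraph><Pgraph>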
The tool miRModuleNet <TextLink reference="33"></TextLink>, similar to miRco<TextGroup><PlainText>rr</PlainText></TextGroup>Net, also detects significant miRNA-mRNA groups by considering two omics datasets. The relationships between pairs are calculated with mutual information, in contrast to the correlation function used by the previous tool. The ranking of significant groups is based not only on the gene list but also on miRNA information. Another G-S-M-based study by Yousef et al. <TextLink reference="34"></TextLink> integrates Gene Ontology information for grouping the genes. A novel approach, PriPath <TextLink reference="35"></TextLink>, uses ranking and grouping functions to analyze gene expression with KEGG pathways. GediNET <TextLink reference="36"></TextLink> incorporates information on genes associated with diseases such as cancer to identify significant groups. To identify disease-disease associations, the “disease is represented by a list of genes” strategy is used. An RF classifier is trained, and performance is evaluated with the Area Under the Curve (AUC). Approaches like GediNET can help improve disease diagnosis, prognosis, and treatment.</Pgraph><Pgraph>The idea of considering groups or clusters of genes instead of individual genes was pioneered by Yousef et al., followed by further studies to improve the tools <TextLink reference="37"></TextLink>, <TextLink reference="38"></TextLink>. Similarly, the Support Vector Machine with Recursive Network Elimination (SVM-RNE) <TextLink reference="39"></TextLink> method integrates gene network information by using the G-S-M model. 
Table 1 <ImgLink imgNo="1" imgType="table"/> summarizes the main tools covered in this review, with the type of method, the disease in the case study, and details of the biological knowledge used.</Pgraph><SubHeadline>Integrative approaches for multi-omics data</SubHeadline><Pgraph>Understanding the functioning of biological systems with heterogeneous characteristics has directed scientists toward deeper analyses of omics data. As illustrated in <TextGroup><PlainText>Figure 1 </PlainText></TextGroup><ImgLink imgNo="1" imgType="figure"/>, a wealth of data repositories providing valuable building blocks and biological samples takes integration approaches a step forward. Tools that handle omics data types such as genomics, epigenomics, and metabolomics are required to interpret the mechanisms affecting diseases in terms of genetic mutations, metabolites, pathways, etc. Advanced tools for multi-omics data analysis can enable users to capture possible key factors associated with the phenotype of interest <TextLink reference="40"></TextLink>. </Pgraph><Pgraph>Deciphering these markers and their interplay can help to dissect the mechanisms underlying disease onset and progression. Recently proposed tools for integrating multi-omics data are broadly categorized as Bayesian, network-based, similarity-based, multivariate, supervised, semi-supervised, or unsupervised approaches <TextLink reference="41"></TextLink>. </Pgraph><Pgraph>One of these tools, MiBiOmics, enables users to identify associations between up to three omics datasets. A network-based approach relying on weighted gene correlation network analysis is used to explore molecular signatures and associations across layers <TextLink reference="42"></TextLink>. The STATegRa tools developed by Planell et al. combine feature identification with an unsupervised machine learning approach and detect enriched pathways with exploratory analysis <TextLink reference="43"></TextLink>. 
The tool combines Principal Component Analysis with a non-parametric combination approach to link the features of different omics data types through exploratory analysis. Mergeomics 2.0, presented by Ding et al., incorporates Meta marker set enrichment analysis to detect omics-related disease pathways and networks through the integration of selected biomarkers. Subnetworks containing gene sets associated with the disease of interest are captured with the key driver function and fed to the Pharm<TextGroup><PlainText>Om</PlainText></TextGroup>ics repository for drug repositioning analysis <TextLink reference="44"></TextLink>. </Pgraph><Pgraph>mixOmics, a versatile multivariate method, enables the analysis of single and integrated omics data by modeling features as a set. The tool supports preprocessed multi-omics data from different platforms. The multivariate method is applied to identify molecular signatures and to distinguish disease subtypes via unsupervised or supervised analysis <TextLink reference="45"></TextLink>. The frameworks DIABLO and MINT were developed for dataset integration. While DIABLO integrates data from different omics platforms measured on the same samples, MINT integrates independent datasets.</Pgraph><Pgraph>Another machine learning tool, miRcorrNet, developed by Yousef et al., integrates miRNA and gene expression profiles via a supervised machine learning approach. Highly scored groups, including target gene lists constructed with grouping functions, are used to identify disease-related biosignatures <TextLink reference="32"></TextLink>. The subsequent tool, miRModuleNet, integrates a pair of omics datasets to gain more insight into the disease process. 
A hierarchical list of groups, each comprising a miRNA and its associated genes, is generated with mutual information and introduced into a machine learning model, and the intergroup relationships are evaluated to decipher significant therapeutic targets affecting disease progression <TextLink reference="33"></TextLink>.</Pgraph><Pgraph>DeepProg, a semi-supervised hybrid ML tool, models patient survival to predict the status of new patients by combining deep learning and ML approaches. Multi-omics data matrices and survival information are given as input, and cluster labels obtained with the GaussianMixture function are used to build SVM models that predict the subtypes of the target disease <TextLink reference="46"></TextLink>.</Pgraph></TextBlock> <TextBlock linked="yes" name="Conclusion"> <MainHeadline>Conclusion</MainHeadline><Pgraph>In the first part of this review, we surveyed several computational tools that integrate biological domain knowledge into machine learning algorithms; in the second part, we surveyed multi-omics computational tools to open up new prospects for readers in the field. Multi-layer analysis of biological information leads to a deeper understanding of biological systems. Strategies that combine fully data-driven selection with pre-existing biological knowledge when selecting features can improve classification performance and potential marker selection. Tools that use pre-existing knowledge in multi-omics integration may pave the way for a better comprehension of complex biological systems. 
Thus, the biological knowledge extracted from multi-omics datasets can be used to develop novel integrative tools that address multi-omics applications and study complex biological processes holistically.</Pgraph></TextBlock> <TextBlock linked="yes" name="Notes"> <MainHeadline>Notes</MainHeadline><SubHeadline>Competing interests</SubHeadline><Pgraph>The authors declare that they have no competing interests.</Pgraph></TextBlock> <References linked="yes"> <Reference refNo="1"> <RefAuthor>Uffelmann E</RefAuthor> <RefAuthor>Huang QQ</RefAuthor> <RefAuthor>Munung NS</RefAuthor> <RefAuthor>de Vries J</RefAuthor> <RefAuthor>Okada Y</RefAuthor> <RefAuthor>Martin AR</RefAuthor> <RefAuthor>Martin HC</RefAuthor> <RefAuthor>Lappalainen T</RefAuthor> <RefAuthor>Posthuma D</RefAuthor> <RefTitle>Genome-wide association studies</RefTitle> <RefYear>2021</RefYear> <RefJournal>Nat Rev Methods Primer</RefJournal> <RefPage>1-21</RefPage> <RefTotal>Uffelmann E, Huang QQ, Munung NS, de Vries J, Okada Y, Martin AR, Martin HC, Lappalainen T, Posthuma D. Genome-wide association studies. Nat Rev Methods Primer. 2021;1(59):1-21. DOI: 10.1038/s43586-021-00056-9</RefTotal> <RefLink>https://doi.org/10.1038/s43586-021-00056-9</RefLink> </Reference> <Reference refNo="2"> <RefAuthor>Edgar R</RefAuthor> <RefAuthor>Domrachev M</RefAuthor> <RefAuthor>Lash AE</RefAuthor> <RefTitle>Gene Expression Omnibus: NCBI gene expression and hybridization array data repository</RefTitle> <RefYear>2002</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>207-10</RefPage> <RefTotal>Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002 Jan;30(1):207-10. 
DOI: 10.1093/nar/30.1.207</RefTotal> <RefLink>https://doi.org/10.1093/nar/30.1.207</RefLink> </Reference> <Reference refNo="3"> <RefAuthor>Leinonen R</RefAuthor> <RefAuthor>Sugawara H</RefAuthor> <RefAuthor>Shumway M</RefAuthor> <RefAuthor> International Nucleotide Sequence Database Collaboration</RefAuthor> <RefTitle>The sequence read archive</RefTitle> <RefYear>2011</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D19-21</RefPage> <RefTotal>Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011 Jan;39(Database issue):D19-21. DOI: 10.1093/nar/gkq1019</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkq1019</RefLink> </Reference> <Reference refNo="4"> <RefAuthor>Vizcaíno JA</RefAuthor> <RefAuthor>Côté R</RefAuthor> <RefAuthor>Reisinger F</RefAuthor> <RefAuthor>Foster JM</RefAuthor> <RefAuthor>Mueller M</RefAuthor> <RefAuthor>Rameseder J</RefAuthor> <RefAuthor>Hermjakob H</RefAuthor> <RefAuthor>Martens L</RefAuthor> <RefTitle>A guide to the Proteomics Identifications Database proteomics data repository</RefTitle> <RefYear>2009</RefYear> <RefJournal>Proteomics</RefJournal> <RefPage>4276-83</RefPage> <RefTotal>Vizcaíno JA, Côté R, Reisinger F, Foster JM, Mueller M, Rameseder J, Hermjakob H, Martens L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics. 2009 Sep;9(18):4276-83. 
DOI: 10.1002/pmic.200900402</RefTotal> <RefLink>https://doi.org/10.1002/pmic.200900402</RefLink> </Reference> <Reference refNo="5"> <RefAuthor>Samaras P</RefAuthor> <RefAuthor>Schmidt T</RefAuthor> <RefAuthor>Frejno M</RefAuthor> <RefAuthor>Gessulat S</RefAuthor> <RefAuthor>Reinecke M</RefAuthor> <RefAuthor>Jarzab A</RefAuthor> <RefAuthor>Zecha J</RefAuthor> <RefAuthor>Mergner J</RefAuthor> <RefAuthor>Giansanti P</RefAuthor> <RefAuthor>Ehrlich HC</RefAuthor> <RefAuthor>Aiche S</RefAuthor> <RefAuthor>Rank J</RefAuthor> <RefAuthor>Kienegger H</RefAuthor> <RefAuthor>Krcmar H</RefAuthor> <RefAuthor>Kuster B</RefAuthor> <RefAuthor>Wilhelm M</RefAuthor> <RefTitle>ProteomicsDB: a multi-omics and multi-organism resource for life science research</RefTitle> <RefYear>2020</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D1153-D1163</RefPage> <RefTotal>Samaras P, Schmidt T, Frejno M, Gessulat S, Reinecke M, Jarzab A, Zecha J, Mergner J, Giansanti P, Ehrlich HC, Aiche S, Rank J, Kienegger H, Krcmar H, Kuster B, Wilhelm M. ProteomicsDB: a multi-omics and multi-organism resource for life science research. Nucleic Acids Res. 2020 Jan;48(D1):D1153-D1163. 
DOI: 10.1093/nar/gkz974</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkz974</RefLink> </Reference> <Reference refNo="6"> <RefAuthor>Karczewski KJ</RefAuthor> <RefAuthor>Francioli LC</RefAuthor> <RefAuthor>Tiao G</RefAuthor> <RefAuthor>Cummings BB</RefAuthor> <RefAuthor>Alföldi J</RefAuthor> <RefAuthor>Wang Q</RefAuthor> <RefAuthor>Collins RL</RefAuthor> <RefAuthor>Laricchia KM</RefAuthor> <RefAuthor>Ganna A</RefAuthor> <RefAuthor>Birnbaum DP</RefAuthor> <RefAuthor>Gauthier LD</RefAuthor> <RefAuthor>Brand H</RefAuthor> <RefAuthor>Solomonson M</RefAuthor> <RefAuthor>Watts NA</RefAuthor> <RefAuthor>Rhodes D</RefAuthor> <RefAuthor>Singer-Berk M</RefAuthor> <RefAuthor>England EM</RefAuthor> <RefAuthor>Seaby EG</RefAuthor> <RefAuthor>Kosmicki JA</RefAuthor> <RefAuthor>Walters RK</RefAuthor> <RefAuthor>Tashman K</RefAuthor> <RefAuthor>Farjoun Y</RefAuthor> <RefAuthor>Banks E</RefAuthor> <RefAuthor>Poterba T</RefAuthor> <RefAuthor>Wang A</RefAuthor> <RefAuthor>Seed C</RefAuthor> <RefAuthor>Whiffin N</RefAuthor> <RefAuthor>Chong JX</RefAuthor> <RefAuthor>Samocha KE</RefAuthor> <RefAuthor>Pierce-Hoffman E</RefAuthor> <RefAuthor>Zappala Z</RefAuthor> <RefAuthor>O’Donnell-Luria AH</RefAuthor> <RefAuthor>Minikel EV</RefAuthor> <RefAuthor>Weisburd B</RefAuthor> <RefAuthor>Lek M</RefAuthor> <RefAuthor>Ware JS</RefAuthor> <RefAuthor>Vittal C</RefAuthor> <RefAuthor>Armean IM</RefAuthor> <RefAuthor>Bergelson L</RefAuthor> <RefAuthor>Cibulskis K</RefAuthor> <RefAuthor>Connolly KM</RefAuthor> <RefAuthor>Covarrubias M</RefAuthor> <RefAuthor>Donnelly S</RefAuthor> <RefAuthor>Ferriera S</RefAuthor> <RefAuthor>Gabriel S</RefAuthor> <RefAuthor>Gentry J</RefAuthor> <RefAuthor>Gupta N</RefAuthor> <RefAuthor>Jeandet T</RefAuthor> <RefAuthor>Kaplan D</RefAuthor> <RefAuthor>Llanwarne C</RefAuthor> <RefAuthor>Munshi R</RefAuthor> <RefAuthor>Novod S</RefAuthor> <RefAuthor>Petrillo N</RefAuthor> <RefAuthor>Roazen D</RefAuthor> <RefAuthor>Ruano-Rubio V</RefAuthor> <RefAuthor>Saltzman 
A</RefAuthor> <RefAuthor>Schleicher M</RefAuthor> <RefAuthor>Soto J</RefAuthor> <RefAuthor>Tibbetts K</RefAuthor> <RefAuthor>Tolonen C</RefAuthor> <RefAuthor>Wade G</RefAuthor> <RefAuthor>Talkowski ME</RefAuthor> <RefAuthor>Genome Aggregation Database Consortium</RefAuthor> <RefAuthor>Neale BM</RefAuthor> <RefAuthor>Daly MJ</RefAuthor> <RefAuthor>MacArthur DG</RefAuthor> <RefTitle>The mutational constraint spectrum quantified from variation in 141,456 humans</RefTitle> <RefYear>2020</RefYear> <RefJournal>Nature</RefJournal> <RefPage>434-43</RefPage> <RefTotal>Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME; Genome Aggregation Database Consortium; Neale BM, Daly MJ, MacArthur DG. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-43. 
DOI: 10.1038/s41586-020-2308-7</RefTotal> <RefLink>https://doi.org/10.1038/s41586-020-2308-7</RefLink> </Reference> <Reference refNo="7"> <RefAuthor>Kerrien S</RefAuthor> <RefAuthor>Alam-Faruque Y</RefAuthor> <RefAuthor>Aranda B</RefAuthor> <RefAuthor>Bancarz I</RefAuthor> <RefAuthor>Bridge A</RefAuthor> <RefAuthor>Derow C</RefAuthor> <RefAuthor>Dimmer E</RefAuthor> <RefAuthor>Feuermann M</RefAuthor> <RefAuthor>Friedrichsen A</RefAuthor> <RefAuthor>Huntley R</RefAuthor> <RefAuthor>Kohler C</RefAuthor> <RefAuthor>Khadake J</RefAuthor> <RefAuthor>Leroy C</RefAuthor> <RefAuthor>Liban A</RefAuthor> <RefAuthor>Lieftink C</RefAuthor> <RefAuthor>Montecchi-Palazzi L</RefAuthor> <RefAuthor>Orchard S</RefAuthor> <RefAuthor>Risse J</RefAuthor> <RefAuthor>Robbe K</RefAuthor> <RefAuthor>Roechert B</RefAuthor> <RefAuthor>Thorneycroft D</RefAuthor> <RefAuthor>Zhang Y</RefAuthor> <RefAuthor>Apweiler R</RefAuthor> <RefAuthor>Hermjakob H</RefAuthor> <RefTitle>IntAct – open source resource for molecular interaction data</RefTitle> <RefYear>2007</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D561-5</RefPage> <RefTotal>Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H. IntAct – open source resource for molecular interaction data. Nucleic Acids Res. 2007 Jan;35(Database issue):D561-5. 
DOI: 10.1093/nar/gkl958</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkl958</RefLink> </Reference> <Reference refNo="8"> <RefAuthor>Oughtred R</RefAuthor> <RefAuthor>Rust J</RefAuthor> <RefAuthor>Chang C</RefAuthor> <RefAuthor>Breitkreutz BJ</RefAuthor> <RefAuthor>Stark C</RefAuthor> <RefAuthor>Willems A</RefAuthor> <RefAuthor>Boucher L</RefAuthor> <RefAuthor>Leung G</RefAuthor> <RefAuthor>Kolas N</RefAuthor> <RefAuthor>Zhang F</RefAuthor> <RefAuthor>Dolma S</RefAuthor> <RefAuthor>Coulombe-Huntington J</RefAuthor> <RefAuthor>Chatr-Aryamontri A</RefAuthor> <RefAuthor>Dolinski K</RefAuthor> <RefAuthor>Tyers M</RefAuthor> <RefTitle>The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions</RefTitle> <RefYear>2021</RefYear> <RefJournal>Protein Sci</RefJournal> <RefPage>187-200</RefPage> <RefTotal>Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, Boucher L, Leung G, Kolas N, Zhang F, Dolma S, Coulombe-Huntington J, Chatr-Aryamontri A, Dolinski K, Tyers M. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021 Jan;30(1):187-200. 
DOI: 10.1002/pro.3978</RefTotal> <RefLink>https://doi.org/10.1002/pro.3978</RefLink> </Reference> <Reference refNo="9"> <RefAuthor>Szklarczyk D</RefAuthor> <RefAuthor>Gable AL</RefAuthor> <RefAuthor>Nastou KC</RefAuthor> <RefAuthor>Lyon D</RefAuthor> <RefAuthor>Kirsch R</RefAuthor> <RefAuthor>Pyysalo S</RefAuthor> <RefAuthor>Doncheva NT</RefAuthor> <RefAuthor>Legeay M</RefAuthor> <RefAuthor>Fang T</RefAuthor> <RefAuthor>Bork P</RefAuthor> <RefAuthor>Jensen LJ</RefAuthor> <RefAuthor>von Mering C</RefAuthor> <RefTitle>The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets</RefTitle> <RefYear>2021</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D605-D612</RefPage> <RefTotal>Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021 Jan;49(D1):D605-D612. DOI: 10.1093/nar/gkaa1074</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkaa1074</RefLink> </Reference> <Reference refNo="10"> <RefAuthor>Wang B</RefAuthor> <RefAuthor>Mezlini AM</RefAuthor> <RefAuthor>Demir F</RefAuthor> <RefAuthor>Fiume M</RefAuthor> <RefAuthor>Tu Z</RefAuthor> <RefAuthor>Brudno M</RefAuthor> <RefAuthor>Haibe-Kains B</RefAuthor> <RefAuthor>Goldenberg A</RefAuthor> <RefTitle>Similarity network fusion for aggregating data types on a genomic scale</RefTitle> <RefYear>2014</RefYear> <RefJournal>Nat Methods</RefJournal> <RefPage>333-7</RefPage> <RefTotal>Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, Haibe-Kains B, Goldenberg A. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014 Mar;11(3):333-7. 
DOI: 10.1038/nmeth.2810</RefTotal> <RefLink>https://doi.org/10.1038/nmeth.2810</RefLink> </Reference> <Reference refNo="11"> <RefAuthor>Koh HWL</RefAuthor> <RefAuthor>Fermin D</RefAuthor> <RefAuthor>Vogel C</RefAuthor> <RefAuthor>Choi KP</RefAuthor> <RefAuthor>Ewing RM</RefAuthor> <RefAuthor>Choi H</RefAuthor> <RefTitle>iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery</RefTitle> <RefYear>2019</RefYear> <RefJournal>NPJ Syst Biol Appl</RefJournal> <RefPage>22</RefPage> <RefTotal>Koh HWL, Fermin D, Vogel C, Choi KP, Ewing RM, Choi H. iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery. NPJ Syst Biol Appl. 2019;5:22. DOI: 10.1038/s41540-019-0099-y</RefTotal> <RefLink>https://doi.org/10.1038/s41540-019-0099-y</RefLink> </Reference> <Reference refNo="12"> <RefAuthor>Kim S</RefAuthor> <RefAuthor>Jhong JH</RefAuthor> <RefAuthor>Lee J</RefAuthor> <RefAuthor>Koo JY</RefAuthor> <RefTitle>Meta-analytic support vector machine for integrating multiple omics data</RefTitle> <RefYear>2017</RefYear> <RefJournal>BioData Min</RefJournal> <RefPage>2</RefPage> <RefTotal>Kim S, Jhong JH, Lee J, Koo JY. Meta-analytic support vector machine for integrating multiple omics data. BioData Min. 2017;10:2. DOI: 10.1186/s13040-017-0126-8</RefTotal> <RefLink>https://doi.org/10.1186/s13040-017-0126-8</RefLink> </Reference> <Reference refNo="13"> <RefAuthor>Tjärnberg A</RefAuthor> <RefAuthor>Mahmood O</RefAuthor> <RefAuthor>Jackson CA</RefAuthor> <RefAuthor>Saldi GA</RefAuthor> <RefAuthor>Cho K</RefAuthor> <RefAuthor>Christiaen LA</RefAuthor> <RefAuthor>Bonneau RA</RefAuthor> <RefTitle>Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data</RefTitle> <RefYear>2021</RefYear> <RefJournal>PLoS Comput Biol</RefJournal> <RefPage>e1008569</RefPage> <RefTotal>Tjärnberg A, Mahmood O, Jackson CA, Saldi GA, Cho K, Christiaen LA, Bonneau RA. 
Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data. PLoS Comput Biol. 2021 Jan;17(1):e1008569. DOI: 10.1371/journal.pcbi.1008569</RefTotal> <RefLink>https://doi.org/10.1371/journal.pcbi.1008569</RefLink> </Reference> <Reference refNo="14"> <RefAuthor>Acharjee A</RefAuthor> <RefAuthor>Larkman J</RefAuthor> <RefAuthor>Xu Y</RefAuthor> <RefAuthor>Cardoso VR</RefAuthor> <RefAuthor>Gkoutos GV</RefAuthor> <RefTitle>A random forest based biomarker discovery and power analysis framework for diagnostics research</RefTitle> <RefYear>2020</RefYear> <RefJournal>BMC Med Genomics</RefJournal> <RefPage>178</RefPage> <RefTotal>Acharjee A, Larkman J, Xu Y, Cardoso VR, Gkoutos GV. A random forest based biomarker discovery and power analysis framework for diagnostics research. BMC Med Genomics. 2020 Nov;13(1):178. DOI: 10.1186/s12920-020-00826-6</RefTotal> <RefLink>https://doi.org/10.1186/s12920-020-00826-6</RefLink> </Reference> <Reference refNo="15"> <RefAuthor>Hedenfalk I</RefAuthor> <RefAuthor>Duggan D</RefAuthor> <RefAuthor>Chen Y</RefAuthor> <RefAuthor>Radmacher M</RefAuthor> <RefAuthor>Bittner M</RefAuthor> <RefAuthor>Simon R</RefAuthor> <RefAuthor>Meltzer P</RefAuthor> <RefAuthor>Gusterson B</RefAuthor> <RefAuthor>Esteller M</RefAuthor> <RefAuthor>Kallioniemi OP</RefAuthor> <RefAuthor>Wilfond B</RefAuthor> <RefAuthor>Borg A</RefAuthor> <RefAuthor>Trent J</RefAuthor> <RefAuthor>Raffeld M</RefAuthor> <RefAuthor>Yakhini Z</RefAuthor> <RefAuthor>Ben-Dor A</RefAuthor> <RefAuthor>Dougherty E</RefAuthor> <RefAuthor>Kononen J</RefAuthor> <RefAuthor>Bubendorf L</RefAuthor> <RefAuthor>Fehrle W</RefAuthor> <RefAuthor>Pittaluga S</RefAuthor> <RefAuthor>Gruvberger S</RefAuthor> <RefAuthor>Loman N</RefAuthor> <RefAuthor>Johannsson O</RefAuthor> <RefAuthor>Olsson H</RefAuthor> <RefAuthor>Sauter G</RefAuthor> <RefTitle>Gene-expression profiles in hereditary breast cancer</RefTitle> <RefYear>2001</RefYear> <RefJournal>N Engl J Med</RefJournal> 
<RefPage>539-48</RefPage> <RefTotal>Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Kallioniemi OP, Wilfond B, Borg A, Trent J, Raffeld M, Yakhini Z, Ben-Dor A, Dougherty E, Kononen J, Bubendorf L, Fehrle W, Pittaluga S, Gruvberger S, Loman N, Johannsson O, Olsson H, Sauter G. Gene-expression profiles in hereditary breast cancer. N Engl J Med. 2001 Feb;344(8):539-48. DOI: 10.1056/NEJM200102223440801</RefTotal> <RefLink>https://doi.org/10.1056/NEJM200102223440801</RefLink> </Reference> <Reference refNo="16"> <RefAuthor>Bittner M</RefAuthor> <RefAuthor>Meltzer P</RefAuthor> <RefAuthor>Chen Y</RefAuthor> <RefAuthor>Jiang Y</RefAuthor> <RefAuthor>Seftor E</RefAuthor> <RefAuthor>Hendrix M</RefAuthor> <RefAuthor>Radmacher M</RefAuthor> <RefAuthor>Simon R</RefAuthor> <RefAuthor>Yakhini Z</RefAuthor> <RefAuthor>Ben-Dor A</RefAuthor> <RefAuthor>Sampas N</RefAuthor> <RefAuthor>Dougherty E</RefAuthor> <RefAuthor>Wang E</RefAuthor> <RefAuthor>Marincola F</RefAuthor> <RefAuthor>Gooden C</RefAuthor> <RefAuthor>Lueders J</RefAuthor> <RefAuthor>Glatfelter A</RefAuthor> <RefAuthor>Pollock P</RefAuthor> <RefAuthor>Carpten J</RefAuthor> <RefAuthor>Gillanders E</RefAuthor> <RefAuthor>Leja D</RefAuthor> <RefAuthor>Dietrich K</RefAuthor> <RefAuthor>Beaudry C</RefAuthor> <RefAuthor>Berens M</RefAuthor> <RefAuthor>Alberts D</RefAuthor> <RefAuthor>Sondak V</RefAuthor> <RefTitle>Molecular classification of cutaneous malignant melanoma by gene expression profiling</RefTitle> <RefYear>2000</RefYear> <RefJournal>Nature</RefJournal> <RefPage>536-40</RefPage> <RefTotal>Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Sampas N, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature. 
2000 Aug;406(6795):536-40. DOI: 10.1038/35020115</RefTotal> <RefLink>https://doi.org/10.1038/35020115</RefLink> </Reference> <Reference refNo="17"> <RefAuthor>Ben-Dor A</RefAuthor> <RefAuthor>Shamir R</RefAuthor> <RefAuthor>Yakhini Z</RefAuthor> <RefTitle>Clustering gene expression patterns</RefTitle> <RefYear>1999</RefYear> <RefJournal>J Comput Biol</RefJournal> <RefPage>281-97</RefPage> <RefTotal>Ben-Dor A, Shamir R, Yakhini Z. Clustering gene expression patterns. J Comput Biol. 1999;6(3-4):281-97. DOI: 10.1089/106652799318274</RefTotal> <RefLink>https://doi.org/10.1089/106652799318274</RefLink> </Reference> <Reference refNo="18"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Najami N</RefAuthor> <RefAuthor>Abedallah L</RefAuthor> <RefAuthor>Khalifa W</RefAuthor> <RefTitle>Computational Approaches for Biomarker Discovery</RefTitle> <RefYear>2014</RefYear> <RefJournal>J Intell Learn Syst Appl</RefJournal> <RefPage>153-61</RefPage> <RefTotal>Yousef M, Najami N, Abedallah L, Khalifa W. Computational Approaches for Biomarker Discovery. J Intell Learn Syst Appl. 2014;6(4):153-61. 
DOI: 10.4236/jilsa.2014.64012</RefTotal> <RefLink>https://doi.org/10.4236/jilsa.2014.64012</RefLink> </Reference> <Reference refNo="19"> <RefAuthor>Chou CH</RefAuthor> <RefAuthor>Shrestha S</RefAuthor> <RefAuthor>Yang CD</RefAuthor> <RefAuthor>Chang NW</RefAuthor> <RefAuthor>Lin YL</RefAuthor> <RefAuthor>Liao KW</RefAuthor> <RefAuthor>Huang WC</RefAuthor> <RefAuthor>Sun TH</RefAuthor> <RefAuthor>Tu SJ</RefAuthor> <RefAuthor>Lee WH</RefAuthor> <RefAuthor>Chiew MY</RefAuthor> <RefAuthor>Tai CS</RefAuthor> <RefAuthor>Wei TY</RefAuthor> <RefAuthor>Tsai TR</RefAuthor> <RefAuthor>Huang HT</RefAuthor> <RefAuthor>Wang CY</RefAuthor> <RefAuthor>Wu HY</RefAuthor> <RefAuthor>Ho SY</RefAuthor> <RefAuthor>Chen PR</RefAuthor> <RefAuthor>Chuang CH</RefAuthor> <RefAuthor>Hsieh PJ</RefAuthor> <RefAuthor>Wu YS</RefAuthor> <RefAuthor>Chen WL</RefAuthor> <RefAuthor>Li MJ</RefAuthor> <RefAuthor>Wu YC</RefAuthor> <RefAuthor>Huang XY</RefAuthor> <RefAuthor>Ng FL</RefAuthor> <RefAuthor>Buddhakosai W</RefAuthor> <RefAuthor>Huang PC</RefAuthor> <RefAuthor>Lan KC</RefAuthor> <RefAuthor>Huang CY</RefAuthor> <RefAuthor>Weng SL</RefAuthor> <RefAuthor>Cheng YN</RefAuthor> <RefAuthor>Liang C</RefAuthor> <RefAuthor>Hsu WL</RefAuthor> <RefAuthor>Huang HD</RefAuthor> <RefTitle>miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions</RefTitle> <RefYear>2018</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D296-D302</RefPage> <RefTotal>Chou CH, Shrestha S, Yang CD, Chang NW, Lin YL, Liao KW, Huang WC, Sun TH, Tu SJ, Lee WH, Chiew MY, Tai CS, Wei TY, Tsai TR, Huang HT, Wang CY, Wu HY, Ho SY, Chen PR, Chuang CH, Hsieh PJ, Wu YS, Chen WL, Li MJ, Wu YC, Huang XY, Ng FL, Buddhakosai W, Huang PC, Lan KC, Huang CY, Weng SL, Cheng YN, Liang C, Hsu WL, Huang HD. miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions. Nucleic Acids Res. 2018 Jan;46(D1):D296-D302. 
DOI: 10.1093/nar/gkx1067</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkx1067</RefLink> </Reference> <Reference refNo="20"> <RefAuthor>The Gene Ontology Consortium</RefAuthor> <RefTitle>The Gene Ontology Resource: 20 years and still GOing strong</RefTitle> <RefYear>2019</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D330-D338</RefPage> <RefTotal>The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019 Jan;47(D1):D330-D338. DOI: 10.1093/nar/gky1055</RefTotal> <RefLink>https://doi.org/10.1093/nar/gky1055</RefLink> </Reference> <Reference refNo="21"> <RefAuthor>Kanehisa M</RefAuthor> <RefAuthor>Furumichi M</RefAuthor> <RefAuthor>Tanabe M</RefAuthor> <RefAuthor>Sato Y</RefAuthor> <RefAuthor>Morishima K</RefAuthor> <RefTitle>KEGG: new perspectives on genomes, pathways, diseases and drugs</RefTitle> <RefYear>2017</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D353-D361</RefPage> <RefTotal>Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017 Jan;45(D1):D353-D361. DOI: 10.1093/nar/gkw1092</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkw1092</RefLink> </Reference> <Reference refNo="22"> <RefAuthor>Piñero J</RefAuthor> <RefAuthor>Ramírez-Anguita JM</RefAuthor> <RefAuthor>Saüch-Pitarch J</RefAuthor> <RefAuthor>Ronzano F</RefAuthor> <RefAuthor>Centeno E</RefAuthor> <RefAuthor>Sanz F</RefAuthor> <RefAuthor>Furlong LI</RefAuthor> <RefTitle>The DisGeNET knowledge platform for disease genomics: 2019 update</RefTitle> <RefYear>2020</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>D845-D855</RefPage> <RefTotal>Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020 Jan;48(D1):D845-D855. 
DOI: 10.1093/nar/gkz1021</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkz1021</RefLink> </Reference> <Reference refNo="23"> <RefAuthor>Bellazzi R</RefAuthor> <RefAuthor>Zupan B</RefAuthor> <RefTitle>Towards knowledge-based gene expression data mining</RefTitle> <RefYear>2007</RefYear> <RefJournal>J Biomed Inform</RefJournal> <RefPage>787-802</RefPage> <RefTotal>Bellazzi R, Zupan B. Towards knowledge-based gene expression data mining. J Biomed Inform. 2007 Dec;40(6):787-802. DOI: 10.1016/j.jbi.2007.06.005</RefTotal> <RefLink>https://doi.org/10.1016/j.jbi.2007.06.005</RefLink> </Reference> <Reference refNo="24"> <RefAuthor>Jaskowiak PA</RefAuthor> <RefAuthor>Campello RJ</RefAuthor> <RefAuthor>Costa IG</RefAuthor> <RefTitle>On the selection of appropriate distances for gene expression data clustering</RefTitle> <RefYear>2014</RefYear> <RefJournal>BMC Bioinformatics</RefJournal> <RefPage>S2</RefPage> <RefTotal>Jaskowiak PA, Campello RJ, Costa IG. On the selection of appropriate distances for gene expression data clustering. BMC Bioinformatics. 2014;15 (Suppl 2):S2. DOI: 10.1186/1471-2105-15-S2-S2</RefTotal> <RefLink>https://doi.org/10.1186/1471-2105-15-S2-S2</RefLink> </Reference> <Reference refNo="25"> <RefAuthor>Falcon S</RefAuthor> <RefAuthor>Gentleman R</RefAuthor> <RefTitle>Using GOstats to test gene lists for GO term association</RefTitle> <RefYear>2007</RefYear> <RefJournal>Bioinformatics</RefJournal> <RefPage>257-8</RefPage> <RefTotal>Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007 Jan;23(2):257-8. 
DOI: 10.1093/bioinformatics/btl567</RefTotal> <RefLink>https://doi.org/10.1093/bioinformatics/btl567</RefLink> </Reference> <Reference refNo="26"> <RefAuthor>Wang S</RefAuthor> <RefAuthor>Tang X</RefAuthor> <RefAuthor>Qin L</RefAuthor> <RefAuthor>Shi W</RefAuthor> <RefAuthor>Bian S</RefAuthor> <RefAuthor>Wang Z</RefAuthor> <RefAuthor>Wang Q</RefAuthor> <RefAuthor>Wang X</RefAuthor> <RefAuthor>Gu J</RefAuthor> <RefAuthor>Hao B</RefAuthor> <RefAuthor>Ding K</RefAuthor> <RefAuthor>Liao S</RefAuthor> <RefTitle>Integrative Analysis Extracts a Core ceRNA Network of the Fetal Hippocampus With Down Syndrome</RefTitle> <RefYear>2020</RefYear> <RefJournal>Front Genet</RefJournal> <RefPage>565955</RefPage> <RefTotal>Wang S, Tang X, Qin L, Shi W, Bian S, Wang Z, Wang Q, Wang X, Gu J, Hao B, Ding K, Liao S. Integrative Analysis Extracts a Core ceRNA Network of the Fetal Hippocampus With Down Syndrome. Front Genet. 2020;11:565955. DOI: 10.3389/fgene.2020.565955</RefTotal> <RefLink>https://doi.org/10.3389/fgene.2020.565955</RefLink> </Reference> <Reference refNo="27"> <RefAuthor>Reijnders MJMF</RefAuthor> <RefAuthor>Waterhouse RM</RefAuthor> <RefTitle>CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation</RefTitle> <RefYear>2022</RefYear> <RefJournal>PLoS Comput Biol</RefJournal> <RefPage>e1010075</RefPage> <RefTotal>Reijnders MJMF, Waterhouse RM. CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation. PLoS Comput Biol. 2022 May;18(5):e1010075. 
DOI: 10.1371/journal.pcbi.1010075</RefTotal> <RefLink>https://doi.org/10.1371/journal.pcbi.1010075</RefLink> </Reference> <Reference refNo="28"> <RefAuthor>Udhaya Kumar S</RefAuthor> <RefAuthor>Thirumal Kumar D</RefAuthor> <RefAuthor>Bithia R</RefAuthor> <RefAuthor>Sankar S</RefAuthor> <RefAuthor>Magesh R</RefAuthor> <RefAuthor>Sidenna M</RefAuthor> <RefAuthor>George Priya Doss C</RefAuthor> <RefAuthor>Zayed H</RefAuthor> <RefTitle>Analysis of Differentially Expressed Genes and Molecular Pathways in Familial Hypercholesterolemia Involved in Atherosclerosis: A Systematic and Bioinformatics Approach</RefTitle> <RefYear>2020</RefYear> <RefJournal>Front Genet</RefJournal> <RefPage>734</RefPage> <RefTotal>Udhaya Kumar S, Thirumal Kumar D, Bithia R, Sankar S, Magesh R, Sidenna M, George Priya Doss C, Zayed H. Analysis of Differentially Expressed Genes and Molecular Pathways in Familial Hypercholesterolemia Involved in Atherosclerosis: A Systematic and Bioinformatics Approach. Front Genet. 2020;11:734. DOI: 10.3389/fgene.2020.00734</RefTotal> <RefLink>https://doi.org/10.3389/fgene.2020.00734</RefLink> </Reference> <Reference refNo="29"> <RefAuthor>Perscheid C</RefAuthor> <RefAuthor>Grasnick B</RefAuthor> <RefAuthor>Uflacker M</RefAuthor> <RefTitle>Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches</RefTitle> <RefYear>2018</RefYear> <RefJournal>J Integr Bioinform</RefJournal> <RefTotal>Perscheid C, Grasnick B, Uflacker M. Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches. J Integr Bioinform. 2018 Dec;16(1). DOI: 10.1515/jib-2018-0064</RefTotal> <RefLink>https://doi.org/10.1515/jib-2018-0064</RefLink> </Reference> <Reference refNo="30"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Abdallah L</RefAuthor> <RefAuthor>Allmer J</RefAuthor> <RefTitle>maTE: discovering expressed interactions between microRNAs and their targets</RefTitle> <RefYear>2019</RefYear> <RefJournal>Bioinformatics</RefJournal> <RefPage>4020-8</RefPage> <RefTotal>Yousef M, Abdallah L, Allmer J. 
maTE: discovering expressed interactions between microRNAs and their targets. Bioinformatics. 2019 Oct;35(20):4020-8. DOI: 10.1093/bioinformatics/btz204</RefTotal> <RefLink>https://doi.org/10.1093/bioinformatics/btz204</RefLink> </Reference> <Reference refNo="31"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Ülgen E</RefAuthor> <RefAuthor>Uğur Sezerman O</RefAuthor> <RefTitle>CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis</RefTitle> <RefYear>2021</RefYear> <RefJournal>PeerJ Comput Sci</RefJournal> <RefPage>e336</RefPage> <RefTotal>Yousef M, Ülgen E, Uğur Sezerman O. CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis. PeerJ Comput Sci. 2021;7:e336. DOI: 10.7717/peerj-cs.336</RefTotal> <RefLink>https://doi.org/10.7717/peerj-cs.336</RefLink> </Reference> <Reference refNo="32"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Goy G</RefAuthor> <RefAuthor>Mitra R</RefAuthor> <RefAuthor>Eischen CM</RefAuthor> <RefAuthor>Jabeer A</RefAuthor> <RefAuthor>Bakir-Gungor B</RefAuthor> <RefTitle>miRcorrNet: machine learning-based integration of miRNA and mRNA expression profiles, combined with feature grouping and ranking</RefTitle> <RefYear>2021</RefYear> <RefJournal>PeerJ</RefJournal> <RefPage>e11458</RefPage> <RefTotal>Yousef M, Goy G, Mitra R, Eischen CM, Jabeer A, Bakir-Gungor B. miRcorrNet: machine learning-based integration of miRNA and mRNA expression profiles, combined with feature grouping and ranking. PeerJ. 2021;9:e11458. 
DOI: 10.7717/peerj.11458</RefTotal> <RefLink>https://doi.org/10.7717/peerj.11458</RefLink> </Reference> <Reference refNo="33"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Goy G</RefAuthor> <RefAuthor>Bakir-Gungor B</RefAuthor> <RefTitle>miRModuleNet: Detecting miRNA-mRNA Regulatory Modules</RefTitle> <RefYear>2022</RefYear> <RefJournal>Front Genet</RefJournal> <RefPage>767455</RefPage> <RefTotal>Yousef M, Goy G, Bakir-Gungor B. miRModuleNet: Detecting miRNA-mRNA Regulatory Modules. Front Genet. 2022;13:767455. DOI: 10.3389/fgene.2022.767455</RefTotal> <RefLink>https://doi.org/10.3389/fgene.2022.767455</RefLink> </Reference> <Reference refNo="34"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Sayıcı A</RefAuthor> <RefAuthor>Bakir-Gungor B</RefAuthor> <RefTitle>Integrating Gene Ontology Based Grouping and Ranking into the Machine Learning Algorithm for Gene Expression Data Analysis</RefTitle> <RefYear>2021</RefYear> <RefBookTitle>Database and Expert Systems Applications – DEXA 2021 Workshops</RefBookTitle> <RefPage>205-14</RefPage> <RefTotal>Yousef M, Sayıcı A, Bakir-Gungor B. Integrating Gene Ontology Based Grouping and Ranking into the Machine Learning Algorithm for Gene Expression Data Analysis. In: Kotsis G, Tjoa AM, Khalil I, Moser B, Mashkoor A, Sametinger J, Fensel A, Martinez-Gil J, Fischer L, Czech G, Sobieczky F, Khan S, editors. Database and Expert Systems Applications – DEXA 2021 Workshops. Cham: Springer International Publishing; 2021. (Communications in Computer and Information Science; 1479). p. 205-14. 
DOI: 10.1007/978-3-030-87101-7_20</RefTotal> <RefLink>https://doi.org/10.1007/978-3-030-87101-7_20</RefLink> </Reference> <Reference refNo="35"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Ozdemir F</RefAuthor> <RefAuthor>Jaaber A</RefAuthor> <RefAuthor>Allmer J</RefAuthor> <RefAuthor>Bakir-Gungor B</RefAuthor> <RefTitle>PriPath: Identifying Dysregulated Pathways from Differential Gene Expression via Grouping, Scoring and Modeling with an Embedded Machine Learning Approach [Preprint]</RefTitle> <RefYear>2022</RefYear> <RefJournal>Research Square</RefJournal> <RefPage></RefPage> <RefTotal>Yousef M, Ozdemir F, Jaaber A, Allmer J, Bakir-Gungor B. PriPath: Identifying Dysregulated Pathways from Differential Gene Expression via Grouping, Scoring and Modeling with an Embedded Machine Learning Approach [Preprint]. Research Square. 2022 Apr. DOI: 10.21203/rs.3.rs-1449467/v1</RefTotal> <RefLink>https://doi.org/10.21203/rs.3.rs-1449467/v1</RefLink> </Reference> <Reference refNo="36"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Qumsiyeh E</RefAuthor> <RefTitle>GediNET – Discover Disease-Disease Gene Associations utilizing Knowledge-based Machine Learning [Preprint]</RefTitle> <RefYear>2022</RefYear> <RefJournal>Research Square</RefJournal> <RefPage></RefPage> <RefTotal>Yousef M, Qumsiyeh E. GediNET – Discover Disease-Disease Gene Associations utilizing Knowledge-based Machine Learning [Preprint]. Research Square. 2022 May. DOI: 10.21203/rs.3.rs-1643219/v1</RefTotal> <RefLink>https://doi.org/10.21203/rs.3.rs-1643219/v1</RefLink> </Reference> <Reference refNo="37"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Jung S</RefAuthor> <RefAuthor>Showe LC</RefAuthor> <RefAuthor>Showe MK</RefAuthor> <RefTitle>Recursive cluster elimination (RCE) for classification and feature selection from gene expression data</RefTitle> <RefYear>2007</RefYear> <RefJournal>BMC Bioinformatics</RefJournal> <RefPage>144</RefPage> <RefTotal>Yousef M, Jung S, Showe LC, Showe MK. 
Recursive cluster elimination (RCE) for classification and feature selection from gene expression data. BMC Bioinformatics. 2007 May;8:144. DOI: 10.1186/1471-2105-8-144</RefTotal> <RefLink>https://doi.org/10.1186/1471-2105-8-144</RefLink> </Reference> <Reference refNo="38"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Bakir-Gungor B</RefAuthor> <RefAuthor>Jabeer A</RefAuthor> <RefAuthor>Goy G</RefAuthor> <RefAuthor>Qureshi R</RefAuthor> <RefAuthor>Showe LC</RefAuthor> <RefTitle>Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME</RefTitle> <RefYear>2020</RefYear> <RefJournal>F1000Res</RefJournal> <RefPage>1255</RefPage> <RefTotal>Yousef M, Bakir-Gungor B, Jabeer A, Goy G, Qureshi R, Showe LC. Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME. F1000Res. 2020;9:1255. DOI: 10.12688/f1000research.26880.2</RefTotal> <RefLink>https://doi.org/10.12688/f1000research.26880.2</RefLink> </Reference> <Reference refNo="39"> <RefAuthor>Yousef M</RefAuthor> <RefAuthor>Ketany M</RefAuthor> <RefAuthor>Manevitz L</RefAuthor> <RefAuthor>Showe LC</RefAuthor> <RefAuthor>Showe MK</RefAuthor> <RefTitle>Classification and biomarker identification using gene network modules and support vector machines</RefTitle> <RefYear>2009</RefYear> <RefJournal>BMC Bioinformatics</RefJournal> <RefPage>337</RefPage> <RefTotal>Yousef M, Ketany M, Manevitz L, Showe LC, Showe MK. Classification and biomarker identification using gene network modules and support vector machines. BMC Bioinformatics. 2009 Oct;10:337. 
DOI: 10.1186/1471-2105-10-337</RefTotal> <RefLink>https://doi.org/10.1186/1471-2105-10-337</RefLink> </Reference> <Reference refNo="40"> <RefAuthor>Graw S</RefAuthor> <RefAuthor>Chappell K</RefAuthor> <RefAuthor>Washam CL</RefAuthor> <RefAuthor>Gies A</RefAuthor> <RefAuthor>Bird J</RefAuthor> <RefAuthor>Robeson MS 2nd</RefAuthor> <RefAuthor>Byrum SD</RefAuthor> <RefTitle>Multi-omics data integration considerations and study design for biological systems and disease</RefTitle> <RefYear>2021</RefYear> <RefJournal>Mol Omics</RefJournal> <RefPage>170-85</RefPage> <RefTotal>Graw S, Chappell K, Washam CL, Gies A, Bird J, Robeson MS 2nd, Byrum SD. Multi-omics data integration considerations and study design for biological systems and disease. Mol Omics. 2021 Apr;17(2):170-85. DOI: 10.1039/d0mo00041h</RefTotal> <RefLink>https://doi.org/10.1039/d0mo00041h</RefLink> </Reference> <Reference refNo="41"> <RefAuthor>Subramanian I</RefAuthor> <RefAuthor>Verma S</RefAuthor> <RefAuthor>Kumar S</RefAuthor> <RefAuthor>Jere A</RefAuthor> <RefAuthor>Anamika K</RefAuthor> <RefTitle>Multi-omics Data Integration, Interpretation, and Its Application</RefTitle> <RefYear>2020</RefYear> <RefJournal>Bioinform Biol Insights</RefJournal> <RefPage>1177932219899051</RefPage> <RefTotal>Subramanian I, Verma S, Kumar S, Jere A, Anamika K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinform Biol Insights. 2020;14:1177932219899051. DOI: 10.1177/1177932219899051</RefTotal> <RefLink>https://doi.org/10.1177/1177932219899051</RefLink> </Reference> <Reference refNo="42"> <RefAuthor>Zoppi J</RefAuthor> <RefAuthor>Guillaume JF</RefAuthor> <RefAuthor>Neunlist M</RefAuthor> <RefAuthor>Chaffron S</RefAuthor> <RefTitle>MiBiOmics: an interactive web application for multi-omics data exploration and integration</RefTitle> <RefYear>2021</RefYear> <RefJournal>BMC Bioinformatics</RefJournal> <RefPage>6</RefPage> <RefTotal>Zoppi J, Guillaume JF, Neunlist M, Chaffron S. 
MiBiOmics: an interactive web application for multi-omics data exploration and integration. BMC Bioinformatics. 2021 Jan;22(1):6. DOI: 10.1186/s12859-020-03921-8</RefTotal> <RefLink>https://doi.org/10.1186/s12859-020-03921-8</RefLink> </Reference> <Reference refNo="43"> <RefAuthor>Planell N</RefAuthor> <RefAuthor>Lagani V</RefAuthor> <RefAuthor>Sebastian-Leon P</RefAuthor> <RefAuthor>van der Kloet F</RefAuthor> <RefAuthor>Ewing E</RefAuthor> <RefAuthor>Karathanasis N</RefAuthor> <RefAuthor>Urdangarin A</RefAuthor> <RefAuthor>Arozarena I</RefAuthor> <RefAuthor>Jagodic M</RefAuthor> <RefAuthor>Tsamardinos I</RefAuthor> <RefAuthor>Tarazona S</RefAuthor> <RefAuthor>Conesa A</RefAuthor> <RefAuthor>Tegner J</RefAuthor> <RefAuthor>Gomez-Cabrero D</RefAuthor> <RefTitle>STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline</RefTitle> <RefYear>2021</RefYear> <RefJournal>Front Genet</RefJournal> <RefPage>620453</RefPage> <RefTotal>Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, Urdangarin A, Arozarena I, Jagodic M, Tsamardinos I, Tarazona S, Conesa A, Tegner J, Gomez-Cabrero D. STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline. Front Genet. 2021;12:620453. DOI: 10.3389/fgene.2021.620453</RefTotal> <RefLink>https://doi.org/10.3389/fgene.2021.620453</RefLink> </Reference> <Reference refNo="44"> <RefAuthor>Ding J</RefAuthor> <RefAuthor>Blencowe M</RefAuthor> <RefAuthor>Nghiem T</RefAuthor> <RefAuthor>Ha SM</RefAuthor> <RefAuthor>Chen YW</RefAuthor> <RefAuthor>Li G</RefAuthor> <RefAuthor>Yang X</RefAuthor> <RefTitle>Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics</RefTitle> <RefYear>2021</RefYear> <RefJournal>Nucleic Acids Res</RefJournal> <RefPage>W375-W387</RefPage> <RefTotal>Ding J, Blencowe M, Nghiem T, Ha SM, Chen YW, Li G, Yang X. 
Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics. Nucleic Acids Res. 2021 Jul;49(W1):W375-W387. DOI: 10.1093/nar/gkab405</RefTotal> <RefLink>https://doi.org/10.1093/nar/gkab405</RefLink> </Reference> <Reference refNo="45"> <RefAuthor>Rohart F</RefAuthor> <RefAuthor>Gautier B</RefAuthor> <RefAuthor>Singh A</RefAuthor> <RefAuthor>Lê Cao KA</RefAuthor> <RefTitle>mixOmics: An R package for ‘omics feature selection and multiple data integration</RefTitle> <RefYear>2017</RefYear> <RefJournal>PLoS Comput Biol</RefJournal> <RefPage>e1005752</RefPage> <RefTotal>Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol. 2017 Nov;13(11):e1005752. DOI: 10.1371/journal.pcbi.1005752</RefTotal> <RefLink>https://doi.org/10.1371/journal.pcbi.1005752</RefLink> </Reference> <Reference refNo="46"> <RefAuthor>Poirion OB</RefAuthor> <RefAuthor>Jing Z</RefAuthor> <RefAuthor>Chaudhary K</RefAuthor> <RefAuthor>Huang S</RefAuthor> <RefAuthor>Garmire LX</RefAuthor> <RefTitle>DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data</RefTitle> <RefYear>2021</RefYear> <RefJournal>Genome Med</RefJournal> <RefPage>112</RefPage> <RefTotal>Poirion OB, Jing Z, Chaudhary K, Huang S, Garmire LX. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med. 2021 Jul;13(1):112. 
DOI: 10.1186/s13073-021-00930-x</RefTotal> <RefLink>https://doi.org/10.1186/s13073-021-00930-x</RefLink> </Reference> </References> <Media> <Tables> <Table format="png"> <MediaNo>1</MediaNo> <MediaID>1</MediaID> <Caption><Pgraph><Mark1>Table 1: Summary of the main tools, including method type, disease studied in the case study, and biological knowledge used</Mark1></Pgraph></Caption> </Table> <NoOfTables>1</NoOfTables> </Tables> <Figures> <Figure format="png" height="668" width="525"> <MediaNo>1</MediaNo> <MediaID>1</MediaID> <Caption><Pgraph><Mark1>Figure 1: General multi-omics data analysis framework integrating omics data and pre-existing biological knowledge</Mark1></Pgraph></Caption> </Figure> <NoOfPictures>1</NoOfPictures> </Figures> <InlineFigures> <NoOfPictures>0</NoOfPictures> </InlineFigures> <Attachments> <NoOfAttachments>0</NoOfAttachments> </Attachments> </Media> </OrigData> </GmsArticle>