Integrative machine learning reveals potential signature genes using transcriptomics in colon cancer
DOI: https://doi.org/10.14295/bjs.v4i9.745
Keywords: cancer genome atlas, colon cancer, machine learning, transcriptomics
Abstract
Colon cancer is a major global health burden and the second leading cause of cancer-related deaths. Despite advances in diagnosis and treatment, identifying potential biomarkers for early detection and therapeutic targets remains challenging. This study used an integrative approach combining transcriptomics and machine learning to identify signature genes and pathways associated with colon cancer. RNA-Seq data from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) project, comprising 485 samples, were analyzed. Differential gene expression analysis revealed 657 upregulated and 8,566 downregulated genes. Notably, EPB41L3, TSPAN7, and ABI3BP were highly upregulated, while LYVE1, PLPP1, and NFE2L3 were significantly downregulated in tumor samples. Gene Set Enrichment Analysis (GSEA) identified dysregulated pathways, including E2F targets, MYC targets, and the G2M checkpoint, underscoring alterations in cell cycle regulation and metabolic reprogramming in colon cancer. Three machine learning models (Random Forest, Neural Networks, and Logistic Regression) achieved high classification accuracy (97–99%). Key genes consistently identified across these models highlight their potential translational relevance as biomarkers. By integrating differential expression analysis, pathway enrichment, and machine learning, this study provides insight into colon cancer biology and lays the groundwork for developing diagnostic and therapeutic strategies, with the identified genes and pathways serving as candidates for further validation and clinical application. The approach illustrates the potential of precision medicine to advance colon cancer research and improve patient outcomes.
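The differential expression step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the tool, thresholds, or statistical test used, so the pseudocount, the log2 fold-change cutoff of 1.0, the function names, and the gene names and values below are all assumptions chosen for illustration. A real analysis would also need a significance test with multiple-testing correction, which this sketch omits.

```python
import math
from statistics import mean


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression.

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def label_genes(expression, threshold=1.0):
    """Label each gene 'up', 'down', or 'unchanged' by fold change alone.

    `threshold` is a hypothetical cutoff, not a value taken from the study.
    """
    labels = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        if lfc >= threshold:
            labels[gene] = "up"
        elif lfc <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Synthetic toy values, not TCGA-COAD data; gene names are placeholders.
toy_expression = {
    "GENE_A": ([120.0, 140.0, 130.0], [20.0, 30.0, 25.0]),
    "GENE_B": ([10.0, 12.0, 8.0], [90.0, 110.0, 100.0]),
    "GENE_C": ([50.0, 55.0, 45.0], [48.0, 52.0, 50.0]),
}

labels = label_genes(toy_expression)
```

Counting the "up" and "down" labels over a full expression matrix would give totals analogous to the upregulated and downregulated gene counts reported above, although the reported figures depend on the study's own thresholds and statistics.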

License
Copyright (c) 2025 Mostafa Amir Hamza, Saiful Islam

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
1) Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
2) Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
3) Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.