Document Type
Thesis
Date of Award
2022
Degree Name
Master of Science (MS)
Department
Biomedical Engineering
First Advisor
Etienne Z Gnimpieba
Abstract
Biofilm formation proceeds through attachment, colony, maturation, and dispersion stages. Understanding the molecular basis of every stage of this process is essential to developing efficient diagnostic devices and effective antibiofilm agents. Gene expression data provide molecular insight into both static and temporal biofilm development. The most widely used analytic techniques for biofilm gene expression data are clustering and network inference algorithms, which group genes with similar expression across the samples. However, these methods are inherently limited. First, they do not capture genes expressed in only a subset of the samples, even though such subsets might be unique to a developmental stage, for example. Second, they assign each gene to a single, nonoverlapping class. This also leads to loss of information because gene expression is combinatorial, and a gene product can participate to varying degrees in several pathways simultaneously. In this study, I developed an analysis framework, referred to as BiofilmGeneSet, to classify genes that contribute significantly to biofilm developmental stages. I applied the JADE algorithm to expression data (X) to extract statistically independent expression modules (S) and their module activity (A). Next, Pearson correlation coefficients between the module activity and expression profile were computed to determine significant modules. BioNERO, an all-in-one Bioconductor package for comprehensive and easy biological network reconstruction, was applied to the same data to evaluate the performance of this workflow. Of the 15 independent expression modules, modules 14, 11, and 4 were significantly associated with the attachment, colony, and maturation stages.
The significance of this work can be summarized as follows: (i) a new data mining and gene expression classification framework with high accuracy relative to weighted gene co-expression network methods for problem-based gene set identification; (ii) a new gene set as a potential biomarker for each biofilm development stage; (iii) a framework that generalizes to finding gene sets relevant to other related biological events, such as quorum sensing, EPS, and antibiotic resistance; (iv) a relevant functional annotation that will guide scientists in designing experiments to validate the newly discovered marker gene sets.
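The module-to-stage association step described in the abstract can be illustrated with a minimal sketch. It assumes the module activity matrix A (modules by samples) has already been estimated by an ICA method such as JADE, and it correlates each module's activity with a binary stage indicator. That indicator, the function names `pearson` and `best_module_per_stage`, and the toy numbers are all assumptions made for illustration; they are not taken from the thesis. The sketch ranks modules by absolute correlation because the sign of an independent component is arbitrary, and it does not compute p-values, so it shows ranking only, not a significance test.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; treat as no association.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_module_per_stage(activity, stages):
    """Return {stage: (module_index, r)} for the module with the largest |r|.

    activity: list of rows, one per module, each with one value per sample.
    stages:   list of stage labels, one per sample.
    """
    result = {}
    for stage in sorted(set(stages)):
        # Hypothetical encoding: 1 for samples from this stage, 0 otherwise.
        indicator = [1.0 if s == stage else 0.0 for s in stages]
        scores = [pearson(row, indicator) for row in activity]
        # Use |r| because ICA components are only defined up to sign.
        best = max(range(len(scores)), key=lambda k: abs(scores[k]))
        result[stage] = (best, scores[best])
    return result


# Toy data invented for illustration only (not from the thesis).
stages = ["attachment", "attachment", "colony", "colony",
          "maturation", "maturation"]
activity = [
    [0.1, -0.2, 2.0, 1.8, 0.0, 0.1],    # module 0: active in colony samples
    [1.5, 1.7, 0.1, -0.1, 0.2, 0.0],    # module 1: active in attachment samples
    [0.0, 0.1, -0.1, 0.2, -1.9, -2.1],  # module 2: maturation, negative sign
]

associations = best_module_per_stage(activity, stages)
for stage, (module, r) in associations.items():
    print(f"{stage}: module {module} (r = {r:.2f})")
```

In a real analysis the rows of `activity` would come from the ICA decomposition of the expression matrix, and a significance threshold or multiple-testing correction would be applied before calling a module stage-specific.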
Subject Categories
Bioinformatics
Keywords
Biofilm, Class Discovery, Clustering, Independent Component Analysis, RNA-Seq
Number of Pages
83
Publisher
University of South Dakota
Recommended Citation
Alaba, Mathew Olakunle, "BiofilmGeneSet: Leveraging Multi-Omics Data Mining and ICA To Discover Biofilm Stage Genes of Interest from Condition-Specific Expression Dataset" (2022). Dissertations and Theses. 98.
https://red.library.usd.edu/diss-thesis/98