Author ORCID Identifier

https://orcid.org/0000-0002-6802-8021

Document Type

Thesis

Date of Award

2022

Degree Name

Master of Science (MS)

Department

Biomedical Engineering

First Advisor

Etienne Z Gnimpieba

Abstract

Biofilm formation proceeds through attachment, colony, maturation, and dispersion stages. Understanding the molecular basis of each stage is essential to developing efficient diagnostic devices and effective antibiofilm agents. Gene expression data provide molecular insight into both static and temporal biofilm development. The most widely used analytic techniques for biofilm gene expression data are clustering and network inference algorithms, which group genes with similar expression across samples. However, these methods have two inherent limitations. First, they do not capture genes expressed in only a subset of the samples, even though such subsets might be unique to a developmental stage, for example. Second, they assign each gene to a single, nonoverlapping class. This loses information because gene expression is combinatorial, and a gene product can participate, to a greater or lesser degree, in several pathways at once. In this study, I developed an analysis framework, referred to as BiofilmGeneSet, to classify genes that contribute significantly to biofilm developmental stages. I applied the JADE algorithm to the expression data (X) to extract statistically independent expression modules (S) and their module activities (A). Next, Pearson correlation coefficients between the module activity and expression profile were computed to determine significant modules. To evaluate the performance of this workflow, BioNERO, an all-in-one Bioconductor package for comprehensive and easy biological network reconstruction, was applied to the same data. Of the 15 independent expression modules, modules 14, 11, and 4 were significantly associated with the attachment, colony, and maturation stages.
The significance of this work can be summarized as follows: (i) a new data mining and gene expression classification framework with high accuracy compared to weighted gene co-expression network methods for problem-based gene set identification; (ii) a new gene set as a potential biomarker for each biofilm development stage; (iii) a framework general enough to find gene sets relevant to other related biological events such as quorum sensing, EPS, and antibiotic resistance; (iv) relevant functional annotation that will guide scientists in designing experiments to validate the newly discovered marker gene sets.
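The module-selection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the module activity matrix A (modules by samples) has already been obtained from an independent component decomposition, and it assumes the "expression profile" can be represented as a binary stage-membership indicator across samples. The function names, stage labels, and toy values below are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_module_per_stage(activity, stage_labels):
    """Return {stage: (module_index, r)} for the module whose activity has
    the largest absolute correlation with that stage's indicator vector.

    activity     : list of rows, one row of per-sample activity per module
    stage_labels : stage name of each sample (assumed representation)
    """
    result = {}
    for stage in dict.fromkeys(stage_labels):  # unique stages, order kept
        # Assumption: a stage profile is 1 for samples in the stage, else 0.
        indicator = [1.0 if s == stage else 0.0 for s in stage_labels]
        scores = [pearson(row, indicator) for row in activity]
        k = max(range(len(scores)), key=lambda i: abs(scores[i]))
        result[stage] = (k, scores[k])
    return result


# Toy data (invented for illustration): 3 modules, 6 samples, 3 stages.
stage_labels = ["attachment", "attachment", "colony", "colony",
                "maturation", "maturation"]
activity = [
    [0.1, 0.0, 0.9, 1.1, 0.0, 0.1],  # module 0: active in colony samples
    [1.0, 1.2, 0.1, 0.0, 0.1, 0.0],  # module 1: active in attachment samples
    [0.0, 0.1, 0.0, 0.1, 1.0, 0.9],  # module 2: active in maturation samples
]

result = best_module_per_stage(activity, stage_labels)
for stage, (module, r) in result.items():
    print(f"{stage}: module {module} (r = {r:.2f})")
```

The sketch only ranks modules by correlation strength; deciding which correlations count as significant would additionally need a test or threshold, which the abstract does not specify.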

Subject Categories

Bioinformatics

Keywords

Biofilm, Class Discovery, Clustering, Independent Component Analysis, RNA-Seq

Number of Pages

83

Publisher

University of South Dakota
