Author ORCID Identifier

Document Type


Date of Award


Degree Name

Doctor of Philosophy (PhD)


Biomedical Engineering

First Advisor

Etienne Gnimpieba


One approach to interrogating the complexities of human systems in their well-regulated and dysregulated states is the use of digital twins. Digital twins are virtual representations of physical systems that describe an individual's state of health, which makes them fundamentally related to precision medicine. A key element for building a functional digital twin type for a disease, or for predicting the therapeutic efficacy of a potential treatment, is harmonized, machine-parsable domain knowledge. Hypothesis-driven investigations are the gold standard for representing subsystems, but their results capture only limited knowledge of the full biosystem. Multi-omics data is one rich source of knowledge for characterizing disease- and therapy-induced shifts across the systems biology landscape. However, systematic biases within and between the data types limit the functionality of big multi-omics data. In this dissertation, the generation of and results from transcriptomic analysis pipelines are assessed in their biological context and with respect to their usability for applications such as digital twins. The latter is achieved by assessing the adherence of the workflows to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and the extent to which they connect to the broader systems biology landscape. The first two specific aims of this work emphasize the transcriptomic shifts induced by atypical teratoid rhabdoid tumors (ATRT) relative to the normal brain, and those induced by treatment of tumor models with 4SC-202 across disease states including medulloblastoma, ATRT, triple-negative breast cancer, osteosarcoma, and pancreatic cancer. These are problem-driven workflows, tightly connected to biological hypotheses that contribute to disease- and therapy-specific domain knowledge. In contrast, the third specific aim introduces a domain-agnostic approach for developing transcriptomic pipelines to harmonize bulk RNA-sequencing datasets.
This framework does not directly contribute to a given biological domain, but instead provides a generalized approach for integrating large RNA-sequencing datasets and assessing the resultant representation for biological meaningfulness. This harmonization framework may also have utility in assessing the clinical relevance of in vitro biomodels. Collectively, this work presents and assesses the efficacy of multiple transcriptomic workflows within their biological context and broader machine learning applicability.

Subject Categories

Bioinformatics | Biomedical Engineering and Bioengineering | Biostatistics


Bioinformatics, Data mining, Digital twin, FAIR, Machine learning, Systems biology

Number of Pages



University of South Dakota


