Developing Bottom-Up, Integrated Omics Methodologies for Big Data Biomarker Discovery
Abstract
The availability of highly distributed computing complements the proliferation of next-generation sequencing (NGS) and genome-wide association study (GWAS) datasets. These datasets are often complex or poorly annotated, or require substantial domain knowledge to manage sensibly. These novel datasets provide a rare, multi-dimensional omics (proteomics, transcriptomics, and genomics) view of a single sample or patient. Previously, biologists assumed strict adherence to the central dogma: replication, transcription, and translation. Recent studies in genomics and proteomics emphasize that this is not the case. We must employ big-data methodologies to understand not only the biogenesis of these molecules but also their disruption in disease states. The Cancer Genome Atlas (TCGA) provides high-dimensional patient data and illustrates the trends that occur in expression profiles and their alteration in many complex disease states. I will ultimately create a bottom-up, multi-omics approach to observing biological systems using big-data techniques. I hypothesize that big-data and systems biology approaches can be applied to public datasets to identify important subsets of genes in cancer phenotypes. By exploring these signatures, we can better understand the role of amplification and transcript alterations in cancer.