eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Uses for High-Throughput Platforms and Big Data in Engineering and Learning Biological Systems

Abstract

Despite immense growth in our biological knowledge over the past decades, purely knowledge-based rational approaches to metabolic engineering, protein engineering, and cancer prognosis have shown limited success. Instead, tools such as directed evolution and machine learning have greatly accelerated the pace of engineering and learning biological systems in the face of incomplete information. In this work, existing tools for engineering enzymes and for shedding light on the biochemical basis of cancer prognosis were utilized and built upon. In the first section, the focus is on keto acid decarboxylase (Kdc), a key enzyme in producing keto acid-derived higher alcohols such as isobutanol. No highly active yet thermostable Kdc variant is known in nature. The only reported thermostable Kdc is 2 orders of magnitude less active than the most active Kdcs found in mesophiles. Therefore, the isobutanol production temperature is limited by the thermostability of mesophilic Kdc enzyme variants. By configuring a high-throughput platform to parallelize our directed evolution scheme across enzyme variants, thermostable 2-ketoisovalerate decarboxylase (Kivd) variants were developed. The top variants were recombined, and computationally directed protein design was then applied to further improve thermostability. Compared to wild-type Kivd, the final thermostable variant has a 10.5-fold increase in residual activity after 1 h of preincubation at 60 °C, a 13 °C increase in melting temperature, and an over 4-fold increase in half-life at 60 °C.

In the next section, the focus is on the relationship between current histopathology-based prognostic factors for endometrial cancer and their molecular features. Such information could speed progress toward a revised classification system that may provide more accurate prognoses. Starting from predefined biochemical relationships, machine learning classifiers incorporated into a heuristic search strategy were used to identify small gene sets consisting of 3 genes from an endometrial cancer mRNA expression dataset that could predict prognostic factors. The cross-validated prediction accuracies obtained are 80% for overall survival at 5 years, 78% for progression-free survival at 5 years, 77% for European Society for Medical Oncology risk classification, 82% for histological grade, and 91% for histology type among high-grade tumors. Predictive accuracy was evaluated on approximately 1.6 to 2 million two-gene and three-gene sets across all five prognostic factors. A statistically significant difference in overall survival and progression-free survival was identified when the most predictive gene sets were used to separate patient groups in a Kaplan-Meier survival analysis. These small non-canonical gene sets are expected to reveal the underlying endometrial cancer biochemistry and could serve as candidate biomarkers with further investigation and clinical validation. The methods, results, and discussion contained in this work contribute to the growing number of uses for high-throughput platforms and big data sets in engineering and learning biological systems.
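
As an illustration of what scoring small gene sets by cross-validated accuracy can look like, the sketch below ranks every two-gene and three-gene combination in a tiny made-up expression matrix. It is not the method used in this work: the abstract does not name the classifier or the heuristic search, so a nearest-centroid classifier, leave-one-out cross-validation, and an exhaustive search are substituted here as assumptions. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to sample x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # sorted() makes tie-breaking between labels deterministic.
    return min(sorted(centroids), key=lambda lab: sqdist(centroids[lab], x))


def loo_accuracy(X, y, genes):
    """Leave-one-out cross-validated accuracy using only the given gene columns."""
    sub = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, sub[i]) == y[i]:
            hits += 1
    return hits / len(sub)


def rank_gene_sets(X, y, sizes=(2, 3)):
    """Score every gene set of the given sizes; best accuracy first."""
    n_genes = len(X[0])
    scored = []
    for k in sizes:
        for genes in combinations(range(n_genes), k):
            scored.append((loo_accuracy(X, y, genes), genes))
    # Sort by descending accuracy, then by gene indices for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored


# Toy data (hypothetical): 6 samples x 4 genes; genes 0 and 1 track the label,
# genes 2 and 3 are noise.
X = [
    [1.0, 0.9, 5.0, 2.0],
    [1.2, 1.1, 1.0, 7.0],
    [0.8, 1.0, 3.0, 4.0],
    [3.0, 3.1, 4.0, 3.0],
    [3.2, 2.9, 2.0, 6.0],
    [2.9, 3.0, 5.0, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

ranked = rank_gene_sets(X, y)
for accuracy, genes in ranked[:3]:
    print(genes, round(accuracy, 2))
```

Exhaustive enumeration is only practical for a handful of genes. With a genome-scale expression dataset, the number of three-gene combinations grows cubically, which is presumably why a heuristic search seeded with predefined biochemical relationships was used to restrict the candidates.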
