eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Compressive Microbial Pangenomics

No data is associated with this publication.
Abstract

A rapid reduction in genomic sequencing costs has resulted in an ever-increasing amount of genomic sequence data. This explosion of data has expanded the possibilities of pangenomics, the collective study of genomic data within a species, and made the field more attractive than ever before. However, working with large amounts of genomic data brings computational, interpretation, and scalability challenges. Several pangenomic data structures have been introduced to address these challenges and to offer tools for analyzing pangenomic data. Most of these data structures are graph-based, and while they effectively highlight the homology and variation in genomic data, they do not represent the evolutionary relationships between the samples or the mutational histories of the sequences in a pangenome. Some other data structures do represent this information but use lossy formats, which means that the original raw sequences cannot be completely retrieved from them. Another shortcoming of many of these pangenomic data structures is that they do not effectively exploit the redundancies in genomic data, and could therefore be made more efficient by accounting for them.

During my research, I worked on PanMAT, a lossless pangenomic data structure that represents microbial pangenomes efficiently and captures both evolutionary relationships and mutational histories. PanMAT exploits the technique of ‘evolutionary compression’, which makes it not only information-rich but also the most efficient and scalable pangenomic data structure. I also worked on an extension of PanMAT called PanMAN, which additionally represents complex mutations such as recombinations and horizontal gene transfers (HGTs). Finally, I developed a utility that extracts various useful pieces of information from a PanMAT.
PanMAT and PanMAN allow for efficient storage and analysis of pangenomic data while preserving details of evolutionary history.

Main Content

This item is under embargo until March 28, 2025.