Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data
- Sabrina Rashid (CMU)
- Ziv Bar-Joseph (CMU)
- Sohrab Shah (BC Cancer)
- Ravi Pandya
Bioinformatics
Here we describe ‘Dhaka’, a variational autoencoder method that transforms single cell genomic data into a reduced-dimension feature space that is more effective at differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and on six single cell cancer datasets, where the number of cells ranges from 250 to 6000 per sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features can also be used to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data.
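
To make the idea of a variational autoencoder feature space concrete, the following is a minimal NumPy sketch of a generic VAE forward pass. It is not the Dhaka implementation: the layer sizes, the three-dimensional latent space, the squared-error reconstruction term, the random input matrix, and all function names are assumptions chosen for illustration, and the weights are untrained (no gradient step is shown).

```python
import numpy as np

def init_params(n_genes, n_hidden, n_latent, rng):
    """Small random weights for a one-hidden-layer encoder and decoder (hypothetical sizes)."""
    def w(rows, cols):
        return rng.normal(0.0, 0.01, (rows, cols))
    return {
        "W_h": w(n_genes, n_hidden), "b_h": np.zeros(n_hidden),
        "W_mu": w(n_hidden, n_latent), "b_mu": np.zeros(n_latent),
        "W_lv": w(n_hidden, n_latent), "b_lv": np.zeros(n_latent),
        "W_d": w(n_latent, n_hidden), "b_d": np.zeros(n_hidden),
        "W_o": w(n_hidden, n_genes), "b_o": np.zeros(n_genes),
    }

def encode(x, p):
    """Map each cell's profile to the mean and log-variance of q(z | x)."""
    h = np.maximum(0.0, x @ p["W_h"] + p["b_h"])  # ReLU hidden layer
    return h @ p["W_mu"] + p["b_mu"], h @ p["W_lv"] + p["b_lv"]

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, p):
    """Map latent points back to the input space."""
    h = np.maximum(0.0, z @ p["W_d"] + p["b_d"])
    return h @ p["W_o"] + p["b_o"]

def negative_elbo(x, p, rng):
    """Reconstruction error plus KL(q(z | x) || N(0, I)), averaged over cells."""
    mu, log_var = encode(x, p)
    z = reparameterize(mu, log_var, rng)
    x_hat = decode(z, p)
    # Squared error corresponds to a Gaussian likelihood up to constants (an assumption here).
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + kl)), mu

rng = np.random.default_rng(0)
x = rng.random((250, 50))           # placeholder matrix: 250 cells by 50 genes, not real data
params = init_params(50, 32, 3, rng)
loss, features = negative_elbo(x, params, rng)
print(features.shape)               # (250, 3)
```

In a sketch like this, the per-cell latent means (`features`) play the role of the reduced-dimension representation; after training, they would be the input to clustering or trajectory analysis.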