September 2022 - December 2022
Source: https://www.genome.gov/genetics-glossary/acgt
Aging is a biological process that is still not fully understood. Researchers have already shown that age can be estimated from external signs such as wrinkles and sagging cheeks. But a person's internal, biological age tells a different story. Can we predict a person's age from their gene expression data? That is the question we set out to answer.
Our data comprises single-cell RNA-seq expression for 17,658 genes across 189,423 cells from 69 patients. As a baseline, we fit a simple linear regression to the single-cell RNA-seq data, which predicted ages with a mean absolute error of 11.21.
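To make the metric concrete, here is a minimal sketch of how a mean absolute error is computed and compared against a trivial baseline that always predicts the mean age. The ages and predictions below are hypothetical illustration values, not the study's data, and the helper name mean_absolute_error is our own (scikit-learn provides a function of the same name in sklearn.metrics).

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ages and model predictions (assumed values, for illustration only).
true_ages = [25.0, 40.0, 55.0, 70.0]
predicted_ages = [30.0, 38.0, 60.0, 61.0]

# Error of the (hypothetical) model.
model_mae = mean_absolute_error(true_ages, predicted_ages)

# Error of a trivial baseline that predicts the mean age for everyone.
baseline_predictions = [mean(true_ages)] * len(true_ages)
baseline_mae = mean_absolute_error(true_ages, baseline_predictions)

print(f"model MAE: {model_mae:.2f}, mean-age baseline MAE: {baseline_mae:.2f}")
```

Comparing against a mean-age baseline like this is one simple way to judge whether a reported error reflects real signal in the expression data.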
Could deep learning models improve on this? Our first round of results, from simple multi-layer perceptrons with tuned hyperparameters, fell short of expectations: they performed slightly worse than the linear regression model.
Would dimensionality reduction help? We implemented Non-negative Matrix Factorization, a standard (Gaussian) Variational Autoencoder, a Poisson Variational Autoencoder, and a Variational Autoencoder with a denoising criterion. As we expected, the Poisson VAE gave much better results than the other techniques we implemented. This is mainly because its likelihood function (here, Poisson) models the large, sparse data much better than the alternatives.
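The role of the likelihood can be sketched in a few lines. In a Poisson VAE the decoder typically outputs a positive rate per gene, and the reconstruction term is the Poisson negative log-likelihood of the observed counts under those rates. The snippet below is a plain-Python illustration of that loss term only, not the architecture from our report; the counts, rates, and function names are assumed for the example. In PyTorch, torch.nn.PoissonNLLLoss computes a comparable quantity.

```python
import math


def poisson_nll(count, rate):
    """Negative log-probability of an integer count under Poisson(rate)."""
    # -log P(k | r) = r - k*log(r) + log(k!), with log(k!) = lgamma(k + 1)
    return rate - count * math.log(rate) + math.lgamma(count + 1)


def gaussian_nll(value, mean, std=1.0):
    """Negative log-density of a value under Normal(mean, std^2), for comparison."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (value - mean) ** 2 / (2 * std ** 2)


def reconstruction_loss(counts, rates):
    """Sum of per-gene Poisson NLLs: the reconstruction term of a Poisson VAE."""
    return sum(poisson_nll(k, r) for k, r in zip(counts, rates))


# Hypothetical sparse expression vector for one cell and hypothetical
# decoder rates (assumed values, for illustration only).
counts = [0, 0, 3, 0, 1]
rates = [0.1, 0.2, 2.5, 0.1, 0.8]

loss = reconstruction_loss(counts, rates)
print(f"Poisson reconstruction loss: {loss:.4f}")

# For a zero count the Poisson NLL is simply the rate, so a small predicted
# rate explains the many zeros in sparse data at very low cost.
zero_cost = poisson_nll(0, 0.1)
```

A Poisson likelihood places probability only on non-negative integers, whereas a Gaussian spreads density over negative and fractional values that expression counts can never take.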
For more detail on the deep learning architectures we used, see our complete report, linked in the footnotes.
PyTorch, Skorch, Scikit-learn, AWS Cloud, GPU
Mirudhula Mukundan and Qiao Su