Selection of Highly Variable Genes (HVGs) in scRNA-seq datasets
A take on heteroskedasticity in scRNA-seq datasets

For context, a pivotal step in single-cell RNA sequencing (scRNA-seq) analysis is identifying the highly variable genes (HVGs). HVGs matter because they directly influence downstream analysis steps such as clustering.
Over the years, many methods have been developed to select HVGs. However, it turns out that these data have an intrinsic property that must be corrected for before HVGs can be properly selected.
Why do we need to select for HVGs in the first place?
In practice, tens of thousands of genes are sequenced in each cell. The underlying challenge, however, is that single-cell experiments produce sparse data: most cells have zero counts for any given gene. These zeros arise primarily from dropout events, alongside other technical limitations of the technology.
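To make "sparse" concrete, the snippet below measures the fraction of zero entries in a cells × genes count matrix. It is a minimal sketch on simulated data: the gamma-distributed gene means and Poisson sampling are illustrative assumptions, not a model of any real experiment, and the zeros here come only from low sampling depth rather than from a separate dropout mechanism.

```python
import numpy as np


def sparsity(counts: np.ndarray) -> float:
    """Fraction of entries in a cells x genes count matrix that are exactly zero."""
    return float(np.mean(counts == 0))


# Simulated toy data (an assumption for illustration): 1,000 cells x 2,000 genes,
# with small per-gene means drawn from a gamma distribution and Poisson counts.
rng = np.random.default_rng(0)
gene_means = rng.gamma(shape=0.5, scale=0.5, size=2000)
counts = rng.poisson(lam=gene_means, size=(1000, 2000))

print(f"fraction of zero entries: {sparsity(counts):.2f}")
```

On a real dataset the same one-line check can be run on the raw count matrix before any filtering.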
In addition, most genes are highly correlated with one another across cells. It therefore makes sense to focus on the genes that are most variable across cells (typically the top 2,000 to 5,000). These are the genes that drive the main signal in the dataset. Selecting HVGs not only yields a matrix that is less sparse than the original count matrix, but also makes the downstream computational steps more efficient.
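The simplest version of "keep the top N most variable genes" ranks genes by their raw variance across cells and keeps the top `n_top` columns. The function name `top_variable_genes` and the simulated input are hypothetical. This naive criterion is shown only to make the selection step concrete; it ignores the mean-variance relationship described in the next section, so it is not a recommended HVG method.

```python
import numpy as np


def top_variable_genes(counts: np.ndarray, n_top: int = 2000) -> np.ndarray:
    """Return column indices of the n_top genes with the largest variance across cells.

    `counts` is a cells x genes matrix. Raw variance is used with no correction
    for the mean-variance relationship, i.e. the naive criterion.
    """
    if n_top < 1:
        raise ValueError("n_top must be at least 1")
    variances = counts.var(axis=0)
    n_top = min(n_top, counts.shape[1])
    # argsort is ascending: take the last n_top indices, then reverse so the
    # most variable gene comes first.
    return np.argsort(variances)[-n_top:][::-1]


# Hypothetical toy matrix: 500 cells x 5,000 genes.
rng = np.random.default_rng(1)
counts = rng.poisson(lam=rng.gamma(0.5, 2.0, size=5000), size=(500, 5000))

hvg_idx = top_variable_genes(counts, n_top=2000)
counts_hvg = counts[:, hvg_idx]  # 500 cells x 2,000 selected genes
print(counts_hvg.shape)  # (500, 2000)
```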
The problem:

It may be easier to understand the problem conceptually through visualization. Let's look at the left panel of the figure above, which plots the relationship between each gene's average expression and its observed variance. Each dot is a gene: the x-axis is that gene's average expression and the y-axis is its observed variance.
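A panel like this can be reproduced on simulated data by computing each gene's mean and variance across cells. The sketch below assumes negative binomial counts with a shared dispersion `theta` (a common modeling assumption for UMI counts, not something stated in this article), under which a gene with mean μ has variance μ + μ²/θ, so the variance grows with the mean. The parameter values are arbitrary.

```python
import numpy as np


def gene_mean_variance(counts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-gene mean and sample variance across cells for a cells x genes matrix."""
    return counts.mean(axis=0), counts.var(axis=0, ddof=1)


rng = np.random.default_rng(2)
n_cells, n_genes = 2000, 500
true_means = np.logspace(-2, 2, n_genes)  # gene means spanning 0.01 to 100
theta = 10.0                              # assumed shared dispersion
# NumPy's negative_binomial(n, p) has mean n(1-p)/p; choosing p = theta/(theta+mu)
# gives mean mu and variance mu + mu**2/theta.
p = theta / (theta + true_means)
counts = rng.negative_binomial(theta, p, size=(n_cells, n_genes))

means, variances = gene_mean_variance(counts)
# Correlation on a log scale, the way mean-variance plots are usually drawn.
rho = np.corrcoef(np.log1p(means), np.log1p(variances))[0, 1]
print(f"log-scale correlation between mean and variance: {rho:.3f}")

# Consequence for naive selection: the 50 genes with the largest raw variance
# are essentially the 50 most highly expressed genes.
top50 = np.argsort(variances)[-50:]
print("smallest true mean among top-50 by variance:", true_means[top50].min())
```

Because the gene means here are sorted by index, the top-50 indices landing at the high end of the range shows that raw variance mostly tracks expression level in this simulation.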
In this panel, we observe a very strong positive relationship between a gene's average expression and its observed variance. In other words, highly expressed genes have high variances associated with them and vice…