Posted on 2024-06-20, 10:07. Authored by Arthur Radley, Stefan Boeing, Austin Smith
Analysis of single-cell transcriptomics (scRNA-seq) data is typically performed after subsetting to highly variable genes (HVGs). Here we show that Entropy Sorting provides an alternative mathematical framework for feature selection. On synthetic datasets, continuous entropy sort feature weighting (cESFW) outperforms HVG selection in distinguishing cell-state-specific genes. We apply cESFW to six merged scRNA-seq datasets spanning human early embryo development. Without smoothing or augmenting the raw counts matrices, cESFW generates a high-resolution embedding that displays coherent developmental progression from 8-cell to post-implantation stages and delineates 15 distinct cell states. The embedding highlights sequential lineage decisions during blastocyst development, while unsupervised clustering identifies branch point populations obscured in previous analyses. The first branching region, where morula cells become specified for inner cell mass or trophectoderm, includes cells previously asserted to lack a developmental trajectory. We quantify the relatedness of different pluripotent stem cell cultures to distinct embryo cell types and identify marker genes of naïve and primed pluripotency. Finally, by revealing genes with dynamic lineage-specific expression, we provide markers for staging progression from morula to blastocyst.
Funding
Crick (Grant ID: CC1107, Grant title: STP Bioinformatics & Biostatistics)