scRNA-seq Task-specific Methods
Our review covers 84 task-specific methods developed for single-cell RNA sequencing (scRNA-seq) analysis. These methods span nine key analytical workflows: denoising & imputation, dimension reduction, batch effect correction, doublet removal, cell clustering, cell annotation, trajectory inference, gene regulatory network (GRN) inference, and cross-species analysis. The statistics below summarize the methodological landscape and the field's reproducibility standards.
Distribution by Supervision Type
Learning Paradigms: Among the 84 scRNA-seq task-specific methods reviewed, unsupervised and self-supervised approaches collectively dominate with 69% (58/84 methods), reflecting the field's emphasis on discovering intrinsic cellular structures without extensive labeled references. Unsupervised methods lead at 55% (46 methods), particularly prevalent in foundational tasks like denoising & imputation (11 methods), dimension reduction (7 methods), and batch effect correction (12 methods). Self-supervised learning accounts for 14% (12 methods), while supervised (16%, 13 methods), semi-supervised (13%, 11 methods), and weakly-supervised (2%, 2 methods) approaches are primarily deployed in annotation tasks requiring reference labels.
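The shares quoted above follow directly from the per-paradigm counts. As a quick sanity check, here is a minimal Python sketch (the counts are transcribed from this review; the variable names and formatting are ours):

```python
# Reproduce the supervision-type shares quoted above from the raw counts.
supervision_counts = {
    "Unsupervised": 46,
    "Supervised": 13,
    "Self-supervised": 12,
    "Semi-supervised": 11,
    "Weakly supervised": 2,
}

total = sum(supervision_counts.values())
assert total == 84  # all reviewed scRNA-seq task-specific methods

for paradigm, n in supervision_counts.items():
    print(f"{paradigm:18s} {n:2d}/{total} = {100 * n / total:4.1f}%")

# Combined unsupervised + self-supervised share (58/84 ≈ 69%).
combined = supervision_counts["Unsupervised"] + supervision_counts["Self-supervised"]
print(f"Unsupervised + self-supervised: {combined}/{total} = {100 * combined / total:.0f}%")
```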
Installation & Tutorial Availability
Reproducibility Support: Code accessibility remains strong, with 97.6% (82/84) of methods providing public repositories (the two exceptions are cnnImpute and scMEDAL). Documentation is less consistent: 85.7% (72/84) provide installation instructions, 82.1% (69/84) include tutorials, and the same 69 methods (82.1%) offer both, meaning every method with a tutorial also documents installation. Notably, 12 methods (14.3% of the field) lack both installation and tutorial documentation (scVGAE, cnnImpute, scIDPMs, Pathway-Constrained DNNs, sciLaMA, deepMNN, scMEDAL, scSemiCluster, scVQC, scGAD, TripletCell, scMultiomeGRN), and a further 3 (DeepBID, scMMT, scRegNet) provide installation instructions but no tutorials. Standardizing reproducibility practices therefore remains an open challenge, although the majority of methods demonstrate a strong commitment to accessibility and usability.
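These coverage figures can be recomputed mechanically once Table A is encoded as per-method boolean flags. Below is a minimal pandas sketch with three illustrative rows (the DataFrame layout and column names are our own encoding, not part of any released artifact; the full audit would include all 84 rows):

```python
import pandas as pd

# Three illustrative rows from Table A: scGNN documents both, scVGAE neither,
# DeepBID installation only. The real audit covers all 84 methods.
methods = pd.DataFrame({
    "method":   ["scGNN", "scVGAE", "DeepBID"],
    "code":     [True, True, True],
    "install":  [True, False, True],
    "tutorial": [True, False, False],
})

n = len(methods)
flags = {
    "code available":     methods["code"],
    "install docs":       methods["install"],
    "tutorial":           methods["tutorial"],
    "install + tutorial": methods["install"] & methods["tutorial"],
    "neither doc":        ~methods["install"] & ~methods["tutorial"],
}
for label, mask in flags.items():
    k = int(mask.sum())
    print(f"{label:18s} {k}/{n} = {100 * k / n:.1f}%")
```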
Table A: scRNA-seq Methods
💡 How to read Table A: each row lists the Method, its Application(s), the underlying Model, the Supervision type, Key Features, an Experimental Profile (inputs, data scale, and reported metrics), a Code link, and whether Installation instructions and Tutorials are provided.
| Method | Application | Model | Supervision | Key Features | Experimental Profile | Code | Installation | Tutorial |
|---|---|---|---|---|---|---|---|---|
| scGNN | Denoising and imputation | Graph Neural Network with Multi-modal Autoencoders | Unsupervised | Explicitly models cell-cell relationships in a graph to inform imputation by aggregating information from neighboring cells. | Inputs: omics data scale: >10k cells Metrics: ARI:0.67–0.92 Pearson's:0.95 | Link | Yes | Yes |
| scVGAE | Denoising and imputation | Variational Graph Autoencoder (VGAE) with ZINB Loss | Unsupervised | Integrates Graph Convolutional Networks into a ZINB-based VAE framework to preserve cell-cell similarity during imputation. | Inputs: omics data scale: 1,014–22,770 cells Metrics: ARI:0.184–0.797 | Link | ||
| DeepImpute | Denoising and imputation | Divided Deep Neural Networks | Unsupervised | Fast and scalable "divide-and-conquer" strategy that learns gene-gene relationships to predict missing values. | Inputs: omics data scale: 100–50k cells Metrics: Pearson: 0.880–0.884 | Link | Yes | Yes |
| DCA | Denoising and imputation | Autoencoder with ZINB Loss | Unsupervised | Specifically models scRNA-seq count distribution, overdispersion, and dropout rates simultaneously; highly scalable. | Inputs: omics data scale: 2,000 cells Metrics: Pearson's:0.8 Spearman:0.51 | Link | Yes | Yes |
| AutoClass | Denoising and imputation | Autoencoder with an integrated Classifier | Self-supervised | Distribution-agnostic model that can effectively clean a wide range of noise types beyond dropouts without strong statistical assumptions. | Inputs: omics data scale: 182–7,162 cells Metrics: MSE:0.5–0.6 ARI:0.37–0.86 NMI:0.39–0.82 | Link | Yes | Yes |
| scDHA | Denoising and imputation/Cell clustering | Hierarchical Autoencoder | Unsupervised | Provides a fast, precise, and complete analysis pipeline for robust feature extraction, denoising, and downstream analysis. | Inputs: omics data scale: 90–61,000 cells Metrics: R² = 0.93 ARI = 0.81 NMI:0.39–0.82 | Link | Yes | Yes |
| SERM | Denoising and imputation | Neural Network with Data Self-Consistency | Unsupervised | Recovers high-fidelity expression values by learning from partial data and enforcing self-consistency, offering high computational efficiency. | Inputs: omics data scale: 2,000–599,926 cells Metrics: Pearson >0.9 Accuracy>0.8 NMI>0.75 | Link | Yes | Yes |
| scNET | Denoising and imputation | Dual-view Graph Neural Network | Unsupervised | Integrates external biological knowledge (Protein-Protein Interaction networks) to learn context-specific gene and cell embeddings for improved imputation. | Inputs: omics data scale: 799–65,960 cells Metrics: AUPR:0.65–0.97 ARI:0.8–0.97 | Link | Yes | Yes |
| cnnImpute | Denoising and imputation | 1D Convolutional Neural Network (CNN) | Unsupervised | Uses a CNN to first predict dropout probability and then restore expression values, effectively capturing local gene patterns. | Inputs: omics data scale: 320–4,700 cells Metrics: AUPR:0.65–0.97 ARI:0.8–0.97 | |||
| scAMF | Denoising and imputation | Manifold Fitting Module | Unsupervised | Denoises data by unfolding its distribution in the ambient space, causing cells of the same type to aggregate more tightly. | Inputs: omics data scale: 10³–10⁵ cells Metrics: ARI: 0.78 Accuracy: 57% → 100% | Link | Yes | Yes |
| DGAN | Denoising and imputation | Deep Generative Autoencoder Network | Unsupervised | A variational autoencoder variant that robustly imputes data dropouts while simultaneously identifying and excluding outlier cells. | Inputs: omics data scale: 1,000–5,000 cells Metrics: ARI = 0.92 FMI = 0.89 Accuracy = 0.96 | Link | Yes | Yes |
| ZILLNB | Denoising and imputation/Batch effect correction/Cell clustering | ZINB Regression with a Deep Generative Model | Unsupervised | Combines a ZINB likelihood with a deep generative model to explicitly handle zero inflation and overdispersion, producing denoised/imputed expression and a biologically meaningful latent space that supports high-quality cell clustering, while incorporating batch covariates to correct technical variation. | Inputs: omics data scale: 10⁴ cells Metrics: ARI ≈ 0.85–0.90 Accuracy ≈ 0.9 | Link | Yes | Yes |
| UniVI | Denoising and imputation | Mixture-of-experts β-VAE | Unsupervised | Denoises and imputes data across different modalities (e.g., scRNA-seq, scATAC-seq) via manifold alignment. | Inputs: omics data scale: 10⁴–10⁵ cells Metrics: ARI > 0.9 R²: 0.85–0.9 | Link | Yes | Yes |
| SCDD | Denoising and imputation | Cell-similarity diffusion + GCN-Autoencoder denoising | Unsupervised | A two-stage approach that first uses cell similarity for initial imputation and then a GCN-autoencoder to denoise the result and mitigate over-smoothing. | Inputs: omics data scale: 10²–10⁶ cells Metrics: ARI: 0.5–0.975 R²: 0.999 MSE: 0.061 | Link | Yes | Yes |
| scIDPMs | Denoising and imputation | Conditional Diffusion Probabilistic Model | Unsupervised | Performs targeted imputation by first identifying likely dropout sites and then inferring values, which helps avoid altering true biological zeros. | Inputs: omics data scale: 10⁴ cells Metrics: ARI: 0.98 NMI: 0.98 F-score: 0.99 | Link | | |
| scVI | Dimension reduction/Batch effect correction/Cross-Species Analysis/Cell clustering | VAE with ZINB loss function | Unsupervised | Learns a robust probabilistic latent space that disentangles biological variation from technical noise and batch effects, models batch identity as a covariate to yield a harmonized representation, extends to cross-species analysis by treating species as a batch effect, and enables high-quality indirect cell clustering through denoised, integrated latent embeddings. | Inputs: omics data scale: 3,000–1.3M cells Metrics: ASW ≈ 0.47 ARI ≈ 0.81 NMI ≈ 0.72 BE ≈ 0.6 | Link | Yes | Yes |
| scGAE | Dimension reduction | Graph Autoencoder (GAE) | Unsupervised | Explicitly preserves the topological structure of the cell-cell similarity graph, improving trajectory inference and cluster separation. | Inputs: omics data scale: 10,000 cells Metrics: NMI:0.61–0.65 | Link | Yes | Yes |
| totalVI | Dimension reduction/Batch effect correction/Cell clustering | VAE for Multi-modal Data | Unsupervised | Jointly models RNA and surface proteins to create a unified latent space for multi-omic analysis, simultaneously corrects batch effects in both modalities, and enables high-quality indirect cell clustering by providing robust, denoised, integrated embeddings. | Inputs: CITE-seq data scale: 32,648 cells Metrics: MAE ≈ 0.8 AUC ≈ 0.99 Latent Mixing Metric: –0.025 | Link | Yes | Yes |
| SAUCIE | Dimension reduction/Cell clustering | Deep Sparse Autoencoder | Unsupervised | Performs multiple tasks simultaneously (dimensionality reduction, clustering, imputation, batch correction) within a single, unified framework. | Inputs: omics data scale: 11 million cells Metrics: Modularity:0.8531 AUC≈0.9342 | Link | Yes | Yes |
| SIMBA | Dimension reduction | Multi-entity Graph Embedding | Unsupervised | Co-embeds cells and their defining features (e.g., genes) into a shared latent space, enabling a unified framework for diverse tasks like marker discovery and integration. | Inputs: omics data scale: million cells Metrics: ARI: 0.6–0.9 | Link | Yes | Yes |
| GLUE | Dimension reduction | Graph-linked VAEs with Adversarial Alignment | Supervised | Accurately integrates unpaired multi-omics data by explicitly modeling regulatory interactions with a guidance graph, ensuring scalability and robustness. | Inputs: omics data scale: >17,000 cells Metrics: ARI: 0.716 F1 score: 0.802 AMI ≈ 0.778 | Link | Yes | Yes |
| Pathway-Constrained DNNs | Dimension reduction | Deep Neural Network with Biologically-informed Architecture | Unsupervised | Enhances biological interpretability and reduces model complexity by designing network layers to correspond to known biological pathways. | Inputs: omics data scale: Millions of cells Metrics: ASW: 0.6–0.7 R² = 0.236 | Link | ||
| CellBox | Dimension reduction | ODE-based Dynamic Systems Model | Supervised | Predicts cellular responses to unseen perturbations by learning a de novo, interpretable network of molecular interactions directly from data, without relying on prior pathway knowledge. | Inputs: omics data scale: 100 proteins Metrics: Pearson's Correlation:0.93 | Link | Yes | Yes |
| sciLaMA | Dimension reduction | Paired-VAE with LLM Gene Embeddings | Unsupervised | Integrates static gene embeddings from LLMs to generate context-aware representations for both cells and genes, improving performance while maintaining computational efficiency. | Inputs: omics data scale: 14k cells Metrics: NMI: 0.745 ASW: 0.535 Batch ASW: 0.865 | Link | | |
| Vaeda | Doublet removal | Cluster-aware VAE with Positive-Unlabeled (PU) Learning | Supervised | Provides a more nuanced separation of singlets and doublets by considering cell cluster information during representation learning. | Inputs: omics data scale: 12k cells Metrics: AUPRC: 0.558 F1-score: 0.496 Precision: 0.59 | Link | Yes | Yes |
| Solo | Doublet removal | Semi-supervised VAE | Supervised | Achieves high accuracy by learning the manifold of genuine single-cell profiles and then training a classifier to identify deviations (doublets). | Inputs: omics data scale: 44k cells Metrics: AP: 0.489 AUROC: 0.856 | Link | Yes | Yes |
| deepMNN | Batch effect correction | Deep Learning with MNN and Residual Networks | Self-supervised | Integrates the logic of Mutual Nearest Neighbors (MNN) into a deep learning framework for one-step, multi-batch correction. | Inputs: omics data scale: 10³–10⁵ cells Metrics: ASW F1 score: ~0.565 ARI: ~0.8 | Link | | |
| STACAS | Batch effect correction | MNN-based Method | Semi-supervised | Leverages prior knowledge (cell type labels) to filter inconsistent anchors, improving the balance between batch correction and signal preservation. | Inputs: omics data scale: 10³–10⁵ cells Metrics: cLISI > 0.6 Cell type ASW > 0.4 | Link | Yes | Yes |
| scGen | Batch effect correction/Cross-Species Analysis | VAE with Latent Space Arithmetic | Supervised | Models and removes batch effects by performing vector arithmetic on the latent representations of cells. Predicts cellular perturbation responses across species, demonstrating that the latent space can bridge species differences. | Inputs: omics data scale: 105,476 cells Metrics: R²: 0.85–0.95 ASW > 0.6 | Link | Yes | Yes |
| scANVI | Batch effect correction/Cell clustering | Semi-supervised VAE | Supervised | Uses partial cell-type labels in a semi-supervised VAE to more accurately align shared populations across batches, enabling high-quality indirect clustering by first learning robust, denoised, and integrated latent representations. | Inputs: omics data scale: 10k cells Metrics: Weighted Accuracy: >0.8 | Link | Yes | Yes |
| scMEDAL | Batch effect correction | Dual-Autoencoder System | Unsupervised | Separately models batch-invariant (fixed) and batch-specific (random) effects, enhancing interpretability and enabling retrospective analysis. | Inputs: omics data scale: 10⁴–10⁵ cells Metrics: ASW = +0.69 | | | |
| ABC | Batch effect correction | Semi-supervised Adversarial Autoencoder | Semi-supervised | Guided by a cell type classifier to ensure the retention of biological signals during adversarial batch correction. | Inputs: omics data scale: 10⁴–10⁵ cells Metrics: NMI ~ 0.91 iLISI ~ 0.3 | Link | Yes | Yes |
| CarDEC | Batch effect correction/Cell clustering | Generative Models with Integrated Clustering | Self-supervised | Performs clustering and batch effect removal jointly by optimizing a unified objective, producing batch-invariant embeddings and clear cluster assignments within a generative/multi-task framework that delineates cell subpopulations. | Inputs: omics data scale: 10³–10⁵ cells Metrics: ARI: 0.78–0.98 CV ~ 0 | Link | Yes | Yes |
| DESC | Batch effect correction/Cell clustering | Deep Embedding and Clustering Models | Unsupervised | Performs batch effect correction and clustering jointly by optimizing a unified objective, co-optimizing representation learning and cluster assignment end-to-end to produce batch-invariant embeddings and more coherent cell groups. | Inputs: omics data scale: 10³–10⁶ cells Metrics: ARI = 0.919–0.970 Accuracy: 96.5% KL divergence: 0.6 | Link | Yes | Yes |
| scArches | Batch effect correction/Cell clustering/Cross-species analysis | Transfer Learning Framework | Supervised | Transfer-learning maps queries to a fixed reference without retraining, providing batch-corrected embeddings, atlas-level clustering/label transfer, and scalable cross-species mapping. | Inputs: omics data scale: Million cells Metrics: Batch ASW:0.5–0.7 ARI:0.8–0.9 | Link | Yes | Yes |
| AIF | Batch effect correction | Adversarial Information Factorization | Unsupervised | Factorizes batch information from the biological signal using adversarial networks, without needing prior cell type knowledge. | Inputs: omics data scale: 30K cells Metrics: ASW:0.56–0.87 ARI:0.89–0.91 | Link | Yes | Yes |
| DeepBID | Batch effect correction | NB-based Autoencoder with dual-KL loss | Unsupervised | Concurrently corrects batch effects and performs clustering through an iterative process guided by a dual-KL divergence loss. | Inputs: omics data scale: 10³–10⁶ cells Metrics: ARI = 0.65–0.97 NMI = 0.72–0.98 | Link | Yes | |
| ResPAN | Batch effect correction | Wasserstein GAN with Residual Networks | Unsupervised | A powerful batch correction model that combines a WGAN with mutual nearest neighbor pairing for robust integration. | Inputs: omics data scale: 10³–10⁶ cells Metrics: ARI = 0.92681 NMI = 0.90775 cLISI: 0.97093 | Link | Yes | Yes |
| scDML | Batch effect correction | Deep Metric Learning | Self-supervised | Learns a batch-agnostic embedding space where distances between similar cells are minimized, regardless of batch origin. | Inputs: omics data scale: 10³–10⁶ cells Metrics: ARI = 0.966 NMI = 0.934 | Link | Yes | Yes |
| BERMAD | Batch effect correction | Multi-layer, Dual-channel Autoencoder | Self-supervised | Designed to preserve dataset-specific heterogeneity before alignment, mitigating the risk of over-correction. | Inputs: omics data scale: 10³–10⁵ cells Metrics: ARI = 0.94 ± 0.00 | Link | Yes | Yes |
| Portal | Batch effect correction | Adversarial Domain Translation Network | Unsupervised | Fast and scalable integration that avoids over-correction by adaptively distinguishing between shared and batch-unique cell types. | Inputs: omics data scale: 10⁵–10⁶ cells Metrics: iLISI ~ 1 | Link | Yes | Yes |
| scVAE | Cell clustering | Generative Models with Integrated Clustering | Unsupervised | Possesses integrated capabilities to delineate cell subpopulations as part of its generative, multi-task framework. | Inputs: omics data scale: 10³–10⁶ cells Metrics: ARI = 0.656 ± 0.039 | Link | Yes | Yes |
| scDeepCluster | Cell clustering | Integrated Deep Clustering (AE + KL loss) | Unsupervised | Co-optimizes representation learning and cluster assignment in an end-to-end fashion for more coherent cell groups. | Inputs: omics data scale: 4,271 cells Metrics: ACC= 0.8100 NMI= 0.7736 ARI= 0.7841 | Link | Yes | Yes |
| Cell BLAST | Cell annotation | Generative Model / Adversarial Autoencoder | Unsupervised | Provides a BLAST-like querying system for scRNA-seq data, using a learned, batch-corrected embedding to annotate cells and identify novel types. | Inputs: omics data scale: Million cells Metrics: MBA:0.873 | Link | Yes | Yes |
| scSemiCluster | Cell annotation | Deep Clustering with Structural Regularization | Semi-supervised | Applies a semi-supervised deep clustering algorithm for annotation, regularized by data structure. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy > 97% ARI ≈ 0.95 | Link | | |
| scBalance | Cell annotation | Sparse Neural Network with Adaptive Sampling | Supervised | Specialized tool that uses adaptive sampling techniques to enhance the identification of rare cell types. | Inputs: omics data scale: 10⁵ cells Metrics: Cohen's κ: 0.95 | Link | Yes | Yes |
| scTab | Cell annotation | Feature-attention Model for Tabular Data | Supervised | A scalable model trained on over 22 million cells, achieving robust cross-tissue annotation by focusing on relevant features. | Inputs: omics data scale: 15 million cells Metrics: Macro F1 = 0.7841 ± 0.0030 | Link | Yes | Yes |
| scVQC | Cell annotation | Split-vector Quantization | Supervised | The first method to apply split-vector quantization to create discrete cellular representations that enhance cell type distinction. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy: 0.86–0.95 ARI: 0.82–0.88 | Link | | |
| scNym | Cell annotation | Semi-supervised Adversarial Neural Network | Semi-supervised | Robustly transfers annotations across experiments by learning from both labeled reference and unlabeled query data. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy: 90–92% | Link | Yes | Yes |
| CAMLU | Cell annotation | Hybrid Autoencoder + SVM | Semi-supervised | A hybrid framework that combines an autoencoder with a support vector machine, capable of identifying novel cell types. | Inputs: omics data scale: 2,400–3,800 cells Metrics: Accuracy ≈ 0.95 ARI ≈ 0.9 | Link | Yes | Yes |
| TripletCell | Cell annotation | Deep Metric Learning (Triplet Loss) | Supervised | Learns a discriminative embedding space, enabling accurate annotation even across different samples or protocols. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy ≈ 80% | Link | | |
| scDeepSort | Cell annotation | Pre-trained Weighted Graph Neural Network (GNN) | Supervised | An early example of a pre-trained, weighted GNN designed for scalable and accurate cell type annotation. | Inputs: omics data scale: 265,489 cells Metrics: Accuracy: 83.79% F1-score (95% CI): 0.47–0.68 | Link | Yes | Yes |
| mtANN | Cell annotation | Ensemble of Models | Supervised | Improves annotation accuracy by integrating multiple reference datasets and can identify previously unseen cell types. | Inputs: omics data scale: 10⁵ cells Metrics: Pearson > 0.9 AUPRC ≈ 0.6 | Link | Yes | Yes |
| scGAD | Cell annotation | Anchor-based Self-supervised Framework | Semi-supervised & self-supervised | Solves the generalized annotation task by simultaneously annotating seen cell types from a reference and discovering/clustering novel cell types in the query data. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy > 90% | Link | | |
| CellAssign | Cell annotation | Probabilistic Model with Marker Genes | Weakly supervised | Assigns cell types based on a predefined matrix of marker genes, making it highly effective and interpretable in specific contexts. | Inputs: omics data scale: 1,000–20,000 cells Metrics: Accuracy = 0.944 F1-score = 0.943 | Link | Yes | Yes |
| Celler | Cell annotation | Genomic Language Model | Supervised | Specifically designed with mechanisms to address the long-tail distribution problem for improved annotation of rare cells. | Inputs: omics data scale: 10⁷ cells Metrics: F1 = 0.956 Precision = 0.841 ± 0.002 | Link | Yes | Yes |
| scMMT | Cell annotation | Multi-use CNN Framework | Supervised | A flexible multi-task framework that performs cell annotation alongside other tasks like protein prediction. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy ≈ 0.85 ARI = 0.945 | Link | Yes | |
| TOSICA | Cell annotation | Transformer | Supervised | Performs interpretable annotation guided by biological entities such as pathways and regulons. | Inputs: omics data scale: 647,366 cells Metrics: Accuracy = 0.8669 | Link | Yes | Yes |
| RegFormer | Cell annotation | Mamba-based Architecture with GRN Hierarchies | Self-supervised | A foundation model (FM) that integrates gene regulatory network hierarchies to enhance interpretability and performance. | Inputs: omics data scale: 10⁶ cells Metrics: Accuracy = 0.86 Macro-F1 = 0.77 | Link | Yes | Yes |
| GPTCelltype | Cell annotation | Large Language Model (GPT-4) | Self-supervised | Demonstrates that large models can accurately infer cell types simply by interpreting lists of marker genes, automating the process. | Inputs: omics data scale: 10⁵ cells Metrics: Accuracy: 0.75–0.93 | Link | Yes | Yes |
| DeepVelo | Trajectory Inference and Pseudotime Analysis | Deep Learning Framework | Self-supervised | Extends RNA velocity analysis to complex, multi-lineage systems where traditional methods often fail. | Inputs: omics data scale: 10⁴ cells Metrics: Consistency Score: 0.9 | Link | Yes | Yes |
| VeloVI | Trajectory Inference and Pseudotime Analysis | Deep Generative Model (VAE) | Unsupervised | Provides crucial transcriptome-wide uncertainty quantification for the inferred cellular dynamics, enhancing reliability. | Inputs: omics data scale: 10³–10⁴ cells Metrics: Accuracy: 66–68% | Link | Yes | Yes |
| scTour | Trajectory Inference and Pseudotime Analysis | VAE with Neural ODE | Unsupervised | Learns the vector field of cellular transitions and provides interpretability mechanisms to reveal driver genes. | Inputs: omics data scale: 10³–10⁵ cells Metrics: Spearman ρ > 0.9 | Link | Yes | Yes |
| VITAE | Trajectory Inference and Pseudotime Analysis | VAE with a Latent Hierarchical Mixture Model | Unsupervised | Enables joint trajectory inference from multiple datasets and provides robust uncertainty quantification. | Inputs: omics data scale: 10³–10⁶ cells Metrics: ARI: 0.5–0.9 PDT: 0.4–0.9 | Link | Yes | Yes |
| TrajectoryNet | Trajectory Inference and Pseudotime Analysis | Dynamic Optimal Transport Network | Unsupervised | Employs a dynamic optimal transport network to learn the continuous flow of cells over time. | Inputs: omics data scale: 10³–10⁵ cells Metrics: Base TrajectoryNet ≈ 0.897 Arch MSE = 0.300 Cycle MSE = 0.190 | Link | Yes | Yes |
| TIGON | Trajectory Inference and Pseudotime Analysis | Optimal Transport with Growth/Death Models | Unsupervised | Reconstructs both population dynamics and state transition trajectories simultaneously by incorporating cell growth and death. | Inputs: omics data scale: 5,000+ cells Metrics: Pearson = 0.62 AUROC ≈ 0.9 | Link | Yes | Yes |
| GeneTrajectory | Trajectory Inference and Pseudotime Analysis | Optimal Transport on a Cell-Cell Graph | Unsupervised | A novel gene-centric paradigm that infers trajectories of genes, allowing it to deconvolve concurrent biological programs. | Inputs: omics data scale: 1,000–10,500 cells Metrics: Robustness ≈ 1 Spearman ≈ 0.9 | Link | Yes | Yes |
| DeepSEM | GRN inference | Deep Generative Model for SEMs | Unsupervised | A pioneering work that generalized linear structural equation models (SEMs) for GRN inference using a deep generative model. | Inputs: omics data scale: 1,000–10,500 cells Metrics: ARI ≈ 0.82 NMI ≈ 0.86 | Link | Yes | Yes |
| CellOracle | GRN inference | GRN Inference with In Silico Perturbation | Unsupervised | Integrates scRNA/ATAC-seq and performs in silico perturbation simulations to predict the functional consequences of TF activity. | Inputs: omics data scale: 10³–10⁵ cells Metrics: AUROC = 0.66–0.85 | Link | Yes | Yes |
| LINGER | GRN inference | GRN Inference with Regularization | Unsupervised | Enhances inference by incorporating atlas-scale external bulk genomics data and TF motif knowledge as regularization. | Inputs: omics data scale: 10³–10⁴ cells Metrics: AUC = 0.76 AUPR ratio = 2.60 | Link | Yes | Yes |
| scMultiomeGRN | GRN inference | Cross-modal Attention Model | Semi-supervised | Specifically designed for multi-omics integration using modality-specific aggregators and cross-modal attention. | Inputs: omics data scale: 10³–10⁵ cells Metrics: Accuracy > 0.83 AUROC ≈ 0.924 AUPR ≈ 0.79 | Link | | |
| scMTNI | GRN inference | Multi-task Learning | Unsupervised | Infers cell-type-specific GRNs along developmental lineages from multi-omic data. | Inputs: omics data scale: 10³ cells Metrics: Accuracy > 0.83 F-score > 0.3 AUPR: 0.21–0.27 | Link | Yes | Yes |
| GRN-VAE | GRN inference | VAE-based GRN Model | Unsupervised | Improves upon the stability and efficiency of earlier generative models like DeepSEM for GRN inference. | Inputs: omics data scale: 10⁵ cells Metrics: AUPRC ratio > 1 | Link | Yes | Yes |
| GRANGER | GRN inference | Recurrent VAE | Unsupervised | Infers causal relationships from time-series scRNA-seq data to capture the dynamic nature of GRNs. | Inputs: omics data scale: 10³ cells Metrics: AUROC ≈ 0.85–0.90 AUPRC ≈ 0.90–0.98 | Link | Yes | Yes |
| scGeneRAI | GRN inference | Explainable AI (XAI) Model | Unsupervised/self-supervised | Employs XAI techniques to infer interpretable, cell-specific regulatory networks, addressing the "black box" problem. | Inputs: omics data scale: 15,000 cells Metrics: AUC = 0.75–0.88 | Link | Yes | Yes |
| scGREAT / InfoSEM | GRN inference | LLM-integrated Models | Supervised | Incorporate textual gene embeddings from large language models as an informative prior to improve GRN inference. | Inputs: omics data scale: thousands of cells Metrics: AUROC = 0.913 AUPRC = 0.5597 | Link | Yes | Yes |
| scRegNet | GRN inference | FM + GNN | Supervised | Combines the power of single-cell FMs with GNNs to predict regulatory connections. | Inputs: omics data scale: 800–1,000 cells Metrics: AUROC: 0.93 AUPRC: 0.86 | Link | Yes | |
| DigNet / RegDiffusion | GRN inference | Diffusion Models | Unsupervised | Conceptualize network inference as a reversible denoising process, representing a new wave of generative frameworks for GRN inference. | Inputs: omics data scale: thousands of cells Metrics: AUPRC up 19–32% | Link | Yes | Yes |
| GRNFormer | GRN inference | Graph Transformer | Semi-supervised | Uses a sophisticated graph transformer pipeline to infer regulatory relationships with high accuracy. | Inputs: omics data scale: 500–5,900 genes Metrics: AUROC/AUPRC: 0.90–0.98 | Link | Yes | Yes |
| GeneCompass | Cross-Species Analysis | Knowledge-informed Transformer (FM) | Self-supervised | A large-scale model pre-trained on human and mouse cells to decipher universal gene regulatory mechanisms for cross-species tasks. | Inputs: omics data scale: 126M cells Metrics: AUC ≈ 0.95 Annotation accuracy: 0.84–0.87 | Link | Yes | Yes |
| CACIMAR | Cross-Species Analysis | Weighted Sum Model | Self-supervised / unsupervised | Systematically quantifies the conservation score of cell types, markers, and interactions based on homologous features. | Inputs: omics data scale: 80,777 cells Metrics: R² > 0.66 | Link | Yes | Yes |
| Nvwa | Cross-Species Analysis | Deep Learning on DNA Sequences | Self-supervised / unsupervised | Predicts cell-specific gene expression from DNA sequences, allowing it to identify conserved regulatory programs across species. | Inputs: omics data scale: 635k cells Metrics: AUROC = 0.78 AUPR = 0.59 | Link | Yes | Yes |
| CAME | Cross-Species Analysis | Heterogeneous Graph Neural Network (GNN) | Self-supervised / unsupervised | Directly assigns cell types across species from scRNA-seq data and provides quantitative assignment probabilities. | Inputs: omics data scale: Million cells Metrics: Accuracy ≈ 0.87 | Link | Yes | Yes |
| SATURN | Cross-Species Analysis | Protein Language Model (PLM) Integration | Weakly supervised | Enables cell alignment based on functional protein similarity, which is often more conserved across species than gene sequences. | Inputs: omics data scale: 335,000 cells Metrics: Accuracy ≈ 0.8 ARI/NMI > 0.8 | Link | Yes | Yes |
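Several methods in Table A (e.g., DCA, scVI, scVGAE, ZILLNB) fit raw counts with a zero-inflated negative binomial (ZINB) likelihood rather than a Gaussian loss: each observed count is explained either by a dropout event (probability π) or by a negative binomial with mean μ and inverse dispersion θ. The sketch below is a generic NumPy rendering of that negative log-likelihood under the common mean/dispersion parameterization; it illustrates the objective only and is not the code of any specific method:

```python
import numpy as np
from scipy.special import gammaln

def zinb_nll(x, mu, theta, pi, eps=1e-8):
    """Per-entry negative log-likelihood of counts x under ZINB(mu, theta, pi)."""
    x, mu, theta, pi = map(np.asarray, (x, mu, theta, pi))
    log_theta_mu = np.log(theta + mu + eps)
    # log NB(x | mu, theta), mean/dispersion parameterization
    log_nb = (
        gammaln(x + theta) - gammaln(theta) - gammaln(x + 1)
        + theta * (np.log(theta + eps) - log_theta_mu)
        + x * (np.log(mu + eps) - log_theta_mu)
    )
    # Zeros are a mixture: dropout with probability pi, or a genuine NB zero.
    log_nb_zero = theta * (np.log(theta + eps) - log_theta_mu)  # log NB(0 | mu, theta)
    ll_zero = np.log(pi + (1 - pi) * np.exp(log_nb_zero) + eps)
    ll_pos = np.log(1 - pi + eps) + log_nb
    return -np.where(x == 0, ll_zero, ll_pos)

# Toy check: with pi > 0, excess zeros become less "surprising" (lower NLL),
# while the NB component still governs the nonzero counts.
x = np.array([0, 0, 3, 7])
print(zinb_nll(x, mu=4.0, theta=2.0, pi=0.3).round(3))
print(zinb_nll(x, mu=4.0, theta=2.0, pi=0.0).round(3))
```

This mixture is what lets ZINB-based models absorb dropout-inflated zeros without distorting their fit to the nonzero counts, a trade-off that Gaussian or plain-NB losses tend to handle less gracefully.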
📊 Analysis Summary
- Total Methods Reviewed: 84
- Primary Applications (by frequency; multi-task methods are counted under each of their applications, so the counts sum to more than 84):
- Cell Annotation (21 methods, 25%)
- Denoising & Imputation (16 methods, 19%)
- Batch Effect Correction (16 methods, 19%)
- GRN Inference (11 methods, 13%)
- Dimension Reduction (9 methods, 11%)
- Trajectory Inference (7 methods, 8%)
- Cross-Species Analysis (6 methods, 7%)
- Cell Clustering (6 methods, 7%)
- Doublet Removal (2 methods, 2%)
- Supervision Distribution: Unsupervised (46 methods, ~55%), Self-supervised (12 methods, ~14%), Semi-supervised (11 methods, ~13%), Supervised (13 methods, ~16%), Weakly-supervised (2 methods, ~2%)
- Unsupervised + Self-supervised: 58/84 (69%)
- Code Availability: 82/84 (97.6%) link to public repositories
- Installation Docs: 72/84 (85.7%)
- Tutorials: 69/84 (82.1%)
- Both Install + Tutorial: 69/84 (82.1%)
- Multi-task Methods: 10 methods (12%) list multiple applications in Table A (scDHA, ZILLNB, scVI, totalVI, SAUCIE, scGen, scANVI, CarDEC, DESC, scArches); the tally sketch after this list shows how such methods are counted once per application
- Notable Trends:
- Foundation model integration emerging in annotation (scTab: 22M cells, Celler: 10M cells) and GRN inference (scGREAT, scRegNet)
- VAE-based architectures dominate across tasks (26 methods, 31%)
- GNN-based approaches increasingly popular for preserving cell-cell relationships (12 methods, 14%)
- Growing adoption of diffusion models (scIDPMs, DigNet, RegDiffusion) and transformers (TOSICA, GRNFormer) in recent methods
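For reference on how the application frequencies above are tallied: multi-task entries in Table A separate applications with "/", so splitting on that delimiter and counting each (method, application) pair once reproduces the per-application counts, and also explains why they sum to more than 84. A minimal pandas sketch with three illustrative rows (the idiom and variable names are ours):

```python
import pandas as pd

# Three illustrative rows from Table A; the full tally uses all 84 methods.
table_a = pd.DataFrame({
    "method": ["scGNN", "scVI", "scDHA"],
    "application": [
        "Denoising and imputation",
        "Dimension reduction/Batch effect correction/"
        "Cross-Species Analysis/Cell clustering",
        "Denoising and imputation/Cell clustering",
    ],
})

# One row per (method, application) pair, then count per application.
tally = (
    table_a.assign(application=table_a["application"].str.split("/"))
    .explode("application")
    .value_counts("application")
)
print(tally)
```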