scRNA-seq Task-specific Methods

Our review covers 84 task-specific methods developed for single-cell RNA sequencing analysis. These methods span eight key analytical workflows: denoising & imputation, dimension reduction, batch effect correction, cell clustering, cell annotation, trajectory inference, gene regulatory network (GRN) inference, and cross-species analysis. The statistics below give an overview of the methodological landscape and reproducibility standards in the scRNA-seq field.

Distribution by Supervision Type

Learning Paradigms: Among the 84 scRNA-seq task-specific methods reviewed, unsupervised and self-supervised approaches collectively dominate with 69% (58/84 methods), reflecting the field's emphasis on discovering intrinsic cellular structures without extensive labeled references. Unsupervised methods lead at 55% (46 methods), particularly prevalent in foundational tasks like denoising & imputation (11 methods), dimension reduction (7 methods), and batch effect correction (12 methods). Self-supervised learning accounts for 14% (12 methods), while supervised (16%, 13 methods), semi-supervised (13%, 11 methods), and weakly-supervised (2%, 2 methods) approaches are primarily deployed in annotation tasks requiring reference labels.
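These shares follow directly from the per-paradigm counts; a quick arithmetic check in Python:

```python
# Per-paradigm method counts quoted in the text
counts = {
    "Unsupervised": 46,
    "Self-supervised": 12,
    "Supervised": 13,
    "Semi-supervised": 11,
    "Weakly-supervised": 2,
}
total = sum(counts.values())  # 84
for paradigm, n in counts.items():
    print(f"{paradigm}: {n}/{total} = {n / total:.1%}")

unlabeled = counts["Unsupervised"] + counts["Self-supervised"]
print(f"Unsupervised + self-supervised: {unlabeled}/{total} = {unlabeled / total:.0%}")  # 58/84 = 69%
```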

Installation & Tutorial Availability

Reproducibility Support: Code accessibility remains strong, with 97.6% (82/84) of methods providing public repositories. Documentation coverage is less complete: 85.7% (72/84) provide installation instructions, 82.1% (69/84) include tutorials, and 82.1% (69/84) offer both documentation types. Notably, 12 methods (14.3% of the field) lack both installation and tutorial documentation (scVGAE, cnnImpute, scIDPMs, Pathway-Constrained DNNs, sciLaMA, deepMNN, scMEDAL, scSemiCluster, scVQC, scGAD, TripletCell, scMultiomeGRN), while a further 3 methods (DeepBID, scMMT, scRegNet) provide installation instructions but no tutorials. Standardizing reproducibility practices thus remains a challenge, even though most methods demonstrate a strong commitment to accessibility and usability.
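The same three-way split (neither, installation-only, both) can be reconstructed from the method lists above; a small sketch, assuming the 69 remaining methods all provide both documentation types (as the text states):

```python
# Methods named above as lacking both documentation types
neither = ["scVGAE", "cnnImpute", "scIDPMs", "Pathway-Constrained DNNs",
           "sciLaMA", "deepMNN", "scMEDAL", "scSemiCluster", "scVQC",
           "scGAD", "TripletCell", "scMultiomeGRN"]
# Methods named above as providing installation instructions only
install_only = ["DeepBID", "scMMT", "scRegNet"]

total = 84
both = total - len(neither) - len(install_only)   # 69
with_install = both + len(install_only)           # 72
print(f"installation: {with_install}/{total} = {with_install / total:.1%}")  # 85.7%
print(f"tutorials:    {both}/{total} = {both / total:.1%}")                  # 82.1%
print(f"neither:      {len(neither)}/{total} = {len(neither) / total:.1%}")  # 14.3%
```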

Table A: scRNA-seq Methods

💡 How to read: Each bullet below gives Method | Application(s) | Supervision, followed by the Model architecture, key Features, an Experimental Profile (inputs; data scale; reported metrics), and Code availability with Installation and Tutorial status. ARI and NMI, the two metrics reported most often, are illustrated in a short sketch immediately after the table.
• scGNN | Denoising and imputation | Unsupervised
  Model: Graph Neural Network with Multi-modal Autoencoders
  Features: Explicitly models cell-cell relationships in a graph to inform imputation by aggregating information from neighboring cells.
  Profile: omics input; >10k cells; ARI 0.67–0.92, Pearson correlation 0.95
  Code: Link | Installation: Yes | Tutorial: Yes
• scVGAE | Denoising and imputation | Unsupervised
  Model: Variational Graph Autoencoder (VGAE) with ZINB Loss
  Features: Integrates Graph Convolutional Networks into a ZINB-based VAE framework to preserve cell-cell similarity during imputation.
  Profile: omics input; 1,014–22,770 cells; ARI 0.184–0.797
  Code: Link | Installation: No | Tutorial: No
• DeepImpute | Denoising and imputation | Unsupervised
  Model: Divided Deep Neural Networks
  Features: Fast and scalable "divide-and-conquer" strategy that learns gene-gene relationships to predict missing values.
  Profile: omics input; 100–50k cells; Pearson correlation 0.880–0.884
  Code: Link | Installation: Yes | Tutorial: Yes
• DCA | Denoising and imputation | Unsupervised
  Model: Autoencoder with ZINB Loss
  Features: Specifically models scRNA-seq count distribution, overdispersion, and dropout rates simultaneously; highly scalable.
  Profile: omics input; 2,000 cells; Pearson correlation 0.8, Spearman correlation 0.51
  Code: Link | Installation: Yes | Tutorial: Yes
• AutoClass | Denoising and imputation | Self-supervised
  Model: Autoencoder with an Integrated Classifier
  Features: Distribution-agnostic model that can effectively clean a wide range of noise types beyond dropouts without strong statistical assumptions.
  Profile: omics input; 182–7,162 cells; MSE 0.5–0.6, ARI 0.37–0.86, NMI 0.39–0.82
  Code: Link | Installation: Yes | Tutorial: Yes
• scDHA | Denoising and imputation / Cell clustering | Unsupervised
  Model: Hierarchical Autoencoder
  Features: Provides a fast, precise, and complete analysis pipeline for robust feature extraction, denoising, and downstream analysis.
  Profile: omics input; 90–61,000 cells; R² 0.93, ARI 0.81, NMI 0.39–0.82
  Code: Link | Installation: Yes | Tutorial: Yes
• SERM | Denoising and imputation | Unsupervised
  Model: Neural Network with Data Self-Consistency
  Features: Recovers high-fidelity expression values by learning from partial data and enforcing self-consistency, offering high computational efficiency.
  Profile: omics input; 2,000–599,926 cells; Pearson correlation > 0.9, accuracy > 0.8, NMI > 0.75
  Code: Link | Installation: Yes | Tutorial: Yes
• scNET | Denoising and imputation | Unsupervised
  Model: Dual-view Graph Neural Network
  Features: Integrates external biological knowledge (protein-protein interaction networks) to learn context-specific gene and cell embeddings for improved imputation.
  Profile: omics input; 799–65,960 cells; AUPR 0.65–0.97, ARI 0.8–0.97
  Code: Link | Installation: Yes | Tutorial: Yes
• cnnImpute | Denoising and imputation | Unsupervised
  Model: 1D Convolutional Neural Network (CNN)
  Features: Uses a CNN to first predict dropout probability and then restore expression values, effectively capturing local gene patterns.
  Profile: omics input; 320–4,700 cells; AUPR 0.65–0.97, ARI 0.8–0.97
  Code: not provided | Installation: No | Tutorial: No
• scAMF | Denoising and imputation | Unsupervised
  Model: Manifold Fitting Module
  Features: Denoises data by unfolding its distribution in the ambient space, causing cells of the same type to aggregate more tightly.
  Profile: omics input; 10³–10⁵ cells; ARI 0.78, accuracy 57% → 100%
  Code: Link | Installation: Yes | Tutorial: Yes
• DGAN | Denoising and imputation | Unsupervised
  Model: Deep Generative Autoencoder Network
  Features: A variational autoencoder variant that robustly imputes data dropouts while simultaneously identifying and excluding outlier cells.
  Profile: omics input; 1,000–5,000 cells; ARI 0.92, FMI 0.89, accuracy 0.96
  Code: Link | Installation: Yes | Tutorial: Yes
• ZILLNB | Denoising and imputation / Batch effect correction / Cell clustering | Unsupervised
  Model: ZINB Regression with a Deep Generative Model
  Features: Combines a ZINB likelihood with a deep generative model to explicitly handle zero inflation and overdispersion, producing denoised/imputed expression and a biologically meaningful latent space that supports high-quality cell clustering, while incorporating batch covariates to correct technical variation.
  Profile: omics input; 10⁴ cells; ARI ≈ 0.85–0.90, accuracy ≈ 0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• UniVI | Denoising and imputation | Unsupervised
  Model: Mixture-of-experts β-VAE
  Features: Denoises and imputes data across different modalities (e.g., scRNA-seq, scATAC-seq) via manifold alignment.
  Profile: omics input; 10⁴–10⁵ cells; ARI > 0.9, R² 0.85–0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• SCDD | Denoising and imputation | Unsupervised
  Model: Cell-similarity Diffusion + GCN-Autoencoder Denoising
  Features: A two-stage approach that first uses cell similarity for initial imputation and then a GCN-autoencoder to denoise the result and mitigate over-smoothing.
  Profile: omics input; 10²–10⁶ cells; ARI 0.5–0.975, R² 0.999, MSE 0.061
  Code: Link | Installation: Yes | Tutorial: Yes
• scIDPMs | Denoising and imputation | Unsupervised
  Model: Conditional Diffusion Probabilistic Model
  Features: Performs targeted imputation by first identifying likely dropout sites and then inferring values, which helps avoid altering true biological zeros.
  Profile: omics input; 10⁴ cells; ARI 0.98, NMI 0.98, F-score 0.99
  Code: Link | Installation: No | Tutorial: No
• scVI | Dimension reduction / Batch effect correction / Cross-species analysis / Cell clustering | Unsupervised
  Model: VAE with ZINB Loss Function
  Features: Learns a robust probabilistic latent space that disentangles biological variation from technical noise and batch effects, models batch identity as a covariate to yield a harmonized representation, extends to cross-species analysis by treating species as a batch effect, and enables high-quality indirect cell clustering through denoised, integrated latent embeddings.
  Profile: omics input; 3,000–1.3M cells; ASW ≈ 0.47, ARI ≈ 0.81, NMI ≈ 0.72, BE ≈ 0.6
  Code: Link | Installation: Yes | Tutorial: Yes
• scGAE | Dimension reduction | Unsupervised
  Model: Graph Autoencoder (GAE)
  Features: Explicitly preserves the topological structure of the cell-cell similarity graph, improving trajectory inference and cluster separation.
  Profile: omics input; 10,000 cells; NMI 0.61–0.65
  Code: Link | Installation: Yes | Tutorial: Yes
• totalVI | Dimension reduction / Batch effect correction / Cell clustering | Unsupervised
  Model: VAE for Multi-modal Data
  Features: Jointly models RNA and surface proteins to create a unified latent space for multi-omic analysis, simultaneously corrects batch effects in both modalities, and enables high-quality indirect cell clustering by providing robust, denoised, integrated embeddings.
  Profile: omics input (CITE-seq); 32,648 cells; MAE ≈ 0.8, AUC ≈ 0.99, latent mixing metric −0.025
  Code: Link | Installation: Yes | Tutorial: Yes
• SAUCIE | Dimension reduction / Cell clustering | Unsupervised
  Model: Deep Sparse Autoencoder
  Features: Performs multiple tasks simultaneously (dimensionality reduction, clustering, imputation, batch correction) within a single, unified framework.
  Profile: omics input; 11 million cells; modularity 0.8531, AUC ≈ 0.9342
  Code: Link | Installation: Yes | Tutorial: Yes
• SIMBA | Dimension reduction | Unsupervised
  Model: Multi-entity Graph Embedding
  Features: Co-embeds cells and their defining features (e.g., genes) into a shared latent space, enabling a unified framework for diverse tasks like marker discovery and integration.
  Profile: omics input; millions of cells; ARI 0.6–0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• GLUE | Dimension reduction | Supervised
  Model: Graph-linked VAEs with Adversarial Alignment
  Features: Accurately integrates unpaired multi-omics data by explicitly modeling regulatory interactions with a guidance graph, ensuring scalability and robustness.
  Profile: omics input; >17,000 cells; ARI 0.716, F1 score 0.802, AMI ≈ 0.778
  Code: Link | Installation: Yes | Tutorial: Yes
• Pathway-Constrained DNNs | Dimension reduction | Unsupervised
  Model: Deep Neural Network with Biologically-informed Architecture
  Features: Enhances biological interpretability and reduces model complexity by designing network layers to correspond to known biological pathways.
  Profile: omics input; millions of cells; ASW 0.6–0.7, R² 0.236
  Code: Link | Installation: No | Tutorial: No
• CellBox | Dimension reduction | Supervised
  Model: ODE-based Dynamic Systems Model
  Features: Predicts cellular responses to unseen perturbations by learning a de novo, interpretable network of molecular interactions directly from data, without relying on prior pathway knowledge.
  Profile: omics input; 100 proteins; Pearson correlation 0.93
  Code: Link | Installation: Yes | Tutorial: Yes
• sciLaMA | Dimension reduction | Unsupervised
  Model: Paired-VAE with LLM Gene Embeddings
  Features: Integrates static gene embeddings from LLMs to generate context-aware representations for both cells and genes, improving performance while maintaining computational efficiency.
  Profile: omics input; 14k cells; NMI 0.745, ASW 0.535, batch ASW 0.865
  Code: Link | Installation: No | Tutorial: No
• Vaeda | Doublet removal | Supervised
  Model: Cluster-aware VAE with Positive-Unlabeled (PU) Learning
  Features: Provides a more nuanced separation of singlets and doublets by considering cell cluster information during representation learning.
  Profile: omics input; 12k cells; AUPRC 0.558, F1-score 0.496, precision 0.59
  Code: Link | Installation: Yes | Tutorial: Yes
• Solo | Doublet removal | Supervised
  Model: Semi-supervised VAE
  Features: Achieves high accuracy by learning the manifold of genuine single-cell profiles and then training a classifier to identify deviations (doublets).
  Profile: omics input; 44k cells; AP 0.489, AUROC 0.856
  Code: Link | Installation: Yes | Tutorial: Yes
• deepMNN | Batch effect correction | Self-supervised
  Model: Deep Learning with MNN and Residual Networks
  Features: Integrates the logic of Mutual Nearest Neighbors (MNN) into a deep learning framework for one-step, multi-batch correction.
  Profile: omics input; 10³–10⁵ cells; ASW F1 score ≈ 0.565, ARI ≈ 0.8
  Code: Link | Installation: No | Tutorial: No
• STACAS | Batch effect correction | Semi-supervised
  Model: MNN-based Method
  Features: Leverages prior knowledge (cell type labels) to filter inconsistent anchors, improving the balance between batch correction and signal preservation.
  Profile: omics input; 10³–10⁵ cells; cLISI > 0.6, cell type ASW > 0.4
  Code: Link | Installation: Yes | Tutorial: Yes
• scGen | Batch effect correction / Cross-species analysis | Supervised
  Model: VAE with Latent Space Arithmetic
  Features: Models and removes batch effects by performing vector arithmetic on the latent representations of cells; predicts cellular perturbation responses across species, demonstrating that the latent space can bridge species differences.
  Profile: omics input; 105,476 cells; R² 0.85–0.95, ASW > 0.6
  Code: Link | Installation: Yes | Tutorial: Yes
• scANVI | Batch effect correction / Cell clustering | Supervised
  Model: Semi-supervised VAE
  Features: Uses partial cell-type labels in a semi-supervised VAE to more accurately align shared populations across batches, enabling high-quality indirect clustering by first learning robust, denoised, and integrated latent representations.
  Profile: omics input; 10k cells; weighted accuracy > 0.8
  Code: Link | Installation: Yes | Tutorial: Yes
• scMEDAL | Batch effect correction | Unsupervised
  Model: Dual-Autoencoder System
  Features: Separately models batch-invariant (fixed) and batch-specific (random) effects, enhancing interpretability and enabling retrospective analysis.
  Profile: omics input; 10⁴–10⁵ cells; ASW +0.69
  Code: not provided | Installation: No | Tutorial: No
• ABC | Batch effect correction | Semi-supervised
  Model: Semi-supervised Adversarial Autoencoder
  Features: Guided by a cell type classifier to ensure the retention of biological signals during adversarial batch correction.
  Profile: omics input; 10⁴–10⁵ cells; NMI ≈ 0.91, iLISI ≈ 0.3
  Code: Link | Installation: Yes | Tutorial: Yes
• CarDEC | Batch effect correction / Cell clustering | Self-supervised
  Model: Generative Models with Integrated Clustering
  Features: Performs clustering and batch effect removal jointly by optimizing a unified objective, producing batch-invariant embeddings and clear cluster assignments within a generative, multi-task framework that delineates cell subpopulations.
  Profile: omics input; 10³–10⁵ cells; ARI 0.78–0.98, CV ≈ 0
  Code: Link | Installation: Yes | Tutorial: Yes
• DESC | Batch effect correction / Cell clustering | Unsupervised
  Model: Deep Embedding and Clustering Models
  Features: Performs batch effect correction and clustering jointly by optimizing a unified objective, co-optimizing representation learning and cluster assignment end-to-end to produce batch-invariant embeddings and more coherent cell groups.
  Profile: omics input; 10³–10⁶ cells; ARI 0.919–0.970, accuracy 96.5%, KL divergence 0.6
  Code: Link | Installation: Yes | Tutorial: Yes
• scArches | Batch effect correction / Cell clustering / Cross-species analysis | Supervised
  Model: Transfer Learning Framework
  Features: Transfer learning maps queries to a fixed reference without retraining, providing batch-corrected embeddings, atlas-level clustering/label transfer, and scalable cross-species mapping.
  Profile: omics input; millions of cells; batch ASW 0.5–0.7, ARI 0.8–0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• AIF | Batch effect correction | Unsupervised
  Model: Adversarial Information Factorization
  Features: Factorizes batch information from the biological signal using adversarial networks, without needing prior cell type knowledge.
  Profile: omics input; 30K cells; ASW 0.56–0.87, ARI 0.89–0.91
  Code: Link | Installation: Yes | Tutorial: Yes
• DeepBID | Batch effect correction | Unsupervised
  Model: NB-based Autoencoder with Dual-KL Loss
  Features: Concurrently corrects batch effects and performs clustering through an iterative process guided by a dual-KL divergence loss.
  Profile: omics input; 10³–10⁶ cells; ARI 0.65–0.97, NMI 0.72–0.98
  Code: Link | Installation: Yes | Tutorial: No
• ResPAN | Batch effect correction | Unsupervised
  Model: Wasserstein GAN with Residual Networks
  Features: A powerful batch correction model that combines a WGAN with mutual nearest neighbor pairing for robust integration.
  Profile: omics input; 10³–10⁶ cells; ARI 0.92681, NMI 0.90775, cLISI 0.97093
  Code: Link | Installation: Yes | Tutorial: Yes
• scDML | Batch effect correction | Self-supervised
  Model: Deep Metric Learning
  Features: Learns a batch-agnostic embedding space where distances between similar cells are minimized, regardless of batch origin.
  Profile: omics input; 10³–10⁶ cells; ARI 0.966, NMI 0.934
  Code: Link | Installation: Yes | Tutorial: Yes
• BERMAD | Batch effect correction | Self-supervised
  Model: Multi-layer, Dual-channel Autoencoder
  Features: Designed to preserve dataset-specific heterogeneity before alignment, mitigating the risk of over-correction.
  Profile: omics input; 10³–10⁵ cells; ARI 0.94 ± 0.00
  Code: Link | Installation: Yes | Tutorial: Yes
• Portal | Batch effect correction | Unsupervised
  Model: Adversarial Domain Translation Network
  Features: Fast and scalable integration that avoids over-correction by adaptively distinguishing between shared and batch-unique cell types.
  Profile: omics input; 10⁵–10⁶ cells; iLISI ≈ 1
  Code: Link | Installation: Yes | Tutorial: Yes
• scVAE | Cell clustering | Unsupervised
  Model: Generative Model with Integrated Clustering
  Features: Possesses integrated capabilities to delineate cell subpopulations as part of its generative, multi-task framework.
  Profile: omics input; 10³–10⁶ cells; ARI 0.656 ± 0.039
  Code: Link | Installation: Yes | Tutorial: Yes
• scDeepCluster | Cell clustering | Unsupervised
  Model: Integrated Deep Clustering (AE + KL loss)
  Features: Co-optimizes representation learning and cluster assignment in an end-to-end fashion for more coherent cell groups.
  Profile: omics input; 4,271 cells; ACC 0.8100, NMI 0.7736, ARI 0.7841
  Code: Link | Installation: Yes | Tutorial: Yes
• Cell BLAST | Cell annotation | Unsupervised
  Model: Generative Model / Adversarial Autoencoder
  Features: Provides a BLAST-like querying system for scRNA-seq data, using a learned, batch-corrected embedding to annotate cells and identify novel types.
  Profile: omics input; millions of cells; MBA 0.873
  Code: Link | Installation: Yes | Tutorial: Yes
• scSemiCluster | Cell annotation | Semi-supervised
  Model: Deep Clustering with Structural Regularization
  Features: Applies a semi-supervised deep clustering algorithm for annotation, regularized by data structure.
  Profile: omics input; 10⁵ cells; accuracy > 97%, ARI ≈ 0.95
  Code: Link | Installation: No | Tutorial: No
• scBalance | Cell annotation | Supervised
  Model: Sparse Neural Network with Adaptive Sampling
  Features: Specialized tool that uses adaptive sampling techniques to enhance the identification of rare cell types.
  Profile: omics input; 10⁵ cells; Cohen's κ 0.95
  Code: Link | Installation: Yes | Tutorial: Yes
• scTab | Cell annotation | Supervised
  Model: Feature-attention Model for Tabular Data
  Features: A scalable model trained on over 22 million cells, achieving robust cross-tissue annotation by focusing on relevant features.
  Profile: omics input; 15 million cells; macro F1 0.7841 ± 0.0030
  Code: Link | Installation: Yes | Tutorial: Yes
• scVQC | Cell annotation | Supervised
  Model: Split-vector Quantization
  Features: The first method to apply split-vector quantization to create discrete cellular representations that enhance cell type distinction.
  Profile: omics input; 10⁵ cells; accuracy 0.86–0.95, ARI 0.82–0.88
  Code: Link | Installation: No | Tutorial: No
• scNym | Cell annotation | Semi-supervised
  Model: Semi-supervised Adversarial Neural Network
  Features: Robustly transfers annotations across experiments by learning from both labeled reference and unlabeled query data.
  Profile: omics input; 10⁵ cells; accuracy 90–92%
  Code: Link | Installation: Yes | Tutorial: Yes
• CAMLU | Cell annotation | Semi-supervised
  Model: Hybrid Autoencoder + SVM
  Features: A hybrid framework that combines an autoencoder with a support vector machine, capable of identifying novel cell types.
  Profile: omics input; 2,400–3,800 cells; accuracy ≈ 0.95, ARI ≈ 0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• TripletCell | Cell annotation | Supervised
  Model: Deep Metric Learning (Triplet Loss)
  Features: Learns a discriminative embedding space, enabling accurate annotation even across different samples or protocols.
  Profile: omics input; 10⁵ cells; accuracy ≈ 80%
  Code: Link | Installation: No | Tutorial: No
• scDeepSort | Cell annotation | Supervised
  Model: Pre-trained Weighted Graph Neural Network (GNN)
  Features: An early example of a pre-trained, weighted GNN designed for scalable and accurate cell type annotation.
  Profile: omics input; 265,489 cells; accuracy 83.79%, F1-score (95% CI) 0.47–0.68
  Code: Link | Installation: Yes | Tutorial: Yes
• mtANN | Cell annotation | Supervised
  Model: Ensemble of Models
  Features: Improves annotation accuracy by integrating multiple reference datasets and can identify previously unseen cell types.
  Profile: omics input; 10⁵ cells; Pearson correlation > 0.9, AUPRC ≈ 0.6
  Code: Link | Installation: Yes | Tutorial: Yes
• scGAD | Cell annotation | Semi-supervised & self-supervised
  Model: Anchor-based Self-supervised Framework
  Features: Solves the generalized annotation task by simultaneously annotating seen cell types from a reference and discovering/clustering novel cell types in the query data.
  Profile: omics input; 10⁵ cells; accuracy > 90%
  Code: Link | Installation: No | Tutorial: No
• Cellassign | Cell annotation | Weakly supervised
  Model: Probabilistic Model with Marker Genes
  Features: Assigns cell types based on a predefined matrix of marker genes, making it highly effective and interpretable in specific contexts.
  Profile: omics input; 1,000–20,000 cells; accuracy 0.944, F1-score 0.943
  Code: Link | Installation: Yes | Tutorial: Yes
• Celler | Cell annotation | Supervised
  Model: Genomic Language Model
  Features: Specifically designed with mechanisms to address the long-tail distribution problem for improved annotation of rare cells.
  Profile: omics input; 10⁷ cells; F1 0.956, precision 0.841 ± 0.002
  Code: Link | Installation: Yes | Tutorial: Yes
• scMMT | Cell annotation | Supervised
  Model: Multi-use CNN Framework
  Features: A flexible multi-task framework that performs cell annotation alongside other tasks like protein prediction.
  Profile: omics input; 10⁵ cells; accuracy ≈ 0.85, ARI 0.945
  Code: Link | Installation: Yes | Tutorial: No
• TOSICA | Cell annotation | Supervised
  Model: Transformer
  Features: Performs interpretable annotation guided by biological entities such as pathways and regulons.
  Profile: omics input; 647,366 cells; accuracy 0.8669
  Code: Link | Installation: Yes | Tutorial: Yes
• RegFormer | Cell annotation | Self-supervised
  Model: Mamba-based Architecture with GRN Hierarchies
  Features: A foundation model that integrates gene regulatory network hierarchies to enhance interpretability and performance.
  Profile: omics input; 10⁶ cells; accuracy 0.86, macro-F1 0.77
  Code: Link | Installation: Yes | Tutorial: Yes
• GPTCelltype | Cell annotation | Self-supervised
  Model: Large Language Model (GPT-4)
  Features: Demonstrates that large models can accurately infer cell types simply by interpreting lists of marker genes, automating the process.
  Profile: omics input; 10⁵ cells; accuracy 0.75–0.93
  Code: Link | Installation: Yes | Tutorial: Yes
• DeepVelo | Trajectory inference and pseudotime analysis | Self-supervised
  Model: Deep Learning Framework
  Features: Extends RNA velocity analysis to complex, multi-lineage systems where traditional methods often fail.
  Profile: omics input; 10⁴ cells; consistency score 0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• VeloVI | Trajectory inference and pseudotime analysis | Unsupervised
  Model: Deep Generative Model (VAE)
  Features: Provides crucial transcriptome-wide uncertainty quantification for the inferred cellular dynamics, enhancing reliability.
  Profile: omics input; 10³–10⁴ cells; accuracy 66–68%
  Code: Link | Installation: Yes | Tutorial: Yes
• scTour | Trajectory inference and pseudotime analysis | Unsupervised
  Model: VAE with Neural ODE
  Features: Learns the vector field of cellular transitions and provides interpretability mechanisms to reveal driver genes.
  Profile: omics input; 10³–10⁵ cells; Spearman ρ > 0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• VITAE | Trajectory inference and pseudotime analysis | Unsupervised
  Model: VAE with a Latent Hierarchical Mixture Model
  Features: Enables joint trajectory inference from multiple datasets and provides robust uncertainty quantification.
  Profile: omics input; 10³–10⁶ cells; ARI 0.5–0.9, PDT 0.4–0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• TrajectoryNet | Trajectory inference and pseudotime analysis | Unsupervised
  Model: Dynamic Optimal Transport Network
  Features: Employs a dynamic optimal transport network to learn the continuous flow of cells over time.
  Profile: omics input; 10³–10⁵ cells; base TrajectoryNet ≈ 0.897, arch MSE 0.300, cycle MSE 0.190
  Code: Link | Installation: Yes | Tutorial: Yes
• TIGON | Trajectory inference and pseudotime analysis | Unsupervised
  Model: Optimal Transport with Growth/Death Models
  Features: Reconstructs both population dynamics and state transition trajectories simultaneously by incorporating cell growth and death.
  Profile: omics input; 5,000+ cells; Pearson correlation 0.62, AUROC ≈ 0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• GeneTrajectory | Trajectory inference and pseudotime analysis | Unsupervised
  Model: Optimal Transport on a Cell-Cell Graph
  Features: A novel gene-centric paradigm that infers trajectories of genes, allowing it to deconvolve concurrent biological programs.
  Profile: omics input; 1,000–10,500 cells; robustness ≈ 1, Spearman correlation ≈ 0.9
  Code: Link | Installation: Yes | Tutorial: Yes
• DeepSEM | GRN inference | Unsupervised
  Model: Deep Generative Model for SEMs
  Features: A pioneering work that generalized linear structural equation models (SEMs) for GRN inference using a deep generative model.
  Profile: omics input; 1,000–10,500 cells; ARI ≈ 0.82, NMI ≈ 0.86
  Code: Link | Installation: Yes | Tutorial: Yes
• CellOracle | GRN inference | Unsupervised
  Model: GRN Inference with In Silico Perturbation
  Features: Integrates scRNA/ATAC-seq and performs in silico perturbation simulations to predict the functional consequences of TF activity.
  Profile: omics input; 10³–10⁵ cells; AUROC 0.66–0.85
  Code: Link | Installation: Yes | Tutorial: Yes
• LINGER | GRN inference | Unsupervised
  Model: GRN Inference with Regularization
  Features: Enhances inference by incorporating atlas-scale external bulk genomics data and TF motif knowledge as regularization.
  Profile: omics input; 10³–10⁴ cells; AUC 0.76, AUPR ratio 2.60
  Code: Link | Installation: Yes | Tutorial: Yes
• scMultiomeGRN | GRN inference | Semi-supervised
  Model: Cross-modal Attention Model
  Features: Specifically designed for multi-omics integration using modality-specific aggregators and cross-modal attention.
  Profile: omics input; 10³–10⁵ cells; accuracy > 0.83, AUROC ≈ 0.924, AUPR ≈ 0.79
  Code: Link | Installation: No | Tutorial: No
• scMTNI | GRN inference | Unsupervised
  Model: Multi-task Learning
  Features: Infers cell-type-specific GRNs along developmental lineages from multi-omic data.
  Profile: omics input; 10³ cells; accuracy > 0.83, F-score > 0.3, AUPR 0.21–0.27
  Code: Link | Installation: Yes | Tutorial: Yes
• GRN-VAE | GRN inference | Unsupervised
  Model: VAE-based GRN Model
  Features: Improves upon the stability and efficiency of earlier generative models like DeepSEM for GRN inference.
  Profile: omics input; 10⁵ cells; AUPRC ratio > 1
  Code: Link | Installation: Yes | Tutorial: Yes
• GRANGER | GRN inference | Unsupervised
  Model: Recurrent VAE
  Features: Infers causal relationships from time-series scRNA-seq data to capture the dynamic nature of GRNs.
  Profile: omics input; 10³ cells; AUROC ≈ 0.85–0.90, AUPRC ≈ 0.90–0.98
  Code: Link | Installation: Yes | Tutorial: Yes
• scGeneRAI | GRN inference | Unsupervised / self-supervised
  Model: Explainable AI (XAI) Model
  Features: Employs XAI techniques to infer interpretable, cell-specific regulatory networks, addressing the "black box" problem.
  Profile: omics input; 15,000 cells; AUC 0.75–0.88
  Code: Link | Installation: Yes | Tutorial: Yes
• scGREAT / InfoSEM | GRN inference | Supervised
  Model: LLM-integrated Models
  Features: Incorporate textual gene embeddings from large language models as an informative prior to improve GRN inference.
  Profile: omics input; thousands of cells; AUROC 0.913, AUPRC 0.5597
  Code: Link | Installation: Yes | Tutorial: Yes
• scRegNet | GRN inference | Supervised
  Model: Foundation Model + GNN
  Features: Combines the power of single-cell foundation models with GNNs to predict regulatory connections.
  Profile: omics input; 800–1,000 cells; AUROC 0.93, AUPRC 0.86
  Code: Link | Installation: Yes | Tutorial: No
• DigNet / RegDiffusion | GRN inference | Unsupervised
  Model: Diffusion Models
  Features: Conceptualize network inference as a reversible denoising process, representing a new wave of generative frameworks for GRN inference.
  Profile: omics input; thousands of cells; AUPRC improved by 19–32%
  Code: Link | Installation: Yes | Tutorial: Yes
• GRNFormer | GRN inference | Semi-supervised
  Model: Graph Transformer
  Features: Uses a sophisticated graph transformer pipeline to infer regulatory relationships with high accuracy.
  Profile: omics input; 500–5,900 genes; AUROC/AUPRC 0.90–0.98
  Code: Link | Installation: Yes | Tutorial: Yes
• GeneCompass | Cross-species analysis | Self-supervised
  Model: Knowledge-informed Transformer (Foundation Model)
  Features: A large-scale model pre-trained on human and mouse cells to decipher universal gene regulatory mechanisms for cross-species tasks.
  Profile: omics input; 126M cells; AUC ≈ 0.95, annotation accuracy 0.84–0.87
  Code: Link | Installation: Yes | Tutorial: Yes
• CACIMAR | Cross-species analysis | Self-supervised / unsupervised
  Model: Weighted Sum Model
  Features: Systematically quantifies the conservation score of cell types, markers, and interactions based on homologous features.
  Profile: omics input; 80,777 cells; R² > 0.66
  Code: Link | Installation: Yes | Tutorial: Yes
• Nvwa | Cross-species analysis | Self-supervised / unsupervised
  Model: Deep Learning on DNA Sequences
  Features: Predicts cell-specific gene expression from DNA sequences, allowing it to identify conserved regulatory programs across species.
  Profile: omics input; 635k cells; AUROC 0.78, AUPR 0.59
  Code: Link | Installation: Yes | Tutorial: Yes
• CAME | Cross-species analysis | Self-supervised / unsupervised
  Model: Heterogeneous Graph Neural Network (GNN)
  Features: Directly assigns cell types across species from scRNA-seq data and provides quantitative assignment probabilities.
  Profile: omics input; millions of cells; accuracy ≈ 0.87
  Code: Link | Installation: Yes | Tutorial: Yes
• SATURN | Cross-species analysis | Weakly supervised
  Model: Protein Language Model (PLM) Integration
  Features: Enables cell alignment based on functional protein similarity, which is often more conserved across species than gene sequences.
  Profile: omics input; 335,000 cells; accuracy ≈ 0.8, ARI/NMI > 0.8
  Code: Link | Installation: Yes | Tutorial: Yes
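Most entries above report ARI (adjusted Rand index) and/or NMI (normalized mutual information), which score agreement between predicted clusters and reference cell-type labels. A minimal scikit-learn sketch, using made-up labels purely for illustration:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Hypothetical reference annotations and predicted cluster IDs for six cells
true_labels = ["B", "B", "T", "T", "NK", "NK"]
pred_clusters = [0, 0, 1, 1, 1, 2]

print("ARI:", adjusted_rand_score(true_labels, pred_clusters))
print("NMI:", normalized_mutual_info_score(true_labels, pred_clusters))
```

Both scores reach 1.0 only for a perfect match; ARI is corrected for chance (random labelings score near 0), while NMI is normalized to the [0, 1] range.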

📊 Analysis Summary

  • Total Methods Reviewed: 84
  • Primary Applications (by frequency):
    • Cell Annotation (21 methods, 25%)
    • Denoising & Imputation (16 methods, 19%)
    • Batch Effect Correction (16 methods, 19%)
    • GRN Inference (11 methods, 13%)
    • Dimension Reduction (9 methods, 11%)
    • Trajectory Inference (7 methods, 8%)
    • Cross-Species Analysis (6 methods, 7%)
    • Cell Clustering (6 methods, 7%)
    • Doublet Removal (2 methods, 2%)
  • Supervision Distribution: Unsupervised (46 methods, ~55%), Self-supervised (12 methods, ~14%), Semi-supervised (11 methods, ~13%), Supervised (13 methods, ~16%), Weakly-supervised (2 methods, ~2%)
  • Unsupervised + Self-supervised: 58/84 (69%)
  • Code Availability: 82/84 (97.6%) link to public repositories
  • Installation Docs: 72/84 (85.7%)
  • Tutorials: 69/84 (82.1%)
  • Both Install + Tutorial: 69/84 (82.1%)
  • Multi-task Methods: 8 methods (10%) address multiple applications simultaneously (e.g., scVI, totalVI, SAUCIE, scDHA, ZILLNB, CarDEC, DESC, scArches)
  • Notable Trends:
    • Foundation model integration emerging in annotation (scTab: 22M cells, Celler: 10M cells) and GRN inference (scGREAT, scRegNet)
    • VAE-based architectures dominate across tasks (26 methods, 31%; see the ZINB likelihood sketch below this list)
    • GNN-based approaches increasingly popular for preserving cell-cell relationships (12 methods, 14%)
    • Growing adoption of diffusion models (scIDPMs, DigNet, RegDiffusion) and transformers (TOSICA, GRNFormer) in recent methods
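The dominance of VAE-style models noted above largely reflects the ZINB observation model popularized by methods such as DCA and scVI: the decoder outputs a mean, a dispersion, and a dropout probability for every gene in every cell, and training maximizes the ZINB log-likelihood of the raw counts. Below is a minimal NumPy/SciPy sketch of that per-entry log-likelihood; the parameterization (mean mu, dispersion theta, dropout probability pi) is the standard one, but the function itself is an illustrative implementation, not any particular method's code.

```python
import numpy as np
from scipy.special import gammaln

def zinb_log_likelihood(y, mu, theta, pi, eps=1e-8):
    """Per-entry log-likelihood of counts y under a ZINB distribution.

    y     : observed counts
    mu    : NB mean (decoder output)
    theta : NB dispersion (decoder output)
    pi    : dropout / zero-inflation probability (decoder output)
    """
    y, mu, theta, pi = (np.asarray(a, dtype=float) for a in (y, mu, theta, pi))
    # log NB(y; mu, theta) in the mean/dispersion parameterization
    log_nb = (gammaln(y + theta) - gammaln(theta) - gammaln(y + 1)
              + theta * np.log(theta / (theta + mu) + eps)
              + y * np.log(mu / (theta + mu) + eps))
    # a zero can come from dropout OR from the NB component itself
    nb_zero = (theta / (theta + mu)) ** theta
    log_zero = np.log(pi + (1.0 - pi) * nb_zero + eps)
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    return np.where(y < 0.5, log_zero, log_nonzero)

# With a 30% dropout probability, an observed zero is far more likely
# than a count of 4 at the same predicted mean:
print(zinb_log_likelihood(y=[0, 4], mu=[2.0, 2.0], theta=[1.0, 1.0], pi=[0.3, 0.3]))
```

Minimizing the negative of this quantity, summed over the expression matrix (for the VAE variants, plus the usual KL regularizer on the latent), is what "ZINB loss" refers to in the model descriptions above.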