Functional Genomics and Gene Discovery through AI

Mar 11

AI in Enhancer–Promoter Prediction and Regulatory Mapping

Understanding enhancer–promoter interactions (EPIs) is critical for interpreting gene regulation. Traditional methods rely on chromatin conformation capture assays, which are laborious and yield results specific to the cell type assayed. AI approaches have begun to offer scalable alternatives. The PDCNN model uses deep learning to predict enhancers from nucleotide sequence features and performs well in benchmark tests, providing a scalable method for enhancer detection. For the interactions themselves, EPIPDLF (see Resources) is a pretrained deep learning framework for predicting enhancer–promoter interactions.
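As a rough illustration of what "predicting from nucleotide sequence" involves, sequence-based convolutional models commonly take one-hot encoded DNA as input. The sketch below shows that encoding only. It is an assumption about a typical preprocessing step, not a description of PDCNN's actual input pipeline, and the function name is hypothetical.

```python
from typing import List

BASES = "ACGT"  # column order of the one-hot matrix (an arbitrary but common choice)


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return a len(seq) x 4 matrix; ambiguous bases such as N become all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


# Toy input: four canonical bases plus one ambiguous base.
encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

Each row marks one position in the sequence, so a convolutional filter sliding over the rows can respond to short sequence motifs.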

Similarly, the Random Promoter DREAM Challenge focused on predicting gene expression from promoter sequences, using millions of random promoter sequences for training and testing model architectures across species. The top-performing deep neural networks exceeded prior benchmarks in regulatory sequence modeling, and the challenge's modular evaluation framework (‘Prix Fixe’) informed design choices for regulatory prediction models.
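Sequence-to-expression models of this kind are typically scored by how well predicted expression tracks measured expression. The sketch below computes a Pearson correlation on made-up numbers. The metric choice and the toy values are assumptions for illustration, not the challenge's official scoring procedure.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted expression for four promoters.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.1, 3.9, 6.2, 7.8]
r = pearson_r(measured, predicted)
print(f"Pearson r = {r:.3f}")
```

Pearson correlation is insensitive to the scale of the predictions, so a model can score well even when its outputs are in different units from the measurements.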

These tools demonstrate that AI models can learn intricate regulatory grammar from sequence, facilitating genome-wide enhancer and promoter annotation.

Deep Learning for Transcription Factor Binding and Chromatin Accessibility

Predicting transcription factor (TF) binding or chromatin state is foundational to genome function mapping. TFBS-Finder is a recent transformer-based deep learning model that combines DNABERT embeddings with multi-scale convolutional modules and attention mechanisms. It outperforms previous methods across 165 ENCODE ChIP-seq datasets, illustrating that TF binding sites can be identified with high precision and that the predictions generalize across cell lines.
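Models built on the original DNABERT represent DNA as overlapping k-mer tokens before embedding them. The sketch below shows that tokenization step alone. The choice of k=6 and the function name are assumptions for illustration, and the source does not specify how TFBS-Finder configures its tokenizer.

```python
from typing import List


def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with a stride of 1."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Toy 8-bp sequence: yields 8 - 6 + 1 = 3 overlapping tokens.
tokens = kmer_tokenize("ACGTACGT", k=6)
print(tokens)
```

A sequence of length n produces n - k + 1 tokens, so neighbouring tokens share k - 1 bases and each base appears in up to k tokens.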

Other transformer-based architectures like Genomic Interpreter leverage hierarchical windowed attention to model long-range genomic dependencies for chromatin accessibility and gene expression prediction, offering interpretable insights into regulatory syntax.

Network-Based Gene Prioritization with AI Extensions

Gene function prediction often benefits from network context. Tools like GeneMANIA infer gene relationships from expression, physical interaction, and co-regulation data, and AI frameworks can extend this kind of analysis to improve the prioritization of disease candidates. Gene network analysis remains central to functional gene discovery, especially when integrated with AI-driven filtering of regulatory features. For example, a recent study combined AI with network analysis to identify drug targets in Rett syndrome models (see Resources).

Cell-Type and Tissue-Specific Models via Single-Cell Data Integration

AI’s ability to integrate single-cell transcriptomic data has transformed tissue-specific inference of gene function. CRISPR–Perturb-seq combines pooled genetic perturbations with single-cell RNA sequencing to assess gene regulatory effects in a high-throughput, cell-type-specific manner. This method enables direct mapping of function at the resolution of individual cell types, facilitating AI-driven interpretation of perturbation outcomes across diverse cellular contexts.

Integrating AI with CRISPR Screen Interpretation

AI enhances the analysis and design of CRISPR screens. Transformer-based models now aid in guide-RNA (gRNA) design by predicting on-target efficiency and off-target risks, sometimes revealing informative gRNA sequence motifs via attention-based interpretability methods.

Moreover, AI-powered networks constructed from regulatory and gene expression data can refine target selection for CRISPR screens. In a recent study using Rett syndrome models, AI-guided network analysis combined with in vivo CRISPR screening in Xenopus and mice led to identification of an FDA-approved drug (vorinostat) that effectively rescued multi-organ disease phenotypes, demonstrating AI-driven potential for functional discovery and translational application.

AI-Enabled Functional Genomics Workflow Example

Use case: A researcher studying a rare neurological disorder suspects that regulatory variants in non-coding regions are disrupting gene expression. The goal is to identify candidate enhancer–promoter pairs, prioritize likely disease genes, and validate functional impact with single-cell CRISPR screens.

Step 1 - Enhancer–Promoter Discovery

The researcher begins with whole-genome sequencing data from patients. Instead of relying only on experimental chromatin conformation assays, they apply the PDCNN deep learning model, which predicts enhancer elements directly from DNA sequence. This method highlights several non-coding regions in proximity to known neurological loci as putative enhancers.

Step 2 - Regulatory Grammar and TF Binding

To understand how these enhancers might act, the researcher runs TFBS-Finder, a transformer-based deep learning model trained on DNABERT embeddings. It predicts high-confidence binding motifs for neuronal transcription factors within the enhancer regions, providing mechanistic clues about their potential regulatory role.

Step 3 - Gene Prioritization via Networks

The enhancer–promoter predictions are linked to potential target genes. To refine prioritization, the researcher integrates results into a gene network model augmented with AI-driven ranking. This approach emphasizes genes with both strong enhancer connections and high centrality in co-expression networks relevant to neuronal tissue.
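One simple way to picture this step is a score that blends each gene's enhancer-link strength with its degree centrality in a co-expression network. The sketch below is a toy under stated assumptions: the gene names, edge list, scores, and the equal weighting are all hypothetical, and a real pipeline would use a learned ranking model rather than a fixed linear blend.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of the other nodes each gene is connected to (undirected graph)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {gene: 0.0 for gene in neighbors}
    return {gene: len(nbrs) / (n - 1) for gene, nbrs in neighbors.items()}


def rank_genes(edges, enhancer_scores: Dict[str, float],
               weight: float = 0.5) -> List[Tuple[str, float]]:
    """Rank genes by weight * enhancer score + (1 - weight) * centrality."""
    centrality = degree_centrality(edges)
    combined = {
        gene: weight * score + (1 - weight) * centrality.get(gene, 0.0)
        for gene, score in enhancer_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical co-expression edges and enhancer-link scores in [0, 1].
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"),
         ("GENE_A", "GENE_D"), ("GENE_B", "GENE_C")]
enhancer_scores = {"GENE_A": 0.6, "GENE_B": 0.9, "GENE_D": 0.2}
ranking = rank_genes(edges, enhancer_scores)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy data GENE_A outranks GENE_B despite a weaker enhancer score because it is connected to every other gene in the network, which is the trade-off the weighting parameter controls.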

Step 4 - Tissue-Specific Modeling with Single-Cell Data

The shortlisted genes are evaluated in single-cell transcriptomic datasets. Using Perturb-seq, the researcher perturbs candidate genes in cultured neuronal cells, capturing downstream expression effects at single-cell resolution. This establishes which candidates produce cell-type–specific transcriptional shifts consistent with disease phenotypes.
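The core comparison in a Perturb-seq analysis is between cells carrying a targeting guide and cells carrying a non-targeting control. The sketch below computes a log2 fold change in the mean expression of one readout gene per guide. The guide labels, expression values, and pseudocount are hypothetical, and real analyses add normalization, per-cell-type stratification, and statistical testing.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def perturbation_log2fc(cells: Iterable[Tuple[str, float]],
                        control_label: str = "non-targeting",
                        pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 fold change of mean expression for each guide vs. control cells.

    `cells` holds (guide_label, expression_of_readout_gene) pairs, one per cell.
    """
    by_guide = defaultdict(list)
    for guide, expression in cells:
        by_guide[guide].append(expression)
    if control_label not in by_guide:
        raise ValueError("no control cells found")
    control_mean = mean(by_guide[control_label])
    return {
        guide: math.log2((mean(values) + pseudocount) / (control_mean + pseudocount))
        for guide, values in by_guide.items()
        if guide != control_label
    }


# Hypothetical per-cell measurements of a single readout gene.
cells = [
    ("non-targeting", 7.0), ("non-targeting", 7.0),
    ("sgGENE_A", 3.0), ("sgGENE_A", 3.0),
    ("sgGENE_B", 7.0),
]
effects = perturbation_log2fc(cells)
for guide, log2fc in effects.items():
    print(f"{guide}\tlog2FC = {log2fc:+.2f}")
```

Running the same comparison separately within each annotated cell type is what gives the cell-type-specific readout described above.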

Step 5 - CRISPR Screen Optimization

Finally, AI is applied to design optimized guide RNAs targeting the candidate regulatory elements. A recent review describes transformer-based models that predict both on-target efficiency and off-target effects, making CRISPR screens more precise.
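Before any learned model scores a guide, candidate target sites have to be enumerated. The sketch below lists 20-nt protospacers followed by an NGG PAM (the SpCas9 convention) on the forward strand and reports GC content as a simple feature. It is a minimal illustration under those assumptions: the function name and toy sequence are hypothetical, the reverse strand is not scanned, and it does not predict on-target efficiency or off-target risk.

```python
import re
from typing import Dict, List


def find_spcas9_candidates(seq: str, protospacer_len: int = 20) -> List[Dict]:
    """List forward-strand protospacers that are immediately followed by NGG."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    candidates = []
    for match in pattern.finditer(seq):
        protospacer = match.group(1)
        gc_count = protospacer.count("G") + protospacer.count("C")
        candidates.append({
            "start": match.start(),
            "protospacer": protospacer,
            "pam": match.group(2),
            "gc_fraction": gc_count / protospacer_len,
        })
    return candidates


# Toy sequence: one 20-nt protospacer, a TGG PAM, and trailing bases.
target = "ATGCATGCATGCATGCATGC" + "TGG" + "AAAA"
candidates = find_spcas9_candidates(target)
for site in candidates:
    print(site)
```

The resulting list is the kind of input a trained efficiency or specificity model would then score and filter.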

Outcome: By combining enhancer discovery (PDCNN), binding motif analysis (TFBS-Finder), network-based prioritization, Perturb-seq single-cell screens, and AI-optimized CRISPR design, the researcher identifies a previously uncharacterized enhancer that regulates a candidate neuronal gene. Functional validation shows this regulatory element contributes to disease-relevant expression changes, making it a strong therapeutic target.

Resources

PDCNN model for enhancer prediction using deep learning on DNA sequence. iScience, 29(5), 1001255. (2024). https://www.sciencedirect.com/science/article/pii/S2589004224012550
Consensus evaluation of sequence-to-expression models using millions of promoter sequences. Nature Biotechnology, 43, 1373–1383. (2025). https://www.nature.com/articles/s41587-024-02414-w
TFBS-Finder: Deep learning model using DNABERT for TF binding site prediction. arXiv preprint (Feb 2025). https://doi.org/10.48550/arXiv.2502.01311
Genomic Interpreter: Hierarchical transformer-based model for chromatin accessibility and gene expression prediction. arXiv preprint (Jun 2023). https://doi.org/10.48550/arXiv.2306.05143
AI + Gene Network + CRISPR for Drug Discovery: AI-enabled drug prediction and gene network analysis applied to Rett syndrome. Communications Medicine (2025). https://doi.org/10.1038/s43856-025-00975-8
EPIPDLF: Pretrained deep learning framework for enhancer–promoter interactions. Bioinformatics (2025). https://doi.org/10.1093/bioinformatics/btae716
CRISPR gRNA design and interpretation: Recent review of AI in gRNA efficiency and off-target prediction. arXiv (Aug 2025). https://doi.org/10.48550/arXiv.2508.20130

Kamayani Gupta