Clinical Applications and Explainability

The integration of AI into clinical genetics represents a transformative shift, with applications spanning diagnostics, research, and therapeutics. As AI systems move from research tools to clinical decision-support systems (CDSSs), their predictive performance, while crucial, must be complemented by explainability and transparency. Clinicians and geneticists, whose work directly influences patient diagnosis, treatment, and counseling, cannot ethically or professionally rely on opaque "black-box" predictions. A lack of interpretability raises significant concerns, as documented in clinical guidelines from regulatory bodies such as the FDA and EMA, which emphasize the need for model explainability and traceability to ensure patient safety.  

The trust deficit in AI models is not merely a theoretical concern; it is rooted in documented real-world vulnerabilities. A powerful example is the deep learning model SpliceAI, which demonstrates high predictive performance for identifying deleterious variants. While its precomputed scores vastly reduce computational costs and facilitate rapid variant annotation, a detailed re-analysis of 7,221 participants from the Genomics England 100,000 Genomes Project revealed critical limitations. The study found that outdated gene annotations and "liftover errors" in the precomputed scores impacted over 8% of theoretical single nucleotide variants (SNVs) and affected the accuracy of SpliceAI predictions in nearly 35% of disease-associated genes. Re-running the algorithm with optimized parameters increased confirmed diagnoses by 11.7%, identifying clinically relevant splice-altering variants that the precomputed scores alone would have missed. This case demonstrates a crucial point: a high-performing algorithm, when deployed with practical shortcuts, can accumulate technical debt that leads directly to diagnostic failures. Model design choices and data dependencies have a tangible effect on downstream clinical outcomes and patient safety.

Foundational XAI Concepts for Clinical Application

Explainable AI (XAI) provides a suite of methods to address the transparency deficit by allowing clinicians to understand how a model arrives at its conclusions. Two of the most widely used and influential XAI methods are SHAP and LIME, both of which are model-agnostic and applicable to a wide range of machine learning algorithms.  

SHapley Additive exPlanations (SHAP) is a robust framework rooted in cooperative game theory. Its core principle is to "fairly distribute the prediction payout" among the model's input features. For a geneticist, this translates to quantifying the exact contribution of each feature—such as evolutionary conservation, predicted protein disruption, or splicing impact—to a pathogenicity prediction. SHAP offers both local explanations for a specific variant prediction and global insights into the overall model behavior, making it a powerful tool for a comprehensive understanding of variant classification under guidelines like ACMG/AMP. However, a notable limitation of SHAP is its high computational cost, which can make it impractical for real-time use with large datasets. Researchers have addressed this by developing computational workarounds, such as Clustering-Boosted SHAP (C-SHAP), which uses K-means clustering to significantly reduce execution time while preserving interpretability.  
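To make this concrete, the sketch below computes local SHAP values for one variant from a toy pathogenicity classifier. The feature names (conservation, protein_disruption, splice_impact, allele_freq), the synthetic data, and the random-forest model are all illustrative assumptions, not a real annotation pipeline.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["conservation", "protein_disruption", "splice_impact", "allele_freq"]
X = rng.random((500, len(features)))
# Synthetic labels: pathogenicity loosely driven by conservation and splicing.
y = ((0.6 * X[:, 0] + 0.4 * X[:, 2]) > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
explanation = explainer(X[:1])  # local explanation for one variant

# In recent shap versions, values for a binary classifier have shape
# (samples, features, classes); each entry is that feature's additive
# contribution to the model's output for the given class.
for name, contrib in zip(features, explanation.values[0, :, 1]):
    print(f"{name}: {contrib:+.3f}")
```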

Local Interpretable Model-agnostic Explanations (LIME) operates on a more intuitive principle. For any given prediction from a black-box model, LIME creates a new dataset of perturbed samples around that instance and trains a simple, interpretable surrogate model (e.g., a linear regression) on it. This local model is then used to explain the original prediction, highlighting the features that were most influential in the vicinity of that specific data point. LIME is computationally efficient for single predictions and offers intuitive explanations. Its primary weaknesses include inconsistency across runs, due to its reliance on random sampling, and the difficulty of optimally defining the "locality" of an explanation. Its inability to capture non-linear relationships and its restriction to purely local explanations are further drawbacks relative to SHAP.
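For comparison, here is a minimal LIME sketch on the same kind of tabular variant features; again, the data, model, and feature names are hypothetical stand-ins.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["conservation", "protein_disruption", "splice_impact", "allele_freq"]
X = rng.random((500, len(features)))
y = ((0.6 * X[:, 0] + 0.4 * X[:, 2]) > 0.5).astype(int)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=features,
    class_names=["benign", "pathogenic"],
    mode="classification",
)
# LIME perturbs the instance, fits a local linear surrogate, and reports the
# features most influential in the neighborhood of this specific variant.
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())  # (feature condition, local weight) pairs
```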

Beyond these core methods, the XAI landscape is rapidly evolving. Geneticists should be aware of other techniques like Layer-wise Relevance Propagation (LRP), which is a model-specific method that backpropagates a prediction to its input features, and attention mechanisms used in sequence models, which inherently provide a degree of interpretability by highlighting key genomic segments that influence a prediction.  
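As a toy illustration of the second idea, the sketch below computes scaled dot-product attention weights over a short DNA sequence. The embeddings and projection matrices are random stand-ins for a trained sequence model; in a real model, high-weight positions would correspond to influential genomic segments.

```python
import numpy as np

rng = np.random.default_rng(1)
seq = "ACGTACGG"
d = 8
embed = {base: rng.normal(size=d) for base in "ACGT"}
X = np.stack([embed[base] for base in seq])  # (positions, d)

# Random projections standing in for learned query/key weight matrices.
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Q, K = X @ Wq, X @ Wk

scores = Q @ K.T / np.sqrt(d)                 # scaled dot-product logits
scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

# Row i shows how strongly each position is attended to when encoding
# position i; in a trained model these weights hint at influential motifs.
print(np.round(weights[0], 3))
```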

AI in the Clinical Workflow: A Deep Dive into Applications and Their Nuances

Modernizing Pathogenicity Scoring and Variant Interpretation

AI tools have been integrated into variant prioritization pipelines to reduce the thousands of variants from a genome-wide analysis to a handful of plausible candidates. However, the output of these tools should be viewed as a hypothesis to be confirmed, not a final conclusion. The SpliceAI case study serves as a crucial lesson in this regard. The re-analysis of the Genomics England cohort showed that the widely used precomputed scores, while computationally efficient, suffered from technical issues such as outdated gene annotations and liftover errors. These flaws had a direct and significant clinical impact: re-analysis with optimized parameters increased diagnostic yield by 11.7%. This demonstrates that a model's high performance in a controlled environment does not guarantee its reliability in a real-world clinical setting, where data pipelines and technical debt can compromise accuracy. The responsibility therefore falls on the clinician to provide critical oversight and validate the AI's output. To address the specific issues with SpliceAI, a command-line tool called SpliceAI-splint was developed to help users identify potentially missed splice-altering variants in their own data. This tool is a concrete example of how the clinical community can proactively mitigate the limitations of AI tools, reinforcing the essential role of human expertise in the AI-driven workflow.
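As a small practical illustration, the sketch below parses a SpliceAI VCF annotation string and flags variants whose maximum delta score exceeds a screening threshold. The INFO field layout (ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL) follows SpliceAI's documented output format; the example record and the 0.2 cutoff are illustrative, not clinical decision rules.

```python
def max_delta_score(spliceai_info: str) -> tuple[str, float]:
    """Return (gene symbol, max delta score) from one SpliceAI annotation."""
    fields = spliceai_info.split("|")
    gene = fields[1]
    # DS_AG, DS_AL, DS_DG, DS_DL: delta scores for acceptor/donor gain/loss.
    delta_scores = [float(x) for x in fields[2:6]]
    return gene, max(delta_scores)

# Illustrative record, not a real clinical finding.
annotation = "T|BRCA1|0.07|0.52|0.00|0.01|-14|2|-36|21"
gene, score = max_delta_score(annotation)
if score >= 0.2:  # a common screening cutoff, not a clinical decision rule
    print(f"{gene}: max delta score {score} -- review as possible splice-altering variant")
```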

AI-Assisted Diagnostic Odyssey: From Variants to Clinical Pictures

The integration of AI into the diagnostic process is proving to be a powerful aid in solving complex cases, particularly in rare diseases. Traditional approaches often involve a long and costly "diagnostic odyssey" marked by misdiagnoses and delays. AI systems, by processing vast, multimodal datasets, can significantly shorten this timeline and improve diagnostic yield.  

A compelling case study comes from the Children's Hospital of Philadelphia (CHOP) Epilepsy Neurogenetics Initiative (ENGIN). Researchers used Natural Language Processing (NLP) to extract 89 million clinical annotations from millions of electronic medical records (EMRs) of children with epilepsy. By analyzing this structured data, they identified age-dependent associations between clinical features and genetic epilepsies a median of 3.6 years before a genetic diagnosis was confirmed. This demonstrates how AI can uncover patterns in unstructured clinical data that are often missed by human clinicians, providing crucial clues for early diagnosis and intervention.  
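To illustrate the flavor of such NLP pipelines, the sketch below performs naive dictionary-based phenotype extraction from a clinical note. The phrase-to-HPO mapping is a tiny hypothetical lexicon (IDs shown for illustration); production systems like the one used by ENGIN rely on the full HPO ontology plus context handling such as negation detection.

```python
# A tiny hypothetical phrase-to-HPO lexicon (IDs shown for illustration).
HPO_LEXICON = {
    "infantile spasms": "HP:0012469",
    "febrile seizure": "HP:0002373",
    "developmental regression": "HP:0002376",
}

def extract_hpo_terms(note: str) -> list[tuple[str, str]]:
    """Naive exact-phrase matching; real pipelines handle negation, tense,
    family history, and synonym expansion via the full HPO ontology."""
    text = note.lower()
    return [(phrase, hpo) for phrase, hpo in HPO_LEXICON.items() if phrase in text]

note = "18-month-old with infantile spasms and developmental regression."
print(extract_hpo_terms(note))
```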

Another innovative approach is the AI penetrance model developed by researchers at the Icahn School of Medicine at Mount Sinai. The model moves beyond binary classifications of disease by training on over 1 million EMRs to quantify disease on a spectrum using routine lab tests. The system generates an "ML penetrance" score from 0 to 1, which helps clinicians and patients better understand the likelihood of a specific variant actually leading to disease. This provides a more nuanced, scalable, and accessible way to support precision medicine, especially for Variants of Uncertain Significance (VUS). The system's ability to challenge existing classifications—revealing clear disease signals in some VUS and minimal effects in variants thought to be pathogenic—further underscores the need for continuous validation and the value of integrating AI with real-world clinical data.
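The general idea can be sketched as follows: a calibrated classifier maps routine lab values for variant carriers to a 0-1 probability of a disease phenotype. The features, synthetic data, and logistic-regression model below are hypothetical; the Mount Sinai model's actual architecture and training data are not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Hypothetical z-scored lab features for 1,000 carriers of a given variant.
labs = rng.normal(size=(1000, 3))
# Synthetic phenotype loosely driven by the first two lab values.
phenotype = (labs[:, 0] + 0.5 * labs[:, 1] + rng.normal(size=1000) > 1.0).astype(int)

model = LogisticRegression().fit(labs, phenotype)

# For a new carrier, the predicted probability acts as a penetrance-like
# score on a 0-1 spectrum rather than a binary pathogenic/benign call.
new_carrier = np.array([[1.2, 0.4, -0.1]])
print(f"ML penetrance estimate: {model.predict_proba(new_carrier)[0, 1]:.2f}")
```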

Beyond rare disease diagnosis, AI is enhancing family-based inference and segregation modeling. By moving beyond simple Mendelian filters, AI tools can model complex segregation patterns jointly with pathogenicity predictions to improve diagnostic sensitivity in disorders with complex family histories. Emerging techniques like graph neural networks can encode pedigree structures, enabling models to weigh multi-generational information when assigning pathogenicity likelihoods. This opens the door to analyzing complex inheritance patterns, such as incomplete penetrance or polygenic contributions, that are challenging for traditional methods.
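A toy sketch of the underlying idea appears below: a pedigree encoded as a graph, with one round of neighbor aggregation, the basic operation a graph neural network learns to parameterize. The pedigree, features, and normalization scheme are all hypothetical.

```python
import numpy as np

# Nodes: 0=grandfather, 1=grandmother, 2=mother, 3=father, 4=proband.
# Undirected parent-child edges, for simplicity.
edges = [(0, 2), (1, 2), (2, 4), (3, 4)]
n = 5
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
adj += np.eye(n)                       # self-loops keep each node's own signal
adj /= adj.sum(axis=1, keepdims=True)  # row-normalize (mean aggregation)

# Per-individual features: [carries variant, affected status].
X = np.array([[1, 1], [0, 0], [1, 0], [0, 0], [1, 1]], dtype=float)

# One message-passing step mixes each individual's features with those of
# relatives, letting segregation evidence flow across generations.
H = adj @ X
print(np.round(H, 2))
```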

The Rise of Multi-Omics and Multi-Modal Integration

Genomics is no longer a standalone field. The convergence of multiple data types (genomics, transcriptomics, epigenomics, and proteomics) in multi-omics research is creating a data deluge that overwhelms traditional computational methods. AI and machine learning are essential tools for making sense of this high-dimensional, noisy data. Researchers are increasingly using ML approaches for critical tasks like dimensionality reduction, batch-effect correction to account for experimental differences, and cell type classification in single-cell omics data.
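The sketch below illustrates two of these routine steps in miniature: crude per-batch centering as a stand-in for real batch-effect correction methods (e.g., ComBat or Harmony), followed by PCA for dimensionality reduction. The data shapes and the simulated batch shift are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
expr = rng.normal(size=(200, 1000))  # 200 cells x 1,000 genes (synthetic)
batch = np.repeat([0, 1], 100)
expr[batch == 1] += 0.8              # simulated batch shift

# Crude batch correction: recenter each batch at the global mean.
global_mean = expr.mean(axis=0)
for b in np.unique(batch):
    mask = batch == b
    expr[mask] -= expr[mask].mean(axis=0) - global_mean

# Dimensionality reduction to a compact per-cell representation.
embedding = PCA(n_components=10).fit_transform(expr)
print(embedding.shape)  # (200, 10)
```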

A key trend in this space is the development of multi-modal AI models that can bridge the gap between different data types. For example, the InstructCell model can process both natural language instructions and numerical single-cell gene expression data, allowing researchers to explore complex biological systems in a more intuitive and flexible way. This represents a significant evolution from models that are limited to a single modality, enabling a more holistic and integrated understanding of disease mechanisms.  

Emerging Technologies and Future Applications

LLMs in Genomics

While some bioinformatics researchers view LLMs with skepticism because of their propensity for "hallucinations" and the trust concerns this raises, others are pioneering their use in a variety of genomic applications. These models, trained on vast text corpora, are being fine-tuned and integrated with genomic data to enhance variant interpretation and information retrieval.

Specific examples of LLMs tailored for genomics include:

  • ClinVar-BERT: This model was fine-tuned on text summaries from the ClinVar database to discern evidence patterns indicative of pathogenicity or benignity. It has demonstrated high accuracy in classifying Variants of Uncertain Significance (VUS), showing a new pathway for leveraging unstructured clinical text to improve variant interpretation (a minimal fine-tuning sketch in this spirit appears after this list).

  • GROVER: A foundation model with a genome-optimized vocabulary, GROVER learns the "DNA language structure" and the characteristics of genomic tokens. This allows it to understand genomic contexts and extract critical insights for personalized medicine.  
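The following is a hedged sketch of the general recipe behind a model like ClinVar-BERT: fine-tuning a BERT-style sequence classifier on variant text summaries. The base checkpoint (bert-base-uncased), the two-label scheme, and the example texts are illustrative assumptions; the published model's actual training data and configuration are not reproduced here.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "bert-base-uncased" is a generic stand-in checkpoint, not ClinVar-BERT.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = benign-leaning, 1 = pathogenic-leaning
)

# Hypothetical ClinVar-style evidence summaries with toy labels.
texts = [
    "Functional studies show loss of protein function; segregates with disease.",
    "Observed at high frequency in population databases; no phenotype reported.",
]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)  # passing labels makes the model return a loss
out.loss.backward()                  # one gradient step of a fine-tuning loop
print(float(out.loss))
```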

The integration of LLMs with techniques like Retrieval-Augmented Generation (RAG) is creating a new paradigm for genomic data access. Instead of manually searching multiple databases and running various tools, a geneticist could, in the future, query a single LLM-based system to receive accurate variant annotations and interpretations, supported by advanced reasoning and language understanding.  
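To make the retrieval half of this pipeline concrete, the sketch below uses TF-IDF similarity as a stand-in for learned embeddings. The document snippets and query are hypothetical; a production RAG system would retrieve from curated variant databases before handing the evidence to an LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical knowledge snippets standing in for curated variant databases.
documents = [
    "BRCA1 c.68_69del is a well-characterized pathogenic frameshift variant.",
    "SpliceAI delta scores above 0.5 suggest a likely splice-altering effect.",
    "Variants common in gnomAD are generally considered benign (BA1/BS1).",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

query = "Is this frameshift variant in BRCA1 pathogenic?"
scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]

# In a full RAG pipeline, the top-scoring snippets would be prepended to the
# LLM prompt as grounding evidence for the generated interpretation.
best = scores.argmax()
print(f"retrieved: {documents[best]!r} (score {scores[best]:.2f})")
```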

Toolbox and Resources for the Geneticist

A valuable addition to the module would be a curated, annotated list of resources to guide a geneticist's professional development in this field.

  • Open-Source Tools: The module should highlight key open-source AI tools relevant to genomics. A prime example is SpliceAI-splint, a command-line tool designed to mitigate the known limitations of SpliceAI's precomputed scores. Other important tools to mention include DeepVariant for variant calling and AutoXAI4Omics, which provides a workflow for training and evaluating machine learning models on omics data.  

  • Academic and Commercial Leaders: The module should point to leading research institutions and companies. Mount Sinai's Windreich Department of AI and Human Health is a pioneering institution, particularly in the area of AI-powered penetrance modeling. The Children's Hospital of Philadelphia's Epilepsy Neurogenetics Initiative (ENGIN) is another key player, demonstrating the power of NLP on EMRs for early disease diagnosis. On the commercial side, companies like Deep Genomics, Tempus, and Illumina are at the forefront of developing AI solutions for drug discovery and clinical diagnostics.  

  • Professional Conferences: Staying current in the field requires engaging with the latest research. The module could recommend highly relevant conferences that bridge AI and genomics, such as the AGBT (Advances in Genome Biology and Technology) General Meeting, the Festival of Genomics & Biodata, and the Biomarkers & Precision Medicine UK conference.  

