Resources

Software Libraries & Frameworks

  • SHAP (SHapley Additive exPlanations) - https://github.com/slundberg/shap
    A Python library implementing game-theoretic feature attribution methods for interpreting model outputs. Widely used for tabular, text, and biological data.

  • LIME (Local Interpretable Model-agnostic Explanations) - https://github.com/marcotcr/lime
    A model-agnostic technique that explains individual predictions by approximating a complex model locally with an interpretable one.

  • Captum - https://captum.ai/
    An interpretability library for PyTorch models developed by Facebook (now Meta), supporting integrated gradients, DeepLIFT, and other attribution methods.

  • Alibi - https://github.com/SeldonIO/alibi
    An open-source Python library providing algorithms for explaining and interpreting black-box models with support for counterfactuals and anchors.

  • Explainability Tools for Graph Neural Networks
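
To make the "game-theoretic feature attribution" behind SHAP concrete, here is a minimal sketch that computes exact Shapley values for a toy three-feature model using only the Python standard library. It does not use the SHAP library's API. The model, the instance x, the all-zero baseline, and the names shapley_values and value_fn are assumptions made for illustration. Exact enumeration is only feasible for a handful of features, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over n_features players."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: two additive terms plus one interaction term.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


x = [1.0, 3.0, 4.0]         # instance to explain (illustrative values)
baseline = [0.0, 0.0, 0.0]  # reference input standing in for "feature absent"


def value_fn(coalition):
    # Features in the coalition take the instance value; the rest take the baseline.
    z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(z)


phi = shapley_values(value_fn, len(x))
print(phi)
```

The attributions sum to model(x) minus model(baseline), and the interaction term is split evenly between the two features involved in it.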

Reproducibility & Workflow Tools

  • Docker
    A containerization platform for creating reproducible computational environments that encapsulate code, dependencies, and system libraries.
    https://www.docker.com/

  • Snakemake
    A workflow management system for building reproducible and scalable data analyses. Widely used in bioinformatics pipelines.
    https://snakemake.readthedocs.io/en/stable/

  • Nextflow
    A workflow system that enables portable and reproducible pipelines across different computational environments, including cloud and HPC clusters.
    https://www.nextflow.io/

  • Git & GitHub
    A version control system (Git) and a collaborative code-hosting platform (GitHub) for tracking changes and facilitating reproducible research.
    https://git-scm.com/
    https://github.com/

  • Zenodo
    An open repository for sharing datasets, code, and preprints with DOI assignment, supporting open reproducible science.
    https://zenodo.org/

Tutorials & Workshops

Datasets & Benchmark Collections

  • OpenML
    Repository of datasets with metadata and task definitions, including biological and biomedical data useful for benchmarking interpretability methods.
    https://www.openml.org/

  • recount2
    Uniformly processed RNA-seq data from over 70,000 human samples, well suited to reproducible gene expression modeling.
    https://jhubiostatistics.shinyapps.io/recount/

  • STRING Database
    Known and predicted protein-protein interactions, useful for integrating biological priors in interpretable network models.
    https://string-db.org/

  • BioGRID
    Comprehensive repository of genetic and protein interaction data across multiple species.
    https://thebiogrid.org/
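
As a rough illustration of how interaction data can serve as a biological prior, the sketch below turns a scored edge list into an undirected adjacency map. It assumes a whitespace-separated layout with three columns (protein1, protein2, combined_score) and scores on a 0 to 1000 scale, which is the layout I understand STRING's protein links download to use; check the file you actually download. The function name build_network, the 700 cutoff, and the protein identifiers in the sample are hypothetical.

```python
from collections import defaultdict


def build_network(lines, min_score=700):
    """Build an undirected adjacency map from scored interaction rows.

    Assumes each row is 'protein1 protein2 combined_score' (an assumption
    about the input layout); rows below min_score are dropped.
    """
    adjacency = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] == "protein1":
            continue  # skip the header row and malformed rows
        a, b, score = fields[0], fields[1], int(fields[2])
        if score >= min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Made-up sample rows; the identifiers are placeholders, not real proteins.
sample = """protein1 protein2 combined_score
P_A P_B 912
P_A P_C 430
P_B P_C 755
"""

network = build_network(sample.splitlines())
print({protein: sorted(neighbors) for protein, neighbors in network.items()})
```

The resulting map could then be used, for example, to restrict which features a network model is allowed to connect; that choice depends on the model and is outside this sketch.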

Community and Discussion Forums

