Evaluating AI Tools Through Healthcare Life Science Frameworks
To leverage AI tools effectively in healthcare and life sciences (HCLS), a rigorous evaluation process is essential, one that spans technical, ethical, regulatory, and clinical considerations. Every healthcare organization will have its own existing frameworks, but those described below can be applied generally as core principles for AI assessment.
Common Frameworks Used to Evaluate AI Tools
Fairness, Accountability, Transparency, and Ethics (FATE) Framework
Fairness: Ensure the AI system is designed and deployed to avoid biases based on race, gender, socio-economic status, and other demographic factors. This can include assessing whether the AI system's predictions or recommendations disproportionately impact specific groups (a minimal disparity check is sketched after this list).
Accountability: Establish mechanisms for identifying who is responsible for AI decisions and actions. This includes defining who will be accountable if an AI system leads to adverse outcomes.
Transparency: Assess how explainable the AI system is, particularly in healthcare, where understanding the rationale behind decisions is crucial for clinicians and patients. Have a clear methodology in place for capturing the rationale behind the AI system's recommendations.
Ethics: Evaluate the broader ethical implications of using AI in healthcare, such as patient privacy, autonomy, and the risk of replacing human judgment with automated decision-making.
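As a concrete illustration of the fairness point above, the sketch below compares positive-prediction rates across demographic groups and flags any group whose rate diverges sharply from the overall rate. The group labels, toy data, and 20% tolerance are illustrative assumptions, not a prescribed standard.

```python
# Minimal demographic-parity sketch; the groups, predictions, and 20% tolerance
# below are illustrative assumptions, not part of any named framework.
from collections import defaultdict

def positive_rate_by_group(groups, predictions):
    """Share of positive predictions for each demographic group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / totals[g] for g in totals}

def flag_disparities(groups, predictions, tolerance=0.20):
    """Flag groups whose positive-prediction rate deviates from the overall
    rate by more than the chosen tolerance."""
    overall = sum(map(int, predictions)) / len(predictions)
    rates = positive_rate_by_group(groups, predictions)
    return {g: r for g, r in rates.items() if abs(r - overall) > tolerance}

# Toy usage with made-up data: 1 = model recommends intervention.
groups = ["A", "A", "B", "B", "B", "C", "C", "A"]
preds = [1, 0, 1, 1, 1, 0, 0, 1]
print(flag_disparities(groups, preds))  # e.g. {'B': 1.0, 'C': 0.0}
```

In practice, the fairness metric (demographic parity, equalized odds, calibration within groups) and the acceptable tolerance should be chosen through the governance process the FATE framework calls for.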
Example frameworks:
AI Maturity Models (e.g., AI Readiness Assessment)
AI maturity models are used to evaluate how ready an organization is to implement AI tools across its systems, including healthcare applications. These models assess AI maturity in the context of both technical and organizational factors, such as:
Data infrastructure readiness
Integration with clinical workflows
Governance and compliance structures
Scalability and adaptability to new medical challenges
Without the proper infrastructure and compliance structures in place, an organization risks creating a long-term problematic environment for the AI tools it deploys.
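To make this concrete, the sketch below rolls the four readiness dimensions listed above into a single weighted score. The weights, the 1-to-5 scale, and the 3.5 threshold are purely illustrative assumptions rather than part of any published maturity model.

```python
# Illustrative readiness-score roll-up; the weights, 1-5 scale, and 3.5
# threshold are assumptions for demonstration only.
READINESS_WEIGHTS = {
    "data_infrastructure": 0.30,
    "clinical_workflow_integration": 0.30,
    "governance_and_compliance": 0.25,
    "scalability_and_adaptability": 0.15,
}

def readiness_score(scores: dict) -> float:
    """Weighted average of 1-5 self-assessment scores across the dimensions."""
    return sum(READINESS_WEIGHTS[d] * scores[d] for d in READINESS_WEIGHTS)

# Toy self-assessment for a hypothetical organization.
assessment = {
    "data_infrastructure": 4,
    "clinical_workflow_integration": 2,
    "governance_and_compliance": 3,
    "scalability_and_adaptability": 3,
}
score = readiness_score(assessment)
print(f"Readiness score: {score:.2f} ({'ready' if score >= 3.5 else 'gaps remain'})")
```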
Example Model:
Regulatory and Compliance Frameworks
AI tools in healthcare must comply with various regulations and standards that ensure safety, efficacy, and privacy. These frameworks were documented prior to January 2025, before the inauguration of the new US administration, and are subject to change. As of this writing, the most relevant regulatory frameworks include:
FDA (Food and Drug Administration): In the U.S., the FDA regulates medical AI as a medical device when it is used for diagnosing or treating patients. AI that qualifies as Software as a Medical Device (SaMD) falls under the FDA's purview and must meet specific criteria for safety and performance.
CE Marking (Europe): In the European Union, AI tools used in healthcare must comply with the Medical Device Regulation (MDR) or the In-vitro Diagnostic Regulation (IVDR), depending on the use case. AI-based tools can be classified as medical devices and must be CE-marked to enter the European market.
HIPAA (Health Insurance Portability and Accountability Act): In the U.S., AI tools that process health data must comply with HIPAA, which establishes requirements for patient data privacy and security.
General Data Protection Regulation (GDPR): In Europe, AI tools must also comply with GDPR, especially regarding data privacy and user consent.
Clinical Evaluation Frameworks
AI tools used in healthcare must be rigorously evaluated for their clinical efficacy. Frameworks typically focus on assessing:
Accuracy and Performance: How well the AI tool performs in clinical scenarios, often compared against human clinicians or existing standard-of-care practices. This includes metrics like sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV); a worked example appears at the end of this section.
Clinical Validation and Trial Design: AI systems often require extensive clinical validation studies, randomized controlled trials (RCTs), or real-world evidence (RWE) studies to prove their safety and effectiveness.
Usability and Integration into Clinical Workflow: The ease of use and seamless integration with existing electronic health records (EHRs), laboratory systems, imaging systems, or patient management tools.
Example Frameworks:
CONSORT-AI (Consolidated Standards of Reporting Trials-AI): Offers guidelines for reporting clinical trials involving AI technologies.
SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-AI): Offers guidance for designing clinical trial protocols that involve AI.
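To make the performance metrics above concrete, the sketch below computes sensitivity, specificity, PPV, and NPV from confusion-matrix counts. The counts are invented for illustration; a real evaluation would take them from a properly designed validation study.

```python
# Standard confusion-matrix metrics; the counts used below are made up.
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common clinical performance metrics derived from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical validation results: 90 true positives, 15 false positives,
# 880 true negatives, 15 false negatives.
for name, value in diagnostic_metrics(tp=90, fp=15, tn=880, fn=15).items():
    print(f"{name}: {value:.3f}")
```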
Trustworthy AI Frameworks
These frameworks assess the overall "trustworthiness" of an AI system by evaluating several key components:
Safety: Ensures that the AI system doesn't pose any unintended risks to patients or clinicians.
Explainability: How easily the AI system’s decisions or predictions can be explained to end users (e.g., clinicians, patients); one illustrative technique is sketched after this list.
Robustness: The AI system's ability to function reliably across different clinical environments, settings, and patient populations.
Ethical Considerations: Ensures that the AI tool adheres to ethical standards, such as respect for patient autonomy, minimizing harm, and maintaining patient privacy.
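As one concrete way to probe explainability, the sketch below uses permutation importance to estimate which input features most influence a model's predictions, so the drivers of a recommendation can be communicated to clinicians. The synthetic data, feature names, and logistic-regression model are stand-ins; this is a generic, model-agnostic technique, not a method prescribed by any of the frameworks above.

```python
# Sketch of a model-agnostic explainability check via permutation importance.
# The synthetic dataset and feature names are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

# Stand-in for a clinical dataset with four hypothetical features.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = ["age", "lab_value_a", "lab_value_b", "prior_admissions"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Shuffle each feature and measure how much accuracy drops; large drops
# indicate features the model relies on most.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, importance in sorted(zip(feature_names, result.importances_mean),
                               key=lambda item: -item[1]):
    print(f"{name}: {importance:.3f}")
```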
Example:
The European Commission's Ethics Guidelines for Trustworthy AI: Focus on ensuring AI systems meet essential requirements of accountability, transparency, fairness, and non-discrimination.
Value-based Healthcare (VBHC) Evaluation Frameworks
In the context of healthcare, VBHC is focused on improving patient outcomes relative to the costs incurred. Evaluating AI tools through the lens of VBHC involves assessing:
Cost-effectiveness: How the AI tool impacts healthcare costs while improving patient outcomes; a worked example follows this list.
Health outcomes: Evaluating whether the AI system improves health outcomes in a tangible way, for example, by improving diagnosis accuracy, reducing readmissions, or enhancing treatment adherence.
Patient satisfaction: Whether AI tools contribute to improved patient experiences and engagement.
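One common way to quantify the cost-effectiveness point above is the incremental cost-effectiveness ratio (ICER): the additional cost of the AI-supported pathway divided by the additional health benefit it produces. The sketch below computes an ICER from invented figures; real numbers would come from a formal health-economic analysis.

```python
# Incremental cost-effectiveness ratio (ICER) sketch; all figures are invented.
def icer(cost_new: float, cost_old: float, effect_new: float, effect_old: float) -> float:
    """ICER = incremental cost / incremental effect, e.g. cost per
    quality-adjusted life year (QALY) gained."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical comparison: AI-supported pathway vs. standard of care.
# Costs are per patient in dollars; effects are QALYs per patient.
ratio = icer(cost_new=12_500, cost_old=11_000, effect_new=8.4, effect_old=8.1)
print(f"ICER: ${ratio:,.0f} per QALY gained")  # -> ICER: $5,000 per QALY gained
```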
Example:
The Institute for Value-Based Medicine (IVBM) evaluates AI tools based on how they improve health outcomes and reduce unnecessary costs.
Real-World Evidence (RWE) Frameworks
In healthcare, AI tools are often tested using real-world evidence, including data from observational studies, clinical registries, and patient-reported outcomes. The evaluation framework here focuses on:
Real-world impact: How the AI tool performs in real clinical environments, outside of controlled trials.
Generalizability: Whether the AI model can be generalized across diverse patient populations and clinical settings.
Post-market surveillance: Ongoing monitoring of AI tools once they are deployed to ensure safety and performance over time.
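As an illustration of post-market surveillance, the sketch below tracks a deployed model's monthly sensitivity and raises an alert when it drifts below the level observed at validation. The baseline, tolerance, and monthly figures are hypothetical.

```python
# Minimal post-market performance-drift monitor; the baseline, tolerance, and
# monthly sensitivity figures are hypothetical.
VALIDATION_SENSITIVITY = 0.86  # sensitivity observed in the validation study
DRIFT_TOLERANCE = 0.05         # alert if sensitivity falls >5 points below it

def check_drift(monthly_sensitivity: dict) -> list:
    """Return the months in which sensitivity fell below the allowed band."""
    floor = VALIDATION_SENSITIVITY - DRIFT_TOLERANCE
    return [month for month, value in monthly_sensitivity.items() if value < floor]

# Hypothetical monitoring data from the first four months after deployment.
observed = {"2024-01": 0.85, "2024-02": 0.84, "2024-03": 0.79, "2024-04": 0.78}
alerts = check_drift(observed)
print("Drift alerts:", alerts if alerts else "none")
```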
Example:
FDA’s RWE Framework: Provides guidance on using real-world data and evidence to evaluate the performance of medical devices, including AI tools.
Key Considerations in Evaluating AI Tools for Healthcare
Data Quality and Bias: Evaluating the quality, completeness, and representativeness of the data used to train AI models is essential for ensuring reliable and unbiased outcomes; a simple representativeness check is sketched after this list.
Human-in-the-loop (HITL) Design: In healthcare, human oversight of AI tools is critical. The framework should assess whether AI tools are designed to support clinicians, rather than replace them, and ensure that decisions made by AI tools are interpretable and actionable.
Scalability and Adaptability: AI tools should be scalable and adaptable to different medical fields and evolving clinical needs. This is particularly important in healthcare settings where needs and technologies constantly change.
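To make the data quality and bias point above concrete, the sketch below compares the demographic mix of a training cohort against a reference population using a chi-square goodness-of-fit test. The group proportions and counts are invented; a real check would use the populations the tool is intended to serve.

```python
# Sketch of a training-data representativeness check; the reference proportions
# and cohort counts are illustrative assumptions.
from scipy.stats import chisquare

# Reference population mix the tool is intended to serve (hypothetical).
reference_proportions = {"group_a": 0.55, "group_b": 0.30, "group_c": 0.15}

# Demographic counts actually present in the training cohort (hypothetical).
training_counts = {"group_a": 720, "group_b": 230, "group_c": 50}

total = sum(training_counts.values())
observed = [training_counts[g] for g in reference_proportions]
expected = [reference_proportions[g] * total for g in reference_proportions]

# A small p-value suggests the training cohort's mix differs meaningfully from
# the reference population, a potential source of bias.
statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {statistic:.1f}, p = {p_value:.4f}")
```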
By following these frameworks, healthcare providers, regulators, and developers can ensure that AI tools are effective, safe, and ethically sound for use in clinical practice. If you or your team is exploring the best ways to assess or create AI tools, please reach out to KAMI Think Tank (info@kamithinktank.com).