Step 3: Set Up Experiments and Pilot Models
The 4-Step Methodology for AI Experimentation
Identify Outcomes & Tasks: Determine the goals you want to achieve and the specific tasks where AI might help.
Create a Gold Standard & Gather Data: Establish how you'll measure success and collect the necessary data.
Experiment & Pilot: Test different AI models on your chosen task.
Build & Deploy: Integrate the successful AI solution into your workflows.
Note that you can complete steps 1-3 without engineering support.
Let's dive into step 3.
Step 3: Set Up Experiments and Pilot Models
Now it's time for systematic testing. The only way to tell whether an idea is good is to experiment with the task using prompts. This builds your intuition for what a model does and does not do well, and it requires a lot of iteration.
Test Models: Select a few promising AI models (commercial ones are a good starting point) and test them against your gold standard evaluation set.
Vary Prompts: Try different ways of asking the model to perform the task, and experiment with multiple prompt engineering techniques (such as few-shot examples or chain-of-thought instructions).
Add Context: Experiment with providing the relevant data you gathered (templates, examples) as context to the models. See if varying the amount of context helps.
Document Everything: Keep detailed records of your prompts, the models used, the data provided, and the results; a minimal sketch of one way to run and log these experiments follows this list.
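To make the iteration concrete, here is a minimal sketch of such a harness in Python. The call_model function is a hypothetical placeholder you would replace with your provider's SDK, the gold-standard items and model and prompt names are illustrative, and experiment_log.csv is an assumed file name.

```python
import csv
import itertools

# Hypothetical placeholder: swap this out for a real call to your
# model provider's SDK. Here it returns a canned answer so the
# sketch runs end to end.
def call_model(model_name: str, prompt: str) -> str:
    return "approved"

# Gold-standard set gathered in Step 2: each item pairs an input
# document with the answer a human expert would give.
gold_standard = [
    {"input": "Scanned ICF with handwritten signature ...", "expected": "approved"},
    {"input": "Scanned ICF with no signature ...", "expected": "not approved"},
]

models = ["model-a", "model-b"]  # candidate models to compare
prompt_templates = [
    "Does this consent form contain a handwritten signature? Answer 'approved' or 'not approved'.\n\n{doc}",
    "You are a clinical research assistant. Classify the ICF below as 'approved' or 'not approved'.\n\n{doc}",
]

# Run every model against every prompt variant and log everything.
with open("experiment_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "prompt_template", "input", "expected", "output", "correct"])
    for model, template in itertools.product(models, prompt_templates):
        for item in gold_standard:
            output = call_model(model, template.format(doc=item["input"]))
            correct = output.strip().lower() == item["expected"]
            writer.writerow([model, template, item["input"], item["expected"], output, correct])
```

Even a simple log like this makes it easy to compare models and prompt variants side by side, and it becomes the raw material for the analysis below.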
Analyze your findings:
Performance: Did the models perform well enough to potentially augment or automate the workflow? Can you quantify the potential time savings or other benefits (ROI)?
For instance, for the clinical research coordinator (CRC) from Step 1 of this guide: did the model correctly recognize a handwritten signature every single time and categorize the ICF form as approved? If so, can the process be fully automated, and how much time does that save the CRC while preparing study procedures?
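As a rough illustration of quantifying that benefit, the sketch below uses assumed figures (manual review time per form, monthly volume, and the accuracy you measured against your gold standard); substitute numbers from your own workflow.

```python
# Assumed figures for illustration only; replace with your own measurements.
minutes_per_manual_review = 5   # time a CRC spends checking one ICF by hand
forms_per_month = 400           # monthly volume of ICFs
model_accuracy = 0.97           # fraction handled correctly in your experiments

# Forms the model can handle end to end; the rest still need a human pass.
automated_forms = forms_per_month * model_accuracy
hours_saved_per_month = automated_forms * minutes_per_manual_review / 60

print(f"Estimated time saved: {hours_saved_per_month:.1f} hours/month")
```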
Weaknesses: Where did the models fail? Did they hallucinate or make consistent errors? Understanding error patterns is crucial.
Let's imagine a person provided a signature on the consent form, but it was typed rather than handwritten, so the model did not accept it. Is that an acceptable error pattern for the workflow? What other scenarios might cause the model to fail, and what steps could be taken to prevent such errors?
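One lightweight way to surface error patterns is to tag each failure in your experiment log with a category during review and then tally them. The sketch below assumes a CSV like the one produced earlier, with a hypothetical error_category column added by hand.

```python
import csv
from collections import Counter

# Assumes the experiment log from earlier, with an extra "error_category"
# column filled in during review (e.g., "typed signature", "blurry scan").
failure_counts = Counter()
with open("experiment_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["correct"] == "False" and row.get("error_category"):
            failure_counts[row["error_category"]] += 1

# Most common failure modes first: these tell you where to add guardrails,
# better context, or a human-in-the-loop check.
for category, count in failure_counts.most_common():
    print(f"{category}: {count}")
```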
Scalability: If the results are promising, could this solution benefit the broader team or organization?
While a model that recognizes handwritten signatures is beneficial to the CRC, the same task could help people in other parts of the business, making it a potentially scalable solution for the rest of the organization.
By the end of this step, you should have a clear analysis of model performance, potential business value, and a recommendation on whether to proceed. Next, we move on to the final step of this series.