LLM-Labs/content/labs/lab-8-evaluation-and-red-teaming.md

---
order: 8
title: Lab 8 - Evaluation and Red Teaming
description: Probe model defenses manually and with Promptfoo to evaluate security controls.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 8 - Evaluation and Red Teaming

In this lab, we will:

- Perform prompt injection against three layers of model protection
- Use Promptfoo to programmatically evaluate a model's security protections

<div class="lab-callout lab-callout--info">
  <strong>Lab Flow Guide</strong><br />
  <strong>Explore</strong> sections focus on manually probing a model, then scaling that same thinking into repeatable evaluation workflows.<br />
  Expect this lab to move from hands-on experimentation into structured testing.
</div>

To start this lab, one web service has been preconfigured:

- Promptfoo - {{service-url:promptfoo}}

## Objective 1 Explore: Direct Prompt Injection

For the first part of this lab, we are going to explore direct prompt injection. There are three levels for this challenge:

1. **System Prompt Instructional Guardrail**
2. **System Prompt + Regex**
3. **System Prompt + LLM Evaluation**

Each level will be more difficult than the last, based on how the protection interacts with the generated output.

Use the embedded widget below to probe each layer. The endpoint and model are already configured. Enter your API key, pick a level, and start testing.

<div class="lab-callout lab-callout--info">
  <strong>Tip:</strong> Conversations in the widget stay in your browser for this lab only. Copy anything you want to keep before refreshing the page.
</div>

<div data-lab8-chat></div>

As you test each protection level, pay attention to how the model responds. The goal is not just to trigger unsafe output, but to understand how each layer attempts to prevent it. For Levels 2 and 3, only the final result (or safety rejection) is shown.

---

## Objective 2 Explore: Promptfoo

While manual interaction with a model is often required for a successful jailbreak, it is often unnecessary for a quick vulnerability-scan-style red team. More often, we want confidence that a model will not respond poorly during routine user interactions. For testing a wide set of prompts against a model or application, Promptfoo is an excellent open-source framework for generating and evaluating large sets of mutated prompts.

### Explore: Promptfoo red-team workflow

Promptfoo is available on our lab machine at {{service-url:promptfoo}}. We can start by creating a new red-team configuration.

<figure style="text-align: center;">
  <a href="https://i.imgur.com/YyP8mwB.png" target="_blank">
    <img
      src="https://i.imgur.com/YyP8mwB.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Promptfoo Home Page
  </figcaption>
</figure>
<br>

Promptfoo is designed to be approachable for both beginners and practitioners. Its wizard guides you through configuring the target, selecting datasets and mutation strategies, and tracking execution.

<div class="lab-callout lab-callout--info">
  <strong>Tip:</strong> Although the Promptfoo WebUI is convenient, it hides a critical configuration option for this lab inside the YAML file. Please use the provided configuration file: [lab-8-evaluation-and-red-teaming/promptfoo.yaml](/labs/lab-8-evaluation-and-red-teaming/promptfoo.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
</div>

<figure style="text-align: center;">
  <a href="https://i.imgur.com/1xbMstb.png" target="_blank">
    <img
      src="https://i.imgur.com/1xbMstb.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Target API Provider (OpenAI)
  </figcaption>
</figure>
<br>

<figure style="text-align: center;">
  <a href="https://i.imgur.com/p5mw1i4.png" target="_blank">
    <img
      src="https://i.imgur.com/p5mw1i4.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Application Details (Direct LLM Testing)
  </figcaption>
</figure>
<br>

<figure style="text-align: center;">
  <a href="https://i.imgur.com/jhgkxx0.png" target="_blank">
    <img
      src="https://i.imgur.com/jhgkxx0.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Dataset(s) Selection
  </figcaption>
</figure>
<br>

<figure style="text-align: center;">
  <a href="https://i.imgur.com/lStW3Zo.png" target="_blank">
    <img
      src="https://i.imgur.com/lStW3Zo.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Mutation Strategies
  </figcaption>
</figure>
<br>

Once we select `Start`, Promptfoo handles the rest. Mutations, tests, and results are all tracked by the WebUI. Promptfoo runs can take a significant amount of time, but when they finish you will be presented with a new results screen.

<figure style="text-align: center;">
  <a href="https://i.imgur.com/2UopUGj.png" target="_blank">
    <img
      src="https://i.imgur.com/2UopUGj.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Promptfoo Results
  </figcaption>
</figure>
<br>

Promptfoo is highly flexible. Anything that involves mass evaluation of prompts against a model can be performed with the framework. Likewise, we can run an evaluation against a direct Hugging Face dataset. Once again, Promptfoo provides a WebUI, but supplying the direct YAML is often easier.

### Explore: Promptfoo evaluation workflow

<div class="lab-callout lab-callout--info">
  <strong>Tip:</strong> Please use the provided evaluation configuration file: [lab-8-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml](/labs/lab-8-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
</div>

<figure style="text-align: center;">
  <a href="https://i.imgur.com/23iFYNo.png" target="_blank">
    <img
      src="https://i.imgur.com/23iFYNo.png"
      style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Promptfoo Evaluation
  </figcaption>
</figure>
<br>

Often, running an evaluation against a known public benchmark provides a more quantitative way to measure the precision loss in your local configuration. This can be especially useful when you are trying to squeeze the best possible performance out of limited hardware.

---

## Conclusion

In this lab, we performed red team evaluations against a target model:

1. **Direct Injections** - We explored different ways to bypass common LLM controls.
2. **Promptfoo Red Teaming** - We used Promptfoo for red teaming a large number of prompts against our target model.
3. **Promptfoo Evaluation** - We used Promptfoo for benchmarking a model against a popular public benchmark, giving us a local point of comparison.

We should now have a better sense of what our next round of fine-tuning should be, or if we need to explore additional protections for our model!