Polish Update

This commit is contained in:
c4ch3c4d3
2026-03-30 17:25:32 -06:00
parent 1aa9310bc8
commit 6bcebd55ee
6 changed files with 154 additions and 78 deletions
# Lab 6 - Evaluation and Red Teaming
In this lab, we will:
* Perform prompt injection against three layers of model protection
* Use Promptfoo to programmatically evaluate a model's security protections
<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on manually probing a model, then scaling that same thinking into repeatable evaluation workflows.<br />
Expect this lab to move from hands-on experimentation into structured testing.
</div>
To start this lab, one web service has been preconfigured:
* Promptfoo - http://<IP>:15500
You'll also need to access:
* Open WebUI - https://ai.zuccaro.me/
## Objective 1 Explore: Direct Prompt Injection
For the first part of this lab, we are going to explore direct prompt injection. There are three levels for this challenge:
1. **System Prompt Instructional Guardrail**
2. **System Prompt + Regex**
3. **System Prompt + LLM Evaluation**
Each level will be more difficult than the last, based on how the protection interacts with the generated output.
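To make the three layers concrete, here is a minimal sketch of how a regex filter and an LLM evaluator might screen a generated response. All names and patterns below are hypothetical; the lab's real guardrails run server-side and their blocklists are not visible to us.

```python
import re

# Hypothetical blocklist for illustration only; the lab's actual patterns
# are part of the server-side challenge and are not known to us.
BLOCK_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"secret\s+password", r"flag\{")]

def regex_guardrail(output: str) -> bool:
    """Level 2: reject the generated response if any blocklist pattern matches."""
    return not any(p.search(output) for p in BLOCK_PATTERNS)

def llm_guardrail(output: str, judge) -> bool:
    """Level 3: ask a second model (the judge) whether the response is safe."""
    verdict = judge(f"Answer SAFE or UNSAFE. Is this response safe?\n\n{output}")
    return verdict.strip().upper().startswith("SAFE")

# A response reaches the user only if every configured layer passes.
print(regex_guardrail("The secret password is hunter2"))  # False: blocked by regex
print(regex_guardrail("I cannot share that."))            # True: passes the regex layer
```

Note how the regex layer only catches strings it anticipates, while an LLM evaluator generalizes better but can itself be manipulated by the content it inspects.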
<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> Due to the limitations of Open WebUI, you will see generated outputs before safety evaluation. A successful jailbreak means the protection missed the final output.
</div>
### Explore: Access the hosted challenge
To access the lab, navigate to https://ai.zuccaro.me and log in with the following credentials:
* `Username` - student@zuccaro.me
* `Password` - Student9205!
<br>
<figure style="text-align: center;">
<a href="https://i.imgur.com/YSgw3wq.png" target="_blank">
<img
src="https://i.imgur.com/YSgw3wq.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Open WebUI Outside Lab Hosted Challenge
</figcaption>
</figure>
<br>
Good luck and have fun.
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> Conversations for this Open WebUI instance will not be saved. Ensure you save any interactions you want to keep.
</div>
As you test each protection level, pay attention to how the model behaves before and after the safety check. The goal is not just to trigger unsafe output, but to understand how each layer attempts to prevent it.
---
## Objective 2 Explore: Promptfoo
While manual interaction with a model is often required for a successful jailbreak, it is often unnecessary for a quick vulnerability-scan-style red team. More often, we want confidence that a model will not respond poorly during routine user interactions. For testing a wide set of prompts against a model or application, Promptfoo is an excellent open-source framework for generating and evaluating large sets of mutated prompts.
### Explore: Promptfoo red-team workflow
Promptfoo is available on our lab machine at http://<YOUR STUDENT IP>:15500. We can start by creating a new red-team configuration.
<figure style="text-align: center;">
<a href="https://i.imgur.com/YyP8mwB.png" target="_blank">
<img
src="https://i.imgur.com/YyP8mwB.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Promptfoo Home Page
</figcaption>
</figure>
<br>
Promptfoo is designed to be approachable for both beginners and practitioners. Its wizard guides you through configuring the target, selecting datasets and mutation strategies, and tracking execution.
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> Although the Promptfoo WebUI is convenient, it hides a critical configuration option for this lab inside the YAML file. Please use the provided configuration file: [lab-6-evaluation-and-red-teaming/promptfoo.yaml](content/labs/lab-6-evaluation-and-red-teaming/promptfoo.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
</div>
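For orientation, a Promptfoo red-team configuration generally follows the shape sketched below. The endpoint, plugin, and strategy values here are assumptions for illustration only; use the provided promptfoo.yaml for the actual lab run.

```yaml
# Illustrative sketch of a Promptfoo red-team config. Field values are
# assumptions -- the provided promptfoo.yaml is authoritative for this lab.
targets:
  - id: http
    label: lab-model
    config:
      url: http://<YOUR STUDENT IP>/api/chat   # placeholder endpoint
      method: POST
      body:
        message: "{{prompt}}"
redteam:
  purpose: "A helpful assistant that must not reveal protected content."
  numTests: 5
  plugins:
    - harmful            # plugin and strategy names are illustrative
  strategies:
    - jailbreak
    - prompt-injection
```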
<br>
Once we select `Start`, Promptfoo handles the rest. Mutations, tests, and results are all tracked by the WebUI. Promptfoo runs can take a significant amount of time, but when they finish you will be presented with a new results screen.
<figure style="text-align: center;">
<a href="https://i.imgur.com/2UopUGj.png" target="_blank">
<img
src="https://i.imgur.com/2UopUGj.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
</figure>
<br>
Promptfoo is highly flexible. Anything that involves mass evaluation of prompts against a model can be performed with the framework. Likewise, we can run an evaluation against a direct Hugging Face dataset. Once again, Promptfoo provides a WebUI, but supplying the direct YAML is often easier.
### Explore: Promptfoo evaluation workflow
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> Please use the provided evaluation configuration file: [lab-6-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml](content/labs/lab-6-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
</div>
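As a sketch of what a dataset-driven evaluation config can look like: the Hugging Face loader syntax, dataset column names, and endpoint below are assumptions for illustration; use the provided mmlu-promptfoo-config.yaml for the actual lab run.

```yaml
# Illustrative sketch of a Promptfoo evaluation config. The dataset loader
# syntax and column names are assumptions -- the provided
# mmlu-promptfoo-config.yaml is authoritative for this lab.
prompts:
  - "Answer with only the letter of the correct choice.\n\n{{question}}"
providers:
  - id: http
    config:
      url: http://<YOUR STUDENT IP>/api/chat   # placeholder endpoint
tests: huggingface://datasets/cais/mmlu        # pull test cases from a public dataset
defaultTest:
  assert:
    - type: contains
      value: "{{answer}}"
```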
<br>
Often, running an evaluation against a known public benchmark provides a more quantitative way to measure the precision loss in your local configuration. This can be especially useful when you are trying to squeeze the best possible performance out of limited hardware.
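As a toy illustration of that comparison (the scores below are made up), the precision loss is just the delta between two benchmark runs:

```python
# Hypothetical accuracy scores for illustration: the same benchmark run
# against a full-precision reference and a local (e.g. quantized) deployment.
baseline_accuracy = 0.712
local_accuracy = 0.688

precision_loss = baseline_accuracy - local_accuracy
relative_loss = precision_loss / baseline_accuracy

print(f"Absolute drop: {precision_loss:.3f}")   # Absolute drop: 0.024
print(f"Relative drop: {relative_loss:.1%}")    # Relative drop: 3.4%
```

If the relative drop is larger than you can tolerate, that is a signal to revisit your quantization settings or hardware configuration.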
---
## Conclusion
In this lab, we performed red team evaluations against a target model:
1. **Direct Prompt Injection** - We manually probed three layers of model protection, from a system prompt guardrail to an LLM evaluator.
2. **Promptfoo Red Team** - We used Promptfoo to programmatically test a wide set of mutated prompts against the model.
3. **Promptfoo Evaluation** - We used Promptfoo to benchmark the model against a popular public dataset, giving us a local point of comparison.
We should now have a better sense of what our next round of fine-tuning should focus on, and whether we need to explore additional protections for our model.