Lab 5 Moved to Unsloth

This commit is contained in:
c4ch3c4d3
2026-03-27 21:09:58 -06:00
parent d8882384f7
commit 882abccb65
3 changed files with 314 additions and 129 deletions
@@ -112,6 +112,15 @@ Locate, pull, and run **Qwen3.5 4B** using the **OpenWebUI**. By defualt, Op
<figcaption>Successful inference: the model returns a coherent answer.</figcaption>
</figure>
9. **Download Gemma3n e2B**
* While we're downloading models, let us download one more. You can either repeat the process from the previous steps to find and download **Gemma3n e2B**, or just use the following model tag to download the model via the Open WebUI search bar:
```bash
ollama pull gemma3n:e2b
```
Google designed the Gemma 3n models for efficient execution on resource-constrained devices such as laptops, tablets, phones, or an Nvidia 2080 Super GPU.
---
@@ -203,7 +212,11 @@ Feel free to continue to explore with other topics or images. Note how each tim
### Explore: Prompt Engineering & System Prompting
<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> As you explore chat via Open WebUI, ensure you turn <code>think (Ollama)</code> to OFF. <strong>Qwen3.5 4B</strong> is likely to enter an infinite thinking loop for these tasks otherwise, which will require a VM reboot.
<br><br>
Alternatively, perform these steps with <strong>Gemma3n e2B</strong>, which handles tight environments more gracefully.
</div>
Next, let's review different ways we can coax a model to perform better, without having to perform fine-tuning or parameter customization. We can do this by "priming" the model with our first prompt in a number of ways:
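For example, one way to prime a model programmatically is to send a system message ahead of the user prompt through Ollama's `/api/chat` endpoint. The sketch below assumes the `requests` package, a reachable Ollama host, and the `gemma3n:e2b` tag pulled earlier; adjust the host and model to your environment.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # or http://<STUDENT IP>:11434/api/chat

payload = {
    "model": "gemma3n:e2b",
    "stream": False,
    # "think": False,  # on recent Ollama versions, disables thinking for reasoning models
    "messages": [
        # The system message "primes" the model and shapes every later answer.
        {"role": "system", "content": "You are a terse SOC analyst. Answer in two sentences or fewer."},
        {"role": "user", "content": "Explain what a reverse shell is."},
    ],
}

response = requests.post(OLLAMA_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["message"]["content"])
```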
@@ -7,7 +7,7 @@
In this lab, we will:
* Explore public datasets
* Generate a dataset with Kiln.ai
* Fine-tune Gemma3 with Unsloth Studio
## Objective 1 Explore: Public Datasets
@@ -48,7 +48,7 @@ Navigate to [GSM8K](https://huggingface.co/datasets/openai/gsm8k). Much like ho
<figure style="text-align: center;">
<img
src="https://i.imgur.com/Y55FAPV.png"
width="600"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
@@ -73,10 +73,16 @@ Larger datasets, such as [Fineweb](https://huggingface.co/datasets/HuggingFaceFW
#### Open-weight vs. open-source
One last note on public datasets. A common misconception is that *open weight* models are **open source**.
<br>
- *Open-weight* models (e.g., Gemma, DeepSeek R1, Qwen) provide publicly released checkpoints but **do not** include permissive source-code licenses.
- True **open-source** LLMs remain rare; very few models freely share their dataset and training pipeline. Examples are **INTELLECT-2**, which was built via a distributed "SETI@Home-style" effort, and Nvidia's **Nemotron 3** family of models.
<br>
Unfortunately, **INTELLECT-2** does not compare favorably to existing *open weight* models such as **Gemma**, **DeepSeek R1**, **Qwen**, or other bleeding-edge models. **Nemotron 3** also lags behind the state-of-the-art (SOTA) models, but instead serves as a showcase of how anyone can train models using Nvidia hardware.
Regardless of model type, when using any *open weight* model for corporate purposes, review the license for allowed use!
<br>
@@ -84,7 +90,7 @@ Unfortunately, **INTELLECT2** does not favorably compare to existing *open we
## Objective 2: Synthetic Dataset Generation
If you can, I strongly encourage you to try to find ready-made, or easily massaged, datasets that do not require synthetic data. You'll often obtain better results with less effort this way. After all, the original frontier ChatGPT family of models merely scraped the entire internet, every book, scientific papers, and other "pre-made" raw data to help generate their first dataset. However, this is often unrealistic; at minimum, we need **1000** input-output pairs in order to begin fine-tuning, so...
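For concreteness, an "input-output pair" is just a prompt and its desired response. Here is a minimal sketch of what a few such pairs could look like written to a JSONL file; the field names and file name are illustrative only, not a required schema.

```python
import json

# Each JSONL line is one input-output pair (a single training example).
pairs = [
    {
        "instruction": "A malicious actor uses PowerShell to download a file from a remote server.",
        "output": "T1059.001 - PowerShell",
    },
    {
        "instruction": "Credential dumping is performed using Mimikatz.",
        "output": "T1003.001 - LSASS Memory",
    },
]

with open("mitre_pairs.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# A real fine-tune needs on the order of 1,000+ such lines.
```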
### Why Use Synthetic Data?
@@ -105,10 +111,14 @@ If you can, I strongly encourage you to try and find ready made, or easily massa
### 1. Install & Launch KilnAI ### 1. Install & Launch KilnAI
If you haven't yet, download [Kiln AI](https://github.com/Kiln-AI/Kiln) and run the installer for your OS. If you haven't yet, download [Kiln AI](https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1) and run the installer for your OS.
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> These steps were designed for <strong>Kiln v0.18</strong>. While compatible with newer versions, v0.18 features a polished, simplified UI ideal for this lab. Note that Kiln undergoes active development, with frequent UI changes across versions.
</div>
1. **Open Kiln**. It should automatically go to `http://localhost:3000` in your machine's browser.
2. Click **`Get Started`**.
<figure style="text-align:center;">
<img src="https://i.imgur.com/hJNehuE.png" width="400"
@@ -125,11 +135,15 @@ Kiln is now ready for configuration.
1. In Kiln's left-hand **Providers** panel, click **`Connect`** under the Ollama entry.
<div class="lab-callout lab-callout--warning">
Use your Ollama instance IP to connect (i.e., http://<STUDENT IP>:11434). You must be connected to the VPN for this to work.
</div>
<figure style="text-align:center;">
<img src="https://i.imgur.com/vEwUszl.png" width="600"
style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
<figcaption>Connect to a local or remote Ollama instance.</figcaption>
</figure>
2. Click **`Continue`** to confirm the connection.
@@ -157,7 +171,7 @@ Kiln is now ready for configuration.
1. Click **`Add Task`** and fill out the form with the details below.
* **Task name:** `ATT&CK Classification`
* **Goal:** "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'."
* **System prompt (auto-filled):** Kiln will prepend this text to every generation request.
<figure style="text-align:center;">
@@ -227,11 +241,11 @@ When you first open a project, Kiln lands on the **Run** page.
#### 7.2 Generate Top-Level Topics
1. Click **`Add Topics`**. This will generate top-level topics that follow broad MITRE ATT&CK categories.
2. Choose **`Gemma-3n-2B`**.
3. Set **Number of topics** to **8** and click **`Generate`**.
<figure style="text-align:center;">
<img src="https://i.imgur.com/SHh8v0y.png" width="400"
style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
<figcaption>Select model & number of topics.</figcaption>
</figure>
@@ -246,7 +260,7 @@ When you first open a project, Kiln lands on the **Run** page.
#### 7.3 Create Input Scenarios for All Topics
1. With the topics selected, click **`Generate Model Inputs`**. Ensure **`Gemma-3n-2B`** is still chosen, and then affirm your selection.
Kiln now asks the model to produce a short *scenario description* for each topic.
2. After the model finishes, review the generated inputs. You may edit any that look off.
@@ -292,24 +306,44 @@ When you first open a project, Kiln lands on the **Run** page.
---
## Objective 3: Fine-Tuning with Unsloth Studio
There are many popular options for performing fine-tunes, although many have their drawbacks:
* [Unsloth](https://unsloth.ai) is the most popular solution, but currently does not support multi-GPU setups without a commercial license.
* [Axolotl](https://axolotl.ai) is built on top of Unsloth and does support multi-GPU setups, but it often lags behind Unsloth in features and capability, and does not offer a web UI.
* [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) is the most flexible of these options, supporting both the Unsloth and Axolotl backends, as well as others. However, this tool is daunting for a beginner approaching fine-tuning, and is best left for later experimentation.
<br>
While I encourage you to explore all of these tools, they are unfortunately out of scope for this lab. Instead, we're going to focus on **Unsloth**, as it provides the best web UI for easily navigating the fine-tuning process.
### Explore: Touring Unsloth Studio
Although Unsloth Studio does its best to simplify the fine-tuning process, there are still many dials and knobs to turn! Let's take a brief tour of the most important options:
1. Model Selection - This area allows us to select any model that we're interested in fine-tuning. Unsloth Studio will handle downloading the FP16 version of the model from **HuggingFace** for us.
2. Quantization Selection - Without much better hardware, we will usually be training **LoRA**s (Low-Rank Adapters). These will slightly nudge the parameters of the model in the direction we're interested in. If we need additional headroom, we can instead **quantize the base model** (e.g., reduce its precision from 16-bit to 4-bit) and then apply **LoRA** to the quantized model, generating a **QLoRA** (Quantized LoRA). This approach combines the efficiency of quantization with the parameter-efficiency of LoRA. Unsloth will conveniently tell us its estimate for how well a given combination of *Model* & **QLoRA** will fit in our system's available VRAM.
<figure style="text-align: center;">
<img
src="https://i.imgur.com/XwAdaKJ.png"
width="800"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
Model & LoRA Type Selections. Note how models are labeled "OOM" or "Tight" based on hardware.
</figcaption>
</figure>
3. Dataset Selection - This is where we can utilize our custom-made dataset. Unfortunately, while we've gone through the process of making a dataset, we had to use a very small model to simulate the process. Conveniently, Unsloth allows us to search for any dataset available publicly on HuggingFace. We can select `sarahwei/cyber_MITRE_CTI_dataset_v15` for our purposes. You can select "View Dataset" if you'd like to see some of the raw contents of this data.
<figure style="text-align: center;">
<img
src="https://i.imgur.com/8xBdcnd.png"
width="400"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
Dataset Selection
</figcaption>
</figure>
4. Train Settings - This is where we can configure exactly how our model will be trained. The majority of these settings can stay at their defaults until you have a specific need that pushes you down the rabbit hole. In particular, we'll be interested in the following (a short code sketch after this list shows how these knobs map onto an Unsloth training run):
* **Learning Rate** - Controls how large an adjustment to the model's weights is made during each step
* **Epoch** - Determines the number of times the training algorithm will iterate over the entire dataset (by default, training repeats 3 times). Critical to help avoid under- or over-fitting.
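For reference, the sketch below is a rough outline of what Unsloth Studio assembles behind the scenes when you turn these knobs, written against the Unsloth Python API. Exact argument names vary across `unsloth`/`trl` versions, and the dataset's text column name is an assumption; treat this as an illustration rather than a drop-in script.

```python
from unsloth import FastLanguageModel  # import unsloth before transformers/trl
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model in 4-bit so the adapter we train becomes a QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach low-rank adapter (LoRA) layers; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("sarahwei/cyber_MITRE_CTI_dataset_v15", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumption: change to the dataset's actual text column
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-5,   # the "Learning Rate" knob (0.00005)
        warmup_steps=100,     # the "Warmup Steps" knob
        max_steps=100,        # for a real run, set num_train_epochs instead
        logging_steps=10,
    ),
)

trainer.train()
```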
@@ -319,84 +353,60 @@ Although LLaMa Factory does its best to simplify the fine tuning process, there
<figure style="text-align: center;">
<img
src="https://i.imgur.com/fzSvggY.png"
width="400"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
Fine Tuning Settings
</figcaption>
</figure>
### Execute: Unsloth Studio Fine-Tuning
Set the following before we start to fine-tune Gemma:
1. **Model**: `unsloth/gemma-3-270m-it`
2. **Max Steps**: `100` (NOTE: For real fine-tuning, use Epochs, not Steps.)
3. **Learning Rate**: `0.00005`
4. **Dataset**: `sarahwei/cyber_MITRE_CTI_dataset_v15`
5. **Warmup Steps**: `100`
* When done reviewing your settings, click `Start`. It will take some time for Unsloth Studio to start its process, as it will first need to download the full `FP16` base model files from HuggingFace.
<figure style="text-align: center;">
<img
src="https://i.imgur.com/fzSvggY.png"
width="400"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
Setting Max Steps, Learning Rate, and Warmup Steps
</figcaption>
</figure>
**Monitor the loss graph** | The graph is measuring **Loss** per **Training step** (roughly 8k steps, 2.5k examples * 3 epochs), or put simply, how different the model's predicted answer is from our data. This should gradually, logarithmically slope downward if training is stable.
#### What to Look For
- **Training Loss:** Decreasing smoothly → model is learning effectively and training is stable
- **Gradient Norm:** Drops then stabilizes → gradients are well-behaved (no major spikes)
- **Learning Rate:** Gradually increasing, then eventually decreasing → expected warmup behavior that helps stabilize early training
If the curves behave unexpectedly, you can stop the job, adjust the **learning rate** or **warmup steps**, and restart from the latest checkpoint.
<figure style="text-align: center;">
<img
src="https://i.imgur.com/Cue7afQ.png"
width="600"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
Typical Training Run
</figcaption>
</figure>
Unfortunately, due to the time constraints of a live classroom, we'll be unable to pursue this training run to completion. On the lab-provided GPUs, a full epoch could take up to two hours! Feel free to cancel it at your leisure.
We can, however, chat with a version of Gemma 3 4B that was trained before this class. It was trained against roughly 60,000 examples, partially generated using Kiln and partially harvested from various datasets throughout HuggingFace. While not perfect, we can see that the model is significantly better than the default.
<figure style="text-align: center;">
<img
src="https://i.imgur.com/FKZXaV3.png"
width="600"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
@@ -404,19 +414,15 @@ Once completed, we can scroll back up and
</figcaption>
</figure>
To test this ourselves, select:
1. The chat button at the very top of the screen.
2. Download our model. It's under my personal HuggingFace account name, c4ch3c4d3.
3. Set the system prompt to the one we selected when using **Kiln.ai** - "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'."
<figure style="text-align: center;">
<img
src="https://i.imgur.com/GHExjE3.png"
width="600"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
<figcaption style="margin-top: 8px; font-size: 1.1em; ">
@@ -424,58 +430,41 @@ Set the system prompt to the one we selected when using **Kiln.ai** - "Given a d
</figcaption>
</figure>
| Test Prompt | Expected Output Format |
|------------|------------------------|
| "A malicious actor uses PowerShell to download a file from a remote server." | `T1059.001 PowerShell` |
| "The adversary exfiltrates data via a compressed archive sent over HTTP." | `T1567.001 Exfiltration Over Web Services` |
| "Credential dumping is performed using Mimikatz." | `T1003.001 LSASS Memory` |
The Unsloth chat view is relatively simplistic, but it does provide options for changing inference parameters, such as Top-P or Temperature, as well as a location for us to input our system prompt. If we're looking to test the model's accuracy with our fine-tune, we normally need to ensure these values match the desired end-state values as closely as possible.
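If you later serve the fine-tuned model through Ollama (for example, after the GGUF export described below), you could script these same spot checks instead of typing them by hand. A minimal sketch, assuming the `requests` package and a hypothetical `gemma-mitre` model tag:

```python
import requests

SYSTEM = ("Given a description of an attack technique, tactic, or procedure, "
          "return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'.")

tests = {
    "A malicious actor uses PowerShell to download a file from a remote server.": "T1059.001",
    "The adversary exfiltrates data via a compressed archive sent over HTTP.": "T1567.001",
    "Credential dumping is performed using Mimikatz.": "T1003.001",
}

for prompt, expected_id in tests.items():
    r = requests.post(
        "http://localhost:11434/api/chat",  # replace with your Ollama instance
        json={
            "model": "gemma-mitre",  # hypothetical tag for the imported fine-tune
            "stream": False,
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": prompt},
            ],
        },
        timeout=120,
    )
    answer = r.json()["message"]["content"].strip()
    print(f"{'PASS' if expected_id in answer else 'FAIL'} | expected {expected_id} | got: {answer}")
```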
### Export the Fine-Tuned Model
<div class="lab-callout lab-callout--warning">
<strong>Skippable:</strong> These steps are provided for reference, as we never successfully finished a fine-tune within the lab time period.
</div>
1. Switch to the **Export** tab.
2. Select the training run of the model you've performed.
3. Select the latest checkpoint, or, if you'd like to explore an alternative, the desired checkpoint.
4. We can export in a number of formats:
- **Merged Model** - A BF16 `.safetensors` format of the model which can be utilized in other projects.
- **LoRA** - Only exports the LoRA adapter layers generated during training. Useful if we wish to share only our new files with other users who already have the base model downloaded, but not our fine-tune.
- **GGUF** - A compact file ready for import into **Ollama** or other GGUF-compatible runtimes.
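To illustrate the difference between these export formats, the sketch below shows how a merged export versus a LoRA-only export would typically be loaded with the Hugging Face `transformers` and `peft` libraries. The directory names are placeholders for a hypothetical export location.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Merged Model export: a standalone checkpoint that loads on its own.
merged = AutoModelForCausalLM.from_pretrained("./exports/gemma-mitre-merged")   # placeholder path
tokenizer = AutoTokenizer.from_pretrained("./exports/gemma-mitre-merged")

# LoRA-only export: just the adapter weights, applied on top of the original base model.
base = AutoModelForCausalLM.from_pretrained("unsloth/gemma-3-270m-it")
adapted = PeftModel.from_pretrained(base, "./exports/gemma-mitre-lora")         # placeholder path

# The GGUF export is not loaded here; it is imported into Ollama or another GGUF runtime instead.
```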
<br>
---
## Conclusion
In this lab, we completed a LoRA fine-tuning workflow:
1. **Dataset Generation** - We explored public datasets on HuggingFace and used Kiln AI to generate a synthetic dataset for MITRE ATT&CK classification.
2. **Fine Tuning** - We used Unsloth Studio to fine-tune a Gemma 3 model on a MITRE ATT&CK dataset.
3. **Validation & Export** - We tested the model with sample prompts and reviewed how to export the fine-tuned model in merged, LoRA, and GGUF formats.
If all has gone well, then the model should be much more accurate at identifying MITRE ATT&CK codes from user input scenarios. If not, additional experimentation may be necessary to produce a good fine-tune. Playing with the parameters we've discussed, improving and expanding our dataset, or even fine-tuning a larger or better base model can also help improve our success rate.
@@ -0,0 +1,183 @@
<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->
# Lab 6 - Evaluation and Red Teaming
In this lab, we will:
* Perform Prompt Injection against three layers of model protection
* Use PromptFoo to programmatically evaluate a model's security protections
## Objective 1 Explore: Red Teaming
Chunking is the first step in any RAG pipeline: the process of dividing our document into snippets that can then be stored within a database, paired with an embedded representation of that data. Because chunking occurs so early in the RAG process, the strategy chosen to create chunks of a document proves critical to the eventual embeddings that will be stored.
Successful chunking is hyper-specific to the kinds of documents we wish to chunk. In real, production-level RAG pipeline development, we'd likely execute a number of strategies against documents that we've analyzed for quality, bucketing them into various processing pipelines. However, we can at least get a rough idea of what effects chunking will have with a basic visualization.
First, ensure we've started our lab:
```bash
~/lab1/lab4_start.sh
```
And then, in a web browser, navigate to http://<STUDENT ASSIGNED SYSTEM IP>:3000. Once loaded, you should see the ChunkViz homepage.
<figure style="text-align: center;">
<a href="https://i.imgur.com/PG6fp1V.png" target="_blank">
<img
src="https://i.imgur.com/PG6fp1V.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
ChunkViz Default Page
</figcaption>
</figure>
<br>
Already, ChunkViz is populated with some example text. Additionally, the text has already been "chunked" according to a default, character-based splitting strategy. In this case, every 200 characters is considered one chunk. We can modify chunk sizes by playing with the "Chunk Size" and "Chunk Overlap" sliders. Try changing those to 256 & 20, respectively.
<figure style="text-align: center;">
<a href="https://i.imgur.com/9SDyh7I.png" target="_blank">
<img
src="https://i.imgur.com/9SDyh7I.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Chunk Size & Overlap
</figcaption>
</figure>
<br>
Note how the colors in the text below dynamically change. Each color is a single chunk, with the "green" between each unique color representing the overlap. This overlap helps to increase the likelihood that any given chunk will be properly selected.
Next, let's explore different chunking strategies. The major ones that we will cover are:
| Strategy | Description |
|---|---|
| Character Splitter | This default view splits text into chunks of a fixed number of characters. |
| Token Splitter | Split chunks based on their tokenization values (tokenization done by **tiktoken**). |
| Sentence Splitter | Split chunks into rough sizes based on the interpretation of what is a "sentence". |
| Recursive Character | Split chunks based on multiple possible separators, such as new lines (`\n`), periods (`.`), commas (`,`), or other relevant language section signifiers. |
Select each option, and observe some peculiarities in how ChunkViz breaks text into chunks.
<figure style="text-align: center;">
<a href="https://i.imgur.com/jWY4nOd.png" target="_blank">
<img
src="https://i.imgur.com/jWY4nOd.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Chunking Strategies
</figcaption>
</figure>
<br>
Each strategy comes with its own unique benefits and drawbacks. Character-based splitting is often one of the easiest strategies to implement, as all input, even OCR output, ultimately consists of text characters. Token-based splitting is useful when consistency in chunk size is imperative. Sentence & recursive splitting are often better for preserving "complete thoughts", as humans often, but not always, write in complete sentences.
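If you'd like to reproduce what ChunkViz is doing outside the browser, here is a minimal sketch using the `langchain-text-splitters` package (an assumption on my part; ChunkViz does not require it). The file path is a placeholder for the provided novel.

```python
from langchain_text_splitters import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    TokenTextSplitter,
)

# Placeholder path: the provided copy of the novel in txt format.
text = open("blindsight.txt", encoding="utf-8").read()

# Character splitter: fixed-size character windows with overlap (the default ChunkViz view).
char_chunks = CharacterTextSplitter(
    separator="", chunk_size=256, chunk_overlap=20
).split_text(text)

# Token splitter: chunk sizes are measured in tokens (tiktoken under the hood), not characters.
token_chunks = TokenTextSplitter(chunk_size=256, chunk_overlap=20).split_text(text)

# Recursive splitter: tries paragraph breaks, then line breaks, then spaces before cutting mid-word.
recursive_chunks = RecursiveCharacterTextSplitter(
    chunk_size=1024, chunk_overlap=64
).split_text(text)

print(len(char_chunks), len(token_chunks), len(recursive_chunks))
print(recursive_chunks[0][:200])  # peek at the first recursive chunk
```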
Let's explore one more facet of chunking, this time by looking at how chunking might present itself against a novel. Open your provided copy of "Blindsight" by Peter Watts, in txt format. Paste the contents into ChunkViz. Once again, play with the sliders (chunk sizes anywhere from 64 up to 1024) and strategies. Note how different chunk sizes split the novel in different ways.
<figure style="text-align: center;">
<a href="https://i.imgur.com/M51ASNK.png" target="_blank">
<img
src="https://i.imgur.com/M51ASNK.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Chapter 1 - 1024 Chunks, Recursive Character. This strategy nicely breaks paragraphs up.
</figcaption>
</figure>
<br>
Imagine how precise, or how difficult, finding specific sets of information may be depending on chunk size!
## Objective 2 Explore: Embedding Space
Now that we've seen some of the different trade-offs when chunking, we can move to the next major step of a RAG pipeline: embedding. As discussed during lecture, embedding is the process of converting text into a numerical representation that captures the "meaning" of the content. Instead of treating text as raw strings, embedding models map each chunk into an N-dimensional space where semantically similar content lands closer together.
This allows the system to perform similarity search efficiently: when a user submits a query, the query is also embedded into the same vector space, and the system retrieves the chunks whose embeddings are closest to it. This is in contrast to how embedding vectors are utilized within an LLM itself, i.e., for attention and transformation via the feed-forward network. In short, this step is what enables a RAG system to move beyond simple keyword matching and instead retrieve information based on meaning and context.
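As a small illustration of query-time similarity search, the sketch below embeds a few chunks and a query with the `sentence-transformers` package and ranks them by cosine similarity. The model name and example chunks are arbitrary choices for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works here

chunks = [
    "The adversary used PowerShell to download a second-stage payload.",
    "Credentials were dumped from LSASS memory using Mimikatz.",
    "Data was archived with 7-Zip and exfiltrated over HTTPS.",
]

# Embed the chunks once (this is the "index"), then embed the query at search time.
chunk_vecs = model.encode(chunks, convert_to_tensor=True)
query_vec = model.encode("How did the attacker steal passwords?", convert_to_tensor=True)

# Cosine similarity: higher means semantically closer in the embedding space.
scores = util.cos_sim(query_vec, chunk_vecs)[0]
best = scores.argmax().item()
print(f"Best match ({scores[best].item():.2f}): {chunks[best]}")
```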
Let's explore a real embedding space. Navigate to http://<STUDENT ASSIGNED SYSTEM IP>:5055. Here, we've started a project called Embedding Atlas. Embedding Atlas is a tool that provides interactive visualizations for datasets in parquet format. Each "chunk" in this case is one row in the dataset. It allows us to visualize, cross-filter, and search embeddings and metadata in an interactive, manual way.
<figure style="text-align: center;">
<a href="https://i.imgur.com/8PvcZBP.png" target="_blank">
<img
src="https://i.imgur.com/8PvcZBP.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Embedding Atlas Flow Diagram
</figcaption>
</figure>
<br>
The lab4_start.sh script will have automatically started Embedding Atlas, as well as performed embedding against each "Scenario" in our dataset. Scenarios in this case are 1-3 sentence snippets describing an action taken by an attacker.
<figure style="text-align: center;">
<a href="https://i.imgur.com/9bGQce8.png" target="_blank">
<img
src="https://i.imgur.com/9bGQce8.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Embedding Atlas CLI (Backend, EXAMPLE ONLY)
</figcaption>
</figure>
<br>
Our Embedding Atlas has already been pre-loaded with the main dataset we'll be using throughout the rest of today. Specifically, this is a dataset that matches "hacker scenarios" with MITRE ATT&CK Tactic, Technique, and Procedure (TTP) IDs. If you're unfamiliar with ATT&CK, it is primarily a project that attempts to categorize and organize the possible ways a hacker might attempt to execute malware, pivot throughout a network, and eventually act on their objectives (often ransomware). ATT&CK also provides us with a rich example and corpus of data that we can use to visualize the embedding process.
To help us visualize groups more clearly, before we start, please be sure to select "TTP_Name" from the dropdown in the top left.
<figure style="text-align: center;">
<a href="https://i.imgur.com/996ukgZ.png" target="_blank">
<img
src="https://i.imgur.com/996ukgZ.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
TTP_Name Grouping
</figcaption>
</figure>
<br>
Each color is a semantically similar concept, as defined by the embeddings generated during text processing. We can dynamically explore this embedding space through a few options:
1. Select the text categories on the right side. This will visually show only entries that match that category's organization
2. Alternatively, select any of the categories in the column on the right. This will perform the same function, exclusively showing only entries for the relevant ID
Note: You can use your mouse wheel to zoom in and out. Additionally, click and drag the map around with your left click to center areas you deem of interest.
<figure style="text-align: center;">
<a href="https://i.imgur.com/YkSqT4v.png" target="_blank">
<img
src="https://i.imgur.com/YkSqT4v.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Single Visible Category - System Information Discovery
</figcaption>
</figure>
<br>
Explore how the various categories naturally cluster together within the embedding space. If we were to use this embedding space as part of a RAG pipeline, our query could be embedded in a similar manner, and the semantically similar ideas within our dataset would be surfaced back to us.
Let's visualize similarity in one other way:
3. Select any single dot, and click "Nearest Neighbor". Embedding Atlas will show us the specific datapoints that embed the closest to our selected datapoint. Notice how some of the nearest datapoints appear very distant! Think about why this might be; we'll discuss it during the lab review.
<figure style="text-align: center;">
<a href="https://i.imgur.com/zKa6GxD.png" target="_blank">
<img
src="https://i.imgur.com/zKa6GxD.png"
style="width: 50%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Nearest Neighbors
</figcaption>
</figure>
<br>
If you'd like to continue to explore alternative datasets and see how embeddings can flexibly cluster raw data, feel free to take a look at [Embedding Atlas' Examples Page](https://apple.github.io/embedding-atlas/examples/). In particular, take a look at the Wine dataset until class resumes.
## Objective 3 Explore: Full RAG Exploration