New Lab 2
---
order: 1
title: Lab 1 - Visualizing LLMs in TransformerLab
description: Explore model structure, tokenization, and next-token prediction inside TransformerLab.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 1 - Visualizing LLMs in TransformerLab

In this lab, we will:

* Download and visualize Llama-3.2-1B-Instruct
* Visualize tokenization & prediction with Llama-3.2-1B-Instruct

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> steps focus on investigation and comparison.<br />
<strong>Execute</strong> steps require performing actions in the lab environment.
</div>

## Objective 1: Starting TransformerLab

### Execute: Access the Lab Environment

<br>

### Explore: Inspect the Architecture View

To start, let's navigate to the **Interact** page, and then select **Model Architecture** from the Chat drop-down.

<figure style="text-align: center;">
<a href="https://i.imgur.com/X0CM31h.png" target="_blank">
<img src="https://i.imgur.com/X0CM31h.png" width="800" style="border:5px solid black;">
</a>
</figure>
<br>

This page allows us to visualize the actively loaded model, in this case our downloaded `Llama-3.2-1B-Instruct`. This interactive view is equivalent to the greatly simplified version shown on the slide “Transformation: Multilayer Perceptron” from our lecture. We can explore this view by:

* Holding down both right and left mouse buttons and dragging to move the entire model.
* Holding down just the left mouse button to rotate the view.

<figure style="text-align: center;">
<a href="https://i.imgur.com/8hXTGlt.png" target="_blank">
<img src="https://i.imgur.com/8hXTGlt.png" width="800" style="border:5px solid black;">
</a>
</figure>
<br>

You have likely also noticed that the colors repeat. Each set of repeating **layers** is organized into **blocks**. Each **block** is a grouping of **layers** that perform the same functions, but with a slightly different focus. For example, one **block** may focus on nouns, another on adjectives, and so on.

The **layers** within Llama 3.2 1B are as follows:

<ul class="concept-pill-list">
<li>
<span class="concept-pill-label">Attention:</span>
</li>
</ul>

Each of these **layers** also has a different type, corresponding to Q, K, V, and much more. The **layers** between the small “Attention” **layers** are all considered to make up a single “block.” To the side, we can see the actual number values of each weight within each layer.

Fundamentally, the LLM itself is this stack of numbers. Those numbers allow us to transform tokenized input (such as English) into a useful output. The more **layers** and **blocks** a model has, the bigger it is, and the more accurate and “intelligent” it will behave. This 1B-parameter model is incredibly small, however, so the “truthfulness” of generated predictions is likely to be suspect (i.e., hallucinated). The model will at least sound very confident!

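The "stack of numbers" idea can be sketched in a few lines of code: a final layer of weights produces one score (logit) per vocabulary token, and a softmax turns those scores into next-token probabilities. The vocabulary and logit values below are invented purely for illustration:

```python
import math

# Toy vocabulary and made-up logits (raw scores) for the next token.
vocab = ["fox", "dog", "jumps", "the"]
logits = [1.2, 0.3, 3.1, 0.8]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
prediction = vocab[probs.index(max(probs))]
print(prediction)  # the highest-scoring token wins
```

Real models do this over vocabularies of ~128,000 tokens, but the mechanism is the same.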
### Explore: Continue Exploring TransformerLab Features

Please continue to explore TransformerLab until you’re ready to move on. While we will utilize many tools other than TransformerLab throughout this course due to its beta nature, this software is improving all the time and is worth watching! TransformerLab supports many advanced features in various stages of development, such as:

* Batch Text Generation
* LLM Fine-Tuning
* LLM Evaluation
* Retrieval-Augmented Generation (RAG)

We will discuss these topics and more throughout the course.

<br>

---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 2 - LLaMa.cpp, Ollama & Quantization

In this lab, we will:

* Download a model from huggingface.co and quantize it for llama.cpp
* Download a model from huggingface.co and infer it in llama.cpp
* Download a model from ollama.com
* Download a custom model from huggingface.co
* Import a custom model into Ollama

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on investigation and comparison.<br />
<strong>Execute</strong> sections require running commands and producing output.
</div>

To start this lab, you'll need CLI access:

* SSH - `<IP>:22`
* All necessary artifacts are in the lab2 folder

## Objective 1: HuggingFace & LLaMa.cpp

### 1. What Is LLaMa.cpp?

LLaMa.cpp is an open-source project created to enable efficient running of Meta's LLaMA (Large Language Model Meta AI) family of large language models on consumer-grade hardware. It was initially developed by **Georgi Gerganov** in early March 2023, shortly after Meta released the weights of the LLaMA models to approved researchers.

The project’s original goal was to make LLaMA models accessible on systems without powerful GPUs, including laptops, desktops, and even mobile devices. **LLaMa.cpp** achieves this by implementing LLaMA inference in pure C/C++ and introducing highly efficient quantization techniques, allowing models to run with drastically reduced memory requirements. **LLaMa.cpp** is also the underlying engine behind a number of inference wrappers and technologies, such as Llamafile, LM Studio, and Ollama, amongst many others.

### Key Features

| Capability | Why it matters |
|------------|----------------|
| **Efficient local inference** | Runs large language models without a powerful GPU. |
| **Quantization tools** (`llama-quantize`) | Shrinks model size (down to 1-bit) while preserving usable performance. |
| **Model conversion to .GGUF** | Provides a compact, fast-loading format that works with Ollama, LM Studio, and other wrappers. |
| **Cross-platform support** | Works on Linux, macOS, Windows, Apple Silicon, and ARM devices. |
| **CLI and debugging utilities** (`llama-cli`, `gguf-dump.py`) | Enables quick interactive testing and inspection of model metadata. |
| **Perplexity measurement** (`llama-perplexity`) | Quantifies how confident the model is about its predictions. |
| **Active community** | Powers tools such as LM Studio, Llamafile, and Ollama. |

---

## 1.2 Explore: HuggingFace - Model Cards

[HuggingFace](https://huggingface.co) is the “GitHub” for LLMs, datasets, and more. The following steps walk you through locating Meta’s **LLaMA‑3.2‑1B** model card and its files.

1. **Open the LLaMA‑3.2‑1B page**
<https://huggingface.co/meta-llama/Llama-3.2-1B>
<br>
2. **Read the model card** – note the description, license, tags (e.g., *Text Generation*, *SafeTensors*, *PyTorch*), and links to fine‑tunes/quantizations.
<br>
3. **Navigate to “Quantizations.”**
This tab lists community‑created quantizations, including GGUF, GPTQ, AWQ, and EXL3 versions. Common providers include **Bartowski**, **Unsloth**, and **NousResearch**, although these players change periodically. Additionally, note that we can often download quantized versions *without* having agreed to the Meta license restrictions for the original model.

<figure style="text-align:center;">
|
||||
<a href="https://i.imgur.com/Po0Ll3o.png" target="_blank">
|
||||
<img src="https://i.imgur.com/Po0Ll3o.png" width="800" style="border:5px solid black;">
|
||||
</a>
|
||||
<figcaption>Model Card Quantizations Convenience Link</figcaption>
|
||||
</figure>
|
||||
<br>
|
||||
|
||||
<figure style="text-align:center;">
|
||||
<a href="https://i.imgur.com/NM1rbXV.png" target="_blank">
|
||||
<img src="https://i.imgur.com/NM1rbXV.png" width="800" style="border:5px solid black;">
|
||||
</a>
|
||||
<figcaption>Model Quantization Options</figcaption>
|
||||
</figure>
|
||||
|
||||
|
||||
4. **Open “Files and versions.”**
Here you see the raw `.safetensors` files (the un‑quantized checkpoint). For the model to run successfully, the full set of files needs to be loaded into system memory. Note how this 1B‑parameter model is small enough to fit comfortably in a phone’s memory, even raw.

<figure style="text-align:center;">
<a href="https://i.imgur.com/6I9zkeu.png" target="_blank">
<img src="https://i.imgur.com/6I9zkeu.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Distribution Restriction</figcaption>
</figure>

Unless you've accepted Meta's EULA for this model, you'll be unable to download the model directly from Meta. This view may or may not appear based on your own HuggingFace account.

## 1.3 Explore: HuggingFace - Find and Download WhiteRabbitNeo

For this lab we will work with **WhiteRabbitNeo‑V3‑7B**, a cybersecurity‑oriented fine‑tune of Qwen2.5‑Coder‑7B. This model is less popular than LLaMA-3.2, and if we'd like to run this model in Ollama, we'll need to perform our own quantization.

<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> Although the next two steps show how to find and download this model so you can replicate the process, support files are already provided in <code>/home/student/lab2/WhiteRabbitNeo</code> to speed up lab execution.
</div>

### 1. Locate the Model

1. Go to <https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B>.
2. Points of interest on this model card:
   1. This model appears to be a fine-tune of **Qwen2.5-Coder-7B**.
   2. This model is openly licensed, and does not have any requirements to download and use for our purposes.
   3. This model is in **Safetensors** format, which is compatible with **LLaMa.cpp**'s quantization tools.

<figure style="text-align:center;">
<a href="https://i.imgur.com/9GrHRuh.png" target="_blank">
<img src="https://i.imgur.com/9GrHRuh.png" width="800" style="border:5px solid black;">
</a>
<figcaption>WhiteRabbitNeo model card.</figcaption>
</figure>

3. Click **Files and versions** → review the `.safetensors` checkpoints (≈ 15 GB at **FP16**).

<figure style="text-align:center;">
<a href="https://i.imgur.com/Emx97nL.png" target="_blank">
<img src="https://i.imgur.com/Emx97nL.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Model safetensors (size ≈ 15 GB).</figcaption>
</figure>

### 2. Download the Model

To prepare this model, create a working folder anywhere on your system. Once created, perform the following:

1. Ensure you have git & git-lfs installed to enable successful cloning from HuggingFace. If necessary, they can be installed on Debian-based distributions via:

```bash
sudo apt install git git-lfs
git lfs install
```

2. Clone the model:

```bash
git clone https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B
```

### 3. Execute: Convert the Downloaded Model

**LLaMa.cpp** makes it easy for us to package models downloaded in SafeTensors format into GGUF. We can convert the model with the following official project script command:

```bash
convert_hf_to_gguf.py /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B/WhiteRabbitNeo-V3-7B --outfile /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```

### 4. Execute: Review Model Metadata

When these steps have completed, you should see a new WhiteRabbitNeo-V3-7B.gguf file. We have not yet quantized the model, merely converted it to a format usable by **LLaMa.cpp** for the next steps. We can confirm this process was successful by using the **gguf-dump** utility packaged with **LLaMa.cpp**.

Run the following command:

```bash
gguf-dump /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```

We should then see:

<figure style="text-align: center;">
<a href="https://i.imgur.com/JiX2fJM.png" target="_blank">
<img
src="https://i.imgur.com/JiX2fJM.png"
width="800"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Model Metadata.
</figcaption>
</figure>
<br>

The output is a text listing of all of the model's tensors and the precision of each. Because we have merely converted the model's format, and not performed quantization, the model is still in **FP16**.

* This is a text view of the previous graphical view we saw in **Lab 1, Objective 2: Visualizing a LLM**. While **TransformerLab** calls tensors **layers**, terms such as **tensors**, **layers**, and **blocks** can all be used semi-interchangeably, depending on the tool in question. We will further confuse these topics when we get to the Ollama objective below.
* Pedantically, the proper definitions are:
  * Tensor - A multi-dimensional array used to store data
  * Layer - A base computational unit in a neural network
  * Block - A collection of layers
* If you wish to explore this view, note how the block count of 28 matches the 28 zero-indexed blk groups output from the dump.
* Additionally, you'll once again note that we have various biases and weights, but they still line up with **Q**, **V**, and **K** as discussed in the previous section. There are additional tensors for **normalization** and **output**.

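Under the hood, `gguf-dump` starts by reading a small binary header at the front of the file. Per the GGUF specification, the file opens with the magic bytes `GGUF`, a version, a tensor count, and a metadata key-value count. A minimal stdlib sketch of parsing just that fixed header (field names in the returned dict are our own):

```python
import struct

def read_gguf_header(raw: bytes):
    """Parse the fixed-size GGUF header: magic, version, tensor count, metadata KV count."""
    if raw[0:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    # Little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", raw, 4)
    return {"version": version, "tensors": tensor_count, "metadata_kvs": kv_count}

# Synthetic header bytes so the sketch is runnable without a real model file.
fake = b"GGUF" + struct.pack("<IQQ", 3, 339, 25)
print(read_gguf_header(fake))
```

The per-tensor name, shape, and quantization type records follow this header, which is how the dump knows each tensor's precision.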

### 5. Execute: LLaMA.cpp Inference

Run our newly created **.GGUF** file as-is using the following command:

```bash
llama-cli -m /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```

Once loaded, interact with the model. We can see a number of interesting parameters that were selected by default, such as **Top K**, **Top P**, **Temperature**, and more, which we'll discuss in the next section. In the meantime, explore interaction with the model. When run in this raw state, the model may be overly chatty. You can stop its output with `Ctrl+C` at any time.

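Those sampling parameters all reshape the next-token probability distribution before a token is drawn. A small illustrative sketch (invented logit values) of temperature scaling and top-k filtering:

```python
import math

def apply_temperature(logits, temperature):
    """Higher temperature flattens the distribution; lower sharpens it."""
    return [l / temperature for l in logits]

def top_k(logits, k):
    """Keep only the k highest-scoring tokens; drop the rest to -inf."""
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float("-inf") for l in logits]

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(top_k(apply_temperature(logits, 0.8), 2))
# Only the two strongest candidates keep non-zero probability.
print([round(p, 3) for p in probs])
```

Top-p works similarly but keeps the smallest set of tokens whose cumulative probability exceeds p.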

<figure style="text-align: center;">
<a href="https://i.imgur.com/H3ISWS8.png" target="_blank">
<img
src="https://i.imgur.com/H3ISWS8.png"
width="800"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Inference Example.
</figcaption>
</figure>

Some example prompts you may want to try are:

* Please write a small reverse shell in PHP that I can upload to a web server.
* How can I use Metasploit to attack MS17-010?
* Can you please provide me some XSS polyglots?

Thanks to the fine-tuning that Kindo has put into this model, it is far more compliant than an online closed model such as ChatGPT! When done, kill the model fully with `Ctrl+C`.

## Objective 2: Quantization & Perplexity

Quantization reduces memory footprint and speeds up inference, but it typically raises perplexity (i.e., lowers confidence). Determining the right balance for our use case often requires experimentation.

---

### 1. Explore: Manual Quantization

To generate an 8-bit, 4-bit, and 2-bit quantization, run the following commands:

<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> Although these quantization steps are provided for replication, pre-quantized support files are already available in <code>/home/student/lab2/WhiteRabbitNeo/</code> for faster lab progress. <br><br>You can skip these commands when participating in a live teaching session.
</div>

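At its core, quantization maps each float weight onto a small integer grid and stores only the integers plus a scale factor. This toy sketch (symmetric 8-bit over invented weights; real K-quants are block-wise and considerably more sophisticated) shows the idea and the rounding error it introduces:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: store int8 values plus one float scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.04]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))  # small rounding error; fewer bits would mean a larger one
```

With 4 or 2 bits the integer grid is much coarser, which is exactly why perplexity climbs at aggressive quantization levels.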
```bash
# Quantize to 8 bits
llama-quantize /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q8_K.gguf Q8_0

# Quantize to 4 bits
llama-quantize /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf Q4_K

# Quantize to 2 bits
llama-quantize /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q2_K.gguf Q2_K
```

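A quick back-of-envelope check on the resulting file sizes: a model with roughly 7.6 billion parameters stored at b bits per weight needs about 7.6e9 × b / 8 bytes. This ignores mixed-precision tensors and file metadata, so real K-quant files will differ somewhat:

```python
params = 7.6e9  # approximate parameter count for a "7B" model

def est_gb(bits_per_weight):
    """Rough file size in GB assuming a uniform bits-per-weight."""
    return params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit ≈ {est_gb(bits):.1f} GB")
```

Note the 16-bit estimate lands near the ≈ 15 GB safetensors checkpoint we saw on the model card.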
### 2. Execute: Quantization Confirmation

Inspect the quantized files with the following command:

```bash
gguf-dump /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf
```

Review how the various layers are quantized to different levels of precision. It turns out that even K quants actually utilize multiple quantization levels on different tensor layers to improve performance!

<figure style="text-align: center;">
<a href="https://i.imgur.com/kur4TPj.png" target="_blank">
<img
src="https://i.imgur.com/kur4TPj.png"
style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em; color: var(--text-color);">
WhiteRabbitNeo Layer 0.
</figcaption>
</figure>
<br>

<details>
<summary style="font-weight:bold; color:#a94442; cursor:pointer;">
Full explanation for the brave...
</summary>

### What each Tensor Layer does

### **1. Token Embeddings**
- **Tensor 1: `token_embd.weight`**
  - **Responsibility:** Maps each token in the vocabulary to a dense vector of size 3584.

---

### **2. Layer Normalization**
- **Tensor 2: `blk.0.attn_norm.weight`**
  - **Responsibility:** Scales the normalized inputs to the self-attention mechanism in the first block.

- **Tensor 6: `blk.0.ffn_norm.weight`**
  - **Responsibility:** Scales the normalized outputs of the feed-forward network (FFN) in the first block.

---

### **3. Feed-Forward Network (FFN)**
- **Tensor 3: `blk.0.ffn_down.weight`**
  - **Responsibility:** Projects the intermediate representation from dimension 18944 back down to 3584 (the FFN down-projection).

- **Tensor 4: `blk.0.ffn_gate.weight`**
  - **Responsibility:** Projects the input from dimension 3584 to 18944 and, after a non-linear activation, gates the up-projection.

- **Tensor 5: `blk.0.ffn_up.weight`**
  - **Responsibility:** Projects the input from dimension 3584 up to 18944 (the FFN up-projection).

---

### **4. Self-Attention Mechanism**
#### **Key Projection**
- **Tensor 7: `blk.0.attn_k.bias`**
  - **Responsibility:** Adds a learnable offset to the key vectors in the self-attention mechanism.

- **Tensor 8: `blk.0.attn_k.weight`**
  - **Responsibility:** Projects the input to dimension 512 for key vectors in the self-attention mechanism.

#### **Query Projection**
- **Tensor 10: `blk.0.attn_q.bias`**
  - **Responsibility:** Adds a learnable offset to the query vectors in the self-attention mechanism.

- **Tensor 11: `blk.0.attn_q.weight`**
  - **Responsibility:** Projects the input to dimension 3584 for query vectors in the self-attention mechanism.

#### **Value Projection**
- **Tensor 12: `blk.0.attn_v.bias`**
  - **Responsibility:** Adds a learnable offset to the value vectors in the self-attention mechanism.

- **Tensor 13: `blk.0.attn_v.weight`**
  - **Responsibility:** Projects the input to dimension 512 for value vectors in the self-attention mechanism.

#### **Attention Output Projection**
- **Tensor 9: `blk.0.attn_output.weight`**
  - **Responsibility:** Projects the concatenated attention outputs back to dimension 3584 before the residual connection.

---

### **Summary by Purpose**
- **Token Embeddings:** Maps tokens to dense vectors.
- **Layer Normalization:** Scales normalized inputs/outputs in attention and FFN blocks.
- **Feed-Forward Network (FFN):** Handles down-projection, gating, and up-projection for non-linear transformations.
- **Self-Attention Mechanism:** Manages key, query, value projections, biases, and output projection for attention computations.

</details>

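The 3584 / 512 split above comes from grouped-query attention (GQA): many query heads share a smaller set of key/value heads. Assuming the commonly published Qwen2.5-7B configuration (28 query heads, 4 KV heads, head dimension 128), the arithmetic works out as:

```python
head_dim = 128
query_heads = 28
kv_heads = 4  # key/value heads shared across groups of query heads

q_dim = query_heads * head_dim   # matches attn_q.weight's output dimension
kv_dim = kv_heads * head_dim     # matches attn_k / attn_v's output dimension
print(q_dim, kv_dim)  # 3584 512
```

Sharing KV heads this way shrinks the KV cache substantially with little quality loss.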
### 3. Execute: Quantitatively Measuring Perplexity

Perplexity is a measurement of how confident the model is about its next-token predictions. Counterintuitively, lower values indicate higher confidence. By asking the model to infer a relatively large input (minimum 1024 tokens), we can generate an average perplexity score to gauge the model's confidence.

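Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each actual next token. A tiny illustration with made-up token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability; 1.0 would mean perfect confidence."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

confident = [0.9, 0.8, 0.95, 0.85]   # model usually assigns high probability to the true token
uncertain = [0.2, 0.1, 0.3, 0.25]    # model is frequently surprised
print(round(perplexity(confident), 3), round(perplexity(uncertain), 3))
```

A perplexity of N can be read loosely as "the model was as surprised as if it were choosing uniformly among N tokens."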
```bash
# Perplexity test with FP16 model
llama-perplexity -m /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf -f /home/student/lab2/wiki.test.raw 2>&1 | grep Final

# Perplexity test with 8-bit quantized model
llama-perplexity -m /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q8_K.gguf -f /home/student/lab2/wiki.test.raw 2>&1 | grep Final

# Perplexity test with 4-bit quantized model
llama-perplexity -m /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf -f /home/student/lab2/wiki.test.raw 2>&1 | grep Final

# Perplexity test with 2-bit quantized model
llama-perplexity -m /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q2_K.gguf -f /home/student/lab2/wiki.test.raw 2>&1 | grep Final
```

#### Possible Example Results

| Model File | Quantization | Perplexity (PPL) | Uncertainty (+/-) |
|------------|--------------|------------------|-------------------|
| WhiteRabbitNeo-V3-7B.gguf | Full | 3.0972 | 0.21038 |
| WhiteRabbitNeo-V3-7B-Q8_K.gguf | Q8_K | 3.0999 | 0.21052 |
| WhiteRabbitNeo-V3-7B-Q4_K_M.gguf | Q4_K_M | 3.1247 | 0.21338 |
| WhiteRabbitNeo-V3-7B-Q2_K.gguf | Q2_K | 3.5698 | 0.25224 |

**Conclusion: Perplexity rises modestly from FP16 → Q8_K → Q4_K_M, but jumps sharply for the aggressive 2‑bit quantization.**

### 4. Execute: Qualitatively Measuring Perplexity

We can also validate these measurements simply by interacting with the models. To more easily showcase the cost of aggressive quantization, infer with the 2-bit (**Q2_K**) model to see how poorly it performs compared to our **FP16** interactions from earlier.

```bash
llama-cli -m /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q2_K.gguf
```

**Explore:** Re-run the previous example prompts:

* Please write a small reverse shell in PHP that I can upload to a web server.
* How can I use Metasploit to attack MS17-010?
* Can you please provide me some XSS polyglots?

<div style="display: flex; justify-content: center; align-items: flex-start; gap: 32px;">
<div style="text-align: center;">
<a href="https://i.imgur.com/nvb7QV6.png" target="_blank">
<img
src="https://i.imgur.com/nvb7QV6.png"
style="width: 90%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<div style="margin-top: 8px; font-size: 1.1em;">
Q2_K Inference
</div>
</div>
<div style="text-align: center;">
<a href="https://i.imgur.com/yNHQbxb.png" target="_blank">
<img
src="https://i.imgur.com/yNHQbxb.png"
style="width: 90%; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<div style="margin-top: 8px; font-size: 1.1em;">
FP16 Inference
</div>
</div>
</div>

What conclusions do you believe we can make based on the provided output of the model?

---

## Objective 3: Ollama – LLM Easymode

Ollama is a lightweight framework that hides the low‑level steps required by LLaMa.cpp. It runs on **Linux, macOS, and Windows** and automatically manages system resources.

| Feature | Benefit |
|---------|---------|
| **Simplified model deployment** | Pull pre-quantized models from Ollama.com, HuggingFace, or a local GGUF file with a single command. |
| **Automatic resource handling** | No need to manually load or unload; Ollama frees memory after a short idle period. |
| **Built-in API provider** | `localhost:11434` mimics the OpenAI API, enabling seamless integration with notebooks, VS Code, or curl. |
| **Cross-platform compatibility** | Thanks to the underlying llama.cpp architecture, works on x86_64, ARM, and Apple Silicon without extra configuration. |
| **Model-metadata inspection** | `ollama show <tag>` reveals the model architecture, context length, and quantization level. |

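Because the API on `localhost:11434` follows the OpenAI chat-completions convention, any OpenAI-style client can talk to it. A minimal stdlib sketch that assembles such a request; the endpoint path and payload shape follow that convention, and a running Ollama server is assumed for the commented-out call:

```python
import json
import urllib.request

def build_chat_request(model, prompt, host="http://localhost:11434"):
    """Assemble an OpenAI-style chat-completions request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("llama3.2", "Why is the sky blue?")
print(req.full_url)
# With Ollama running, sending the request returns a chat completion:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```

This is what lets notebooks and IDE plugins treat a local model as a drop-in OpenAI replacement.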
### 1. Execute: Pull and Run a Pre-Built Model from Ollama.com

Let's start by downloading Meta's llama3.2-3b, the "big" brother to the small model we've continuously worked with so far. The Ollama project and community have made this exceptionally easy for us to accomplish.

1. **Open the Ollama registry** – visit <https://ollama.com> in your browser.
2. **Search for the model**

<figure style="text-align: center;">
<a href="https://i.imgur.com/VBvOGty.png" target="_blank">
<img
src="https://i.imgur.com/VBvOGty.png"
style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Ollama Search.
</figcaption>
</figure>
<br>

3. **Copy the `ollama run` command** that appears in the top‑right corner of the model card.
4. **Paste the command into your terminal** and press **Enter**:

```bash
ollama run llama3.2
```

<figure style="text-align: center;">
<a href="https://i.imgur.com/ammtbmI.png" target="_blank">
<img
src="https://i.imgur.com/ammtbmI.png"
style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Ollama Run command.
</figcaption>
</figure>
<br>

### 2. Explore: Interacting with Ollama Inference

When finished, you will be presented with a prompt, similar to the `llama-cli` commands from earlier. No need to download, convert, or quantize! Feel free to interact with this model until you're ready to move on.

<figure style="text-align: center;">
<a href="https://i.imgur.com/XZ6OYNI.png" target="_blank">
<img
src="https://i.imgur.com/XZ6OYNI.png"
style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Ollama Inference.
</figcaption>
</figure>
<br>

### 3. Execute: Pull and Run a Pre-Built Model from HuggingFace.com

Similarly, we can pull a model directly from **HuggingFace**. As long as the source file is a .gguf of any quantization level that fits within our system memory, Ollama can fetch it directly.

1. **Select the Quantized Model from Objective 1** – visit [CodeIsAbstract](https://huggingface.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF) in your browser.
2. **Use this model** – Click **Use this model** → choose the **Ollama** tab. The page displays a ready‑to‑run command:

<figure style="text-align: center;">
<a href="https://i.imgur.com/lg2INAs.png" target="_blank">
<img
src="https://i.imgur.com/lg2INAs.png"
style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
HuggingFace Direct Ollama Pull.
</figcaption>
</figure>
<br>

3. **Copy the command** and execute it in your terminal.

```bash
ollama run hf.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF:Q8
```

4. **Explore:** Interact with the model as normal.

### 4. Execute: Load a Custom `.gguf` Model

We can also import our WhiteRabbitNeo **.GGUF** model into Ollama without having to upload it to **HuggingFace** first. To do so, we need to create a **Modelfile**, a plain-text configuration file that tells **Ollama** where the **.GGUF** is located, as well as any additional defaults we'd like Ollama to use when performing inference.

1. **Create a simple Modelfile** – This will tell Ollama where the model lives.

```bash
echo "FROM /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf" > Modelfile
```

2. **Register the model with Ollama**

```bash
ollama create WhiteRabbitNeo -f Modelfile
```

3. **Run the newly registered model**

```bash
ollama run WhiteRabbitNeo
```

4. **Explore:** The model is now stored locally under the tag *WhiteRabbitNeo* and can be invoked just as any other model.

<figure style="text-align: center;">
<a href="https://i.imgur.com/ijsAl6m.png" target="_blank">
<img
src="https://i.imgur.com/ijsAl6m.png"
style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
<figcaption style="margin-top: 8px; font-size: 1.1em;">
Importing WhiteRabbitNeo V3.
</figcaption>
</figure>
<br>

---

#### Additional Useful Ollama Commands
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `ollama list` | Shows all models currently registered with Ollama. |
|
||||
| `ollama rm <tag>` | Deletes the specified model (freeing disk space). |
|
||||
| `ollama show <tag>` | Prints model metadata (architecture, context length, quantization). |
|
||||
| `ollama show <tag> --modelfile` | Prints an existing model's modelfile. Often useful for templating our own. |
|
||||
| `ollama serve` | Starts the OpenAI-compatible API server (runs automatically when you first use `ollama run`). |
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
Ollama bridges the gap between low-level LLaMa.cpp tools and high-level usability, making it an ideal choice for rapid deployment and educational labs. By leveraging its API, model registry, and automation features, you can focus on experimentation rather than infrastructure. However, understanding LLaMa.cpp’s underlying mechanics (e.g., quantization, perplexity) remains critical for optimizing performance, or going off the beaten path.
|
||||
|
||||
<br>
|
||||
|
||||
---
|
||||
@@ -0,0 +1,124 @@

---
order: 2
title: "Lab 2 - Quantization Tradeoffs: Comparing 2-bit, 4-bit, and 8-bit"
description: Download Gemma 4 E2B in three GGUF quantizations and compare size, metadata, and output quality.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 2 - Quantization Tradeoffs: Comparing 2-bit, 4-bit, and 8-bit

In this lab, we will:

- Download the same Gemma model in `UD-IQ2_M`, `Q4_K_M`, and `Q8_0`
- Compare file size and GGUF metadata across those quantizations
- Observe how lower precision changes the model's behavior
- Build intuition for when a smaller quant may or may not be worth it

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on comparison and trade-off analysis.<br />
<strong>Execute</strong> sections require collecting evidence from each quantized model.
</div>

## Objective 1: Understand the Model and the Quantizations

For this lab, we will use the Hugging Face repository for **Unsloth's GGUF release of Gemma 4 E2B Instruct**:

<https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF>

This repository currently exposes multiple GGUF variants of the same base model. We will focus on one file from each of these precision bands:

| Precision band | GGUF file | Why we are using it | File Size |
| -------------- | ------------------------------ | --------------------------------------- | --------- |
| 2-bit | `gemma-4-E2B-it-UD-IQ2_M.gguf` | Most aggressive compression in this lab | 2.4 GB |
| 4-bit | `gemma-4-E2B-it-Q4_K_M.gguf` | Common middle-ground quant | 3.17 GB |
| 8-bit | `gemma-4-E2B-it-Q8_0.gguf` | Highest-quality quant in this lab | 5.05 GB |

Even though the filenames differ, these are all the same underlying instruction-tuned Gemma 4 E2B model. The main variable we are changing is how the weights are stored.
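Using the file sizes from the table above, a quick back-of-the-envelope comparison shows what the precision bands buy you in storage terms (sizes in GB, taken straight from the table):

```python
# Rough storage comparison using the file sizes listed above (GB).
sizes = {"Q8_0": 5.05, "Q4_K_M": 3.17, "UD-IQ2_M": 2.4}

for name, size in sizes.items():
    ratio = sizes["Q8_0"] / size
    print(f"{name}: {size:.2f} GB  ({ratio:.2f}x smaller than Q8_0)")
```

The 2-bit file is roughly half the size of the 8-bit file; whether that saving is "free" is exactly what the rest of this lab investigates.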
When we say these files are the same model, we mean that the overall neural network is still the same:

- The same architecture
- The same layer count
- The same tokenizer
- The same training and instruction tuning
- The same general behavior the model learned during training

What changes is the numeric representation of the learned weights.

Imagine one learned weight in the original model is:

```text
0.156347
```

That number came from training. It is one of many values the model uses while computing each next token. Quantization does not invent a new model from scratch. Instead, it takes that trained value and asks:

```text
How can we store a close-enough version of this number using fewer bits?
```

If we use a simplified integer-style quantization scheme, the math looks like this:

```text
scale = max(|w|) / (2^(bits - 1) - 1)
q = round(w / scale)
w_hat = q * scale
```

Where:

- `w` is the original weight
- `q` is the stored integer bucket
- `scale` maps integers back into the original numeric range
- `w_hat` is the reconstructed approximation used at inference time

So if the original trained value was `0.156347`, a lower-bit quantized file may not store that exact number anymore. It may store an integer bucket like `1`, `5`, or `22`, plus a scale, and reconstruct an approximation such as:

- `0.000000`
- `0.130029`
- `0.146806`
- `0.157782`

Those are not identical to the original weight, but they may still be close enough for useful inference.
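The simplified scheme above can be run directly. The sketch below quantizes a tiny made-up weight vector at 8, 4, and 2 bits; it mirrors the formula only, not the per-block layouts that real GGUF formats such as `Q4_K_M` or `UD-IQ2_M` actually use:

```python
# Toy symmetric integer quantization of a small weight vector.
# Real GGUF quants use per-block scales and smarter bucket layouts;
# this only demonstrates the scale/round/reconstruct idea above.

def quantize(weights, bits):
    qmax = 2 ** (bits - 1) - 1                   # largest stored integer
    scale = max(abs(w) for w in weights) / qmax  # maps ints back to floats
    q = [round(w / scale) for w in weights]      # stored integer buckets
    w_hat = [qi * scale for qi in q]             # reconstructed approximations
    return q, w_hat

weights = [0.156347, -0.42, 0.9, -0.003]
for bits in (8, 4, 2):
    q, w_hat = quantize(weights, bits)
    print(bits, q, [f"{w:.6f}" for w in w_hat])
```

Notice that at 2 bits the weight `0.156347` collapses to bucket `0` and is reconstructed as `0.000000`, exactly the kind of loss the bullet list above describes.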
<div data-quantization-explorer></div>

### Explore: Interactive precision viewer

The viewer below zooms out from one weight and instead shows a toy layer with 16 stored values. Real GGUF schemes such as `Q4_K_M` and `UD-IQ2_M` are more sophisticated than this toy example, but the core idea is the same:

- Fewer bits means fewer representable values
- More weights get pushed into the same small set of stored buckets
- The layer becomes more compressed as precision drops

<div data-quantization-grid-explorer></div>

### Explore: Compare the same prompts through the hosted chat widget

If your instructor provides an OpenAI-compatible endpoint, you can compare the same prompts through the embedded chat tool below:

- Paste the lab endpoint and API key into the settings row
- Switch between `Q8_0`, `Q4_K_M`, and `UD-IQ2_M`
- Re-run the same prompt so you can compare coherence, stability, and SVG output
- Try a visual prompt such as `Draw a pelican riding a bicycle.`

The widget keeps the transcript in your browser so you can switch models without losing your place. Refresh the page to clear the chat history.

<div data-objective5-chat></div>

## Objective 6: Reflect on the Tradeoff

By this point, you should have:

- Compared three quantized versions of the same model
- Measured the storage savings directly
- Verified that the core model metadata remains largely the same
- Observed where output quality begins to degrade

The important takeaway is not that one quant is always "best." The important takeaway is that quantization is a deployment decision. The right choice depends on your hardware limits, acceptable quality loss, and the task you need the model to perform.

## Conclusion

This lab isolates quantization as the main variable. By downloading **Gemma 4 E2B Instruct** in `UD-IQ2_M`, `Q4_K_M`, and `Q8_0`, you can directly observe one of the most important tradeoffs in local inference: balancing model quality against disk usage and resource constraints.
@@ -0,0 +1,367 @@

---
order: 3
title: Lab 3 - LLaMa.cpp and Ollama Workflows
description: Convert a Hugging Face checkpoint to GGUF, run it in llama.cpp, and load it into Ollama.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 3 - LLaMa.cpp and Ollama Workflows

In this lab, we will:

- Download a model from Hugging Face
- Convert a model to GGUF for `llama.cpp`
- Run a model directly in `llama.cpp`
- Download a model from Ollama.com
- Import a custom `.gguf` model into Ollama

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on investigation and comparison.<br />
<strong>Execute</strong> sections require running commands and producing output.
</div>

To start this lab, you'll need CLI access:

- SSH - <IP>:22
- All necessary artifacts are in the `lab3` folder

## Objective 1: HuggingFace & LLaMa.cpp

### 1. What Is LLaMa.cpp?

LLaMa.cpp is an open-source project created to enable efficient running of Meta's LLaMA (Large Language Model Meta AI) family of large language models on consumer-grade hardware. It was initially developed by **Georgi Gerganov** in early March 2023, shortly after Meta released the weights of the LLaMA models to approved researchers.

The project's original goal was to make LLaMA models accessible on systems without powerful GPUs, including laptops, desktops, and even mobile devices. **LLaMa.cpp** achieves this by implementing LLaMA inference in pure C/C++ and introducing highly efficient quantization techniques, allowing models to run with drastically reduced memory requirements. **LLaMa.cpp** is also the underlying engine behind a number of inference wrappers and technologies, such as Llamafile, LM Studio, and Ollama, amongst many others.

### Key Features

| Capability | Why it matters |
| ------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| **Efficient local inference** | Runs large language models without a powerful GPU. |
| **Quantization tools** (`llama-quantize`) | Shrinks model size (down to 1-bit) while preserving usable performance. |
| **Model conversion to .GGUF** | Provides a compact, fast-loading format that works with Ollama, LM Studio, and other wrappers. |
| **Cross-platform support** | Works on Linux, macOS, Windows, Apple Silicon, and ARM devices. |
| **CLI and debugging utilities** (`llama-cli`, `gguf-dump.py`) | Enables quick interactive testing and inspection of model metadata. |
| **Perplexity measurement** (`llama-perplexity`) | Quantifies how confident the model is about its predictions. |
| **Active community** | Powers tools such as LM Studio, Llamafile, and Ollama. |

---

## 1.2 Explore: HuggingFace - Model Cards

[HuggingFace](https://huggingface.co) is the "GitHub" for LLMs, datasets, and more. The following steps walk you through locating Meta's **LLaMA‑3.2‑1B** model card and its files.

1. **Open the LLaMA‑3.2‑1B page**
   <https://huggingface.co/meta-llama/Llama-3.2-1B>
   <br>
2. **Read the model card** – note the description, license, tags (e.g., _Text Generation_, _SafeTensors_, _PyTorch_), and links to fine‑tunes/quantizations.
   <br>
3. **Navigate to "Quantizations."**
   This tab lists community‑created quantizations, including GGUF, GPTQ, AWQ, and EXL3 versions. Common providers include **Bartowski**, **Unsloth**, and **NousResearch**, although these players change periodically. Additionally, note that we can often download quantized versions _without_ having agreed to the Meta license restrictions for the original model.

<figure style="text-align:center;">
<a href="https://i.imgur.com/Po0Ll3o.png" target="_blank">
<img src="https://i.imgur.com/Po0Ll3o.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Model Card Quantizations Convenience Link</figcaption>
</figure>
<br>

<figure style="text-align:center;">
<a href="https://i.imgur.com/NM1rbXV.png" target="_blank">
<img src="https://i.imgur.com/NM1rbXV.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Model Quantization Options</figcaption>
</figure>

4. **Open "Files and versions."**
   Here you see the raw `.safetensors` files (the un‑quantized checkpoint). For the model to successfully run, the full set of files needs to be loaded into system memory. Note how this 1 B‑parameter model is small enough to fit comfortably in a phone's memory, even raw.

<figure style="text-align:center;">
<a href="https://i.imgur.com/6I9zkeu.png" target="_blank">
<img src="https://i.imgur.com/6I9zkeu.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Distribution Restriction</figcaption>
</figure>

Unless you've accepted Meta's EULA for this model, you'll be unable to download the model directly from Meta. This view may or may not appear depending on your own HuggingFace account.

## 1.3 Explore: HuggingFace - Find and Download WhiteRabbitNeo

For this lab we will work with **WhiteRabbitNeo‑V3‑7B**, a cybersecurity‑oriented fine‑tune of Qwen2.5‑Coder‑7B. This model is less popular than LLaMA-3.2, and if we'd like to run it in `llama.cpp` or Ollama, we first need to convert it into a usable GGUF artifact.

<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> Although the next two steps show how to find and download this model so you can replicate the process, support files are already provided in <code>/home/student/lab3/WhiteRabbitNeo</code> to speed up lab execution.
</div>

### 1. Locate & Download the Model

1. Go to <https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B>.
2. Points of interest on this model card:
   1. This model appears to be a fine-tune of **Qwen2.5-Coder-7B**
   2. This model is openly licensed, and does not have any gating requirements to download and use for our purposes.
   3. This model is in **Safetensors** format, which is compatible with **LLaMa.cpp**'s quantization tools.

<figure style="text-align:center;">
<a href="https://i.imgur.com/9GrHRuh.png" target="_blank">
<img src="https://i.imgur.com/9GrHRuh.png" width="800" style="border:5px solid black;">
</a>
<figcaption>WhiteRabbitNeo model card.</figcaption>
</figure>

3. Click **Files and versions** → review the `.safetensors` checkpoints (≈ 15 GB at **FP16**).

<figure style="text-align:center;">
<a href="https://i.imgur.com/Emx97nL.png" target="_blank">
<img src="https://i.imgur.com/Emx97nL.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Model safetensors (size ≈ 15 GB).</figcaption>
</figure>

### 2 Download the Model

To prepare this model, create a working folder anywhere on your system. Once created, perform the following:

1. Ensure you have git & git-lfs installed to enable successful cloning from HuggingFace. If necessary, both can be installed on Debian-based distributions via:

```bash
sudo apt install git git-lfs
git lfs install
```

2. Clone the model:

```bash
git clone https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B
```

### 3 Execute: Convert the Downloaded Model

**LLaMa.cpp** makes it easy for us to package models downloaded in SafeTensors format as GGUF. We can convert the model with the official project script:

```bash
convert_hf_to_gguf.py /home/student/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B --outfile /home/student/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```

### 4 Execute: Review Model Metadata

When these steps have completed, you should see a new WhiteRabbitNeo-V3-7B.gguf file. We have not yet quantized the model, merely converted it to a format usable by **LLaMa.cpp** for the next steps. We can tell if this process was successful by using the **gguf-dump** utility that is packaged with **LLaMa.cpp**.

Run the following command:

```bash
gguf-dump /home/student/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```

We should then see:

<figure style="text-align: center;">
  <a href="https://i.imgur.com/JiX2fJM.png" target="_blank">
    <img
      src="https://i.imgur.com/JiX2fJM.png"
      width="800"
      style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Model Metadata.
  </figcaption>
</figure>
<br>

This is a text listing of all of the model's tensors and the precision of each. Because we have merely converted the model's format, and not performed quantization, the model is still in **FP16**.

- This is a text view of the previous graphical view we saw in **Lab 1, Objective 2: Visualizing a LLM**. While **TransformerLab** calls tensors **layers**, terms such as **tensors**, **layers**, and **blocks** can all be used semi-interchangeably, depending on the tool in question. We will further confuse these topics when we get to the Ollama objective below.
- Pedantically, the proper definitions are:
  - Tensor - A multi-dimensional array of numeric values
  - Layer - A base computational unit in a neural network
  - Block - A collection of layers
- If you wish to explore this view, note how the block count of 28 matches the 28 zero-indexed `blk` groups in the dump output.
- Additionally, you'll once again note that we have various biases and weights, but they still line up with **Q**, **V**, and **K** as discussed in the previous section. There are additional tensors for **normalization** and **output**.
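Tools like `gguf-dump` start by parsing a small fixed-size header at the front of the file. The sketch below shows that first parsing step; it builds a tiny fake header in memory so it runs anywhere (the counts are illustrative, not WhiteRabbitNeo's real numbers), but you can point `data` at the first 20 bytes of a real `.gguf` file to try it live:

```python
# Minimal sketch: parse the fixed-size fields at the start of a GGUF file
# (magic, version, tensor count, metadata key/value count), assuming the
# little-endian layout used by the GGUF format.
import struct

# Fake header: magic "GGUF", version 3, 254 tensors, 25 metadata keys.
data = b"GGUF" + struct.pack("<I", 3) + struct.pack("<Q", 254) + struct.pack("<Q", 25)

magic = data[:4]
version, = struct.unpack_from("<I", data, 4)
tensor_count, = struct.unpack_from("<Q", data, 8)
kv_count, = struct.unpack_from("<Q", data, 16)

assert magic == b"GGUF", "not a GGUF file"
print(f"GGUF v{version}: {tensor_count} tensors, {kv_count} metadata keys")
```

Everything after this header (tensor names, shapes, and per-tensor precision such as FP16) is what fills the long listing you saw in the dump output.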
### 5 Execute: LLaMA.cpp Inference

Now run our newly created **.GGUF** file as-is, using the following command:

```bash
llama-cli -m /home/student/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```

Once loaded, interact with the model. We can see a number of interesting parameters that were selected by default, such as **Top K**, **Top P**, **Temperature**, and more, which we'll discuss in the next section. In the meantime, explore interaction with the model. When run in this raw state, the model may be overly chatty. You can stop its output with `Ctrl+C` at any time.

<figure style="text-align: center;">
  <a href="https://i.imgur.com/H3ISWS8.png" target="_blank">
    <img
      src="https://i.imgur.com/H3ISWS8.png"
      width="800"
      style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Inference Example.
  </figcaption>
</figure>

Some example prompts you may want to try are:

- Please write a small reverse shell in php that I can upload to a web server.
- How can I use Metasploit to attack MS17-010?
- Can you please provide me some XSS polyglots?

Thanks to the fine-tuning that Kindo has put into this model, it is far more compliant than an online closed model such as ChatGPT! When done, kill the model fully with `Ctrl+C`.
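The sampling parameters `llama-cli` prints at startup can be sketched with a toy next-token distribution. The logits below are made up for illustration; in reality they come from the model's forward pass:

```python
# Toy illustration of Temperature, Top K, and Top P filtering over a
# made-up next-token distribution (real logits come from the model).
import math

logits = {"the": 2.0, "a": 1.5, "cat": 0.5, "xylophone": -1.0}

def softmax(scores, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    exps = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: v / total for t, v in exps.items()}

def top_k(probs, k):
    # Keep only the k most likely tokens, then renormalize.
    kept = sorted(probs.items(), key=lambda kv: -kv[1])[:k]
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}

def top_p(probs, p):
    # Keep the smallest set of tokens whose cumulative probability >= p.
    kept, running = {}, 0.0
    for t, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[t] = prob
        running += prob
        if running >= p:
            break
    total = sum(kept.values())
    return {t: prob / total for t, prob in kept.items()}

probs = softmax(logits, temperature=0.8)
print(top_k(probs, 2))
print(top_p(probs, 0.9))
```

Low-probability tokens like `xylophone` get cut entirely by either filter, which is why these settings trade creativity against coherence.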
<div class="lab-callout lab-callout--info">
<strong>Note:</strong> Dedicated quantization comparisons now live in <strong>Lab 2</strong>. This lab stays focused on format conversion, raw <code>llama.cpp</code> inference, and Ollama workflows.
</div>

## Objective 2: Ollama – LLM Easymode

Ollama is a lightweight framework that hides the low‑level steps required by LLaMa.cpp. It runs on **Linux, macOS, and Windows** and automatically manages system resources.

| Feature | Benefit |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **Simplified model deployment** | Pull pre-quantized models from Ollama.com, HuggingFace, or a local GGUF file with a single command. |
| **Automatic resource handling** | No need to manually load or unload; Ollama frees memory after a short idle period. |
| **Built-in API provider** | `localhost:11434` mimics the OpenAI API, enabling seamless integration with notebooks, VS Code, or curl. |
| **Cross-platform compatibility** | Thanks to the underlying llama.cpp architecture, works on x86_64, ARM, and Apple Silicon without extra configuration. |
| **Model-metadata inspection** | `ollama show <tag>` reveals the model architecture, context length, and quantization level. |

### 1 Execute: Pull and Run a Pre-Built Model from Ollama.com

Let's start by downloading Meta's llama3.2-3b, the "big" brother to the small model we've continuously worked with so far. The Ollama project and community have made this exceptionally easy for us to accomplish.

1. **Open the Ollama registry** – visit <https://ollama.com> in your browser.
2. **Search for the model**

<figure style="text-align: center;">
  <a href="https://i.imgur.com/VBvOGty.png" target="_blank">
    <img
      src="https://i.imgur.com/VBvOGty.png"
      style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Ollama Search.
  </figcaption>
</figure>
<br>

3. **Copy the `ollama run` command** that appears in the top‑right corner of the model card.
4. **Paste the command into your terminal** and press **Enter**:

```bash
ollama run llama3.2
```

<figure style="text-align: center;">
  <a href="https://i.imgur.com/ammtbmI.png" target="_blank">
    <img
      src="https://i.imgur.com/ammtbmI.png"
      style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Ollama Run command.
  </figcaption>
</figure>
<br>

### 2 Explore: Interacting with Ollama Inference

When finished, you will be presented with a prompt, similar to the `llama-cli` command. No need to download, convert, or quantize! Feel free to interact with this model until you're ready to move on.

<figure style="text-align: center;">
  <a href="https://i.imgur.com/XZ6OYNI.png" target="_blank">
    <img
      src="https://i.imgur.com/XZ6OYNI.png"
      style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Ollama Inference.
  </figcaption>
</figure>
<br>

### 3 Execute: Pull and Run a Pre-Built Model from HuggingFace.com

Similarly, we can do the same by pulling a model directly from **HuggingFace**. As long as the source file is a `.gguf` of any quantization level that fits within our system memory, Ollama can fetch it directly.

1. **Select a pre-quantized GGUF model** – visit [CodeIsAbstract](https://huggingface.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF) in your browser.
2. **Use this model** – Click **Use this model** → choose the **Ollama** tab. The page displays a ready‑to‑run command:

<figure style="text-align: center;">
  <a href="https://i.imgur.com/lg2INAs.png" target="_blank">
    <img
      src="https://i.imgur.com/lg2INAs.png"
      style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    HuggingFace Direct Ollama Pull.
  </figcaption>
</figure>
<br>

3. **Copy the command** and execute it in your terminal.

```bash
ollama run hf.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF:Q8
```

4. **Explore:** Interact with the model as normal.

### 4 Execute: Load a Custom `.gguf` Model

We can also import our WhiteRabbitNeo **.GGUF** model into Ollama without having to upload it to **HuggingFace** first. To do so, however, we need to create a **Modelfile**: a plain-text file (similar in spirit to a Dockerfile) that tells **Ollama** where the **.GGUF** is located, along with any defaults we'd like Ollama to apply when performing inference.
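For reference, a slightly fuller Modelfile might look like the sketch below. The `PARAMETER` values and system prompt are illustrative defaults, not requirements; the lab steps that follow only need the single `FROM` line:

```text
FROM /home/student/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf

# Optional inference defaults
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# Optional default system prompt
SYSTEM "You are a helpful cybersecurity assistant."
```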
1. **Create a simple Modelfile** – This will tell Ollama where the model lives.

```bash
echo "FROM /home/student/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf" > Modelfile
```

2. **Register the model with Ollama**

```bash
ollama create WhiteRabbitNeo -f Modelfile
```

3. **Run the newly registered model**

```bash
ollama run WhiteRabbitNeo
```

4. **Explore:** The model is now stored locally under the tag _WhiteRabbitNeo_ and can be invoked just as any other model.

<figure style="text-align: center;">
  <a href="https://i.imgur.com/ijsAl6m.png" target="_blank">
    <img
      src="https://i.imgur.com/ijsAl6m.png"
      style="width: 800px; display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Importing WhiteRabbitNeo V3.
  </figcaption>
</figure>
<br>

---

#### Additional Useful Ollama Commands

| Command | Description |
| ------------------------------- | --------------------------------------------------------------------------------------------- |
| `ollama list` | Shows all models currently registered with Ollama. |
| `ollama rm <tag>` | Deletes the specified model (freeing disk space). |
| `ollama show <tag>` | Prints model metadata (architecture, context length, quantization). |
| `ollama show <tag> --modelfile` | Prints an existing model's Modelfile. Often useful for templating our own. |
| `ollama serve` | Starts the OpenAI-compatible API server (runs automatically when you first use `ollama run`). |
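Because the server mimics the OpenAI API, the registered model can also be queried programmatically. The sketch below builds such a request with the standard library; it assumes Ollama's default port (11434) and that the `WhiteRabbitNeo` tag from the steps above exists, so the actual network call is left commented out:

```python
# Sketch: querying Ollama's OpenAI-compatible endpoint with the stdlib.
# Assumes Ollama is serving on its default port and that the model tag
# matches one shown by `ollama list`.
import json
import urllib.request

url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "WhiteRabbitNeo",
    "messages": [{"role": "user", "content": "Summarize MS17-010 in one line."}],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment on the lab box once the model is registered:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url, payload["model"])
```

The same request shape works against any model tag, which is what makes Ollama easy to wire into notebooks and editor integrations.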
---

## Conclusion

Ollama bridges the gap between low-level LLaMa.cpp tools and high-level usability, making it an ideal choice for rapid deployment and educational labs. By leveraging its API, model registry, and automation features, you can focus on experimentation rather than infrastructure. Quantization tradeoffs still matter, but they now have a dedicated home in Lab 2 so this lab can stay centered on conversion and deployment workflows.

<br>

---
@@ -1,13 +1,21 @@
|
||||
---
|
||||
order: 4
|
||||
title: Lab 4 - Open WebUI and Prompting
|
||||
description: Use Open WebUI to run local models and experiment with prompting and inference parameters.
|
||||
---
|
||||
|
||||
<!-- breakout-style: instruction-rails -->
|
||||
<!-- step-style: underline -->
|
||||
<!-- objective-style: divider -->
|
||||
|
||||
# Lab 3 - Open WebUI & Prompting
|
||||
# Lab 4 - Open WebUI & Prompting
|
||||
|
||||
In this lab, we will:
|
||||
* Run Open WebUI
|
||||
* Using an Ollama Model within Open WebUI
|
||||
* Experimenting with Inference Parameters
|
||||
* Experimenting with Prompting Techniques
|
||||
|
||||
- Run Open WebUI
|
||||
- Using an Ollama Model within Open WebUI
|
||||
- Experimenting with Inference Parameters
|
||||
- Experimenting with Prompting Techniques
|
||||
|
||||
<div class="lab-callout lab-callout--info">
|
||||
<strong>Lab Flow Guide</strong><br />
|
||||
@@ -16,7 +24,8 @@ In this lab, we will:
|
||||
</div>
|
||||
|
||||
To start this lab, one web service has been preconfigured:
|
||||
* Open WebUI - http://<IP>:8080
|
||||
|
||||
- Open WebUI - http://<IP>:8080
|
||||
|
||||
## Objective 1 Execute: Accessing Open WebUI
|
||||
|
||||
@@ -44,8 +53,8 @@ Locate, pull, and run **Qwen3.5 4B** using the **Open WebUI**. By defualt, Op
|
||||
### Execute: Download Qwen 3.5 4B
|
||||
|
||||
1. **Open the Ollama model registry**
|
||||
* Go to <https://ollama.com> in your web browser.
|
||||
* Locate the search box at the top of the page.
|
||||
- Go to <https://ollama.com> in your web browser.
|
||||
- Locate the search box at the top of the page.
|
||||
|
||||
<figure style="text-align:center;">
|
||||
<a href="https://i.imgur.com/btkT9IH.png" target="_blank">
|
||||
@@ -56,11 +65,11 @@ Locate, pull, and run **Qwen3.5 4B** using the **Open WebUI**. By defualt, Op
|
||||
</figure>
|
||||
|
||||
2. **Find the Qwen 3.5 family**
|
||||
* Type **`Qwen 3.5`** and press **Enter**.
|
||||
* The results page lists several parameter sizes (1 B → 27 B).
|
||||
- Type **`Qwen 3.5`** and press **Enter**.
|
||||
- The results page lists several parameter sizes (1 B → 27 B).
|
||||
|
||||
3. **Navigate to the list of tags**
|
||||
* Click the **`Tags`** link beneath the model description.
|
||||
- Click the **`Tags`** link beneath the model description.
|
||||
|
||||
<figure style="text-align:center;">
|
||||
<a href="https://i.imgur.com/TuUbK7O.png" target="_blank">
|
||||
@@ -71,8 +80,8 @@ Locate, pull, and run **Qwen3.5 4B** using the **Open WebUI**. By defualt, Op
|
||||
</figure>
|
||||
|
||||
4. **Select the 4B variant**
|
||||
* Locate **`Qwen3.5:4b`** in the table.
|
||||
* The size column reads **`3.4 GB`**, indicating the VRAM required for inference.
|
||||
- Locate **`Qwen3.5:4b`** in the table.
|
||||
- The size column reads **`3.4 GB`**, indicating the VRAM required for inference.
|
||||
|
||||
<figure style="text-align:center;">
|
||||
<a href="https://i.imgur.com/eaRaqnq.png" target="_blank">
|
||||
@@ -83,14 +92,14 @@ Locate, pull, and run **Qwen3.5 4B** using the **Open WebUI**. By defualt, Op
|
||||
</figure>
|
||||
|
||||
5. **Copy the model tag**
|
||||
* Click the **copy‑to‑clipboard** icon next to the tag (or highlight the text and press **Ctrl +C**).
|
||||
- Click the **copy‑to‑clipboard** icon next to the tag (or highlight the text and press **Ctrl +C**).
|
||||
|
||||
6. **Open the Open WebUI interface**
|
||||
* In a new browser tab, navigate to the URL where your Open WebUI instance is running (e.g., `http://localhost:8080`).
|
||||
- In a new browser tab, navigate to the URL where your Open WebUI instance is running (e.g., `http://localhost:8080`).
|
||||
|
||||
7. **Pull the model through the UI**
|
||||
* In the **“Select a model”** dropdown, paste the copied tag into the text field.
|
||||
* Click **`Pull`**. The UI will display a progress bar while Ollama downloads the GGUF file.
|
||||
- In the **“Select a model”** dropdown, paste the copied tag into the text field.
|
||||
- Click **`Pull`**. The UI will display a progress bar while Ollama downloads the GGUF file.
|
||||
|
||||
<figure style="text-align:center;">
  <a href="https://i.imgur.com/Sf8sSs3.png" target="_blank">
    <img src="https://i.imgur.com/Sf8sSs3.png" width="600"
      style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  </a>
</figure>
8. **Verify the model works**

   - Once the download finishes, type a prompt in the chat window (e.g., “Tell me a short, funny story about a cat that learns to code”).
   - Press **Enter** and watch the response appear.
<figure style="text-align:center;">
  <a href="https://i.imgur.com/30OMNsk.png" target="_blank">
    <img src="https://i.imgur.com/30OMNsk.png" width="600"
      style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  </a>
</figure>
9. **Download Gemma3n e2B**

   - While we're downloading models, let's download one more. You can either repeat the process from the previous steps to find and download **Gemma3n e2B**, or just use the following model tag to download the model via the Open WebUI search bar:
```bash
ollama pull gemma3n:e2b
```

Google designed the Gemma 3n family for efficient execution on resource-constrained devices.

---
## Objective 3: Inference Settings

### Explore: Open WebUI Inference Parameters

Prior to this lab, we discussed inference settings such as Top K, Top P, and Temperature. Let's quickly review the most common settings to customize:
- `Context Length` - The number of tokens the model is allowed to keep in active memory
- `Temperature` - Rescales token probabilities, raising or lowering the chance that low-probability tokens are generated
- `Top K` - Limits token selection during inference to the `K` most likely candidates
- `Top P` - Limits token selection to the smallest set of candidates whose cumulative probability exceeds `P`
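These settings can be sketched in code. The sampler below is a simplified illustration of how the three filters interact over a toy vocabulary; real inference engines apply them across the full vocabulary, and the exact ordering varies by implementation:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=40, top_p=0.9, rng=None):
    """Pick one token from raw logits using temperature, Top K, and Top P."""
    rng = rng or random.Random(0)
    # Temperature rescales logits: values > 1 flatten the distribution,
    # values < 1 sharpen it toward the most likely tokens.
    probs = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(probs.values())
    probs = {tok: p / total for tok, p in probs.items()}
    # Top K: keep only the K most likely candidates.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top P: keep the smallest prefix whose cumulative probability exceeds P.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise the survivors and draw one at random.
    total = sum(p for _, p in kept)
    draw, cumulative = rng.random() * total, 0.0
    for tok, p in kept:
        cumulative += p
        if draw <= cumulative:
            return tok
    return kept[-1][0]

logits = {"cat": 4.0, "dog": 3.5, "pizza": 1.0, "quantum": -2.0}
print(sample_next_token(logits, top_k=1))  # "cat": Top K of 1 reduces to argmax
```

Note how setting `top_k=1` collapses sampling to always picking the single most likely token, while a high temperature with a large `top_k` lets unlikely tokens like "quantum" through.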
Open WebUI allows us to easily modify these parameters on the fly through the chat controls, found on the right-hand side next to your user icon.
By default, Open WebUI selects the following generally sound options, with the expectation that users have access to modest hardware:

- `Context Length` - 2048
- `Temperature` - 0.8
- `Top K` - 40
- `Top P` - 0.9
While we won't play with `Context Length`, this parameter is critical for successfully accomplishing more complicated tasks with local models. With only the small default context length, the model will quickly forget your instructions and interactions, making its results less useful. Unfortunately, simply increasing this value is not always an option, as your selected model plus its `Context Length` must fit within your available memory. As with many challenges in AI, a key to solving issues with `Context Length` is often scaling your hardware to meet the demands of the task. This generally means utilizing hardware with larger amounts of VRAM or unified memory, either by purchasing it or renting access.
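To build intuition for why context costs memory, here is a rough back-of-the-envelope estimate of the KV cache, the per-token state the model keeps for the whole context window. The architecture figures below are assumed placeholders for a small 4B-class model, not official specifications for any model in this lab:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Rough KV-cache size; the leading 2 covers both the K and V tensors."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

# Assumed shape for a small 4B-class model: 36 layers, 8 KV heads, head_dim 128,
# 2 bytes per value (FP16). These are illustrative numbers only.
for ctx in (2048, 32768):
    gib = kv_cache_bytes(36, 8, 128, ctx) / 2**30
    print(f"{ctx:>6} tokens -> ~{gib:.2f} GiB of KV cache")
```

Even under these toy assumptions, a 16x larger context multiplies the cache by 16x, which is why context length is usually the first thing traded away on small GPUs.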
Thankfully, our lab gives us just such a reason! We can manually modify these options.
Let's test this with a series of interactions, themed around Magic: The Gathering. Qwen is considered a multi-modal model, meaning we're not just limited to inputting text! Input the following image, and ask `What is this? What does it do?`
Next, set our inference parameters to the following:

- `Temperature` - 1.1
- `Top K` - 100
- `Top P` - 0.95
Repeat your first interaction, noting the differences in model output. Less "likely" or common words were hopefully selected!

When satisfied, let's next set our inference parameters to the following:
- `Temperature` - 2
- `Top K` - 400
- `Top P` - 0.95
The model this time has likely gone off the rails, answering for an extended period of time and trailing off incoherently. This is because we increased the likelihood of improbable tokens far beyond the performance thresholds the model's developers intended. Let's next test the opposite:
- `Temperature` - Default
- `Top K` - 1
- `Top P` - Default
Feel free to continue to explore with other topics or images. Note how each time we restart our conversation, the model gives us the exact same answer. This is because Top K limits the model to selecting only the single most likely token for the provided input! Even with this restriction, however, the model can still occasionally provide different answers due to GPU hardware differences, random numerical fluctuations, or other similarly improbable events. Never forget that LLMs are non-deterministic, and even when highly restricted, can output unexpected results.
<br><br>
Alternatively, choose to perform these steps with **Gemma3n e2B**, which handles memory-constrained environments more gracefully.

</div>
Next, let's review different ways we can coax a model into performing better without fine-tuning or parameter customization. We can do this by "priming" the model with our first prompt in a number of ways:

<br>
- Few-Shot Prompting - Providing examples of our desired outcome up front
- Meta Prompting - Providing a guide to reach the desired outcome
- Chain of Thought - Providing the model guidance to think through its response
- Self-Criticism - Asking the model to play "devil's advocate" against itself
<br>
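As a concrete sketch, few-shot priming simply means packing worked examples into the conversation ahead of the real request. The message list below follows the shape of Ollama's `/api/chat` payload; the example card text is invented for illustration, not a real Magic card:

```python
# Few-shot priming: earlier user/assistant turns show the model the exact
# output shape we want before the real request arrives.
few_shot_messages = [
    {"role": "system", "content": "You design Magic: The Gathering cards."},
    # Invented example pair (not a real card) acting as the "shot":
    {"role": "user", "content": "Design a graveyard-themed black creature."},
    {"role": "assistant",
     "content": "Gravewaker Ghoul | 2B | Creature - Zombie | 3/2 | "
                "When this creature dies, return another creature card "
                "from your graveyard to your hand."},
    # The actual task comes last:
    {"role": "user",
     "content": "Design a black rare creature for a Graveyard Matters set."},
]
payload = {"model": "qwen3.5:4b", "messages": few_shot_messages, "stream": False}
```

In the Open WebUI chat, the equivalent is simply pasting your examples into the first message before asking your real question.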
Each of these tools can be combined to help achieve a greater effect. Below are some exercises to try:

<br>
- Design a black rare creature card that fits thematically and mechanically into a Graveyard Matters Magic: The Gathering set. Provide a few existing cards to help give the model a template.
- Design the same card, but this time outline the type, mechanics, tone, and identity.
- Invent a new keyword. Have the model reason step by step about how the keyword will work within the game.
- Review your new keyword for game balance. Have the model challenge its decisions.
<br>
In the new model window, we can customize many different options for our model:
1. Set the name to `Qwen 3.5 LLM Demo`
2. Set the Base Model to `Qwen3.5:4b`
3. Provide a system prompt. You can set this to be any task you'd like the model to focus on, or we can stick with our Magic: The Gathering theme. Utilize the following prompt, or for bonus points, have Qwen 3.5 generate one for you.
```text
"You are a creative designer for Magic: The Gathering, tasked with generating new Sliver creature cards. Follow these guidelines to ensure the cards align with the game's mechanics and lore:

Card Outline Structure:
...

When provided a name, generate a new Sliver card following this structure."
```

<br>
4. To ensure only the best card generation, show the `Advanced Params` and set the following to add creativity:

   - `Temperature` - 1.1
   - `Top K` - 100
   - `Top P` - 0.95
   - `Ollama (Think)` - Off
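The same customization can be done programmatically against Ollama's REST API. `temperature`, `top_k`, and `top_p` are standard Ollama option names; the `think` flag and the invented card name below are assumptions you should verify against your Ollama version's documentation:

```python
import json

# Request body for POST http://localhost:11434/api/generate, mirroring the
# Advanced Params set in the Open WebUI model editor.
payload = {
    "model": "qwen3.5:4b",
    "system": "You are a creative designer for Magic: The Gathering.",
    "prompt": "Generate a new Sliver card named Lattice Sliver.",  # invented name
    "options": {"temperature": 1.1, "top_k": 100, "top_p": 0.95},
    "think": False,   # disable thinking mode; check your Ollama version's docs
    "stream": False,
}
body = json.dumps(payload, indent=2)
print(body)
```

Sending this body to your Ollama endpoint applies the parameters for that single request, without creating a persistent custom model.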
Note: While we haven't actively discussed them as a part of this lab, as you tackle more advanced inference problems, you may also find the following parameters of interest:

- `Max Tokens` - Limits the possible length of a response to the desired number of tokens
- `num_gpu` - Manually overrides Ollama's built-in layer-offload determination. Useful for increasing performance on mixed GPU setups.
- `use_mlock` - Forces Ollama to keep all model components within active memory. Useful for smaller systems.
<figure style="text-align: center;">
  <a href="https://i.imgur.com/9RcJVjK.png" target="_blank">
    <img src="https://i.imgur.com/9RcJVjK.png" width="600"
      style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  </a>
</figure>

Throughout this lab, we've explored the fascinating world of Open WebUI and prompt engineering:

4. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks.
These concepts are foundational for effectively working with large language models in real-world applications. Remember that prompt engineering is both an art and a science: it requires understanding both the capabilities of the model and the nuances of human language. As you continue your journey with LLMs, don't hesitate to experiment with different approaches and parameters to find what works best for your specific use cases.
<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 5 - Dataset Generation and Fine Tuning
In this lab, we will:

* Explore public datasets
* Generate a dataset with Kiln.ai
* Fine-tune Gemma3 with Unsloth Studio
<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on understanding dataset choices and trade-offs.<br />
<strong>Execute</strong> sections focus on building, reviewing, and preparing data for fine-tuning workflows.
</div>
To start this lab, one web service has been preconfigured:

* Unsloth - http://<IP>:8888

You'll need to install Kiln from the following URL: https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1
## Objective 1: Explore Public Datasets

While fine-tunes may not have the same level of impact as in the early days of LLMs, they can still provide hyper-specialized capabilities that enable small LLMs, such as those we've used throughout the course, to compete with large, closed LLMs such as ChatGPT and Gemini. This is especially true for use cases where data needs to stay private, where the costs of a closed model are too high, or where we want a model focused on a specific RAG dataset.
There are multiple ways to generate a useful dataset, including but not limited to:

| # | Method | Typical use‑case | Key advantage |
|---|--------|-----------------|----------------|
| 1 | **Manual data collection** | Surveys, interviews, domain‑expert annotation | Highest specificity; fully controlled quality |
| 2 | **Web scraping** | Harvesting public articles, forum posts, code snippets | Scalable; leverages existing web content |
| 3 | **APIs & databases** | Accessing structured resources (e.g., Wikipedia API, PubMed) | Structured data; often well‑documented |
| 4 | **Crowdsourcing** | Large‑scale labeling (e.g., image bounding boxes) | Cost‑effective for repetitive tasks |
| 5 | **Data augmentation** | Expanding a small set of images or text | Improves diversity without new collection |
| 6 | **Public datasets** | Ready‑made corpora from repositories like HuggingFace | Immediate availability; often pre‑processed |
| 7 | **Synthetic data generation** | Simulated sensor readings, procedurally generated text | Useful when real data is scarce or sensitive |
Let's at least quickly touch on option 6, **Public Datasets**. While they may vary in quality, they're a great way to jumpstart a particular focus for a fine-tune. Many are found on https://huggingface.co/datasets, where over 400k datasets are readily accessible for many different tasks from many different providers, including [OpenAI](https://huggingface.co/datasets/openai/gsm8k), [Nvidia](https://huggingface.co/datasets/nvidia/Nemotron-CrossThink), and more. Much like with models, there are numerous tools we can utilize to filter these datasets, such as by format, modality, or license.
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/kdnBCyL.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Example Datasets.
  </figcaption>
</figure>
#### Explore a dataset (GSM8K)

Navigate to [GSM8K](https://huggingface.co/datasets/openai/gsm8k). Much like how models have **model cards**, datasets have **dataset cards**. These perform a similar job, providing:

1. Tags
2. Example data & a *Data Studio* button for interacting with the dataset on **HuggingFace** directly.
3. Easy download links (although we can also use `git clone`)
4. The description
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/Y55FAPV.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Dataset Model Card Contents.
  </figcaption>
</figure>
At the heart of each dataset is the pairing of *input* and *result*. In the case of math, this is relatively easy, as these are quite literally *question* and *answer* pairs for math problems.
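A minimal sketch of that principle: whatever the surrounding structure, each record boils down to two non-empty strings. The sample record below imitates GSM8K's `question`/`answer` fields with an invented problem, not an actual dataset row:

```python
def is_valid_pair(record):
    """Minimal sanity check before a record enters a training set."""
    return (
        isinstance(record.get("question"), str) and record["question"].strip() != ""
        and isinstance(record.get("answer"), str) and record["answer"].strip() != ""
    )

# Invented GSM8K-style record: worked reasoning, then "#### <final answer>".
record = {
    "question": "A pack holds 12 cards. How many cards are in 3 packs?",
    "answer": "12 * 3 = 36. #### 36",
}
print(is_valid_pair(record))  # True
```

Even a trivial gate like this catches the empty or malformed pairs that tend to slip into scraped or generated data.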
Larger datasets, such as [Fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), utilize more complicated structures, but all still fundamentally follow this same principle. In the case of Fineweb, the inputs are titles and summaries of web pages, with links to the precise web page as scraped from the internet.
<div class="lab-callout lab-callout--info">
<strong>Explore:</strong> Open the <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb/viewer/sample-10BT/train" target="_blank" rel="noreferrer">Fineweb sample viewer</a> in a new tab and inspect a subset of this <strong>15 trillion token</strong> dataset directly on Hugging Face.
</div>
#### Open‑weight vs. open‑source

One last note on public datasets: a common misconception is that *open-weight* models are **open source**.

<br>
- *Open‑weight* models (e.g., Gemma, DeepSeek R1, Qwen) provide publicly released checkpoints but **do not** open their training data or full training code.
- True **open‑source** LLMs remain rare; very few models freely share their dataset and training pipeline. Examples are **INTELLECT‑2**, which was built via a distributed "SETI@Home‑style" effort, and Nvidia's **Nemotron 3** family of models.

<br>
Unfortunately, **INTELLECT‑2** does not compare favorably to existing *open-weight* models such as **Gemma**, **DeepSeek R1**, **Qwen**, or other bleeding-edge models. **Nemotron 3** is also behind the state-of-the-art (SOTA) models, but instead serves as a showcase of how anyone can train models using Nvidia hardware.

Regardless of model type, when using any *open-weight* model for corporate purposes, review the license for allowed use!

<br>

---
## Objective 2: Synthetic Dataset Generation

If you can, I strongly encourage you to find ready-made, or easily massaged, datasets that do not require synthetic data. You'll often obtain better results with less effort this way. After all, the original frontier ChatGPT family of models simply scraped the entire internet, every book, scientific papers, and other "pre-made" raw data to help generate their first dataset. However, this is often unrealistic, as at minimum we need **1000** input-output pairs in order to begin fine-tuning, so...
### Why Use Synthetic Data?

| Reason | Explanation |
|--------|-------------|
| **Data scarcity** | Niche domains (e.g., MITRE ATT&CK classification) often lack ≥ 1,000 labeled examples. |
| **Scalability** | A single large model can produce thousands of examples in minutes, saving manual effort. |
| **Quality control** | By generating with a *larger* model than the target (e.g., Gemma‑12B qat → Gemma‑4B), you can distill richer responses within specific domains. |
| **Iterative refinement** | Kiln lets you rate or repair each pair, turning noisy outputs into a clean training set. |
<div class="lab-callout lab-callout--warning">
<strong>Rule of Thumb:</strong> Never generate data with a model that is smaller than the model you plan to fine-tune.
</div>

---

### Execute: Install & Launch Kiln AI
If you haven't yet, download [Kiln AI](https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1) and run the installer for your OS.

<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> These steps were designed for <strong>Kiln v0.18</strong>. While compatible with newer versions, v0.18 features a polished, simplified UI ideal for this lab. Note that Kiln undergoes active development with frequent UI changes across versions.
</div>
1. **Open Kiln**. It should automatically open `http://localhost:3000` in your machine's browser.
2. Click **`Get Started`**.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/hJNehuE.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Welcome screen – click "Get Started".</figcaption>
</figure>
3. Choose **`Continue`** (or **`Skip Tour`** if you prefer).
4. Dismiss the newsletter prompt (optional).

Kiln is now ready for configuration.
### 2. Connect Kiln to Ollama

1. In Kiln's left‑hand **Providers** panel, click **`Connect`** under the Ollama entry.

<div class="lab-callout lab-callout--warning">
Use your Ollama instance's IP to connect (i.e., http://<STUDENT IP>:11434). You must be connected to the VPN for this to work.
</div>
<figure style="text-align:center;">
  <img src="https://i.imgur.com/vEwUszl.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Connect to a local or remote Ollama instance.</figcaption>
</figure>

2. Click **`Continue`** to confirm the connection.
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> If you have access to a commercial LLM (for example, OpenAI GPT-4o), you can point Kiln to that endpoint for higher-quality synthetic data by replacing the Ollama URL in <strong>Providers → Connect</strong>.
</div>

---
### 3. Create a Kiln Project

1. Kiln will prompt you to **Create a Project**. Enter any descriptive name (e.g., `MITRE‑ATTACK‑FineTune`).
<figure style="text-align:center;">
  <img src="https://i.imgur.com/8CLEp9s.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Name your project.</figcaption>
</figure>

2. Press **`Create`**. You are now inside the project workspace.

---
### 4. Define the Fine‑Tuning Task

1. Click **`Add Task`** and fill out the form with the details below.

   * **Task name:** `ATT&CK Classification`
   * **Goal:** "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: `ID# - Technique`."
   * **System prompt (auto‑filled):** Kiln will prepend this text to every generation request.
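Since the goal pins down an exact output format, it can help to keep a quick validator on hand when reviewing generated pairs later. The regex below encodes one reading of the `ID# - Technique` shape (ATT&CK technique IDs look like `T1059` or `T1059.001`); treat the pattern as an assumption and adjust it if your IDs differ:

```python
import re

# Matches "T####" or "T####.###", then " - ", then a technique name.
ATTACK_FORMAT = re.compile(r"^T\d{4}(?:\.\d{3})? - .+$")

print(bool(ATTACK_FORMAT.match("T1059.001 - PowerShell")))   # True: conforming
print(bool(ATTACK_FORMAT.match("PowerShell is T1059.001")))  # False: wrong shape
```

Running each generated answer through a check like this makes the later rate/repair pass in Kiln much faster.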
<figure style="text-align:center;">
  <img src="https://i.imgur.com/43o2s0Y.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Task definition screen.</figcaption>
</figure>

2. Click **`Save Task`**. The task now appears in the left‑hand **Tasks** list.

---
### 5. Kiln Main Interface Overview

| Sidebar item | Primary use |
|--------------|------------|
| **Run** | Manually generate one input‑output pair at a time (useful for quick checks). |
| **Dataset** | View, edit, export, or import the entire collection of pairs. |
| **Synthetic Data** | Bulk‑generate pairs using a model of your choice. |
| **Evals** | Run automatic evaluation against a held‑out test set. |
| **Settings** | Project‑level configuration (e.g., default model, output format). |

When you first open a project, Kiln lands on the **Run** page.
---

### 6. Manual Generation (Run Page)

1. In the **Run** view, set the parameters as shown below (you may substitute a larger model if your hardware permits).
<figure style="text-align:center;">
  <img src="https://i.imgur.com/vvW0wjk.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Configure the Run settings.</figcaption>
</figure>

2. Type a **scenario description** (e.g., "An attacker dumps LSASS memory using Mimikatz") and click **`Run`**.
3. Kiln sends the prompt to the selected Ollama model (by default `gemma3:12b‑it‑qat`).
4. When the model returns an answer, you can **rate** it from 1 ★ to 5 ★.

   *5 ★* → Accept and click **`Next`**.
   *< 5 ★* → Click **`Attempt Repair`**, edit the response, then **`Accept Repair`** or **`Reject`**.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/wqVsYMk.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Rate a correct response with 5 ★.</figcaption>
</figure>

5. Repeat until you have a handful of high‑quality pairs. This manual step is optional but useful for seeding the dataset with "gold‑standard" examples.

---
### 7. Bulk Synthetic Data Generation

#### 7.1 Open the Generator

1. In the sidebar, click **`Synthetic Data` → `Generate Fine-Tuning Data`**.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/l6OiUeP.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Enter the bulk‑generation workflow.</figcaption>
</figure>
#### 7.2 Generate Top‑Level Topics

1. Click **`Add Topics`**. This will generate top-level topics that follow broad MITRE ATT&CK categories.
2. Choose **`Gemma-3n-2B`**.
3. Set **Number of topics** to **8** and click **`Generate`**.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/SHh8v0y.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Select model & number of topics.</figcaption>
</figure>

4. Review the generated list. Delete any unsatisfactory topics (hover → click the trash icon) or click **`Add Topics`** again to generate more. Alternatively, if additional depth is required, click **`Add Subtopics`** to drill down deeper into any of the high-level topics Gemma initially created.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/wHNv3Om.png" width="800"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Final set of 8 topics.</figcaption>
</figure>
#### 7.3 Create Input Scenarios for All Topics

1. With the topics selected, click **`Generate Model Inputs`**. Ensure **`Gemma-3n-2B`** is still chosen, and then affirm your selection.
   Kiln now asks the model to produce a short *scenario description* for each topic.
2. After the model finishes, review the generated inputs. You may edit any that look off.
#### 7.4 Generate Corresponding Outputs

1. Click **`Save All Model Outputs`**. Kiln now runs the model a second time, using each generated input as the prompt, to produce the *output* (the ATT&CK technique label).
<figure style="text-align:center;">
  <img src="https://i.imgur.com/A47GRVr.png" width="800"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Produce the "output" side and store the pair.</figcaption>
</figure>

2. The full input‑output pairs are automatically added to the project's dataset.
#### 7.5 Review the Completed Dataset

1. Switch to the **`Dataset`** tab.
2. You should see a table of 64 (8 topics × 8 samples) pairs. Clicking any row opens the same **Run** view, where you can **rate**, **repair**, or **delete** the pair.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/DnyXYJO.png" width="800"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Dataset overview with generated pairs.</figcaption>
</figure>

---
### 8. Dataset Export (Create a Fine-Tune)

1. Once you are satisfied with the dataset, you can export it to numerous forms of JSONL via the **Fine Tune → Create a Fine Tune** button.

2. Kiln will first ask which format our data should be exported in. We can leave the default setting of *Download: OpenAI chat format (JSONL)*. Next, select *Create a New Fine-Tuning Dataset*.
3. Kiln supports splitting our generated data into a number of buckets, including *`Training`*, *`Test`*, and *`Validation`*. Each of these dataset segments is critical to a great fine-tune, but with only 64 generated examples, we don't have the luxury of creating a split. As such, under **`Advanced Options`**, select *100% training*, and click *Create Dataset*.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/vp6jobS.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Dataset split options.</figcaption>
</figure>

4. We can ignore all further options and select *Download Split*. A new `.jsonl` file will be saved!
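To see what lands in that file, here is one record in the OpenAI chat JSONL convention: one JSON object per line, each holding a `messages` list. The scenario text is illustrative, though `T1003.001` is the real ATT&CK ID for LSASS memory dumping:

```python
import json

# One training example in OpenAI chat format.
example = {
    "messages": [
        {"role": "system", "content": "Return only a MITRE ATT&CK ID and Name."},
        {"role": "user", "content": "An attacker dumps LSASS memory using Mimikatz."},
        {"role": "assistant", "content": "T1003.001 - LSASS Memory"},
    ]
}

# JSONL: one complete JSON object per line, newline-separated.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")

# Each line parses independently, so files can be streamed or concatenated.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows), rows[0]["messages"][-1]["content"])
```

This line-per-record layout is why the downloaded split can be inspected with ordinary text tools before it ever touches a trainer.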
---
## Objective 3: Fine Tuning with Unsloth Studio

There are many popular options for performing fine-tunes, although many have their drawbacks:

* [Unsloth](https://unsloth.ai) is the most popular solution, but currently does not support multi-GPU setups without a commercial license.
* [Axolotl](https://axolotl.ai) does support multi-GPU setups, but often lags behind Unsloth in features and capability, and does not feature any Web UI.
* [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is the most flexible of these options, supporting both Unsloth and Axolotl, as well as additional backends. However, this tool is daunting for a beginner approaching fine-tuning, and is best left for later experimentation.

<br>

While I encourage you to explore all of these tools, they are unfortunately out of scope for this lab. Instead, we're going to focus on **Unsloth**, as it provides the best web UI for easily navigating the fine-tuning process.
### Explore: Touring Unsloth Studio

Although Unsloth Studio does its best to simplify the fine-tuning process, there are still many dials and knobs to turn! Let's take a brief tour of the most important options:
1. Model Selection - This area allows us to select any model that we're interested in fine tuning. Unsloth Studio will handle downloading the FP16 version of the model from **HuggingFace** for us.
|
||||
2. Quantization Selection - Without much better hardware, we will usually be training **LoRA**s (Low-Rank Adapters). These will slightly nudge the parameters of the model in the direction we're interested in. If we need additional headroom, we can instead **quantize the base model** (e.g., reduce its precision from 16-bit to 4-bit) and then apply **LoRA** to the quantized model, generating a **QLoRA** (Quantized LoRA). This approach combines the efficiency of quantization with the parameter-efficiency of LoRA. Unsloth will conveniently tell us its estimate for how well a given combination of *Model* & **QLoRA** will fit in our system's available VRAM.
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/XwAdaKJ.png"
    width="800"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Model & LoRA Type Selections. Note how models are labeled "OOM" or "Tight" based on hardware.
  </figcaption>
</figure>

3. Dataset Selection - This is where we can utilize our custom-made dataset. Unfortunately, while we've gone through the process of making a dataset, we had to use a very small model to simulate the process. Conveniently, Unsloth allows us to search for any dataset available publicly on HuggingFace. We can select `sarahwei/cyber_MITRE_CTI_dataset_v15` for our purposes. Click "View Dataset" if you'd like to see some of the raw contents of this data.
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/8xBdcnd.png"
    width="400"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Dataset Selection
  </figcaption>
</figure>

4. Train Settings - This is where we configure exactly how our model will be trained. The majority of these settings can stay at their defaults until you have a specific need that pushes you down the rabbit hole. In particular, we'll be interested in:
- **Learning Rate** - Controls how large an adjustment is made to the model's weights during each step.
- **Epoch** - Determines the number of times the training algorithm iterates over the entire dataset (by default, training repeats 3 times). Critical for avoiding under- or overfitting.
- **Cutoff Length** - Equivalent to Ollama's context. As always, training with larger context requires more memory.
- **Batch Size** - Can speed up training, as long as we have the hardware to support it.
- **Warmup Steps** - The number of initial training steps during which the learning rate gradually increases to the set target. Helps with stability.
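The interaction between the learning rate and warmup steps can be sketched numerically. The sketch below assumes a linear warmup followed by linear decay (real trainers often use cosine decay instead); the target rate and warmup count are the lab's values, while the total step count is an assumption.

```python
# Hypothetical linear warmup-then-decay learning-rate schedule. The target
# (5e-5) and warmup_steps (100) match the lab settings; total_steps is an
# assumed horizon for illustration.
def lr_at_step(step: int, target: float = 5e-5,
               warmup_steps: int = 100, total_steps: int = 1000) -> float:
    if step < warmup_steps:                       # ramp up from 0 to target
        return target * step / warmup_steps
    remaining = total_steps - warmup_steps        # then decay back toward 0
    return target * max(0.0, (total_steps - step) / remaining)

print(lr_at_step(50))    # halfway through warmup: half the target rate
print(lr_at_step(100))   # warmup complete: the full target rate
```

This is why the learning-rate curve you'll see later rises first and falls afterward.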
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/fzSvggY.png"
    width="400"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Fine Tuning Settings
  </figcaption>
</figure>

### Execute: Unsloth Studio Fine Tuning
Set the following before we start to fine-tune Gemma:

1. **Model**: `unsloth/gemma-3-270m-it`
2. **Max Steps**: `100` (NOTE: For real fine-tuning, use Epochs, not Steps.)
3. **Learning Rate**: `0.00005`
4. **Dataset**: `sarahwei/cyber_MITRE_CTI_dataset_v15`
5. **Warmup Steps**: `100`

- Scroll to the bottom of the page and click `Preview command`. The WebUI is merely a front end for constructing `llamafactory-cli` commands, and this shows exactly what will be run.
- When done reviewing, click `Start`. It will take some time for Unsloth Studio to start its process, as it will first need to download the selected model's full raw `FP16` files.
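The relationship between Max Steps, Epochs, and dataset size is simple arithmetic. Assuming an effective batch size of 1 (an assumption; gradient accumulation changes this), a dataset of roughly 2,500 examples works out to:

```python
# Back-of-the-envelope step math. batch_size = 1 is an assumption; real runs
# often use a larger effective batch via gradient accumulation.
examples, batch_size, epochs = 2500, 1, 3
steps_per_epoch = examples // batch_size
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 2500 steps per epoch, 7500 for 3 epochs
```

So our `Max Steps = 100` run only sees a small fraction of one epoch, which is fine for a classroom demo but not for a real fine-tune.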
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/fzSvggY.png"
    width="400"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Setting Max Steps, Learning Rate, and Warmup Steps
  </figcaption>
</figure>

**Monitor the loss graph**: The graph measures **Loss** per **Training Step** (roughly 8k steps: 2.5k examples × 3 epochs), or, put simply, how different the model's predicted answers are from our data. This should slope gradually, logarithmically downwards if training is stable.
#### What to Look For

- **Training Loss:** Decreasing smoothly → the model is learning effectively and training is stable
- **Gradient Norm:** Drops then stabilizes → gradients are well-behaved (no major spikes)
- **Learning Rate:** Gradually increases, then eventually decreases → expected warmup behavior that helps stabilize early training
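A quick way to sanity-check "decreasing smoothly" is to compare averages over the early and late portions of a run. The loss values below are invented for illustration:

```python
# Toy check of the "loss should slope downward" heuristic: compare the mean
# of the first and last quarter of a run. These loss values are made up.
def mean(xs):
    return sum(xs) / len(xs)

losses = [2.9, 2.5, 2.2, 1.9, 1.7, 1.5, 1.4, 1.35, 1.3, 1.28, 1.27, 1.26]
q = len(losses) // 4
early, late = mean(losses[:q]), mean(losses[-q:])
print(early > late)  # True means the run is trending in the right direction
```

A flat or rising late-run average is the numeric signature of the unstable or stalled training the bullets above warn about.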
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/Cue7afQ.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Typical Training Run
  </figcaption>
</figure>

Unfortunately, due to the time constraints of a live classroom, we'll be unable to pursue this training run to completion. On the lab-provided GPUs, a full epoch could take up to two hours! Feel free to cancel it at your leisure.

We can, however, chat with a version of Gemma 3 4B that was trained before this class. It was trained against roughly 60,000 examples, partially generated using Kiln and partially harvested from various datasets throughout HuggingFace. While not perfect, we can see that the model is significantly better than the default.
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/FKZXaV3.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Load Model for Chat
  </figcaption>
</figure>

To test this ourselves:

1. Select the chat button at the very top of the screen.
2. Download our model. It's under my personal HuggingFace account name, `c4ch3c4d3`.
3. Set the system prompt to the one we selected when using **Kiln.ai**: "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'."
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/GHExjE3.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Test prompt
  </figcaption>
</figure>

| Test Prompt | Expected Output Format |
|------------|------------------------|
| "A malicious actor uses PowerShell to download a file from a remote server." | `T1059.001 – PowerShell` |
| "The adversary exfiltrates data via a compressed archive sent over HTTP." | `T1567.001 – Exfiltration Over Web Services` |
| "Credential dumping is performed using Mimikatz." | `T1003.001 – LSASS Memory` |

The Unsloth chat view is relatively simplistic, but it does provide options for changing inference parameters, such as Top-P or Temperature, as well as a place to input our system prompt. When testing the model's accuracy after a fine-tune, we normally need to ensure these values match the desired end-state values as closely as possible.
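To build intuition for what the Temperature slider actually does, here is a toy softmax over made-up logits. Lower temperature sharpens the distribution, which is why accuracy testing is usually done with low, fixed sampling settings:

```python
import math

# Toy illustration of temperature: logits are divided by the temperature
# before softmax, so low temperatures concentrate probability on the top
# token. The logit values are invented for illustration.
def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax(logits, 1.0))
print(softmax(logits, 0.2))   # far more probability mass on the top logit
```

Top-P sampling then truncates this distribution to the smallest set of tokens whose cumulative probability exceeds P before sampling.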
### Export the Fine‑Tuned Model

<div class="lab-callout lab-callout--warning">
<strong>Skippable:</strong> These steps are provided for reference, as we never successfully finished a fine-tune within the lab time period.
</div>

1. Switch to the **Export** tab.
2. Select the training run you've performed.
3. Select the latest checkpoint, or, if you'd like to explore an alternative, whichever checkpoint you desire.
4. We can export in a number of formats:

- **Merged Model** – A BF16 `.safetensors` version of the model which can be utilized in other projects.
- **LoRA** – Exports only the LoRA adapter layers generated during training. Useful for sharing with users who already have the base model downloaded but not our fine-tune.
- **GGUF** – A compact file ready for import into **Ollama** or other GGUF‑compatible runtimes.
<br>

---

## Conclusion

In this lab, we completed a LoRA fine-tuning workflow:

1. **Dataset Generation** - We explored public datasets on HuggingFace and used Kiln AI to generate a synthetic dataset for MITRE ATT&CK classification.
2. **Fine Tuning** - We used Unsloth Studio to fine-tune Gemma-3-4B on our generated dataset.
3. **Validation & Export** - We tested the model with sample prompts and exported the fine-tuned model in both FP16 and GGUF formats.

If all has gone well, the model should be much more accurate at identifying MITRE ATT&CK codes from user input scenarios. If not, additional experimentation may be necessary to produce a good fine-tune. Playing with the parameters we've discussed, improving and expanding our dataset, or even fine-tuning a larger or better base model can all improve our success rate.
---
order: 5
title: Lab 5 - Embedding and Chunking
description: Explore chunking strategies and embeddings, then connect them to retrieval workflows.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 5 - Embedding and Chunking
In this lab, we will:

- Explore various chunking strategies
- Explore how embeddings and vectors allow similar concepts to cluster together within n-dimensional spaces
- Connect chunking and embedding concepts to a functional RAG workflow

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> steps focus on observing and understanding behavior.<br />
<strong>Execute</strong> steps require performing actions in the lab environment.
</div>
To start this lab, two web services have been preconfigured:

- ChunkViz - http://<IP>:3000
- Embedding Atlas - http://<IP>:5055

## Objective 1 Explore: Chunking Strategy
In a web browser, navigate to http://<STUDENT ASSIGNED SYSTEM IP>:3000.

ChunkViz starts with example text that has already been split using a default character-based strategy. In this view, every 200 characters is treated as a chunk. Modify the sliders to set the following values:

- `Chunk Size` - `256`
- `Chunk Overlap` - `20`
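The character splitter with overlap that these sliders control can be sketched in a few lines. This is a simplified stand-in for what ChunkViz visualizes, not its actual implementation:

```python
# Minimal character-based chunker with overlap. Each new chunk starts
# (size - overlap) characters after the previous one, so adjacent chunks
# share `overlap` characters. A simplified stand-in, not ChunkViz's code.
def chunk(text: str, size: int = 256, overlap: int = 20) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

pieces = chunk("x" * 600)
print(len(pieces), len(pieces[0]))   # 3 chunks; the first is 256 chars long
```

The overlap is what makes the chunk boundaries "bleed" into each other in the visualization, preserving context that would otherwise be cut mid-thought.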
<figure style="text-align: center;">
<a href="https://i.imgur.com/9SDyh7I.png" target="_blank">
<img src="https://i.imgur.com/9SDyh7I.png" width="600"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
</figure>

Notice how the colors in the text below dynamically change. Each color represents a distinct chunk.

Next, explore the major chunking strategies available in ChunkViz:

| Strategy | Description |
| ------------------- | --------------------------------------------------------------------------------- |
| Character Splitter | Splits text into chunks based on a fixed number of characters. |
| Token Splitter | Splits chunks based on tokenization values using **tiktoken**. |
| Sentence Splitter | Splits chunks into rough sizes based on what the tool interprets as a sentence. |

Select each option and observe the different ways ChunkViz breaks text into chunks.

Each strategy comes with its own benefits and drawbacks. Character-based splitting is often one of the easiest strategies to implement because OCR and text extraction ultimately produce characters. Token-based splitting is useful when keeping chunk sizes consistent for a specific model matters most. Sentence and recursive strategies are often better at preserving complete thoughts, although real-world documents do not always follow clean sentence boundaries.

Explore one more chunking example using a larger document. Open your provided copy of _Blindsight_ by Peter Watts in `.txt` format, paste its contents into ChunkViz, and then continue experimenting with chunk sizes from `64` up to `1024` using different strategies. Notice how different chunk sizes and separators change the resulting structure.
<figure style="text-align: center;">
<a href="https://i.imgur.com/M51ASNK.png" target="_blank">
<img src="https://i.imgur.com/M51ASNK.png" width="600"
style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
</a>
</figure>

At this point, you have seen the two major stages that make retrieval-augmented generation possible.

Use what you observed in ChunkViz and Embedding Atlas to reason through the following questions:

- How would a chunk that is too small affect retrieval quality?
- How would a chunk that is too large dilute the meaning of an embedding?
- Why might a semantically similar result appear visually distant on a 2D projection?
- How do chunking strategy and embedding quality work together to improve downstream answers?
This objective is meant to connect the lab tools back to the full RAG workflow. The better your chunking choices and embeddings are, the more useful the retrieved context will be for the model that answers the user.
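The "semantic similarity" the questions above refer to is usually cosine similarity in the full embedding space, which a 2D projection can only approximate. A minimal sketch with tiny made-up vectors:

```python
import math

# Cosine similarity between two vectors: 1.0 means identical direction,
# 0.0 means orthogonal (unrelated). These 3-dimensional vectors are made up;
# real embeddings have hundreds or thousands of dimensions, which is why a
# 2D projection can place genuinely similar points far apart.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0: identical direction
print(cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0: unrelated
```

Retrieval ranks chunks by this score against the query embedding, so chunking quality directly determines what the score is measuring.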
---
order: 6
title: Lab 6 - Dataset Generation and Fine Tuning
description: Review dataset options, generate examples with Kiln.ai, and fine-tune a model in Unsloth.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 6 - Dataset Generation and Fine Tuning
In this lab, we will:

- Explore public datasets
- Generate a dataset with Kiln.ai
- Fine-tune Gemma3 with Unsloth Studio

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on understanding dataset choices and trade-offs.<br />
<strong>Execute</strong> sections focus on building, reviewing, and preparing data for fine-tuning workflows.
</div>

To start this lab, one web service has been preconfigured:

- Unsloth - http://<IP>:8888

You'll need to install Kiln from the following URL - https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1

## Objective 1 Explore: Public Datasets
While fine-tunes may not have the same level of impact as in the early days of LLMs, they can still provide hyper-specialized capabilities, enabling small LLMs such as those we've used throughout the course to compete with large, closed LLMs like ChatGPT and Gemini. They shine in use cases where data needs to stay private, where the costs of a closed model are too high, or where we want a model focused on a specific RAG dataset.

There are multiple ways to generate a useful dataset, including but not limited to:
| # | Method | Typical use‑case | Key advantage |
| --- | ----------------------------- | ------------------------------------------------------------ | --------------------------------------------- |
| 1 | **Manual data collection** | Surveys, interviews, domain‑expert annotation | Highest specificity; fully controlled quality |
| 2 | **Web scraping** | Harvesting public articles, forum posts, code snippets | Scalable; leverages existing web content |
| 3 | **APIs & databases** | Accessing structured resources (e.g., Wikipedia API, PubMed) | Structured data; often well‑documented |
| 4 | **Crowdsourcing** | Large‑scale labeling (e.g., image bounding boxes) | Cost‑effective for repetitive tasks |
| 5 | **Data augmentation** | Expanding a small set of images or text | Improves diversity without new collection |
| 6 | **Public datasets** | Ready‑made corpora from repositories like HuggingFace | Immediate availability; often pre‑processed |
| 7 | **Synthetic data generation** | Simulated sensor readings, procedurally generated text | Useful when real data is scarce or sensitive |

Let's at least quickly touch on option 6, **Public Datasets**. While they may vary in quality, they're a great way to jumpstart a particular focus for a fine-tune. Many are found on https://huggingface.co/datasets, where over 400k datasets are readily accessible for many different tasks from many different providers, including [OpenAI](https://huggingface.co/datasets/openai/gsm8k), [Nvidia](https://huggingface.co/datasets/nvidia/Nemotron-CrossThink), and more. Much like with models, there are numerous tools we can use to filter these datasets, such as by format, modality, or license.
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/kdnBCyL.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Example Datasets.
  </figcaption>
</figure>

#### Explore a dataset (GSM8K)

Navigate to [GSM8K](https://huggingface.co/datasets/openai/gsm8k). Much like how models have **model cards**, datasets have **dataset cards**. These perform a similar job, providing:

1. Tags
2. Example data & a _Data Studio_ button for interacting with the dataset on **HuggingFace** directly.
3. Easy Download Links (although we can also use `git clone`)
4. The Description
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/Y55FAPV.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em; ">
    Dataset Model Card Contents.
  </figcaption>
</figure>

At the heart of each dataset is the pairing of _input_ and _result_. In the case of math, this is relatively easy, as these are quite literally _question_ and _answer_ pairs to math problems.
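A single GSM8K record, for instance, is just such a pair. The example below is invented in the dataset's general style (its fields are `question` and `answer`, with the final numeric label after `####`); it is not an actual row from the dataset:

```python
# A hypothetical record in GSM8K's general style; the text is invented,
# not taken from the real dataset.
pair = {
    "question": "A class has 4 rows of 6 desks. How many desks are there?",
    "answer": "There are 4 * 6 = 24 desks. #### 24",
}

# The text after "####" is the final label, i.e. the "result" half of the pair.
final_label = pair["answer"].split("####")[-1].strip()
print(final_label)
```

Training, evaluation, and grading pipelines all key off this simple input/result structure, however elaborate the surrounding metadata gets.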
Larger datasets, such as [Fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), utilize more complicated structures, but all still fundamentally follow this same principle. In the case of Fineweb, the inputs are titles and summaries of web pages, with links to the precise web page as scraped from the internet.
<div class="lab-callout lab-callout--info">
<strong>Explore:</strong> Open the <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb/viewer/sample-10BT/train" target="_blank" rel="noreferrer">Fineweb sample viewer</a> in a new tab and inspect a subset of this <strong>15 trillion token</strong> dataset directly on Hugging Face.
</div>

#### Open‑weight vs. open‑source

One last note on public datasets. A common misconception is that _open weight_ models are **open source**.

<br>

- _Open‑weight_ models (e.g., Gemma, DeepSeek R1, Qwen) provide publicly released checkpoints but **do not** include permissive source‑code licenses.
- True **open‑source** LLMs remain rare; very few models freely share their dataset and training pipeline. Examples include **INTELLECT‑2**, which was built via a distributed "SETI@Home‑style" effort, and Nvidia's **Nemotron 3** family of models.

<br>

Unfortunately, **INTELLECT‑2** does not compare favorably with existing _open weight_ models such as **Gemma**, **DeepSeek R1**, **Qwen**, or other bleeding-edge models. **Nemotron 3** is also behind the State of the Art (SOTA) models, but instead serves as a showcase of how anyone can train models using Nvidia hardware.

Regardless of model type, when using any _open weight_ model for corporate purposes, review the license for allowed use!

<br>

---

## Objective 2: Synthetic Dataset Generation
If you can, I strongly encourage you to find ready-made or easily massaged datasets that do not require synthetic data. You'll often obtain better results with less effort this way. After all, the original frontier ChatGPT family of models simply scraped the entire internet (every book, scientific paper, and other "pre-made" raw data) to help generate their first dataset. However, this is often unrealistic: at minimum, we need **1000** input-output pairs to begin fine-tuning, so...

### Why Use Synthetic Data?
| Reason | Explanation |
| ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **Data scarcity** | Niche domains (e.g., MITRE ATT&CK classification) often lack ≥ 1,000 labeled examples. |
| **Scalability** | A single large model can produce thousands of examples in minutes, saving manual effort. |
| **Quality control** | By generating with a _larger_ model than the target (e.g., Gemma‑12B qat → Gemma‑4B), you can distill richer responses within specific domains. |
| **Iterative refinement** | Kiln lets you rate or repair each pair, turning noisy outputs into a clean training set. |

<div class="lab-callout lab-callout--warning">
<strong>Rule of Thumb:</strong> Never generate data with a model that is smaller than the model you plan to fine-tune.
</div>
---

### Execute: Install & Launch Kiln AI

### 1. Install & Launch Kiln AI

If you haven't yet, download [Kiln AI](https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1) and run the installer for your OS.

<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> These steps were designed for <strong>Kiln v0.18</strong>. While compatible with newer versions, v0.18 features a polished, simplified UI ideal for this lab. Note that Kiln undergoes active development with frequent UI changes across versions.
</div>

1. **Open Kiln**. It should automatically open `http://localhost:3000` in your machine's browser.
2. Click **`Get Started`**.

<figure style="text-align:center;">
  <img src="https://i.imgur.com/hJNehuE.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Welcome screen – click "Get Started".</figcaption>
</figure>

3. Choose **`Continue`** (or **`Skip Tour`** if you prefer).
4. Dismiss the newsletter prompt (optional).

Kiln is now ready for configuration.

### 2. Connect Kiln to Ollama
1. In Kiln's left‑hand **Providers** panel, click **`Connect`** under the Ollama entry.

<div class="lab-callout lab-callout--warning">
Use your Ollama instance IP to connect (i.e., http://<STUDENT IP>:11434). You must be connected to the VPN for this to work.
</div>

<figure style="text-align:center;">
  <img src="https://i.imgur.com/vEwUszl.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Connect to a local or remote Ollama instance.</figcaption>
</figure>

2. Click **`Continue`** to confirm the connection.

<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> If you have access to a commercial LLM (for example, OpenAI GPT-4o), you can point Kiln to that endpoint for higher-quality synthetic data by replacing the Ollama URL in <strong>Providers → Connect</strong>.
</div>

---
### 3. Create a Kiln Project

1. Kiln will prompt you to **Create a Project**. Enter any descriptive name (e.g., `MITRE‑ATTACK‑FineTune`).

<figure style="text-align:center;">
  <img src="https://i.imgur.com/8CLEp9s.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Name your project.</figcaption>
</figure>

2. Press **`Create`**. You are now inside the project workspace.

---
### 4. Define the Fine‑Tuning Task

1. Click **`Add Task`** and fill out the form with the details below.
   - **Task name:** `ATT&CK Classification`
   - **Goal:** "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'."
   - **System prompt (auto‑filled):** Kiln will prepend this text to every generation request.

<figure style="text-align:center;">
  <img src="https://i.imgur.com/43o2s0Y.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Task definition screen.</figcaption>
</figure>

2. Click **`Save Task`**. The task now appears in the left‑hand **Tasks** list.

---
### 5. Kiln Main Interface Overview

| Sidebar item | Primary use |
| ------------------ | ---------------------------------------------------------------------------- |
| **Run** | Manually generate one input‑output pair at a time (useful for quick checks). |
| **Dataset** | View, edit, export, or import the entire collection of pairs. |
| **Synthetic Data** | Bulk‑generate pairs using a model of your choice. |
| **Evals** | Run automatic evaluation against a held‑out test set. |
| **Settings** | Project‑level configuration (e.g., default model, output format). |

When you first open a project, Kiln lands on the **Run** page.

---
### 6. Manual Generation (Run Page)

1. In the **Run** view, set the parameters as shown below (you may substitute a larger model if your hardware permits).

<figure style="text-align:center;">
  <img src="https://i.imgur.com/vvW0wjk.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Configure the Run settings.</figcaption>
</figure>

2. Type a **scenario description** (e.g., "An attacker dumps LSASS memory using Mimikatz") and click **`Run`**.
3. Kiln sends the prompt to the selected Ollama model (by default `gemma3:12b‑it‑qat`).
4. When the model returns an answer, you can **rate** it from 1 ★ to 5 ★.

- _5 ★_ → Accept and click **`Next`**.
- _< 5 ★_ → Click **`Attempt Repair`**, edit the response, then **`Accept Repair`** or **`Reject`**.

<figure style="text-align:center;">
  <img src="https://i.imgur.com/wqVsYMk.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Rate a correct response with 5 ★.</figcaption>
</figure>

5. Repeat until you have a handful of high‑quality pairs. This manual step is optional but useful for seeding the dataset with "gold‑standard" examples.
---

### 7. Bulk Synthetic Data Generation

#### 7.1 Open the Generator

1. In the sidebar, click **`Synthetic Data` → `Generate Fine-Tuning Data`**.

<figure style="text-align:center;">
  <img src="https://i.imgur.com/l6OiUeP.png" width="600"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Enter the bulk‑generation workflow.</figcaption>
</figure>

#### 7.2 Generate Top‑Level Topics

1. Click **`Add Topics`**. This will generate top-level topics that follow broad MITRE ATT&CK categories.
2. Choose **`Gemma-3n-2B`**.
3. Set **Number of topics** to **8** and click **`Generate`**.

<figure style="text-align:center;">
  <img src="https://i.imgur.com/SHh8v0y.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Select model & number of topics.</figcaption>
</figure>

4. Review the generated list. Delete any unsatisfactory topics (hover → click the trash icon) or click **`Add Topics`** again to generate more. Alternatively, if additional depth is required, click **`Add Subtopics`** to drill down deeper into any of the high-level topics Gemma created initially.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/wHNv3Om.png" width="800"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Final set of 8 topics.</figcaption>
</figure>

#### 7.3 Create Input Scenarios for All Topics

1. With the topics selected, click **`Generate Model Inputs`**. Ensure **`Gemma-3n-2B`** is still chosen, and then affirm your selection.
   Kiln now asks the model to produce a short _scenario description_ for each topic.
2. After the model finishes, review the generated inputs. You may edit any that look off.

#### 7.4 Generate Corresponding Outputs

1. Click **`Save All Model Outputs`**. Kiln now runs the model a second time, this time using each generated input as the prompt, to produce the _output_ (the ATT&CK technique label).

<figure style="text-align:center;">
  <img src="https://i.imgur.com/A47GRVr.png" width="800"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Produce the "output" side and store the pair.</figcaption>
</figure>

2. The full input‑output pairs are automatically added to the project's dataset.

#### 7.5 Review the Completed Dataset

1. Switch to the **`Dataset`** tab.
2. You should see a table of 64 (8 topics × 8 samples) pairs. Clicking any row opens the same **Run** view, where you can **rate**, **repair**, or **delete** each pair.
<figure style="text-align:center;">
  <img src="https://i.imgur.com/DnyXYJO.png" width="800"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Dataset overview with generated pairs.</figcaption>
</figure>

---

### 8. Dataset Export (Create a Fine-Tune)

1. Once you are satisfied with the dataset, you can export it to numerous forms of JSONL via the **Fine Tune → Create a Fine Tune** button.

2. Kiln will first ask which format the data should be exported in. We can leave the default setting of _Download: OpenAI chat format (JSONL)_. Next, select _Create a New Fine-Tuning Dataset_.

3. Kiln supports splitting our generated data into a number of buckets, including _`Training`_, _`Test`_, and _`Validation`_. Each of these dataset segments is critical to a great fine-tune, but with only 64 generated examples, we don't have the luxury of creating a split. As such, under **`Advanced Options`**, select _100% training_, and click _Create Dataset_.

<figure style="text-align:center;">
  <img src="https://i.imgur.com/vp6jobS.png" width="400"
    style="display:block; margin-left:auto; margin-right:auto; border:5px solid black;">
  <figcaption>Dataset split options.</figcaption>
</figure>

4. We can ignore all further options and select _Download Split_. A new `.jsonl` file will be saved!
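The exported file is plain JSONL: one JSON object per line, each holding an OpenAI-style `messages` array. A quick sanity check in Python (the two embedded records are hypothetical examples of what an exported line looks like) might be:

```python
import json

# Hypothetical sample of two exported lines (OpenAI chat format).
sample_jsonl = """\
{"messages": [{"role": "system", "content": "Return the MITRE ATT&CK ID."}, {"role": "user", "content": "PowerShell used to download a payload."}, {"role": "assistant", "content": "T1059.001 - PowerShell"}]}
{"messages": [{"role": "system", "content": "Return the MITRE ATT&CK ID."}, {"role": "user", "content": "Credential dumping with Mimikatz."}, {"role": "assistant", "content": "T1003.001 - LSASS Memory"}]}
"""

# Parse each non-empty line and verify the system/user/assistant ordering.
pairs = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]
for record in pairs:
    roles = [m["role"] for m in record["messages"]]
    assert roles == ["system", "user", "assistant"], "unexpected message order"

print(f"Loaded {len(pairs)} training pairs")
```

Running the same loop over your real download (swap `sample_jsonl` for the file contents) is a cheap way to catch malformed rows before training.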
---

## Objective 3: Fine Tuning with Unsloth Studio

There are many popular options for performing fine-tunes, although each has its drawbacks:

- [Unsloth](https://unsloth.ai) is the most popular solution, but currently does not support multi-GPU setups without a commercial license.
- [Axolotl](https://axolotl.ai) is built on top of Unsloth and does support multi-GPU setups, but it often lags behind Unsloth in features and capability, and does not offer a web UI.
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is the most flexible of these options, supporting both Unsloth and Axolotl as well as additional backends. However, this tool is daunting for a beginner approaching fine-tuning, and is best left for later experimentation.

<br>

While I encourage you to explore all of these tools, they are unfortunately out of scope for this lab. Instead, we're going to focus on **Unsloth**, as it provides the best web UI for easily navigating the fine-tuning process.

### Explore: Touring Unsloth Studio

Although Unsloth Studio does its best to simplify the fine-tuning process, there are still many dials and knobs to turn! Let's take a brief tour of the most important options:

1. Model Selection - This area allows us to select any model that we're interested in fine-tuning. Unsloth Studio will handle downloading the FP16 version of the model from **HuggingFace** for us.
2. Quantization Selection - Without much better hardware, we will usually be training **LoRA**s (Low-Rank Adapters). These slightly nudge the parameters of the model in the direction we're interested in. If we need additional headroom, we can instead **quantize the base model** (e.g., reduce its precision from 16-bit to 4-bit) and then apply **LoRA** to the quantized model, producing a **QLoRA** (Quantized LoRA). This approach combines the efficiency of quantization with the parameter-efficiency of LoRA. Unsloth will conveniently estimate how well a given combination of _Model_ & **QLoRA** will fit in our system's available VRAM.

<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/XwAdaKJ.png"
    width="800"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Model & LoRA Type Selections. Note how models are labeled "OOM" or "Tight" based on hardware.
  </figcaption>
</figure>
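Conceptually, a LoRA adapter freezes the base weight matrix `W` and learns a low-rank update `ΔW = B·A`, scaled by `α/r`, so far fewer parameters are trained. A toy numpy sketch (dimensions and scaling chosen arbitrarily for illustration, not Unsloth's actual defaults):

```python
import numpy as np

d, r = 8, 2   # hidden size and LoRA rank (toy values)
alpha = 4     # LoRA scaling factor
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

# With B at zero, the adapted weight equals the base weight, so the
# model starts out identical to the pretrained one.
W_adapted = W + (alpha / r) * B @ A
assert np.allclose(W_adapted, W)

# Training updates only A and B: 2*d*r parameters instead of d*d.
lora_params = A.size + B.size
full_params = W.size
print(f"LoRA trains {lora_params} params vs {full_params} for full fine-tuning")
```

At real model sizes the ratio is far more dramatic, which is why a LoRA (or QLoRA over a 4-bit base) fits in modest VRAM.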
3. Dataset Selection - This is where we can utilize our custom-made dataset. Unfortunately, while we've gone through the process of making a dataset, we had to use a very small model to simulate the process. Conveniently, Unsloth allows us to search for any dataset available publicly on HuggingFace. We can select `sarahwei/cyber_MITRE_CTI_dataset_v15` for our purposes. You can select "View Dataset" if you'd like to see some of the raw contents of this data.

<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/8xBdcnd.png"
    width="400"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Dataset Selection
  </figcaption>
</figure>

4. Train Settings - This is where we can configure exactly how our model will be trained. The majority of these settings can stay at their defaults until you have a specific need that pushes you down the rabbit hole. In particular, we'll be interested in:
   - **Learning Rate** - Controls how large an adjustment to the model's weights is made during each step.
   - **Epochs** - Determines the number of times the training algorithm will iterate over the entire dataset (by default, training repeats 3 times). Critical to help avoid underfitting or overfitting.
   - **Cutoff Length** - Equivalent to Ollama's context. As always, training with a larger context requires more memory.
   - **Batch Size** - Can speed up training, as long as we have the hardware to support it.
   - **Warmup Steps** - The number of initial training steps during which the learning rate gradually increases to the set target. Helps with stability.
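Learning rate and warmup steps interact in a simple way: for the first N warmup steps the rate ramps linearly from zero to the target, and afterwards it typically decays. A minimal sketch of such a schedule, using the lab's target rate of 0.00005 (the decay shape here is a simplifying assumption; real trainers often use cosine decay):

```python
def lr_at_step(step: int, target_lr: float = 5e-5, warmup_steps: int = 100,
               total_steps: int = 1000) -> float:
    """Linear warmup to target_lr, then linear decay to zero."""
    if step < warmup_steps:
        return target_lr * step / warmup_steps
    remaining = total_steps - step
    return target_lr * max(remaining, 0) / (total_steps - warmup_steps)

print(lr_at_step(50))    # mid-warmup: half the target rate
print(lr_at_step(100))   # warmup complete: full target rate
print(lr_at_step(1000))  # end of training: decayed to zero
```

This is why the learning-rate curve in the training dashboard rises before it falls.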
<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/fzSvggY.png"
    width="400"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Fine Tuning Settings
  </figcaption>
</figure>

### Execute: Unsloth Studio Fine Tuning

Set the following before we start to fine-tune Gemma:

1. **Model**: `unsloth/gemma-3-270m-it`
2. **Max Steps**: `100` (NOTE: For real fine tuning, use Epochs, not Steps.)
3. **Learning Rate**: `0.00005`
4. **Dataset**: `sarahwei/cyber_MITRE_CTI_dataset_v15`
5. **Warmup Steps**: `100`

- Scroll to the bottom of the page and click `Preview command`. The WebUI is merely a front end for constructing `llamafactory-cli` commands, and this shows exactly what will be run.
- When done reviewing, click `Start`. It will take some time for Unsloth Studio to start its process, as it will first need to download the full `FP16` model files from HuggingFace.

<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/fzSvggY.png"
    width="400"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Setting Max Steps, Learning Rate, and Warmup Steps
  </figcaption>
</figure>

**Monitor the loss graph** | The graph measures **Loss** per **Training step** (roughly 8k steps: 2.5k examples × 3 epochs), or put simply, how different the model's predicted answer is from our data. This should gradually, logarithmically slope downwards if training is stable.
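Raw per-step loss is noisy, so dashboards usually plot a smoothed curve alongside it. An exponential moving average over a hypothetical loss trace shows the downward trend more clearly:

```python
def ema(values, beta=0.9):
    """Exponential moving average, as used by most training dashboards."""
    smoothed, avg = [], values[0]
    for v in values:
        avg = beta * avg + (1 - beta) * v
        smoothed.append(avg)
    return smoothed

# Hypothetical noisy-but-decreasing loss trace.
raw = [2.0, 1.7, 1.9, 1.4, 1.5, 1.1, 1.2, 0.9, 1.0, 0.8]
smooth = ema(raw)
assert smooth[-1] < smooth[0], "smoothed loss should trend downwards"
print([round(v, 3) for v in smooth])
```

If the smoothed curve stops falling early, or rises, that is the signal to revisit the learning rate or dataset rather than to train longer.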
#### What to Look For

- **Training Loss:** Decreasing smoothly → model is learning effectively and training is stable
- **Gradient Norm:** Drops then stabilizes → gradients are well-behaved (no major spikes)
- **Learning Rate:** Gradually increases, then eventually decreases → expected warmup behavior helping stabilize early training

<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/Cue7afQ.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Typical Training Run
  </figcaption>
</figure>

Unfortunately, due to the time constraints of a live classroom, we'll be unable to pursue this training run to completion. On the lab-provided GPUs, a full epoch could take up to two hours! Feel free to cancel it at your leisure.

We can, however, chat with a version of Gemma 3 4B that was trained before this class. It was trained on roughly 60,000 examples, partially generated using Kiln and partially harvested from various datasets throughout HuggingFace. While not perfect, we can see that the model is significantly better than the default.

<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/FKZXaV3.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Load Model for Chat
  </figcaption>
</figure>

To test this ourselves, select:

1. The chat button at the very top of the screen.
2. Download our model. It's under my personal HuggingFace account name, `c4ch3c4d3`.
3. Set the system prompt to the one we selected when using **Kiln.ai**: "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'."

<figure style="text-align: center;">
  <img
    src="https://i.imgur.com/GHExjE3.png"
    width="600"
    style="display: block; margin-left: auto; margin-right: auto; border: 5px solid black;">
  <figcaption style="margin-top: 8px; font-size: 1.1em;">
    Test prompt
  </figcaption>
</figure>

| Test Prompt                                                                   | Expected Output Format                       |
| ----------------------------------------------------------------------------- | -------------------------------------------- |
| "A malicious actor uses PowerShell to download a file from a remote server."  | `T1059.001 – PowerShell`                     |
| "The adversary exfiltrates data via a compressed archive sent over HTTP."     | `T1567.001 – Exfiltration Over Web Services` |
| "Credential dumping is performed using Mimikatz."                             | `T1003.001 – LSASS Memory`                   |
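Spot-checking a handful of prompts by hand works, but a rough accuracy check is easy to script by comparing the technique ID at the start of each response. A sketch with hypothetical, hard-coded model outputs (in practice you would collect these from the chat API):

```python
def extract_technique_id(response: str) -> str:
    """Pull the leading ATT&CK ID (e.g. 'T1059.001') from a model response."""
    return response.strip().split()[0].rstrip(" -–")

# Hypothetical expected IDs and canned model outputs for the three test prompts.
expected = ["T1059.001", "T1567.001", "T1003.001"]
responses = [
    "T1059.001 - PowerShell",
    "T1105 - Ingress Tool Transfer",   # a miss
    "T1003.001 - LSASS Memory",
]

hits = sum(extract_technique_id(r) == e for r, e in zip(responses, expected))
print(f"Accuracy: {hits}/{len(expected)}")
```

Scaling this to a held-out test split (rather than three prompts) gives a far more trustworthy accuracy number.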
The Unsloth chat view is relatively simplistic, but it does provide options for changing inference parameters, such as Top-P or Temperature, as well as a place for us to input our system prompt. If we're looking to test the model's accuracy with our fine-tune, we normally need to ensure these values match the desired end-state values as closely as possible.

### Export the Fine-Tuned Model

<div class="lab-callout lab-callout--warning">
<strong>Skippable:</strong> These steps are provided for reference, as we never successfully finished a fine-tune within the lab time period.
</div>

1. Switch to the **Export** tab.
2. Select the training run of the model you've performed.
3. Select the latest checkpoint, or, if you'd like to explore an alternative, the checkpoint desired.
4. We can export in a number of formats:
   - **Merged Model** – A BF16 `.safetensors` version of the model which can be utilized in other projects.
   - **LoRA** – Exports only the LoRA adapter layers generated during training. Useful if we wish to share our new files with other users who already have the base model downloaded, but not our fine-tune.
   - **GGUF** – A compact file ready for import into **Ollama** or other GGUF-compatible runtimes.
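If you export to GGUF, importing it into Ollama takes only a short Modelfile plus one command. A sketch (the GGUF filename and model name are hypothetical):

```text
# Modelfile: point FROM at the exported GGUF file
FROM ./gemma-3-mitre.Q4_K_M.gguf
SYSTEM "Given a description of an attack technique, tactic, or procedure, return only an accurate MITRE ATT&CK ID and Name in the format: 'ID# - Technique'"
```

Then register and run it with `ollama create gemma-mitre -f Modelfile` followed by `ollama run gemma-mitre`. Baking the system prompt into the Modelfile means every chat session starts with the behavior we fine-tuned for.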
<br>

---

## Conclusion

In this lab, we completed a LoRA fine-tuning workflow:

1. **Dataset Generation** - We explored public datasets on HuggingFace and used Kiln AI to generate a synthetic dataset for MITRE ATT&CK classification.
2. **Fine Tuning** - We used Unsloth Studio to fine-tune Gemma-3-4B on our generated dataset.
3. **Validation & Export** - We tested the model with sample prompts and exported the fine-tuned model in both FP16 and GGUF formats.

If all has gone well, the model should be much more accurate at identifying MITRE ATT&CK codes from user-input scenarios. If not, additional experimentation may be necessary to produce a good fine-tune. Playing with the parameters we've discussed, improving and expanding our dataset, or even fine-tuning a larger or better base model can all help improve our success rate.
@@ -1,12 +1,19 @@
---
order: 7
title: Lab 7 - Evaluation and Red Teaming
description: Probe model defenses manually and with Promptfoo to evaluate security controls.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 6 - Evaluation and Red Teaming
# Lab 7 - Evaluation and Red Teaming

In this lab, we will:
* Perform prompt injection against three layers of model protection
* Use Promptfoo to programmatically evaluate a model's security protections

- Perform prompt injection against three layers of model protection
- Use Promptfoo to programmatically evaluate a model's security protections

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
@@ -15,10 +22,12 @@ In this lab, we will:
</div>

To start this lab, one web service has been preconfigured:
* Promptfoo - http://<IP>:15500

- Promptfoo - http://<IP>:15500

You'll also need to access:
* Open WebUI - https://ai.zuccaro.me/

- Open WebUI - https://ai.zuccaro.me/

## Objective 1 Explore: Direct Prompt Injection

@@ -38,8 +47,8 @@ Each level will be more difficult than the last, based on how the protection int

To access the lab, navigate to https://ai.zuccaro.me and log in with the following credentials:

* `Username` - `student@zuccaro.me`
* `Password` - `Student9205!`
- `Username` - `student@zuccaro.me`
- `Password` - `Student9205!`

<br>

@@ -88,7 +97,7 @@ Promptfoo is available on our lab machine at http://<YOUR STUDENT IP>:15500. We
Promptfoo is designed to be approachable for both beginners and practitioners. Its wizard guides you through configuring the target, selecting datasets and mutation strategies, and tracking execution.

<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> Although the Promptfoo WebUI is convenient, it hides a critical configuration option for this lab inside the YAML file. Please use the provided configuration file: [lab-6-evaluation-and-red-teaming/promptfoo.yaml](content/labs/lab-6-evaluation-and-red-teaming/promptfoo.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
<strong>Tip:</strong> Although the Promptfoo WebUI is convenient, it hides a critical configuration option for this lab inside the YAML file. Please use the provided configuration file: [lab-7-evaluation-and-red-teaming/promptfoo.yaml](content/labs/lab-7-evaluation-and-red-teaming/promptfoo.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
</div>

<figure style="text-align: center;">
@@ -139,7 +148,6 @@ Promptfoo is designed to be approachable for both beginners and practitioners. I
</figure>
<br>

Once we select `Start`, Promptfoo handles the rest. Mutations, tests, and results are all tracked by the WebUI. Promptfoo runs can take a significant amount of time, but when they finish you will be presented with a new results screen.

<figure style="text-align: center;">
@@ -159,10 +167,9 @@ Promptfoo is highly flexible. Anything that involves mass evaluation of prompts
### Explore: Promptfoo evaluation workflow

<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> Please use the provided evaluation configuration file: [lab-6-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml](content/labs/lab-6-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
<strong>Tip:</strong> Please use the provided evaluation configuration file: [lab-7-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml](content/labs/lab-7-evaluation-and-red-teaming/mmlu-promptfoo-config.yaml). Upload it with the <strong>Load Config</strong> button in the lower-left corner, then proceed with the following screenshot steps.
</div>

<figure style="text-align: center;">
<a href="https://i.imgur.com/23iFYNo.png" target="_blank">
<img
@@ -8,12 +8,14 @@
    "build": "next build",
    "start": "next start",
    "lint": "next lint",
    "test": "vitest run",
    "typecheck": "tsc --noEmit",
    "check": "npm run lint && npm run typecheck",
    "format:check": "prettier --check \"**/*.{ts,tsx,js,jsx,md,mdx}\" --cache",
    "format:write": "prettier --write \"**/*.{ts,tsx,js,jsx,md,mdx}\" --cache"
  },
  "dependencies": {
    "@xmldom/xmldom": "^0.9.9",
    "gray-matter": "^4.0.3",
    "micromark": "^4.0.2",
    "next": "^15.0.1",
@@ -21,15 +23,20 @@
    "react-dom": "^18.3.1"
  },
  "devDependencies": {
    "@testing-library/jest-dom": "^6.9.1",
    "@testing-library/react": "^16.3.2",
    "@types/node": "^20.14.10",
    "@types/react": "^18.3.3",
    "@types/react-dom": "^18.3.0",
    "@vitejs/plugin-react": "^6.0.1",
    "eslint": "^8.57.1",
    "eslint-config-next": "^15.0.1",
    "jsdom": "^29.0.1",
    "postcss": "^8.4.39",
    "prettier": "^3.3.2",
    "prettier-plugin-tailwindcss": "^0.6.5",
    "tailwindcss": "^3.4.3",
    "typescript": "^5.5.3"
    "typescript": "^5.5.3",
    "vitest": "^4.1.2"
  }
}
@@ -0,0 +1,228 @@
import { NextResponse } from "next/server";
import {
  buildUpstreamMessages,
  extractAssistantTextContent,
  extractObjective5Metrics,
  extractSvgMarkup,
  isLocalEndpoint,
  looksLikeOllamaModel,
  normalizeOllamaChatEndpoint,
  normalizeUpstreamChatEndpoint,
  sanitizeSvgDocument,
  type Objective5Message,
} from "~/lib/lab2-chat";

type ChatRouteRequestBody = {
  apiKey?: string;
  endpoint?: string;
  messages?: Objective5Message[];
  model?: string;
};

const OPENAI_UPSTREAM_TIMEOUT_MS = 30000;
const OLLAMA_UPSTREAM_TIMEOUT_MS = 90000;
const LOCAL_OPENAI_UPSTREAM_TIMEOUT_MS = 90000;

export async function POST(request: Request) {
  let body: ChatRouteRequestBody;

  try {
    body = (await request.json()) as ChatRouteRequestBody;
  } catch {
    return NextResponse.json(
      {
        error: "The request body must be valid JSON.",
      },
      { status: 400 },
    );
  }

  const endpoint = body.endpoint?.trim();
  const apiKey = body.apiKey?.trim();
  const model = body.model?.trim();

  if (!endpoint) {
    return NextResponse.json(
      {
        error: "An endpoint is required.",
      },
      { status: 400 },
    );
  }

  if (!apiKey && !isLocalEndpoint(endpoint)) {
    return NextResponse.json(
      {
        error: "An API key is required for remote endpoints.",
      },
      { status: 400 },
    );
  }

  if (!model) {
    return NextResponse.json(
      {
        error: "A model is required.",
      },
      { status: 400 },
    );
  }

  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    return NextResponse.json(
      {
        error: "At least one chat message is required.",
      },
      { status: 400 },
    );
  }

  const useOllamaChat = looksLikeOllamaModel(model);
  const useLocalOpenAI = !useOllamaChat && isLocalEndpoint(endpoint);
  let upstreamUrl: string;
  try {
    upstreamUrl = useOllamaChat
      ? normalizeOllamaChatEndpoint(endpoint)
      : normalizeUpstreamChatEndpoint(endpoint);
  } catch {
    return NextResponse.json(
      {
        error: "The endpoint must be a valid URL.",
      },
      { status: 400 },
    );
  }

  const upstreamTimeoutMs = useOllamaChat
    ? OLLAMA_UPSTREAM_TIMEOUT_MS
    : useLocalOpenAI
      ? LOCAL_OPENAI_UPSTREAM_TIMEOUT_MS
      : OPENAI_UPSTREAM_TIMEOUT_MS;
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), upstreamTimeoutMs);

  try {
    const upstreamResponse = await fetch(upstreamUrl, {
      body: JSON.stringify(
        useOllamaChat
          ? {
              messages: buildUpstreamMessages(body.messages),
              model,
              stream: false,
            }
          : {
              messages: buildUpstreamMessages(body.messages),
              model,
              stream: false,
              temperature: 0.8,
            },
      ),
      headers: {
        ...(apiKey
          ? {
              Authorization: `Bearer ${apiKey}`,
            }
          : {}),
        "Content-Type": "application/json",
      },
      method: "POST",
      signal: controller.signal,
    });

    const responseText = await upstreamResponse.text();
    let parsedBody: unknown = null;

    try {
      parsedBody = JSON.parse(responseText);
    } catch {
      parsedBody = null;
    }

    if (!upstreamResponse.ok) {
      const message =
        typeof parsedBody === "object" &&
        parsedBody !== null &&
        "error" in parsedBody &&
        typeof parsedBody.error === "object" &&
        parsedBody.error !== null &&
        "message" in parsedBody.error &&
        typeof parsedBody.error.message === "string"
          ? parsedBody.error.message
          : `The upstream endpoint returned ${upstreamResponse.status}.`;

      return NextResponse.json(
        {
          error: message,
        },
        { status: upstreamResponse.status },
      );
    }

    if (!parsedBody || typeof parsedBody !== "object") {
      return NextResponse.json(
        {
          error: "The upstream endpoint returned an unreadable response.",
        },
        { status: 502 },
      );
    }

    const content = extractAssistantTextContent(parsedBody);
    const metrics = extractObjective5Metrics(parsedBody);
    if (!content) {
      return NextResponse.json(
        {
          error: "The upstream endpoint returned no assistant content.",
        },
        { status: 502 },
      );
    }

    const svgMarkup = extractSvgMarkup(content);
    if (!svgMarkup) {
      return NextResponse.json({
        content,
        metrics,
        renderMode: "text",
        role: "assistant",
      });
    }

    const sanitizedSvg = sanitizeSvgDocument(svgMarkup);
    if (!sanitizedSvg.ok) {
      return NextResponse.json({
        content,
        error: `${sanitizedSvg.error} Showing the raw response instead.`,
        metrics,
        renderMode: "text",
        role: "assistant",
      });
    }

    return NextResponse.json({
      content,
      metrics,
      renderMode: "svg",
      role: "assistant",
      svg: sanitizedSvg.svg,
    });
  } catch (caughtError) {
    if (caughtError instanceof Error && caughtError.name === "AbortError") {
      return NextResponse.json(
        {
          error: `The upstream endpoint timed out after ${Math.floor(upstreamTimeoutMs / 1000)} seconds.`,
        },
        { status: 504 },
      );
    }

    return NextResponse.json(
      {
        error: "The chat request could not reach the upstream endpoint.",
      },
      { status: 502 },
    );
  } finally {
    clearTimeout(timeoutId);
  }
}
@@ -0,0 +1,108 @@
import { NextResponse } from "next/server";
import {
  extractModelOptions,
  getDefaultObjective5ModelOptions,
  getModelListEndpointCandidates,
  isLocalEndpoint,
} from "~/lib/lab2-chat";

type ModelsRouteRequestBody = {
  apiKey?: string;
  endpoint?: string;
};

const MODELS_TIMEOUT_MS = 15000;

export async function POST(request: Request) {
  let body: ModelsRouteRequestBody;

  try {
    body = (await request.json()) as ModelsRouteRequestBody;
  } catch {
    return NextResponse.json(
      {
        error: "The request body must be valid JSON.",
      },
      { status: 400 },
    );
  }

  const endpoint = body.endpoint?.trim();
  const apiKey = body.apiKey?.trim();

  if (!endpoint) {
    return NextResponse.json(
      {
        error: "An endpoint is required.",
      },
      { status: 400 },
    );
  }

  if (!apiKey && !isLocalEndpoint(endpoint)) {
    return NextResponse.json(
      {
        error: "An API key is required for remote endpoints.",
      },
      { status: 400 },
    );
  }

  let candidates: string[];
  try {
    candidates = getModelListEndpointCandidates(endpoint);
  } catch {
    return NextResponse.json(
      {
        error: "The endpoint must be a valid URL.",
      },
      { status: 400 },
    );
  }

  const headers = {
    ...(apiKey
      ? {
          Authorization: `Bearer ${apiKey}`,
        }
      : {}),
  };

  for (const candidate of candidates) {
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), MODELS_TIMEOUT_MS);

    try {
      const response = await fetch(candidate, {
        headers,
        method: "GET",
        signal: controller.signal,
      });

      const responseText = await response.text();
      const parsed = JSON.parse(responseText) as unknown;

      if (!response.ok) {
        continue;
      }

      const models = extractModelOptions(parsed);
      return NextResponse.json({
        models:
          models.length > 0 ? models : getDefaultObjective5ModelOptions(),
      });
    } catch {
      continue;
    } finally {
      clearTimeout(timeoutId);
    }
  }

  return NextResponse.json(
    {
      error: "Could not load models from the endpoint.",
      models: getDefaultObjective5ModelOptions(),
    },
    { status: 502 },
  );
}
@@ -31,11 +31,9 @@ export default function HomePage() {
      </section>

      <section className="mt-8">
        <h2 className="mb-4 text-xl font-semibold text-[#004E78]">
          Recent Labs
        </h2>
        <h2 className="mb-4 text-xl font-semibold text-[#004E78]">All Labs</h2>
        <div className="grid gap-4 md:grid-cols-2">
          {labs.slice(0, 6).map((lab) => (
          {labs.map((lab) => (
            <Link
              key={lab.slug}
              href={`/labs/${lab.slug}`}
@@ -1,16 +1,21 @@
|
||||
"use client";
|
||||
|
||||
import { useEffect, useRef, useState } from "react";
|
||||
import { Fragment, useEffect, useRef, useState } from "react";
|
||||
import { Objective5Chat } from "~/components/labs/Objective5Chat";
|
||||
import { QuantizationGridExplorer } from "~/components/labs/QuantizationGridExplorer";
|
||||
import { QuantizationExplorer } from "~/components/labs/QuantizationExplorer";
|
||||
|
||||
type LabContentProps = {
|
||||
className: string;
|
||||
html: string;
|
||||
};
|
||||
|
||||
const cliLanguagePattern = /\b(language-(bash|sh|shell|zsh|console|terminal)|bash|shell|zsh)\b/i;
|
||||
const cliLanguagePattern =
|
||||
/\b(language-(bash|sh|shell|zsh|console|terminal)|bash|shell|zsh)\b/i;
|
||||
const cliCommandPattern =
|
||||
/(^|\n)\s*(\$|sudo\s|git\s|python3?\s|pip\s|npm\s|pnpm\s|yarn\s|llama-|ollama\s|curl\s|wget\s|apt\s|cd\s|ls\s|cat\s|cp\s|mv\s|chmod\s|make\s)/i;
|
||||
const promptLanguagePattern = /\b(language-(text|plaintext|md|markdown)|text|plaintext|markdown)\b/i;
|
||||
const promptLanguagePattern =
|
||||
/\b(language-(text|plaintext|md|markdown)|text|plaintext|markdown)\b/i;
|
||||
const promptSignalPattern =
|
||||
/\b(you are|guidelines|follow these|example|when provided|system prompt|tasked with)\b/i;
|
||||
|
||||
@@ -24,9 +29,16 @@ type ZoomedImageState = {
|
||||
alt: string;
|
||||
};
|
||||
|
||||
const quantizationExplorerToken = "<div data-quantization-explorer></div>";
|
||||
const quantizationGridExplorerToken =
|
||||
"<div data-quantization-grid-explorer></div>";
|
||||
const objective5ChatToken = "<div data-objective5-chat></div>";
|
||||
|
||||
function looksLikeCliCommand(commandText: string, className: string) {
|
||||
if (cliLanguagePattern.test(className)) return true;
|
||||
return cliCommandPattern.test(commandText) || /--[a-z0-9-]+/i.test(commandText);
|
||||
return (
|
||||
cliCommandPattern.test(commandText) || /--[a-z0-9-]+/i.test(commandText)
|
||||
);
|
||||
}
|
||||
|
||||
function looksLikePromptTextBlock(text: string, className: string) {
|
||||
@@ -36,7 +48,8 @@ function looksLikePromptTextBlock(text: string, className: string) {
|
||||
if (!normalizedText) return false;
|
||||
|
||||
const lineCount = normalizedText.split("\n").length;
|
||||
if (promptLanguagePattern.test(className) && normalizedText.length > 80) return true;
|
||||
if (promptLanguagePattern.test(className) && normalizedText.length > 80)
|
||||
return true;
|
||||
if (lineCount >= 4 && promptSignalPattern.test(normalizedText)) return true;
|
||||
if (lineCount >= 6 && /(^|\n)\s*[*-]\s+/.test(normalizedText)) return true;
|
||||
return false;
|
@@ -63,7 +76,9 @@ function parseSettingListItem(item: HTMLLIElement): ParsedSetting | null {
  if (!key || key.length > 40) return null;

  const text = (item.textContent ?? "").replace(/\s+/g, " ").trim();
  const match = new RegExp(`^${escapeRegex(key)}\\s*(?:-|–|—|:|=)\\s*(.+)$`).exec(text);
  const match = new RegExp(
    `^${escapeRegex(key)}\\s*(?:-|–|—|:|=)\\s*(.+)$`,
  ).exec(text);
  if (!match) return null;

  const value = (match[1] ?? "").replace(/\s+/g, " ").trim();
@@ -79,17 +94,22 @@ function enhanceSettingsLists(root: HTMLElement) {
  for (const list of lists) {
    if (list.dataset.settingsEnhanced === "true") continue;

    const items = Array.from(list.children).filter((node): node is HTMLLIElement => {
    const items = Array.from(list.children).filter(
      (node): node is HTMLLIElement => {
        return node.tagName === "LI";
    });
      },
    );
    if (items.length < 2) continue;

    const parsedItems = items.map((item) => parseSettingListItem(item));
    if (parsedItems.some((parsedItem) => parsedItem === null)) continue;

    const settings = parsedItems as ParsedSetting[];
    const compactValueCount = settings.filter((setting) => setting.value.length <= 20).length;
    if (compactValueCount < Math.max(2, Math.ceil(settings.length * 0.66))) continue;
    const compactValueCount = settings.filter(
      (setting) => setting.value.length <= 20,
    ).length;
    if (compactValueCount < Math.max(2, Math.ceil(settings.length * 0.66)))
      continue;

    list.dataset.settingsEnhanced = "true";
    list.classList.add("lab-settings-list");
@@ -126,11 +146,16 @@ async function copyTextToClipboard(text: string) {
    return;
  }

  const activeElement = document.activeElement instanceof HTMLElement ? document.activeElement : null;
  const activeElement =
    document.activeElement instanceof HTMLElement
      ? document.activeElement
      : null;
  const selection = document.getSelection();
  const previousRanges =
    selection && selection.rangeCount > 0
      ? Array.from({ length: selection.rangeCount }, (_, index) => selection.getRangeAt(index).cloneRange())
      ? Array.from({ length: selection.rangeCount }, (_, index) =>
          selection.getRangeAt(index).cloneRange(),
        )
      : [];

  const textarea = document.createElement("textarea");
@@ -169,6 +194,38 @@ export function LabContent({ className, html }: LabContentProps) {
  const containerRef = useRef<HTMLElement>(null);
  const [zoomedImage, setZoomedImage] = useState<ZoomedImageState | null>(null);

  const renderedContent = html
    .split(
      new RegExp(
        `(${escapeRegex(quantizationExplorerToken)}|${escapeRegex(quantizationGridExplorerToken)}|${escapeRegex(objective5ChatToken)})`,
        "g",
      ),
    )
    .filter(Boolean)
    .map((part, index) => {
      if (part === quantizationExplorerToken) {
        return <QuantizationExplorer key={`quantization-explorer-${index}`} />;
      }

      if (part === quantizationGridExplorerToken) {
        return (
          <QuantizationGridExplorer
            key={`quantization-grid-explorer-${index}`}
          />
        );
      }

      if (part === objective5ChatToken) {
        return <Objective5Chat key={`objective5-chat-${index}`} />;
      }

      return (
        <Fragment key={`html-segment-${index}`}>
          <div dangerouslySetInnerHTML={{ __html: part }} />
        </Fragment>
      );
    });

  useEffect(() => {
    const root = containerRef.current;
    if (!root) return;
@@ -195,7 +252,9 @@ export function LabContent({ className, html }: LabContentProps) {

    const handleRootClick = (event: Event) => {
      const target = event.target as HTMLElement;
      const button = target.closest<HTMLButtonElement>("button.lab-copy-button");
      const button = target.closest<HTMLButtonElement>(
        "button.lab-copy-button",
      );
      if (button) {
        const pre = button.closest("pre");
        const code = pre?.querySelector("code");
@@ -203,14 +262,16 @@ export function LabContent({ className, html }: LabContentProps) {
        if (!commandText) return;
        const defaultLabel = button.dataset.defaultLabel ?? "Copy";

        void copyTextToClipboard(commandText).then(() => {
        void copyTextToClipboard(commandText)
          .then(() => {
            button.textContent = "Copied";
            button.classList.add("is-copied");
            window.setTimeout(() => {
              button.textContent = defaultLabel;
              button.classList.remove("is-copied");
            }, 1200);
        }).catch(() => {
          })
          .catch(() => {
            button.textContent = "Failed";
            window.setTimeout(() => {
              button.textContent = defaultLabel;
@@ -246,7 +307,8 @@ export function LabContent({ className, html }: LabContentProps) {
      document.body.style.overflow = "hidden";

      const activeElement = document.activeElement;
      const previousFocusedElement = activeElement instanceof HTMLElement ? activeElement : null;
      const previousFocusedElement =
        activeElement instanceof HTMLElement ? activeElement : null;

      const handleEscape = (event: KeyboardEvent) => {
        if (event.key === "Escape") {
@@ -264,20 +326,25 @@ export function LabContent({ className, html }: LabContentProps) {

  return (
    <>
      <article
        ref={containerRef}
        className={className}
        dangerouslySetInnerHTML={{ __html: html }}
      />
      <article ref={containerRef} className={className}>
        {renderedContent}
      </article>
      {zoomedImage ? (
        <div
          className="lab-image-modal"
          role="presentation"
          onClick={() => setZoomedImage(null)}
        >
          <div className="lab-image-modal__surface" onClick={(event) => event.stopPropagation()}>
          <div
            className="lab-image-modal__surface"
            onClick={(event) => event.stopPropagation()}
          >
            {/* eslint-disable-next-line @next/next/no-img-element */}
            <img className="lab-image-modal__image" src={zoomedImage.src} alt={zoomedImage.alt} />
            <img
              className="lab-image-modal__image"
              src={zoomedImage.src}
              alt={zoomedImage.alt}
            />
          </div>
        </div>
      ) : null}

@@ -0,0 +1,177 @@
import { fireEvent, render, screen, waitFor } from "@testing-library/react";
import { beforeEach, describe, expect, it, vi } from "vitest";
import { Objective5Chat } from "~/components/labs/Objective5Chat";
import {
  LAB2_CHAT_STORAGE_KEY,
  LAB2_CUSTOM_MODEL_VALUE,
} from "~/lib/lab2-chat";

describe("Objective5Chat", () => {
  beforeEach(() => {
    window.localStorage.clear();
    vi.restoreAllMocks();
  });

  function mockFetch() {
    const fetchMock = vi.fn(async (input: RequestInfo | URL) => {
      const url = String(input);

      if (url === "/api/lab2/models") {
        return {
          json: async () => ({
            models: [
              { label: "LM Studio Qwen", value: "qwen3.5-9b-mlx" },
              { label: "Custom model", value: LAB2_CUSTOM_MODEL_VALUE },
            ],
          }),
          ok: true,
        };
      }

      return {
        json: async () => ({
          content: "Q8_0 stayed the most coherent in this run.",
          metrics: {
            completionTokens: 451,
            tokensPerSecond: 14.4,
          },
          renderMode: "text",
          role: "assistant",
        }),
        ok: true,
      };
    });

    vi.stubGlobal("fetch", fetchMock);
    return fetchMock;
  }

  it("loads persisted settings from localStorage", async () => {
    mockFetch();

    window.localStorage.setItem(
      LAB2_CHAT_STORAGE_KEY,
      JSON.stringify({
        apiKey: "sk-student",
        customModel: "custom/quantized-model",
        endpoint: "https://example.com/api",
        selectedModel: LAB2_CUSTOM_MODEL_VALUE,
      }),
    );

    render(<Objective5Chat />);

    expect(await screen.findByLabelText("Endpoint")).toHaveValue(
      "https://example.com/api",
    );
    expect(screen.getByLabelText("API key")).toHaveValue("sk-student");
    expect(screen.getByLabelText("Model")).toHaveValue(
      LAB2_CUSTOM_MODEL_VALUE,
    );
    expect(screen.getByLabelText("Custom model id")).toHaveValue(
      "custom/quantized-model",
    );
  });

  it("persists settings updates back to localStorage", async () => {
    render(<Objective5Chat />);

    fireEvent.change(screen.getByLabelText("Endpoint"), {
      target: { value: "https://saved.example/api" },
    });

    await waitFor(() => {
      const saved = window.localStorage.getItem(LAB2_CHAT_STORAGE_KEY);
      expect(saved).toContain("https://saved.example/api");
    });
  });

  it("refreshes models from the endpoint into the dropdown", async () => {
    const fetchMock = mockFetch();

    render(<Objective5Chat />);

    fireEvent.change(screen.getByLabelText("Endpoint"), {
      target: { value: "http://127.0.0.1:1234" },
    });

    expect(await screen.findByRole("option", { name: "LM Studio Qwen" })).toBeInTheDocument();
    expect(fetchMock).toHaveBeenCalledWith(
      "/api/lab2/models",
      expect.objectContaining({
        method: "POST",
      }),
    );
  });

  it("renders a text response from the lab chat route", async () => {
    mockFetch();

    render(<Objective5Chat />);

    fireEvent.change(screen.getByLabelText("API key"), {
      target: { value: "sk-test" },
    });
    fireEvent.change(screen.getByLabelText("Prompt"), {
      target: { value: "Compare these quantized models." },
    });
    fireEvent.submit(screen.getByRole("button", { name: "Send Prompt" }).closest("form")!);

    expect(
      await screen.findByText("Q8_0 stayed the most coherent in this run."),
    ).toBeInTheDocument();
    expect(screen.getByText("Tokens/sec 14.4")).toBeInTheDocument();
  });

  it("renders sanitized svg responses as an image preview", async () => {
    vi.stubGlobal(
      "fetch",
      vi.fn(async (input: RequestInfo | URL) => {
        const url = String(input);

        if (url === "/api/lab2/models") {
          return {
            json: async () => ({
              models: [
                { label: "Gemma 4 E2B Q4_K_M", value: "gemma4:e2b-it-q4_K_M" },
                { label: "Custom model", value: LAB2_CUSTOM_MODEL_VALUE },
              ],
            }),
            ok: true,
          };
        }

        return {
          json: async () => ({
            content:
              "<svg viewBox=\"0 0 10 10\"><circle cx=\"5\" cy=\"5\" r=\"4\" fill=\"#0f4f76\" /></svg>",
            metrics: {
              completionTokens: 451,
              tokensPerSecond: 14.4,
            },
            renderMode: "svg",
            role: "assistant",
            svg: "<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 10 10\"><circle cx=\"5\" cy=\"5\" r=\"4\" fill=\"#0f4f76\" /></svg>",
          }),
          ok: true,
        };
      }),
    );

    render(<Objective5Chat />);

    fireEvent.change(screen.getByLabelText("API key"), {
      target: { value: "sk-test" },
    });
    fireEvent.change(screen.getByLabelText("Prompt"), {
      target: { value: "Draw a pelican riding a bicycle." },
    });
    fireEvent.submit(screen.getByRole("button", { name: "Send Prompt" }).closest("form")!);

    expect(
      await screen.findByAltText(/svg sketch generated by/i),
    ).toBeInTheDocument();
    expect(screen.getByText("View SVG source")).toBeInTheDocument();
    expect(screen.getByText("Tokens/sec 14.4")).toBeInTheDocument();
  });
});
@@ -0,0 +1,509 @@
"use client";

import { FormEvent, useCallback, useEffect, useMemo, useState } from "react";
import {
  getActiveModel,
  getDefaultObjective5ModelOptions,
  getDefaultObjective5Settings,
  isLocalEndpoint,
  LAB2_CHAT_STORAGE_KEY,
  LAB2_CUSTOM_MODEL_VALUE,
  LAB2_DEFAULT_ENDPOINT,
  type Objective5ModelOption,
  type Objective5Metrics,
  type Objective5Message,
  type Objective5RenderMode,
  svgToDataUrl,
} from "~/lib/lab2-chat";

type ChatTurn = Objective5Message & {
  error?: string;
  id: string;
  metrics?: Objective5Metrics | null;
  model?: string;
  renderMode: Objective5RenderMode;
  svg?: string;
};

type ChatApiSuccess = {
  content: string;
  error?: string;
  metrics?: Objective5Metrics | null;
  renderMode: Objective5RenderMode;
  role: "assistant";
  svg?: string;
};

const starterPrompts = [
  "Draw a pelican riding a bicycle.",
  "Draw a raccoon conducting an orchestra with a baguette baton.",
  "Draw a capybara skateboarding through a volcano museum.",
] as const;

function formatMetricValue(value: number, suffix = "") {
  if (Number.isInteger(value)) {
    return `${value}${suffix}`;
  }

  return `${value.toFixed(1)}${suffix}`;
}

function renderMetrics(metrics: Objective5Metrics | null | undefined) {
  if (!metrics) return null;

  const metricItems = [
    typeof metrics.tokensPerSecond === "number"
      ? `Tokens/sec ${formatMetricValue(metrics.tokensPerSecond)}`
      : null,
    typeof metrics.completionTokens === "number"
      ? `Output tokens ${formatMetricValue(metrics.completionTokens)}`
      : null,
    typeof metrics.promptTokens === "number"
      ? `Prompt tokens ${formatMetricValue(metrics.promptTokens)}`
      : null,
    typeof metrics.evalDurationMs === "number"
      ? `Eval ${formatMetricValue(metrics.evalDurationMs, " ms")}`
      : null,
    typeof metrics.totalDurationMs === "number"
      ? `Total ${formatMetricValue(metrics.totalDurationMs, " ms")}`
      : null,
  ].filter(Boolean);

  if (metricItems.length === 0) {
    return null;
  }

  return (
    <div className="objective5-chat__metrics" aria-label="Response metrics">
      {metricItems.map((item) => (
        <span className="objective5-chat__metric-pill" key={item}>
          {item}
        </span>
      ))}
    </div>
  );
}

function buildTurnId() {
  return `turn-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
}

function toApiConversation(messages: ChatTurn[]) {
  return messages.map(({ content, role }) => ({ content, role }));
}

export function Objective5Chat() {
  const defaults = useMemo(() => getDefaultObjective5Settings(), []);
  const defaultModelOptions = useMemo(() => getDefaultObjective5ModelOptions(), []);
  const [endpoint, setEndpoint] = useState(defaults.endpoint);
  const [apiKey, setApiKey] = useState(defaults.apiKey);
  const [selectedModel, setSelectedModel] = useState(defaults.selectedModel);
  const [customModel, setCustomModel] = useState(defaults.customModel);
  const [draft, setDraft] = useState<string>(starterPrompts[1]);
  const [messages, setMessages] = useState<ChatTurn[]>([]);
  const [modelOptions, setModelOptions] =
    useState<Objective5ModelOption[]>(defaultModelOptions);
  const [modelError, setModelError] = useState<string | null>(null);
  const [isRefreshingModels, setIsRefreshingModels] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [isSubmitting, setIsSubmitting] = useState(false);
  const [hasLoadedSettings, setHasLoadedSettings] = useState(false);

  const activeModel = getActiveModel(selectedModel, customModel);

  useEffect(() => {
    try {
      const savedSettings = window.localStorage.getItem(LAB2_CHAT_STORAGE_KEY);
      if (!savedSettings) {
        setHasLoadedSettings(true);
        return;
      }

      const parsed = JSON.parse(savedSettings) as Partial<typeof defaults>;
      setEndpoint(parsed.endpoint?.trim() || LAB2_DEFAULT_ENDPOINT);
      setApiKey(parsed.apiKey ?? "");
      setSelectedModel(parsed.selectedModel?.trim() || defaults.selectedModel);
      setCustomModel(parsed.customModel?.trim() || "");
    } catch {
      window.localStorage.removeItem(LAB2_CHAT_STORAGE_KEY);
    } finally {
      setHasLoadedSettings(true);
    }
  }, [defaults.selectedModel]);

  useEffect(() => {
    if (!hasLoadedSettings) return;

    window.localStorage.setItem(
      LAB2_CHAT_STORAGE_KEY,
      JSON.stringify({
        apiKey,
        customModel,
        endpoint,
        selectedModel,
      }),
    );
  }, [apiKey, customModel, endpoint, hasLoadedSettings, selectedModel]);

  const refreshModels = useCallback(async () => {
    const trimmedEndpoint = endpoint.trim();
    const trimmedKey = apiKey.trim();

    if (!trimmedEndpoint) {
      setModelError("Enter an endpoint before refreshing models.");
      return;
    }

    if (!trimmedKey && !isLocalEndpoint(trimmedEndpoint)) {
      setModelError("Enter an API key before refreshing remote models.");
      return;
    }

    setIsRefreshingModels(true);
    setModelError(null);

    try {
      const response = await fetch("/api/lab2/models", {
        body: JSON.stringify({
          apiKey: trimmedKey,
          endpoint: trimmedEndpoint,
        }),
        headers: {
          "Content-Type": "application/json",
        },
        method: "POST",
      });

      const payload = (await response.json()) as {
        error?: string;
        models?: Objective5ModelOption[];
      };

      if (!response.ok) {
        throw new Error(payload.error || "Could not load models.");
      }

      const nextOptions = Array.isArray(payload.models) ? payload.models : [];
      const optionsWithCustom = ensureCustomOption(nextOptions);
      setModelOptions(optionsWithCustom);
      setSelectedModel((currentModel) => {
        if (
          currentModel === LAB2_CUSTOM_MODEL_VALUE ||
          optionsWithCustom.some((option) => option.value === currentModel)
        ) {
          return currentModel;
        }

        return optionsWithCustom[0]?.value ?? currentModel;
      });
    } catch (caughtError) {
      setModelError(
        caughtError instanceof Error
          ? caughtError.message
          : "Could not load models.",
      );
    } finally {
      setIsRefreshingModels(false);
    }
  }, [apiKey, endpoint]);

  useEffect(() => {
    if (!hasLoadedSettings) return;
    if (!endpoint.trim()) return;
    if (!apiKey.trim() && !isLocalEndpoint(endpoint.trim())) return;

    void refreshModels();
  }, [apiKey, endpoint, hasLoadedSettings, refreshModels]);

  async function handleSubmit(event: FormEvent<HTMLFormElement>) {
    event.preventDefault();

    const prompt = draft.trim();
    const trimmedEndpoint = endpoint.trim();
    const trimmedKey = apiKey.trim();

    if (!trimmedEndpoint) {
      setError("Enter the model endpoint before sending a prompt.");
      return;
    }

    if (!trimmedKey && !isLocalEndpoint(trimmedEndpoint)) {
      setError("Enter an API key before sending a prompt to a remote endpoint.");
      return;
    }

    if (!activeModel) {
      setError("Choose one of the quantized models or enter a custom model name.");
      return;
    }

    if (!prompt) {
      setError("Enter a prompt to compare qualitative output.");
      return;
    }

    const nextUserTurn: ChatTurn = {
      content: prompt,
      id: buildTurnId(),
      renderMode: "text",
      role: "user",
    };

    const nextConversation = [...messages, nextUserTurn];

    setMessages(nextConversation);
    setDraft("");
    setError(null);
    setIsSubmitting(true);

    try {
      const response = await fetch("/api/lab2/chat", {
        body: JSON.stringify({
          apiKey: trimmedKey,
          endpoint: trimmedEndpoint,
          messages: toApiConversation(nextConversation),
          model: activeModel,
        }),
        headers: {
          "Content-Type": "application/json",
        },
        method: "POST",
      });

      const payload = (await response.json()) as ChatApiSuccess & {
        error?: string;
      };

      if (!response.ok) {
        throw new Error(payload.error || "The chat request failed.");
      }

      const assistantTurn: ChatTurn = {
        content: payload.content,
        error: payload.error,
        id: buildTurnId(),
        metrics: payload.metrics,
        model: activeModel,
        renderMode: payload.renderMode,
        role: "assistant",
        svg: payload.svg,
      };

      setMessages((currentMessages) => [...currentMessages, assistantTurn]);
    } catch (caughtError) {
      setError(
        caughtError instanceof Error
          ? caughtError.message
          : "The chat request failed.",
      );
    } finally {
      setIsSubmitting(false);
    }
  }

  return (
    <section className="objective5-chat" data-widget-enhanced="true">
      <div className="objective5-chat__header">
        <p className="objective5-chat__eyebrow">Objective 5 Lab Widget</p>
        <h3>Compare qualitative output with a hosted chat endpoint</h3>
        <p className="objective5-chat__lede">
          Switch between quantized models, reuse the same prompt, and ask for
          text or simple SVG sketches like{" "}
          <code>Draw a pelican riding a bicycle.</code>
        </p>
      </div>

      <div className="objective5-chat__settings">
        <label className="objective5-chat__field">
          <span>Endpoint</span>
          <input
            autoComplete="off"
            name="endpoint"
            onChange={(event) => setEndpoint(event.target.value)}
            placeholder={LAB2_DEFAULT_ENDPOINT}
            type="url"
            value={endpoint}
          />
        </label>

        <label className="objective5-chat__field">
          <span>API key</span>
          <input
            autoComplete="off"
            name="apiKey"
            onChange={(event) => setApiKey(event.target.value)}
            placeholder="Paste a lab key here"
            type="text"
            value={apiKey}
          />
        </label>

        <label className="objective5-chat__field">
          <span>Model</span>
          <div className="objective5-chat__model-row">
            <select
              name="selectedModel"
              onChange={(event) => setSelectedModel(event.target.value)}
              value={selectedModel}
            >
              {modelOptions.map((option) => (
                <option key={option.value} value={option.value}>
                  {option.label}
                </option>
              ))}
            </select>
            <button
              className="objective5-chat__refresh-button"
              disabled={isRefreshingModels}
              onClick={() => void refreshModels()}
              type="button"
            >
              {isRefreshingModels ? "Refreshing..." : "Refresh Models"}
            </button>
          </div>
        </label>

        {selectedModel === LAB2_CUSTOM_MODEL_VALUE ? (
          <label className="objective5-chat__field">
            <span>Custom model id</span>
            <input
              autoComplete="off"
              name="customModel"
              onChange={(event) => setCustomModel(event.target.value)}
              placeholder="provider/model-name"
              type="text"
              value={customModel}
            />
          </label>
        ) : null}
      </div>

      <p className="objective5-chat__settings-note">
        Settings stay in your browser for this lab only. Available models are
        refreshed from the configured endpoint, and changing the model does not
        clear the transcript.
      </p>

      {modelError ? (
        <p className="objective5-chat__error objective5-chat__error--inline">
          {modelError}
        </p>
      ) : null}

      <div className="objective5-chat__prompt-row">
        {starterPrompts.map((prompt) => (
          <button
            className="objective5-chat__prompt-chip"
            key={prompt}
            onClick={() => setDraft(prompt)}
            type="button"
          >
            {prompt}
          </button>
        ))}
      </div>

      <div className="objective5-chat__transcript" aria-live="polite">
        {messages.length === 0 ? (
          <div className="objective5-chat__empty">
            <strong>Try the same prompt on each model.</strong>
            <p>
              Start with one of the suggested prompts, then switch the model and
              send the same question again to compare coherence and SVG fidelity.
            </p>
          </div>
        ) : (
          messages.map((message) => {
            const svgDataUrl =
              message.renderMode === "svg" && message.svg
                ? svgToDataUrl(message.svg)
                : null;

            return (
              <article
                className={`objective5-chat__message objective5-chat__message--${message.role}`}
                key={message.id}
              >
                <div className="objective5-chat__message-meta">
                  <span>{message.role === "user" ? "You" : "Assistant"}</span>
                  {message.model ? <code>{message.model}</code> : null}
                </div>

                {message.renderMode === "svg" && svgDataUrl ? (
                  <div className="objective5-chat__svg-block">
                    {/* eslint-disable-next-line @next/next/no-img-element */}
                    <img
                      alt={`SVG sketch generated by ${message.model ?? "the selected model"}`}
                      className="objective5-chat__svg-preview"
                      src={svgDataUrl}
                    />
                    <details className="objective5-chat__svg-source">
                      <summary>View SVG source</summary>
                      <pre>
                        <code>{message.svg}</code>
                      </pre>
                    </details>
                  </div>
                ) : (
                  <pre className="objective5-chat__message-body">
                    <code>{message.content}</code>
                  </pre>
                )}

                {message.role === "assistant" ? renderMetrics(message.metrics) : null}

                {message.error ? (
                  <p className="objective5-chat__message-warning">
                    {message.error}
                  </p>
                ) : null}
              </article>
            );
          })
        )}
      </div>

      <form className="objective5-chat__composer" onSubmit={handleSubmit}>
        <label className="objective5-chat__composer-label" htmlFor="objective5-draft">
          Prompt
        </label>
        <textarea
          id="objective5-draft"
          name="draft"
          onChange={(event) => setDraft(event.target.value)}
          placeholder="Ask a text question or request an SVG sketch."
          rows={5}
          value={draft}
        />

        <div className="objective5-chat__composer-actions">
          <div className="objective5-chat__composer-state">
            <span>Current model</span>
            <strong>{activeModel || "Choose a model"}</strong>
          </div>
          <button disabled={isSubmitting} type="submit">
            {isSubmitting ? "Sending..." : "Send Prompt"}
          </button>
        </div>

        {error ? <p className="objective5-chat__error">{error}</p> : null}
      </form>
    </section>
  );
}

function ensureCustomOption(modelOptions: Objective5ModelOption[]) {
  if (
    modelOptions.some((option) => option.value === LAB2_CUSTOM_MODEL_VALUE)
  ) {
    return modelOptions;
  }

  return [
    ...modelOptions,
    {
      label: "Custom model",
      value: LAB2_CUSTOM_MODEL_VALUE,
    },
  ];
}
@@ -0,0 +1,217 @@
"use client";

import { useMemo, useState } from "react";

const BIT_DEPTHS = [2, 4, 6, 8, 16] as const;
const FOCUS_WEIGHT = 0.156347;
const EXAMPLE_WEIGHTS = [0.156347, -0.3734, 0.7234] as const;

type BitDepth = (typeof BIT_DEPTHS)[number];

type QuantizedWeight = {
  bf16Word?: string;
  error: number;
  original: number;
  reconstructed: number;
  stored: number | string;
};

function clamp(value: number, min: number, max: number) {
  return Math.min(max, Math.max(min, value));
}

function formatFloat(value: number, decimals = 6) {
  return value.toFixed(decimals);
}

function toBfloat16(value: number) {
  const floatView = new Float32Array(1);
  const intView = new Uint32Array(floatView.buffer);

  floatView[0] = value;
  const current = intView[0] ?? 0;
  const leastSignificantBit = (current >> 16) & 1;
  const roundingBias = 0x7fff + leastSignificantBit;
  const rounded = (current + roundingBias) & 0xffff0000;

  intView[0] = rounded;

  return {
    reconstructed: floatView[0] ?? value,
    word: `0x${(rounded >>> 16).toString(16).padStart(4, "0")}`,
  };
}

function getSharedScale(bitDepth: Exclude<BitDepth, 16>) {
  const maxMagnitude = Math.max(
    ...EXAMPLE_WEIGHTS.map((value) => Math.abs(value)),
  );
  const qmax = Math.pow(2, bitDepth - 1) - 1;

  return {
    qmax,
    scale: maxMagnitude / qmax,
  };
}

function quantizeWeight(value: number, bitDepth: BitDepth): QuantizedWeight {
  if (bitDepth === 16) {
    const bf16 = toBfloat16(value);
    return {
      bf16Word: bf16.word,
      error: bf16.reconstructed - value,
      original: value,
      reconstructed: bf16.reconstructed,
      stored: bf16.word,
    };
  }

  const { scale, qmax } = getSharedScale(bitDepth);
  const stored = clamp(Math.round(value / scale), -qmax, qmax);
  const reconstructed = stored * scale;

  return {
    error: reconstructed - value,
    original: value,
    reconstructed,
    stored,
  };
}

function getShortExplanation(bitDepth: BitDepth) {
  if (bitDepth === 16) {
    return "BF16 keeps much more of the original value because it is still a floating-point format.";
  }

  if (bitDepth === 2) {
    return "At 2 bits, many nearby weights are forced into the same tiny set of buckets.";
  }

  if (bitDepth === 4) {
    return "At 4 bits, the model has more buckets to work with, so the approximation gets better.";
  }

  if (bitDepth === 6) {
    return "At 6 bits, the rounded result is usually noticeably closer to the original.";
  }

  return "At 8 bits, the rounded result is often fairly close to the original weight.";
}

export function QuantizationExplorer() {
  const [bitDepthIndex, setBitDepthIndex] = useState(0);

  const bitDepth = BIT_DEPTHS[bitDepthIndex] ?? BIT_DEPTHS[0];
  const scaleSummary =
    bitDepth === 16 ? null : getSharedScale(bitDepth as Exclude<BitDepth, 16>);

  const focusWeight = useMemo(
    () => quantizeWeight(FOCUS_WEIGHT, bitDepth),
    [bitDepth],
  );

  return (
    <div className="quantization-explorer" data-widget-enhanced="true">
      <div className="quantization-explorer__header">
        <p className="quantization-explorer__eyebrow">Single Weight View</p>
        <h3>See one stored value become an approximation</h3>
        <p className="quantization-explorer__lede">
          The original weight is <code>{formatFloat(FOCUS_WEIGHT)}</code>. Lower
          precision stores a rougher version of it.
        </p>
      </div>

      <div className="quantization-explorer__controls">
        <div className="quantization-explorer__slider-card">
          <label
            className="quantization-explorer__slider-label"
            htmlFor="single-quant-depth"
          >
            Precision:{" "}
            <strong>
              {bitDepth === 16 ? "16-bit BF16" : `${bitDepth}-bit`}
            </strong>
          </label>
          <input
            id="single-quant-depth"
            type="range"
            min={0}
            max={BIT_DEPTHS.length - 1}
            step={1}
            value={bitDepthIndex}
            onChange={(event) => setBitDepthIndex(Number(event.target.value))}
          />
          <div className="quantization-explorer__tick-row" aria-hidden="true">
            {BIT_DEPTHS.map((depth) => (
              <span key={depth}>{depth}</span>
            ))}
          </div>
        </div>
      </div>

      <div className="quantization-explorer__focus-grid quantization-explorer__focus-grid--four">
        <div className="quantization-explorer__focus-card">
          <span className="quantization-explorer__focus-label">
            Original weight
          </span>
          <strong className="quantization-explorer__focus-value">
            {formatFloat(focusWeight.original)}
          </strong>
        </div>

        <div className="quantization-explorer__focus-card">
          <span className="quantization-explorer__focus-label">
            {bitDepth === 16 ? "Stored BF16 word" : "Stored bucket"}
          </span>
          <strong className="quantization-explorer__focus-value">
            {String(focusWeight.stored)}
          </strong>
        </div>

        <div className="quantization-explorer__focus-card">
          <span className="quantization-explorer__focus-label">
            {bitDepth === 16 ? "Decoded float" : "Scaled back to float"}
          </span>
          <strong className="quantization-explorer__focus-value">
            {formatFloat(focusWeight.reconstructed)}
          </strong>
        </div>

        <div className="quantization-explorer__focus-card">
          <span className="quantization-explorer__focus-label">
            Absolute error
</span>
|
||||
<strong className="quantization-explorer__focus-value">
|
||||
{formatFloat(Math.abs(focusWeight.error))}
|
||||
</strong>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p className="quantization-explorer__helper">
|
||||
{bitDepth === 16 ? (
|
||||
<>
|
||||
<strong>How to read this:</strong> BF16 stores a 16-bit word and
|
||||
then decodes it back into an approximate float.
|
||||
</>
|
||||
) : (
|
||||
<>
|
||||
<strong>How to read this:</strong> the stored bucket is a small
|
||||
integer. We multiply it by the scale to get the approximate float.
|
||||
</>
|
||||
)}{" "}
|
||||
{getShortExplanation(bitDepth)}
|
||||
</p>
|
||||
|
||||
<div className="quantization-explorer__formula">
|
||||
<span className="quantization-explorer__formula-label">
|
||||
{bitDepth === 16 ? "BF16 decoding" : "Bucket math"}
|
||||
</span>
|
||||
<code>
|
||||
{bitDepth === 16
|
||||
? `${focusWeight.bf16Word} -> ${formatFloat(focusWeight.reconstructed)}`
|
||||
: `${focusWeight.stored} x ${formatFloat(scaleSummary?.scale ?? 0)} = ${formatFloat(focusWeight.reconstructed)}`}
|
||||
</code>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
@@ -0,0 +1,150 @@
"use client";

import { useMemo, useState } from "react";

const BIT_DEPTHS = [2, 4, 6, 8, 16] as const;
const TOY_LAYER_WEIGHTS = [
  0.91, -0.72, 0.64, -0.58, 0.41, -0.33, 0.28, -0.19, 0.15, -0.11, 0.09, -0.06,
  0.04, -0.03, 0.02, -0.01,
] as const;

type BitDepth = (typeof BIT_DEPTHS)[number];

type StoredCell = {
  stored: string;
};

function clamp(value: number, min: number, max: number) {
  return Math.min(max, Math.max(min, value));
}

function formatFloat(value: number, decimals = 3) {
  return value.toFixed(decimals);
}

function toBfloat16Word(value: number) {
  const floatView = new Float32Array(1);
  const intView = new Uint32Array(floatView.buffer);

  floatView[0] = value;
  const current = intView[0] ?? 0;
  const leastSignificantBit = (current >> 16) & 1;
  const roundingBias = 0x7fff + leastSignificantBit;
  const rounded = (current + roundingBias) & 0xffff0000;

  intView[0] = rounded;

  return {
    reconstructed: floatView[0] ?? value,
    word: `0x${(rounded >>> 16).toString(16).padStart(4, "0")}`,
  };
}

function getSharedScale(bitDepth: Exclude<BitDepth, 16>) {
  const maxMagnitude = Math.max(
    ...TOY_LAYER_WEIGHTS.map((value) => Math.abs(value)),
  );
  const qmax = Math.pow(2, bitDepth - 1) - 1;

  return {
    qmax,
    scale: maxMagnitude / qmax,
  };
}

function quantizeCell(value: number, bitDepth: BitDepth): StoredCell {
  if (bitDepth === 16) {
    const bf16 = toBfloat16Word(value);
    return {
      stored: bf16.word,
    };
  }

  const { scale, qmax } = getSharedScale(bitDepth);
  const stored = clamp(Math.round(value / scale), -qmax, qmax);

  return {
    stored: String(stored),
  };
}

export function QuantizationGridExplorer() {
  const [bitDepthIndex, setBitDepthIndex] = useState(0);

  const bitDepth = BIT_DEPTHS[bitDepthIndex] ?? BIT_DEPTHS[0];
  const scaleSummary =
    bitDepth === 16 ? null : getSharedScale(bitDepth as Exclude<BitDepth, 16>);
  const cells = useMemo(
    () => TOY_LAYER_WEIGHTS.map((weight) => quantizeCell(weight, bitDepth)),
    [bitDepth],
  );

  return (
    <div className="quantization-grid-explorer" data-widget-enhanced="true">
      <div className="quantization-grid-explorer__header">
        <p className="quantization-grid-explorer__eyebrow">Toy Layer View</p>
        <h3>Watch a tiny layer get stored as 16 buckets</h3>
        <p className="quantization-grid-explorer__lede">
          Each square below is one toy weight slot. The number shown is the
          stored bucket value.
        </p>
      </div>

      <div className="quantization-grid-explorer__slider-card">
        <label
          className="quantization-grid-explorer__slider-label"
          htmlFor="grid-quant-depth"
        >
          Precision:{" "}
          <strong>{bitDepth === 16 ? "16-bit BF16" : `${bitDepth}-bit`}</strong>
        </label>
        <input
          id="grid-quant-depth"
          type="range"
          min={0}
          max={BIT_DEPTHS.length - 1}
          step={1}
          value={bitDepthIndex}
          onChange={(event) => setBitDepthIndex(Number(event.target.value))}
        />
        <div
          className="quantization-grid-explorer__tick-row"
          aria-hidden="true"
        >
          {BIT_DEPTHS.map((depth) => (
            <span key={depth}>{depth}</span>
          ))}
        </div>
      </div>

      <p className="quantization-grid-explorer__helper">
        {bitDepth === 16 ? (
          <>
            In BF16, each square stores a 16-bit word instead of a tiny bucket.
          </>
        ) : (
          <>
            Smaller bit depths force more weights into the same few bucket
            values. Scale = <code>{formatFloat(scaleSummary?.scale ?? 0)}</code>
          </>
        )}
      </p>

      <div className="quantization-grid-explorer__grid">
        {cells.map((cell, index) => (
          <div
            className="quantization-grid-explorer__cell"
            key={`${bitDepth}-${index}`}
          >
            <span className="quantization-grid-explorer__cell-label">
              W{index}
            </span>
            <strong className="quantization-grid-explorer__cell-value">
              {cell.stored}
            </strong>
          </div>
        ))}
      </div>
    </div>
  );
}
@@ -0,0 +1,192 @@
import { describe, expect, it } from "vitest";
import {
  extractAssistantTextContent,
  extractObjective5Metrics,
  extractModelOptions,
  extractSvgMarkup,
  getModelListEndpointCandidates,
  isLocalEndpoint,
  normalizeOllamaChatEndpoint,
  normalizeUpstreamChatEndpoint,
  sanitizeSvgDocument,
} from "~/lib/lab2-chat";

describe("normalizeUpstreamChatEndpoint", () => {
  it("appends the chat completions path to a base api endpoint", () => {
    expect(normalizeUpstreamChatEndpoint("https://ai.zuccaro.me/api")).toBe(
      "https://ai.zuccaro.me/api/v1/chat/completions",
    );
  });

  it("preserves endpoints that already include the full chat completions path", () => {
    expect(
      normalizeUpstreamChatEndpoint(
        "https://ai.zuccaro.me/api/v1/chat/completions",
      ),
    ).toBe("https://ai.zuccaro.me/api/v1/chat/completions");
  });
});

describe("extractSvgMarkup", () => {
  it("extracts fenced svg output", () => {
    const markup = extractSvgMarkup(
      "```svg\n<svg viewBox=\"0 0 10 10\"></svg>\n```",
    );

    expect(markup).toBe("<svg viewBox=\"0 0 10 10\"></svg>");
  });
});

describe("normalizeOllamaChatEndpoint", () => {
  it("appends the ollama chat path to a base api endpoint", () => {
    expect(normalizeOllamaChatEndpoint("https://ai.zuccaro.me/api")).toBe(
      "https://ai.zuccaro.me/ollama/api/chat",
    );
  });
});

describe("getModelListEndpointCandidates", () => {
  it("prefers v1 models for bare local endpoints", () => {
    expect(getModelListEndpointCandidates("http://127.0.0.1:1234")).toEqual([
      "http://127.0.0.1:1234/v1/models",
    ]);
  });
});

describe("isLocalEndpoint", () => {
  it("detects localhost endpoints", () => {
    expect(isLocalEndpoint("http://127.0.0.1:1234")).toBe(true);
    expect(isLocalEndpoint("https://ai.zuccaro.me/api")).toBe(false);
  });
});

describe("extractAssistantTextContent", () => {
  it("reads ollama-style chat responses", () => {
    expect(
      extractAssistantTextContent({
        message: {
          content: "hello from gemma",
          role: "assistant",
        },
      }),
    ).toBe("hello from gemma");
  });

  it("falls back to reasoning content for local reasoning models", () => {
    expect(
      extractAssistantTextContent({
        choices: [
          {
            message: {
              content: "",
              reasoning_content: "Thinking only output",
            },
          },
        ],
      }),
    ).toBe("Thinking only output");
  });
});

describe("extractObjective5Metrics", () => {
  it("computes tokens per second for ollama-style responses", () => {
    expect(
      extractObjective5Metrics({
        eval_count: 451,
        eval_duration: 31_332_836_667,
        prompt_eval_count: 16,
        prompt_eval_duration: 96_516_186,
        total_duration: 33_471_213_310,
      }),
    ).toEqual({
      completionTokens: 451,
      evalDurationMs: 31332.8,
      promptEvalDurationMs: 96.5,
      promptTokens: 16,
      tokensPerSecond: 14.4,
      totalDurationMs: 33471.2,
    });
  });
});

describe("extractModelOptions", () => {
  it("maps model list payloads into dropdown options", () => {
    expect(
      extractModelOptions({
        data: [
          { id: "qwen3.5-9b-mlx", object: "model" },
          { id: "gemma4:e2b-it-q4_K_M", name: "Gemma 4 E2B Q4_K_M" },
        ],
      }),
    ).toEqual([
      { label: "qwen3.5-9b-mlx", value: "qwen3.5-9b-mlx" },
      { label: "Gemma 4 E2B Q4_K_M", value: "gemma4:e2b-it-q4_K_M" },
    ]);
  });
});

describe("sanitizeSvgDocument", () => {
  it("accepts a simple safe svg", () => {
    const result = sanitizeSvgDocument(
      "<svg viewBox=\"0 0 10 10\"><rect x=\"1\" y=\"1\" width=\"8\" height=\"8\" fill=\"#0f4f76\" /></svg>",
    );

    expect(result.ok).toBe(true);
    if (!result.ok) {
      throw new Error(result.error);
    }

    expect(result.svg).toContain("<svg");
    expect(result.svg).toContain("xmlns=\"http://www.w3.org/2000/svg\"");
  });

  it("accepts an explicit safe xmlns on the root svg element", () => {
    const result = sanitizeSvgDocument(
      "<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 512 512\"><rect width=\"512\" height=\"512\" fill=\"#87CEEB\" /></svg>",
    );

    expect(result.ok).toBe(true);
    if (!result.ok) {
      throw new Error(result.error);
    }

    expect(result.svg).toContain("xmlns=\"http://www.w3.org/2000/svg\"");
  });

  it("rejects malicious event handlers", () => {
    const result = sanitizeSvgDocument(
      "<svg viewBox=\"0 0 10 10\"><rect x=\"1\" y=\"1\" width=\"8\" height=\"8\" onload=\"alert(1)\" /></svg>",
    );

    expect(result.ok).toBe(false);
    if (result.ok) {
      throw new Error("Expected unsafe SVG to fail sanitization.");
    }

    expect(result.error).toMatch(/blocked attribute|blocked event/i);
  });

  it("rejects foreignObject", () => {
    const result = sanitizeSvgDocument(
      "<svg viewBox=\"0 0 10 10\"><foreignObject width=\"10\" height=\"10\"></foreignObject></svg>",
    );

    expect(result.ok).toBe(false);
    if (result.ok) {
      throw new Error("Expected blocked SVG element to fail sanitization.");
    }

    expect(result.error).toMatch(/blocked element/i);
  });

  it("rejects malformed xml", () => {
    const result = sanitizeSvgDocument("<svg><g></svg>");

    expect(result.ok).toBe(false);
    if (result.ok) {
      throw new Error("Expected malformed SVG to fail sanitization.");
    }

    expect(result.error).toMatch(/malformed/i);
  });
});
@@ -0,0 +1,783 @@
import {
  DOMParser,
  XMLSerializer,
  type Element as XmlDomElement,
} from "@xmldom/xmldom";

export const LAB2_CHAT_STORAGE_KEY = "lab2-objective5-chat-settings";
export const LAB2_DEFAULT_ENDPOINT = "https://ai.zuccaro.me/api";
export const LAB2_CUSTOM_MODEL_VALUE = "__custom__";
export const LAB2_MAX_CONTEXT_MESSAGES = 10;
export const LAB2_MAX_MESSAGE_LENGTH = 4000;
export const LAB2_MAX_SVG_LENGTH = 20000;

export const LAB2_MODEL_OPTIONS = [
  {
    label: "Gemma 4 E2B Q8_0",
    value: "gemma4:e2b-it-q8_0",
  },
  {
    label: "Gemma 4 E2B Q4_K_M",
    value: "gemma4:e2b-it-q4_K_M",
  },
  {
    label: "Custom model",
    value: LAB2_CUSTOM_MODEL_VALUE,
  },
] as const;

export type Objective5Role = "user" | "assistant";

export type Objective5Message = {
  content: string;
  role: Objective5Role;
};

export type Objective5RenderMode = "text" | "svg";

export type Objective5Metrics = {
  completionTokens?: number;
  evalDurationMs?: number;
  promptTokens?: number;
  promptEvalDurationMs?: number;
  tokensPerSecond?: number;
  totalDurationMs?: number;
};

export type Objective5StoredSettings = {
  apiKey: string;
  customModel: string;
  endpoint: string;
  selectedModel: string;
};

export type Objective5ModelOption = {
  label: string;
  value: string;
};

type AssistantMessageContentPart =
  | string
  | {
      text?: string;
      type?: string;
    };

type ChatCompletionPayload = {
  choices?: Array<{
    message?: {
      content?: AssistantMessageContentPart | AssistantMessageContentPart[];
      reasoning_content?: string;
    };
  }>;
  output?: Array<{
    content?: Array<{
      text?: string;
      type?: string;
    }>;
    type?: string;
  }>;
  output_text?: string;
  message?: {
    content?: string;
    role?: string;
    thinking?: string;
  };
  completion_tokens?: number;
  prompt_tokens?: number;
  usage?: {
    completion_tokens?: number;
    prompt_tokens?: number;
    total_tokens?: number;
  };
  eval_count?: number;
  eval_duration?: number;
  prompt_eval_count?: number;
  prompt_eval_duration?: number;
  total_duration?: number;
};

type SvgSanitizationSuccess = {
  ok: true;
  svg: string;
};

type SvgSanitizationFailure = {
  error: string;
  ok: false;
};

export type SvgSanitizationResult =
  | SvgSanitizationFailure
  | SvgSanitizationSuccess;

const SVG_NAMESPACE = "http://www.w3.org/2000/svg";
const allowedSvgElements = new Set([
  "svg",
  "g",
  "path",
  "circle",
  "ellipse",
  "rect",
  "line",
  "polyline",
  "polygon",
  "text",
  "tspan",
  "defs",
  "linearGradient",
  "radialGradient",
  "stop",
  "title",
  "desc",
]);

const allowedSvgAttributes = new Set([
  "cx",
  "cy",
  "d",
  "dominant-baseline",
  "fill",
  "fill-opacity",
  "fill-rule",
  "font-family",
  "font-size",
  "font-weight",
  "gradientTransform",
  "gradientUnits",
  "height",
  "id",
  "offset",
  "opacity",
  "points",
  "preserveAspectRatio",
  "r",
  "rect",
  "role",
  "rx",
  "ry",
  "stop-color",
  "stop-opacity",
  "stroke",
  "stroke-linecap",
  "stroke-linejoin",
  "stroke-opacity",
  "stroke-width",
  "text-anchor",
  "transform",
  "version",
  "viewBox",
  "width",
  "x",
  "x1",
  "x2",
  "xml:space",
  "xmlns",
  "y",
  "y1",
  "y2",
]);

const enumValues = {
  "dominant-baseline": new Set([
    "auto",
    "alphabetic",
    "central",
    "hanging",
    "ideographic",
    "mathematical",
    "middle",
    "text-after-edge",
    "text-before-edge",
  ]),
  "fill-rule": new Set(["evenodd", "inherit", "nonzero"]),
  "gradientUnits": new Set(["objectBoundingBox", "userSpaceOnUse"]),
  "stroke-linecap": new Set(["butt", "round", "square"]),
  "stroke-linejoin": new Set(["arcs", "bevel", "miter", "miter-clip", "round"]),
  "text-anchor": new Set(["end", "inherit", "middle", "start"]),
} as const;

const numberPattern = /^-?(?:\d+|\d*\.\d+)(?:e[-+]?\d+)?%?$/i;
const numberListPattern = /^[-+0-9eE.,%\s]+$/;
const pathPattern = /^[MmZzLlHhVvCcSsQqTtAa0-9eE.,\s-]+$/;
const pointsPattern = /^[-+0-9eE.,\s]+$/;
const transformPattern = /^[a-zA-Z0-9(),.\s-]+$/;
const viewBoxPattern = /^[-+0-9eE.,\s]+$/;
const fontFamilyPattern = /^[a-zA-Z0-9\s,'"_-]+$/;
const textPattern = /^[^\u0000-\u0008\u000b\u000c\u000e-\u001f<>]+$/;
const idPattern = /^[A-Za-z_][A-Za-z0-9_.-]*$/;

export function getLab2SystemPrompt() {
  return [
    "You are helping students compare quantized models in a lab.",
    "Answer normal questions clearly and concisely.",
    "If the user asks you to draw something, respond with only one complete standalone SVG document.",
    "Do not wrap SVG in markdown fences.",
    "Do not include any explanation before or after the SVG.",
    "Use a 512 by 512 canvas with viewBox=\"0 0 512 512\".",
    "Do not use scripts, foreignObject, animation, external references, or remote assets.",
    "Prefer simple shapes, strokes, fills, and text labels that render reliably.",
  ].join(" ");
}

export function normalizeUpstreamChatEndpoint(endpoint: string) {
  const url = new URL(endpoint);
  const trimmedPath = url.pathname.replace(/\/+$/, "");

  if (trimmedPath.endsWith("/chat/completions")) {
    url.pathname = trimmedPath;
  } else if (trimmedPath.endsWith("/v1")) {
    url.pathname = `${trimmedPath}/chat/completions`;
  } else if (trimmedPath.length === 0) {
    url.pathname = "/v1/chat/completions";
  } else {
    url.pathname = `${trimmedPath}/v1/chat/completions`;
  }

  url.hash = "";
  return url.toString();
}

export function getModelListEndpointCandidates(endpoint: string) {
  const url = new URL(endpoint);
  const trimmedPath = url.pathname.replace(/\/+$/, "");

  if (trimmedPath.endsWith("/models")) {
    url.hash = "";
    return [url.toString()];
  }

  const paths = new Set<string>();

  if (trimmedPath.endsWith("/api")) {
    paths.add("/api/v1/models");
    paths.add("/api/models");
  } else if (trimmedPath.endsWith("/api/v1")) {
    paths.add("/api/v1/models");
    paths.add("/api/models");
  } else if (trimmedPath.endsWith("/v1")) {
    paths.add("/v1/models");
  } else if (trimmedPath.length === 0) {
    paths.add("/v1/models");
  } else {
    paths.add(`${trimmedPath}/v1/models`);
    paths.add(`${trimmedPath}/models`);
  }

  return Array.from(paths).map((path) => {
    const candidate = new URL(url.toString());
    candidate.pathname = path;
    candidate.hash = "";
    return candidate.toString();
  });
}

export function normalizeOllamaChatEndpoint(endpoint: string) {
  const url = new URL(endpoint);
  const trimmedPath = url.pathname.replace(/\/+$/, "");

  if (trimmedPath.endsWith("/ollama/api/chat")) {
    url.pathname = trimmedPath;
  } else if (trimmedPath.endsWith("/api") || trimmedPath.endsWith("/api/v1")) {
    url.pathname = "/ollama/api/chat";
  } else if (trimmedPath.length === 0) {
    url.pathname = "/ollama/api/chat";
  } else {
    url.pathname = `${trimmedPath}/ollama/api/chat`;
  }

  url.hash = "";
  return url.toString();
}

export function looksLikeOllamaModel(model: string) {
  return model.includes(":");
}

export function isLocalEndpoint(endpoint: string) {
  try {
    const url = new URL(endpoint);
    return (
      url.hostname === "127.0.0.1" ||
      url.hostname === "localhost" ||
      url.hostname === "::1"
    );
  } catch {
    return false;
  }
}

export function clampChatMessages(messages: Objective5Message[]) {
  return messages
    .filter((message) => {
      return (
        (message.role === "assistant" || message.role === "user") &&
        typeof message.content === "string"
      );
    })
    .map((message) => {
      return {
        content: message.content.slice(0, LAB2_MAX_MESSAGE_LENGTH),
        role: message.role,
      } satisfies Objective5Message;
    })
    .slice(-LAB2_MAX_CONTEXT_MESSAGES);
}

export function buildUpstreamMessages(messages: Objective5Message[]) {
  return [
    {
      content: getLab2SystemPrompt(),
      role: "system" as const,
    },
    ...clampChatMessages(messages),
  ];
}

export function extractAssistantTextContent(payload: ChatCompletionPayload) {
  if (
    payload.message &&
    typeof payload.message.content === "string" &&
    payload.message.content.trim()
  ) {
    return payload.message.content.trim();
  }

  if (typeof payload.output_text === "string" && payload.output_text.trim()) {
    return payload.output_text.trim();
  }

  const choiceContent = payload.choices?.[0]?.message?.content;
  const choiceText = normalizeContentParts(choiceContent);
  if (choiceText) {
    return choiceText;
  }

  const reasoningContent = payload.choices?.[0]?.message?.reasoning_content;
  if (typeof reasoningContent === "string" && reasoningContent.trim()) {
    return reasoningContent.trim();
  }

  const outputText = payload.output
    ?.flatMap((item) => item.content ?? [])
    .map((item) => item.text?.trim() ?? "")
    .filter(Boolean)
    .join("\n\n")
    .trim();

  return outputText || null;
}

export function extractModelOptions(payload: unknown): Objective5ModelOption[] {
  if (
    !payload ||
    typeof payload !== "object" ||
    !("data" in payload) ||
    !Array.isArray(payload.data)
  ) {
    return [];
  }

  return payload.data
    .map((item) => {
      if (!item || typeof item !== "object") return null;

      const value =
        "id" in item && typeof item.id === "string" ? item.id.trim() : "";
      const label =
        "name" in item && typeof item.name === "string" && item.name.trim()
          ? item.name.trim()
          : value;

      if (!value) return null;
      return { label, value } satisfies Objective5ModelOption;
    })
    .filter((item): item is Objective5ModelOption => item !== null);
}

export function extractObjective5Metrics(
  payload: ChatCompletionPayload,
): Objective5Metrics | null {
  const promptTokens =
    payload.prompt_eval_count ??
    payload.prompt_tokens ??
    payload.usage?.prompt_tokens;
  const completionTokens =
    payload.eval_count ??
    payload.completion_tokens ??
    payload.usage?.completion_tokens;
  const promptEvalDurationMs = toMilliseconds(payload.prompt_eval_duration);
  const evalDurationMs = toMilliseconds(payload.eval_duration);
  const totalDurationMs = toMilliseconds(payload.total_duration);
  const tokensPerSecond =
    typeof completionTokens === "number" &&
    typeof evalDurationMs === "number" &&
    evalDurationMs > 0
      ? roundMetric((completionTokens / evalDurationMs) * 1000)
      : undefined;

  if (
    typeof promptTokens !== "number" &&
    typeof completionTokens !== "number" &&
    typeof promptEvalDurationMs !== "number" &&
    typeof evalDurationMs !== "number" &&
    typeof totalDurationMs !== "number" &&
    typeof tokensPerSecond !== "number"
  ) {
    return null;
  }

  return {
    completionTokens,
    evalDurationMs,
    promptEvalDurationMs,
    promptTokens,
    tokensPerSecond,
    totalDurationMs,
  };
}

function normalizeContentParts(
  content: AssistantMessageContentPart | AssistantMessageContentPart[] | undefined,
) {
  if (typeof content === "string") {
    return content.trim();
  }

  if (!Array.isArray(content)) {
    return null;
  }

  const text = content
    .map((part) => {
      if (typeof part === "string") {
        return part;
      }

      return part.text ?? "";
    })
    .join("\n\n")
    .trim();

  return text || null;
}

function toMilliseconds(durationNs?: number) {
  if (typeof durationNs !== "number" || !Number.isFinite(durationNs)) {
    return undefined;
  }

  return roundMetric(durationNs / 1_000_000);
}

function roundMetric(value: number) {
  return Math.round(value * 10) / 10;
}

export function extractSvgMarkup(content: string) {
  const trimmed = content.trim();
  if (!trimmed) return null;

  const fencedMatch =
    /^```(?:svg|xml)?\s*([\s\S]*?)\s*```$/i.exec(trimmed) ??
    /```(?:svg|xml)?\s*([\s\S]*?)\s*```/i.exec(trimmed);
  const unfenced = fencedMatch?.[1]?.trim() ?? trimmed;

  const svgMatch = unfenced.match(/<svg[\s\S]*?<\/svg>/i);
  return svgMatch?.[0]?.trim() ?? null;
}

export function sanitizeSvgDocument(svgMarkup: string): SvgSanitizationResult {
  const trimmed = svgMarkup.trim();

  if (!trimmed) {
    return {
      error: "The model returned an empty SVG response.",
      ok: false,
    };
  }

  if (trimmed.length > LAB2_MAX_SVG_LENGTH) {
    return {
      error: "The SVG response is too large to render safely.",
      ok: false,
    };
  }

  const parseErrors: string[] = [];
  const parser = new DOMParser({
    onError: (level, message) => {
      if (level !== "warning") {
        parseErrors.push(String(message));
      }
    },
  });

  let document;

  try {
    document = parser.parseFromString(trimmed, "image/svg+xml");
  } catch {
    return {
      error: "The model returned malformed SVG markup.",
      ok: false,
    };
  }
  if (parseErrors.length > 0) {
    return {
      error: "The model returned malformed SVG markup.",
      ok: false,
    };
  }

  const root = document.documentElement;
  if (!root || root.tagName !== "svg") {
    return {
      error: "The response did not contain a standalone SVG document.",
      ok: false,
    };
  }

  const validationError = validateSvgNode(root);
  if (validationError) {
    return {
      error: validationError,
      ok: false,
    };
  }

  root.setAttribute("xmlns", SVG_NAMESPACE);
  if (!root.getAttribute("viewBox")) {
    root.setAttribute("viewBox", "0 0 512 512");
  }
  if (!root.getAttribute("width")) {
    root.setAttribute("width", "512");
  }
  if (!root.getAttribute("height")) {
    root.setAttribute("height", "512");
  }

  const serialized = new XMLSerializer().serializeToString(root).trim();
  if (!serialized.startsWith("<svg") || serialized.length > LAB2_MAX_SVG_LENGTH) {
    return {
      error: "The sanitized SVG could not be rendered safely.",
      ok: false,
    };
  }

  return {
    ok: true,
    svg: serialized,
  };
}

export function svgToDataUrl(svg: string) {
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}

export function getActiveModel(selectedModel: string, customModel: string) {
  if (selectedModel === LAB2_CUSTOM_MODEL_VALUE) {
    return customModel.trim();
  }

  return selectedModel.trim();
}

export function getDefaultObjective5Settings(): Objective5StoredSettings {
  return {
    apiKey: "",
    customModel: "",
    endpoint: LAB2_DEFAULT_ENDPOINT,
    selectedModel: LAB2_MODEL_OPTIONS[0].value,
  };
}

export function getDefaultObjective5ModelOptions(): Objective5ModelOption[] {
  return [...LAB2_MODEL_OPTIONS];
}

function validateSvgNode(node: XmlDomElement): string | null {
  if (!allowedSvgElements.has(node.tagName)) {
    return `The SVG used a blocked element: <${node.tagName}>.`;
  }

  const attributes = Array.from(node.attributes ?? []);
  for (const attribute of attributes) {
    const validationError = validateSvgAttribute(attribute.name, attribute.value);
    if (validationError) {
      return validationError;
    }
  }

  const children = Array.from(node.childNodes);
  for (const child of children) {
    if (child.nodeType === child.ELEMENT_NODE) {
      const childError = validateSvgNode(child as XmlDomElement);
      if (childError) {
        return childError;
      }
      continue;
    }

    if (child.nodeType === child.TEXT_NODE) {
      const textValue = child.nodeValue ?? "";
      if (textValue.trim().length === 0) {
        continue;
      }

      if (!allowsTextChildren(node.tagName)) {
        return `The SVG contained unexpected text inside <${node.tagName}>.`;
      }

      if (!textPattern.test(textValue)) {
        return "The SVG contained unsafe text content.";
      }
    }
  }

  return null;
}

function allowsTextChildren(tagName: string) {
  return tagName === "text" || tagName === "tspan" || tagName === "title" || tagName === "desc";
}

function validateSvgAttribute(name: string, value: string) {
  const normalizedName = name.trim();
  const normalizedValue = value.trim();

  if (!allowedSvgAttributes.has(normalizedName)) {
    return `The SVG used a blocked attribute: ${normalizedName}.`;
  }

  if (!normalizedValue) {
    return null;
  }

  if (/^on/i.test(normalizedName)) {
    return "The SVG contained blocked event handlers.";
  }

  if (normalizedName === "xmlns") {
    return normalizedValue === SVG_NAMESPACE
      ? null
      : "The SVG used an unexpected XML namespace.";
  }

  if (
    /(?:javascript:|data:|https?:|ftp:|file:)/i.test(normalizedValue) ||
    normalizedValue.includes("<") ||
    normalizedValue.includes(">")
  ) {
    return `The SVG used an unsafe value for ${normalizedName}.`;
  }

  if (normalizedName in enumValues) {
    return enumValues[normalizedName as keyof typeof enumValues].has(
      normalizedValue,
    )
      ? null
      : `The SVG used an unsupported value for ${normalizedName}.`;
  }

  switch (normalizedName) {
    case "cx":
    case "cy":
    case "fill-opacity":
    case "font-size":
    case "height":
|
||||
case "offset":
|
||||
case "opacity":
|
||||
case "r":
|
||||
case "rx":
|
||||
case "ry":
|
||||
case "stop-opacity":
|
||||
case "stroke-opacity":
|
||||
case "stroke-width":
|
||||
case "width":
|
||||
case "x":
|
||||
case "x1":
|
||||
case "x2":
|
||||
case "y":
|
||||
case "y1":
|
||||
case "y2":
|
||||
return numberPattern.test(normalizedValue)
|
||||
? null
|
||||
: `The SVG used an invalid numeric value for ${normalizedName}.`;
|
||||
case "viewBox":
|
||||
return viewBoxPattern.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an invalid viewBox.";
|
||||
case "d":
|
||||
return pathPattern.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an invalid path definition.";
|
||||
case "points":
|
||||
return pointsPattern.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used invalid polygon or polyline points.";
|
||||
case "transform":
|
||||
case "gradientTransform":
|
||||
return transformPattern.test(normalizedValue)
|
||||
? null
|
||||
: `The SVG used an invalid ${normalizedName}.`;
|
||||
case "fill":
|
||||
case "stroke":
|
||||
case "stop-color":
|
||||
return validateSvgPaintValue(normalizedValue)
|
||||
? null
|
||||
: `The SVG used an unsupported paint value for ${normalizedName}.`;
|
||||
case "font-family":
|
||||
return fontFamilyPattern.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an unsupported font family.";
|
||||
case "font-weight":
|
||||
return /^(?:normal|bold|bolder|lighter|[1-9]00)$/i.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an unsupported font weight.";
|
||||
case "id":
|
||||
return idPattern.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an invalid id attribute.";
|
||||
case "preserveAspectRatio":
|
||||
return /^[A-Za-z0-9\s]+$/.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an invalid preserveAspectRatio value.";
|
||||
case "role":
|
||||
return /^[a-zA-Z0-9\s-]+$/.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an invalid role attribute.";
|
||||
case "version":
|
||||
return /^1\.1$/.test(normalizedValue) || /^2(?:\.0)?$/.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an unsupported version value.";
|
||||
case "xml:space":
|
||||
return /^(?:default|preserve)$/.test(normalizedValue)
|
||||
? null
|
||||
: "The SVG used an invalid xml:space value.";
|
||||
default:
|
||||
return numberListPattern.test(normalizedValue)
|
||||
? null
|
||||
: `The SVG used an invalid value for ${normalizedName}.`;
|
||||
}
|
||||
}
|
||||
|
||||
function validateSvgPaintValue(value: string) {
|
||||
if (
|
||||
/^(?:none|currentColor|transparent)$/i.test(value) ||
|
||||
/^#[0-9a-f]{3,8}$/i.test(value) ||
|
||||
/^(?:rgb|rgba|hsl|hsla)\([\d\s.,%+-]+\)$/i.test(value)
|
||||
) {
|
||||
return true;
|
||||
}
|
||||
|
||||
if (/^url\(#[-A-Za-z0-9_.]+\)$/i.test(value)) {
|
||||
return true;
|
||||
}
|
||||
|
||||
return /^[a-zA-Z]+$/.test(value);
|
||||
}
|
||||
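The paint-value rules in `validateSvgPaintValue` can be exercised in isolation. The sketch below is a standalone copy of that logic (the name `isSafePaint` is ours, not from the patch), showing which values the sanitizer accepts: paint keywords, hex colors, `rgb()`/`hsl()` functions, local `url(#id)` references, and bare named colors.

```typescript
// Standalone sketch of the paint-value checks (hypothetical name `isSafePaint`;
// logic mirrors validateSvgPaintValue from the patch above).
function isSafePaint(value: string): boolean {
  // Keywords, hex colors, and color functions are accepted outright.
  if (
    /^(?:none|currentColor|transparent)$/i.test(value) ||
    /^#[0-9a-f]{3,8}$/i.test(value) ||
    /^(?:rgb|rgba|hsl|hsla)\([\d\s.,%+-]+\)$/i.test(value)
  ) {
    return true;
  }

  // Only document-local url(#id) references are allowed; remote URLs fail.
  if (/^url\(#[-A-Za-z0-9_.]+\)$/i.test(value)) {
    return true;
  }

  // Anything else must be a bare named color such as "rebeccapurple".
  return /^[a-zA-Z]+$/.test(value);
}

console.log(isSafePaint("#1a2b3c")); // true
console.log(isSafePaint("url(#grad-1)")); // true
console.log(isSafePaint("url(javascript:x)")); // false
```

Note the deliberately restrictive final branch: any paint value that is not a keyword, color literal, or local reference is rejected, which is what keeps `javascript:` and remote-URL payloads out of rendered model output.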
@@ -29,6 +29,59 @@ function hasSupportedExtension(fileName: string) {
  return CONTENT_EXTENSIONS.some((ext) => fileName.toLowerCase().endsWith(ext));
}

function extractFirstMarkdownHeading(markdown: string) {
  const match = markdown.match(/^#\s+(.+)$/m);
  return match?.[1]?.trim() ?? null;
}

function extractLabNumber(...candidates: Array<string | null | undefined>) {
  for (const candidate of candidates) {
    if (!candidate) continue;
    const match = candidate.match(/\blab[-\s]+(\d+)\b/i);
    if (match?.[1]) {
      return Number.parseInt(match[1], 10);
    }
  }

  return null;
}

function getLabMetadata(fileName: string) {
  const filePath = path.join(CONTENT_DIR, fileName);
  const source = fs.readFileSync(filePath, "utf8");
  const { content, data } = matter(source);
  const slug = getSlugFromFileName(fileName);
  const headingTitle = extractFirstMarkdownHeading(content);

  const title =
    typeof data.title === "string" && data.title.trim().length > 0
      ? data.title.trim()
      : (headingTitle ?? toTitleCaseFromSlug(slug));

  const description =
    typeof data.description === "string" && data.description.trim().length > 0
      ? data.description.trim()
      : "";

  const explicitOrder =
    typeof data.order === "number" && Number.isFinite(data.order)
      ? data.order
      : null;

  const labNumber = extractLabNumber(title, slug);
  const order = explicitOrder ?? labNumber ?? Number.MAX_SAFE_INTEGER;

  return {
    content,
    data,
    description,
    fileName,
    order,
    slug,
    title,
  };
}

export function listLabFiles() {
  if (!fs.existsSync(CONTENT_DIR)) {
    return [];
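The lab-number fallback above can be checked on its own. This is a self-contained copy of `extractLabNumber` as it appears in the hunk, with a couple of illustrative inputs (the example strings are ours):

```typescript
// Copy of extractLabNumber from the hunk above: scan each candidate string
// for a "lab <n>" or "lab-<n>" token and return the first number found.
function extractLabNumber(...candidates: Array<string | null | undefined>) {
  for (const candidate of candidates) {
    if (!candidate) continue;
    const match = candidate.match(/\blab[-\s]+(\d+)\b/i);
    if (match?.[1]) {
      return Number.parseInt(match[1], 10);
    }
  }

  return null;
}

console.log(extractLabNumber("Lab 2 - Quantization")); // 2
console.log(extractLabNumber(null, "lab-10-agents")); // 10
console.log(extractLabNumber("intro")); // null
```

This is why a lab file with no frontmatter `order` still sorts correctly, as long as either its title or its slug names the lab number.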
@@ -37,26 +90,20 @@ export function listLabFiles() {
  return fs
    .readdirSync(CONTENT_DIR)
    .filter((fileName) => hasSupportedExtension(fileName))
    .sort((a, b) => a.localeCompare(b));
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
}

export function getLabSummaries() {
  return listLabFiles().map((fileName) => {
    const filePath = path.join(CONTENT_DIR, fileName);
    const source = fs.readFileSync(filePath, "utf8");
    const { data } = matter(source);
    const slug = getSlugFromFileName(fileName);

    const title =
      typeof data.title === "string" && data.title.trim().length > 0
        ? data.title
        : toTitleCaseFromSlug(slug);

    const description =
      typeof data.description === "string" && data.description.trim().length > 0
        ? data.description
        : "";
  return listLabFiles()
    .map((fileName) => getLabMetadata(fileName))
    .sort((a, b) => {
      if (a.order !== b.order) {
        return a.order - b.order;
      }

      return a.title.localeCompare(b.title, undefined, { numeric: true });
    })
    .map(({ slug, title, description, fileName }) => {
      return {
        slug,
        title,
@@ -75,19 +122,7 @@ export function getLabDocument(slug: string): LabDocument | null {
    return null;
  }

  const filePath = path.join(CONTENT_DIR, fileName);
  const source = fs.readFileSync(filePath, "utf8");
  const { content, data } = matter(source);

  const title =
    typeof data.title === "string" && data.title.trim().length > 0
      ? data.title
      : toTitleCaseFromSlug(slug);

  const description =
    typeof data.description === "string" && data.description.trim().length > 0
      ? data.description
      : "";
  const { content, data, title, description } = getLabMetadata(fileName);

  return {
    slug,
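The new summary ordering can be sketched in isolation: explicit `order` wins, and ties fall back to a numeric-aware title compare so "Lab 9" sorts before "Lab 10". The `LabMeta` type and `sortLabs` helper below are ours, distilled from the comparator in the hunk above:

```typescript
// Hypothetical minimal type standing in for the lab metadata objects.
type LabMeta = { order: number; title: string };

// Sketch of the getLabSummaries comparator: sort by explicit order first,
// then break ties with a numeric-aware title comparison.
function sortLabs(labs: LabMeta[]): LabMeta[] {
  return [...labs].sort((a, b) => {
    if (a.order !== b.order) {
      return a.order - b.order;
    }

    return a.title.localeCompare(b.title, undefined, { numeric: true });
  });
}

const sorted = sortLabs([
  { order: 2, title: "Lab 2" },
  { order: 1, title: "Lab 1" },
  { order: Number.MAX_SAFE_INTEGER, title: "Lab 10" },
  { order: Number.MAX_SAFE_INTEGER, title: "Lab 9" },
]);
console.log(sorted.map((lab) => lab.title)); // ["Lab 1", "Lab 2", "Lab 9", "Lab 10"]
```

The `{ numeric: true }` collation option is the key change versus the removed plain `localeCompare`, which would have sorted "Lab 10" before "Lab 9" lexicographically.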
@@ -8,31 +8,31 @@ h1 {
  font-size: 2.25rem;
  line-height: 2.5rem;
  margin-bottom: 10px;
  color: #004E78;
  color: #004e78;
}
h2 {
  font-size: 1.875rem;
  line-height: 2.25rem;
  margin-bottom: 10px;
  color: #004E78;
  color: #004e78;
}
h3 {
  font-size: 1.5rem;
  line-height: 2rem;
  margin-bottom: 10px;
  color: #004E78;
  color: #004e78;
}
h4 {
  font-size: 1.25rem;
  line-height: 1.75rem;
  margin-bottom: 10px;
  color: #004E78;
  color: #004e78;
}
h5 {
  font-size: 1.125rem;
  line-height: 1.75rem;
  margin-bottom: 10px;
  color: #004E78;
  color: #004e78;
}
p {
  font-size: 1rem;
@@ -353,7 +353,9 @@ ol {
  line-height: 1;
  padding: 0.36rem 0.62rem;
  cursor: pointer;
  box-shadow: 0 1px 0 rgba(255, 255, 255, 0.35), 0 1px 2px rgba(61, 36, 1, 0.18);
  box-shadow:
    0 1px 0 rgba(255, 255, 255, 0.35),
    0 1px 2px rgba(61, 36, 1, 0.18);
}

.lab-content pre.lab-prompt-card .lab-copy-button:hover {
@@ -402,7 +404,8 @@ ol {
}

.lab-content ul.lab-settings-list .lab-setting-value {
  font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
  font-family:
    ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
    "Courier New", monospace;
  font-size: 0.86rem;
  font-weight: 600;
@@ -474,7 +477,11 @@ ol {
  margin: 1.8rem 0;
  padding: 0.4rem 0 0.5rem 1.15rem;
  border-left: 4px solid #004e78;
  background: linear-gradient(90deg, rgba(0, 78, 120, 0.08), rgba(0, 78, 120, 0));
  background: linear-gradient(
    90deg,
    rgba(0, 78, 120, 0.08),
    rgba(0, 78, 120, 0)
  );
}

.lab-content.objective-style-rail .objective-segment > h2 {
@@ -552,17 +559,20 @@ ol {
  content: none;
}

.lab-content.step-style-pills .lab-step-title[data-step-mode="execute"]::before {
.lab-content.step-style-pills
  .lab-step-title[data-step-mode="execute"]::before {
  color: #8a4d00;
  background: #fee7c7;
}

.lab-content.step-style-pills .lab-step-title[data-step-mode="explore"]::before {
.lab-content.step-style-pills
  .lab-step-title[data-step-mode="explore"]::before {
  color: #0f4970;
  background: #d9ebfb;
}

.lab-content.step-style-pills .lab-step-title[data-step-mode="checkpoint"]::before {
.lab-content.step-style-pills
  .lab-step-title[data-step-mode="checkpoint"]::before {
  color: #0e5e35;
  background: #d9f7e4;
}
@@ -645,11 +655,13 @@ ol {
  border-left: 2px dashed #c6d5e3;
}

.lab-content.breakout-style-workflow .step-segment[data-step-kind="explanation"] {
.lab-content.breakout-style-workflow
  .step-segment[data-step-kind="explanation"] {
  border-left-color: #6da9d8;
}

.lab-content.breakout-style-workflow .step-segment[data-step-kind="instruction"] {
.lab-content.breakout-style-workflow
  .step-segment[data-step-kind="instruction"] {
  border-left-color: #de9a2e;
}

@@ -669,11 +681,13 @@ ol {
  content: "";
}

.lab-content.breakout-style-workflow .step-segment[data-step-kind="instruction"]::before {
.lab-content.breakout-style-workflow
  .step-segment[data-step-kind="instruction"]::before {
  background: #de9a2e;
}

.lab-content.breakout-style-workflow .step-segment[data-step-kind="mixed"]::before {
.lab-content.breakout-style-workflow
  .step-segment[data-step-kind="mixed"]::before {
  background: #4a95ab;
}

@@ -689,22 +703,26 @@ ol {
  padding: 0.3rem 0 0.45rem;
}

.lab-content.breakout-style-command-pills .step-segment[data-step-kind="instruction"] {
.lab-content.breakout-style-command-pills
  .step-segment[data-step-kind="instruction"] {
  border-left: 3px solid #f0b45f;
  padding-left: 0.75rem;
}

.lab-content.breakout-style-command-pills .step-segment[data-step-kind="explanation"] {
.lab-content.breakout-style-command-pills
  .step-segment[data-step-kind="explanation"] {
  border-left: 3px solid #8dc1e7;
  padding-left: 0.75rem;
}

.lab-content.breakout-style-command-pills .step-segment[data-step-kind="mixed"] {
.lab-content.breakout-style-command-pills
  .step-segment[data-step-kind="mixed"] {
  border-left: 3px solid #6db0bf;
  padding-left: 0.75rem;
}

.lab-content.breakout-style-command-pills .step-segment[data-step-kind]::before {
.lab-content.breakout-style-command-pills
  .step-segment[data-step-kind]::before {
  color: #4a6477;
  content: attr(data-step-kind);
}
@@ -726,15 +744,18 @@ ol {
  background: #6db0bf;
}

.lab-content.breakout-style-instruction-rails .step-segment[data-step-kind="instruction"]::after {
.lab-content.breakout-style-instruction-rails
  .step-segment[data-step-kind="instruction"]::after {
  background: #f0b45f;
}

.lab-content.breakout-style-instruction-rails .step-segment[data-step-kind="explanation"]::after {
.lab-content.breakout-style-instruction-rails
  .step-segment[data-step-kind="explanation"]::after {
  background: #8dc1e7;
}

.lab-content.breakout-style-instruction-rails .step-segment[data-step-kind]::before {
.lab-content.breakout-style-instruction-rails
  .step-segment[data-step-kind]::before {
  color: #4a6477;
  content: attr(data-step-kind);
}
@@ -815,6 +836,653 @@ ol {
  line-height: 1.25;
}

.lab-content [data-quantization-explorer] {
  margin: 1.25rem 0 1.5rem;
}

.lab-content [data-quantization-grid-explorer] {
  margin: 1.25rem 0 1.5rem;
}

.lab-content [data-objective5-chat] {
  margin: 1.25rem 0 1.5rem;
}

.quantization-explorer {
  border: 1px solid #d7e4ef;
  border-radius: 16px;
  background: linear-gradient(180deg, #fbfdff, #f4f9fd);
  padding: 1rem;
}

.quantization-explorer h3 {
  margin: 0.1rem 0 0;
  color: #0f3d58;
  font-size: 1.2rem;
}

.quantization-explorer code {
  font-family:
    ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
    "Courier New", monospace;
}

.quantization-explorer__header {
  margin-bottom: 0.9rem;
}

.quantization-explorer__eyebrow {
  margin: 0;
  color: #9a5f00;
  font-size: 0.72rem;
  font-weight: 800;
  letter-spacing: 0.08em;
  text-transform: uppercase;
}

.quantization-explorer__lede {
  margin: 0.55rem 0 0;
  color: #334155;
}

.quantization-explorer__controls {
  margin-top: 0.95rem;
}

.quantization-explorer__slider-card,
.quantization-explorer__focus-card,
.quantization-grid-explorer__slider-card,
.quantization-grid-explorer__cell {
  border: 1px solid #dce6ee;
  border-radius: 14px;
  background: rgba(255, 255, 255, 0.92);
}

.quantization-explorer__slider-card,
.quantization-grid-explorer__slider-card {
  --slider-thumb-size: 1.1rem;
  --slider-thumb-offset: calc(var(--slider-thumb-size) / 2);
  padding: 0.85rem 0.95rem;
}

.quantization-explorer__slider-label,
.quantization-grid-explorer__slider-label {
  display: block;
  color: #334155;
  font-weight: 600;
}

.quantization-explorer__slider-card input[type="range"],
.quantization-grid-explorer__slider-card input[type="range"] {
  -webkit-appearance: none;
  appearance: none;
  display: block;
  width: calc(100% - var(--slider-thumb-size));
  margin-left: var(--slider-thumb-offset);
  margin-right: var(--slider-thumb-offset);
  margin-top: 0.75rem;
  background: transparent;
}

.quantization-explorer__slider-card
  input[type="range"]::-webkit-slider-runnable-track,
.quantization-grid-explorer__slider-card
  input[type="range"]::-webkit-slider-runnable-track {
  height: 0.72rem;
  border-radius: 999px;
  background: linear-gradient(180deg, #dbe7f2, #d4e1ec);
}

.quantization-explorer__slider-card input[type="range"]::-webkit-slider-thumb,
.quantization-grid-explorer__slider-card
  input[type="range"]::-webkit-slider-thumb {
  -webkit-appearance: none;
  appearance: none;
  width: var(--slider-thumb-size);
  height: var(--slider-thumb-size);
  margin-top: calc((0.72rem - var(--slider-thumb-size)) / 2);
  border: 1px solid #c8d6e3;
  border-radius: 999px;
  background: linear-gradient(180deg, #ffffff, #eef3f8);
  box-shadow: 0 1px 4px rgba(15, 23, 42, 0.18);
}

.quantization-explorer__slider-card input[type="range"]::-moz-range-track,
.quantization-grid-explorer__slider-card input[type="range"]::-moz-range-track {
  height: 0.72rem;
  border: none;
  border-radius: 999px;
  background: linear-gradient(180deg, #dbe7f2, #d4e1ec);
}

.quantization-explorer__slider-card input[type="range"]::-moz-range-thumb,
.quantization-grid-explorer__slider-card input[type="range"]::-moz-range-thumb {
  width: var(--slider-thumb-size);
  height: var(--slider-thumb-size);
  border: 1px solid #c8d6e3;
  border-radius: 999px;
  background: linear-gradient(180deg, #ffffff, #eef3f8);
  box-shadow: 0 1px 4px rgba(15, 23, 42, 0.18);
}

.quantization-explorer__tick-row,
.quantization-grid-explorer__tick-row {
  position: relative;
  height: 1.35rem;
  margin-top: 0.3rem;
  width: calc(100% - var(--slider-thumb-size));
  margin-left: var(--slider-thumb-offset);
  margin-right: var(--slider-thumb-offset);
  color: #64748b;
  font-size: 0.78rem;
  font-weight: 600;
}

.quantization-explorer__tick-row span,
.quantization-grid-explorer__tick-row span {
  position: absolute;
  top: 0;
  transform: translateX(-50%);
}

.quantization-explorer__tick-row span:nth-child(1),
.quantization-grid-explorer__tick-row span:nth-child(1) {
  left: 0%;
  transform: translateX(0);
}

.quantization-explorer__tick-row span:nth-child(2),
.quantization-grid-explorer__tick-row span:nth-child(2) {
  left: 25%;
}

.quantization-explorer__tick-row span:nth-child(3),
.quantization-grid-explorer__tick-row span:nth-child(3) {
  left: 50%;
}

.quantization-explorer__tick-row span:nth-child(4),
.quantization-grid-explorer__tick-row span:nth-child(4) {
  left: 75%;
}

.quantization-explorer__tick-row span:nth-child(5),
.quantization-grid-explorer__tick-row span:nth-child(5) {
  left: 100%;
  transform: translateX(-100%);
}

.quantization-explorer__focus-grid {
  display: grid;
  grid-template-columns: repeat(3, minmax(0, 1fr));
  gap: 0.75rem;
  margin-top: 0.95rem;
}

.quantization-explorer__focus-grid--four {
  grid-template-columns: repeat(4, minmax(0, 1fr));
}

.quantization-explorer__focus-card {
  padding: 0.85rem;
}

.quantization-explorer__focus-label {
  display: block;
  margin-bottom: 0.45rem;
  color: #64748b;
  font-size: 0.78rem;
  font-weight: 700;
  text-transform: uppercase;
  letter-spacing: 0.04em;
}

.quantization-explorer__focus-value {
  color: #0f3d58;
  font-size: 1rem;
}

.quantization-explorer__helper {
  margin: 0.95rem 0 0;
  color: #334155;
}

.quantization-explorer__formula {
  margin-top: 0.85rem;
  padding: 0.8rem 0.9rem;
  border: 1px solid #dce6ee;
  border-radius: 14px;
  background: rgba(255, 255, 255, 0.92);
}

.quantization-explorer__formula-label {
  display: block;
  margin-bottom: 0.35rem;
  color: #64748b;
  font-size: 0.76rem;
  font-weight: 800;
  letter-spacing: 0.04em;
  text-transform: uppercase;
}

.quantization-explorer__formula code {
  color: #0f3d58;
  font-size: 0.95rem;
  font-weight: 700;
}

.quantization-grid-explorer {
  border: 1px solid #d7e4ef;
  border-radius: 16px;
  background: linear-gradient(180deg, #fbfdff, #f4f9fd);
  padding: 1rem;
}

.quantization-grid-explorer h3 {
  margin: 0.1rem 0 0;
  color: #0f3d58;
  font-size: 1.2rem;
}

.quantization-grid-explorer code {
  font-family:
    ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
    "Courier New", monospace;
}

.quantization-grid-explorer__header {
  margin-bottom: 0.9rem;
}

.quantization-grid-explorer__eyebrow {
  margin: 0;
  color: #9a5f00;
  font-size: 0.72rem;
  font-weight: 800;
  letter-spacing: 0.08em;
  text-transform: uppercase;
}

.quantization-grid-explorer__lede {
  margin: 0.55rem 0 0;
  color: #334155;
}

.quantization-grid-explorer__helper {
  margin: 0.95rem 0 0;
  color: #334155;
}

.quantization-grid-explorer__grid {
  display: grid;
  grid-template-columns: repeat(4, minmax(0, 1fr));
  gap: 0.75rem;
  margin-top: 0.95rem;
}

.quantization-grid-explorer__cell {
  padding: 0.8rem;
  text-align: center;
}

.quantization-grid-explorer__cell-label {
  display: block;
  margin-bottom: 0.35rem;
  color: #64748b;
  font-size: 0.74rem;
  font-weight: 800;
  letter-spacing: 0.05em;
  text-transform: uppercase;
}

.quantization-grid-explorer__cell-value {
  display: block;
  color: #0f3d58;
  font-size: 1rem;
  font-weight: 800;
  word-break: break-word;
}

.quantization-grid-explorer__cell-caption {
  display: block;
  margin-top: 0.35rem;
  color: #64748b;
  font-size: 0.78rem;
}

.objective5-chat {
|
||||
border: 1px solid #d7e4ef;
|
||||
border-radius: 18px;
|
||||
background:
|
||||
radial-gradient(circle at top right, rgba(251, 191, 36, 0.16), transparent 30%),
|
||||
linear-gradient(180deg, #fbfdff, #f3f8fc);
|
||||
padding: 1rem;
|
||||
}
|
||||
|
||||
.objective5-chat h3 {
|
||||
margin: 0.12rem 0 0;
|
||||
color: #0f3d58;
|
||||
font-size: 1.2rem;
|
||||
}
|
||||
|
||||
.objective5-chat code,
|
||||
.objective5-chat pre,
|
||||
.objective5-chat input,
|
||||
.objective5-chat textarea,
|
||||
.objective5-chat select {
|
||||
font-family:
|
||||
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
|
||||
"Courier New", monospace;
|
||||
}
|
||||
|
||||
.objective5-chat__header {
|
||||
margin-bottom: 0.9rem;
|
||||
}
|
||||
|
||||
.objective5-chat__eyebrow {
|
||||
margin: 0;
|
||||
color: #9a5f00;
|
||||
font-size: 0.72rem;
|
||||
font-weight: 800;
|
||||
letter-spacing: 0.08em;
|
||||
text-transform: uppercase;
|
||||
}
|
||||
|
||||
.objective5-chat__lede {
|
||||
margin: 0.55rem 0 0;
|
||||
color: #334155;
|
||||
}
|
||||
|
||||
.objective5-chat__settings {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(3, minmax(0, 1fr));
|
||||
gap: 0.85rem;
|
||||
}
|
||||
|
||||
.objective5-chat__field {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.42rem;
|
||||
padding: 0.9rem;
|
||||
border: 1px solid #dce6ee;
|
||||
border-radius: 14px;
|
||||
background: rgba(255, 255, 255, 0.94);
|
||||
}
|
||||
|
||||
.objective5-chat__field span {
|
||||
color: #334155;
|
||||
font-size: 0.8rem;
|
||||
font-weight: 700;
|
||||
}
|
||||
|
||||
.objective5-chat__field input,
|
||||
.objective5-chat__field select,
|
||||
.objective5-chat__composer textarea {
|
||||
width: 100%;
|
||||
border: 1px solid #cbd9e5;
|
||||
border-radius: 12px;
|
||||
background: #f8fbfe;
|
||||
color: #0f172a;
|
||||
font-size: 0.95rem;
|
||||
padding: 0.7rem 0.8rem;
|
||||
}
|
||||
|
||||
.objective5-chat__field input:focus,
|
||||
.objective5-chat__field select:focus,
|
||||
.objective5-chat__composer textarea:focus {
|
||||
outline: 2px solid rgba(37, 99, 235, 0.18);
|
||||
outline-offset: 2px;
|
||||
border-color: #8bb4db;
|
||||
}
|
||||
|
||||
.objective5-chat__model-row {
|
||||
display: flex;
|
||||
gap: 0.6rem;
|
||||
}
|
||||
|
||||
.objective5-chat__model-row select {
|
||||
flex: 1 1 auto;
|
||||
}
|
||||
|
||||
.objective5-chat__refresh-button {
|
||||
flex: 0 0 auto;
|
||||
border: 1px solid #cbd9e5;
|
||||
border-radius: 12px;
|
||||
background: linear-gradient(180deg, #ffffff, #eff5fb);
|
||||
color: #18466a;
|
||||
cursor: pointer;
|
||||
font-size: 0.86rem;
|
||||
font-weight: 700;
|
||||
padding: 0.7rem 0.9rem;
|
||||
}
|
||||
|
||||
.objective5-chat__refresh-button:disabled {
|
||||
cursor: wait;
|
||||
opacity: 0.75;
|
||||
}
|
||||
|
||||
.objective5-chat__settings-note {
|
||||
margin: 0.85rem 0 0;
|
||||
color: #526173;
|
||||
font-size: 0.92rem;
|
||||
}
|
||||
|
||||
.objective5-chat__prompt-row {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.6rem;
|
||||
margin-top: 0.95rem;
|
||||
}
|
||||
|
||||
.objective5-chat__prompt-chip {
|
||||
border: 1px solid #d4dfea;
|
||||
border-radius: 999px;
|
||||
background: linear-gradient(180deg, #ffffff, #f3f8fd);
|
||||
color: #18466a;
|
||||
cursor: pointer;
|
||||
font-size: 0.86rem;
|
||||
font-weight: 600;
|
||||
padding: 0.52rem 0.82rem;
|
||||
transition:
|
||||
transform 160ms ease,
|
||||
border-color 160ms ease,
|
||||
box-shadow 160ms ease;
|
||||
}
|
||||
|
||||
.objective5-chat__prompt-chip:hover {
|
||||
transform: translateY(-1px);
|
||||
border-color: #b8cadc;
|
||||
box-shadow: 0 6px 18px rgba(15, 23, 42, 0.07);
|
||||
}
|
||||
|
||||
.objective5-chat__transcript {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.85rem;
|
||||
min-height: 16rem;
|
||||
margin-top: 1rem;
|
||||
padding: 0.3rem;
|
||||
}
|
||||
|
||||
.objective5-chat__empty,
|
||||
.objective5-chat__message {
|
||||
border: 1px solid #dce6ee;
|
||||
border-radius: 16px;
|
||||
background: rgba(255, 255, 255, 0.94);
|
||||
padding: 0.9rem 0.95rem;
|
||||
}
|
||||
|
||||
.objective5-chat__empty strong {
|
||||
display: block;
|
||||
color: #0f3d58;
|
||||
}
|
||||
|
||||
.objective5-chat__empty p {
|
||||
margin: 0.45rem 0 0;
|
||||
color: #516273;
|
||||
}
|
||||
|
||||
.objective5-chat__message--user {
|
||||
background: linear-gradient(180deg, #fff9ee, #fffdf8);
|
||||
border-color: #f0dfbb;
|
||||
}
|
||||
|
||||
.objective5-chat__message-meta {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
gap: 0.75rem;
|
||||
margin-bottom: 0.7rem;
|
||||
color: #516273;
|
||||
font-size: 0.8rem;
|
||||
font-weight: 700;
|
||||
}
|
||||
|
||||
.objective5-chat__message-meta code {
|
||||
font-size: 0.76rem;
|
||||
color: #0f4f76;
|
||||
}
|
||||
|
||||
.objective5-chat__message-body {
|
||||
margin: 0;
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
color: #0f172a;
|
||||
font-size: 0.93rem;
|
||||
}
|
||||
|
||||
.objective5-chat__message-body code {
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
.objective5-chat__svg-block {
|
||||
display: grid;
|
||||
gap: 0.8rem;
|
||||
}
|
||||
|
||||
.objective5-chat__svg-preview {
|
||||
width: 100%;
|
||||
max-width: 26rem;
|
||||
border: 1px solid #d7e4ef;
|
||||
border-radius: 14px;
|
||||
background:
|
||||
linear-gradient(45deg, #f8fbff 25%, #eef4fa 25%, #eef4fa 50%, #f8fbff 50%, #f8fbff 75%, #eef4fa 75%, #eef4fa);
|
||||
background-size: 18px 18px;
|
||||
padding: 0.7rem;
|
||||
}
|
||||
|
||||
.objective5-chat__svg-source summary {
|
||||
cursor: pointer;
|
||||
color: #18466a;
|
||||
font-size: 0.88rem;
|
||||
font-weight: 700;
|
||||
}
|
||||
|
||||
.objective5-chat__svg-source pre {
|
||||
overflow-x: auto;
|
||||
margin: 0.6rem 0 0;
|
||||
padding: 0.75rem;
|
||||
border-radius: 12px;
|
||||
background: #0f172a;
|
||||
color: #dbeafe;
|
||||
  font-size: 0.8rem;
  white-space: pre-wrap;
  word-break: break-word;
}

.objective5-chat__metrics {
  display: flex;
  flex-wrap: wrap;
  gap: 0.5rem;
  margin-top: 0.8rem;
}

.objective5-chat__metric-pill {
  display: inline-flex;
  align-items: center;
  min-height: 2rem;
  padding: 0.36rem 0.7rem;
  border: 1px solid #d7e4ef;
  border-radius: 999px;
  background: linear-gradient(180deg, #f8fbfe, #eef5fb);
  color: #18466a;
  font-size: 0.8rem;
  font-weight: 700;
}

.objective5-chat__message-warning,
.objective5-chat__error {
  margin: 0.75rem 0 0;
  color: #8a3b12;
  font-size: 0.88rem;
  font-weight: 600;
}

.objective5-chat__error--inline {
  margin-top: 0.55rem;
}

.objective5-chat__composer {
  display: flex;
  flex-direction: column;
  gap: 0.55rem;
  margin-top: 1rem;
}

.objective5-chat__composer-label {
  color: #334155;
  font-size: 0.86rem;
  font-weight: 700;
}

.objective5-chat__composer textarea {
  min-height: 8.5rem;
  resize: vertical;
}

.objective5-chat__composer-actions {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 1rem;
  margin-top: 0.2rem;
}

.objective5-chat__composer-state {
  display: flex;
  flex-direction: column;
  gap: 0.18rem;
}

.objective5-chat__composer-state span {
  color: #64748b;
  font-size: 0.78rem;
  font-weight: 700;
  letter-spacing: 0.03em;
  text-transform: uppercase;
}

.objective5-chat__composer-state strong {
  color: #0f3d58;
  font-size: 0.92rem;
}

.objective5-chat__composer button {
  border: 1px solid #c3d4e4;
  border-radius: 999px;
  background: linear-gradient(180deg, #0f4f76, #123c58);
  color: #ffffff;
  cursor: pointer;
  font-size: 0.92rem;
  font-weight: 700;
  padding: 0.72rem 1.05rem;
}

.objective5-chat__composer button:disabled {
  cursor: wait;
  opacity: 0.72;
}

@media (max-width: 640px) {
  .lab-content.objective-style-cards .objective-segment {
    padding: 0.9rem 1rem 1rem;
@@ -857,4 +1525,28 @@ ol {
  .lab-content ul.concept-pill-list > li {
    border-radius: 16px;
  }

  .quantization-explorer__controls,
  .quantization-explorer__focus-grid,
  .quantization-explorer__focus-grid--four {
    grid-template-columns: 1fr;
  }

  .quantization-grid-explorer__grid {
    grid-template-columns: repeat(2, minmax(0, 1fr));
  }

  .objective5-chat__settings {
    grid-template-columns: 1fr;
  }

  .objective5-chat__model-row {
    flex-direction: column;
  }

  .objective5-chat__composer-actions,
  .objective5-chat__message-meta {
    align-items: flex-start;
    flex-direction: column;
  }
}
@@ -0,0 +1,17 @@
import react from "@vitejs/plugin-react";
import path from "path";
import { defineConfig } from "vitest/config";

export default defineConfig({
  plugins: [react()],
  resolve: {
    alias: {
      "~": path.resolve(__dirname, "src"),
    },
  },
  test: {
    environment: "jsdom",
    globals: true,
    setupFiles: ["./vitest.setup.ts"],
  },
});
@@ -0,0 +1,31 @@
import "@testing-library/jest-dom/vitest";

function createStorageMock() {
  const store = new Map<string, string>();

  return {
    clear() {
      store.clear();
    },
    getItem(key: string) {
      return store.get(key) ?? null;
    },
    key(index: number) {
      return Array.from(store.keys())[index] ?? null;
    },
    removeItem(key: string) {
      store.delete(key);
    },
    setItem(key: string, value: string) {
      store.set(key, value);
    },
    get length() {
      return store.size;
    },
  };
}

Object.defineProperty(window, "localStorage", {
  configurable: true,
  value: createStorageMock(),
});