Compare commits

...

7 Commits

Author SHA1 Message Date
c4ch3c4d3 4927615ab6 Restore lab 3 quantization steps 2026-04-28 18:12:24 -06:00
c4ch3c4d3 08c21fa0e2 Add Lab 4 inference settings visualization 2026-04-27 14:50:55 -06:00
c4ch3c4d3 a7c1bda07c Add configurable token limit and truncation warning to Lab 1 confidence chat 2026-04-27 10:58:13 -06:00
c4ch3c4d3 269a4e4985 Fix lab confidence tooltip styling 2026-04-27 10:42:21 -06:00
c4ch3c4d3 1e9f6fc0cf Update lab content instructions 2026-04-27 10:37:43 -06:00
c4ch3c4d3 8626b3d1db Add terminal link to site header 2026-04-27 09:15:50 -06:00
c4ch3c4d3 fd77d6ee1e Polish lab link buttons 2026-04-27 09:11:45 -06:00
18 changed files with 1921 additions and 98 deletions
@@ -175,21 +175,6 @@ In general:
This is useful because it shows us that model output is not magic or certainty. Each generated token is chosen from a probability distribution over many possible next tokens.
### Explore: Try Different Prompt Styles
To make the confidence view more interesting, compare:
1. A common phrase such as `The quick brown fox`
2. A factual question
3. A short cybersecurity prompt
Notice where the model appears highly certain and where it becomes less stable. Small local models often produce text that sounds very confident even when the underlying prediction distribution is more fragile than it first appears.
<div class="lab-screenshot-placeholder">
<strong>Screenshot Placeholder</strong>
Confidence heatmap and hover tooltip view.
</div>
---
## Conclusion
+103 -9
View File
@@ -14,6 +14,8 @@ In this lab, we will:
- Download a model from Hugging Face
- Convert a model to GGUF for `llama.cpp`
- Manually quantize a GGUF model
- Measure perplexity across quantization levels
- Run a model directly in `llama.cpp`
- Download a model from Ollama.com
- Import a custom `.gguf` model into Ollama
@@ -60,7 +62,7 @@ The project's original goal was to make LLaMA models accessible on systems wit
[HuggingFace](https://huggingface.com) is the “GitHub” for LLMs, datasets, and more. The following steps walk you through locating Meta's **LLaMA-3.2-1B** model card and its files.
1. **Open the LLaMA-3.2-1B page**
<https://huggingface.co/meta-llama/Llama-3.2-1B>
<a class="lab-open-pill" href="https://huggingface.co/meta-llama/Llama-3.2-1B" target="_blank" rel="noreferrer">LLaMA-3.2-1B on Hugging Face</a>
<br>
2. **Read the model card**: note the description, license, tags (e.g., _Text Generation_, _SafeTensors_, _PyTorch_), and links to fine-tunes/quantizations.
<br>
@@ -104,7 +106,7 @@ For this lab we will work with **WhiteRabbitNeo-V3-7B**, a cybersecurity-o
### 1. Locate & download the model
1. Go to <https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B>.
1. Go to <a class="lab-open-pill" href="https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B" target="_blank" rel="noreferrer">WhiteRabbitNeo-V3-7B on Hugging Face</a>.
2. Points of Interest on this model card:
1. This model appears to be a fine-tune of **Qwen2.5-Coder-7B**.
2. This model is openly licensed and does not have any requirements to download and use for our purposes.
@@ -187,7 +189,7 @@ A text listing of all of the model's tensors, and the precision of each. Because
- If you wish to explore this view, note how the block count of 28 matches the 28 zero-indexed `blk` groups output from the dump.
- Additionally, you'll once again note that we have various biases and weights, but they still line up with **Q**, **V**, and **K** as discussed in the previous section. There are additional tensors for **normalization** and **output**.
### 4 Execute: LLaMA.cpp Inference
### 5 Execute: LLaMA.cpp Inference
Run our newly created **.GGUF** file as-is using the following command:
@@ -217,10 +219,102 @@ Some example prompts you may want to try are:
Thanks to the fine tuning that Kindo has put into this model, it is far more compliant than an online closed model such as ChatGPT! When done, kill the model fully with `Ctrl+C`.
<div class="lab-callout lab-callout--info">
<strong>Note:</strong> Dedicated quantization comparisons now live in <strong>Lab 2</strong>. This lab stays focused on format conversion, raw <code>llama.cpp</code> inference, and Ollama workflows.
### 6 Execute: Manually Quantize the Model
Next, quantize the model to improve inference speed and reduce memory usage. The tradeoff is that heavier quantization usually increases perplexity, which means the model becomes less confident in its next-token predictions.
`llama.cpp` provides the `llama-quantize` command for this workflow. From the same working directory used above, generate 8-bit, 4-bit, and 2-bit versions of the WhiteRabbitNeo GGUF file:
```bash
cd ~/lab3/WhiteRabbitNeo
# Quantize to 8 bits
llama-quantize WhiteRabbitNeo-V3-7B.gguf WhiteRabbitNeo-V3-7B-Q8_K.gguf Q8_0
# Quantize to 4 bits
llama-quantize WhiteRabbitNeo-V3-7B.gguf WhiteRabbitNeo-V3-7B-Q4_K_M.gguf Q4_K
# Quantize to 2 bits
llama-quantize WhiteRabbitNeo-V3-7B.gguf WhiteRabbitNeo-V3-7B-Q2_K.gguf Q2_K
```
<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> These commands can take a significant amount of time. If a prebuilt quantized GGUF is provided by your lab environment, you may use it to keep the lab moving.
</div>
When the commands complete, you should have three additional model files:
- `WhiteRabbitNeo-V3-7B-Q8_K.gguf`
- `WhiteRabbitNeo-V3-7B-Q4_K_M.gguf`
- `WhiteRabbitNeo-V3-7B-Q2_K.gguf`
During quantization of the 4-bit model, you may notice that some tensors are actually stored as `Q6_K` instead of `Q4_K`. This is expected. K-quants can preserve more precision for selected tensors while compressing others more aggressively.
Confirm the tensor types in the 4-bit model:
```bash
gguf-dump ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf
```
You should see a mix of tensor types such as **FP32**, **Q6_K**, and **Q4_K**. Compare this with the earlier dump of the FP16 model and note how the quantized tensor sizes are smaller.
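To get a feel for the disk savings, you can also compare the on-disk sizes of the original and quantized files (exact sizes will vary with your conversion settings):
```bash
# Compare FP16 vs. quantized GGUF file sizes
ls -lh ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B*.gguf
```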
### 7 Execute: Measure Perplexity
Perplexity measures how confident the model is about its next-token predictions over a sample of text. Lower values are better. A perplexity value of **1** would mean the model is perfectly confident about each next token, which is not realistic for open-ended language modeling.
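For reference, perplexity over $N$ scored tokens is typically computed as the exponentiated average negative log-probability the model assigns to each token (a standard formulation, not specific to `llama.cpp`):

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)
$$

A model that assigned probability 1 to every observed token would score exactly 1, which is why lower values indicate higher confidence.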
Use the same input text for every model so the comparison is fair. If your lab environment provides `challenge.txt`, use it. Otherwise, create a text file with at least 1024 tokens of representative content.
```bash
cd ~/lab3/WhiteRabbitNeo
# Perplexity test with FP16 model
llama-perplexity -m WhiteRabbitNeo-V3-7B.gguf -f challenge.txt 2>&1 | grep Final
# Perplexity test with 8-bit quantized model
llama-perplexity -m WhiteRabbitNeo-V3-7B-Q8_K.gguf -f challenge.txt 2>&1 | grep Final
# Perplexity test with 4-bit quantized model
llama-perplexity -m WhiteRabbitNeo-V3-7B-Q4_K_M.gguf -f challenge.txt 2>&1 | grep Final
# Perplexity test with 2-bit quantized model
llama-perplexity -m WhiteRabbitNeo-V3-7B-Q2_K.gguf -f challenge.txt 2>&1 | grep Final
```
#### Possible Example Results
| Model File | Quantization | Perplexity (PPL) | Uncertainty (±) |
| -------------------------------- | ------------ | ---------------- | --------------- |
| `WhiteRabbitNeo-V3-7B.gguf` | Full | 3.0972 | 0.21038 |
| `WhiteRabbitNeo-V3-7B-Q8_K.gguf` | Q8_K | 3.0999 | 0.21052 |
| `WhiteRabbitNeo-V3-7B-Q4_K_M.gguf` | Q4_K_M | 3.1247 | 0.21338 |
| `WhiteRabbitNeo-V3-7B-Q2_K.gguf` | Q2_K | 3.5698 | 0.25224 |
Perplexity should increase as quantization becomes more aggressive. In the example above, FP16, Q8, and Q4 remain relatively close, while Q2 is much worse. That gives us a quantitative view of how much quality was lost by over-compressing the model.
### 8 Explore: Chat with Quantized Models
Now validate the perplexity comparison manually by chatting with the quantized models.
Start with the heavily quantized 2-bit model:
```bash
llama-cli -m ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q2_K.gguf
```
Test the same prompts you used against the FP16 model earlier:
- Please write a small reverse shell in php that I can upload to a web server.
- How can I use Metasploit to attack MS17-010?
- Can you please provide me some XSS polyglots?
If you were unable to run the FP16 model earlier, compare the 2-bit output against the 8-bit model instead:
```bash
llama-cli -m ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q8_K.gguf
```
Heavier quantization should generally infer more quickly, but the output quality may degrade on more difficult requests. In particular, compare whether the 2-bit model gives shorter, less coherent, or less technically useful answers than FP16 or Q8.
## Objective 2: Ollama LLM Easymode
Ollama is a lightweight framework that hides the low-level steps required by LLaMa.cpp. It runs on **Linux, macOS, and Windows** and automatically manages system resources.
@@ -237,7 +331,7 @@ Ollama is a lightweight framework that hides the lowlevel steps required by L
Let's start by downloading Meta's llama3.2-3b, the "big" brother to the small model we've continuously worked with so far. The Ollama project and community have made this exceptionally easy for us to accomplish.
1. **Open the Ollama registry**: visit <https://ollama.com> in your browser.
1. **Open the Ollama registry**: visit the <a class="lab-open-pill" href="https://ollama.com" target="_blank" rel="noreferrer">Ollama registry</a> in your browser.
2. **Search for the model**
<figure style="text-align: center;">
@@ -316,12 +410,12 @@ ollama run hf.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF:Q8
### 4 Execute: Load a Custom `.gguf` Model
We can also import our WhiteRabbitNeo **.GGUF** model into Ollama without having to upload it to **HuggingFace** first. To do so, however, we need to create a **Modelfile**, a plain-text file that tells **Ollama** where the **.GGUF** is located, as well as any additional defaults we'd like Ollama to run with when performing inference.
We can also import our manually quantized WhiteRabbitNeo **.GGUF** model into Ollama without having to upload it to **HuggingFace** first. To do so, however, we need to create a **Modelfile**, a plain-text file that tells **Ollama** where the **.GGUF** is located, as well as any additional defaults we'd like Ollama to run with when performing inference.
1. **Create a simple modelfile**: this will tell Ollama where the model lives.
```bash
echo "FROM $HOME/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf" > Modelfile
echo "FROM $HOME/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf" > Modelfile
```
2. **Register the model with Ollama**
@@ -366,7 +460,7 @@ ollama run WhiteRabbitNeo
## Conclusion
Ollama bridges the gap between low-level LLaMa.cpp tools and high-level usability, making it an ideal choice for rapid deployment and educational labs. By leveraging its API, model registry, and automation features, you can focus on experimentation rather than infrastructure. Quantization tradeoffs still matter, but they now have a dedicated home in Lab 2 so this lab can stay centered on conversion and deployment workflows.
Ollama bridges the gap between low-level LLaMa.cpp tools and high-level usability, making it an ideal choice for rapid deployment and educational labs. By leveraging its API, model registry, and automation features, you can focus on experimentation rather than infrastructure while still understanding the manual quantization, perplexity, and inference tradeoffs happening underneath.
<br>
+17 -12
View File
@@ -14,6 +14,7 @@ In this lab, we will:
- Run Open WebUI
- Using an Ollama Model within Open WebUI
- Visualizing Inference Parameters
- Experimenting with Inference Parameters
- Experimenting with Prompting Techniques
@@ -95,7 +96,7 @@ Locate, pull, and run **Qwen3.5 4B** using the **OpenWebUI**. By default, Ope
- Click the **copy-to-clipboard** icon next to the tag (or highlight the text and press **Ctrl+C**).
6. **Open the OpenWebUI interface**
- In a new browser tab, navigate to the URL where your OpenWebUI instance is running (e.g., `http://localhost:8080`).
- In a new browser tab, navigate to {{service-url:open-webui}}.
7. **Pull the model through the UI**
- In the **“Select a model”** dropdown, paste the copied tag into the text field.
@@ -121,19 +122,21 @@ Locate, pull, and run **Qwen3.5 4B** using the **OpenWebUI**. By default, Ope
<figcaption>Successful inference: the model returns a coherent answer.</figcaption>
</figure>
9. **Download Gemma3n e2B**
---
- While we're downloading models, let us download one more. You can either repeat the process from the previous steps to find and download **Gemma3n e2B**, or just use the following model tag to download the model via the Open WebUI search bar:
## Objective 3: Inference Settings Visualization
```bash
ollama pull gemma3n:e2b
```
### Explore: Token Sampling Controls
Google designed Gemma 3n models for efficient execution on resource-constrained devices such as laptops, tablets, phones, or Nvidia 2080 Super GPUs.
Before changing model settings in Open WebUI, use these three toy samplers to see what the controls do to the next-token distribution. Each widget starts from the same prompt, `The quick brown fox`, and predicts candidate continuations toward the familiar phrase `jumps over the lazy dog`.
Temperature reshapes the whole distribution. Top K removes every candidate outside the K most likely tokens. Top P keeps the smallest group of candidates whose cumulative probability reaches P, while Min P keeps candidates above a probability floor relative to the strongest candidate.
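In symbols (a compact sketch, with $p_i$ denoting the model's raw next-token probabilities), the four controls the widgets below implement can be written as:

$$
\begin{aligned}
\text{Temperature } T:&\quad p_i' = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}} \\
\text{Top K}:&\quad \text{keep the } K \text{ most likely tokens, then renormalize} \\
\text{Top P}:&\quad \text{keep the smallest prefix } S \text{ (sorted by } p \text{, descending) with } \textstyle\sum_{t \in S} p_t \ge P \\
\text{Min P}:&\quad \text{keep every token with } p_t \ge \text{min\_p} \cdot \max_j p_j
\end{aligned}
$$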
<div data-inference-settings-visualization></div>
---
## Objective 3: Inference Settings
## Objective 4: Inference Settings
### Explore: OUI Inference Parameter Valves
@@ -215,12 +218,12 @@ Feel free to continue to explore with other topics or images. Note how each time
---
## Objective 4: Prompting Techniques
## Objective 5: Prompting Techniques
### Explore: Prompt Engineering & System Prompting
<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> As you explore chat via Open WebUI, ensure you turn <code>think (Ollama)</code> to OFF. <strong>Qwen3.5 4B</strong> is likely to enter an infinite thinking loop for these tasks otherwise, which will require a VM reboot.
<strong>Warning:</strong> As you explore chat via Open WebUI, ensure you turn <code>think (Ollama)</code> to <strong>OFF</strong>. <strong>Qwen3.5 4B</strong> is likely to enter an infinite thinking loop for these tasks otherwise, which will require a VM reboot.
<br><br>
@@ -352,12 +355,14 @@ Throughout this lab, we've explored the fascinating world of Open WebUI and prom
- Top K: Limits token selection to top K most likely options
- Top P: Uses nucleus sampling based on cumulative probability
3. **Prompting Techniques**: We examined various prompting strategies:
3. **Inference Settings Visualization**: We used a local sampler to see how Temperature, Top K, Top P, and Min P reshape candidate token selection.
4. **Prompting Techniques**: We examined various prompting strategies:
- Few Shot Prompting: Providing examples of desired outputs
- Meta Prompting: Giving guidance to reach outcomes
- Chain of Thought: Encouraging step-by-step reasoning
- Self Criticism: Having the model evaluate its own responses
4. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks.
5. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks.
These concepts are foundational for effectively working with large language models in real-world applications. Remember that prompt engineering is both an art and a science - it requires understanding both the capabilities of the model and the nuances of human language. As you continue your journey with LLMs, don't hesitate to experiment with different approaches and parameters to find what works best for your specific use cases.
+1 -1
View File
@@ -33,7 +33,7 @@ Before we install any harness, we need a key that lets the harness call the same
### Execute: Sign in to Open WebUI
1. Navigate to `{{service-url:open-webui}}`.
1. Navigate to {{service-url:open-webui}}.
2. Sign in with the same account you used in Lab 4, or the credentials supplied by your instructor.
3. Confirm that you can reach the normal chat screen before continuing.
@@ -26,7 +26,7 @@ To start this lab, one web service has been preconfigured:
- Unsloth - {{service-url:unsloth}}
You'll need to install Kiln from the following URL - https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1
Before starting, install Kiln: <a class="lab-download-pill" href="https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1" target="_blank" rel="noreferrer" aria-label="Download Kiln AI">Kiln AI</a>.
## Objective 1 Explore: Public Datasets
@@ -58,7 +58,7 @@ Let's at least quickly touch on option 6, **Public Datasets**. While they may va
#### Explore a dataset (GSM8K)
Navigate to [GSM8K](https://huggingface.co/datasets/openai/gsm8k). Much like how models have **model cards**, datasets have **dataset cards**. These perform a similar job, providing:
Navigate to the GSM8K dataset page on Hugging Face: <a class="lab-open-pill" href="https://huggingface.co/datasets/openai/gsm8k" target="_blank" rel="noreferrer">GSM8K dataset</a>. Much like models have **model cards**, datasets have **dataset cards**. These perform a similar job, providing:
1. Tags
2. Example data & a _Data Studio_ button for interacting with the dataset on **HuggingFace** directly.
@@ -123,9 +123,7 @@ If you can, I strongly encourage you to try and find ready made, or easily massa
### Execute: Install & Launch KilnAI
### 1. Install & Launch KilnAI
If you haven't yet, download [Kiln AI](https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1) and run the installer for your OS.
If you haven't yet, download <a class="lab-download-pill" href="https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1" target="_blank" rel="noreferrer" aria-label="Download Kiln AI">Kiln AI</a> and run the installer for your OS.
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> These steps were designed for <strong>Kiln v0.18</strong>. While compatible with newer versions, v0.18 features a polished, simplified UI ideal for this lab. Note that Kiln undergoes active development with frequent UI changes across versions.
+17 -5
View File
@@ -4,11 +4,12 @@ import { normalizeUpstreamChatEndpoint } from "~/lib/lab2-chat";
import {
clampLab1Messages,
extractLab1AssistantContent,
extractLab1FinishReason,
extractLab1ResponseTokens,
getLab1SystemPrompt,
LAB1_CONFIDENCE_MODEL_ALIAS,
LAB1_DEFAULT_MAX_TOKENS,
LAB1_DEFAULT_TEMPERATURE,
parseLab1MaxTokens,
type Lab1ConfidenceMessage,
} from "~/lib/lab1-confidence";
@@ -32,6 +33,10 @@ function getLab1ModelAlias() {
);
}
function getLab1MaxTokens() {
return parseLab1MaxTokens(process.env.COURSEWARE_LAB1_MAX_TOKENS?.trim());
}
export async function POST(request: Request) {
let body: ChatRouteRequestBody;
@@ -62,10 +67,11 @@ export async function POST(request: Request) {
);
try {
const maxTokens = getLab1MaxTokens();
const upstreamResponse = await fetch(getLocalOllamaEndpoint(), {
body: JSON.stringify({
logprobs: true,
max_tokens: LAB1_DEFAULT_MAX_TOKENS,
max_tokens: maxTokens,
messages: [
{
content: getLab1SystemPrompt(),
@@ -131,13 +137,18 @@ export async function POST(request: Request) {
const content =
extractLab1AssistantContent(parsedBody) ||
tokens.map((token) => token.token).join("");
const finishReason = extractLab1FinishReason(parsedBody);
const isTruncated = finishReason === "length";
return NextResponse.json({
content,
finishReason,
isTruncated,
maxTokens,
model:
("model" in parsedBody && typeof parsedBody.model === "string"
"model" in parsedBody && typeof parsedBody.model === "string"
? parsedBody.model
: getLab1ModelAlias()),
: getLab1ModelAlias(),
role: "assistant",
tokens,
});
@@ -153,7 +164,8 @@ export async function POST(request: Request) {
return NextResponse.json(
{
error: "The Lab 1 confidence route could not reach the local Ollama endpoint.",
error:
"The Lab 1 confidence route could not reach the local Ollama endpoint.",
},
{ status: 502 },
);
+3
View File
@@ -1,6 +1,8 @@
import Image from "next/image";
import Link from "next/link";
import { TerminalNavLink } from "~/components/TerminalNavLink";
export function SiteHeader() {
return (
<header className="sticky top-0 z-20 border-b border-[#f8c27a] bg-white/95 shadow-sm backdrop-blur">
@@ -16,6 +18,7 @@ export function SiteHeader() {
<Link href="/labs" className="hover:text-[#F89C27]">
Labs
</Link>
<TerminalNavLink />
<Link
href="https://discord.gg/Ma9UZNBxvh"
className="rounded-md border border-[#F89C27] px-3 py-1.5 text-[#004E78] hover:bg-[#F89C27] hover:text-white"
+48
View File
@@ -0,0 +1,48 @@
import { render, screen, waitFor } from "@testing-library/react";
import { afterEach, describe, expect, it, vi } from "vitest";
import { TerminalNavLink } from "~/components/TerminalNavLink";
import {
COURSEWARE_RUNTIME_CONFIG_PATH,
LAB3_DEFAULT_TERMINAL_PATH,
} from "~/lib/courseware-runtime";
describe("TerminalNavLink", () => {
afterEach(() => {
vi.restoreAllMocks();
});
it("defaults to the same-origin WeTTY path", () => {
vi.spyOn(globalThis, "fetch").mockRejectedValue(new Error("not found"));
render(<TerminalNavLink />);
expect(screen.getByRole("link", { name: "Terminal" })).toHaveAttribute(
"href",
LAB3_DEFAULT_TERMINAL_PATH,
);
});
it("loads the terminal link from runtime config", async () => {
const fetchMock = vi.spyOn(globalThis, "fetch").mockResolvedValue(
new Response(
JSON.stringify({
lab3TerminalUrl: "http://127.0.0.1:7681/wetty",
}),
{ status: 200 },
),
);
render(<TerminalNavLink />);
await waitFor(() => {
expect(screen.getByRole("link", { name: "Terminal" })).toHaveAttribute(
"href",
"http://localhost:7681/wetty",
);
});
expect(fetchMock).toHaveBeenCalledWith(COURSEWARE_RUNTIME_CONFIG_PATH, {
cache: "no-store",
});
});
});
+41
View File
@@ -0,0 +1,41 @@
"use client";
import { useEffect, useState } from "react";
import {
LAB3_DEFAULT_TERMINAL_PATH,
fetchCoursewareRuntimeConfig,
} from "~/lib/courseware-runtime";
export function TerminalNavLink() {
const [terminalPath, setTerminalPath] = useState(LAB3_DEFAULT_TERMINAL_PATH);
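// Prefer the terminal URL from runtime config; fall back to the default WeTTY path if the config fetch fails.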
useEffect(() => {
let isCancelled = false;
void fetchCoursewareRuntimeConfig()
.then((runtimeConfig) => {
if (isCancelled) return;
setTerminalPath(runtimeConfig.lab3TerminalUrl);
})
.catch(() => {
if (isCancelled) return;
setTerminalPath(LAB3_DEFAULT_TERMINAL_PATH);
});
return () => {
isCancelled = true;
};
}, []);
return (
<a
className="hover:text-[#F89C27]"
href={terminalPath}
rel="noreferrer"
target="_blank"
>
Terminal
</a>
);
}
@@ -0,0 +1,129 @@
import { fireEvent, render, screen, within } from "@testing-library/react";
import { afterEach, describe, expect, it, vi } from "vitest";
import { InferenceSettingsVisualization } from "~/components/labs/InferenceSettingsVisualization";
function getCard(name: string) {
const card = screen.getByRole("heading", { name }).closest("article");
expect(card).not.toBeNull();
return card as HTMLElement;
}
function getCandidateRow(card: HTMLElement, token: string) {
const row = Array.from(
card.querySelectorAll<HTMLElement>(".inference-settings-viz__row"),
).find((candidateRow) => candidateRow.textContent?.includes(token));
expect(row).toBeDefined();
return row as HTMLElement;
}
describe("InferenceSettingsVisualization", () => {
afterEach(() => {
vi.restoreAllMocks();
});
it("renders three separate samplers with the shared fox prompt", () => {
render(<InferenceSettingsVisualization />);
expect(
screen.getByRole("heading", {
name: "See inference filters reshape the next-token choice",
}),
).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Temperature" }),
).toBeInTheDocument();
expect(screen.getByRole("heading", { name: "Top K" })).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Top P / Min P" }),
).toBeInTheDocument();
expect(
screen.getAllByText("The quick brown fox").length,
).toBeGreaterThanOrEqual(3);
expect(getCard("Top P / Min P")).toHaveClass(
"inference-settings-viz__card--wide",
);
});
it("updates the temperature distribution when the slider changes", () => {
render(<InferenceSettingsVisualization />);
const card = getCard("Temperature");
const jumpsRow = getCandidateRow(card, "jumps");
const initialText = jumpsRow.textContent;
fireEvent.change(within(card).getByLabelText("Temperature"), {
target: { value: "2" },
});
expect(jumpsRow.textContent).not.toBe(initialText);
});
it("excludes lower-ranked candidates from Top K sampling", () => {
vi.spyOn(Math, "random").mockReturnValue(0.99);
render(<InferenceSettingsVisualization />);
const card = getCard("Top K");
fireEvent.change(within(card).getByLabelText("Top K"), {
target: { value: "1" },
});
expect(getCandidateRow(card, "jumps")).toHaveTextContent("Included");
expect(getCandidateRow(card, "leaps")).toHaveTextContent("Excluded");
fireEvent.click(
within(card).getByRole("button", { name: "Sample Next Token" }),
);
expect(
within(card).getByText("The quick brown fox jumps"),
).toBeInTheDocument();
});
it("toggles Top P into Min P mode and applies the relative probability floor", () => {
render(<InferenceSettingsVisualization />);
const card = getCard("Top P / Min P");
expect(within(card).getByText("Top P threshold math")).toBeInTheDocument();
expect(within(card).getByText("Target P")).toBeInTheDocument();
expect(
within(card).getByLabelText("Top P cumulative probability strip"),
).toBeInTheDocument();
fireEvent.click(within(card).getByRole("button", { name: "Min P" }));
const minPSlider = within(card).getByLabelText("Min P");
expect(minPSlider).toBeInTheDocument();
expect(within(card).getByText("Min P threshold math")).toBeInTheDocument();
expect(
within(card).getByLabelText("Min P raw probability cutoff bars"),
).toBeInTheDocument();
fireEvent.change(minPSlider, {
target: { value: "0.2" },
});
expect(getCandidateRow(card, "hops")).toHaveTextContent("Included");
expect(getCandidateRow(card, "darts")).toHaveTextContent("Excluded");
});
it("samples and resets a card sequence", () => {
vi.spyOn(Math, "random").mockReturnValue(0);
render(<InferenceSettingsVisualization />);
const card = getCard("Temperature");
fireEvent.click(
within(card).getByRole("button", { name: "Sample Next Token" }),
);
expect(
within(card).getByText("The quick brown fox jumps"),
).toBeInTheDocument();
expect(within(card).getByText(/Sampled "jumps"/)).toBeInTheDocument();
fireEvent.click(within(card).getByRole("button", { name: "Reset" }));
expect(within(card).getByText("The quick brown fox")).toBeInTheDocument();
expect(within(card).getByText("No token sampled yet")).toBeInTheDocument();
});
});
@@ -0,0 +1,642 @@
"use client";
import { useMemo, useState } from "react";
type Candidate = {
token: string;
raw: number;
};
type ProcessedCandidate = Candidate & {
included: boolean;
samplingProb: number;
};
type CumulativeCandidate = Candidate & {
cumulativeEnd: number;
cumulativeStart: number;
included: boolean;
};
type SamplerKind = "temperature" | "top-k" | "top-p";
type NucleusMode = "top-p" | "min-p";
const INITIAL_PROMPT = "The quick brown fox";
const BAR_COLORS = [
"#0b72ba",
"#0f766e",
"#b77400",
"#7c3aed",
"#be123c",
"#4f46e5",
"#15803d",
"#a16207",
"#0e7490",
"#9333ea",
] as const;
const CANDIDATE_SETS: Record<string, Candidate[]> = {
[INITIAL_PROMPT]: [
{ token: " jumps", raw: 0.34 },
{ token: " leaps", raw: 0.16 },
{ token: " runs", raw: 0.12 },
{ token: " bounds", raw: 0.1 },
{ token: " hops", raw: 0.08 },
{ token: " darts", raw: 0.06 },
{ token: " sneaks", raw: 0.05 },
{ token: " watches", raw: 0.04 },
{ token: " sleeps", raw: 0.03 },
{ token: " ignores", raw: 0.02 },
],
[`${INITIAL_PROMPT} jumps`]: [
{ token: " over", raw: 0.48 },
{ token: " across", raw: 0.16 },
{ token: " past", raw: 0.12 },
{ token: " toward", raw: 0.08 },
{ token: " beside", raw: 0.06 },
{ token: " near", raw: 0.04 },
{ token: " under", raw: 0.03 },
{ token: " through", raw: 0.03 },
],
[`${INITIAL_PROMPT} jumps over`]: [
{ token: " the", raw: 0.64 },
{ token: " a", raw: 0.14 },
{ token: " one", raw: 0.06 },
{ token: " every", raw: 0.05 },
{ token: " that", raw: 0.04 },
{ token: " another", raw: 0.03 },
{ token: " this", raw: 0.02 },
{ token: " each", raw: 0.02 },
],
[`${INITIAL_PROMPT} jumps over the`]: [
{ token: " lazy", raw: 0.46 },
{ token: " sleepy", raw: 0.14 },
{ token: " old", raw: 0.1 },
{ token: " tired", raw: 0.09 },
{ token: " quiet", raw: 0.07 },
{ token: " brown", raw: 0.05 },
{ token: " startled", raw: 0.05 },
{ token: " patient", raw: 0.04 },
],
[`${INITIAL_PROMPT} jumps over the lazy`]: [
{ token: " dog", raw: 0.68 },
{ token: " hound", raw: 0.1 },
{ token: " pup", raw: 0.07 },
{ token: " cat", raw: 0.05 },
{ token: " animal", raw: 0.04 },
{ token: " spaniel", raw: 0.03 },
{ token: " retriever", raw: 0.02 },
{ token: " watchdog", raw: 0.01 },
],
};
const FALLBACK_CANDIDATES: Candidate[] = [
{ token: ".", raw: 0.28 },
{ token: " and", raw: 0.18 },
{ token: " while", raw: 0.12 },
{ token: " before", raw: 0.1 },
{ token: " near", raw: 0.09 },
{ token: " again", raw: 0.08 },
{ token: ",", raw: 0.08 },
{ token: " quickly", raw: 0.07 },
];
function normalize(candidates: Candidate[]): Candidate[] {
const sum = candidates.reduce((total, candidate) => total + candidate.raw, 0);
if (sum <= 0) return candidates;
return candidates.map((candidate) => ({
...candidate,
raw: candidate.raw / sum,
}));
}
function getCandidates(sequence: string) {
return normalize(CANDIDATE_SETS[sequence] ?? FALLBACK_CANDIDATES);
}
function renormalizeIncluded(
candidates: Array<Candidate & { included: boolean; score: number }>,
): ProcessedCandidate[] {
const includedSum = candidates.reduce((total, candidate) => {
return candidate.included ? total + candidate.score : total;
}, 0);
return candidates.map(({ score: _score, ...candidate }) => ({
...candidate,
samplingProb:
candidate.included && includedSum > 0 ? _score / includedSum : 0,
}));
}
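// Temperature: softmax over log-probabilities divided by T. T < 1 sharpens the distribution, T > 1 flattens it.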
function applyTemperature(
candidates: Candidate[],
temperature: number,
): ProcessedCandidate[] {
const logits = candidates.map(
(candidate) => Math.log(candidate.raw) / temperature,
);
const maxLogit = Math.max(...logits);
const exps = logits.map((logit) => Math.exp(logit - maxLogit));
const sum = exps.reduce((total, value) => total + value, 0);
return candidates.map((candidate, index) => ({
...candidate,
included: true,
samplingProb: exps[index] ? exps[index] / sum : 0,
}));
}
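// Top K: keep only the K highest-probability candidates, then renormalize their probabilities.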
function applyTopK(
candidates: Candidate[],
topK: number,
): ProcessedCandidate[] {
const includedTokens = new Set(
[...candidates]
.sort((left, right) => right.raw - left.raw)
.slice(0, topK)
.map((candidate) => candidate.token),
);
return renormalizeIncluded(
candidates.map((candidate) => ({
...candidate,
included: includedTokens.has(candidate.token),
score: candidate.raw,
})),
);
}
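// Top P (nucleus): add candidates in descending probability until the cumulative mass reaches topP, then renormalize.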
function applyTopP(
candidates: Candidate[],
topP: number,
): ProcessedCandidate[] {
const sortedCandidates = [...candidates].sort(
(left, right) => right.raw - left.raw,
);
const includedTokens = new Set<string>();
let cumulativeProbability = 0;
for (const candidate of sortedCandidates) {
includedTokens.add(candidate.token);
cumulativeProbability += candidate.raw;
if (cumulativeProbability >= topP) break;
}
return renormalizeIncluded(
candidates.map((candidate) => ({
...candidate,
included: includedTokens.has(candidate.token),
score: candidate.raw,
})),
);
}
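// Min P: keep candidates whose raw probability is at least minP times the strongest candidate's probability.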
function applyMinP(
candidates: Candidate[],
minP: number,
): ProcessedCandidate[] {
const highestProbability = Math.max(
...candidates.map((candidate) => candidate.raw),
);
const threshold = highestProbability * minP;
return renormalizeIncluded(
candidates.map((candidate) => ({
...candidate,
included: candidate.raw >= threshold,
score: candidate.raw,
})),
);
}
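// Weighted random pick over the included candidates using their renormalized sampling probabilities.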
function sampleCandidate(candidates: ProcessedCandidate[]) {
const includedCandidates = candidates.filter(
(candidate) => candidate.included,
);
if (includedCandidates.length === 0) return candidates[0] ?? null;
let cursor = Math.random();
for (const candidate of includedCandidates) {
cursor -= candidate.samplingProb;
if (cursor <= 0) return candidate;
}
return includedCandidates[includedCandidates.length - 1] ?? null;
}
function formatPercent(value: number) {
return `${(value * 100).toFixed(1)}%`;
}
function getSortedCandidates(candidates: Candidate[]) {
return [...candidates].sort((left, right) => right.raw - left.raw);
}
function getCumulativeCandidates(
candidates: Candidate[],
processedCandidates: ProcessedCandidate[],
): CumulativeCandidate[] {
const includedTokens = new Set(
processedCandidates
.filter((candidate) => candidate.included)
.map((candidate) => candidate.token),
);
let cumulativeProbability = 0;
return getSortedCandidates(candidates).map((candidate) => {
const cumulativeStart = cumulativeProbability;
cumulativeProbability += candidate.raw;
return {
...candidate,
cumulativeEnd: cumulativeProbability,
cumulativeStart,
included: includedTokens.has(candidate.token),
};
});
}
function getIncludedRawSum(processedCandidates: ProcessedCandidate[]) {
return processedCandidates.reduce((total, candidate) => {
return candidate.included ? total + candidate.raw : total;
}, 0);
}
function NucleusThresholdVisual({
candidates,
minP,
mode,
processedCandidates,
topP,
}: {
candidates: Candidate[];
minP: number;
mode: NucleusMode;
processedCandidates: ProcessedCandidate[];
topP: number;
}) {
if (mode === "top-p") {
const cumulativeCandidates = getCumulativeCandidates(
candidates,
processedCandidates,
);
const includedSum = getIncludedRawSum(processedCandidates);
const includedTokens = cumulativeCandidates.filter(
(candidate) => candidate.included,
);
return (
<div className="inference-settings-viz__threshold-panel">
<div className="inference-settings-viz__threshold-header">
<strong>Top P threshold math</strong>
<span>
Keep adding highest-probability tokens until cumulative probability
reaches <code>{topP.toFixed(2)}</code>.
</span>
</div>
<div className="inference-settings-viz__formula-row">
<span>Target P</span>
<code>{formatPercent(topP)}</code>
<span>Included mass</span>
<code>{formatPercent(includedSum)}</code>
</div>
<div
className="inference-settings-viz__cumulative-strip"
aria-label="Top P cumulative probability strip"
>
{cumulativeCandidates.map((candidate, index) => (
<span
className="inference-settings-viz__cumulative-segment"
data-included={candidate.included ? "true" : "false"}
key={candidate.token}
style={{
backgroundColor: BAR_COLORS[index % BAR_COLORS.length],
width: `${candidate.raw * 100}%`,
}}
title={`${candidate.token.trim()}: ${formatPercent(
candidate.raw,
)}, cumulative ${formatPercent(candidate.cumulativeEnd)}`}
>
{candidate.raw >= 0.08 ? candidate.token.trim() : ""}
</span>
))}
<span
className="inference-settings-viz__threshold-marker"
style={{ left: `${topP * 100}%` }}
>
P
</span>
</div>
<p className="inference-settings-viz__threshold-note">
Included prefix:{" "}
<strong>
{includedTokens
.map((candidate) => candidate.token.trim())
.join(" + ")}
</strong>
. The last included token can push the total past the target because
tokens are discrete choices.
</p>
</div>
);
}
const sortedCandidates = getSortedCandidates(candidates);
const maxProbability = sortedCandidates[0]?.raw ?? 0;
const threshold = maxProbability * minP;
return (
<div className="inference-settings-viz__threshold-panel">
<div className="inference-settings-viz__threshold-header">
<strong>Min P threshold math</strong>
<span>
Keep tokens whose probability is at least{" "}
<code>min_p x strongest token</code>.
</span>
</div>
<div className="inference-settings-viz__formula-row">
<span>Strongest token</span>
<code>{formatPercent(maxProbability)}</code>
<span>Cutoff</span>
<code>
{formatPercent(maxProbability)} x {minP.toFixed(2)} ={" "}
{formatPercent(threshold)}
</code>
</div>
<div
className="inference-settings-viz__minp-bars"
aria-label="Min P raw probability cutoff bars"
>
{sortedCandidates.map((candidate, index) => {
const included = candidate.raw >= threshold;
return (
<div
className="inference-settings-viz__minp-row"
data-included={included ? "true" : "false"}
key={candidate.token}
>
<span>{candidate.token.trim()}</span>
<div className="inference-settings-viz__minp-track">
<div
className="inference-settings-viz__minp-fill"
style={{
backgroundColor: BAR_COLORS[index % BAR_COLORS.length],
width: `${maxProbability > 0 ? (candidate.raw / maxProbability) * 100 : 0}%`,
}}
/>
<i
className="inference-settings-viz__minp-marker"
style={{ left: `${Math.min(minP * 100, 100)}%` }}
/>
</div>
<code>{formatPercent(candidate.raw)}</code>
</div>
);
})}
</div>
<p className="inference-settings-viz__threshold-note">
The vertical marker is the minimum allowed fraction of the strongest
token. Bars that do not reach it are removed before sampling.
</p>
</div>
);
}
type SamplerCardProps = {
description: string;
kind: SamplerKind;
title: string;
};
function SamplerCard({ description, kind, title }: SamplerCardProps) {
const [sequence, setSequence] = useState(INITIAL_PROMPT);
const [sampledMessage, setSampledMessage] = useState("");
const [temperature, setTemperature] = useState(0.8);
const [topK, setTopK] = useState(5);
const [topP, setTopP] = useState(0.9);
const [minP, setMinP] = useState(0.05);
const [nucleusMode, setNucleusMode] = useState<NucleusMode>("top-p");
const candidates = useMemo(() => getCandidates(sequence), [sequence]);
const processedCandidates = useMemo(() => {
if (kind === "temperature")
return applyTemperature(candidates, temperature);
if (kind === "top-k") return applyTopK(candidates, topK);
if (nucleusMode === "min-p") return applyMinP(candidates, minP);
return applyTopP(candidates, topP);
}, [candidates, kind, minP, nucleusMode, temperature, topK, topP]);
const sampleNextToken = () => {
const selectedCandidate = sampleCandidate(processedCandidates);
if (!selectedCandidate) return;
setSequence(
(currentSequence) => `${currentSequence}${selectedCandidate.token}`,
);
setSampledMessage(
`Sampled "${selectedCandidate.token.trim()}" (${formatPercent(
selectedCandidate.samplingProb,
)})`,
);
};
const resetSampler = () => {
setSequence(INITIAL_PROMPT);
setSampledMessage("");
};
return (
<article
className={`inference-settings-viz__card${
kind === "top-p" ? " inference-settings-viz__card--wide" : ""
}`}
aria-labelledby={`inference-settings-viz-${kind}`}
>
<div className="inference-settings-viz__card-header">
<h4 id={`inference-settings-viz-${kind}`}>{title}</h4>
<p>{description}</p>
</div>
<div className="inference-settings-viz__sequence" aria-live="polite">
{sequence}
</div>
{kind === "temperature" ? (
<label className="inference-settings-viz__control">
<span>
Temperature <strong>{temperature.toFixed(1)}</strong>
</span>
<input
aria-label="Temperature"
type="range"
min={0.1}
max={2}
step={0.1}
value={temperature}
onChange={(event) => setTemperature(Number(event.target.value))}
/>
</label>
) : null}
{kind === "top-k" ? (
<label className="inference-settings-viz__control">
<span>
Top K <strong>{topK}</strong>
</span>
<input
aria-label="Top K"
type="range"
min={1}
max={10}
step={1}
value={topK}
onChange={(event) => setTopK(Number(event.target.value))}
/>
</label>
) : null}
{kind === "top-p" ? (
<div className="inference-settings-viz__nucleus-controls">
<div
className="inference-settings-viz__segmented"
aria-label="Top P or Min P mode"
role="group"
>
<button
type="button"
aria-pressed={nucleusMode === "top-p"}
onClick={() => setNucleusMode("top-p")}
>
Top P
</button>
<button
type="button"
aria-pressed={nucleusMode === "min-p"}
onClick={() => setNucleusMode("min-p")}
>
Min P
</button>
</div>
{nucleusMode === "top-p" ? (
<label className="inference-settings-viz__control">
<span>
Top P <strong>{topP.toFixed(2)}</strong>
</span>
<input
aria-label="Top P"
type="range"
min={0.1}
max={1}
step={0.05}
value={topP}
onChange={(event) => setTopP(Number(event.target.value))}
/>
</label>
) : (
<label className="inference-settings-viz__control">
<span>
Min P <strong>{minP.toFixed(2)}</strong>
</span>
<input
aria-label="Min P"
type="range"
min={0}
max={0.2}
step={0.01}
value={minP}
onChange={(event) => setMinP(Number(event.target.value))}
/>
</label>
)}
</div>
) : null}
{kind === "top-p" ? (
<NucleusThresholdVisual
candidates={candidates}
minP={minP}
mode={nucleusMode}
processedCandidates={processedCandidates}
topP={topP}
/>
) : null}
<div
className="inference-settings-viz__bars"
aria-label={`${title} candidates`}
>
{processedCandidates.map((candidate, index) => (
<div
className="inference-settings-viz__row"
data-included={candidate.included ? "true" : "false"}
key={candidate.token}
>
<span className="inference-settings-viz__token">
{candidate.token.trim() || candidate.token}
</span>
<div className="inference-settings-viz__bar-track">
<div
className="inference-settings-viz__bar-fill"
style={{
backgroundColor: BAR_COLORS[index % BAR_COLORS.length],
width: `${Math.max(candidate.samplingProb * 100, candidate.included ? 4 : 0)}%`,
}}
>
<span>{formatPercent(candidate.samplingProb)}</span>
</div>
</div>
<span className="inference-settings-viz__row-state">
{candidate.included ? "Included" : "Excluded"}
</span>
</div>
))}
</div>
<div className="inference-settings-viz__actions">
<button type="button" onClick={sampleNextToken}>
Sample Next Token
</button>
<button type="button" onClick={resetSampler}>
Reset
</button>
<span aria-live="polite">
{sampledMessage || "No token sampled yet"}
</span>
</div>
</article>
);
}
export function InferenceSettingsVisualization() {
return (
<section className="inference-settings-viz" data-widget-enhanced="true">
<div className="inference-settings-viz__header">
<p className="inference-settings-viz__eyebrow">
Objective 3 Lab Widget
</p>
<h3>See inference filters reshape the next-token choice</h3>
<p>
Each card starts with <code>{INITIAL_PROMPT}</code>. Adjust one
setting, compare the candidate bars, then sample the next token.
</p>
</div>
<div className="inference-settings-viz__grid">
<SamplerCard
kind="temperature"
title="Temperature"
description="Temperature smooths or sharpens the whole probability distribution before sampling."
/>
<SamplerCard
kind="top-k"
title="Top K"
description="Top K keeps only the K most likely candidates and removes the rest from sampling."
/>
<SamplerCard
kind="top-p"
title="Top P / Min P"
description="Top P keeps a cumulative probability nucleus. Min P keeps tokens above a relative floor."
/>
</div>
</section>
);
}
@@ -15,6 +15,9 @@ describe("Lab1ConfidenceChat", () => {
return {
json: async () => ({
content: "often works",
finishReason: "stop",
isTruncated: false,
maxTokens: 512,
model: "batiai/gemma4-e2b:q4",
role: "assistant",
tokens: [
@@ -49,7 +52,12 @@ describe("Lab1ConfidenceChat", () => {
screen.getByRole("button", { name: "Generate Output" }).closest("form")!,
);
expect(await screen.findByLabelText("often 40.0%")).toBeInTheDocument();
const token = await screen.findByLabelText("often 40.0%");
expect(token).toBeInTheDocument();
fireEvent.mouseEnter(token);
expect(screen.getByRole("tooltip")).toBeInTheDocument();
expect(screen.getByText("14.0%:")).toBeInTheDocument();
expect(screen.getByText("commonly")).toBeInTheDocument();
expect(screen.getByText("batiai/gemma4-e2b:q4")).toBeInTheDocument();
@@ -81,4 +89,46 @@ describe("Lab1ConfidenceChat", () => {
await screen.findByText("The local Ollama request failed."),
).toBeInTheDocument();
});
it("explains when the response hit the configured token limit", async () => {
vi.stubGlobal(
"fetch",
vi.fn(async () => {
return {
json: async () => ({
content: "partial output",
finishReason: "length",
isTruncated: true,
maxTokens: 512,
model: "batiai/gemma4-e2b:q4",
role: "assistant",
tokens: [
{
logprob: Math.log(0.5),
probability: 50,
token: "partial",
topAlternatives: [],
},
],
}),
ok: true,
};
}),
);
render(<Lab1ConfidenceChat />);
fireEvent.change(screen.getByLabelText("Prompt"), {
target: { value: "Write a longer answer." },
});
fireEvent.submit(
screen.getByRole("button", { name: "Generate Output" }).closest("form")!,
);
expect(
await screen.findByText(
/Response reached the configured 512-token limit/,
),
).toBeInTheDocument();
});
});
+140 -20
View File
@@ -1,6 +1,12 @@
"use client";
import { FormEvent, useState } from "react";
import {
type CSSProperties,
type FocusEvent,
FormEvent,
type MouseEvent,
useState,
} from "react";
import {
formatProbabilityPercent,
@@ -23,12 +29,28 @@ type AssistantTurn = Lab1ConfidenceResponse & {
type ChatTurn = AssistantTurn | UserTurn;
type TooltipPlacement = "above" | "below";
type ActiveTooltip = {
left: number;
placement: TooltipPlacement;
token: Lab1ResponseToken;
tokenId: string;
top: number;
};
const starterPrompts = [
"The quick brown fox",
"Write one sentence explaining what a firewall does.",
"List three words that describe a phishing email.",
] as const;
const CONFIDENCE_TOOLTIP_ID = "lab1-confidence-tooltip";
const TOOLTIP_ESTIMATED_HEIGHT = 180;
const TOOLTIP_ESTIMATED_WIDTH = 260;
const TOOLTIP_VIEWPORT_PADDING = 16;
const TOOLTIP_OFFSET = 10;
function buildTurnId() {
return `lab1-turn-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
}
@@ -37,14 +59,56 @@ function toConversation(messages: ChatTurn[]) {
return messages.map(({ content, role }) => ({ content, role }));
}
function renderTooltip(token: Lab1ResponseToken) {
function getTooltipPosition(element: HTMLElement) {
const rect = element.getBoundingClientRect();
const viewportWidth =
window.innerWidth || document.documentElement.clientWidth;
const viewportHeight =
window.innerHeight || document.documentElement.clientHeight;
const halfTooltipWidth = TOOLTIP_ESTIMATED_WIDTH / 2;
const minLeft = TOOLTIP_VIEWPORT_PADDING + halfTooltipWidth;
const maxLeft = viewportWidth - TOOLTIP_VIEWPORT_PADDING - halfTooltipWidth;
const centeredLeft = rect.left + rect.width / 2;
const left =
maxLeft > minLeft
? Math.min(Math.max(centeredLeft, minLeft), maxLeft)
: viewportWidth / 2;
const belowTop = rect.bottom + TOOLTIP_OFFSET;
const hasRoomBelow = belowTop + TOOLTIP_ESTIMATED_HEIGHT <= viewportHeight;
const hasRoomAbove = rect.top - TOOLTIP_OFFSET - TOOLTIP_ESTIMATED_HEIGHT > 0;
if (!hasRoomBelow && hasRoomAbove) {
return {
left,
placement: "above" as const,
top: rect.top - TOOLTIP_OFFSET,
};
}
return {
left,
placement: "below" as const,
top: belowTop,
};
}
function renderTooltip(
token: Lab1ResponseToken,
placement: TooltipPlacement,
style?: CSSProperties,
) {
return (
<span className="lab1-confidence__tooltip">
<span
className={`lab1-confidence__tooltip lab1-confidence__tooltip--${placement}`}
id={CONFIDENCE_TOOLTIP_ID}
role="tooltip"
style={style}
>
<strong>{formatProbabilityPercent(token.probability)}</strong>
{token.topAlternatives.length > 0 ? (
<span className="lab1-confidence__tooltip-list">
{token.topAlternatives.map((candidate) => (
<span key={`${token.token}-${candidate.token}`}>
{token.topAlternatives.map((candidate, index) => (
<span key={`${token.token}-${candidate.token}-${index}`}>
{formatProbabilityPercent(candidate.probability)}:{" "}
<code>{candidate.token}</code>
</span>
@@ -64,6 +128,27 @@ export function Lab1ConfidenceChat() {
const [messages, setMessages] = useState<ChatTurn[]>([]);
const [error, setError] = useState<string | null>(null);
const [isSubmitting, setIsSubmitting] = useState(false);
const [activeTooltip, setActiveTooltip] = useState<ActiveTooltip | null>(
null,
);
function showTooltip(
tokenId: string,
token: Lab1ResponseToken,
element: HTMLElement,
) {
setActiveTooltip({
token,
tokenId,
...getTooltipPosition(element),
});
}
function hideTooltip(tokenId: string) {
setActiveTooltip((currentTooltip) =>
currentTooltip?.tokenId === tokenId ? null : currentTooltip,
);
}
async function handleSubmit(event: FormEvent<HTMLFormElement>) {
event.preventDefault();
@@ -183,23 +268,51 @@ export function Lab1ConfidenceChat() {
</div>
<div className="lab1-confidence__token-stream" role="list">
{message.tokens.map((token, index) => (
<span
aria-label={`${token.token} ${formatProbabilityPercent(
token.probability,
)}`}
className={`lab1-confidence__token lab1-confidence__token--${getConfidenceBand(
token.probability,
)}`}
key={`${message.id}-${index}-${token.token}`}
role="listitem"
>
{token.token}
{renderTooltip(token)}
</span>
))}
{message.tokens.map((token, index) => {
const tokenId = `${message.id}-${index}-${token.token}`;
const isTooltipActive = activeTooltip?.tokenId === tokenId;
return (
<span
aria-describedby={
isTooltipActive ? CONFIDENCE_TOOLTIP_ID : undefined
}
aria-label={`${token.token} ${formatProbabilityPercent(
token.probability,
)}`}
className={`lab1-confidence__token lab1-confidence__token--${getConfidenceBand(
token.probability,
)}`}
key={tokenId}
onBlur={() => hideTooltip(tokenId)}
onFocus={(
event: FocusEvent<HTMLSpanElement, Element>,
) => showTooltip(tokenId, token, event.currentTarget)}
onMouseEnter={(
event: MouseEvent<
HTMLSpanElement,
globalThis.MouseEvent
>,
) => showTooltip(tokenId, token, event.currentTarget)}
onMouseLeave={() => hideTooltip(tokenId)}
role="listitem"
tabIndex={0}
>
{token.token}
</span>
);
})}
</div>
{message.isTruncated ? (
<p className="lab1-confidence__message-warning">
Response reached the configured{" "}
{message.maxTokens ? `${message.maxTokens}-token` : "token"}{" "}
limit. Increase <code>COURSEWARE_LAB1_MAX_TOKENS</code> to
allow longer Lab 1 generations.
</p>
) : null}
{message.error ? (
<p className="lab1-confidence__message-warning">
{message.error}
@@ -239,6 +352,13 @@ export function Lab1ConfidenceChat() {
{error ? <p className="lab1-confidence__error">{error}</p> : null}
</form>
{activeTooltip
? renderTooltip(activeTooltip.token, activeTooltip.placement, {
left: activeTooltip.left,
top: activeTooltip.top,
})
: null}
</section>
);
}
+100 -10
View File
@@ -56,6 +56,30 @@ describe("LabContent", () => {
).toBeInTheDocument();
});
it("renders the Lab 4 inference visualization token into an interactive component", async () => {
mockRuntimeConfig();
render(
<LabContent
className="lab-content"
html="<div data-inference-settings-visualization></div>"
/>,
);
expect(
screen.getByRole("heading", {
name: "See inference filters reshape the next-token choice",
}),
).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Temperature" }),
).toBeInTheDocument();
expect(screen.getByRole("heading", { name: "Top K" })).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Top P / Min P" }),
).toBeInTheDocument();
});
it("filters harness branches from a single Objective 2 selector", async () => {
mockRuntimeConfig();
@@ -138,6 +162,73 @@ describe("LabContent", () => {
expect(link).toHaveClass("lab-service-pill");
});
it("renders Lab 3 browser targets as polished open buttons", async () => {
mockRuntimeConfig();
const lab = getLabDocument("lab-3-llama-cpp-and-ollama");
expect(lab).not.toBeNull();
render(
<LabContent
className="lab-content"
html={micromark(lab?.content ?? "", { allowDangerousHtml: true })}
/>,
);
const llamaLink = await screen.findByRole("link", {
name: "LLaMA-3.2-1B on Hugging Face",
});
expect(llamaLink).toHaveAttribute(
"href",
"https://huggingface.co/meta-llama/Llama-3.2-1B",
);
expect(llamaLink).toHaveClass("lab-open-pill");
expect(
screen.getByRole("link", {
name: "WhiteRabbitNeo-V3-7B on Hugging Face",
}),
).toHaveClass("lab-open-pill");
expect(screen.getByRole("link", { name: "Ollama registry" })).toHaveClass(
"lab-open-pill",
);
});
it("renders Lab 7 dataset and download targets as polished buttons", async () => {
mockRuntimeConfig();
const lab = getLabDocument("lab-7-dataset-generation-and-fine-tuning");
expect(lab).not.toBeNull();
render(
<LabContent
className="lab-content"
html={micromark(lab?.content ?? "", { allowDangerousHtml: true })}
/>,
);
const gsm8kLink = await screen.findByRole("link", {
name: "GSM8K dataset",
});
expect(gsm8kLink).toHaveAttribute(
"href",
"https://huggingface.co/datasets/openai/gsm8k",
);
expect(gsm8kLink).toHaveClass("lab-open-pill");
const kilnLinks = screen.getAllByRole("link", {
name: "Download Kiln AI",
});
expect(kilnLinks).toHaveLength(2);
for (const kilnLink of kilnLinks) {
expect(kilnLink).toHaveAttribute(
"href",
"https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1",
);
expect(kilnLink).toHaveClass("lab-download-pill");
}
});
it("keeps rendered service URL links after opening an image zoom modal", async () => {
mockRuntimeConfig();
@@ -238,16 +329,15 @@ describe("LabContent", () => {
/>,
);
expect(
await screen.findByRole("link", { name: "Open WebUI" }),
).toHaveAttribute("href", "https://lab.example/openwebui");
expect(screen.getByRole("link", { name: "Open WebUI" })).toHaveClass(
"lab-service-pill",
);
expect(screen.getByRole("link", { name: "Open WebUI" })).toHaveAttribute(
"title",
"https://lab.example/openwebui",
);
const openWebUiLinks = await screen.findAllByRole("link", {
name: "Open WebUI",
});
expect(openWebUiLinks).toHaveLength(2);
for (const link of openWebUiLinks) {
expect(link).toHaveAttribute("href", "https://lab.example/openwebui");
expect(link).toHaveClass("lab-service-pill");
expect(link).toHaveAttribute("title", "https://lab.example/openwebui");
}
const apiMatches = await screen.findAllByText(
"https://lab.example/openwebui/api",
);
+12 -1
View File
@@ -12,6 +12,7 @@ import { Lab1ConfidenceChat } from "~/components/labs/Lab1ConfidenceChat";
import { Lab1NetronPanel } from "~/components/labs/Lab1NetronPanel";
import { Lab3TerminalFrame } from "~/components/labs/Lab3TerminalFrame";
import { Lab8Chat } from "~/components/labs/Lab8Chat";
import { InferenceSettingsVisualization } from "~/components/labs/InferenceSettingsVisualization";
import { Objective5Chat } from "~/components/labs/Objective5Chat";
import { QuantizationGridExplorer } from "~/components/labs/QuantizationGridExplorer";
import { QuantizationExplorer } from "~/components/labs/QuantizationExplorer";
@@ -62,6 +63,8 @@ const lab3TerminalToken = "<div data-lab3-terminal></div>";
const lab1ConfidenceToken = "<div data-lab1-confidence></div>";
const lab1NetronToken = "<div data-lab1-netron-panel></div>";
const tokenizerPlaygroundToken = "<div data-tokenizer-playground></div>";
const inferenceSettingsVisualizationToken =
"<div data-inference-settings-visualization></div>";
const serviceTokenPattern =
/\{\{service-(url|address):([a-z0-9-]+)(?::([^}]+))?\}\}/g;
const serviceLabels: Record<string, string> = {
@@ -461,7 +464,7 @@ const LabContentArticle = memo(function LabContentArticle({
const renderedContent = html
.split(
new RegExp(
`(${escapeRegex(quantizationExplorerToken)}|${escapeRegex(quantizationGridExplorerToken)}|${escapeRegex(objective5ChatToken)}|${escapeRegex(lab8ChatToken)}|${escapeRegex(lab3TerminalToken)}|${escapeRegex(lab1ConfidenceToken)}|${escapeRegex(lab1NetronToken)}|${escapeRegex(tokenizerPlaygroundToken)})`,
`(${escapeRegex(quantizationExplorerToken)}|${escapeRegex(quantizationGridExplorerToken)}|${escapeRegex(objective5ChatToken)}|${escapeRegex(lab8ChatToken)}|${escapeRegex(lab3TerminalToken)}|${escapeRegex(lab1ConfidenceToken)}|${escapeRegex(lab1NetronToken)}|${escapeRegex(tokenizerPlaygroundToken)}|${escapeRegex(inferenceSettingsVisualizationToken)})`,
"g",
),
)
@@ -505,6 +508,14 @@ const LabContentArticle = memo(function LabContentArticle({
);
}
if (part === inferenceSettingsVisualizationToken) {
return (
<InferenceSettingsVisualization
key={`inference-settings-viz-${index}`}
/>
);
}
return (
<Fragment key={`html-segment-${index}`}>
<div dangerouslySetInnerHTML={{ __html: part }} />
+24
View File
@@ -2,10 +2,12 @@ import { describe, expect, it } from "vitest";
import {
extractLab1AssistantContent,
extractLab1FinishReason,
extractLab1ResponseTokens,
formatProbabilityPercent,
getConfidenceBand,
logprobToProbabilityPercent,
parseLab1MaxTokens,
} from "~/lib/lab1-confidence";
describe("logprobToProbabilityPercent", () => {
@@ -30,6 +32,28 @@ describe("extractLab1AssistantContent", () => {
});
});
describe("extractLab1FinishReason", () => {
it("reads the upstream finish reason when it is present", () => {
expect(
extractLab1FinishReason({
choices: [
{
finish_reason: "length",
},
],
}),
).toBe("length");
});
});
describe("parseLab1MaxTokens", () => {
it("uses a bounded positive environment override", () => {
expect(parseLab1MaxTokens("768")).toBe(768);
expect(parseLab1MaxTokens("999999")).toBe(2048);
expect(parseLab1MaxTokens("nope")).toBe(512);
});
});
describe("extractLab1ResponseTokens", () => {
it("maps token logprobs and alternate candidates into display data", () => {
expect(
+26 -1
View File
@@ -1,6 +1,7 @@
export const LAB1_CONFIDENCE_MODEL_ALIAS = "batiai/gemma4-e2b:q4";
export const LAB1_DEFAULT_MAX_TOKENS = 64;
export const LAB1_DEFAULT_MAX_TOKENS = 512;
export const LAB1_DEFAULT_TEMPERATURE = 0.7;
export const LAB1_MAX_COMPLETION_TOKENS = 2048;
export const LAB1_MAX_CONTEXT_MESSAGES = 10;
export const LAB1_MAX_MESSAGE_LENGTH = 4000;
@@ -25,6 +26,9 @@ export type Lab1ResponseToken = {
export type Lab1ConfidenceResponse = {
content: string;
finishReason: string | null;
isTruncated: boolean;
maxTokens: number;
model: string;
role: "assistant";
tokens: Lab1ResponseToken[];
@@ -43,6 +47,7 @@ type OpenAiLogprobToken = {
type OpenAiCompatibilityPayload = {
choices?: Array<{
finish_reason?: string;
logprobs?: {
content?: OpenAiLogprobToken[];
};
@@ -61,6 +66,19 @@ export function getLab1SystemPrompt() {
].join(" ");
}
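// Parses a max-token override (per the tests, an environment-supplied string).
// Missing, non-numeric, or non-positive values fall back to
// LAB1_DEFAULT_MAX_TOKENS; anything above LAB1_MAX_COMPLETION_TOKENS is
// clamped to that cap.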
export function parseLab1MaxTokens(value: string | undefined) {
if (!value) {
return LAB1_DEFAULT_MAX_TOKENS;
}
const parsedValue = Number.parseInt(value, 10);
if (!Number.isFinite(parsedValue) || parsedValue <= 0) {
return LAB1_DEFAULT_MAX_TOKENS;
}
return Math.min(parsedValue, LAB1_MAX_COMPLETION_TOKENS);
}
export function clampLab1Messages(messages: Lab1ConfidenceMessage[]) {
return messages
.filter((message) => {
@@ -117,6 +135,13 @@ export function extractLab1AssistantContent(
return content || null;
}
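// Returns the upstream finish_reason (e.g. "stop" or "length") from the first
// choice, or null when the field is absent or blank.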
export function extractLab1FinishReason(payload: OpenAiCompatibilityPayload) {
const finishReason = payload.choices?.[0]?.finish_reason;
return typeof finishReason === "string" && finishReason.trim()
? finishReason
: null;
}
export function extractLab1ResponseTokens(
payload: OpenAiCompatibilityPayload,
): Lab1ResponseToken[] {
+564 -18
View File
@@ -914,17 +914,22 @@ ol {
}
.lab-content ul.concept-pill-list > li {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 0.55rem;
display: grid;
grid-template-columns: max-content minmax(0, 1fr);
align-items: baseline;
column-gap: 0.7rem;
row-gap: 0.28rem;
margin: 0;
padding: 0.48rem 0.78rem;
padding: 0.72rem 1rem;
border: 1px solid #d5e2ee;
border-radius: 999px;
background: linear-gradient(180deg, #f9fcff, #f4f9fe);
}
.lab-content ul.concept-pill-list > li > span:not(.concept-pill-label) {
line-height: 1.45;
}
.lab-content .concept-pill-label {
display: inline;
color: #0f4f76;
@@ -951,6 +956,511 @@ ol {
margin: 1.25rem 0 1.5rem;
}
.lab-content [data-inference-settings-visualization] {
margin: 1.25rem 0 1.5rem;
}
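/* Lab 4 inference settings visualization: card shell, controls, and token bars. */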
.inference-settings-viz {
margin: 1.25rem 0 1.5rem;
border: 1px solid #d7e4ef;
border-radius: 16px;
background: linear-gradient(180deg, #fbfdff, #f4f9fd);
padding: 1rem;
}
.inference-settings-viz code {
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
}
.inference-settings-viz__header {
margin-bottom: 1rem;
}
.inference-settings-viz__eyebrow {
margin: 0;
color: #9a5f00;
font-size: 0.72rem;
font-weight: 800;
letter-spacing: 0.08em;
text-transform: uppercase;
}
.inference-settings-viz__header h3 {
margin: 0.1rem 0 0;
color: #0f3d58;
font-size: 1.2rem;
}
.inference-settings-viz__header p:not(.inference-settings-viz__eyebrow) {
margin: 0.55rem 0 0;
color: #334155;
}
.inference-settings-viz__grid {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 0.9rem;
align-items: start;
}
.inference-settings-viz__card {
display: flex;
flex-direction: column;
min-width: 0;
min-height: 100%;
border: 1px solid #dce6ee;
border-radius: 14px;
background: rgba(255, 255, 255, 0.92);
padding: 0.9rem;
}
.inference-settings-viz__card--wide {
grid-column: 1 / -1;
width: 100%;
}
.inference-settings-viz__card--wide > * {
width: 100%;
}
.inference-settings-viz__card-header h4 {
margin: 0;
color: #0f3d58;
font-size: 1.05rem;
line-height: 1.35;
}
.inference-settings-viz__card-header p {
margin: 0.35rem 0 0;
color: #475569;
font-size: 0.92rem;
line-height: 1.42;
}
.inference-settings-viz__sequence {
margin: 0.8rem 0;
padding: 0.7rem 0.75rem;
border: 1px solid #d6e2ed;
border-radius: 10px;
background: #f7fbff;
color: #12364e;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.92rem;
line-height: 1.35;
min-height: 2.85rem;
}
.inference-settings-viz__control {
--slider-thumb-size: 1rem;
--slider-thumb-offset: calc(var(--slider-thumb-size) / 2);
display: block;
margin-bottom: 0.85rem;
}
.inference-settings-viz__control > span {
display: flex;
justify-content: space-between;
gap: 0.75rem;
color: #334155;
font-size: 0.86rem;
font-weight: 700;
}
.inference-settings-viz__control strong {
color: #0b72ba;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
}
.inference-settings-viz__control input[type="range"] {
-webkit-appearance: none;
appearance: none;
display: block;
width: calc(100% - var(--slider-thumb-size));
margin-left: var(--slider-thumb-offset);
margin-right: var(--slider-thumb-offset);
margin-top: 0.55rem;
background: transparent;
}
.inference-settings-viz__control
input[type="range"]::-webkit-slider-runnable-track {
height: 0.68rem;
border-radius: 999px;
background: linear-gradient(180deg, #dbe7f2, #d4e1ec);
}
.inference-settings-viz__control input[type="range"]::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: var(--slider-thumb-size);
height: var(--slider-thumb-size);
margin-top: calc((0.68rem - var(--slider-thumb-size)) / 2);
border: 1px solid #c8d6e3;
border-radius: 999px;
background: linear-gradient(180deg, #ffffff, #eef3f8);
box-shadow: 0 1px 4px rgba(15, 23, 42, 0.18);
}
.inference-settings-viz__control input[type="range"]::-moz-range-track {
height: 0.68rem;
border: none;
border-radius: 999px;
background: linear-gradient(180deg, #dbe7f2, #d4e1ec);
}
.inference-settings-viz__control input[type="range"]::-moz-range-thumb {
width: var(--slider-thumb-size);
height: var(--slider-thumb-size);
border: 1px solid #c8d6e3;
border-radius: 999px;
background: linear-gradient(180deg, #ffffff, #eef3f8);
box-shadow: 0 1px 4px rgba(15, 23, 42, 0.18);
}
.inference-settings-viz__nucleus-controls {
margin-bottom: 0.85rem;
}
.inference-settings-viz__nucleus-controls .inference-settings-viz__control {
margin-bottom: 0;
}
.inference-settings-viz__segmented {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 0.25rem;
margin-bottom: 0.75rem;
padding: 0.2rem;
border: 1px solid #d6e2ed;
border-radius: 10px;
background: #f7fbff;
}
.inference-settings-viz__segmented button {
border: 1px solid transparent;
border-radius: 8px;
background: transparent;
color: #426075;
cursor: pointer;
font: inherit;
font-size: 0.84rem;
font-weight: 800;
line-height: 1;
padding: 0.5rem 0.55rem;
}
.inference-settings-viz__segmented button[aria-pressed="true"] {
border-color: #9cc5e5;
background: #ffffff;
color: #0f4f76;
box-shadow: 0 1px 2px rgba(15, 23, 42, 0.08);
}
.inference-settings-viz__threshold-panel {
margin-bottom: 0.95rem;
padding: 0.85rem;
border: 1px solid #d6e2ed;
border-radius: 12px;
background: #f7fbff;
}
.inference-settings-viz__threshold-header {
display: grid;
gap: 0.25rem;
margin-bottom: 0.75rem;
}
.inference-settings-viz__threshold-header strong {
color: #0f3d58;
font-size: 0.92rem;
}
.inference-settings-viz__threshold-header span {
color: #475569;
font-size: 0.86rem;
line-height: 1.4;
}
.inference-settings-viz__formula-row {
display: grid;
grid-template-columns: max-content max-content max-content minmax(0, 1fr);
align-items: center;
gap: 0.45rem 0.6rem;
margin-bottom: 0.75rem;
color: #64748b;
font-size: 0.8rem;
font-weight: 700;
}
.inference-settings-viz__formula-row code {
color: #0f4f76;
font-size: 0.8rem;
font-weight: 800;
}
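/* Cumulative top-p strip: segments beyond the threshold marker carry
   data-included="false" and are greyed out. */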
.inference-settings-viz__cumulative-strip {
position: relative;
display: flex;
height: 2.35rem;
overflow: visible;
border: 1px solid #cbdbe8;
border-radius: 10px;
background: #e8f1f8;
}
.inference-settings-viz__cumulative-segment {
display: flex;
align-items: center;
justify-content: center;
min-width: 0;
height: 100%;
overflow: hidden;
color: #ffffff;
font-size: 0.72rem;
font-weight: 800;
text-overflow: ellipsis;
white-space: nowrap;
}
.inference-settings-viz__cumulative-segment:first-child {
border-radius: 9px 0 0 9px;
}
.inference-settings-viz__cumulative-segment:nth-last-child(2) {
border-radius: 0 9px 9px 0;
}
.inference-settings-viz__cumulative-segment[data-included="false"] {
background: #cbd5e1 !important;
color: #475569;
}
.inference-settings-viz__threshold-marker {
position: absolute;
top: -0.42rem;
bottom: -0.42rem;
width: 2px;
transform: translateX(-1px);
background: #be123c;
color: #be123c;
font-size: 0;
}
.inference-settings-viz__threshold-marker::after {
content: "P threshold";
position: absolute;
left: 50%;
bottom: calc(100% + 0.18rem);
transform: translateX(-50%);
border: 1px solid #fecdd3;
border-radius: 999px;
background: #fff1f2;
color: #9f1239;
font-size: 0.66rem;
font-weight: 800;
line-height: 1;
padding: 0.2rem 0.34rem;
white-space: nowrap;
}
.inference-settings-viz__threshold-note {
margin: 0.7rem 0 0;
color: #475569;
font-size: 0.84rem;
line-height: 1.42;
}
.inference-settings-viz__threshold-note strong {
color: #0f3d58;
}
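/* Min-p view: per-candidate probability bars with a cutoff marker; rows below
   the cutoff are dimmed via data-included="false". */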
.inference-settings-viz__minp-bars {
display: grid;
gap: 0.45rem;
}
.inference-settings-viz__minp-row {
display: grid;
grid-template-columns: 4.35rem minmax(0, 1fr) 3.8rem;
align-items: center;
gap: 0.45rem;
}
.inference-settings-viz__minp-row > span {
overflow: hidden;
color: #334155;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.78rem;
font-weight: 700;
text-align: right;
text-overflow: ellipsis;
white-space: nowrap;
}
.inference-settings-viz__minp-row > code {
color: #334155;
font-size: 0.72rem;
font-weight: 800;
}
.inference-settings-viz__minp-track {
position: relative;
height: 1.1rem;
border: 1px solid #d8e3ed;
border-radius: 999px;
background: #edf4fa;
}
.inference-settings-viz__minp-fill {
height: 100%;
border-radius: 999px;
}
.inference-settings-viz__minp-marker {
position: absolute;
top: -0.28rem;
bottom: -0.28rem;
width: 2px;
transform: translateX(-1px);
background: #be123c;
}
.inference-settings-viz__minp-row[data-included="false"]
.inference-settings-viz__minp-fill {
opacity: 0.24;
}
.inference-settings-viz__minp-row[data-included="false"] > span,
.inference-settings-viz__minp-row[data-included="false"] > code {
color: #94a3b8;
}
.inference-settings-viz__bars {
display: grid;
gap: 0.42rem;
}
.inference-settings-viz__row {
display: grid;
grid-template-columns: 4.35rem minmax(0, 1fr) 4.45rem;
align-items: center;
gap: 0.45rem;
}
.inference-settings-viz__token {
overflow: hidden;
color: #334155;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.78rem;
font-weight: 700;
text-align: right;
text-overflow: ellipsis;
white-space: nowrap;
}
.inference-settings-viz__bar-track {
height: 1.4rem;
overflow: hidden;
border: 1px solid #d8e3ed;
border-radius: 6px;
background: #edf4fa;
}
.inference-settings-viz__bar-fill {
display: flex;
align-items: center;
justify-content: flex-end;
min-width: 0;
height: 100%;
border-radius: 5px;
color: #ffffff;
transition:
opacity 0.18s ease,
width 0.24s ease;
}
.inference-settings-viz__bar-fill span {
padding: 0 0.36rem;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.68rem;
font-weight: 800;
}
.inference-settings-viz__row[data-included="false"]
.inference-settings-viz__bar-fill {
opacity: 0.22;
}
.inference-settings-viz__row[data-included="false"]
.inference-settings-viz__token {
color: #94a3b8;
}
.inference-settings-viz__row-state {
color: #64748b;
font-size: 0.68rem;
font-weight: 800;
letter-spacing: 0.04em;
text-transform: uppercase;
}
.inference-settings-viz__row[data-included="true"]
.inference-settings-viz__row-state {
color: #0f766e;
}
.inference-settings-viz__actions {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 0.45rem;
margin-top: 0.9rem;
}
.inference-settings-viz__actions button {
border: 1px solid #bad5e8;
border-radius: 8px;
background: #ffffff;
color: #0f4f76;
cursor: pointer;
font: inherit;
font-size: 0.82rem;
font-weight: 800;
line-height: 1;
padding: 0.55rem 0.7rem;
}
.inference-settings-viz__actions button:first-child {
border-color: #0b72ba;
background: #0b72ba;
color: #ffffff;
}
.inference-settings-viz__actions button:hover {
border-color: #0f4f76;
}
.inference-settings-viz__actions span {
color: #64748b;
font-size: 0.78rem;
font-weight: 700;
}
.quantization-explorer {
border: 1px solid #d7e4ef;
border-radius: 16px;
@@ -1899,7 +2409,10 @@ ol {
}
.lab-content ul.concept-pill-list > li {
grid-template-columns: 1fr;
row-gap: 0.3rem;
border-radius: 16px;
padding: 0.72rem 0.9rem;
}
.quantization-explorer__controls,
@@ -1912,6 +2425,23 @@ ol {
grid-template-columns: repeat(2, minmax(0, 1fr));
}
.inference-settings-viz {
padding: 0.9rem;
}
.inference-settings-viz__grid {
grid-template-columns: 1fr;
}
.inference-settings-viz__row {
grid-template-columns: 3.75rem minmax(0, 1fr);
}
.inference-settings-viz__row-state {
grid-column: 2;
margin-top: -0.22rem;
}
.objective5-chat__settings {
grid-template-columns: 1fr;
}
@@ -2051,7 +2581,9 @@ ol {
box-shadow: 0 12px 28px -22px rgba(15, 92, 139, 0.85);
}
.lab-content a.lab-service-pill {
.lab-content a.lab-service-pill,
.lab-content a.lab-open-pill,
.lab-content a.lab-download-pill {
display: inline-flex;
align-items: center;
gap: 0.45rem;
@@ -2072,7 +2604,9 @@ ol {
background-color 120ms ease;
}
.lab-content a.lab-service-pill::before {
.lab-content a.lab-service-pill::before,
.lab-content a.lab-open-pill::before,
.lab-content a.lab-download-pill::before {
content: "Open";
display: inline-flex;
align-items: center;
@@ -2086,7 +2620,13 @@ ol {
text-transform: uppercase;
}
.lab-content a.lab-service-pill:hover {
.lab-content a.lab-download-pill::before {
content: "Download";
}
.lab-content a.lab-service-pill:hover,
.lab-content a.lab-open-pill:hover,
.lab-content a.lab-download-pill:hover {
transform: translateY(-1px);
box-shadow: 0 12px 28px -22px rgba(15, 92, 139, 0.85);
}
@@ -2182,14 +2722,21 @@ ol {
.lab1-confidence__token {
position: relative;
border-radius: 0.42rem;
cursor: help;
padding: 0.12rem 0.08rem;
transition: filter 120ms ease;
}
.lab1-confidence__token:hover {
.lab1-confidence__token:hover,
.lab1-confidence__token[aria-describedby="lab1-confidence-tooltip"] {
filter: saturate(1.05);
}
.lab1-confidence__token:focus-visible {
outline: 2px solid rgba(15, 92, 139, 0.35);
outline-offset: 2px;
}
.lab1-confidence__token--very-high {
background: rgba(88, 185, 102, 0.3);
}
@@ -2211,25 +2758,24 @@ ol {
}
.lab1-confidence__tooltip {
position: absolute;
left: 0;
top: calc(100% + 0.45rem);
z-index: 5;
display: none;
position: fixed;
z-index: 50;
display: block;
min-width: 180px;
max-width: 260px;
max-width: min(260px, calc(100vw - 2rem));
border: 1px solid #d7e2ee;
border-radius: 0.85rem;
background: rgba(255, 255, 255, 0.98);
box-shadow: 0 18px 38px -26px rgba(17, 44, 73, 0.7);
color: #24384c;
padding: 0.7rem 0.8rem;
pointer-events: none;
transform: translateX(-50%);
white-space: normal;
}
.lab1-confidence__token:hover .lab1-confidence__tooltip,
.lab1-confidence__token:focus-visible .lab1-confidence__tooltip {
display: block;
.lab1-confidence__tooltip--above {
transform: translate(-50%, -100%);
}
.lab1-confidence__tooltip strong {