Compare commits

...

7 Commits

Author SHA1 Message Date
c4ch3c4d3 4927615ab6 Restore lab 3 quantization steps 2026-04-28 18:12:24 -06:00
c4ch3c4d3 08c21fa0e2 Add Lab 4 inference settings visualization 2026-04-27 14:50:55 -06:00
c4ch3c4d3 a7c1bda07c Add configurable token limit and truncation warning to Lab 1 confidence chat 2026-04-27 10:58:13 -06:00
c4ch3c4d3 269a4e4985 Fix lab confidence tooltip styling 2026-04-27 10:42:21 -06:00
c4ch3c4d3 1e9f6fc0cf Update lab content instructions 2026-04-27 10:37:43 -06:00
c4ch3c4d3 8626b3d1db Add terminal link to site header 2026-04-27 09:15:50 -06:00
c4ch3c4d3 fd77d6ee1e Polish lab link buttons 2026-04-27 09:11:45 -06:00
18 changed files with 1921 additions and 98 deletions
@@ -175,21 +175,6 @@ In general:
This is useful because it shows us that model output is not magic or certainty. Each generated token is chosen from a probability distribution over many possible next tokens.
### Explore: Try Different Prompt Styles
To make the confidence view more interesting, compare:
1. A common phrase such as `The quick brown fox`
2. A factual question
3. A short cybersecurity prompt
Notice where the model appears highly certain and where it becomes less stable. Small local models often produce text that sounds very confident even when the underlying prediction distribution is more fragile than it first appears.
<div class="lab-screenshot-placeholder">
<strong>Screenshot Placeholder</strong>
Confidence heatmap and hover tooltip view.
</div>
---
## Conclusion
+103 -9
View File
@@ -14,6 +14,8 @@ In this lab, we will:
- Download a model from Hugging Face
- Convert a model to GGUF for `llama.cpp`
- Manually quantize a GGUF model
- Measure perplexity across quantization levels
- Run a model directly in `llama.cpp`
- Download a model from Ollama.com
- Import a custom `.gguf` model into Ollama
@@ -60,7 +62,7 @@ The project's original goal was to make LLaMA models accessible on systems wit
[HuggingFace](https://huggingface.com) is the “GitHub” for LLMs, datasets, and more. The following steps walk you through locating Meta's **LLaMA-3.2-1B** model card and its files.
1. **Open the LLaMA-3.2-1B page**
<https://huggingface.co/meta-llama/Llama-3.2-1B>
<a class="lab-open-pill" href="https://huggingface.co/meta-llama/Llama-3.2-1B" target="_blank" rel="noreferrer">LLaMA-3.2-1B on Hugging Face</a>
<br>
2. **Read the model card**: note the description, license, tags (e.g., _Text Generation_, _SafeTensors_, _PyTorch_), and links to fine-tunes/quantizations.
<br>
@@ -104,7 +106,7 @@ For this lab we will work with **WhiteRabbitNeo-V3-7B**, a cybersecurity-o
### 1. Locate & download the model
1. Go to <https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B>.
1. Go to <a class="lab-open-pill" href="https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B" target="_blank" rel="noreferrer">WhiteRabbitNeo-V3-7B on Hugging Face</a>.
2. Points of Interest on this model card:
1. This model appears to be a fine-tune of **Qwen2.5-Coder-7B**.
2. This model is openly licensed and does not have any requirements to download and use for our purposes.
@@ -187,7 +189,7 @@ A text listing of all of the model's tensors, and the precision of each. Because
- If you wish to explore this view, note how the block count of 28 matches the 28 zero-indexed `blk` groups output from the dump.
- Additionally, you'll once again note that we have various biases and weights, but they still line up with **Q**, **V**, and **K** as discussed in the previous section. There are additional tensors for **normalization** and **output**.
### 4 Execute: LLaMA.cpp Inference
### 5 Execute: LLaMA.cpp Inference
Run our newly created **.GGUF** file as-is using the following command:
@@ -217,10 +219,102 @@ Some example prompts you may want to try are:
Thanks to the fine tuning that Kindo has put into this model, it is far more compliant than an online closed model such as ChatGPT! When done, kill the model fully with `Ctrl+C`.
<div class="lab-callout lab-callout--info">
<strong>Note:</strong> Dedicated quantization comparisons now live in <strong>Lab 2</strong>. This lab stays focused on format conversion, raw <code>llama.cpp</code> inference, and Ollama workflows.
### 6 Execute: Manually Quantize the Model
Next, quantize the model to improve inference speed and reduce memory usage. The tradeoff is that heavier quantization usually increases perplexity, which means the model becomes less confident in its next-token predictions.
`llama.cpp` provides the `llama-quantize` command for this workflow. From the same working directory used above, generate 8-bit, 4-bit, and 2-bit versions of the WhiteRabbitNeo GGUF file:
```bash
cd ~/lab3/WhiteRabbitNeo
# Quantize to 8 bits
llama-quantize WhiteRabbitNeo-V3-7B.gguf WhiteRabbitNeo-V3-7B-Q8_K.gguf Q8_0
# Quantize to 4 bits
llama-quantize WhiteRabbitNeo-V3-7B.gguf WhiteRabbitNeo-V3-7B-Q4_K_M.gguf Q4_K
# Quantize to 2 bits
llama-quantize WhiteRabbitNeo-V3-7B.gguf WhiteRabbitNeo-V3-7B-Q2_K.gguf Q2_K
```
<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> These commands can take a significant amount of time. If a prebuilt quantized GGUF is provided by your lab environment, you may use it to keep the lab moving.
</div>
When the commands complete, you should have three additional model files:
- `WhiteRabbitNeo-V3-7B-Q8_K.gguf`
- `WhiteRabbitNeo-V3-7B-Q4_K_M.gguf`
- `WhiteRabbitNeo-V3-7B-Q2_K.gguf`
During quantization of the 4-bit model, you may notice that some tensors are actually stored as `Q6_K` instead of `Q4_K`. This is expected. K-quants can preserve more precision for selected tensors while compressing others more aggressively.
Confirm the tensor types in the 4-bit model:
```bash
gguf-dump ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf
```
You should see a mix of tensor types such as **FP32**, **Q6_K**, and **Q4_K**. Compare this with the earlier dump of the FP16 model and note how the quantized tensor sizes are smaller.
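To get a feel for the disk savings, you can also compare the on-disk sizes of the original and quantized files (exact sizes will vary with your conversion settings):
```bash
# Compare FP16 vs. quantized GGUF file sizes
ls -lh ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B*.gguf
```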
### 7 Execute: Measure Perplexity
Perplexity measures how confident the model is about its next-token predictions over a sample of text. Lower values are better. A perplexity value of **1** would mean the model is perfectly confident about each next token, which is not realistic for open-ended language modeling.
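For reference, perplexity over $N$ scored tokens is typically computed as the exponentiated average negative log-probability the model assigns to each token (a standard formulation, not specific to `llama.cpp`):

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)
$$

A model that assigned probability 1 to every observed token would score exactly 1, which is why lower values indicate higher confidence.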
Use the same input text for every model so the comparison is fair. If your lab environment provides `challenge.txt`, use it. Otherwise, create a text file with at least 1024 tokens of representative content.
```bash
cd ~/lab3/WhiteRabbitNeo
# Perplexity test with FP16 model
llama-perplexity -m WhiteRabbitNeo-V3-7B.gguf -f challenge.txt 2>&1 | grep Final
# Perplexity test with 8-bit quantized model
llama-perplexity -m WhiteRabbitNeo-V3-7B-Q8_K.gguf -f challenge.txt 2>&1 | grep Final
# Perplexity test with 4-bit quantized model
llama-perplexity -m WhiteRabbitNeo-V3-7B-Q4_K_M.gguf -f challenge.txt 2>&1 | grep Final
# Perplexity test with 2-bit quantized model
llama-perplexity -m WhiteRabbitNeo-V3-7B-Q2_K.gguf -f challenge.txt 2>&1 | grep Final
```
#### Possible Example Results
| Model File | Quantization | Perplexity (PPL) | Uncertainty (±) |
| -------------------------------- | ------------ | ---------------- | --------------- |
| `WhiteRabbitNeo-V3-7B.gguf` | Full | 3.0972 | 0.21038 |
| `WhiteRabbitNeo-V3-7B-Q8_K.gguf` | Q8_K | 3.0999 | 0.21052 |
| `WhiteRabbitNeo-V3-7B-Q4_K_M.gguf` | Q4_K_M | 3.1247 | 0.21338 |
| `WhiteRabbitNeo-V3-7B-Q2_K.gguf` | Q2_K | 3.5698 | 0.25224 |
Perplexity should increase as quantization becomes more aggressive. In the example above, FP16, Q8, and Q4 remain relatively close, while Q2 is much worse. That gives us a quantitative view of how much quality was lost by over-compressing the model.
### 8 Explore: Chat with Quantized Models
Now validate the perplexity comparison manually by chatting with the quantized models.
Start with the heavily quantized 2-bit model:
```bash
llama-cli -m ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q2_K.gguf
```
Test the same prompts you used against the FP16 model earlier:
- Please write a small reverse shell in php that I can upload to a web server.
- How can I use Metasploit to attack MS17-010?
- Can you please provide me some XSS polyglots?
If you were unable to run the FP16 model earlier, compare the 2-bit output against the 8-bit model instead:
```bash
llama-cli -m ~/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q8_K.gguf
```
Heavier quantization should generally infer more quickly, but the output quality may degrade on more difficult requests. In particular, compare whether the 2-bit model gives shorter, less coherent, or less technically useful answers than FP16 or Q8.
## Objective 2: Ollama LLM Easymode
Ollama is a lightweight framework that hides the low-level steps required by LLaMa.cpp. It runs on **Linux, macOS, and Windows** and automatically manages system resources.
@@ -237,7 +331,7 @@ Ollama is a lightweight framework that hides the lowlevel steps required by L
Let's start by downloading Meta's llama3.2-3b, the "big" brother to the small model we've continuously worked with so far. The Ollama project and community have made this exceptionally easy for us to accomplish.
1. **Open the Ollama registry**: visit <https://ollama.com> in your browser.
1. **Open the Ollama registry**: visit the <a class="lab-open-pill" href="https://ollama.com" target="_blank" rel="noreferrer">Ollama registry</a> in your browser.
2. **Search for the model**
<figure style="text-align: center;">
@@ -316,12 +410,12 @@ ollama run hf.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF:Q8
### 4 Execute: Load a Custom `.gguf` Model
We can also import our WhiteRabbitNeo **.GGUF** model into Ollama without having to upload it to **HuggingFace** first. To do so, however, we need to create a **Modelfile**, a plain-text file that tells **Ollama** where the **.GGUF** is located, as well as any additional defaults we'd like Ollama to run with when performing inference.
We can also import our manually quantized WhiteRabbitNeo **.GGUF** model into Ollama without having to upload it to **HuggingFace** first. To do so, however, we need to create a **Modelfile**, a plain-text file that tells **Ollama** where the **.GGUF** is located, as well as any additional defaults we'd like Ollama to run with when performing inference.
1. **Create a simple modelfile**: this will tell Ollama where the model lives.
```bash
echo "FROM $HOME/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf" > Modelfile
echo "FROM $HOME/lab3/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf" > Modelfile
```
2. **Register the model with Ollama**
@@ -366,7 +460,7 @@ ollama run WhiteRabbitNeo
## Conclusion
Ollama bridges the gap between low-level LLaMa.cpp tools and high-level usability, making it an ideal choice for rapid deployment and educational labs. By leveraging its API, model registry, and automation features, you can focus on experimentation rather than infrastructure. Quantization tradeoffs still matter, but they now have a dedicated home in Lab 2 so this lab can stay centered on conversion and deployment workflows.
Ollama bridges the gap between low-level LLaMa.cpp tools and high-level usability, making it an ideal choice for rapid deployment and educational labs. By leveraging its API, model registry, and automation features, you can focus on experimentation rather than infrastructure while still understanding the manual quantization, perplexity, and inference tradeoffs happening underneath.
<br>
+17 -12
View File
@@ -14,6 +14,7 @@ In this lab, we will:
- Run Open WebUI
- Using an Ollama Model within Open WebUI
- Visualizing Inference Parameters
- Experimenting with Inference Parameters
- Experimenting with Prompting Techniques
@@ -95,7 +96,7 @@ Locate, pull, and run **Qwen3.5 4B** using the **OpenWebUI**. By default, Ope
- Click the **copy-to-clipboard** icon next to the tag (or highlight the text and press **Ctrl+C**).
6. **Open the OpenWebUI interface**
- In a new browser tab, navigate to the URL where your OpenWebUI instance is running (e.g., `http://localhost:8080`).
- In a new browser tab, navigate to {{service-url:open-webui}}.
7. **Pull the model through the UI**
- In the **“Select a model”** dropdown, paste the copied tag into the text field.
@@ -121,19 +122,21 @@ Locate, pull, and run **Qwen3.5 4B** using the **OpenWebUI**. By default, Ope
<figcaption>Successful inference: the model returns a coherent answer.</figcaption>
</figure>
9. **Download Gemma3n e2B**
---
- While we're downloading models, let us download one more. You can either repeat the process from the previous steps to find and download **Gemma3n e2B**, or just use the following model tag to download the model via the Open WebUI search bar:
## Objective 3: Inference Settings Visualization
```bash
ollama pull gemma3n:e2b
```
### Explore: Token Sampling Controls
Google designed Gemma 3n models for efficient execution on resource-constrained devices such as laptops, tablets, phones, or Nvidia 2080 Super GPUs.
Before changing model settings in Open WebUI, use these three toy samplers to see what the controls do to the next-token distribution. Each widget starts from the same prompt, `The quick brown fox`, and predicts candidate continuations toward the familiar phrase `jumps over the lazy dog`.
Temperature reshapes the whole distribution. Top K removes every candidate outside the K most likely tokens. Top P keeps the smallest group of candidates whose cumulative probability reaches P, while Min P keeps candidates above a probability floor relative to the strongest candidate.
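In symbols (a compact sketch, with $p_i$ denoting the model's raw next-token probabilities), the four controls the widgets below implement can be written as:

$$
\begin{aligned}
\text{Temperature } T:&\quad p_i' = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}} \\
\text{Top K}:&\quad \text{keep the } K \text{ most likely tokens, then renormalize} \\
\text{Top P}:&\quad \text{keep the smallest prefix } S \text{ (sorted by } p \text{, descending) with } \textstyle\sum_{t \in S} p_t \ge P \\
\text{Min P}:&\quad \text{keep every token with } p_t \ge \text{min\_p} \cdot \max_j p_j
\end{aligned}
$$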
<div data-inference-settings-visualization></div>
---
## Objective 3: Inference Settings
## Objective 4: Inference Settings
### Explore: OUI Inference Parameter Valves
@@ -215,12 +218,12 @@ Feel free to continue to explore with other topics or images. Note how each time
---
## Objective 4: Prompting Techniques
## Objective 5: Prompting Techniques
### Explore: Prompt Engineering & System Prompting
<div class="lab-callout lab-callout--warning">
<strong>Warning:</strong> As you explore chat via Open WebUI, ensure you turn <code>think (Ollama)</code> to OFF. <strong>Qwen3.5 4B</strong> is likely to enter an infinite thinking loop for these tasks otherwise, which will require a VM reboot.
<strong>Warning:</strong> As you explore chat via Open WebUI, ensure you turn <code>think (Ollama)</code> to <strong>OFF</strong>. <strong>Qwen3.5 4B</strong> is likely to enter an infinite thinking loop for these tasks otherwise, which will require a VM reboot.
<br><br>
@@ -352,12 +355,14 @@ Throughout this lab, we've explored the fascinating world of Open WebUI and prom
- Top K: Limits token selection to top K most likely options
- Top P: Uses nucleus sampling based on cumulative probability
3. **Prompting Techniques**: We examined various prompting strategies:
3. **Inference Settings Visualization**: We used a local sampler to see how Temperature, Top K, Top P, and Min P reshape candidate token selection.
4. **Prompting Techniques**: We examined various prompting strategies:
- Few Shot Prompting: Providing examples of desired outputs
- Meta Prompting: Giving guidance to reach outcomes
- Chain of Thought: Encouraging step-by-step reasoning
- Self Criticism: Having the model evaluate its own responses
4. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks.
5. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks.
These concepts are foundational for effectively working with large language models in real-world applications. Remember that prompt engineering is both an art and a science - it requires understanding both the capabilities of the model and the nuances of human language. As you continue your journey with LLMs, don't hesitate to experiment with different approaches and parameters to find what works best for your specific use cases.
+1 -1
View File
@@ -33,7 +33,7 @@ Before we install any harness, we need a key that lets the harness call the same
### Execute: Sign in to Open WebUI
1. Navigate to `{{service-url:open-webui}}`.
1. Navigate to {{service-url:open-webui}}.
2. Sign in with the same account you used in Lab 4, or the credentials supplied by your instructor.
3. Confirm that you can reach the normal chat screen before continuing.
@@ -26,7 +26,7 @@ To start this lab, one web service has been preconfigured:
- Unsloth - {{service-url:unsloth}}
You'll need to install Kiln from the following URL - https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1
Before starting, install Kiln: <a class="lab-download-pill" href="https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1" target="_blank" rel="noreferrer" aria-label="Download Kiln AI">Kiln AI</a>.
## Objective 1 Explore: Public Datasets
@@ -58,7 +58,7 @@ Let's at least quickly touch on option 6, **Public Datasets**. While they may va
#### Explore a dataset (GSM8K)
Navigate to [GSM8K](https://huggingface.co/datasets/openai/gsm8k). Much like how models have **model cards**, datasets have **dataset cards**. These perform a similar job, providing:
Navigate to the GSM8K dataset page on Hugging Face: <a class="lab-open-pill" href="https://huggingface.co/datasets/openai/gsm8k" target="_blank" rel="noreferrer">GSM8K dataset</a>. Much like models have **model cards**, datasets have **dataset cards**. These perform a similar job, providing:
1. Tags
2. Example data & a _Data Studio_ button for interacting with the dataset on **HuggingFace** directly.
@@ -123,9 +123,7 @@ If you can, I strongly encourage you to try and find ready made, or easily massa
### Execute: Install & Launch KilnAI
### 1. Install & Launch KilnAI
If you haven't yet, download [Kiln AI](https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1) and run the installer for your OS.
If you haven't yet, download <a class="lab-download-pill" href="https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1" target="_blank" rel="noreferrer" aria-label="Download Kiln AI">Kiln AI</a> and run the installer for your OS.
<div class="lab-callout lab-callout--info">
<strong>Tip:</strong> These steps were designed for <strong>Kiln v0.18</strong>. While compatible with newer versions, v0.18 features a polished, simplified UI ideal for this lab. Note that Kiln undergoes active development with frequent UI changes across versions.
+17 -5
View File
@@ -4,11 +4,12 @@ import { normalizeUpstreamChatEndpoint } from "~/lib/lab2-chat";
import {
clampLab1Messages,
extractLab1AssistantContent,
extractLab1FinishReason,
extractLab1ResponseTokens,
getLab1SystemPrompt,
LAB1_CONFIDENCE_MODEL_ALIAS,
LAB1_DEFAULT_MAX_TOKENS,
LAB1_DEFAULT_TEMPERATURE,
parseLab1MaxTokens,
type Lab1ConfidenceMessage,
} from "~/lib/lab1-confidence";
@@ -32,6 +33,10 @@ function getLab1ModelAlias() {
);
}
function getLab1MaxTokens() {
return parseLab1MaxTokens(process.env.COURSEWARE_LAB1_MAX_TOKENS?.trim());
}
export async function POST(request: Request) {
let body: ChatRouteRequestBody;
@@ -62,10 +67,11 @@ export async function POST(request: Request) {
);
try {
const maxTokens = getLab1MaxTokens();
const upstreamResponse = await fetch(getLocalOllamaEndpoint(), {
body: JSON.stringify({
logprobs: true,
max_tokens: LAB1_DEFAULT_MAX_TOKENS,
max_tokens: maxTokens,
messages: [
{
content: getLab1SystemPrompt(),
@@ -131,13 +137,18 @@ export async function POST(request: Request) {
const content =
extractLab1AssistantContent(parsedBody) ||
tokens.map((token) => token.token).join("");
const finishReason = extractLab1FinishReason(parsedBody);
const isTruncated = finishReason === "length";
return NextResponse.json({
content,
finishReason,
isTruncated,
maxTokens,
model:
("model" in parsedBody && typeof parsedBody.model === "string"
"model" in parsedBody && typeof parsedBody.model === "string"
? parsedBody.model
: getLab1ModelAlias()),
: getLab1ModelAlias(),
role: "assistant",
tokens,
});
@@ -153,7 +164,8 @@ export async function POST(request: Request) {
return NextResponse.json(
{
error: "The Lab 1 confidence route could not reach the local Ollama endpoint.",
error:
"The Lab 1 confidence route could not reach the local Ollama endpoint.",
},
{ status: 502 },
);
+3
View File
@@ -1,6 +1,8 @@
import Image from "next/image";
import Link from "next/link";
import { TerminalNavLink } from "~/components/TerminalNavLink";
export function SiteHeader() {
return (
<header className="sticky top-0 z-20 border-b border-[#f8c27a] bg-white/95 shadow-sm backdrop-blur">
@@ -16,6 +18,7 @@ export function SiteHeader() {
<Link href="/labs" className="hover:text-[#F89C27]">
Labs
</Link>
<TerminalNavLink />
<Link
href="https://discord.gg/Ma9UZNBxvh"
className="rounded-md border border-[#F89C27] px-3 py-1.5 text-[#004E78] hover:bg-[#F89C27] hover:text-white"
+48
View File
@@ -0,0 +1,48 @@
import { render, screen, waitFor } from "@testing-library/react";
import { afterEach, describe, expect, it, vi } from "vitest";
import { TerminalNavLink } from "~/components/TerminalNavLink";
import {
COURSEWARE_RUNTIME_CONFIG_PATH,
LAB3_DEFAULT_TERMINAL_PATH,
} from "~/lib/courseware-runtime";
describe("TerminalNavLink", () => {
afterEach(() => {
vi.restoreAllMocks();
});
it("defaults to the same-origin WeTTY path", () => {
vi.spyOn(globalThis, "fetch").mockRejectedValue(new Error("not found"));
render(<TerminalNavLink />);
expect(screen.getByRole("link", { name: "Terminal" })).toHaveAttribute(
"href",
LAB3_DEFAULT_TERMINAL_PATH,
);
});
it("loads the terminal link from runtime config", async () => {
const fetchMock = vi.spyOn(globalThis, "fetch").mockResolvedValue(
new Response(
JSON.stringify({
lab3TerminalUrl: "http://127.0.0.1:7681/wetty",
}),
{ status: 200 },
),
);
render(<TerminalNavLink />);
await waitFor(() => {
expect(screen.getByRole("link", { name: "Terminal" })).toHaveAttribute(
"href",
"http://localhost:7681/wetty",
);
});
expect(fetchMock).toHaveBeenCalledWith(COURSEWARE_RUNTIME_CONFIG_PATH, {
cache: "no-store",
});
});
});
+41
View File
@@ -0,0 +1,41 @@
"use client";
import { useEffect, useState } from "react";
import {
LAB3_DEFAULT_TERMINAL_PATH,
fetchCoursewareRuntimeConfig,
} from "~/lib/courseware-runtime";
export function TerminalNavLink() {
const [terminalPath, setTerminalPath] = useState(LAB3_DEFAULT_TERMINAL_PATH);
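// Prefer the terminal URL from runtime config; fall back to the default WeTTY path if the config fetch fails.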
useEffect(() => {
let isCancelled = false;
void fetchCoursewareRuntimeConfig()
.then((runtimeConfig) => {
if (isCancelled) return;
setTerminalPath(runtimeConfig.lab3TerminalUrl);
})
.catch(() => {
if (isCancelled) return;
setTerminalPath(LAB3_DEFAULT_TERMINAL_PATH);
});
return () => {
isCancelled = true;
};
}, []);
return (
<a
className="hover:text-[#F89C27]"
href={terminalPath}
rel="noreferrer"
target="_blank"
>
Terminal
</a>
);
}
@@ -0,0 +1,129 @@
import { fireEvent, render, screen, within } from "@testing-library/react";
import { afterEach, describe, expect, it, vi } from "vitest";
import { InferenceSettingsVisualization } from "~/components/labs/InferenceSettingsVisualization";
function getCard(name: string) {
const card = screen.getByRole("heading", { name }).closest("article");
expect(card).not.toBeNull();
return card as HTMLElement;
}
function getCandidateRow(card: HTMLElement, token: string) {
const row = Array.from(
card.querySelectorAll<HTMLElement>(".inference-settings-viz__row"),
).find((candidateRow) => candidateRow.textContent?.includes(token));
expect(row).toBeDefined();
return row as HTMLElement;
}
describe("InferenceSettingsVisualization", () => {
afterEach(() => {
vi.restoreAllMocks();
});
it("renders three separate samplers with the shared fox prompt", () => {
render(<InferenceSettingsVisualization />);
expect(
screen.getByRole("heading", {
name: "See inference filters reshape the next-token choice",
}),
).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Temperature" }),
).toBeInTheDocument();
expect(screen.getByRole("heading", { name: "Top K" })).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Top P / Min P" }),
).toBeInTheDocument();
expect(
screen.getAllByText("The quick brown fox").length,
).toBeGreaterThanOrEqual(3);
expect(getCard("Top P / Min P")).toHaveClass(
"inference-settings-viz__card--wide",
);
});
it("updates the temperature distribution when the slider changes", () => {
render(<InferenceSettingsVisualization />);
const card = getCard("Temperature");
const jumpsRow = getCandidateRow(card, "jumps");
const initialText = jumpsRow.textContent;
fireEvent.change(within(card).getByLabelText("Temperature"), {
target: { value: "2" },
});
expect(jumpsRow.textContent).not.toBe(initialText);
});
it("excludes lower-ranked candidates from Top K sampling", () => {
vi.spyOn(Math, "random").mockReturnValue(0.99);
render(<InferenceSettingsVisualization />);
const card = getCard("Top K");
fireEvent.change(within(card).getByLabelText("Top K"), {
target: { value: "1" },
});
expect(getCandidateRow(card, "jumps")).toHaveTextContent("Included");
expect(getCandidateRow(card, "leaps")).toHaveTextContent("Excluded");
fireEvent.click(
within(card).getByRole("button", { name: "Sample Next Token" }),
);
expect(
within(card).getByText("The quick brown fox jumps"),
).toBeInTheDocument();
});
it("toggles Top P into Min P mode and applies the relative probability floor", () => {
render(<InferenceSettingsVisualization />);
const card = getCard("Top P / Min P");
expect(within(card).getByText("Top P threshold math")).toBeInTheDocument();
expect(within(card).getByText("Target P")).toBeInTheDocument();
expect(
within(card).getByLabelText("Top P cumulative probability strip"),
).toBeInTheDocument();
fireEvent.click(within(card).getByRole("button", { name: "Min P" }));
const minPSlider = within(card).getByLabelText("Min P");
expect(minPSlider).toBeInTheDocument();
expect(within(card).getByText("Min P threshold math")).toBeInTheDocument();
expect(
within(card).getByLabelText("Min P raw probability cutoff bars"),
).toBeInTheDocument();
fireEvent.change(minPSlider, {
target: { value: "0.2" },
});
expect(getCandidateRow(card, "hops")).toHaveTextContent("Included");
expect(getCandidateRow(card, "darts")).toHaveTextContent("Excluded");
});
it("samples and resets a card sequence", () => {
vi.spyOn(Math, "random").mockReturnValue(0);
render(<InferenceSettingsVisualization />);
const card = getCard("Temperature");
fireEvent.click(
within(card).getByRole("button", { name: "Sample Next Token" }),
);
expect(
within(card).getByText("The quick brown fox jumps"),
).toBeInTheDocument();
expect(within(card).getByText(/Sampled "jumps"/)).toBeInTheDocument();
fireEvent.click(within(card).getByRole("button", { name: "Reset" }));
expect(within(card).getByText("The quick brown fox")).toBeInTheDocument();
expect(within(card).getByText("No token sampled yet")).toBeInTheDocument();
});
});
@@ -0,0 +1,642 @@
"use client";
import { useMemo, useState } from "react";
type Candidate = {
token: string;
raw: number;
};
type ProcessedCandidate = Candidate & {
included: boolean;
samplingProb: number;
};
type CumulativeCandidate = Candidate & {
cumulativeEnd: number;
cumulativeStart: number;
included: boolean;
};
type SamplerKind = "temperature" | "top-k" | "top-p";
type NucleusMode = "top-p" | "min-p";
const INITIAL_PROMPT = "The quick brown fox";
const BAR_COLORS = [
"#0b72ba",
"#0f766e",
"#b77400",
"#7c3aed",
"#be123c",
"#4f46e5",
"#15803d",
"#a16207",
"#0e7490",
"#9333ea",
] as const;
const CANDIDATE_SETS: Record<string, Candidate[]> = {
[INITIAL_PROMPT]: [
{ token: " jumps", raw: 0.34 },
{ token: " leaps", raw: 0.16 },
{ token: " runs", raw: 0.12 },
{ token: " bounds", raw: 0.1 },
{ token: " hops", raw: 0.08 },
{ token: " darts", raw: 0.06 },
{ token: " sneaks", raw: 0.05 },
{ token: " watches", raw: 0.04 },
{ token: " sleeps", raw: 0.03 },
{ token: " ignores", raw: 0.02 },
],
[`${INITIAL_PROMPT} jumps`]: [
{ token: " over", raw: 0.48 },
{ token: " across", raw: 0.16 },
{ token: " past", raw: 0.12 },
{ token: " toward", raw: 0.08 },
{ token: " beside", raw: 0.06 },
{ token: " near", raw: 0.04 },
{ token: " under", raw: 0.03 },
{ token: " through", raw: 0.03 },
],
[`${INITIAL_PROMPT} jumps over`]: [
{ token: " the", raw: 0.64 },
{ token: " a", raw: 0.14 },
{ token: " one", raw: 0.06 },
{ token: " every", raw: 0.05 },
{ token: " that", raw: 0.04 },
{ token: " another", raw: 0.03 },
{ token: " this", raw: 0.02 },
{ token: " each", raw: 0.02 },
],
[`${INITIAL_PROMPT} jumps over the`]: [
{ token: " lazy", raw: 0.46 },
{ token: " sleepy", raw: 0.14 },
{ token: " old", raw: 0.1 },
{ token: " tired", raw: 0.09 },
{ token: " quiet", raw: 0.07 },
{ token: " brown", raw: 0.05 },
{ token: " startled", raw: 0.05 },
{ token: " patient", raw: 0.04 },
],
[`${INITIAL_PROMPT} jumps over the lazy`]: [
{ token: " dog", raw: 0.68 },
{ token: " hound", raw: 0.1 },
{ token: " pup", raw: 0.07 },
{ token: " cat", raw: 0.05 },
{ token: " animal", raw: 0.04 },
{ token: " spaniel", raw: 0.03 },
{ token: " retriever", raw: 0.02 },
{ token: " watchdog", raw: 0.01 },
],
};
const FALLBACK_CANDIDATES: Candidate[] = [
{ token: ".", raw: 0.28 },
{ token: " and", raw: 0.18 },
{ token: " while", raw: 0.12 },
{ token: " before", raw: 0.1 },
{ token: " near", raw: 0.09 },
{ token: " again", raw: 0.08 },
{ token: ",", raw: 0.08 },
{ token: " quickly", raw: 0.07 },
];
function normalize(candidates: Candidate[]): Candidate[] {
const sum = candidates.reduce((total, candidate) => total + candidate.raw, 0);
if (sum <= 0) return candidates;
return candidates.map((candidate) => ({
...candidate,
raw: candidate.raw / sum,
}));
}
function getCandidates(sequence: string) {
return normalize(CANDIDATE_SETS[sequence] ?? FALLBACK_CANDIDATES);
}
function renormalizeIncluded(
candidates: Array<Candidate & { included: boolean; score: number }>,
): ProcessedCandidate[] {
const includedSum = candidates.reduce((total, candidate) => {
return candidate.included ? total + candidate.score : total;
}, 0);
return candidates.map(({ score: _score, ...candidate }) => ({
...candidate,
samplingProb:
candidate.included && includedSum > 0 ? _score / includedSum : 0,
}));
}
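// Temperature: softmax over log-probabilities divided by T. T < 1 sharpens the distribution, T > 1 flattens it.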
function applyTemperature(
candidates: Candidate[],
temperature: number,
): ProcessedCandidate[] {
const logits = candidates.map(
(candidate) => Math.log(candidate.raw) / temperature,
);
const maxLogit = Math.max(...logits);
const exps = logits.map((logit) => Math.exp(logit - maxLogit));
const sum = exps.reduce((total, value) => total + value, 0);
return candidates.map((candidate, index) => ({
...candidate,
included: true,
samplingProb: exps[index] ? exps[index] / sum : 0,
}));
}
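// Top K: keep only the K highest-probability candidates, then renormalize their probabilities.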
function applyTopK(
candidates: Candidate[],
topK: number,
): ProcessedCandidate[] {
const includedTokens = new Set(
[...candidates]
.sort((left, right) => right.raw - left.raw)
.slice(0, topK)
.map((candidate) => candidate.token),
);
return renormalizeIncluded(
candidates.map((candidate) => ({
...candidate,
included: includedTokens.has(candidate.token),
score: candidate.raw,
})),
);
}
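// Top P (nucleus): add candidates in descending probability until the cumulative mass reaches topP, then renormalize.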
function applyTopP(
candidates: Candidate[],
topP: number,
): ProcessedCandidate[] {
const sortedCandidates = [...candidates].sort(
(left, right) => right.raw - left.raw,
);
const includedTokens = new Set<string>();
let cumulativeProbability = 0;
for (const candidate of sortedCandidates) {
includedTokens.add(candidate.token);
cumulativeProbability += candidate.raw;
if (cumulativeProbability >= topP) break;
}
return renormalizeIncluded(
candidates.map((candidate) => ({
...candidate,
included: includedTokens.has(candidate.token),
score: candidate.raw,
})),
);
}
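// Min P: keep candidates whose raw probability is at least minP times the strongest candidate's probability.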
function applyMinP(
candidates: Candidate[],
minP: number,
): ProcessedCandidate[] {
const highestProbability = Math.max(
...candidates.map((candidate) => candidate.raw),
);
const threshold = highestProbability * minP;
return renormalizeIncluded(
candidates.map((candidate) => ({
...candidate,
included: candidate.raw >= threshold,
score: candidate.raw,
})),
);
}
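// Weighted random pick over the included candidates using their renormalized sampling probabilities.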
function sampleCandidate(candidates: ProcessedCandidate[]) {
const includedCandidates = candidates.filter(
(candidate) => candidate.included,
);
if (includedCandidates.length === 0) return candidates[0] ?? null;
let cursor = Math.random();
for (const candidate of includedCandidates) {
cursor -= candidate.samplingProb;
if (cursor <= 0) return candidate;
}
return includedCandidates[includedCandidates.length - 1] ?? null;
}
function formatPercent(value: number) {
return `${(value * 100).toFixed(1)}%`;
}
function getSortedCandidates(candidates: Candidate[]) {
return [...candidates].sort((left, right) => right.raw - left.raw);
}
function getCumulativeCandidates(
candidates: Candidate[],
processedCandidates: ProcessedCandidate[],
): CumulativeCandidate[] {
const includedTokens = new Set(
processedCandidates
.filter((candidate) => candidate.included)
.map((candidate) => candidate.token),
);
let cumulativeProbability = 0;
return getSortedCandidates(candidates).map((candidate) => {
const cumulativeStart = cumulativeProbability;
cumulativeProbability += candidate.raw;
return {
...candidate,
cumulativeEnd: cumulativeProbability,
cumulativeStart,
included: includedTokens.has(candidate.token),
};
});
}
function getIncludedRawSum(processedCandidates: ProcessedCandidate[]) {
return processedCandidates.reduce((total, candidate) => {
return candidate.included ? total + candidate.raw : total;
}, 0);
}
function NucleusThresholdVisual({
candidates,
minP,
mode,
processedCandidates,
topP,
}: {
candidates: Candidate[];
minP: number;
mode: NucleusMode;
processedCandidates: ProcessedCandidate[];
topP: number;
}) {
if (mode === "top-p") {
const cumulativeCandidates = getCumulativeCandidates(
candidates,
processedCandidates,
);
const includedSum = getIncludedRawSum(processedCandidates);
const includedTokens = cumulativeCandidates.filter(
(candidate) => candidate.included,
);
return (
<div className="inference-settings-viz__threshold-panel">
<div className="inference-settings-viz__threshold-header">
<strong>Top P threshold math</strong>
<span>
Keep adding highest-probability tokens until cumulative probability
reaches <code>{topP.toFixed(2)}</code>.
</span>
</div>
<div className="inference-settings-viz__formula-row">
<span>Target P</span>
<code>{formatPercent(topP)}</code>
<span>Included mass</span>
<code>{formatPercent(includedSum)}</code>
</div>
<div
className="inference-settings-viz__cumulative-strip"
aria-label="Top P cumulative probability strip"
>
{cumulativeCandidates.map((candidate, index) => (
<span
className="inference-settings-viz__cumulative-segment"
data-included={candidate.included ? "true" : "false"}
key={candidate.token}
style={{
backgroundColor: BAR_COLORS[index % BAR_COLORS.length],
width: `${candidate.raw * 100}%`,
}}
title={`${candidate.token.trim()}: ${formatPercent(
candidate.raw,
)}, cumulative ${formatPercent(candidate.cumulativeEnd)}`}
>
{candidate.raw >= 0.08 ? candidate.token.trim() : ""}
</span>
))}
<span
className="inference-settings-viz__threshold-marker"
style={{ left: `${topP * 100}%` }}
>
P
</span>
</div>
<p className="inference-settings-viz__threshold-note">
Included prefix:{" "}
<strong>
{includedTokens
.map((candidate) => candidate.token.trim())
.join(" + ")}
</strong>
. The last included token can push the total past the target because
tokens are discrete choices.
</p>
</div>
);
}
const sortedCandidates = getSortedCandidates(candidates);
const maxProbability = sortedCandidates[0]?.raw ?? 0;
const threshold = maxProbability * minP;
return (
<div className="inference-settings-viz__threshold-panel">
<div className="inference-settings-viz__threshold-header">
<strong>Min P threshold math</strong>
<span>
Keep tokens whose probability is at least{" "}
<code>min_p x strongest token</code>.
</span>
</div>
<div className="inference-settings-viz__formula-row">
<span>Strongest token</span>
<code>{formatPercent(maxProbability)}</code>
<span>Cutoff</span>
<code>
{formatPercent(maxProbability)} x {minP.toFixed(2)} ={" "}
{formatPercent(threshold)}
</code>
</div>
<div
className="inference-settings-viz__minp-bars"
aria-label="Min P raw probability cutoff bars"
>
{sortedCandidates.map((candidate, index) => {
const included = candidate.raw >= threshold;
return (
<div
className="inference-settings-viz__minp-row"
data-included={included ? "true" : "false"}
key={candidate.token}
>
<span>{candidate.token.trim()}</span>
<div className="inference-settings-viz__minp-track">
<div
className="inference-settings-viz__minp-fill"
style={{
backgroundColor: BAR_COLORS[index % BAR_COLORS.length],
width: `${maxProbability > 0 ? (candidate.raw / maxProbability) * 100 : 0}%`,
}}
/>
<i
className="inference-settings-viz__minp-marker"
style={{ left: `${Math.min(minP * 100, 100)}%` }}
/>
</div>
<code>{formatPercent(candidate.raw)}</code>
</div>
);
})}
</div>
<p className="inference-settings-viz__threshold-note">
The vertical marker is the minimum allowed fraction of the strongest
token. Bars that do not reach it are removed before sampling.
</p>
</div>
);
}
type SamplerCardProps = {
description: string;
kind: SamplerKind;
title: string;
};
function SamplerCard({ description, kind, title }: SamplerCardProps) {
const [sequence, setSequence] = useState(INITIAL_PROMPT);
const [sampledMessage, setSampledMessage] = useState("");
const [temperature, setTemperature] = useState(0.8);
const [topK, setTopK] = useState(5);
const [topP, setTopP] = useState(0.9);
const [minP, setMinP] = useState(0.05);
const [nucleusMode, setNucleusMode] = useState<NucleusMode>("top-p");
const candidates = useMemo(() => getCandidates(sequence), [sequence]);
const processedCandidates = useMemo(() => {
if (kind === "temperature")
return applyTemperature(candidates, temperature);
if (kind === "top-k") return applyTopK(candidates, topK);
if (nucleusMode === "min-p") return applyMinP(candidates, minP);
return applyTopP(candidates, topP);
}, [candidates, kind, minP, nucleusMode, temperature, topK, topP]);
const sampleNextToken = () => {
const selectedCandidate = sampleCandidate(processedCandidates);
if (!selectedCandidate) return;
setSequence(
(currentSequence) => `${currentSequence}${selectedCandidate.token}`,
);
setSampledMessage(
`Sampled "${selectedCandidate.token.trim()}" (${formatPercent(
selectedCandidate.samplingProb,
)})`,
);
};
const resetSampler = () => {
setSequence(INITIAL_PROMPT);
setSampledMessage("");
};
return (
<article
className={`inference-settings-viz__card${
kind === "top-p" ? " inference-settings-viz__card--wide" : ""
}`}
aria-labelledby={`inference-settings-viz-${kind}`}
>
<div className="inference-settings-viz__card-header">
<h4 id={`inference-settings-viz-${kind}`}>{title}</h4>
<p>{description}</p>
</div>
<div className="inference-settings-viz__sequence" aria-live="polite">
{sequence}
</div>
{kind === "temperature" ? (
<label className="inference-settings-viz__control">
<span>
Temperature <strong>{temperature.toFixed(1)}</strong>
</span>
<input
aria-label="Temperature"
type="range"
min={0.1}
max={2}
step={0.1}
value={temperature}
onChange={(event) => setTemperature(Number(event.target.value))}
/>
</label>
) : null}
{kind === "top-k" ? (
<label className="inference-settings-viz__control">
<span>
Top K <strong>{topK}</strong>
</span>
<input
aria-label="Top K"
type="range"
min={1}
max={10}
step={1}
value={topK}
onChange={(event) => setTopK(Number(event.target.value))}
/>
</label>
) : null}
{kind === "top-p" ? (
<div className="inference-settings-viz__nucleus-controls">
<div
className="inference-settings-viz__segmented"
aria-label="Top P or Min P mode"
role="group"
>
<button
type="button"
aria-pressed={nucleusMode === "top-p"}
onClick={() => setNucleusMode("top-p")}
>
Top P
</button>
<button
type="button"
aria-pressed={nucleusMode === "min-p"}
onClick={() => setNucleusMode("min-p")}
>
Min P
</button>
</div>
{nucleusMode === "top-p" ? (
<label className="inference-settings-viz__control">
<span>
Top P <strong>{topP.toFixed(2)}</strong>
</span>
<input
aria-label="Top P"
type="range"
min={0.1}
max={1}
step={0.05}
value={topP}
onChange={(event) => setTopP(Number(event.target.value))}
/>
</label>
) : (
<label className="inference-settings-viz__control">
<span>
Min P <strong>{minP.toFixed(2)}</strong>
</span>
<input
aria-label="Min P"
type="range"
min={0}
max={0.2}
step={0.01}
value={minP}
onChange={(event) => setMinP(Number(event.target.value))}
/>
</label>
)}
</div>
) : null}
{kind === "top-p" ? (
<NucleusThresholdVisual
candidates={candidates}
minP={minP}
mode={nucleusMode}
processedCandidates={processedCandidates}
topP={topP}
/>
) : null}
<div
className="inference-settings-viz__bars"
aria-label={`${title} candidates`}
>
{processedCandidates.map((candidate, index) => (
<div
className="inference-settings-viz__row"
data-included={candidate.included ? "true" : "false"}
key={candidate.token}
>
<span className="inference-settings-viz__token">
{candidate.token.trim() || candidate.token}
</span>
<div className="inference-settings-viz__bar-track">
<div
className="inference-settings-viz__bar-fill"
style={{
backgroundColor: BAR_COLORS[index % BAR_COLORS.length],
width: `${Math.max(candidate.samplingProb * 100, candidate.included ? 4 : 0)}%`,
}}
>
<span>{formatPercent(candidate.samplingProb)}</span>
</div>
</div>
<span className="inference-settings-viz__row-state">
{candidate.included ? "Included" : "Excluded"}
</span>
</div>
))}
</div>
<div className="inference-settings-viz__actions">
<button type="button" onClick={sampleNextToken}>
Sample Next Token
</button>
<button type="button" onClick={resetSampler}>
Reset
</button>
<span aria-live="polite">
{sampledMessage || "No token sampled yet"}
</span>
</div>
</article>
);
}
export function InferenceSettingsVisualization() {
return (
<section className="inference-settings-viz" data-widget-enhanced="true">
<div className="inference-settings-viz__header">
<p className="inference-settings-viz__eyebrow">
Objective 3 Lab Widget
</p>
<h3>See inference filters reshape the next-token choice</h3>
<p>
Each card starts with <code>{INITIAL_PROMPT}</code>. Adjust one
setting, compare the candidate bars, then sample the next token.
</p>
</div>
<div className="inference-settings-viz__grid">
<SamplerCard
kind="temperature"
title="Temperature"
description="Temperature smooths or sharpens the whole probability distribution before sampling."
/>
<SamplerCard
kind="top-k"
title="Top K"
description="Top K keeps only the K most likely candidates and removes the rest from sampling."
/>
<SamplerCard
kind="top-p"
title="Top P / Min P"
description="Top P keeps a cumulative probability nucleus. Min P keeps tokens above a relative floor."
/>
</div>
</section>
);
}
@@ -15,6 +15,9 @@ describe("Lab1ConfidenceChat", () => {
return {
json: async () => ({
content: "often works",
finishReason: "stop",
isTruncated: false,
maxTokens: 512,
model: "batiai/gemma4-e2b:q4",
role: "assistant",
tokens: [
@@ -49,7 +52,12 @@ describe("Lab1ConfidenceChat", () => {
screen.getByRole("button", { name: "Generate Output" }).closest("form")!,
);
expect(await screen.findByLabelText("often 40.0%")).toBeInTheDocument();
const token = await screen.findByLabelText("often 40.0%");
expect(token).toBeInTheDocument();
fireEvent.mouseEnter(token);
expect(screen.getByRole("tooltip")).toBeInTheDocument();
expect(screen.getByText("14.0%:")).toBeInTheDocument();
expect(screen.getByText("commonly")).toBeInTheDocument();
expect(screen.getByText("batiai/gemma4-e2b:q4")).toBeInTheDocument();
@@ -81,4 +89,46 @@ describe("Lab1ConfidenceChat", () => {
await screen.findByText("The local Ollama request failed."),
).toBeInTheDocument();
});
it("explains when the response hit the configured token limit", async () => {
vi.stubGlobal(
"fetch",
vi.fn(async () => {
return {
json: async () => ({
content: "partial output",
finishReason: "length",
isTruncated: true,
maxTokens: 512,
model: "batiai/gemma4-e2b:q4",
role: "assistant",
tokens: [
{
logprob: Math.log(0.5),
probability: 50,
token: "partial",
topAlternatives: [],
},
],
}),
ok: true,
};
}),
);
render(<Lab1ConfidenceChat />);
fireEvent.change(screen.getByLabelText("Prompt"), {
target: { value: "Write a longer answer." },
});
fireEvent.submit(
screen.getByRole("button", { name: "Generate Output" }).closest("form")!,
);
expect(
await screen.findByText(
/Response reached the configured 512-token limit/,
),
).toBeInTheDocument();
});
});
+140 -20
View File
@@ -1,6 +1,12 @@
"use client";
import { FormEvent, useState } from "react";
import {
type CSSProperties,
type FocusEvent,
FormEvent,
type MouseEvent,
useState,
} from "react";
import {
formatProbabilityPercent,
@@ -23,12 +29,28 @@ type AssistantTurn = Lab1ConfidenceResponse & {
type ChatTurn = AssistantTurn | UserTurn;
type TooltipPlacement = "above" | "below";
type ActiveTooltip = {
left: number;
placement: TooltipPlacement;
token: Lab1ResponseToken;
tokenId: string;
top: number;
};
const starterPrompts = [
"The quick brown fox",
"Write one sentence explaining what a firewall does.",
"List three words that describe a phishing email.",
] as const;
const CONFIDENCE_TOOLTIP_ID = "lab1-confidence-tooltip";
const TOOLTIP_ESTIMATED_HEIGHT = 180;
const TOOLTIP_ESTIMATED_WIDTH = 260;
const TOOLTIP_VIEWPORT_PADDING = 16;
const TOOLTIP_OFFSET = 10;
function buildTurnId() {
return `lab1-turn-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
}
@@ -37,14 +59,56 @@ function toConversation(messages: ChatTurn[]) {
return messages.map(({ content, role }) => ({ content, role }));
}
function renderTooltip(token: Lab1ResponseToken) {
function getTooltipPosition(element: HTMLElement) {
const rect = element.getBoundingClientRect();
const viewportWidth =
window.innerWidth || document.documentElement.clientWidth;
const viewportHeight =
window.innerHeight || document.documentElement.clientHeight;
const halfTooltipWidth = TOOLTIP_ESTIMATED_WIDTH / 2;
const minLeft = TOOLTIP_VIEWPORT_PADDING + halfTooltipWidth;
const maxLeft = viewportWidth - TOOLTIP_VIEWPORT_PADDING - halfTooltipWidth;
const centeredLeft = rect.left + rect.width / 2;
const left =
maxLeft > minLeft
? Math.min(Math.max(centeredLeft, minLeft), maxLeft)
: viewportWidth / 2;
const belowTop = rect.bottom + TOOLTIP_OFFSET;
const hasRoomBelow = belowTop + TOOLTIP_ESTIMATED_HEIGHT <= viewportHeight;
const hasRoomAbove = rect.top - TOOLTIP_OFFSET - TOOLTIP_ESTIMATED_HEIGHT > 0;
if (!hasRoomBelow && hasRoomAbove) {
return {
left,
placement: "above" as const,
top: rect.top - TOOLTIP_OFFSET,
};
}
return {
left,
placement: "below" as const,
top: belowTop,
};
}
function renderTooltip(
token: Lab1ResponseToken,
placement: TooltipPlacement,
style?: CSSProperties,
) {
return (
<span className="lab1-confidence__tooltip">
<span
className={`lab1-confidence__tooltip lab1-confidence__tooltip--${placement}`}
id={CONFIDENCE_TOOLTIP_ID}
role="tooltip"
style={style}
>
<strong>{formatProbabilityPercent(token.probability)}</strong>
{token.topAlternatives.length > 0 ? (
<span className="lab1-confidence__tooltip-list">
{token.topAlternatives.map((candidate) => (
<span key={`${token.token}-${candidate.token}`}>
{token.topAlternatives.map((candidate, index) => (
<span key={`${token.token}-${candidate.token}-${index}`}>
{formatProbabilityPercent(candidate.probability)}:{" "}
<code>{candidate.token}</code>
</span>
@@ -64,6 +128,27 @@ export function Lab1ConfidenceChat() {
const [messages, setMessages] = useState<ChatTurn[]>([]);
const [error, setError] = useState<string | null>(null);
const [isSubmitting, setIsSubmitting] = useState(false);
const [activeTooltip, setActiveTooltip] = useState<ActiveTooltip | null>(
null,
);
function showTooltip(
tokenId: string,
token: Lab1ResponseToken,
element: HTMLElement,
) {
setActiveTooltip({
token,
tokenId,
...getTooltipPosition(element),
});
}
function hideTooltip(tokenId: string) {
setActiveTooltip((currentTooltip) =>
currentTooltip?.tokenId === tokenId ? null : currentTooltip,
);
}
async function handleSubmit(event: FormEvent<HTMLFormElement>) {
event.preventDefault();
@@ -183,23 +268,51 @@ export function Lab1ConfidenceChat() {
</div>
<div className="lab1-confidence__token-stream" role="list">
{message.tokens.map((token, index) => (
<span
aria-label={`${token.token} ${formatProbabilityPercent(
token.probability,
)}`}
className={`lab1-confidence__token lab1-confidence__token--${getConfidenceBand(
token.probability,
)}`}
key={`${message.id}-${index}-${token.token}`}
role="listitem"
>
{token.token}
{renderTooltip(token)}
</span>
))}
{message.tokens.map((token, index) => {
const tokenId = `${message.id}-${index}-${token.token}`;
const isTooltipActive = activeTooltip?.tokenId === tokenId;
return (
<span
aria-describedby={
isTooltipActive ? CONFIDENCE_TOOLTIP_ID : undefined
}
aria-label={`${token.token} ${formatProbabilityPercent(
token.probability,
)}`}
className={`lab1-confidence__token lab1-confidence__token--${getConfidenceBand(
token.probability,
)}`}
key={tokenId}
onBlur={() => hideTooltip(tokenId)}
onFocus={(
event: FocusEvent<HTMLSpanElement, Element>,
) => showTooltip(tokenId, token, event.currentTarget)}
onMouseEnter={(
event: MouseEvent<
HTMLSpanElement,
globalThis.MouseEvent
>,
) => showTooltip(tokenId, token, event.currentTarget)}
onMouseLeave={() => hideTooltip(tokenId)}
role="listitem"
tabIndex={0}
>
{token.token}
</span>
);
})}
</div>
{message.isTruncated ? (
<p className="lab1-confidence__message-warning">
Response reached the configured{" "}
{message.maxTokens ? `${message.maxTokens}-token` : "token"}{" "}
limit. Increase <code>COURSEWARE_LAB1_MAX_TOKENS</code> to
allow longer Lab 1 generations.
</p>
) : null}
{message.error ? (
<p className="lab1-confidence__message-warning">
{message.error}
@@ -239,6 +352,13 @@ export function Lab1ConfidenceChat() {
{error ? <p className="lab1-confidence__error">{error}</p> : null}
</form>
{activeTooltip
? renderTooltip(activeTooltip.token, activeTooltip.placement, {
left: activeTooltip.left,
top: activeTooltip.top,
})
: null}
</section>
);
}
+100 -10
View File
@@ -56,6 +56,30 @@ describe("LabContent", () => {
).toBeInTheDocument();
});
it("renders the Lab 4 inference visualization token into an interactive component", async () => {
mockRuntimeConfig();
render(
<LabContent
className="lab-content"
html="<div data-inference-settings-visualization></div>"
/>,
);
expect(
screen.getByRole("heading", {
name: "See inference filters reshape the next-token choice",
}),
).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Temperature" }),
).toBeInTheDocument();
expect(screen.getByRole("heading", { name: "Top K" })).toBeInTheDocument();
expect(
screen.getByRole("heading", { name: "Top P / Min P" }),
).toBeInTheDocument();
});
it("filters harness branches from a single Objective 2 selector", async () => {
mockRuntimeConfig();
@@ -138,6 +162,73 @@ describe("LabContent", () => {
expect(link).toHaveClass("lab-service-pill");
});
it("renders Lab 3 browser targets as polished open buttons", async () => {
mockRuntimeConfig();
const lab = getLabDocument("lab-3-llama-cpp-and-ollama");
expect(lab).not.toBeNull();
render(
<LabContent
className="lab-content"
html={micromark(lab?.content ?? "", { allowDangerousHtml: true })}
/>,
);
const llamaLink = await screen.findByRole("link", {
name: "LLaMA-3.2-1B on Hugging Face",
});
expect(llamaLink).toHaveAttribute(
"href",
"https://huggingface.co/meta-llama/Llama-3.2-1B",
);
expect(llamaLink).toHaveClass("lab-open-pill");
expect(
screen.getByRole("link", {
name: "WhiteRabbitNeo-V3-7B on Hugging Face",
}),
).toHaveClass("lab-open-pill");
expect(screen.getByRole("link", { name: "Ollama registry" })).toHaveClass(
"lab-open-pill",
);
});
it("renders Lab 7 dataset and download targets as polished buttons", async () => {
mockRuntimeConfig();
const lab = getLabDocument("lab-7-dataset-generation-and-fine-tuning");
expect(lab).not.toBeNull();
render(
<LabContent
className="lab-content"
html={micromark(lab?.content ?? "", { allowDangerousHtml: true })}
/>,
);
const gsm8kLink = await screen.findByRole("link", {
name: "GSM8K dataset",
});
expect(gsm8kLink).toHaveAttribute(
"href",
"https://huggingface.co/datasets/openai/gsm8k",
);
expect(gsm8kLink).toHaveClass("lab-open-pill");
const kilnLinks = screen.getAllByRole("link", {
name: "Download Kiln AI",
});
expect(kilnLinks).toHaveLength(2);
for (const kilnLink of kilnLinks) {
expect(kilnLink).toHaveAttribute(
"href",
"https://github.com/Kiln-AI/Kiln/releases/tag/v0.18.1",
);
expect(kilnLink).toHaveClass("lab-download-pill");
}
});
it("keeps rendered service URL links after opening an image zoom modal", async () => {
mockRuntimeConfig();
@@ -238,16 +329,15 @@ describe("LabContent", () => {
/>,
);
expect(
await screen.findByRole("link", { name: "Open WebUI" }),
).toHaveAttribute("href", "https://lab.example/openwebui");
expect(screen.getByRole("link", { name: "Open WebUI" })).toHaveClass(
"lab-service-pill",
);
expect(screen.getByRole("link", { name: "Open WebUI" })).toHaveAttribute(
"title",
"https://lab.example/openwebui",
);
const openWebUiLinks = await screen.findAllByRole("link", {
name: "Open WebUI",
});
expect(openWebUiLinks).toHaveLength(2);
for (const link of openWebUiLinks) {
expect(link).toHaveAttribute("href", "https://lab.example/openwebui");
expect(link).toHaveClass("lab-service-pill");
expect(link).toHaveAttribute("title", "https://lab.example/openwebui");
}
const apiMatches = await screen.findAllByText(
"https://lab.example/openwebui/api",
);
+12 -1
View File
@@ -12,6 +12,7 @@ import { Lab1ConfidenceChat } from "~/components/labs/Lab1ConfidenceChat";
import { Lab1NetronPanel } from "~/components/labs/Lab1NetronPanel";
import { Lab3TerminalFrame } from "~/components/labs/Lab3TerminalFrame";
import { Lab8Chat } from "~/components/labs/Lab8Chat";
import { InferenceSettingsVisualization } from "~/components/labs/InferenceSettingsVisualization";
import { Objective5Chat } from "~/components/labs/Objective5Chat";
import { QuantizationGridExplorer } from "~/components/labs/QuantizationGridExplorer";
import { QuantizationExplorer } from "~/components/labs/QuantizationExplorer";
@@ -62,6 +63,8 @@ const lab3TerminalToken = "<div data-lab3-terminal></div>";
const lab1ConfidenceToken = "<div data-lab1-confidence></div>";
const lab1NetronToken = "<div data-lab1-netron-panel></div>";
const tokenizerPlaygroundToken = "<div data-tokenizer-playground></div>";
const inferenceSettingsVisualizationToken =
"<div data-inference-settings-visualization></div>";
const serviceTokenPattern =
/\{\{service-(url|address):([a-z0-9-]+)(?::([^}]+))?\}\}/g;
const serviceLabels: Record<string, string> = {
@@ -461,7 +464,7 @@ const LabContentArticle = memo(function LabContentArticle({
const renderedContent = html
.split(
new RegExp(
`(${escapeRegex(quantizationExplorerToken)}|${escapeRegex(quantizationGridExplorerToken)}|${escapeRegex(objective5ChatToken)}|${escapeRegex(lab8ChatToken)}|${escapeRegex(lab3TerminalToken)}|${escapeRegex(lab1ConfidenceToken)}|${escapeRegex(lab1NetronToken)}|${escapeRegex(tokenizerPlaygroundToken)})`,
`(${escapeRegex(quantizationExplorerToken)}|${escapeRegex(quantizationGridExplorerToken)}|${escapeRegex(objective5ChatToken)}|${escapeRegex(lab8ChatToken)}|${escapeRegex(lab3TerminalToken)}|${escapeRegex(lab1ConfidenceToken)}|${escapeRegex(lab1NetronToken)}|${escapeRegex(tokenizerPlaygroundToken)}|${escapeRegex(inferenceSettingsVisualizationToken)})`,
"g",
),
)
@@ -505,6 +508,14 @@ const LabContentArticle = memo(function LabContentArticle({
);
}
if (part === inferenceSettingsVisualizationToken) {
return (
<InferenceSettingsVisualization
key={`inference-settings-viz-${index}`}
/>
);
}
return (
<Fragment key={`html-segment-${index}`}>
<div dangerouslySetInnerHTML={{ __html: part }} />
+24
View File
@@ -2,10 +2,12 @@ import { describe, expect, it } from "vitest";
import {
extractLab1AssistantContent,
extractLab1FinishReason,
extractLab1ResponseTokens,
formatProbabilityPercent,
getConfidenceBand,
logprobToProbabilityPercent,
parseLab1MaxTokens,
} from "~/lib/lab1-confidence";
describe("logprobToProbabilityPercent", () => {
@@ -30,6 +32,28 @@ describe("extractLab1AssistantContent", () => {
});
});
describe("extractLab1FinishReason", () => {
it("reads the upstream finish reason when it is present", () => {
expect(
extractLab1FinishReason({
choices: [
{
finish_reason: "length",
},
],
}),
).toBe("length");
});
});
describe("parseLab1MaxTokens", () => {
it("uses a bounded positive environment override", () => {
expect(parseLab1MaxTokens("768")).toBe(768);
expect(parseLab1MaxTokens("999999")).toBe(2048);
expect(parseLab1MaxTokens("nope")).toBe(512);
});
});
describe("extractLab1ResponseTokens", () => {
it("maps token logprobs and alternate candidates into display data", () => {
expect(
+26 -1
View File
@@ -1,6 +1,7 @@
export const LAB1_CONFIDENCE_MODEL_ALIAS = "batiai/gemma4-e2b:q4";
export const LAB1_DEFAULT_MAX_TOKENS = 64;
export const LAB1_DEFAULT_MAX_TOKENS = 512;
export const LAB1_DEFAULT_TEMPERATURE = 0.7;
export const LAB1_MAX_COMPLETION_TOKENS = 2048;
export const LAB1_MAX_CONTEXT_MESSAGES = 10;
export const LAB1_MAX_MESSAGE_LENGTH = 4000;
@@ -25,6 +26,9 @@ export type Lab1ResponseToken = {
export type Lab1ConfidenceResponse = {
content: string;
finishReason: string | null;
isTruncated: boolean;
maxTokens: number;
model: string;
role: "assistant";
tokens: Lab1ResponseToken[];
@@ -43,6 +47,7 @@ type OpenAiLogprobToken = {
type OpenAiCompatibilityPayload = {
choices?: Array<{
finish_reason?: string;
logprobs?: {
content?: OpenAiLogprobToken[];
};
@@ -61,6 +66,19 @@ export function getLab1SystemPrompt() {
].join(" ");
}
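// Parses a max-token override (per the tests, an environment-supplied string).
// Missing, non-numeric, or non-positive values fall back to
// LAB1_DEFAULT_MAX_TOKENS; anything above LAB1_MAX_COMPLETION_TOKENS is
// clamped to that cap.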
export function parseLab1MaxTokens(value: string | undefined) {
if (!value) {
return LAB1_DEFAULT_MAX_TOKENS;
}
const parsedValue = Number.parseInt(value, 10);
if (!Number.isFinite(parsedValue) || parsedValue <= 0) {
return LAB1_DEFAULT_MAX_TOKENS;
}
return Math.min(parsedValue, LAB1_MAX_COMPLETION_TOKENS);
}
export function clampLab1Messages(messages: Lab1ConfidenceMessage[]) {
return messages
.filter((message) => {
@@ -117,6 +135,13 @@ export function extractLab1AssistantContent(
return content || null;
}
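// Returns the upstream finish_reason (e.g. "stop" or "length") from the first
// choice, or null when the field is absent or blank.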
export function extractLab1FinishReason(payload: OpenAiCompatibilityPayload) {
const finishReason = payload.choices?.[0]?.finish_reason;
return typeof finishReason === "string" && finishReason.trim()
? finishReason
: null;
}
export function extractLab1ResponseTokens(
payload: OpenAiCompatibilityPayload,
): Lab1ResponseToken[] {
+564 -18
View File
@@ -914,17 +914,22 @@ ol {
}
.lab-content ul.concept-pill-list > li {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 0.55rem;
display: grid;
grid-template-columns: max-content minmax(0, 1fr);
align-items: baseline;
column-gap: 0.7rem;
row-gap: 0.28rem;
margin: 0;
padding: 0.48rem 0.78rem;
padding: 0.72rem 1rem;
border: 1px solid #d5e2ee;
border-radius: 999px;
background: linear-gradient(180deg, #f9fcff, #f4f9fe);
}
.lab-content ul.concept-pill-list > li > span:not(.concept-pill-label) {
line-height: 1.45;
}
.lab-content .concept-pill-label {
display: inline;
color: #0f4f76;
@@ -951,6 +956,511 @@ ol {
margin: 1.25rem 0 1.5rem;
}
.lab-content [data-inference-settings-visualization] {
margin: 1.25rem 0 1.5rem;
}
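/* Lab 4 inference settings visualization: card shell, controls, and token bars. */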
.inference-settings-viz {
margin: 1.25rem 0 1.5rem;
border: 1px solid #d7e4ef;
border-radius: 16px;
background: linear-gradient(180deg, #fbfdff, #f4f9fd);
padding: 1rem;
}
.inference-settings-viz code {
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
}
.inference-settings-viz__header {
margin-bottom: 1rem;
}
.inference-settings-viz__eyebrow {
margin: 0;
color: #9a5f00;
font-size: 0.72rem;
font-weight: 800;
letter-spacing: 0.08em;
text-transform: uppercase;
}
.inference-settings-viz__header h3 {
margin: 0.1rem 0 0;
color: #0f3d58;
font-size: 1.2rem;
}
.inference-settings-viz__header p:not(.inference-settings-viz__eyebrow) {
margin: 0.55rem 0 0;
color: #334155;
}
.inference-settings-viz__grid {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 0.9rem;
align-items: start;
}
.inference-settings-viz__card {
display: flex;
flex-direction: column;
min-width: 0;
min-height: 100%;
border: 1px solid #dce6ee;
border-radius: 14px;
background: rgba(255, 255, 255, 0.92);
padding: 0.9rem;
}
.inference-settings-viz__card--wide {
grid-column: 1 / -1;
width: 100%;
}
.inference-settings-viz__card--wide > * {
width: 100%;
}
.inference-settings-viz__card-header h4 {
margin: 0;
color: #0f3d58;
font-size: 1.05rem;
line-height: 1.35;
}
.inference-settings-viz__card-header p {
margin: 0.35rem 0 0;
color: #475569;
font-size: 0.92rem;
line-height: 1.42;
}
.inference-settings-viz__sequence {
margin: 0.8rem 0;
padding: 0.7rem 0.75rem;
border: 1px solid #d6e2ed;
border-radius: 10px;
background: #f7fbff;
color: #12364e;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.92rem;
line-height: 1.35;
min-height: 2.85rem;
}
.inference-settings-viz__control {
--slider-thumb-size: 1rem;
--slider-thumb-offset: calc(var(--slider-thumb-size) / 2);
display: block;
margin-bottom: 0.85rem;
}
.inference-settings-viz__control > span {
display: flex;
justify-content: space-between;
gap: 0.75rem;
color: #334155;
font-size: 0.86rem;
font-weight: 700;
}
.inference-settings-viz__control strong {
color: #0b72ba;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
}
.inference-settings-viz__control input[type="range"] {
-webkit-appearance: none;
appearance: none;
display: block;
width: calc(100% - var(--slider-thumb-size));
margin-left: var(--slider-thumb-offset);
margin-right: var(--slider-thumb-offset);
margin-top: 0.55rem;
background: transparent;
}
.inference-settings-viz__control
input[type="range"]::-webkit-slider-runnable-track {
height: 0.68rem;
border-radius: 999px;
background: linear-gradient(180deg, #dbe7f2, #d4e1ec);
}
.inference-settings-viz__control input[type="range"]::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: var(--slider-thumb-size);
height: var(--slider-thumb-size);
margin-top: calc((0.68rem - var(--slider-thumb-size)) / 2);
border: 1px solid #c8d6e3;
border-radius: 999px;
background: linear-gradient(180deg, #ffffff, #eef3f8);
box-shadow: 0 1px 4px rgba(15, 23, 42, 0.18);
}
.inference-settings-viz__control input[type="range"]::-moz-range-track {
height: 0.68rem;
border: none;
border-radius: 999px;
background: linear-gradient(180deg, #dbe7f2, #d4e1ec);
}
.inference-settings-viz__control input[type="range"]::-moz-range-thumb {
width: var(--slider-thumb-size);
height: var(--slider-thumb-size);
border: 1px solid #c8d6e3;
border-radius: 999px;
background: linear-gradient(180deg, #ffffff, #eef3f8);
box-shadow: 0 1px 4px rgba(15, 23, 42, 0.18);
}
.inference-settings-viz__nucleus-controls {
margin-bottom: 0.85rem;
}
.inference-settings-viz__nucleus-controls .inference-settings-viz__control {
margin-bottom: 0;
}
.inference-settings-viz__segmented {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 0.25rem;
margin-bottom: 0.75rem;
padding: 0.2rem;
border: 1px solid #d6e2ed;
border-radius: 10px;
background: #f7fbff;
}
.inference-settings-viz__segmented button {
border: 1px solid transparent;
border-radius: 8px;
background: transparent;
color: #426075;
cursor: pointer;
font: inherit;
font-size: 0.84rem;
font-weight: 800;
line-height: 1;
padding: 0.5rem 0.55rem;
}
.inference-settings-viz__segmented button[aria-pressed="true"] {
border-color: #9cc5e5;
background: #ffffff;
color: #0f4f76;
box-shadow: 0 1px 2px rgba(15, 23, 42, 0.08);
}
.inference-settings-viz__threshold-panel {
margin-bottom: 0.95rem;
padding: 0.85rem;
border: 1px solid #d6e2ed;
border-radius: 12px;
background: #f7fbff;
}
.inference-settings-viz__threshold-header {
display: grid;
gap: 0.25rem;
margin-bottom: 0.75rem;
}
.inference-settings-viz__threshold-header strong {
color: #0f3d58;
font-size: 0.92rem;
}
.inference-settings-viz__threshold-header span {
color: #475569;
font-size: 0.86rem;
line-height: 1.4;
}
.inference-settings-viz__formula-row {
display: grid;
grid-template-columns: max-content max-content max-content minmax(0, 1fr);
align-items: center;
gap: 0.45rem 0.6rem;
margin-bottom: 0.75rem;
color: #64748b;
font-size: 0.8rem;
font-weight: 700;
}
.inference-settings-viz__formula-row code {
color: #0f4f76;
font-size: 0.8rem;
font-weight: 800;
}
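/* Cumulative top-p strip: segments beyond the threshold marker carry
   data-included="false" and are greyed out. */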
.inference-settings-viz__cumulative-strip {
position: relative;
display: flex;
height: 2.35rem;
overflow: visible;
border: 1px solid #cbdbe8;
border-radius: 10px;
background: #e8f1f8;
}
.inference-settings-viz__cumulative-segment {
display: flex;
align-items: center;
justify-content: center;
min-width: 0;
height: 100%;
overflow: hidden;
color: #ffffff;
font-size: 0.72rem;
font-weight: 800;
text-overflow: ellipsis;
white-space: nowrap;
}
.inference-settings-viz__cumulative-segment:first-child {
border-radius: 9px 0 0 9px;
}
.inference-settings-viz__cumulative-segment:nth-last-child(2) {
border-radius: 0 9px 9px 0;
}
.inference-settings-viz__cumulative-segment[data-included="false"] {
background: #cbd5e1 !important;
color: #475569;
}
.inference-settings-viz__threshold-marker {
position: absolute;
top: -0.42rem;
bottom: -0.42rem;
width: 2px;
transform: translateX(-1px);
background: #be123c;
color: #be123c;
font-size: 0;
}
.inference-settings-viz__threshold-marker::after {
content: "P threshold";
position: absolute;
left: 50%;
bottom: calc(100% + 0.18rem);
transform: translateX(-50%);
border: 1px solid #fecdd3;
border-radius: 999px;
background: #fff1f2;
color: #9f1239;
font-size: 0.66rem;
font-weight: 800;
line-height: 1;
padding: 0.2rem 0.34rem;
white-space: nowrap;
}
.inference-settings-viz__threshold-note {
margin: 0.7rem 0 0;
color: #475569;
font-size: 0.84rem;
line-height: 1.42;
}
.inference-settings-viz__threshold-note strong {
color: #0f3d58;
}
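/* Min-p view: per-candidate probability bars with a cutoff marker; rows below
   the cutoff are dimmed via data-included="false". */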
.inference-settings-viz__minp-bars {
display: grid;
gap: 0.45rem;
}
.inference-settings-viz__minp-row {
display: grid;
grid-template-columns: 4.35rem minmax(0, 1fr) 3.8rem;
align-items: center;
gap: 0.45rem;
}
.inference-settings-viz__minp-row > span {
overflow: hidden;
color: #334155;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.78rem;
font-weight: 700;
text-align: right;
text-overflow: ellipsis;
white-space: nowrap;
}
.inference-settings-viz__minp-row > code {
color: #334155;
font-size: 0.72rem;
font-weight: 800;
}
.inference-settings-viz__minp-track {
position: relative;
height: 1.1rem;
border: 1px solid #d8e3ed;
border-radius: 999px;
background: #edf4fa;
}
.inference-settings-viz__minp-fill {
height: 100%;
border-radius: 999px;
}
.inference-settings-viz__minp-marker {
position: absolute;
top: -0.28rem;
bottom: -0.28rem;
width: 2px;
transform: translateX(-1px);
background: #be123c;
}
.inference-settings-viz__minp-row[data-included="false"]
.inference-settings-viz__minp-fill {
opacity: 0.24;
}
.inference-settings-viz__minp-row[data-included="false"] > span,
.inference-settings-viz__minp-row[data-included="false"] > code {
color: #94a3b8;
}
.inference-settings-viz__bars {
display: grid;
gap: 0.42rem;
}
.inference-settings-viz__row {
display: grid;
grid-template-columns: 4.35rem minmax(0, 1fr) 4.45rem;
align-items: center;
gap: 0.45rem;
}
.inference-settings-viz__token {
overflow: hidden;
color: #334155;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.78rem;
font-weight: 700;
text-align: right;
text-overflow: ellipsis;
white-space: nowrap;
}
.inference-settings-viz__bar-track {
height: 1.4rem;
overflow: hidden;
border: 1px solid #d8e3ed;
border-radius: 6px;
background: #edf4fa;
}
.inference-settings-viz__bar-fill {
display: flex;
align-items: center;
justify-content: flex-end;
min-width: 0;
height: 100%;
border-radius: 5px;
color: #ffffff;
transition:
opacity 0.18s ease,
width 0.24s ease;
}
.inference-settings-viz__bar-fill span {
padding: 0 0.36rem;
font-family:
ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono",
"Courier New", monospace;
font-size: 0.68rem;
font-weight: 800;
}
.inference-settings-viz__row[data-included="false"]
.inference-settings-viz__bar-fill {
opacity: 0.22;
}
.inference-settings-viz__row[data-included="false"]
.inference-settings-viz__token {
color: #94a3b8;
}
.inference-settings-viz__row-state {
color: #64748b;
font-size: 0.68rem;
font-weight: 800;
letter-spacing: 0.04em;
text-transform: uppercase;
}
.inference-settings-viz__row[data-included="true"]
.inference-settings-viz__row-state {
color: #0f766e;
}
.inference-settings-viz__actions {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 0.45rem;
margin-top: 0.9rem;
}
.inference-settings-viz__actions button {
border: 1px solid #bad5e8;
border-radius: 8px;
background: #ffffff;
color: #0f4f76;
cursor: pointer;
font: inherit;
font-size: 0.82rem;
font-weight: 800;
line-height: 1;
padding: 0.55rem 0.7rem;
}
.inference-settings-viz__actions button:first-child {
border-color: #0b72ba;
background: #0b72ba;
color: #ffffff;
}
.inference-settings-viz__actions button:hover {
border-color: #0f4f76;
}
.inference-settings-viz__actions span {
color: #64748b;
font-size: 0.78rem;
font-weight: 700;
}
.quantization-explorer {
border: 1px solid #d7e4ef;
border-radius: 16px;
@@ -1899,7 +2409,10 @@ ol {
}
.lab-content ul.concept-pill-list > li {
grid-template-columns: 1fr;
row-gap: 0.3rem;
border-radius: 16px;
padding: 0.72rem 0.9rem;
}
.quantization-explorer__controls,
@@ -1912,6 +2425,23 @@ ol {
grid-template-columns: repeat(2, minmax(0, 1fr));
}
.inference-settings-viz {
padding: 0.9rem;
}
.inference-settings-viz__grid {
grid-template-columns: 1fr;
}
.inference-settings-viz__row {
grid-template-columns: 3.75rem minmax(0, 1fr);
}
.inference-settings-viz__row-state {
grid-column: 2;
margin-top: -0.22rem;
}
.objective5-chat__settings {
grid-template-columns: 1fr;
}
@@ -2051,7 +2581,9 @@ ol {
box-shadow: 0 12px 28px -22px rgba(15, 92, 139, 0.85);
}
.lab-content a.lab-service-pill {
.lab-content a.lab-service-pill,
.lab-content a.lab-open-pill,
.lab-content a.lab-download-pill {
display: inline-flex;
align-items: center;
gap: 0.45rem;
@@ -2072,7 +2604,9 @@ ol {
background-color 120ms ease;
}
.lab-content a.lab-service-pill::before {
.lab-content a.lab-service-pill::before,
.lab-content a.lab-open-pill::before,
.lab-content a.lab-download-pill::before {
content: "Open";
display: inline-flex;
align-items: center;
@@ -2086,7 +2620,13 @@ ol {
text-transform: uppercase;
}
.lab-content a.lab-service-pill:hover {
.lab-content a.lab-download-pill::before {
content: "Download";
}
.lab-content a.lab-service-pill:hover,
.lab-content a.lab-open-pill:hover,
.lab-content a.lab-download-pill:hover {
transform: translateY(-1px);
box-shadow: 0 12px 28px -22px rgba(15, 92, 139, 0.85);
}
@@ -2182,14 +2722,21 @@ ol {
.lab1-confidence__token {
position: relative;
border-radius: 0.42rem;
cursor: help;
padding: 0.12rem 0.08rem;
transition: filter 120ms ease;
}
.lab1-confidence__token:hover {
.lab1-confidence__token:hover,
.lab1-confidence__token[aria-describedby="lab1-confidence-tooltip"] {
filter: saturate(1.05);
}
.lab1-confidence__token:focus-visible {
outline: 2px solid rgba(15, 92, 139, 0.35);
outline-offset: 2px;
}
.lab1-confidence__token--very-high {
background: rgba(88, 185, 102, 0.3);
}
@@ -2211,25 +2758,24 @@ ol {
}
.lab1-confidence__tooltip {
position: absolute;
left: 0;
top: calc(100% + 0.45rem);
z-index: 5;
display: none;
position: fixed;
z-index: 50;
display: block;
min-width: 180px;
max-width: 260px;
max-width: min(260px, calc(100vw - 2rem));
border: 1px solid #d7e2ee;
border-radius: 0.85rem;
background: rgba(255, 255, 255, 0.98);
box-shadow: 0 18px 38px -26px rgba(17, 44, 73, 0.7);
color: #24384c;
padding: 0.7rem 0.8rem;
pointer-events: none;
transform: translateX(-50%);
white-space: normal;
}
.lab1-confidence__token:hover .lab1-confidence__tooltip,
.lab1-confidence__token:focus-visible .lab1-confidence__tooltip {
display: block;
.lab1-confidence__tooltip--above {
transform: translate(-50%, -100%);
}
.lab1-confidence__tooltip strong {