diff --git a/content/labs/lab-1-visualization-in-transformerlab.md b/content/labs/lab-1-visualization-in-transformerlab.md index 6aa73a0..1f31cbd 100644 --- a/content/labs/lab-1-visualization-in-transformerlab.md +++ b/content/labs/lab-1-visualization-in-transformerlab.md @@ -1,19 +1,20 @@ --- order: 1 -title: Lab 1 - Visualizing LLMs in TransformerLab -description: Explore model structure, tokenization, and next-token prediction inside TransformerLab. +title: Lab 1 - Model Structure, Tokenization, and Confidence Visualization +description: Explore GGUF model structure in Netron, inspect tokenization interactively, and visualize token confidence with a local Ollama model. --- -# Lab 1 - Visualizing LLMs in TransformerLab +# Lab 1 - Model Structure, Tokenization, and Confidence Visualization In this lab, we will: -- Download and Visualize LLama-3.2-1B-Instruct -- Visualize Tokenization & Prediction with LLama-3.2-1B-Instruct +- Visualize two small GGUF models in Netron +- Observe how text is split into tokens and token IDs +- Inspect the confidence of a local model one token at a time
Lab Flow Guide
@@ -21,258 +22,181 @@ In this lab, we will: Execute steps require performing actions in the lab environment.
-## Objective 1: Starting TransformerLab +## Objective 1: Visualize Tokenization and Token IDs -### Execute: Access the Lab Environment +### Execute: Use the Tokenizer Playground -To start Lab 1, ensure you've received a WireGuard configuration and system IP from your instructor. If you're unfamiliar with WireGuard, assistance will be provided to ensure you can access the lab environment for the duration of class. +The embedded tool below allows you to enter raw text and observe how it's converted into model tokens. Tokenization is the critical first step that enables a Large Language Model to process and understand user input, accomplished by converting text into numerical token IDs. -All systems use the default username and password of `student`. All labs are located in the student home folder. To start Lab 1, run +
-```bash -~/lab1/lab1_start.sh -``` +### Explore: Try Multiple Inputs -using the `lab1_start.sh` script in the `lab1` folder. +Enter several different inputs and compare how the tokenization changes. Use at least these three examples: -Lastly, if necessary, you can `su -` to root at any time. No password will be required. +1. `The quick brown fox jumps over the lazy dog` +2. `cybersecurity analyst` +3. `printf("hello");` -Once started, you can reach TransformerLab on port 8338 of your Lab VM (http://:8338). +Then try a few of your own. Short English phrases, punctuation, code, and unusual spacing are all good choices. -## Objective 2: Visualizing a LLM +### Explore: Compare the Two Tokenization Views -### Explore: Understand the Model and Runtime +This tool is especially useful because it shows both: -The next steps will guide us through the process of deploying and interacting with a pre-trained LLM, `LLama-3.2-1B-Instruct`. To do this, we’ll be utilizing an inference engine – software designed to execute LLM models and generate token predictions. You'll encounter models packaged in the **GGUF** format, a file format designed for efficient storage and loading of quantized LLMs, enabling them to run on a wider range of hardware. Don't worry if these terms are new to you – the specifics of inference engines and the details of **GGUF** quantized LLMs will be thoroughly explained in the following section of this course. +- The **visual split** of the text into tokens +- The underlying **token ID values** -Normally to start, we'll need to install an **inference engine** capable of running **GGUF** files. +Those are two views of the same process. -### Execute: Verify the FastChat Plugin +The visual split helps us see where the model grouped characters or subwords together. The token ID view reminds us that the model never consumes English directly. It consumes numeric identifiers that point into the tokenizer vocabulary. -Navigate to **Plugins**, and in the search bar type `Fastchat`. Note that it has already been installed for you! +As you work through your examples, ask: -
-[Screenshot: Plugins]
+- Which full words remain intact? +- Which words get split into subwords or punctuation chunks? +- When spacing changes, do the token IDs change too? -### Execute: Find and Load `LLama-3.2-1B-Instruct` +Lastly, experiment with how different tokenizers can split the same input into different tokens and numerical IDs. How might this affect the next steps in the transformation process? -Next, navigate to **Model Registry**. You should see `LLama-3.2-1B-Instruct` right away on your screen, but if not, please start searching for this model using the search bar. - -
-[Screenshot: Model Registry Selection.]
- -Once downloaded, Select **Foundation** & our newly downloaded `LLama-3.2-1B-Instruct` model. - -
-[Screenshot: Model Selection]
- -Once selected, click **Run**. Give TransformerLab a moment to successfully load the model. - -
-[Screenshot: Starting a Model]
- -### Explore: Inspect the Architecture View - -To start, lets navigate to the **Interact** page, and then select **Model Architecture** from the Chat drop down. - -
-[Screenshot: Model Architecture Dropdown]
- -This page allows us to visualize the actively loaded model, in this case our downloaded `LLama-3.2-1B-Instruct-`. This interactive view is equivalent to the greatly simplified version shown on the slide “Transformation: Multylayer Perceptron” from our lecture. We can explore this view by: - -- Holding down both right and left mouse buttons and dragging will move the entire model. -- Holding down just the left mouse button will allow you to rotate the view. - -
-[Screenshot: Model Visualization]
- -### Explore: Interpret Layers, Blocks, and Parameters - -Each layer of the model performs a specific task, taking the input provided, and transforming it into the statistically most likely completion of text, token by token. This format of Llama 3.1 1B is made up of 372 **layers**. Each layer will transform the input of the layer above it, until eventually, we end up with the statically likely completion. -You have likely also noticed that the colors repeat. Each set of repeating **layers** is organized into **blocks**. Each **block** is a grouping of **layers** that perform the same functions, but with a slightly different focus. For example, one **block** may focus on nouns, and another may focus on adjectives, and so on. - -The **layers** within Llama 3.1 1B are as follows: - - - -Each of these **layers** also has a different type, corresponding to Q, K, V, and much more. 5. The **layers** between the small “Attention” **layers** are all considered to make up a single “block.” -To the side, we can see the actual number values of each weight within each layer. - -Fundamentally, the LLM itself is this stack of numbers. Those numbers allow us to transform tokenized input (such as English), and transform that into a useful output. The more **layers** & **blocks**, the bigger the model, the more accurate and “intelligent” the model will behave. This 1B parameter model is incredibly small however, so the “truthfulness” of generated predictions is likely to be suspect (aka Hallucinated). The model will at least sound very confident however! - -
+
+[Screenshot: Tokenization - GPT3]
+[Screenshot: Tokenization - GPT4]
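If you want to reproduce these splits outside the embedded playground, the short sketch below does the same thing in code. It assumes the `js-tiktoken` npm package and the `cl100k_base` encoding (the GPT-4-era tokenizer); the embedded tool may use a different tokenizer, so the exact IDs you see can differ.

```ts
// Minimal tokenization sketch (assumes: npm install js-tiktoken).
// The playground above may use a different tokenizer, so treat the
// exact IDs as illustrative rather than authoritative.
import { getEncoding } from "js-tiktoken";

const samples = [
  "The quick brown fox jumps over the lazy dog",
  "cybersecurity analyst",
  'printf("hello");',
];

// cl100k_base is the GPT-4-era encoding; earlier GPT-3 models used different vocabularies.
const encoding = getEncoding("cl100k_base");

for (const text of samples) {
  const ids = encoding.encode(text);                      // the numeric token IDs
  const pieces = ids.map((id) => encoding.decode([id]));  // the visual split
  console.log({ text, ids, pieces });
}
```

Changing the spacing or punctuation in a sample and re-running the sketch is a quick way to confirm the same behavior you observed in the playground.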
--- -## Objective 3: Tokenization & Prediction with LLama-3.2-1B-Instruct -### Execute: Interactive Chat +## Objective 2: Open Netron and Download the Lab Models -Lets next move on to active conversation with the model. Navigate to the **Chat** tab from the dropdown menu. +### Execute: Launch Netron -
-[Screenshot: Select Chat]
+For this lab, model visualization now happens in **Netron**, a lightweight browser tool for inspecting model structure. -Once loaded, feel free to type any message and interact with the model in any way. To speed up the pace of our lab, I recommend setting your maximum output length to 64 tokens. +Use the launch panel below to open the local Netron service on port `8338`. -
-[Screenshot: Maximum Length - 64]
+
-If text generation fails, or acts weird (such as merely repeating your input back to you), unload and reload the model using the previous Foundation screen from the last Objective. +### Execute: Download the Two GGUF Files -### Execute: View Tokenization +You will work with two small GGUF models in this objective: -If everything is in working order, review the **Tokenize** view. This allows us to visually see how Llama 3.2 will convert our input text into “tokens,” or numbers that represent the input English. Feel free to input any sentence into the box to review what the final tokenized version will be. +- [Qwen 3 0.6B](/api/lab1/models/qwen3-0.6b-q8_0.gguf) +- [Llama 3.2 1B](/api/lab1/models/llama-3.2-1b-q4_k_m.gguf) -
-[Screenshot: Tokenize View]
+These files are intentionally small enough to make architecture exploration practical in a classroom lab. Download both files to a convenient location such as your `Downloads` folder. Once you've downloaded your files, you can open them using the "Open Model" Button on the Netron Homepage. -### Execute: Visualize Next-Token Activations +
+[Screenshot: Netron Start Page]
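If you prefer to script the downloads instead of clicking the links above, a minimal Node sketch is shown below. The two URL paths are the same ones linked earlier in this objective; the base URL is an assumption and should be replaced with your lab site's address.

```ts
// Download both Lab 1 GGUF files by streaming them to disk (sketch).
// Assumes Node 18+ (global fetch); LAB_BASE_URL is a placeholder for your lab site.
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as WebReadableStream } from "node:stream/web";

const LAB_BASE_URL = process.env.LAB_BASE_URL ?? "http://localhost:3000";

const files = ["qwen3-0.6b-q8_0.gguf", "llama-3.2-1b-q4_k_m.gguf"];

for (const name of files) {
  const response = await fetch(`${LAB_BASE_URL}/api/lab1/models/${name}`);
  if (!response.ok || !response.body) {
    throw new Error(`Download failed for ${name}: HTTP ${response.status}`);
  }
  // Stream straight to disk so the multi-hundred-megabyte files are never held in memory.
  await pipeline(
    Readable.fromWeb(response.body as unknown as WebReadableStream),
    createWriteStream(name),
  );
  console.log(`Saved ${name}`);
}
```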
-Next, select Model Activations. By entering “The quick brown fox” and selecting visualize, we can see how the model selects the next word, and the models level of confidence. Also feel free to redo this process with alternative sentences. +Once Netron is open: -
-[Screenshot: Next Word Prediction]
+1. Select **Open Model** or drag a GGUF file directly into the browser window. +2. Start with `Qwen 3 0.6B`. -### Execute: Compare Confidence Views +Netron will display the model as a graph of tensors, operators, and named blocks. This is a more literal view than the simplified lecture diagrams, but it is still showing the same fundamental idea: the model is a large stack of numeric values, each serving a different role in modeling language. -Note how confident the model is about the word jumps in this famous phrase. For an alternative view of the same output, you can also select the **Visualize Logprobes** option from the menu, which will show the same information but by color. +### Explore: What to Look For -
-[Screenshot: Green is Confident. Red is less confident.]
+As you move around the graph, focus on these three recurring structures. Each grouping of these individual *layers* is what defines a *block*: -### Explore: Continue Exploring TransformerLab Features -Please continue to explore Transformers Lab until you're ready to move on. While we will utilize many different tools other than Transformers Lab throughout this course due to its beta nature, this software is improving all the time and is worth watching! Transformers lab supports many advanced features, in various stages of development, such as: +Notably, Qwen 3 0.6B is composed of 28 of these blocks! This is significantly more than the original GPT-2 (12 blocks), even though it is still a very small model by modern standards! -- Batch Text Generation -- LLM Fine Tuning -- LLM Evaluation -- Retrieval Augmented Generation (RAG) - We will discuss these topics and more throughout the course. +Lastly, you may see labels such as **MatMul**, **Mul**, or **mulmat**, depending on how the graph was exported and named. In practice, these are often part of the feed-forward path that expands and reshapes the model's internal representation before passing it onward. -
+**Compare the Two Small Models** +Both models are small compared to modern production systems, but they are still large enough to reveal repeating architectural patterns. +As you compare them, ask: +- Where do the repeating blocks begin to stand out? +- Which names remain stable between the two models? +- How many *Attention Heads* does each model have? How might this affect the transformations the model can perform? +
+[Screenshot: Netron Qwen 3 0.6B Layers 1 & 2]
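To reinforce that a GGUF file really is just structured numeric data, the sketch below reads the small fixed header at the start of either downloaded file. The field layout (magic, version, tensor count, metadata key/value count) follows the published GGUF specification; parsing the full metadata, which is where values such as the attention-head counts live, is more involved and left out here.

```ts
// Read the fixed-size GGUF header from a local model file (sketch).
// GGUF layout, little-endian: uint32 magic "GGUF", uint32 version,
// uint64 tensor_count, uint64 metadata_kv_count.
import { closeSync, openSync, readSync } from "node:fs";

// Assumes the file sits in the current directory; pass another path as an argument.
const filePath = process.argv[2] ?? "qwen3-0.6b-q8_0.gguf";

const fd = openSync(filePath, "r");
const header = Buffer.alloc(24);
readSync(fd, header, 0, 24, 0);
closeSync(fd);

const magic = header.toString("ascii", 0, 4);
if (magic !== "GGUF") {
  throw new Error(`Not a GGUF file (magic was "${magic}")`);
}

console.log("version:          ", header.readUInt32LE(4));
console.log("tensor count:     ", header.readBigUInt64LE(8));
console.log("metadata KV count:", header.readBigUInt64LE(16));
```

The tensor count printed here should line up with the number of named tensors Netron displays for the same file.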
+ +--- + + +## Objective 3: Visualize Prediction Confidence + +### Execute: Run the Local Confidence Widget + +The widget below talks to the preloaded local Lab 1 model through Ollama. Enter any prompt you like, generate a response, and then hover over the output tokens. + +
+ +### Explore: Interpret the Color Coding + +Each token in the output is colored by the model's confidence in that selected token. + +In general: + +- Greener tokens indicate the model was more confident in that choice +- Warmer yellow or orange tokens indicate a weaker preference +- Hovering over a token reveals the selected token's percentage and the strongest alternate predictions + +This is useful because it shows us that model output is not magic or certainty. Each generated token is chosen from a probability distribution over many possible next tokens. + +### Explore: Try Different Prompt Styles + +To make the confidence view more interesting, compare: + +1. A common phrase such as `The quick brown fox` +2. A factual question +3. A short cybersecurity prompt + +Notice where the model appears highly certain and where it becomes less stable. Small local models often produce text that sounds very confident even when the underlying prediction distribution is more fragile than it first appears. + +
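Under the hood, the widget's API route asks the local Ollama server for per-token log-probabilities and converts them into the percentages shown on hover. A stripped-down version of that request is sketched below; it assumes Ollama is listening on its default port, that the installed build honors the OpenAI-compatible `logprobs` option, and that the Lab 1 model alias is available (adjust the model name to one you have pulled).

```ts
// Ask local Ollama for a short completion plus per-token logprobs (sketch).
// Assumes an OpenAI-compatible /v1/chat/completions endpoint and a build of
// Ollama that returns logprobs; the model alias below mirrors the lab setup.
const response = await fetch("http://127.0.0.1:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "lab1-qwen3-0.6b-q8_0",
    messages: [{ role: "user", content: "The quick brown fox" }],
    max_tokens: 32,
    temperature: 0.7,
    logprobs: true,
    top_logprobs: 5,
    stream: false,
  }),
});

const completion = await response.json();
const tokenEntries = completion.choices?.[0]?.logprobs?.content ?? [];

for (const entry of tokenEntries) {
  // exp(logprob) turns a log-probability back into a 0-1 probability,
  // e.g. a logprob of -0.1 is roughly 90% confidence in that token.
  const percent = (Math.exp(entry.logprob) * 100).toFixed(1);
  console.log(`${JSON.stringify(entry.token)} -> ${percent}%`);
}
```

Tokens with percentages near 100 will render green in the widget, while tokens where the model split its probability across several candidates will trend toward yellow and orange.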
+ Screenshot Placeholder + Confidence heatmap and hover tooltip view. +
--- ## Conclusion -In this lab, we observed the foundational concepts of all LLMs in action using TransformerLab. Through hands-on exploration, we observed the process of tokenization – how text is converted into numerical representations for the model – and visualized the model's prediction process, including its confidence levels for different token selections. By navigating the model’s layers and blocks, we gained an appreciation for the sheer scale and complexity inherent in modern LLMs. +In this lab, we explored three foundational views of an LLM. -This initial experience provides a crucial stepping stone for further exploration of LLMs, laying the groundwork for future labs focused on fine-tuning, evaluation, and advanced techniques like Retrieval Augmented Generation. +First, we used a tokenizer playground to see how plain text becomes tokens and token IDs. Then we opened two GGUF model files in Netron and inspected the architecture directly. Finally, we used a local confidence visualizer to watch a small model generate output token by token while exposing how certain it was about each choice. + +Together, these three perspectives give us a much more grounded picture of what an LLM actually is: a structured file of learned weights, a tokenizer that converts text into IDs, and a prediction engine that selects the next token from a probability distribution. diff --git a/content/labs/lab-3-llama-cpp-and-ollama.md b/content/labs/lab-3-llama-cpp-and-ollama.md index ea2573b..61e26c0 100644 --- a/content/labs/lab-3-llama-cpp-and-ollama.md +++ b/content/labs/lab-3-llama-cpp-and-ollama.md @@ -179,7 +179,7 @@ We should then see: A text listing of all of the model's tensors, and the precision of each. Because we have merely converted the model's format, and not performed quantization, the model is still in **FP16**. -- This is a text view of the previous graphical view we saw in **Lab 1, Objective 2: Visualizing a LLM**. While **TransformerLab** calls tensors **layers**, terms such as **tensors**, **layers**, and **blocks** can all be used semi-interchangeably, depending on the tool in question. We will further confuse these topics when we get to the Ollama objective below. +- This is a text view of the previous graphical view we saw in **Lab 1, Objective 2: Open Netron and Download the Lab Models**. While tools such as **Netron** may expose tensors, operators, and repeating blocks with different labels, terms such as **tensors**, **layers**, and **blocks** can still be used semi-interchangeably at this level of discussion. We will further confuse these topics when we get to the Ollama objective below. 
- Pedantically, the proper definitions are: - Tensor - A multi-dimensional array of vectors to store data - Layer - A base computational unit in a neural network diff --git a/package-lock.json b/package-lock.json index 6141eec..7bbdd3d 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1307,9 +1307,6 @@ "arm64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -1327,9 +1324,6 @@ "arm64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MIT", "optional": true, "os": [ @@ -1347,9 +1341,6 @@ "ppc64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -1367,9 +1358,6 @@ "s390x" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -1387,9 +1375,6 @@ "x64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MIT", "optional": true, "os": [ @@ -1407,9 +1392,6 @@ "x64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MIT", "optional": true, "os": [ @@ -5520,9 +5502,6 @@ "arm64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MPL-2.0", "optional": true, "os": [ @@ -5544,9 +5523,6 @@ "arm64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MPL-2.0", "optional": true, "os": [ @@ -5568,9 +5544,6 @@ "x64" ], "dev": true, - "libc": [ - "glibc" - ], "license": "MPL-2.0", "optional": true, "os": [ @@ -5592,9 +5565,6 @@ "x64" ], "dev": true, - "libc": [ - "musl" - ], "license": "MPL-2.0", "optional": true, "os": [ diff --git a/src/app/api/lab1/chat/route.ts b/src/app/api/lab1/chat/route.ts new file mode 100644 index 0000000..5bafdf1 --- /dev/null +++ b/src/app/api/lab1/chat/route.ts @@ -0,0 +1,163 @@ +import { NextResponse } from "next/server"; + +import { normalizeUpstreamChatEndpoint } from "~/lib/lab2-chat"; +import { + clampLab1Messages, + extractLab1AssistantContent, + extractLab1ResponseTokens, + getLab1SystemPrompt, + LAB1_CONFIDENCE_MODEL_ALIAS, + LAB1_DEFAULT_MAX_TOKENS, + LAB1_DEFAULT_TEMPERATURE, + type Lab1ConfidenceMessage, +} from "~/lib/lab1-confidence"; + +type ChatRouteRequestBody = { + messages?: Lab1ConfidenceMessage[]; +}; + +const LOCAL_OLLAMA_TIMEOUT_MS = 90000; + +function getLocalOllamaEndpoint() { + const configuredBaseUrl = + process.env.COURSEWARE_OLLAMA_BASE_URL?.trim() || "http://127.0.0.1:11434"; + + return normalizeUpstreamChatEndpoint(configuredBaseUrl); +} + +function getLab1ModelAlias() { + return ( + process.env.COURSEWARE_LAB1_OLLAMA_MODEL_ALIAS?.trim() || + LAB1_CONFIDENCE_MODEL_ALIAS + ); +} + +export async function POST(request: Request) { + let body: ChatRouteRequestBody; + + try { + body = (await request.json()) as ChatRouteRequestBody; + } catch { + return NextResponse.json( + { + error: "The request body must be valid JSON.", + }, + { status: 400 }, + ); + } + + if (!Array.isArray(body.messages) || body.messages.length === 0) { + return NextResponse.json( + { + error: "At least one chat message is required.", + }, + { status: 400 }, + ); + } + + const controller = new AbortController(); + const timeoutId = setTimeout( + () => controller.abort(), + LOCAL_OLLAMA_TIMEOUT_MS, + ); + + try { + const upstreamResponse = await fetch(getLocalOllamaEndpoint(), { + body: JSON.stringify({ + logprobs: true, + max_tokens: LAB1_DEFAULT_MAX_TOKENS, + messages: [ + { + content: getLab1SystemPrompt(), + role: "system", + }, + ...clampLab1Messages(body.messages), + ], + model: getLab1ModelAlias(), + stream: false, + temperature: LAB1_DEFAULT_TEMPERATURE, + top_logprobs: 5, + }), + headers: { + "Content-Type": "application/json", + }, + 
method: "POST", + signal: controller.signal, + }); + + const responseText = await upstreamResponse.text(); + const parsedBody = JSON.parse(responseText) as unknown; + + if (!upstreamResponse.ok) { + const message = + typeof parsedBody === "object" && + parsedBody !== null && + "error" in parsedBody && + typeof parsedBody.error === "object" && + parsedBody.error !== null && + "message" in parsedBody.error && + typeof parsedBody.error.message === "string" + ? parsedBody.error.message + : `The local Ollama endpoint returned ${upstreamResponse.status}.`; + + return NextResponse.json( + { + error: message, + }, + { status: upstreamResponse.status }, + ); + } + + if (!parsedBody || typeof parsedBody !== "object") { + return NextResponse.json( + { + error: "The local Ollama endpoint returned an unreadable response.", + }, + { status: 502 }, + ); + } + + const tokens = extractLab1ResponseTokens(parsedBody); + if (tokens.length === 0) { + return NextResponse.json( + { + error: + "The local Ollama response did not include token logprobs. Confirm the installed Ollama version supports logprobs.", + }, + { status: 502 }, + ); + } + + const content = + extractLab1AssistantContent(parsedBody) || + tokens.map((token) => token.token).join(""); + + return NextResponse.json({ + content, + model: + ("model" in parsedBody && typeof parsedBody.model === "string" + ? parsedBody.model + : getLab1ModelAlias()), + role: "assistant", + tokens, + }); + } catch (caughtError) { + if (caughtError instanceof Error && caughtError.name === "AbortError") { + return NextResponse.json( + { + error: `The local Ollama endpoint timed out after ${Math.floor(LOCAL_OLLAMA_TIMEOUT_MS / 1000)} seconds.`, + }, + { status: 504 }, + ); + } + + return NextResponse.json( + { + error: "The Lab 1 confidence route could not reach the local Ollama endpoint.", + }, + { status: 502 }, + ); + } finally { + clearTimeout(timeoutId); + } +} diff --git a/src/app/api/lab1/models/[filename]/route.ts b/src/app/api/lab1/models/[filename]/route.ts new file mode 100644 index 0000000..b7f96e5 --- /dev/null +++ b/src/app/api/lab1/models/[filename]/route.ts @@ -0,0 +1,75 @@ +import { createReadStream, statSync } from "fs"; +import path from "path"; +import { Readable } from "stream"; + +import { NextResponse } from "next/server"; + +const modelFileMap = { + "llama-3.2-1b-q4_k_m.gguf": { + envKey: "COURSEWARE_LAB1_LLAMA_MODEL_PATH", + fileName: "Llama-3.2-1B.Q4_K_M.gguf", + }, + "qwen3-0.6b-q8_0.gguf": { + envKey: "COURSEWARE_LAB1_QWEN_MODEL_PATH", + fileName: "Qwen3-0.6B-Q8_0.gguf", + }, +} as const; + +type ModelSlug = keyof typeof modelFileMap; + +function resolveModelPath(slug: string) { + const config = modelFileMap[slug as ModelSlug]; + if (!config) { + return null; + } + + const configuredPath = process.env[config.envKey]?.trim(); + if (!configuredPath) { + return null; + } + + return { + absolutePath: path.resolve(configuredPath), + fileName: config.fileName, + }; +} + +export async function GET( + _request: Request, + context: { params: Promise<{ filename: string }> }, +) { + const { filename } = await context.params; + const resolvedFile = resolveModelPath(filename.toLowerCase()); + + if (!resolvedFile) { + return NextResponse.json( + { + error: "The requested Lab 1 model file was not found.", + }, + { status: 404 }, + ); + } + + try { + const fileStats = statSync(resolvedFile.absolutePath); + const stream = Readable.toWeb( + createReadStream(resolvedFile.absolutePath), + ) as ReadableStream; + + return new NextResponse(stream, { + headers: { + 
"Cache-Control": "private, max-age=0, must-revalidate", + "Content-Disposition": `attachment; filename="${resolvedFile.fileName}"`, + "Content-Length": String(fileStats.size), + "Content-Type": "application/octet-stream", + }, + }); + } catch { + return NextResponse.json( + { + error: "The requested Lab 1 model file could not be opened.", + }, + { status: 404 }, + ); + } +} diff --git a/src/components/labs/Lab1ConfidenceChat.test.tsx b/src/components/labs/Lab1ConfidenceChat.test.tsx new file mode 100644 index 0000000..044b427 --- /dev/null +++ b/src/components/labs/Lab1ConfidenceChat.test.tsx @@ -0,0 +1,84 @@ +import { fireEvent, render, screen } from "@testing-library/react"; +import { beforeEach, describe, expect, it, vi } from "vitest"; + +import { Lab1ConfidenceChat } from "~/components/labs/Lab1ConfidenceChat"; + +describe("Lab1ConfidenceChat", () => { + beforeEach(() => { + vi.restoreAllMocks(); + }); + + it("renders colorized tokens and tooltip data from the Lab 1 chat route", async () => { + vi.stubGlobal( + "fetch", + vi.fn(async () => { + return { + json: async () => ({ + content: "often works", + model: "lab1-qwen3-0.6b-q8_0", + role: "assistant", + tokens: [ + { + logprob: Math.log(0.4), + probability: 40, + token: "often", + topAlternatives: [ + { probability: 14, token: "commonly" }, + { probability: 10, token: "also" }, + ], + }, + { + logprob: Math.log(0.8), + probability: 80, + token: " works", + topAlternatives: [], + }, + ], + }), + ok: true, + }; + }), + ); + + render(); + + fireEvent.change(screen.getByLabelText("Prompt"), { + target: { value: "Explain how often phishing succeeds." }, + }); + fireEvent.submit( + screen.getByRole("button", { name: "Generate Output" }).closest("form")!, + ); + + expect(await screen.findByLabelText("often 40.0%")).toBeInTheDocument(); + expect(screen.getByText("14.0%:")).toBeInTheDocument(); + expect(screen.getByText("commonly")).toBeInTheDocument(); + expect(screen.getByText("lab1-qwen3-0.6b-q8_0")).toBeInTheDocument(); + }); + + it("shows an inline error when the local route fails", async () => { + vi.stubGlobal( + "fetch", + vi.fn(async () => { + return { + json: async () => ({ + error: "The local Ollama request failed.", + }), + ok: false, + }; + }), + ); + + render(); + + fireEvent.change(screen.getByLabelText("Prompt"), { + target: { value: "Trigger an error." 
}, + }); + fireEvent.submit( + screen.getByRole("button", { name: "Generate Output" }).closest("form")!, + ); + + expect( + await screen.findByText("The local Ollama request failed."), + ).toBeInTheDocument(); + }); +}); diff --git a/src/components/labs/Lab1ConfidenceChat.tsx b/src/components/labs/Lab1ConfidenceChat.tsx new file mode 100644 index 0000000..ad35dfb --- /dev/null +++ b/src/components/labs/Lab1ConfidenceChat.tsx @@ -0,0 +1,244 @@ +"use client"; + +import { FormEvent, useState } from "react"; + +import { + formatProbabilityPercent, + getConfidenceBand, + type Lab1ConfidenceMessage, + type Lab1ConfidenceResponse, + type Lab1ResponseToken, +} from "~/lib/lab1-confidence"; + +type UserTurn = { + content: string; + id: string; + role: "user"; +}; + +type AssistantTurn = Lab1ConfidenceResponse & { + error?: string; + id: string; +}; + +type ChatTurn = AssistantTurn | UserTurn; + +const starterPrompts = [ + "The quick brown fox", + "Write one sentence explaining what a firewall does.", + "List three words that describe a phishing email.", +] as const; + +function buildTurnId() { + return `lab1-turn-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`; +} + +function toConversation(messages: ChatTurn[]) { + return messages.map(({ content, role }) => ({ content, role })); +} + +function renderTooltip(token: Lab1ResponseToken) { + return ( + + {formatProbabilityPercent(token.probability)} + {token.topAlternatives.length > 0 ? ( + + {token.topAlternatives.map((candidate) => ( + + {formatProbabilityPercent(candidate.probability)}:{" "} + {candidate.token} + + ))} + + ) : ( + + No alternate tokens returned for this position. + + )} + + ); +} + +export function Lab1ConfidenceChat() { + const [draft, setDraft] = useState(starterPrompts[0]); + const [messages, setMessages] = useState([]); + const [error, setError] = useState(null); + const [isSubmitting, setIsSubmitting] = useState(false); + + async function handleSubmit(event: FormEvent) { + event.preventDefault(); + + const prompt = draft.trim(); + if (!prompt) { + setError("Enter a prompt to inspect the model output."); + return; + } + + const nextUserTurn: UserTurn = { + content: prompt, + id: buildTurnId(), + role: "user", + }; + const nextConversation = [...messages, nextUserTurn]; + + setMessages(nextConversation); + setDraft(""); + setError(null); + setIsSubmitting(true); + + try { + const response = await fetch("/api/lab1/chat", { + body: JSON.stringify({ + messages: toConversation(nextConversation), + }), + headers: { + "Content-Type": "application/json", + }, + method: "POST", + }); + + const payload = (await response.json()) as Lab1ConfidenceResponse & { + error?: string; + }; + + if (!response.ok) { + throw new Error(payload.error || "The local Ollama request failed."); + } + + setMessages((currentMessages) => [ + ...currentMessages, + { + ...payload, + id: buildTurnId(), + }, + ]); + } catch (caughtError) { + setError( + caughtError instanceof Error + ? caughtError.message + : "The local Ollama request failed.", + ); + } finally { + setIsSubmitting(false); + } + } + + return ( +
+
+

Lab 1 Confidence View

+

Visualize token confidence locally

+

+ This widget uses the preloaded local Lab 1 Qwen model. Hover over any + output token to inspect its probability and the strongest alternate + predictions returned for that position. +

+
+ +
+ {starterPrompts.map((prompt) => ( + + ))} +
+ +
+ {messages.length === 0 ? ( +
+ Try a short prompt first. +

+ Start with one of the suggested prompts, then hover across the + model output to compare high-confidence and low-confidence tokens. +

+
+ ) : ( + messages.map((message) => { + if (message.role === "user") { + return ( +
+
+ You +
+
+                    {message.content}
+                  
+
+ ); + } + + return ( +
+
+ Assistant + {message.model} +
+ +
+ {message.tokens.map((token, index) => ( + + {token.token} + {renderTooltip(token)} + + ))} +
+ + {message.error ? ( +

+ {message.error} +

+ ) : null} +
+ ); + }) + )} +
+ +
+ +