---
order: 1
title: Lab 1 - Model Structure, Tokenization, and Confidence Visualization
description: Explore GGUF model structure in Netron, inspect tokenization interactively, and visualize token confidence with a local Ollama model.
---

<!-- breakout-style: instruction-rails -->
<!-- step-style: underline -->
<!-- objective-style: divider -->

# Lab 1 - Model Structure, Tokenization, and Confidence Visualization

In this lab, we will:

- Observe how text is split into tokens and token IDs
- Visualize a small GGUF model in Netron
- Inspect the confidence of a local model one token at a time

<div class="lab-callout lab-callout--info">
<strong>Lab Flow Guide</strong><br />
<strong>Explore</strong> sections focus on observation and interpretation.<br />
<strong>Execute</strong> steps require performing actions in the lab environment.
</div>

## Objective 1: Visualize Tokenization and Token IDs

### Execute: Use the Tokenizer Playground

The embedded tool below allows you to enter raw text and observe how it is converted into model tokens. Tokenization is the critical first step that enables a Large Language Model to process and understand user input: it transforms text into numerical values.

<div data-tokenizer-playground></div>

### Explore: Try Multiple Inputs

Enter several different inputs and compare how the tokenization changes. Use at least these three examples:

1. `The quick brown fox jumps over the lazy dog`
2. `cybersecurity analyst`
3. `printf("hello");`

Then try a few of your own. Short English phrases, punctuation, code, and unusual spacing are all good choices.

### Explore: Compare the Two Tokenization Views

This tool is especially useful because it shows both:

- The **visual split** of the text into tokens
- The underlying **token ID values**

Those are two views of the same process.

The visual split helps us see where the model grouped characters or subwords together. The token ID view reminds us that the model never consumes English directly. It consumes numeric identifiers that point into the tokenizer vocabulary.

As you work through your examples, ask:

- Which full words remain intact?
- Which words get split into subwords or punctuation chunks?
- When spacing changes, do the token IDs change too?

Lastly, experiment with how different tokenizers can split the same input into different numerical values. How might this affect the next steps in the transformation process?
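To make the text-to-IDs mapping concrete, here is a toy sketch of the idea. This is **not** a real tokenizer: the eight-entry vocabulary is invented for illustration, and real BPE tokenizers learn tens of thousands of merge rules from data. It only shows the core mechanism of looking pieces of text up in a vocabulary and emitting their numeric IDs.

```python
# Hypothetical 8-entry vocabulary for illustration only; real tokenizers
# have vocabularies of roughly 50k-200k learned entries.
VOCAB = {
    "The": 0, " quick": 1, " brown": 2, " fox": 3,
    "cyber": 4, "security": 5, " analyst": 6, "<unk>": 7,
}

def encode(text: str, vocab: dict) -> list:
    """Greedy longest-match lookup: walk the text, always taking the
    longest vocabulary entry that matches at the current position."""
    ids = []
    i = 0
    while i < len(text):
        match = None
        for piece, pid in vocab.items():
            if text.startswith(piece, i) and (match is None or len(piece) > len(match[0])):
                match = (piece, pid)
        if match is None:            # nothing matches: emit <unk>, skip one char
            ids.append(vocab["<unk>"])
            i += 1
        else:
            ids.append(match[1])
            i += len(match[0])
    return ids

print(encode("The quick brown fox", VOCAB))    # [0, 1, 2, 3]
print(encode("cybersecurity analyst", VOCAB))  # [4, 5, 6]
```

Notice that `"cybersecurity"` splits into two IDs while `" fox"` (with its leading space) stays one piece, mirroring what the playground shows: the same characters can tokenize differently depending on spacing and vocabulary.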

<figure style="text-align:center;">
<a href="https://i.imgur.com/kc8W4gU.png" target="_blank">
<img src="https://i.imgur.com/kc8W4gU.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Tokenization - GPT3</figcaption>
</figure>

<br>

<figure style="text-align:center;">
<a href="https://i.imgur.com/xMKEBwB.png" target="_blank">
<img src="https://i.imgur.com/xMKEBwB.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Tokenization - GPT4</figcaption>
</figure>

---

## Objective 2: Open Netron and Download the Lab Models

### Execute: Launch Netron

For this lab, model visualization happens in **Netron**, a lightweight browser tool for inspecting model structure.

Use the launch panel below to open the local Netron service on port `8338`.

<div data-lab1-netron-panel></div>

### Execute: Download the GGUF File

You will work with a small GGUF model in this objective. Download the provided file:

- [Llama-3.2-1B.Q4_K_M.gguf](/api/lab1/models/llama-3.2-1b-q4_k_m.gguf) for Llama 3.2 1B

This file is intentionally small enough to make architecture exploration practical in a classroom lab. Save it to a convenient location such as your `Downloads` folder. Once you've downloaded the file, you can open it using the "Open Model" button on the Netron home page.
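If you are curious what Netron is actually reading, you can peek at the file's fixed-size header yourself. The sketch below assumes the GGUF v2/v3 header layout (4-byte `GGUF` magic, then little-endian version, tensor count, and metadata key/value count); treat it as an exploration aid, not a full parser.

```python
import struct

def peek_gguf_header(data: bytes) -> dict:
    """Parse just the fixed-size GGUF header: 4-byte magic, then
    little-endian uint32 version, uint64 tensor count, and uint64
    metadata key/value count (per the GGUF v2/v3 layout)."""
    magic = data[:4]
    if magic != b"GGUF":
        raise ValueError("not a GGUF file (magic=%r)" % magic)
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Example usage against the downloaded file (path is illustrative):
#   with open("Llama-3.2-1B.Q4_K_M.gguf", "rb") as f:
#       print(peek_gguf_header(f.read(24)))
```

The tensor count you see here corresponds directly to the nodes Netron will draw in the next step.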

<figure style="text-align:center;">
<a href="https://i.imgur.com/Y7QpGpG.png" target="_blank">
<img src="https://i.imgur.com/Y7QpGpG.png" width="800" style="border:5px solid black;">
</a>
<figcaption>Netron Start Page</figcaption>
</figure>

Once Netron is open:

1. Select **Open Model** or drag a GGUF file directly into the browser window.
2. Open `Llama 3.2 1B`.

Netron will display the model as a graph of tensors, operators, and named blocks. This is a more literal view than the simplified lecture diagrams, but it is still showing the same fundamental idea: the model is a large stack of numeric values, each serving a different purpose to model language.

### Explore: What to Look For

As you move around the graph, focus on these recurring structures. Each grouping of these individual *layers* is what defines a *block*:

<ul class="concept-pill-list">
<li>
<span class="concept-pill-label">Tokenization:</span>
<span>Converts textual input into numeric values, a requirement for the machine to understand a user's input.</span>
</li>
<li>
<span class="concept-pill-label">Embedding:</span>
<span>Takes token ID values and converts them into vectors the model can perform transformations against.</span>
</li>
<li>
<span class="concept-pill-label">Multi-head attention:</span>
<span>"Attends" to the Query (What am I looking for?), Key (What do I contain?), and Value (What do I pass on?) of each token.</span>
</li>
<li>
<span class="concept-pill-label">Feed-forward / mulmat:</span>
<span>Applies learned "transformations" after attention to further refine each token representation.</span>
</li>
</ul>

Notably, even small local models are composed of many repeated blocks. The exact count varies by model family, size, and export format, but the important pattern is the repeated attention and feed-forward structure.

Lastly, you may see labels such as **MatMul**, **Mul**, or **mulmat**, depending on how the graph was exported and named. In practice, these are often part of the feed-forward path that expands and reshapes the model's internal representation before passing it onward.
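To connect the graph nodes to the math they perform, here is a deliberately tiny, pure-Python sketch of what one block computes: scaled dot-product attention followed by a small feed-forward step. Real blocks use large matrices, many heads, residual connections, and normalization; this single-head version with hand-picked toy vectors only illustrates the shape of the computation.

```python
import math

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention for one head: each query scores every
    key, the scores become softmax weights, and the output is the
    weighted mix of the values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)          # weights sum to 1 across tokens
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def feed_forward(x, w1, w2):
    """Tiny two-layer MLP with ReLU -- the MatMul ops seen after
    attention in the graph. w1 and w2 are lists of weight columns."""
    hidden = [max(0.0, dot(x, col)) for col in w1]
    return [dot(hidden, col) for col in w2]
```

The multiple *Attention Heads* you counted in Netron each run this same attention computation in parallel on different learned projections, which is why the same node names repeat across the graph.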

**Inspect the Small Model**

This model is small compared to modern production systems, but it is still large enough to reveal repeating architectural patterns.

As you inspect it, ask:

- Where do the repeating blocks begin to stand out?
- Which names remain stable across repeated blocks?
- How many *Attention Heads* does the model have? How might this affect the model's predictions?

<figure style="display:flex; flex-direction:column; align-items:center; text-align:center;">
<a href="https://i.imgur.com/WhnFZss.png" target="_blank" style="display:block; max-width:100%;">
<img src="https://i.imgur.com/WhnFZss.png" width="600" style="display:block; max-width:100%; height:auto; border:5px solid black;">
</a>
<figcaption>Netron Qwen 3 0.6B Layers 1 & 2</figcaption>
</figure>

---

## Objective 3: Visualize Prediction Confidence

### Execute: Run the Local Confidence Widget

The widget below talks to the preloaded Gemma 4 E2B Q4 model through Ollama. Enter any prompt you like, generate a response, and then hover over the output tokens.

<div data-lab1-confidence></div>

### Explore: Interpret the Color Coding

Each token in the output is colored by the model's confidence in that selected token.

In general:

- Greener tokens indicate the model was more confident in that choice
- Warmer yellow or orange tokens indicate a weaker preference
- Hovering over a token reveals the selected token's percentage and the strongest alternate predictions

This is useful because it shows us that model output is not magic or certainty. Each generated token is chosen from a probability distribution over many possible next tokens.
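The percentage the widget shows comes from converting the model's raw next-token scores (logits) into probabilities with softmax. The sketch below uses made-up logits for the token after "The quick brown"; only the softmax step itself reflects how the confidence value is computed.

```python
import math

def confidences(logits: dict) -> dict:
    """Softmax over raw next-token scores. The resulting probability of
    the chosen token is what the widget color-codes."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the next token after "The quick brown":
probs = confidences({" fox": 4.0, " dog": 1.5, " cat": 1.0, " car": 0.2})
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print("%r: %.1f%%" % (tok, 100 * p))
```

Hovering a token in the widget is essentially showing you this sorted list: the selected token's probability plus the strongest alternatives that lost out.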

### Explore: Try Different Prompt Styles

To make the confidence view more interesting, compare:

1. A common phrase such as `The quick brown fox`
2. A factual question
3. A short cybersecurity prompt

Notice where the model appears highly certain and where it becomes less stable. Small local models often produce text that sounds very confident even when the underlying prediction distribution is more fragile than it first appears.
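That fragility is easy to demonstrate with two made-up distributions that have the *same* top token. With greedy decoding both always emit "Paris", so the text reads identically confident, but any nonzero sampling temperature exposes the difference. All logits here are invented for illustration.

```python
import math
import random

def softmax(logits: dict) -> dict:
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def sample(probs: dict, rng: random.Random) -> str:
    """Draw one token from the distribution, as temperature > 0 does."""
    r, acc = rng.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # guard against float rounding

# Same argmax, very different stability:
confident = softmax({"Paris": 6.0, "Lyon": 1.0, "Nice": 0.5})
fragile   = softmax({"Paris": 1.1, "Lyon": 1.0, "Nice": 0.9})

rng = random.Random(0)
for name, dist in [("confident", confident), ("fragile", fragile)]:
    draws = [sample(dist, rng) for _ in range(1000)]
    print(name, "P(Paris)=%.2f" % dist["Paris"],
          "sampled Paris %d/1000 times" % draws.count("Paris"))
```

In the confident case "Paris" carries nearly all the probability mass; in the fragile case it barely edges out the alternatives, which is exactly what the warmer token colors in the widget are flagging.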

<div class="lab-screenshot-placeholder">
<strong>Screenshot Placeholder</strong>
Confidence heatmap and hover tooltip view.
</div>

---

## Conclusion

In this lab, we explored three foundational views of an LLM.

First, we used a tokenizer playground to see how plain text becomes tokens and token IDs. Then we opened a GGUF model file in Netron and inspected the architecture directly. Finally, we used a local confidence visualizer to watch a small model generate output token by token while exposing how certain it was about each choice.

Together, these three perspectives give us a much more grounded picture of what an LLM actually is: a structured file of learned weights, a tokenizer that converts text into IDs, and a prediction engine that selects the next token from a probability distribution.