---
order: 1
title: Lab 1 - Model Structure, Tokenization, and Confidence Visualization
description: Explore GGUF model structure in Netron, inspect tokenization interactively, and visualize token confidence with a local Ollama model.
---

Lab 1 - Model Structure, Tokenization, and Confidence Visualization

In this lab, we will:

  • Visualize two small GGUF models in Netron
  • Observe how text is split into tokens and token IDs
  • Inspect the confidence of a local model one token at a time
Lab Flow Guide

  • Explore sections focus on observation and interpretation.
  • Execute steps require performing actions in the lab environment.

Objective 1: Visualize Tokenization and Token IDs

Execute: Use the Tokenizer Playground

The embedded tool below allows you to enter raw text and observe how it is converted into model tokens. Tokenization is the critical first step that enables a Large Language Model to process user input: it transforms words into numerical values the model can work with.
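
If you want to reproduce what the playground does outside the browser, here is a minimal sketch assuming the tiktoken library (`pip install tiktoken`); the embedded tool may use a different implementation:

```python
# Minimal sketch of tokenization, assuming the tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # the GPT-4 tokenizer

text = 'printf("hello");'
token_ids = enc.encode(text)                   # text -> numeric token IDs
pieces = [enc.decode([t]) for t in token_ids]  # each ID back to its text chunk

for piece, tid in zip(pieces, token_ids):
    print(f"{piece!r} -> {tid}")
```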

Explore: Try Multiple Inputs

Enter several different inputs and compare how the tokenization changes. Use at least these three examples:

  1. The quick brown fox jumps over the lazy dog
  2. cybersecurity analyst
  3. printf("hello");

Then try a few of your own. Short English phrases, punctuation, code, and unusual spacing are all good choices.

Explore: Compare the Two Tokenization Views

This tool is especially useful because it shows both:

  • The visual split of the text into tokens
  • The underlying token ID values

Those are two views of the same process.

The visual split helps us see where the model grouped characters or subwords together. The token ID view reminds us that the model never consumes English directly. It consumes numeric identifiers that point into the tokenizer vocabulary.

As you work through your examples, ask:

  • Which full words remain intact?
  • Which words get split into subwords or punctuation chunks?
  • When spacing changes, do the token IDs change too?
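
The last question is easy to check directly. A quick sketch, again assuming tiktoken, shows that a leading space becomes part of the token itself:

```python
# Hedged sketch: leading whitespace changes the token IDs (tiktoken assumed).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
print(enc.encode("fox"))   # one ID sequence
print(enc.encode(" fox"))  # different IDs: the space is folded into the token
```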

Lastly, experiment with how different tokenizers split the same input into different tokens and numerical values. How might this affect the next steps in the transformation process?
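
For example, a short comparison of a GPT-3-era encoding against the GPT-4 encoding (both names as exposed by tiktoken) will often produce different splits and counts for the same input:

```python
# Hedged sketch: two tokenizer generations splitting the same text.
import tiktoken

text = "cybersecurity analyst"
for name, encoding in [("GPT-3", "r50k_base"), ("GPT-4", "cl100k_base")]:
    enc = tiktoken.get_encoding(encoding)
    ids = enc.encode(text)
    print(name, len(ids), "tokens:", [enc.decode([i]) for i in ids])
```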

Tokenization - GPT-3

Tokenization - GPT-4

Objective 2: Open Netron and Download the Lab Models

Execute: Launch Netron

For this lab, model visualization now happens in Netron, a lightweight browser tool for inspecting model structure.

Use the launch panel below to open the local Netron service on port 8338.

Execute: Download the Two GGUF Files

You will work with two small GGUF models in this objective:

These files are intentionally small enough to make architecture exploration practical in a classroom lab. Download both files to a convenient location such as your Downloads folder. Once you've downloaded the files, open them using the "Open Model" button on the Netron home page.

Netron Start Page

Once Netron is open:

  1. Select Open Model or drag a GGUF file directly into the browser window.
  2. Start with Qwen 3 0.6B.

Netron will display the model as a graph of tensors, operators, and named blocks. This is a more literal view than the simplified lecture diagrams, but it still shows the same fundamental idea: the model is a large stack of numeric values, each serving a distinct purpose in modeling language.

Explore: What to Look For

As you move around the graph, focus on these four recurring structures. Groupings of these individual layers are what define a block:

  • Tokenization: Converts text input into numeric values, a requirement for the model to process a user's input.
  • Embedding: Converts token ID values into vectors, with positional information added, that the model can apply transformations to.
  • Multi-head attention: For each token, "attends" using a Query (What am I looking for?), a Key (What do I contain?), and a Value (What do I pass on?).
  • Feed-forward / mulmat: Applies learned transformations after attention to further refine each token's representation (see the schematic sketch after this list).
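
To connect these names to code, the sketch below shows a schematic transformer block in PyTorch. It is an illustration, not Qwen's actual implementation; the dimensions, normalization placement, and activation are assumptions chosen for readability.

```python
# A schematic transformer block in PyTorch. Illustrative sketch only;
# dimensions and layer choices are assumed, not Qwen's real architecture.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(              # the feed-forward / MatMul path
            nn.Linear(d_model, 4 * d_model),   # expand the representation...
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),   # ...then project it back down
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)       # Q, K, V all derived from h
        x = x + attn_out                       # residual connection
        x = x + self.ffn(self.norm2(x))        # residual around feed-forward
        return x

x = torch.randn(1, 10, 512)   # batch of 1, sequence of 10 embedded tokens
print(Block()(x).shape)       # same shape out: torch.Size([1, 10, 512])
```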

Notably, Qwen 3 0.6B is composed of 28 of these blocks! That is significantly more than the 12 blocks of the original GPT-2, even though both are tiny models by the standards of today's production systems.
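
If you want to verify that count without scrolling through Netron, a hedged sketch using the gguf Python package (from the llama.cpp project, `pip install gguf`) can tally the repeating blk.N tensor names; the filename here is a placeholder for wherever you saved the model:

```python
# Hedged sketch: counting transformer blocks in a GGUF file.
import re
from gguf import GGUFReader

reader = GGUFReader("Qwen3-0.6B.gguf")  # hypothetical local path

block_ids = set()
for tensor in reader.tensors:
    m = re.match(r"blk\.(\d+)\.", tensor.name)  # llama.cpp block naming
    if m:
        block_ids.add(int(m.group(1)))

print(f"{len(block_ids)} transformer blocks")   # expect 28 for Qwen 3 0.6B
```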

Lastly, you may see labels such as MatMul, Mul, or mulmat, depending on how the graph was exported and named. In practice, these are often part of the feed-forward path that expands and reshapes the model's internal representation before passing it onward.

Compare the Two Small Models

Both models are small compared to modern production systems, but they are still large enough to reveal repeating architectural patterns.

As you compare them, ask:

  • Where do the repeating blocks begin to stand out?

  • Which names remain stable between the two models?

  • How many attention heads does each model have? How might this affect the transformations the model can apply?

Netron Qwen 3 0.6B, Layers 1 & 2

Objective 3: Visualize Prediction Confidence

Execute: Run the Local Confidence Widget

The widget below talks to the preloaded local Lab 1 model through Ollama. Enter any prompt you like, generate a response, and then hover over the output tokens.

Explore: Interpret the Color Coding

Each token in the output is colored according to the model's confidence in that token.

In general:

  • Greener tokens indicate the model was more confident in that choice
  • Warmer yellow or orange tokens indicate a weaker preference
  • Hovering over a token reveals the selected token's probability and the strongest alternative predictions

This is useful because it shows us that model output is not magic or certainty. Each generated token is chosen from a probability distribution over many possible next tokens.
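
The lab widget reads this information from the local Ollama model, but the underlying idea can be shown in a few lines. The sketch below is a conceptual illustration using Hugging Face transformers with GPT-2 as a small stand-in model, not the widget's actual code:

```python
# Conceptual sketch only: a model's next-token probability distribution.
# Assumes: pip install transformers torch; GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token

probs = torch.softmax(logits, dim=-1)        # distribution over the vocabulary
top = torch.topk(probs, 5)                   # the five strongest candidates
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode([int(i)])!r}  {p.item():.1%}")
```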

Explore: Try Different Prompt Styles

To make the confidence view more interesting, compare:

  1. A common phrase such as The quick brown fox
  2. A factual question
  3. A short cybersecurity prompt

Notice where the model appears highly certain and where it becomes less stable. Small local models often produce text that sounds very confident even when the underlying prediction distribution is more fragile than it first appears.

Screenshot placeholder: confidence heatmap and hover tooltip view.

Conclusion

In this lab, we explored three foundational views of an LLM.

First, we used a tokenizer playground to see how plain text becomes tokens and token IDs. Then we opened two GGUF model files in Netron and inspected the architecture directly. Finally, we used a local confidence visualizer to watch a small model generate output token by token while exposing how certain it was about each choice.

Together, these three perspectives give us a much more grounded picture of what an LLM actually is: a structured file of learned weights, a tokenizer that converts text into IDs, and a prediction engine that selects the next token from a probability distribution.