LLM-Labs/content/labs/lab-1-visualization-in-transformerlab.md
2026-03-22 16:17:20 -06:00


Lab 1 - Visualizing LLMs in TransformerLab

In this lab, we will:

  • Download and Visualize Llama-3.2-1B-Instruct
  • Visualize Tokenization & Prediction with Llama-3.2-1B-Instruct

Lab Flow Guide
Explore sections focus on observation and interpretation.
Execute steps require performing actions in the lab environment.

Objective 1: Starting TransformerLab

Execute: Access the Lab Environment

To start Lab 1, ensure you've received a WireGuard configuration and system IP from your instructor. If you're unfamiliar with WireGuard, assistance will be provided to ensure you can access the lab environment for the duration of class.

All systems use the default username and password of student. All labs are located in the student home folder. To start Lab 1, run the lab1_start.sh script in the lab1 folder:

~/lab1/lab1_start.sh

Lastly, if necessary, you can su - to root at any time. No password will be required.

Objective 2: Visualizing an LLM

Explore: Understand the Model and Runtime

The next steps guide us through deploying and interacting with a pre-trained LLM, Llama-3.2-1B-Instruct. To do this, we'll use an inference engine: software designed to execute LLMs and generate token predictions. You'll encounter models packaged in the GGUF format, a file format designed for efficient storage and loading of quantized LLMs, enabling them to run on a wider range of hardware. Don't worry if these terms are new to you; the specifics of inference engines and GGUF quantized LLMs will be explained thoroughly in the following section of this course.
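To make the idea of quantization a little more concrete before the later section covers it properly, here is a minimal sketch. It assumes a simple symmetric 8-bit scheme with one shared scale factor; this is an illustration only and not the actual layout GGUF uses, and the weight values are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights, standing in for a tiny slice of a real model.
weights = [0.12, -0.5, 0.33, 0.0, 0.49]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print(quantized)                       # small integers instead of floats
print([round(r, 3) for r in restored])  # close to, but not exactly, the originals
```

The restored values differ slightly from the originals. That small loss of precision is the price paid for storing each weight in fewer bits.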

Normally, our first step would be to install an inference engine capable of running GGUF files.

Execute: Verify the FastChat Plugin

Navigate to Plugins, and in the search bar type FastChat. Note that it has already been installed for you!

Plugins

Execute: Find and Load Llama-3.2-1B-Instruct

Next, navigate to Model Registry. You should see Llama-3.2-1B-Instruct right away on your screen. If not, search for the model using the search bar.

Model Registry Selection.

Once the download completes, select Foundation and choose our newly downloaded Llama-3.2-1B-Instruct model.

Model Selection

Once selected, click Run. Give TransformerLab a moment to successfully load the model.

Starting a Model

Explore: Inspect the Architecture View

To start, let's navigate to the Interact page, and then select Model Architecture from the Chat dropdown.

Model Architecture Dropdown

This page allows us to visualize the actively loaded model, in this case our downloaded Llama-3.2-1B-Instruct. This interactive view is equivalent to the greatly simplified version shown on the slide “Transformation: Multilayer Perceptron” from our lecture. We can explore this view as follows:

  • Hold down both the left and right mouse buttons and drag to move the entire model.
  • Hold down just the left mouse button and drag to rotate the view.

Model Visualization

Explore: Interpret Layers, Blocks, and Parameters

Each layer of the model performs a specific task, taking the input it is given and transforming it so that the model can produce the statistically most likely completion of the text, token by token. This format of Llama 3.2 1B is made up of 372 layers. Each layer transforms the output of the layer above it until, eventually, we end up with the statistically likely completion. You have likely also noticed that the colors repeat. Each set of repeating layers is organized into a block. Each block is a grouping of layers that perform the same functions, but with a slightly different focus. As a loose illustration, one block may focus on nouns, another on adjectives, and so on.

The layers within Llama 3.2 1B are as follows:

  • Attention: Focuses the model on specific parts of an input sequence to more accurately predict the next token.
  • Weights: The core learnable parameters of the network.
  • Biases: Additional parameters added after the weighted sum to shift (transform) the output.
  • Scale: Normalizes the output of previous layers to prepare the next round of transformation.

Each of these layers also has a different type, corresponding to Q, K, V, and many more. The layers between the small “Attention” layers are all considered to make up a single “block.” To the side, we can see the actual numeric values of each weight within each layer.
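The weight, bias, and scale layers described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how TransformerLab or Llama implements them: the function names `linear` and `rms_scale` are hypothetical, the numbers are made up, the scale step is assumed to be a root-mean-square normalization, and attention is left out for brevity.

```python
import math


def linear(inputs, weights, biases):
    """Weighted sum of the inputs for each output, then add the bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def rms_scale(values, gain, eps=1e-6):
    """Divide values by their root mean square, then multiply by a learned gain."""
    rms = math.sqrt(sum(v * v for v in values) / len(values) + eps)
    return [g * v / rms for g, v in zip(gain, values)]


# Made-up numbers: 3 inputs transformed into 2 outputs.
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, 0.0, -0.5], [1.0, 1.0, 1.0]]  # one row per output
biases = [0.1, -1.0]

hidden = linear(inputs, weights, biases)          # weights + biases
normalized = rms_scale(hidden, gain=[1.0, 1.0])   # scale

print(hidden)
print(normalized)
```

A real model repeats steps like these across every block, with far larger lists of numbers, which is why the side panel shows so many values per layer.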

Fundamentally, the LLM itself is this stack of numbers. Those numbers allow us to take tokenized input (such as English text) and transform it into a useful output. In general, the more layers and blocks a model has, the bigger it is, and the more accurate and “intelligent” it tends to behave. This 1B-parameter model is very small, however, so the “truthfulness” of its generated predictions is likely to be suspect (that is, hallucinated). The model will at least sound very confident!



Objective 3: Tokenization & Prediction with Llama-3.2-1B-Instruct

Execute: Interactive Chat

Next, let's move on to an active conversation with the model. Navigate to the Chat tab from the dropdown menu.

Select Chat

Once loaded, feel free to type any message and interact with the model in any way. To speed up the pace of our lab, I recommend setting your maximum output length to 64 tokens.

Maximum Length - 64

If text generation fails or behaves unexpectedly (such as merely repeating your input back to you), unload and reload the model using the Foundation screen from the previous objective.

Execute: View Tokenization

If everything is in working order, review the Tokenize view. This lets us see how Llama 3.2 converts our input text into “tokens,” or numbers that represent the input English. Feel free to enter any sentence into the box to see its final tokenized version.

Tokenize View
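As a rough sketch of what the Tokenize view is doing, the snippet below maps text to a list of integer IDs. The vocabulary `TOY_VOCAB` and the greedy longest-match strategy are assumptions made for illustration. Real tokenizers, including the one Llama uses, learn a far larger vocabulary from data with byte-pair encoding, so the IDs you see in TransformerLab will differ.

```python
# Hypothetical vocabulary; the pieces and IDs are made up for this example.
TOY_VOCAB = {
    "The": 0, " quick": 1, " brown": 2, " fox": 3,
    " qu": 4, "ick": 5, " ": 6, "f": 7, "o": 8, "x": 9,
}


def tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


token_ids = tokenize("The quick brown fox", TOY_VOCAB)
print(token_ids)  # one integer per token
```

Note that the leading space is part of each token here. Watch for the same pattern in the Tokenize view, where a word with and without a leading space can map to different IDs.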

Execute: Visualize Next-Token Activations

Next, select Model Activations. By entering “The quick brown fox” and selecting Visualize, we can see how the model selects the next word, along with the model's level of confidence. Feel free to repeat this process with other sentences.

Next Word Prediction
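The confidence values in this view come from turning the model's raw scores for each candidate token into probabilities. Here is a minimal sketch of that step using the softmax function. The scores in `toy_logits` are made up for illustration and are not real model output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Made-up scores for the token that follows "The quick brown fox".
toy_logits = {" jumps": 6.0, " runs": 3.0, " is": 2.0, " sat": 1.0}

probs = softmax(toy_logits)
best = max(probs, key=probs.get)

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token!r}: {p:.3f}")
print("most likely next token:", repr(best))
```

A score that is only a few points higher than the rest ends up with most of the probability, which is why a familiar phrase produces such a lopsided result.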

Execute: Compare Confidence Views

Note how confident the model is about the word “jumps” in this famous phrase. For an alternative view of the same output, you can also select the Visualize Logprobs option from the menu, which shows the same information using color.

Green is Confident. Red is less confident.
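A log probability ("logprob") is simply the natural logarithm of a token's probability, so values near 0 mean high confidence and large negative values mean low confidence. The sketch below shows one way such values could be mapped to colors. The 50% threshold, the function name `confidence_color`, and the probabilities are all assumptions for illustration; TransformerLab's actual color mapping may differ.

```python
import math


def confidence_color(logprob, threshold=math.log(0.5)):
    """Return 'green' at or above the threshold probability, else 'red'."""
    return "green" if logprob >= threshold else "red"


# Made-up probabilities for three generated tokens.
toy_probs = {" jumps": 0.93, " over": 0.88, " a": 0.21}

for token, p in toy_probs.items():
    logprob = math.log(p)
    print(f"{token!r}: logprob={logprob:.2f} -> {confidence_color(logprob)}")
```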

Explore: Continue Exploring TransformerLab Features

Please continue to explore TransformerLab until you're ready to move on. Because TransformerLab is still in beta, we will use many other tools throughout this course, but the software is improving all the time and is worth watching! TransformerLab supports many advanced features, in various stages of development, such as:

  • Batch Text Generation
  • LLM Fine Tuning
  • LLM Evaluation
  • Retrieval Augmented Generation (RAG)

We will discuss these topics and more throughout the course.


Conclusion

In this lab, we saw the foundational concepts of all LLMs in action using TransformerLab. Through hands-on exploration, we observed the process of tokenization (how text is converted into numerical representations for the model) and visualized the model's prediction process, including its confidence levels for different token selections. By navigating the model's layers and blocks, we gained an appreciation for the sheer scale and complexity inherent in modern LLMs.

This initial experience provides a crucial stepping stone for further exploration of LLMs, laying the groundwork for future labs focused on fine-tuning, evaluation, and advanced techniques like Retrieval Augmented Generation.