Update lab model defaults and assets

2026-04-24 20:08:56 -06:00
parent fcb2dcb36d
commit 562be3fd1f
18 changed files with 8971 additions and 916856 deletions
@@ -1,7 +1,7 @@
 ---
 order: 2
-title: "Lab 2 - Quantization Tradeoffs: Comparing 2-bit, 4-bit, and 8-bit"
-description: Compare Gemma 4 E2B in three Ollama quantizations and study how lower precision changes behavior.
+title: "Lab 2 - Quantization Tradeoffs: Comparing 4-bit and 6-bit"
+description: Compare Gemma 4 E2B in two Ollama quantizations and study how lower precision changes behavior.
 ---

 <!-- breakout-style: instruction-rails -->
@@ -10,7 +10,7 @@ description: Compare Gemma 4 E2B in three Ollama quantizations and study how low

 In this lab, we will:

-- Pull the same Gemma model in Q2, Q4, and Q8 Ollama variants
+- Pull the same Gemma model in Q4 and Q6 Ollama variants
 - Compare the quantization labels and model behavior across those variants
 - Observe how lower precision changes the model's behavior
 - Build intuition for when a smaller quant may or may not be worth it
@@ -23,13 +23,12 @@ In this lab, we will:

 ## Objective 1: Understand the Model and the Quantizations

-For this lab, we will use three Ollama-published variants of **Gemma 4 E2B** that represent distinct precision bands:
+For this lab, we will use two Ollama-published variants of **Gemma 4 E2B** that represent distinct precision bands:

-| Precision band | Ollama model tag                    | Why we are using it                     |
-| -------------- | ----------------------------------- | --------------------------------------- |
-| Q2             | `cajina/gemma4_e2b-q2_k_xl:v01`     | Most aggressive compression in this lab |
-| Q4             | `batiai/gemma4-e2b:q4`              | Common middle-ground quant              |
-| Q8             | `bjoernb/gemma4-e2b-fast:latest`    | Highest-quality quant in this lab       |
+| Precision band | Ollama model tag       | Why we are using it               |
+| -------------- | ---------------------- | --------------------------------- |
+| Q4             | `batiai/gemma4-e2b:q4` | Faster, smaller quant             |
+| Q6             | `batiai/gemma4-e2b:q6` | Higher-quality quant in this lab  |

 Even though the Ollama tags differ, these are all variants of the same underlying Gemma 4 E2B model family. The main variable we are changing is how the weights are stored.

@@ -83,7 +82,7 @@ Those are not identical to the original weight, but they may still be close enou

 ### Explore: Interactive precision viewer

-The viewer below zooms out from one weight and instead shows a toy layer with 16 stored values. Real GGUF schemes such as `Q4_K_M` and `UD-IQ2_M` are more sophisticated than this toy example, but the core idea is the same:
+The viewer below zooms out from one weight and instead shows a toy layer with 16 stored values. Real GGUF schemes such as `Q4_K_M` and `Q6_K` are more sophisticated than this toy example, but the core idea is the same:

 - Fewer bits means fewer representable values
 - More weights get pushed into the same small set of stored buckets
@@ -93,11 +92,11 @@ The viewer below zooms out from one weight and instead shows a toy layer with 16

 ### Explore: Compare the same prompts through the hosted chat widget

-By default, the widget below points to the courseware-managed Ollama service and the three Lab 2 model tags above. You can still switch to another endpoint if your instructor provides one.
+By default, the widget below points to the courseware-managed Ollama service and the Lab 2 model tags above. You can still switch to another endpoint if your instructor provides one.

 - Use the preloaded managed endpoint or replace it with another compatible endpoint
 - Optionally add an API key if your chosen endpoint requires one
-- Switch between the configured Q2, Q4, and Q8 Gemma variants
+- Switch between the configured Q4 and Q6 Gemma variants
 - Re-run the same prompt so you can compare coherence, stability, and SVG output
 - Try a visual prompt such as `Draw a pelican riding a bicycle.`
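
The same side-by-side comparison can also be scripted outside the widget. The following is a minimal sketch, not the widget's implementation: it assumes a plain Ollama server reachable at its default local address (`http://localhost:11434`), which may differ from the courseware-managed endpoint, and it uses Ollama's `/api/generate` route with streaming disabled. The helper names `build_request`, `generate`, and `compare` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
# The courseware-managed endpoint may use a different URL or require a key.
OLLAMA_URL = "http://localhost:11434/api/generate"

# The two model tags listed in the lab's table.
MODEL_TAGS = ["batiai/gemma4-e2b:q4", "batiai/gemma4-e2b:q6"]


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's generate route."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, url=OLLAMA_URL, timeout=300):
    """Send the prompt to one model and return the generated text."""
    request = build_request(model, prompt, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


def compare(prompt, tags=MODEL_TAGS, ask=generate):
    """Run the same prompt through every tag and collect the answers.

    `ask` is injectable so the loop can be exercised without a server.
    """
    return {tag: ask(tag, prompt) for tag in tags}
```

Calling `compare("Draw a pelican riding a bicycle.")` against a server that has both tags pulled would return a dictionary keyed by model tag, which makes it easy to diff the two answers for the same prompt.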

@@ -109,7 +108,7 @@ The widget keeps the transcript in your browser so you can switch models without

 By this point, you should have:

-- Compared three quantized versions of the same model
+- Compared two quantized versions of the same model
 - Measured the storage savings directly
 - Verified that the core model metadata remains largely the same
 - Observed where output quality begins to degrade
@@ -118,4 +117,4 @@ The important takeaway is not that one quant is always "best." The important tak

 ## Conclusion

-This lab isolates quantization as the main variable. By comparing **Gemma 4 E2B** in Q2, Q4, and Q8 Ollama variants, you can directly observe one of the most important tradeoffs in local inference: balancing model quality against efficiency and resource constraints.
+This lab isolates quantization as the main variable. By comparing **Gemma 4 E2B** in Q4 and Q6 Ollama variants, you can directly observe one of the most important tradeoffs in local inference: balancing model quality against efficiency and resource constraints.
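
To make the "fewer bits means fewer representable values" idea concrete, here is a toy sketch of uniform quantization over 16 made-up weights. It is only an illustration under simplifying assumptions: real GGUF schemes such as `Q4_K_M` and `Q6_K` use block-wise scales and other refinements that this does not model, and the weights and function names below are hypothetical.

```python
def toy_weights():
    """Return 16 deterministic, made-up weights in roughly [-1, 1)."""
    return [((i * 37) % 101) / 50 - 1 for i in range(16)]


def quantize(weights, bits):
    """Snap each weight to the nearest of 2**bits evenly spaced levels.

    The levels span the minimum and maximum of the given weights.
    """
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    if hi == lo or levels < 2:
        return [lo for _ in weights]
    step = (hi - lo) / (levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def distinct_levels(values):
    """Count how many different stored values are actually in use."""
    return len({round(v, 9) for v in values})


def mean_abs_error(original, stored):
    """Average absolute difference between original and stored weights."""
    return sum(abs(a - b) for a, b in zip(original, stored)) / len(original)


if __name__ == "__main__":
    weights = toy_weights()
    for bits in (2, 4, 6):
        stored = quantize(weights, bits)
        print(
            f"{bits}-bit: {distinct_levels(stored)} distinct stored values, "
            f"mean abs error {mean_abs_error(weights, stored):.4f}"
        )
```

Running the script prints one line per bit width. With fewer bits, more of the 16 weights collapse onto the same stored value and the average error grows, which is the same direction of tradeoff the lab asks you to observe between the Q4 and Q6 variants.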