Local Courseware Deployment

This project builds a student-friendly local lab environment for the courseware with a small control surface:

  • ./deploy-courseware.sh installs and configures the environment, then starts every managed service.
  • ./destroy-courseware.sh stops the managed services, uninstalls courseware-managed Ollama, and removes the project-owned lab state.
  • ./labctl provides day-two controls such as assets lab2, ollama_models, update_wiki, start, stop, status, urls, logs, and open kiln.
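
As a rough end-to-end flow, the three entry points combine like this (command names come from the list above; invoking labctl subcommands bare, as shown, is an assumption about their default behavior):

    # First-time setup: install, configure, and start everything
    ./deploy-courseware.sh

    # Day-two checks and controls
    ./labctl status    # show managed service state
    ./labctl urls      # print the lab endpoints
    ./labctl logs      # inspect service logs

    # Full teardown when the lab is done
    ./destroy-courseware.sh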

What It Installs

  • Ollama
  • llama.cpp
  • Netron, served locally on port 8338
  • Open WebUI
  • ChunkViz
  • Embedding Atlas
  • Promptfoo
  • Unsloth Studio
  • Kiln Desktop
  • Course-specific support assets for Lab 1, Lab 2, and Lab 4

Lab 1 Defaults

Lab 1 is now provisioned directly by the installer:

  • The Llama-3.2-1B.Q4_K_M.gguf file is mirrored into state/models/lab1/.
  • The Lab 1 confidence widget uses the pre-pulled Gemma 4 E2B Q4 Ollama model, batiai/gemma4-e2b:q4.
  • The wiki serves a same-host download link for the Llama GGUF through /api/lab1/models/....
  • Lab 1 confidence visualization requires Ollama 0.12.11 or newer because it depends on logprobs (see the version check sketched after this list).
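
A quick way to confirm both Lab 1 prerequisites by hand (a minimal sketch; the model tag comes from the list above, and /api/version is Ollama's standard version endpoint):

    # Confirm the Ollama server meets the 0.12.11 minimum
    curl -s http://localhost:11434/api/version

    # Make sure the Lab 1 confidence model is present locally
    ollama pull batiai/gemma4-e2b:q4
    ollama list | grep gemma4-e2b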

Lab 2 Defaults

./labctl up now pre-pulls the Gemma 4 E2B Ollama variants used by the wiki widgets:

  • gemma4:e2b-it-q8_0
  • batiai/gemma4-e2b:q4
  • batiai/gemma4-e2b:q6

If you want to re-pull just those managed Ollama models later, run ./labctl ollama_models.
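
If you prefer to pull them by hand, the direct equivalents are (same tags as the list above; ./labctl ollama_models remains the supported path):

    ollama pull gemma4:e2b-it-q8_0
    ollama pull batiai/gemma4-e2b:q4
    ollama pull batiai/gemma4-e2b:q6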

Supported Host Profiles

This build is the Linux/WSL variant of LLM Labs Local. If you are deploying on Apple Silicon macOS, use the sibling LLM-Labs-Local-Mac project instead.

  • Native Debian/Ubuntu: Debian-family Linux with an NVIDIA GPU visible to nvidia-smi and at least 8 GB VRAM.
  • WSL: Debian/Ubuntu-family Linux running under WSL, with the NVIDIA GPU exposed into the distro.

The launcher and Ansible preflight classify the host dynamically and apply different setup behavior for:

  • native-debian-ubuntu
  • wsl

WSL Check

If you run this inside WSL, the launcher checks GPU readiness before Ansible starts.

If that check fails, fix WSL first:

  • Install or update the NVIDIA Windows driver with WSL/CUDA support
  • Run wsl --update in Windows PowerShell
  • Run wsl --shutdown
  • Reopen WSL and confirm nvidia-smi works
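
Consolidated, that recovery sequence looks like this (PowerShell on the Windows side, then a check inside the distro):

    # In Windows PowerShell, after updating the NVIDIA driver:
    wsl --update
    wsl --shutdown

    # Back inside the WSL distro, confirm the GPU is visible:
    nvidia-smi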

Important: nvidia-smi is only the driver check. Building CUDA-enabled llama.cpp also requires the Linux-side CUDA toolkit inside the distro.

On Linux and WSL, the first ./labctl up or ./labctl preflight run may prompt once for your sudo password so Ansible can install system packages.

On Ubuntu WSL x86_64, preflight now installs the Linux-side CUDA toolkit automatically if it is missing.

It first tries the distro package:

  • sudo apt install -y nvidia-cuda-toolkit

If that package is unavailable or still does not expose nvcc, the installer falls back to bootstrapping NVIDIA's WSL-Ubuntu repository, which installs only the toolkit, not a Linux GPU driver.
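
For reference, a manual equivalent of that fallback follows NVIDIA's published WSL-Ubuntu repository instructions; the keyring filename and the unversioned cuda-toolkit package below are illustrative and may differ from what the installer actually pins:

    # Add NVIDIA's WSL-Ubuntu CUDA repository (toolkit only, no driver)
    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get install -y cuda-toolkit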

If the automatic bootstrap still fails, verify:

  • nvcc --version
  • ls /usr/local/cuda/include/cuda_runtime.h

For non-Ubuntu WSL distros, install the CUDA toolkit manually before running the deploy script.

Native Debian/Ubuntu CUDA Behavior

On native Debian/Ubuntu hosts, the installer handles three CUDA-toolkit cases:

  • If the toolkit is already usable, it reuses the existing install instead of forcing a reinstall.
  • If the distro exposes nvidia-cuda-toolkit, it installs that package.
  • If the distro package is unavailable, it bootstraps NVIDIA's official CUDA network repository for supported native Debian/Ubuntu releases and installs the toolkit from there.

If apt starts in a broken dependency state, the installer attempts dpkg --configure -a and apt-get --fix-broken install before retrying package installation.

If CUDA is already mounted or preinstalled outside PATH, the installer detects standard locations such as /usr/local/cuda/bin/nvcc and /usr/local/cuda-*/bin/nvcc.
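
Condensed into commands, those recovery and detection steps look roughly like this (the exact ordering inside the installer may differ):

    # Recover apt from a broken dependency state before retrying
    sudo dpkg --configure -a
    sudo apt-get --fix-broken install -y

    # Look for an existing toolkit outside PATH in the standard locations
    ls /usr/local/cuda/bin/nvcc /usr/local/cuda-*/bin/nvcc 2>/dev/null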

Standard Assumptions

  • The default deployment is centered on Ollama-backed local inference and browser-based tools such as Netron and the wiki.
  • Netron is installed into a managed Python virtual environment and served locally instead of being provisioned as a desktop package.
  • Lab 1's Llama GGUF download is mirrored locally during ./labctl up, so students do not have to fetch it manually from the original source.
  • WhiteRabbitNeo assets remain a separate Lab 2 flow and are still handled outside the default ./labctl up run.
  • Run ./labctl assets lab2 when you want to populate repo-local Lab 2 assets in assets/lab2/ from Hugging Face.
  • After base setup, run state/lab2/download_whiterabbitneo-gguf.sh to fetch only the Q4_K_M, Q8_0, and IQ2_M files from bartowski/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF and register local Ollama models WhiteRabbitNeo, WhiteRabbitNeo-Q4, WhiteRabbitNeo-Q8, and WhiteRabbitNeo-IQ2.
  • Unsloth homes are redirected into this project's state/ tree via symlinks.
  • Managed web services bind on all interfaces for headless LAN/VPN access. labctl urls reports the detected LAN IP by default; set COURSEWARE_URL_HOST=<host-or-ip> before ./labctl up to advertise a specific VPN DNS name or address (see the example after this list).
  • The local Ansible bootstrap in .venv-ansible/ is machine-specific and will be recreated automatically if the folder is copied between hosts.
  • llama.cpp uses a conservative, memory-aware build parallelism setting instead of an unbounded -j build, which avoids OOM failures on smaller Linux and WSL hosts.
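
For example, to advertise a VPN DNS name instead of the detected LAN IP (the hostname is a placeholder; passing the variable inline, as shown, is one way to set it for the run):

    # Advertise a specific host in labctl urls and generated links
    COURSEWARE_URL_HOST=lab.example.internal ./labctl up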

Lab URLs

After ./deploy-courseware.sh finishes, run ./labctl urls to list the endpoints.

Default endpoints use the detected host LAN IP:

  • Ollama API: http://<host-lan-ip>:11434
  • Open WebUI: http://<host-lan-ip>:8080
  • Netron: http://<host-lan-ip>:8338
  • ChunkViz: http://<host-lan-ip>:3001
  • Embedding Atlas: http://<host-lan-ip>:5055
  • Unsloth Studio: http://<host-lan-ip>:8888
  • Promptfoo UI: http://<host-lan-ip>:15500
  • Wiki: http://<host-lan-ip>:80
  • Lab 3 Terminal: http://<host-lan-ip>:7681/wetty
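
A quick reachability pass over a few of these (substitute your host; /api/version is Ollama's own endpoint, the others are plain HTTP probes):

    HOST=<host-lan-ip>
    curl -s  http://$HOST:11434/api/version   # Ollama API
    curl -sI http://$HOST:8080                # Open WebUI
    curl -sI http://$HOST:80                  # Wiki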

Lab 3 Browser Terminal

The deployment will:

  • bind sshd to 0.0.0.0:22 so regular SSH clients can connect over the LAN/VPN
  • install WeTTY and expose it at http://<host-lan-ip>:7681/wetty
  • leave login identity management to the host, so any existing local account with password-based SSH access can sign in through SSH or the browser terminal
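
Both access paths use the same host accounts; for example (replace student with any existing local account that has password-based SSH access):

    # Classic SSH over the LAN/VPN
    ssh student@<host-lan-ip>

    # Or open the browser terminal at: http://<host-lan-ip>:7681/wetty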

Notes

  • ./labctl up installs the environment and then starts every managed service.
  • ./labctl versions shows the pinned Netron version, minimum Ollama version, and Ansible runtime version used by this workspace.
  • ./labctl assets lab2 is a separate manual step that clones the base WhiteRabbitNeo repo into assets/lab2/WhiteRabbitNeo-V3-7B and downloads the supported Q4_K_M, Q8_0, and IQ2_M GGUFs into assets/lab2/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF.
  • ./labctl ollama_models re-pulls the managed Lab 2 Gemma 4 E2B Ollama model set without rerunning the full installer.
  • ./labctl update_wiki hard-resets the managed wiki checkout to the remote latest, rebuilds it, and restarts only the managed wiki service on port 80.
  • ./labctl start core starts only ollama and open-webui.
  • ./labctl start all starts every managed web service.
  • ./labctl open kiln launches the Kiln desktop app installed into the project state.
  • The scripted Promptfoo install drops a starter config at state/lab6/promptfoo.yaml.
  • labctl start all includes Promptfoo (launched via promptfoo view) and the cloned wiki app.
  • Lab 2 includes state/lab2/download_whiterabbitneo-gguf.sh, which uses git + git lfs to pull only the supported WhiteRabbitNeo quants. Add --download-only if you want the files without Ollama registration.
  • The wiki is cloned from https://git.zuccaro.me/bzuccaro/LLM-Labs.git into state/repos/LLM-Labs and started with npm.
  • ./labctl down uninstalls Ollama entirely when this project installed it, instead of only stopping the service.
  • This variant is intended for NVIDIA-backed Linux/WSL training and lab workflows.
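
Putting the Lab 2 WhiteRabbitNeo notes together, the manual flow is (both commands and the --download-only flag come from the notes above):

    # Populate repo-local Lab 2 assets from Hugging Face
    ./labctl assets lab2

    # Fetch the supported quants and register the local Ollama models
    state/lab2/download_whiterabbitneo-gguf.sh

    # Or fetch the files only, skipping Ollama registration
    state/lab2/download_whiterabbitneo-gguf.sh --download-only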