# Local Courseware Deployment

This project builds a student-friendly local lab environment for the courseware with a small control surface:

- `./deploy-courseware.sh` installs and configures the environment, then starts every managed service.
- `./destroy-courseware.sh` stops the managed services, uninstalls courseware-managed Ollama, and removes the project-owned lab state.
- `./labctl` provides day-two controls such as `assets lab2`, `ollama_models`, `update_wiki`, `start`, `stop`, `status`, `urls`, and `logs`.

## What It Installs

- Ollama
- `llama.cpp`
- Netron, served locally on port `8338`
- Open WebUI
- ChunkViz
- Embedding Atlas
- Promptfoo
- Unsloth Studio
- Course-specific support assets for lab 1, lab 2, and lab 4

## Lab 1 Defaults

Lab 1 is now provisioned directly by the installer:

- The `Llama-3.2-1B.Q4_K_M.gguf` file is mirrored into `state/models/lab1/`.
- The Lab 1 confidence widget uses the pre-pulled Gemma 4 E2B Q4 Ollama model, `batiai/gemma4-e2b:q4`.
- The wiki serves a same-host download link for the Llama GGUF through `/api/lab1/models/...`.
- Lab 1 confidence visualization requires Ollama `0.12.11` or newer because it depends on logprobs.

## Lab 2 Defaults

`./labctl up` now pre-pulls the Gemma 4 E2B Ollama variants used by the wiki widgets:

- `cajina/gemma4_e2b-q2_k_xl:v01`
- `batiai/gemma4-e2b:q4`
- `batiai/gemma4-e2b:q6`

If you want to re-pull just those managed Ollama models later, run `./labctl ollama_models`.

## Supported Host Profiles

This build is the Linux/WSL variant of LLM Labs Local. If you are deploying on Apple Silicon macOS, use the sibling `LLM-Labs-Local-Mac` project instead.

- Native Debian/Ubuntu: Debian-family Linux with an NVIDIA GPU visible to `nvidia-smi` and at least 8 GB VRAM.
- WSL: Debian/Ubuntu-family Linux running under WSL, with the NVIDIA GPU exposed into the distro.

The launcher and Ansible preflight classify the host dynamically and apply different setup behavior for:

- `native-debian-ubuntu`
- `wsl`

## WSL Check

If you run this inside WSL, the launcher checks GPU readiness before Ansible starts. If that check fails, fix WSL first:

- Install or update the NVIDIA Windows driver with WSL/CUDA support
- Run `wsl --update` in Windows PowerShell
- Run `wsl --shutdown`
- Reopen WSL and confirm `nvidia-smi` works

Important: `nvidia-smi` is only the driver check. Building CUDA-enabled `llama.cpp` also requires the Linux-side CUDA toolkit inside the distro.

On Linux and WSL, the first `./labctl up` or `./labctl preflight` run may prompt once for your sudo password so Ansible can install system packages.

On Ubuntu WSL x86_64, preflight now installs the Linux-side CUDA toolkit automatically if it is missing. It first tries the distro package:

- `sudo apt install -y nvidia-cuda-toolkit`

If that package is unavailable or still does not expose `nvcc`, the installer falls back to NVIDIA's WSL-Ubuntu repository bootstrap for the toolkit only, not a Linux GPU driver. If the automatic bootstrap still fails, verify:

- `nvcc --version`
- `ls /usr/local/cuda/include/cuda_runtime.h`

For non-Ubuntu WSL distros, install the CUDA toolkit manually before running the deploy script.
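
If you want to run those checks by hand before (or after) the automatic bootstrap, the sketch below strings together the commands already mentioned in this section (`nvidia-smi`, `nvcc`, and the `cuda_runtime.h` header). It is a convenience check under those assumptions, not the installer's own preflight script, and the exact checks preflight runs may differ.

```bash
#!/usr/bin/env bash
# Hand-run readiness check: driver visibility plus Linux-side CUDA toolkit.
set -u
fail=0

# 1. Driver check: the GPU must be visible inside the distro.
if nvidia-smi >/dev/null 2>&1; then
    echo "OK   nvidia-smi sees a GPU"
else
    echo "FAIL nvidia-smi not working (fix the Windows driver, then wsl --update / wsl --shutdown)"
    fail=1
fi

# 2. Toolkit check: nvcc on PATH, or at least present in a standard CUDA location.
if command -v nvcc >/dev/null 2>&1; then
    echo "OK   nvcc on PATH: $(nvcc --version | tail -n1)"
elif ls /usr/local/cuda*/bin/nvcc >/dev/null 2>&1; then
    echo "WARN nvcc installed but not on PATH (found under /usr/local/cuda*)"
else
    echo "FAIL no CUDA toolkit found (try: sudo apt install -y nvidia-cuda-toolkit)"
    fail=1
fi

# 3. Header check: the CUDA-enabled llama.cpp build needs cuda_runtime.h.
#    The distro package puts headers under /usr/include; NVIDIA's repo uses /usr/local/cuda.
if [ -f /usr/local/cuda/include/cuda_runtime.h ] || [ -f /usr/include/cuda_runtime.h ]; then
    echo "OK   cuda_runtime.h present"
else
    echo "FAIL cuda_runtime.h not found"
    fail=1
fi

exit "$fail"
```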

## Native Debian/Ubuntu CUDA Behavior

On native Debian/Ubuntu hosts, the installer handles three CUDA-toolkit cases:

- If the toolkit is already usable, it reuses the existing install instead of forcing a reinstall.
- If the distro exposes `nvidia-cuda-toolkit`, it installs that package.
- If the distro package is unavailable, it bootstraps NVIDIA's official CUDA network repository for supported native Debian/Ubuntu releases and installs the toolkit from there.

If `apt` starts in a broken dependency state, the installer attempts `dpkg --configure -a` and `apt-get --fix-broken install` before retrying package installation.

If CUDA is already mounted or preinstalled outside `PATH`, the installer detects standard locations such as `/usr/local/cuda/bin/nvcc` and `/usr/local/cuda-*/bin/nvcc`.

## Standard Assumptions

- The default deployment is centered on Ollama-backed local inference and browser-based tools such as Netron and the wiki.
- Netron is installed into a managed Python virtual environment and served locally instead of being provisioned as a desktop package.
- Lab 1's Llama GGUF download is mirrored locally during `./labctl up`, so students do not have to fetch it manually from the original source.
- WhiteRabbitNeo assets remain a separate Lab 2 flow and are still handled outside the default `./labctl up` run.
- Run `./labctl assets lab2` when you want to populate repo-local Lab 2 assets in `assets/lab2/` from Hugging Face.
- After base setup, run `state/lab2/download_whiterabbitneo-gguf.sh` to fetch only the `Q4_K_M`, `Q8_0`, and `IQ2_M` files from `bartowski/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF` and register local Ollama models `WhiteRabbitNeo`, `WhiteRabbitNeo-Q4`, `WhiteRabbitNeo-Q8`, and `WhiteRabbitNeo-IQ2`.
- Unsloth homes are redirected into this project's `state/` tree via symlinks.
- Managed web services bind for access from both Linux and the Windows side of WSL, while `labctl urls` still reports localhost-friendly URLs.
- The local Ansible bootstrap in `.venv-ansible/` is machine-specific and will be recreated automatically if the folder is copied between hosts.
- `llama.cpp` uses a conservative, memory-aware build parallelism setting instead of an unbounded `-j` build, which avoids OOM failures on smaller Linux and WSL hosts.

## Lab URLs

After `./deploy-courseware.sh`, run `./labctl urls`. Default endpoints:

- Ollama API: `http://127.0.0.1:11434`
- Open WebUI: `http://127.0.0.1:8080`
- Netron: `http://127.0.0.1:8338`
- ChunkViz: `http://127.0.0.1:3001`
- Embedding Atlas: `http://127.0.0.1:5055`
- Unsloth Studio: `http://127.0.0.1:8888`
- Promptfoo UI: `http://127.0.0.1:15500`
- Wiki: `http://127.0.0.1:80`
- Lab 3 Terminal: `http://127.0.0.1:7681/wetty`

## Lab 3 Browser Terminal

The deployment will:

- bind `sshd` to `127.0.0.1:22` only
- install WeTTY and expose it at `http://127.0.0.1:7681/wetty`
- leave login identity management to the host, so any existing local account with password-based SSH access can sign in through the browser terminal
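
Once the services and the browser terminal are up, a quick way to confirm everything is actually listening is to probe the default endpoints from the Lab URLs list above. The sketch below is a curl-based smoke test under the assumption that you kept the default ports; it is not part of `labctl`, so adjust the list if yours differ.

```bash
#!/usr/bin/env bash
# Smoke-test the default local endpoints after ./deploy-courseware.sh.
# Ports are the defaults reported by ./labctl urls; edit if yours differ.
set -u

declare -A endpoints=(
    ["Ollama API"]="http://127.0.0.1:11434"
    ["Open WebUI"]="http://127.0.0.1:8080"
    ["Netron"]="http://127.0.0.1:8338"
    ["ChunkViz"]="http://127.0.0.1:3001"
    ["Embedding Atlas"]="http://127.0.0.1:5055"
    ["Unsloth Studio"]="http://127.0.0.1:8888"
    ["Promptfoo UI"]="http://127.0.0.1:15500"
    ["Wiki"]="http://127.0.0.1:80"
    ["Lab 3 Terminal"]="http://127.0.0.1:7681/wetty"
)

for name in "${!endpoints[@]}"; do
    url="${endpoints[$name]}"
    # --max-time keeps a down service from hanging the loop; curl prints 000 on connect failure.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    printf '%-16s %-32s HTTP %s\n' "$name" "$url" "$code"
done
```

Anything reporting `000` is not reachable yet; check it with `./labctl status` and `./labctl logs`.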

## Notes

- `./labctl up` installs the environment and then starts every managed service.
- `./labctl versions` shows the pinned Netron version, minimum Ollama version, and Ansible runtime version used by this workspace.
- `./labctl assets lab2` is a separate manual step that clones the base WhiteRabbitNeo repo into `assets/lab2/WhiteRabbitNeo-V3-7B` and downloads the supported `Q4_K_M`, `Q8_0`, and `IQ2_M` GGUFs into `assets/lab2/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF`.
- `./labctl ollama_models` re-pulls the managed Lab 2 Gemma 4 E2B Ollama model set without rerunning the full installer.
- `./labctl update_wiki` hard-resets the managed wiki checkout to the latest remote commit, rebuilds it, and restarts only the managed wiki service on port `80`.
- `./labctl start core` starts only `ollama` and `open-webui`.
- `./labctl start all` starts every managed web service.
- The scripted Promptfoo install drops a starter config at `state/lab6/promptfoo.yaml`.
- `./labctl start all` includes Promptfoo via `promptfoo view` and the cloned wiki app.
- Lab 2 includes `state/lab2/download_whiterabbitneo-gguf.sh`, which uses `git` + `git lfs` to pull only the supported WhiteRabbitNeo quants. Add `--download-only` if you want the files without Ollama registration; the sketch after this list shows what that registration step looks like in general.
- The wiki is cloned from `https://git.zuccaro.me/bzuccaro/LLM-Labs.git` into `state/repos/LLM-Labs` and started with `npm`.
- `./labctl down` uninstalls Ollama entirely when this project installed it, instead of only stopping the service.
- This variant is intended for NVIDIA-backed Linux/WSL training and lab workflows.
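
For reference, the general Ollama pattern that registration relies on is: write a `Modelfile` whose `FROM` line points at a local GGUF, then run `ollama create`. The sketch below shows that pattern by hand; the GGUF filename is hypothetical, and the actual Modelfiles, parameters, and naming used by `download_whiterabbitneo-gguf.sh` may differ.

```bash
#!/usr/bin/env bash
# Illustrative only: register an already-downloaded WhiteRabbitNeo GGUF with Ollama by hand.
# Assumes the quant was fetched into assets/lab2/ beforehand; the exact filename is an assumption.
set -euo pipefail

GGUF="$(pwd)/assets/lab2/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-Q4_K_M.gguf"
MODEL_NAME="WhiteRabbitNeo-Q4"

# A Modelfile's FROM line may point at a local GGUF instead of a registry tag.
cat > /tmp/WhiteRabbitNeo-Q4.Modelfile <<EOF
FROM ${GGUF}
EOF

# Build the local model, then confirm it shows up in the model list.
ollama create "${MODEL_NAME}" -f /tmp/WhiteRabbitNeo-Q4.Modelfile
ollama list | grep -i whiterabbitneo
```

If you only want the files on disk, skip this entirely and use the script's `--download-only` flag as noted above.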