LLM-Labs-Local/README.md

# Local Courseware Deployment

This project builds a student-friendly local lab environment for the courseware with a small control surface:

- `./deploy-courseware.sh` installs and configures the environment, then starts every managed service.
- `./destroy-courseware.sh` stops the managed services, uninstalls courseware-managed Ollama, and removes the project-owned lab state.
- `./labctl` provides day-two controls such as `assets lab2`, `start`, `stop`, `status`, `urls`, `logs`, and `open kiln`.

## What It Installs

- Ollama
- `llama.cpp`
- TransformerLab, pinned to the classic single-user `v0.28.2` release
- Open WebUI
- ChunkViz
- Embedding Atlas
- Promptfoo
- Unsloth Studio
- Kiln Desktop
- Course-specific support assets for lab 2 and lab 4

## Supported Host Profiles

This build intentionally avoids the reference VM's hardware workarounds.

- macOS: Apple Silicon only, with at least 16 GB unified memory.
- Native Debian/Ubuntu: Debian-family Linux with an NVIDIA GPU visible to `nvidia-smi` and at least 8 GB VRAM.
- WSL: Debian/Ubuntu-family Linux running under WSL, with the NVIDIA GPU exposed into the distro.

The launcher and Ansible preflight now classify the host dynamically and apply different setup behavior for:

- `macos`
- `native-debian-ubuntu`
- `wsl`

## WSL Check

If you run this inside WSL, the launcher checks GPU readiness before Ansible starts.

If that check fails, fix WSL first:

- Install or update the NVIDIA Windows driver with WSL/CUDA support
- Run `wsl --update` in Windows PowerShell
- Run `wsl --shutdown`
- Reopen WSL and confirm `nvidia-smi` works

Important: `nvidia-smi` is only the driver check. Building CUDA-enabled `llama.cpp` also requires the Linux-side CUDA toolkit inside the distro.

On Linux and WSL, the first `./labctl up` or `./labctl preflight` run may prompt once for your sudo password so Ansible can install system packages.

On Ubuntu WSL x86_64, preflight now installs the Linux-side CUDA toolkit automatically if it is missing.

It first tries the distro package:
- `sudo apt install -y nvidia-cuda-toolkit`

If that package is unavailable or still does not expose `nvcc`, the installer falls back to NVIDIA's WSL-Ubuntu repository bootstrap for the toolkit only, not a Linux GPU driver.

If the automatic bootstrap still fails, verify:

- `nvcc --version`
- `ls /usr/local/cuda/include/cuda_runtime.h`

For non-Ubuntu WSL distros, install the CUDA toolkit manually before running the deploy script.

## Native Debian/Ubuntu CUDA Behavior

On native Debian/Ubuntu hosts, the installer now handles three CUDA-toolkit cases:

- If the toolkit is already usable, it reuses the existing install instead of forcing a reinstall.
- If the distro exposes `nvidia-cuda-toolkit`, it installs that package.
- If the distro package is unavailable, it bootstraps NVIDIA's official CUDA network repository for supported native Debian/Ubuntu releases and installs the toolkit from there.

If `apt` starts in a broken dependency state, the installer now attempts `dpkg --configure -a` and `apt-get --fix-broken install` before retrying package installation.

If CUDA is already mounted or preinstalled outside `PATH`, the installer now detects standard locations such as `/usr/local/cuda/bin/nvcc` and `/usr/local/cuda-*/bin/nvcc`.

## Standard Assumptions

- The host-side install path assumes modern local tooling, but TransformerLab itself is provisioned from a pinned classic single-user layout.
- TransformerLab is intentionally pinned to the older single-user `v0.28.2` release because newer upstream releases changed the project structure and behavior in ways that break this courseware.
- This project does not rely on TransformerLab's upstream `install.sh`; the Ansible role provisions the pinned release directly so web assets, env layout, and runtime behavior stay reproducible.
- The courseware repairs installed TransformerLab Fastchat plugin manifests so Fastchat-gated features such as Model Architecture and Visualize Logprobs stay available on pinned installs.
- No Ollama models are pulled during `./labctl up`; students pull models manually as part of the courseware.
- WhiteRabbitNeo assets are handled separately from `./labctl up` and `./labctl preflight`.
- Run `./labctl assets lab2` when you want to populate repo-local lab 2 assets in `assets/lab2/` from Hugging Face.
- After base setup, run `state/lab2/download_whiterabbitneo-gguf.sh` to fetch only the `Q4_K_M`, `Q8_0`, and `IQ2_M` files from `bartowski/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF` and register local Ollama models `WhiteRabbitNeo`, `WhiteRabbitNeo-Q4`, `WhiteRabbitNeo-Q8`, and `WhiteRabbitNeo-IQ2`.
- TransformerLab and Unsloth homes are redirected into this project's `state/` tree via symlinks.
- Managed web services bind for access from both Linux and the Windows side of WSL, while `labctl urls` still reports localhost-friendly URLs.
- The local Ansible bootstrap in `.venv-ansible/` is machine-specific and will be recreated automatically if the folder is copied between hosts.
- `llama.cpp` now uses a conservative, memory-aware build parallelism setting instead of an unbounded `-j` build, which avoids OOM failures on smaller Linux and WSL hosts.

## Lab URLs

After `./deploy-courseware.sh`, run `./labctl urls`.

Default endpoints:

- Ollama API: `http://127.0.0.1:11434`
- Open WebUI: `http://127.0.0.1:8080`
- TransformerLab: `http://127.0.0.1:8338`
- ChunkViz: `http://127.0.0.1:3001`
- Embedding Atlas: `http://127.0.0.1:5055`
- Unsloth Studio: `http://127.0.0.1:8888`
- Promptfoo UI: `http://127.0.0.1:15500`
- Wiki: `http://127.0.0.1:80`

## Notes

- `./labctl up` installs the environment and then starts every managed service.
- `./labctl versions` shows the pinned TransformerLab and Ansible runtime versions used by this workspace.
- `./labctl assets lab2` is a separate manual step that clones the base WhiteRabbitNeo repo into `assets/lab2/WhiteRabbitNeo-V3-7B` and downloads the supported `Q4_K_M`, `Q8_0`, and `IQ2_M` GGUFs into `assets/lab2/WhiteRabbitNeo_WhiteRabbitNeo-V3-7B-GGUF`.
- TransformerLab is installed as a pinned single-user app and no default courseware-managed TransformerLab user is created automatically.
- `./labctl start core` starts only `ollama` and `open-webui`.
- `./labctl start all` starts every managed web service.
- `./labctl open kiln` launches the Kiln desktop app installed into the project state.
- The scripted Promptfoo install drops a starter config at `state/lab6/promptfoo.yaml`.
- `labctl start all` now includes Promptfoo via `promptfoo view` and the cloned wiki app.
- Lab 2 includes `state/lab2/download_whiterabbitneo-gguf.sh`, which uses `git` + `git lfs` to pull only the supported WhiteRabbitNeo quants. Add `--download-only` if you want the files without Ollama registration.
- The wiki is cloned from `https://git.zuccaro.me/bzuccaro/LLM-Labs.git` into `state/repos/LLM-Labs` and started with `npm`.
- `./labctl down` now uninstalls Ollama entirely when this project installed it, instead of only stopping the service.
- Unsloth Studio currently supports chat and data workflows on macOS; Linux/WSL remains the standard path for NVIDIA-backed training.