Fixed Lab 2

This commit is contained in:
c4ch3c4d3
2026-03-26 20:09:49 -06:00
parent 3bafa35460
commit a663cdbd42
8 changed files with 70 additions and 20 deletions
@@ -135,7 +135,7 @@ git clone https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B
**llama.cpp** makes it easy to convert models downloaded in SafeTensors format to GGUF. We can convert the model with the project's official conversion script:
```bash
-python3 convert_hf_to_gguf.py /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B/WhiteRabbitNeo-V3-7B --outfile /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
+convert_hf_to_gguf.py /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B/WhiteRabbitNeo-V3-7B --outfile /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf
```
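As a quick sanity check before moving on (not part of the lab's official steps; the paths reuse the ones above), you can confirm the conversion produced a valid file: every GGUF file begins with the four-byte magic `GGUF`.

```bash
# Confirm the converted file exists and has a plausible size (several GiB for a 7B model)
ls -lh /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf

# Print the first four bytes; a valid GGUF file starts with the magic "GGUF"
head -c 4 /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf; echo
```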
### 4 Execute: Review Model Metadata
@@ -212,7 +212,7 @@ Quantization reduces memory footprints and speeds inference, but it typically ra
To generate 8-bit, 4-bit, and 1-bit quantizations, run the following commands:
<div class="lab-callout lab-callout--warning">
-<strong>Warning:</strong> Although these quantization steps are provided for replication, pre-quantized support files are already available in <code>/home/student/lab2/WhiteRabbitNeo/</code> for faster lab progress.
+<strong>Warning:</strong> Although these quantization steps are provided for replication, pre-quantized support files are already available in <code>/home/student/lab2/WhiteRabbitNeo/</code> for faster lab progress. <br><br>You can skip these commands when participating in a live teaching session.
</div>
```bash
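# The lab's exact commands are cut off in this diff view. As a hedged sketch,
# llama.cpp's llama-quantize tool (assumed to be on PATH) is typically invoked
# as: llama-quantize <input.gguf> <output.gguf> <type>. The quantization types
# below (Q8_0, Q4_K_M, IQ1_S) are assumptions for the 8-, 4-, and 1-bit variants.
llama-quantize /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf \
    /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q8_0.gguf Q8_0
llama-quantize /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf \
    /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-Q4_K_M.gguf Q4_K_M
llama-quantize /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B.gguf \
    /home/student/lab2/WhiteRabbitNeo/WhiteRabbitNeo-V3-7B-IQ1_S.gguf IQ1_S
```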
@@ -459,7 +459,7 @@ When finished, you will be presented with a prompt, similar to the `llama-cli` c
Similarly, we can pull a model directly from **HuggingFace**. As long as the source file is a `.gguf` at any quantization level that fits within our system memory, Ollama can fetch it directly.
-1. **Select a Quantized Model from Objective 1** visit [CodeIsAbstract](https://huggingface.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF) in your browser.
+1. **Select the Quantized Model from Objective 1** - Visit [CodeIsAbstract](https://huggingface.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF) in your browser.
2. **Use this model** - Click **Use this model** → choose the **Ollama** tab. The page displays a ready-to-run command:
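If the page follows Ollama's standard `hf.co/<username>/<repository>` pull syntax, the displayed command should resemble this sketch (the exact tag shown on the page may differ):

```bash
# Pull and run the Q8_0 GGUF straight from HuggingFace via Ollama
ollama run hf.co/CodeIsAbstract/Llama-3.2-1B-Q8_0-GGUF
```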
<figure style="text-align: center;">
@@ -528,10 +528,9 @@ ollama run WhiteRabbitNeo
| `ollama list` | Shows all models currently registered with Ollama. |
| `ollama rm <tag>` | Deletes the specified model (freeing disk space). |
| `ollama show <tag>` | Prints model metadata (architecture, context length, quantization). |
| `ollama show <tag> --modelfile` | Prints an existing model's modelfile. Often useful for templating our own. |
| `ollama serve` | Starts the OpenAI-compatible API server (runs automatically when you first use `ollama run`). |
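To illustrate the templating note above, here is a minimal sketch of customizing a model from an existing modelfile (the `WhiteRabbitNeo` tag comes from this lab; the edits and the `whiterabbitneo-custom` tag are hypothetical):

```bash
# Dump the existing model's modelfile to use as a starting template
ollama show WhiteRabbitNeo --modelfile > Modelfile

# ... edit Modelfile here (e.g., the SYSTEM prompt or PARAMETER lines) ...

# Register the customized model under a new tag
ollama create whiterabbitneo-custom -f Modelfile
```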
<br>
---
## Conclusion