diff --git a/content/labs/lab-3-oi-prompting.md b/content/labs/lab-3-oi-prompting.md index c5f77d1..6891650 100644 --- a/content/labs/lab-3-oi-prompting.md +++ b/content/labs/lab-3-oi-prompting.md @@ -4,7 +4,7 @@ # Lab 3 - Open WebUI & Prompting In this lab, we will: -* Run Open WebUI in Google Colaboratory +* Run Open WebUI * Using an Ollama Model within Open WebUI * Experimenting with Inference Parameters * Experimenting with Prompting Techniques @@ -19,17 +19,16 @@ In this lab, we will: ## Objective 1 Execute: Accessing Open WebUI -Your lab machine has been pre-installed with Open Webui. It is accessible on your provided system IP at port 8080 (http://:8080). You can log in with the following default credentials: +Your lab machine has been pre-installed with Open WebUI. It is accessible on your provided system IP at port 8080 (http://:8080). You can log in or register with the following default credentials: Username: student@openwebui.com Password: student -Once you've successfully connected to Open WebUI, follow the registration instructions. Feel free to register with any information, as Kaggle instance will tear itself down after four hours (barring manual intervention or inactivity). Once successful, move on to the next objective.
- +
@@ -40,7 +39,7 @@ Once you've successfully connected to Open WebUI, follow the registration instru ## Objective 2 Execute: Downloading Our First Model through Open WebUI (OUI) -Locate, pull, and run **Gemma 3 4B‑IT‑QAT** (a quant‑aware‑trained model) using the **Open WebUI** interface that talks to Ollama. By the end of this section you should be able to start a model with a single click and generate a response in the UI. +Locate, pull, and run **Qwen3.5 4B** using **Open WebUI**. By default, Open WebUI comes pre-configured to talk to a local install of Ollama, a legacy configuration from this project's original intent (it was originally released as Ollama-WebUI). By the end of this section you should be able to start a model with a single click and generate a response in the UI. ### Execute: Download Qwen 3.5 4B @@ -49,11 +48,11 @@ Locate, pull, and run **Gemma 3 4B‑IT‑QAT** (a quant‑aware‑trained m * Locate the search box at the top of the page.
- - + -
Ollama homepage – use the search bar to look for “Gemma 3”.
+
Ollama homepage – use the search bar to look for “Qwen 3.5”.
2. **Find the Qwen 3.5 family** @@ -64,8 +63,8 @@ Locate, pull, and run **Gemma 3 4B‑IT‑QAT** (a quant‑aware‑trained m * Click the **`Tags`** link beneath the model description.
- - +
Tag view – each entry shows the model size and a short description.
@@ -76,8 +75,8 @@ Locate, pull, and run **Gemma 3 4B‑IT‑QAT** (a quant‑aware‑trained m * The size column reads **`3.4 GB`**, indicating the VRAM required for inference.
- - +
Model size for `Qwen3.5:4b` (≈ 3.3 GB VRAM).
@@ -94,8 +93,8 @@ Locate, pull, and run **Gemma 3 4B‑IT‑QAT** (a quant‑aware‑trained m * Click **`Pull`**. The UI will display a progress bar while Ollama downloads the GGUF file.
- - +
Open WebUI – paste the tag and press “Pull”.
@@ -106,8 +105,8 @@ Locate, pull, and run **Gemma 3 4B‑IT‑QAT** (a quant‑aware‑trained m * Press **Enter** and watch the response appear.
- - +
Successful inference – the model returns a coherent answer.
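Since Open WebUI is ultimately talking to Ollama, the same pull-and-chat flow can also be driven programmatically. Below is a minimal sketch against Ollama's REST `/api/generate` endpoint (this assumes Ollama is listening on its default port 11434 and uses the tag pulled in this lab):

```python
import json
import urllib.request

# Request body for a single, non-streaming generation against Ollama's
# /api/generate endpoint. The model tag matches the one pulled in the lab.
payload = {
    "model": "qwen3.5:4b",
    "prompt": "Why is the sky blue?",
    "stream": False,
}

def generate(host: str = "http://localhost:11434") -> str:
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate() requires a running Ollama instance, so it is not called here.
```

With the model already pulled through the UI, calling `generate()` on the lab machine should return the same kind of answer you just saw in the chat window.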
@@ -130,9 +129,9 @@ Prior to this lab, we discussed inference settings such as Top K, Top P, and Tem Open WebUI allows us to easily modify these parameters on the fly through the chat controls, found on the right hand side next to your user's icon.
- +
@@ -143,19 +142,21 @@ Open WebUI allows us to easily modify these parameters on the fly through the ch By default, Open WebUI selects the following generically sound options, with the expectation that users have access to modest hardware: -* `Context Length` - 4096 +* `Context Length` - 2048 * `Temperature` - .8 * `Top K` - 40 * `Top P` - .9 While we won't play with `Context Length`, this parameter is critical for successfully accomplishing more complicated tasks using local models. With only the small default context length value, the model will quickly forget your instructions and interactions, rendering the results the model generates less useful. Unfortunately, just increasing this value is not always an option, as your selected model + `Context Length` must fit within your available memory. As with many challenges in AI, a key to solving issues with `Context Length` is often scaling your hardware to meet the demands of the task. This generally means utilizing hardware with larger amounts of VRAM or unified memory – either by purchasing it or renting access. -Additionally, these defaults can be overruled by the Ollama model file, which can specify its own "preferred" defaults. Below are the defaults that come with the model we've downloaded, or feel free to interactively explore the `params` page for the model at this link: [gemma3:12b-it-qat](https://ollama.com/library/gemma3:12b-it-qat/blobs/3116c5225075). +Additionally, these defaults can be overridden by the Ollama model file, which can specify its own "preferred" default hyperparameters. Below are the defaults that come with the model we've downloaded, or feel free to interactively explore the `params` page for the model at this link: [qwen3.5:4b-q4_K_M](https://ollama.com/library/qwen3.5:4b-q4_K_M/blobs/9371364b27a5). + +
- +
@@ -164,11 +165,11 @@ Additionally, these defaults can be overruled by the Ollama model file, which ca

-The best model makers will often override the defaults with their own preferred ones, as we've just seen. These Google selected defaults were the values they found to produce the best outputs for most tasks. When possible, it is likely that you'll want to stick with these defaults unless you have a very good reason to change them. +The best model makers will often override the defaults with their own preferred ones, as we've just seen. These Qwen-selected defaults were the values they found to produce the best outputs for most tasks. In most cases, you'll want to stick with these defaults unless you have a very good reason to change them. Thankfully, our lab gives us just such a reason! We can manually modify these options with the aforementioned chat controls options. Depending on our end goal, we can either help the model to write more "creatively" or "precisely" through setting `Temperature`, `Top K`, and `Top P`. -Lets test this with a series of interactions, themed around Magic the Gathering. Gemma is considered a multi-modal model, meaning we're not just limited to inputing text! Input the following image, and ask `What is this? What does it do?` +Let's test this with a series of interactions, themed around Magic the Gathering. Qwen is considered a multi-modal model, meaning we're not just limited to inputting text! Input the following image, and ask `What is this? What does it do?` Next, set our inference parameters to the following: @@ -199,31 +200,42 @@ Feel free to continue to explore with other topics or images. Note how each tim ## Objective 4: Prompting Techniques -### Explore: Prompt Engineering +### Explore: Prompt Engineering & System Prompting + +
+ Warning: As you explore chat via Open WebUI, ensure you turn `Ollama (Think)` to OFF. Otherwise, Qwen3.5 4B is likely to enter an infinite thinking loop on these tasks, which will require a VM reboot. +
Next, lets review different ways we can coax a model to perform better, without having to perform fine tuning or parameter customization. We can do this by "priming" the model with our first prompt in a number of ways: +
+ * Few Shot Prompting - Providing examples of our desired outcome up front * Meta Prompting - Providing a guide to reach the desired outcome * Chain of Thought - Providing the model guidance to think through its response * Self Criticism - Asking the model to play "devil's advocate" against itself +
+ Each of these tools can be combined to help achieve a greater effect. Below is a suggested list of Magic the Gathering game design challenges which we can task Qwen 3.5 with, but each will require either some luck, or great prompt engineering. If you have a different topic you're more familiar with, feel free to first use Qwen 3.5 to adapt these challenges to a more familiar theme: -* Design a black rare creature card that fits thematically and mechanically into a Graveyard Matters set. Provide a few existing cards to help give the model a template. +
+ +* Design a black rare creature card that fits thematically and mechanically into a Graveyard Matters Magic the Gathering set. Provide a few existing cards to help give the model a template. * Design the same card, but this time outline the type, mechanics, tone, and identity * Invent a new keyword. Have the model reason step by step how the keyword will work within the game * Review your new keyword for game balance. Have the model challenge its decisions. -### Explore: System Prompting -There is one final prompting tool that we have yet to deep dive, which is system prompting. While the `chat controls` menu provides the option to override the default system prompt, Open WebUI provides a powerful flow for "creating" new models with saved system prompts and inference parameters. This is especially useful once we have created a system prompt that we especially prefer, or would like to set inference parameters once, and reuse them many times. +
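The first challenge is a natural fit for few-shot prompting; a minimal sketch of assembling such a prompt follows (the example card texts are invented placeholders, not real Magic the Gathering cards):

```python
# Few-shot prompting: prepend worked examples so the model infers the
# desired format before seeing the real task. Card texts below are
# invented placeholders for illustration only.
examples = [
    ("Gravecrawler Shade",
     "B, Creature - Shade, 2/1. When this creature dies, return a creature "
     "card from your graveyard to your hand."),
    ("Tomb Warden",
     "1B, Creature - Zombie Soldier, 1/3. Creature cards in your graveyard "
     "cost 1 less to cast."),
]
task = "Design a black rare creature card for a Graveyard Matters set."

prompt = "\n\n".join(f"Card: {name}\nDesign: {text}" for name, text in examples)
prompt += f"\n\nTask: {task}\nCard:"
print(prompt)
```

The same scaffold extends to the other challenges: swap the examples for a step-by-step outline (meta prompting), a "think through this step by step" instruction (chain of thought), or a follow-up asking the model to critique its own answer (self criticism).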
+ +There is one final prompting tool that we have yet to explore in depth: system prompting. While the `chat controls` menu provides the option to override the default system prompt, Open WebUI provides a powerful flow for "creating" new models with saved system prompts and inference parameters. This is also a great convenience feature, as changing hyperparameters via Chat Controls for every chat becomes tedious. This is especially useful once we have created a system prompt we prefer, or would like to set inference parameters once and reuse them many times. Let's create a new model by selecting the `Workspace` link, and then selecting the `+` button to create a new model:
- +
@@ -266,9 +278,9 @@ When provided a name, generate a new Sliver card following this structure." ```
- +
@@ -281,7 +293,8 @@ When provided a name, generate a new Sliver card following this structure." 4. To ensure only the best card generation, show the `Advanced Params` and set the following to add creativity: * `Temperature` - 1.1 * `Top K` - 100 - * `Top P` - .7 + * `Top P` - .95 + * `Ollama (Think)` - Off Note: While we haven't actively discussed them as a part of this lab, as you play with more advanced inference problems, you may also find the following parameters of interest: * `Max Tokens` - Limit the possible length of a response to the desired number of tokens @@ -289,9 +302,9 @@ When provided a name, generate a new Sliver card following this structure." * `use_mlock` - Manually force Ollama to ensure all model components are kept within active memory. Useful for smaller systems.
- +
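A Workspace model is essentially a saved system prompt plus a set of parameter overrides. The same pairing can be expressed per request against Ollama's `/api/chat` endpoint, as sketched below with the lab's values (the abbreviated system prompt is a stand-in, and the `think` field is assumed to be the API-side counterpart of the UI toggle):

```python
import json

# System prompt + sampling overrides bundled into one /api/chat request,
# mirroring what the saved Workspace model applies automatically.
request_body = {
    "model": "qwen3.5:4b",
    "messages": [
        {"role": "system", "content": "You are a Magic the Gathering card designer. "
                                      "When provided a name, generate a new Sliver card."},
        {"role": "user", "content": "Chrome Sliver"},
    ],
    "options": {"temperature": 1.1, "top_k": 100, "top_p": 0.95},
    "think": False,   # counterpart of `Ollama (Think)` - Off in the UI
    "stream": False,
}

print(json.dumps(request_body, indent=2))
```

Saving these settings once in a Workspace model spares you from rebuilding this combination, whether in Chat Controls or in every API call.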
@@ -310,22 +323,20 @@ When provided a name, generate a new Sliver card following this structure." Throughout this lab, we've explored the fascinating world of Open WebUI and prompt engineering. Let's summarize the key topics we've covered: -1. **Open WebUI Setup**: We learned how to set up and run Open WebUI in both Google Colaboratory and locally using Docker containers. This gave us hands-on experience with deploying LLM interfaces. +1. **Model Selection and Management**: We explored how to download and manage models like Qwen 3.5, understanding their resource requirements and capabilities. This taught us about the practical considerations of working with different model sizes. -2. **Model Selection and Management**: We explored how to download and manage models like Qwen 3.5, understanding their resource requirements and capabilities. This taught us about the practical considerations of working with different model sizes. - -3. **Inference Parameters**: We experimented with critical inference parameters including: +2. **Inference Parameters**: We experimented with critical inference parameters including: - Temperature: Controls randomness in output - Top K: Limits token selection to top K most likely options - Top P: Uses nucleus sampling based on cumulative probability -4. **Prompting Techniques**: We examined various prompting strategies: +3. **Prompting Techniques**: We examined various prompting strategies: - Few Shot Prompting: Providing examples of desired outputs - Meta Prompting: Giving guidance to reach outcomes - Chain of Thought: Encouraging step-by-step reasoning - Self Criticism: Having the model evaluate its own responses -5. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks. +4. **System Prompting**: We created custom models with specific system prompts and parameter settings, learning how to tailor LLM behavior for specialized tasks. 
These concepts are foundational for effectively working with large language models in real-world applications. Remember that prompt engineering is both an art and a science - it requires understanding both the capabilities of the model and the nuances of human language. As you continue your journey with LLMs, don't hesitate to experiment with different approaches and parameters to find what works best for your specific use cases. diff --git a/content/labs/lab-3-oi-prompting_files/Registration.png b/content/labs/lab-3-oi-prompting_files/Registration.png new file mode 100644 index 0000000..fef088d Binary files /dev/null and b/content/labs/lab-3-oi-prompting_files/Registration.png differ