Lab 4 - Chunking and Embedding
In this lab, we will:
- Visualize how different chunking strategies split documents using ChunkViz
- Explore an embedding space and similarity search using Embedding Atlas
Objective 1 Explore: Chunking
Chunking is the first step in any RAG pipeline: the process of dividing a document into snippets that can be stored within a database, paired with an embedded representation of that data. Because chunking occurs so early in the RAG process, the strategy chosen to chunk a document proves critical to the quality of the embeddings that are eventually stored.
Successful chunking is highly specific to the kinds of documents we wish to chunk. In production-level RAG development, we'd likely evaluate a number of strategies against documents we've analyzed for quality, bucketing them into different processing pipelines. However, we can at least get a rough idea of the effects chunking will have with a basic visualization.
First, ensure we've started our lab:
~/lab1/lab4_start.sh
And then, in a web browser, navigate to http://:3000. Once loaded, you should see the ChunkViz homepage.
Already, ChunkViz is populated with some example text. The text has also been "chunked" according to a default, character-based splitting strategy. In this case, every 200 characters is considered one chunk. We can modify chunk sizes by playing with the "Chunk Size" and "Chunk Overlap" sliders. Try changing those to 256 and 20, respectively.
Note how the colors in the text below dynamically change. Each color is a single chunk, with the green between each unique color representing the overlap. This overlap helps increase the likelihood that any given chunk will be properly selected during retrieval.
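The character-based splitting behind those sliders can be sketched in a few lines of Python. This is a minimal illustration of the idea, not ChunkViz's actual implementation; the chunk size and overlap values mirror the slider settings above.

```python
def chunk_by_characters(text, chunk_size=256, overlap=20):
    """Split text into fixed-size character chunks that overlap."""
    chunks = []
    step = chunk_size - overlap  # each new chunk starts this far along
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

sample = "a" * 600
chunks = chunk_by_characters(sample)
print(len(chunks))      # → 3
print(len(chunks[0]))   # → 256
```

Note that the last 20 characters of one chunk are repeated as the first 20 characters of the next; that duplicated region is the "green" overlap in the visualization.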
Next, let's explore different chunking strategies. The major ones we will cover are:
| Strategy | Description |
|---|---|
| Character Splitter | This default view splits text into chunks of a fixed number of characters. |
| Token Splitter | Split chunks based on their tokenization values (tokenization done by tiktoken). |
| Sentence Splitter | Split chunks into rough sizes based on the interpretation of what is a "sentence". |
| Recursive Character | Split chunks based on multiple possible separators, such as new lines (\n), periods (.), commas (,), or other relevant language section signifiers. |
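The recursive strategy in the table above can be sketched as follows: try the coarsest separator first (paragraph breaks) and only fall back to finer ones (sentences, clauses, words) when a piece is still too large. The separator list and size limit here are illustrative, not the exact internals of ChunkViz's splitter.

```python
def recursive_split(text, max_size=200, separators=("\n\n", ". ", ", ", " ")):
    """Recursively split text on progressively finer separators."""
    if len(text) <= max_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # Separator not present; try the next, finer one.
        return recursive_split(text, max_size, rest)
    chunks = []
    for piece in pieces:
        if len(piece) <= max_size:
            chunks.append(piece)
        else:
            # Piece is still too large; split it with finer separators.
            chunks.extend(recursive_split(piece, max_size, rest))
    return chunks
```

Because it prefers coarse boundaries, this approach tends to keep paragraphs and sentences intact, only resorting to word-level cuts when nothing else fits under the size limit.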
Select each option, and observe some peculiarities in how ChunkViz breaks text into chunks.
Each strategy comes with its own unique benefits and drawbacks. Character-based splitting is often one of the easiest strategies to implement, as all text input is ultimately composed of characters. Token-based splitting is useful when consistency in chunk size, as the model sees it, is imperative. Sentence and recursive splitting are often better for preserving "complete thoughts", as humans usually, but not always, write in complete sentences.
Let's explore one more facet of chunking, this time by seeing how chunking behaves against a novel. Open your provided copy of "Blindsight" by Peter Watts, in .txt format. Paste the contents into ChunkViz. Once again, play with the sliders (chunk sizes anywhere from 64 up to 1024) and strategies. Note how different chunk sizes split the novel in different ways.
Imagine how much easier or harder it may be to find a specific piece of information depending on the chunk size chosen!
Objective 2 Explore: Embedding Space
Now that we've seen some of the different trade-offs when chunking, we can move to the next major step of a RAG pipeline: embedding. As discussed during lecture, embedding is the process of converting text into a numerical representation that captures the "meaning" of the content. Instead of treating text as raw strings, embedding models map each chunk into an N-dimensional space where semantically similar content lands closer together.
This allows the system to perform similarity search efficiently: when a user submits a query, the query is also embedded into the same vector space, and the system retrieves the chunks whose embeddings lie closest to it. This is in contrast to how embedding vectors are utilized within an LLM itself, i.e. for attention and transformation via the feed-forward network. Ultimately, this step is what enables a RAG system to move beyond simple keyword matching and instead retrieve information based on meaning and context.
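The similarity search described above can be sketched with cosine similarity over toy vectors. The scenario texts echo the dataset we'll see shortly, but the vectors and the query embedding here are made up for illustration; a real pipeline would produce them with an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical chunk embeddings (real ones would have hundreds of dimensions).
chunks = {
    "Attacker dumps credentials from LSASS":  [0.9, 0.1, 0.0],
    "Phishing email delivers macro payload":  [0.1, 0.9, 0.1],
    "Credential theft via memory scraping":   [0.8, 0.2, 0.1],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "stealing passwords"

# Retrieve the chunk whose embedding lies closest to the query.
best = max(chunks, key=lambda c: cosine_similarity(query_vec, chunks[c]))
print(best)  # → Attacker dumps credentials from LSASS
```

Notice that both credential-related scenarios score far higher than the phishing one, even though none of them share exact keywords with the query; that is the behavior keyword matching cannot provide.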
Let's explore a real embedding space. Navigate to http://:5055. Here, we've started a project called Embedding Atlas, a tool that provides interactive visualizations for datasets in Parquet format. Each "chunk" in this case is one row in the dataset. It allows us to visualize, cross-filter, and search embeddings and metadata in an interactive, manual way.
The lab4_start.sh script will have automatically started Embedding Atlas and performed embedding against each "Scenario" in our dataset. Scenarios in this case are 1-3 sentence snippets describing an action taken by an attacker.
Our Embedding Atlas has already been pre-loaded with the main dataset we'll be using throughout the rest of today. Specifically, this is a dataset that matches "hacker scenarios" with MITRE ATT&CK Tactic, Technique, and Procedure (TTP) IDs. If you're unfamiliar with ATT&CK, it is a project that categorizes and organizes the possible ways an attacker might execute malware, pivot throughout a network, and eventually act on their objectives (often ransomware). ATT&CK also provides a rich corpus of example data that we can use to visualize the embedding process.
To help us visualize groups more clearly, before we start, please be sure to select "TTP_Name" from the dropdown in the top left.
Each color is a semantically similar concept, as defined by the embeddings generated during text processing. We can dynamically explore this embedding space through a few options:
- Select the text categories on the right side. This will visually show only the entries that match that category.
- Alternatively, select any of the categories in the column on the right. This performs the same function, exclusively showing only entries for the relevant ID.
Note: You can use your mouse wheel to zoom in and out. Additionally, click and drag the map around with your left click to center areas you deem of interest.
Explore how the various categories naturally cluster together within the embedding space. If we were to use this embedding space as part of a RAG pipeline, the system could embed the words of our query in the same manner and surface the semantically similar entries in our dataset back to us.
Let's visualize similarity in one other way:
- Select any single dot and click "Nearest Neighbor". Embedding Atlas will show us the specific datapoints that embed closest to our selected datapoint. Notice how some of the nearest datapoints appear very distant on the map! Think about why this might be; we'll discuss it during the review of this lab.
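One intuition for that puzzle can be sketched numerically: the 2D map is a projection of a much higher-dimensional space, and projections distort distances. The toy below uses random 64-dimensional vectors and a deliberately naive "projection" (keeping only the first two coordinates); real atlases use learned projections such as UMAP, which preserve distances better but still imperfectly.

```python
import math
import random

random.seed(0)
dims = 64
points = [[random.gauss(0, 1) for _ in range(dims)] for _ in range(200)]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = points[0]
# Nearest neighbor in the full 64-D space (excluding the query itself).
true_nn = min(range(1, len(points)), key=lambda i: dist(query, points[i]))
# Nearest neighbor using only the first 2 coordinates ("the flat map").
flat_nn = min(range(1, len(points)), key=lambda i: dist(query[:2], points[i][:2]))
# These frequently disagree: the map's closest dot need not be the
# embedding space's closest datapoint.
print(true_nn, flat_nn)
```

The same effect runs in reverse in Embedding Atlas: a true nearest neighbor in the full embedding space can land far away on the 2D map.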
If you'd like to continue exploring alternative datasets and see how embeddings can flexibly cluster raw data, feel free to take a look at Embedding Atlas' Examples page. In particular, take a look at the Wine dataset until class resumes.