Update lab model defaults and assets

2026-04-24 20:08:56 -06:00
parent fcb2dcb36d
commit 562be3fd1f
18 changed files with 8971 additions and 916856 deletions
@@ -91,7 +91,7 @@ Select each option and observe the different ways ChunkViz breaks text into chun

 Each strategy comes with its own benefits and drawbacks. Character-based splitting is often one of the easiest strategies to implement because OCR and text extraction ultimately produce characters. Token-based splitting is useful when keeping chunk sizes consistent for a specific model matters most. Sentence and recursive strategies are often better at preserving complete thoughts, although real-world documents do not always follow clean sentence boundaries.
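
To make the trade-off concrete, here is a minimal sketch of two of the strategies described above: fixed-size character splitting and a simple sentence-packing splitter. It is not how ChunkViz implements them. The function names, the regex used to detect sentence boundaries, and the sample text are all assumptions made for illustration.

```python
import re


def chunk_by_characters(text: str, chunk_size: int) -> list[str]:
    """Cut text into fixed-size character windows, ignoring word boundaries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_by_sentences(text: str, chunk_size: int) -> list[str]:
    """Pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size is kept whole as its own chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Naive boundary rule (an assumption): split after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Made-up sample text, used only to show the difference between the two splitters.
SAMPLE = "It was dark. The ship drifted. Nobody spoke."

if __name__ == "__main__":
    for size in (20, 32):
        print(size, chunk_by_characters(SAMPLE, size))
        print(size, chunk_by_sentences(SAMPLE, size))
```

The character splitter always respects the size limit but can cut a word or sentence in half. The sentence splitter keeps complete thoughts together, at the cost of uneven chunk lengths and a boundary rule that fails on text without clean punctuation.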

-Explore one more chunking example using a larger document. Open your provided copy of _Blindsight_ by Peter Watts in `.txt` format, paste its contents into ChunkViz, and then continue experimenting with chunk sizes from `64` up to `1024` using different strategies. Notice how different chunk sizes and separators change the resulting structure.
+Explore one more chunking example using a larger document. Open the provided file: [Blindsight.md](/labs/lab-6-embedding-and-chunking/Blindsight.md). Copy the novel text, paste it into ChunkViz, and then continue experimenting with chunk sizes from `64` up to `1024` using different strategies. Notice how different chunk sizes and separators change the resulting structure, especially around paragraph breaks, scene breaks, and chapter headings.

 <figure style="text-align: center;">
  <a href="https://i.imgur.com/M51ASNK.png" target="_blank">