Update lab model defaults and assets

2026-04-24 20:08:56 -06:00
parent fcb2dcb36d
commit 562be3fd1f
18 changed files with 8971 additions and 916856 deletions
+1 -1
@@ -91,7 +91,7 @@ Select each option and observe the different ways ChunkViz breaks text into chun
Each strategy comes with its own benefits and drawbacks. Character-based splitting is often one of the easiest strategies to implement because OCR and text extraction ultimately produce characters. Token-based splitting is useful when keeping chunk sizes consistent for a specific model matters most. Sentence and recursive strategies are often better at preserving complete thoughts, although real-world documents do not always follow clean sentence boundaries.
-Explore one more chunking example using a larger document. Open your provided copy of _Blindsight_ by Peter Watts in `.txt` format, paste its contents into ChunkViz, and then continue experimenting with chunk sizes from `64` up to `1024` using different strategies. Notice how different chunk sizes and separators change the resulting structure.
+Explore one more chunking example using a larger document. Open the provided file: [Blindsight.md](/labs/lab-6-embedding-and-chunking/Blindsight.md). Copy the novel text, paste it into ChunkViz, and then continue experimenting with chunk sizes from `64` up to `1024` using different strategies. Notice how different chunk sizes and separators change the resulting structure, especially around paragraph breaks, scene breaks, and chapter headings.
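To see the trade-off between character-based and sentence-based splitting outside ChunkViz, the following minimal Python sketch may help. It is not how ChunkViz works internally: the function names `chunk_by_chars` and `chunk_by_sentences`, the simple punctuation-based sentence regex, and the sample text are all assumptions made for illustration.

```python
import re


def chunk_by_chars(text: str, size: int) -> list[str]:
    """Cut text into fixed-size character chunks.

    Simple and predictable, but it can cut words and sentences in half.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_sentences(text: str, size: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `size` characters.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    A single sentence longer than `size` is kept whole as its own chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= size or not current:
            # Either the sentence still fits, or the chunk is empty and
            # the sentence must be accepted even if it is oversized.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the lab materials.
    sample = "Chunking splits text. Size matters! Does it preserve meaning?"
    for chunk in chunk_by_chars(sample, 16):
        print(repr(chunk))
    for chunk in chunk_by_sentences(sample, 40):
        print(repr(chunk))
```

Running both functions on the same text with the same size limit shows the difference described above: the character splitter produces evenly sized pieces that ignore sentence boundaries, while the sentence splitter produces uneven pieces that each contain complete sentences.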
<figure style="text-align: center;">
<a href="https://i.imgur.com/M51ASNK.png" target="_blank">