Chonkie is a blazing-fast, lightweight library designed to efficiently chunk text and data for downstream processing, search, and machine learning applications. Built for speed and simplicity, Chonkie helps you break down large datasets or documents into manageable pieces.
- 🚀 Fast and lightweight
- 🧩 Flexible chunking strategies
- Easy integration
- 🛠️ Minimal dependencies
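The sketch below makes "chunking" concrete. It is a minimal pure-Python sketch of one common strategy: fixed-size token windows with overlap. It is not Chonkie's implementation. Whitespace splitting stands in for a real tokenizer such as GPT-2's, and the `Chunk` class and `chunk_tokens` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record: the chunk's text plus its token count."""
    text: str
    token_count: int


def chunk_tokens(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> list[Chunk]:
    """Hypothetical helper: split text into windows of at most chunk_size tokens,
    where consecutive windows share chunk_overlap tokens.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[Chunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(Chunk(text=" ".join(window), token_count=len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "Woah! Chonkie, the chunking library is so cool!"
    for chunk in chunk_tokens(sample, chunk_size=4, chunk_overlap=1):
        print(f"Chunk: {chunk.text}")
        print(f"Tokens: {chunk.token_count}")
```

With the overlap, text near a window boundary appears in both neighboring chunks. This can help downstream search when a relevant passage straddles a boundary. The cost is some duplicated tokens.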
Install Chonkie with pip:

```bash
pip install "chonkie[all]"
```

Then chunk some text:

```python
# First import the chunker you want from Chonkie
from chonkie import TokenChunker

# Initialize the chunker
chunker = TokenChunker()  # defaults to using the GPT-2 tokenizer

# Here's some text to chunk
text = """Woah! Chonkie, the chunking library is so cool!"""

# Chunk some text
chunks = chunker(text)

# Access chunks
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```

Contributions are welcome! Please open issues or pull requests to help improve Chonkie.
This is the work account for not-lain and is intended for maintaining and contributing to Chonkie 🦛.

