Chonkie 🦛

A lightweight, fast chunking library for text and data processing

Website | Maintainer

Overview

Chonkie is a fast, lightweight library for chunking text and data ahead of downstream processing, search, and machine learning applications. Built for speed and simplicity, it breaks large datasets or documents into manageable pieces.

Features

  • 🚀 Fast and lightweight
  • 🧩 Flexible chunking strategies
  • Easy integration
  • 🛠️ Minimal dependencies

Installation

pip install "chonkie[all]"

Usage

# First import the chunker you want from Chonkie 
from chonkie import TokenChunker

# Initialize the chunker
chunker = TokenChunker() # defaults to using GPT2 tokenizer

# Here's some text to chunk
text = """Woah! Chonkie, the chunking library is so cool!"""

# Chunk some text
chunks = chunker(text)

# Access chunks
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
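
To make the idea behind token-based chunking concrete, here is a minimal standalone sketch. It is not Chonkie's implementation: `SimpleChunk`, `chunk_by_tokens`, and the `chunk_size` and `overlap` parameters are hypothetical names chosen for illustration, and a plain whitespace split stands in for a real tokenizer such as GPT2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleChunk:
    """Hypothetical stand-in for a chunk object; not Chonkie's own class."""
    text: str
    token_count: int


def chunk_by_tokens(text: str, chunk_size: int = 5, overlap: int = 1) -> List[SimpleChunk]:
    """Split text into windows of at most chunk_size whitespace tokens.

    Consecutive windows share `overlap` tokens. This is an illustrative
    assumption about how a token chunker can work, not Chonkie's algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    tokens = text.split()  # whitespace split stands in for a real tokenizer
    step = chunk_size - overlap  # how far each window advances
    chunks: List[SimpleChunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(SimpleChunk(" ".join(window), len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


sample = "Woah! Chonkie, the chunking library is so cool!"
for chunk in chunk_by_tokens(sample, chunk_size=5, overlap=1):
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```

The sketch mirrors the `text` and `token_count` attributes used in the example above so the two loops read the same way. A real tokenizer would produce different token boundaries and counts than a whitespace split.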

Contributing

Contributions are welcome! Please open issues or pull requests to help improve Chonkie.

Maintainer

This is the work account for not-lain and is intended for maintaining and contributing to Chonkie 🦛.


Made with ❤️ for the open-source community
