🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
Updated May 29, 2026 - Python
ChatGPT PROMPTs Splitter. Tool for safely processing chunks of up to 15,000 characters per request
Fully neural approach for text chunking
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
🍱 Semantically create chunks from large documents for passing to LLM workflows
Open-source toolkit for RAG chunking: convert Markdown, validate documents, visualize and optimize chunking strategies, and enrich results for LLM applications.
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
An agent with human in the loop that can search the web for information while bypassing bot detection for private sites.
JChunk is a lightweight and flexible library designed to provide multiple strategies for text chunking within Java applications
We compared LangChain, Fixie, and Marvin
This project implements a Retrieval-Augmented Generation (RAG) based conversational AI agent designed for intelligent knowledge extraction from PDF documents, leveraging LangChain and Google’s Gemini LLM
A lightweight TypeScript text splitter for RAG applications
An exploration of text splitting and chunking in JavaScript
Free AI Prompt Splitter - Split large documents into chunks for ChatGPT, Claude, GPT-4. Supports PDF, TXT, MD files. Smart token counting & overlap control.
A comprehensive repository to learn and implement Retrieval-Augmented Generation (RAG) from scratch using LangChain. It covers the full RAG pipeline including Document Loaders, Text Splitters, Embeddings, Vector Databases, and Retrievers with practical examples and step-by-step explanations.
n8n community node that splits Markdown into retrieval-ready chunks with heading-aware metadata for RAG and vector stores (convert → chunk → embed)
Kardenwort is an intelligent tool for language learners that transforms complex texts and words into simple, clear, and context-rich vocabulary lists
Semantic chunking algorithm in (mostly) Go
A text splitter/chunker to make pasting large text into ChatGPT easier.