text-splitting

Here are 29 public repositories matching this topic...

isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated Jun 13, 2026
Python

ekimetrics / adaptive-chunking

Adaptive Chunking: automatically select the best chunking method per document for RAG. Accepted at LREC 2026.

nlp information-retrieval chunking rag llm text-splitting

Updated May 20, 2026
Python

jparkerweb / semantic-chunking

🍱 Semantically create chunks from large documents for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking equill-library

Updated May 29, 2026
JavaScript

messkan / rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

python nlp ia chunking rag vector-search embedding-vectors llm langchain retrieval-augmented-generation text-splitting rag-pipeline document-chunking

Updated Jan 18, 2026
Python

dimicx / griffo

Kerning-aware text splitting

react javascript typescript animation motion typography gsap morph morphing text-animation kerning framer-motion split-text text-splitting

Updated Jun 27, 2026
TypeScript

speedyk-005 / chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

visualization python nlp natural-language-processing chunking code-structure code-chunking sentence-boundary-detection rag chunks-processing chunks-algorithm text-splitting document-chunking

Updated Jun 22, 2026
Python

sentencizer / sentencizer

A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.

golang natural-language-processing ai nlp-library sentence-tokenizer sentence-segmentation sentence-boundary-detection sentence-splitting rag sentence-splitter sentence-segmenter text-splitter llm retrieval-augmented-generation text-splitting

Updated Aug 31, 2025
Go

jchunk-io / jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text chunking within Java applications.

java chunk chunking etl-pipeline rag text-splitter text-splitting

Updated Apr 13, 2026
Java

philnash / chunkers

An exploration of text splitting and chunking in JavaScript

text-splitter llamaindex langchain-js text-chunking text-splitting

Updated Nov 20, 2025
TypeScript

thoven87 / text-splitter

Split and analyze text files

swift ai swift-on-server rag document-splitter text-splitter text-splitting rag-pipeline

Updated Jun 8, 2026
Swift

A powerful Markdown chunking tool that understands document structure. Unlike naive token splitters, it protects atomic elements (code, math, tables), merges by semantic affinity, and scores chunk quality, making it ready for RAG and fine-tuning workflows.

markdown ai dataset chunking knowledge-base document-processing fine-tuning rag llm text-splitting

Updated Jul 2, 2026
JavaScript

Jeevav62 / chunking-techniques

A practical guide to 6 document chunking strategies for RAG and LLM applications — Document, Fixed-Size, Recursive, Sentence, Semantic, and Agentic chunking with working code and plain-English explanations.

python nlp openai chunking rag vector-search llm llama-index retrieval-augmented-generation text-splitting semantic-chunking document-chunking agentic-chunking

Updated Jun 22, 2026
Python

HemalDholakiya12 / PDFChat

A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.

Updated May 5, 2025
JavaScript

yzp0111 / structchunk

Markdown chunker for RAG. Structure-aware splitting preserves full semantic context; tables split at row boundaries.

python nlp markdown ai embeddings chunking document-processing rag vector-search text-splitting content-engine

Updated Jun 12, 2026
Python

HamedFathi / RecursiveTextSplitter

A smart C# text splitting library that intelligently chunks text while preserving semantic boundaries. Uses a hierarchical approach with configurable overlap and detailed metadata.

csharp dotnet text dotnetcore dotnet-core recursive recursive-algorithm dotnet-library text-split text-splitter text-splitting recursive-text-splitter

Updated Jun 18, 2025
C#

VaidehiShyara14 / Ayurveda-PDF-Q-A-Chatbot

An intelligent chatbot that allows users to upload text-based Ayurveda PDFs and ask questions based on the content using RAG (Retrieval-Augmented Generation) combining semantic search and LLM-based responses.