ContextGem is a free, open-source LLM framework that makes it radically easier to extract structured data and insights from documents — with minimal code.
Most LLM frameworks require repetitive boilerplate to extract even basic fields. ContextGem replaces that boilerplate with abstractions that handle dynamic prompt generation, data validation, and reference tracking, so you can focus on what to extract rather than how.
Automated dynamic prompts
Automatic data modeling & validation
Granular reference mapping (paragraph/sentence level)
Justification for every extraction
Neural segmentation (SaT, "Segment any Text")
Supports most LLM providers
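To make the ideas in this list concrete, here is a minimal, library-independent sketch in plain Python. It does not use ContextGem's API: `FieldSpec`, `build_prompt`, `validate_response`, and the JSON reply shape are all hypothetical names invented for illustration, and the LLM reply is hard-coded rather than fetched from a provider. It only shows the general pattern of generating a prompt from declared fields, validating the reply, and mapping each value back to a source sentence with a justification.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical declaration of one field to extract (not a ContextGem class)."""
    name: str
    description: str
    type_: type


def split_sentences(text: str) -> list[str]:
    # Naive regex splitter, for illustration only (a stand-in for neural segmentation).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_prompt(fields: list[FieldSpec], sentences: list[str]) -> str:
    # The prompt is generated from the declared fields instead of being hand-written.
    wanted = "\n".join(
        f"- {f.name} ({f.type_.__name__}): {f.description}" for f in fields
    )
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))
    return (
        "Extract the fields below from the numbered sentences. Reply with JSON "
        "mapping each field name to an object with the keys "
        "value, sentence_id, and justification.\n"
        f"Fields:\n{wanted}\nSentences:\n{numbered}"
    )


def validate_response(raw_json: str, fields: list[FieldSpec], sentences: list[str]) -> dict:
    # Check types and resolve each sentence_id to the sentence it points at.
    data = json.loads(raw_json)
    result = {}
    for f in fields:
        item = data[f.name]
        if not isinstance(item["value"], f.type_):
            raise TypeError(f"{f.name}: expected {f.type_.__name__}")
        idx = item["sentence_id"]
        if not isinstance(idx, int) or not 0 <= idx < len(sentences):
            raise ValueError(f"{f.name}: sentence_id {idx!r} is out of range")
        result[f.name] = {
            "value": item["value"],
            "reference": sentences[idx],
            "justification": item["justification"],
        }
    return result


text = (
    "This agreement is made between Acme Corp and Beta LLC. "
    "The contract term is 24 months. "
    "Either party may terminate with notice."
)
sentences = split_sentences(text)
fields = [FieldSpec("term_months", "Contract duration in months", int)]
prompt = build_prompt(fields, sentences)

# Hard-coded stand-in for an LLM reply; a real pipeline would send `prompt` to a model.
fake_reply = (
    '{"term_months": {"value": 24, "sentence_id": 1, '
    '"justification": "The sentence states the term directly."}}'
)
extracted = validate_response(fake_reply, fields, sentences)
print(extracted["term_months"]["reference"])
```

Writing this plumbing by hand for every field is the kind of boilerplate the framework aims to remove.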
v0.2.0 adds a native DOCX converter that transforms DOCX files into LLM-ready data:
Extracts information that other open-source tools often do not capture: misaligned tables, comments, footnotes, textboxes, headers/footers, and embedded images
Preserves document structure with rich metadata for improved LLM analysis
👉 Explore all features in the README
If you find ContextGem useful, please support the project by sharing it with fellow developers and giving it a ⭐