Developed a collection of quantization techniques for open-source LLMs, achieving 75–85% model size reduction and 2–4× inference speedups with minimal loss in output quality.
Covers GPTQ (4-bit), GGUF/GGML, ExLlamaV2, and 8-bit quantization.
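The core idea behind all of these formats is the same: store weights at reduced integer precision with a scale factor that maps them back to floats. A minimal sketch of symmetric per-tensor 8-bit quantization (not any library's actual implementation — `quantize_int8` and `dequantize_int8` are illustrative names) shows where the 4× size reduction for fp32 weights comes from:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Quantize a random weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
err = np.abs(w - w_hat).max()  # rounding error is bounded by scale / 2
```

Storing `q` (1 byte per weight) instead of fp32 `w` (4 bytes) gives the 75% reduction; 4-bit schemes like GPTQ push this further by packing two weights per byte and quantizing per-group rather than per-tensor.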