ternative by Michelangelo Romero Chisco - Funding

Open-source C++/CUDA inference engine for ternary-weight LLMs with runtime LoRA. It solves the ternary merge problem: LoRA deltas (magnitude ~1e-5) are silently erased when they are merged into ternary weights (scale ~1.2) and the result is re-quantized. ternative instead keeps the adapter separate, applies it at full F32 precision at load time, and serves the model through an OpenAI-compatible HTTP API. This lets the entire class of LoRA-aligned BitNet models be served correctly, which llama.cpp and bitnet.cpp cannot do.
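The merge problem can be made concrete with a minimal C++ sketch. It rests on assumptions the description does not spell out: a single per-tensor scale, round-to-nearest with clipping to {-1, 0, +1}, and re-quantization with the original scale. All names (`TernaryTensor`, `lora_delta`, `merge_and_requantize`, `apply_adapter_f32`) are hypothetical and are not ternative's API. A delta of ~1e-5 moves `w / scale` by only about 1e-5 / 1.2 ≈ 8e-6, far from the ±0.5 rounding boundary, so every ternary code rounds back to its original value. Adding the same delta to the dequantized weights in F32 preserves it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ternary tensor: codes in {-1, 0, +1} sharing one per-tensor scale.
struct TernaryTensor {
    std::vector<int8_t> codes;
    float scale;
};

// Assumed quantizer: round(w / scale), clipped to [-1, 1].
std::vector<int8_t> quantize_ternary(const std::vector<float>& w, float scale) {
    std::vector<int8_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        float r = std::round(w[i] / scale);
        if (r > 1.0f) r = 1.0f;
        if (r < -1.0f) r = -1.0f;
        q[i] = static_cast<int8_t>(r);
    }
    return q;
}

// Expand ternary codes back to F32 weights.
std::vector<float> dequantize(const TernaryTensor& t) {
    std::vector<float> w(t.codes.size());
    for (std::size_t i = 0; i < w.size(); ++i) w[i] = t.codes[i] * t.scale;
    return w;
}

// LoRA delta = scaling * B (rows x rank) * A (rank x cols), all row-major F32.
std::vector<float> lora_delta(const std::vector<float>& B, const std::vector<float>& A,
                              std::size_t rows, std::size_t rank, std::size_t cols,
                              float scaling) {
    assert(B.size() == rows * rank && A.size() == rank * cols);
    std::vector<float> d(rows * cols, 0.0f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t k = 0; k < rank; ++k)
            for (std::size_t j = 0; j < cols; ++j)
                d[i * cols + j] += scaling * B[i * rank + k] * A[k * cols + j];
    return d;
}

// Separate-adapter path: dequantize the base once and add the delta in F32.
std::vector<float> apply_adapter_f32(const TernaryTensor& base, const std::vector<float>& delta) {
    assert(delta.size() == base.codes.size());
    std::vector<float> w = dequantize(base);
    for (std::size_t i = 0; i < w.size(); ++i) w[i] += delta[i];
    return w;
}

// Naive merge path: add the delta, then re-quantize with the same scale.
// Tiny deltas round away, so the codes come back unchanged.
TernaryTensor merge_and_requantize(const TernaryTensor& base, const std::vector<float>& delta) {
    return TernaryTensor{quantize_ternary(apply_adapter_f32(base, delta), base.scale), base.scale};
}
```

With base codes `{1, 0, -1, 1}`, scale `1.2f`, and a rank-1 delta whose entries are around 1e-5, `merge_and_requantize` returns exactly the original codes, while `apply_adapter_f32` returns weights that differ from the dequantized base.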

Links

github.com/michelangeloromerochisco/ternative

License

Apache-2.0