Open-source C++/CUDA inference engine for ternary-weight LLMs with runtime LoRA. Solves the ternary merge problem: LoRA deltas (~1e-5 in magnitude) are silently erased when they are merged into ternary weights (~1.2 scale) and the result is re-quantized. ternative keeps the adapter separate from the base weights, applies it at full F32 precision at load time, and serves the model over an OpenAI-compatible HTTP API. This allows the entire class of LoRA-aligned BitNet models to be served correctly, something llama.cpp and bitnet.cpp cannot do.
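
The merge problem described above can be illustrated with a minimal sketch. It is not ternative's actual code: the function names (`quantize_ternary`, `merge_and_requantize`, `adapter_term`), the fixed per-tensor scale, and the use of a plain per-weight `delta` vector standing in for one row of the LoRA product are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Round each weight to the nearest ternary code in {-1, 0, +1}.
// Assumption: a fixed per-tensor scale; a real quantizer may recompute it.
std::vector<int> quantize_ternary(const std::vector<float>& w, float scale) {
    std::vector<int> codes;
    codes.reserve(w.size());
    for (float v : w) {
        float r = std::round(v / scale);
        if (r > 1.0f) r = 1.0f;
        if (r < -1.0f) r = -1.0f;
        codes.push_back(static_cast<int>(r));
    }
    return codes;
}

// Merge path: dequantize, add the LoRA delta, then re-quantize.
// With delta ~1e-5 and scale ~1.2, every value rounds back to its old code.
std::vector<int> merge_and_requantize(const std::vector<int>& codes,
                                      const std::vector<float>& delta,
                                      float scale) {
    std::vector<float> merged(codes.size());
    for (std::size_t i = 0; i < codes.size(); ++i) {
        merged[i] = static_cast<float>(codes[i]) * scale + delta[i];
    }
    return quantize_ternary(merged, scale);
}

// Separate path: the adapter's contribution to one output element,
// accumulated in F32 and never passed through the quantizer.
float adapter_term(const std::vector<float>& delta,
                   const std::vector<float>& x) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < delta.size(); ++i) {
        acc += delta[i] * x[i];
    }
    return acc;
}
```

In this sketch, calling `merge_and_requantize` with codes `{1, 0, -1, 1}`, deltas on the order of 1e-5, and a scale of 1.2 returns the original codes unchanged, so the adapter has no effect after merging. `adapter_term` on the same deltas yields a small but nonzero value that can be added to the base output, which is the reason for keeping the adapter separate.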
