Open-source C++/CUDA inference engine for ternary-weight LLMs with runtime LoRA. Solves the ternary merge problem: LoRA deltas (~1e-5 magnitude) are silently erased when merged into ternary weights...
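To make the merge problem concrete, here is a minimal sketch of one weight row, not the engine's actual code. It assumes a per-tensor scale with round-to-nearest requantization and flattens the LoRA product into a dense `delta` vector. The names `Ternary`, `merge_then_requantize`, and `forward_runtime_lora` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One row of ternary weights: codes in {-1, 0, +1} and a shared scale.
// (Assumed layout for illustration only.)
struct Ternary {
    std::vector<std::int8_t> codes;
    double scale;
};

// y = scale * sum_i(codes[i] * x[i])
double dot_ternary(const Ternary& t, const std::vector<double>& x) {
    double acc = 0.0;
    for (std::size_t i = 0; i < t.codes.size(); ++i) {
        acc += static_cast<double>(t.codes[i]) * x[i];
    }
    return acc * t.scale;
}

// Merged path: add the delta to the dequantized weights, then round back
// to ternary codes at the same scale. A delta far smaller than scale / 2
// cannot move any code across a rounding boundary, so it is lost.
Ternary merge_then_requantize(const Ternary& base,
                              const std::vector<double>& delta) {
    Ternary out{{}, base.scale};
    out.codes.reserve(base.codes.size());
    for (std::size_t i = 0; i < base.codes.size(); ++i) {
        double w = static_cast<double>(base.codes[i]) * base.scale + delta[i];
        double r = std::round(w / base.scale);
        if (r > 1.0) r = 1.0;
        if (r < -1.0) r = -1.0;
        out.codes.push_back(static_cast<std::int8_t>(r));
    }
    return out;
}

// Runtime path: keep the ternary weights untouched and add the delta's
// contribution as a separate higher-precision term.
double forward_runtime_lora(const Ternary& base,
                            const std::vector<double>& delta,
                            const std::vector<double>& x) {
    double lora = 0.0;
    for (std::size_t i = 0; i < delta.size(); ++i) {
        lora += delta[i] * x[i];
    }
    return dot_ternary(base, x) + lora;
}
```

Under these assumptions, with a scale of 0.5 and deltas of 1e-5, the merged codes are identical to the base codes, so the adapter has no effect on the output. The runtime path keeps the delta's contribution because it never passes through the ternary rounding step.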