AirLLM runs inference for 70B-parameter large language models on a single 4GB GPU, making large-model inference accessible without expensive hardware.
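AirLLM fits a 70B model into 4GB of GPU memory through layered inference: only one transformer layer's weights are resident at a time, loaded from disk, applied, then freed. The sketch below illustrates that idea with NumPy stand-ins; the file layout, layer count, and function names are illustrative, not AirLLM's actual API.

```python
import os
import tempfile
import numpy as np

# Toy "sharded model": 4 linear layers saved to disk, one file per layer,
# standing in for per-layer transformer weights.
rng = np.random.default_rng(0)
tmpdir = tempfile.mkdtemp()
dim = 8
n_layers = 4
for i in range(n_layers):
    np.save(os.path.join(tmpdir, f"layer_{i}.npy"), rng.standard_normal((dim, dim)))

def run_layer_by_layer(x):
    """Apply each layer in sequence while holding only one weight matrix in memory."""
    for i in range(n_layers):
        w = np.load(os.path.join(tmpdir, f"layer_{i}.npy"))  # load one layer's weights
        x = np.tanh(x @ w)                                   # apply the layer
        del w                                                # release before the next load
    return x

out = run_layer_by_layer(np.ones(dim))
print(out.shape)  # (8,)
```

Peak memory here is one `dim x dim` matrix rather than all four at once, which is the same trade AirLLM makes: disk I/O per layer in exchange for a small resident footprint.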
A dual- or tri-GPU multi-agent simulation framework that orchestrates LLM-powered agents across GPU slots, with FLAME population dynamics, Bayesian calibration (ABC-SMC), sensitivity analysis (Morris/Sobol), ...
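Of the listed methods, Morris screening is the simplest to illustrate: it ranks parameters by the average magnitude of their elementary effects, i.e. the change in model output when one input is perturbed by a step delta. A minimal pure-NumPy sketch of that idea (not the framework's API; the function and its toy model are hypothetical):

```python
import numpy as np

def morris_mean_effects(f, n_params, n_trajectories=50, delta=0.1, seed=0):
    """Crude Morris screening: mean absolute elementary effect per parameter.

    Samples random base points in [0, 1 - delta], perturbs one coordinate at a
    time by `delta`, and averages |f(x + delta*e_i) - f(x)| / delta over runs.
    """
    rng = np.random.default_rng(seed)
    effects = np.zeros(n_params)
    for _ in range(n_trajectories):
        x = rng.uniform(0.0, 1.0 - delta, size=n_params)
        base = f(x)
        for i in range(n_params):
            xp = x.copy()
            xp[i] += delta
            effects[i] += abs(f(xp) - base) / delta
    return effects / n_trajectories

# Toy model: strongly driven by x0, weakly by x1, independent of x2.
mu = morris_mean_effects(lambda x: 10.0 * x[0] + 0.1 * x[1], 3)
print(mu)  # mu[0] dominates, mu[2] is ~0
```

Real pipelines typically use a library such as SALib for the trajectory design and Sobol indices; the point here is just the elementary-effect computation that Morris screening is built on.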