Ternative – C++/CUDA inference engine for ternary LLMs with runtime LoRA

2 points | by michelangeloro 8 hours ago

1 comment