Hi HN, I built llmrequirements.com so I could answer "what GPU should I buy for local models?" in one place, without leaving the site to google anything.
It's a static site that maps every model in the open-weights ecosystem (Llama, Qwen, Mistral, DeepSeek, GLM, Kimi, Flux, Wan, ...) to the hardware that can actually run it, with three numbers per build/model pair sourced from llama.cpp / vLLM benchmarks rather than vendor marketing:
- tg/s (single-stream generation)
- pp (prefill / prompt-processing throughput)
- TTFT at 100k-token context, null when the KV cache won't fit
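
For intuition on the last bullet, here is a rough sketch of how a TTFT value can end up null. This is not the site's actual pipeline: the function names, the model shape (32 layers, 8 KV heads, head dim 128), and the memory and throughput figures are all hypothetical. It assumes an fp16 KV cache, ignores activation and runtime overhead, and treats prefill throughput as constant over the whole prompt, which real benchmarks do not.

```python
from typing import Optional


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: one key and one value vector per layer,
    per KV head, per token (the leading 2 covers keys plus values)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


def ttft_seconds(pp_tokens_per_s: float, context_tokens: int,
                 weights_bytes: float, kv_bytes: float,
                 memory_bytes: float) -> Optional[float]:
    """Estimate time to first token as prompt length / prefill throughput.
    Returns None when weights plus KV cache exceed available memory."""
    if weights_bytes + kv_bytes > memory_bytes:
        return None
    return context_tokens / pp_tokens_per_s


# Hypothetical model shape and hardware figures, for illustration only.
kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=100_000)
print(kv / 1e9)                                           # KV cache size in GB
print(ttft_seconds(1000.0, 100_000, 16e9, kv, 24e9))      # does not fit in 24 GB
print(ttft_seconds(1000.0, 100_000, 16e9, kv, 48e9))      # fits in 48 GB
```

With these made-up numbers the cache alone is about 13 GB, so a 16 GB set of weights fits on a 48 GB device but not on a 24 GB one, which is the case where the site would report null.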
Hardware ranges from a Framework laptop up to an 8x H200 rack; software-stack maturity and extensibility get explicit 0-5 scores.
The data is exported to a public repo, so anyone can PR a correction and the diff is reviewable.
The project started with the picker as the landing page, but it now also has a "State of the Local AI" (SOLAI) page, because use cases have mostly converged on coding, agents, and personal assistants, and the set of models that can handle those use cases is well defined too.