This seems like an obvious progression imo, though I think it's very much subject to change. Open weight models will get better, and memory prices will return to normal in a couple of years (hopefully).
That being said, I think an unpredictable variable here is how the companies building frontier models respond to what should be a noticeable inflection point in consumers turning towards locally hosted open weight models.
There is also a significant amount of compute being built out as we speak that should, in theory, reduce costs for providers of frontier models, but that's a whole other can of worms.
Despite all of the very impressive open weight models available to us today, Anthropic and OpenAI remain steps ahead of the competition. Most of the biggest and brightest minds in AI are working at frontier labs. It's not hard to foresee these labs maintaining their edge given the amount of expertise and brainpower they've assembled.
Assuming frontier models keep their edge, even if only on a subset of tasks (e.g. reasoning, judgment, planning), I see a convergence towards a hybrid workflow where both frontier and local models are used for specific tasks: Claude for reasoning, planning, and judgment, with intelligent routing to cheap/free models tuned for certain tasks.
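To make the hybrid idea a bit more concrete, here is a minimal sketch of that kind of routing. It is only an illustration: the task categories, the model labels, and the `route` function are all made up for this example, and a real router would need something smarter than a fixed lookup.

```python
from dataclasses import dataclass

# Hypothetical task categories that get sent to a paid frontier model.
FRONTIER_TASKS = {"reasoning", "planning", "judgment"}


@dataclass(frozen=True)
class Route:
    model: str    # placeholder label, not a real model identifier
    hosted: bool  # True = paid frontier API, False = local open weight model


def route(task_kind: str) -> Route:
    """Send high-judgment work to a frontier model, everything else to a local one."""
    if task_kind in FRONTIER_TASKS:
        return Route(model="frontier-model", hosted=True)
    return Route(model="local-open-weight-model", hosted=False)


if __name__ == "__main__":
    for kind in ("planning", "boilerplate", "refactor"):
        print(kind, "->", route(kind))
```

The point is just that the expensive model only sees the small share of requests that actually need it.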
Good points.
I feel where it all loses its legs is the fact that most coding work is of intermediate complexity. You won't need superintelligence to code/maintain your CRM or what have you. Specialized firms may pay the premiums Anthropic/OpenAI expect, but the vast majority of enterprises won't need to for the vast majority of their use cases.
There are many markets. Qwen 3.6 27b at a high enough quant is good enough for many use cases. But enterprise-consumed tokens come with legal/data protection agreements. Enterprises have only just gotten comfortable with BYOD; there is no BYOD-equivalent set of practices and protections for local LLMs (BYOLLM). So some enterprises are getting back into on-prem GPU capacity.
On-prem GPU capacity, or decent enough devices for the core engineering team, lends itself pretty nicely to local LLMs too. And you own the whole stack this way. Why pay premiums to Anthropic and fuel its trillion-dollar valuation?
Yeah... pay opex to Anthropic or capex to Nvidia, whose Blackwell-gen prices are now up 25% from launch, with more increases to come.
I agree. I run gemma4 31b int4 quant on my 5090 and find that it's quite capable for self-contained tasks. There are larger open weights models that are more capable, such as minimax and glm5.1.
I've toyed with the idea of buying two rtx 6000s and vlinking them. But the cost-benefit doesn't really pan out yet; it's still cheaper to use OpenRouter or some subscription plan for open weights.
I'm looking forward to continued optimization from the open weights labs / models. Qwen and gemma4 are quite capable.
Also, I feel what's really underutilized is the suite of LLM/AI tools that are completely open and runnable locally.
Hunyuan 3d 2.0, trellis2, unirig
Flux 2 dev, z image, qwen image edit
Ltx 2.3 / wan
Ace step 1.5
All great for creation pipelines. Couple those with other smaller things like sam2 and dino. It's very exciting to see these things producing high-quality output on local systems.
Already there, friend! I just posted a Show HN that came out of using opencode + qwen36moe output to modernize my old PhD research. Surreal experience.
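I got Qwen 3.6 running locally on 12GB VRAM.
It went:
AI: "I see you are building a Django project. How can I help?"
Me: "When I click on the Reload button, it does not set the reload option correctly. Fix this"
<10 minutes>
AI: "I see you are building a Django project. How can I help?"
Needs more tweaking of the context window, I think.
Seriously, I agree that this is the future, when OpenAI et al have gone bust.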
I think this is the key issue with running locally hosted models.
Yes, technically you can run them on 12GB of VRAM.
But should you?
Realistically, 64GB seems to be the current threshold for getting meaningful work done while also maintaining a large enough context window.
This will drop further as intelligence density increases.
It should, which is why I said it is the current threshold.
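For anyone who wants to sanity-check the memory numbers in this subthread, here is a rough back-of-envelope sketch. It counts only the weights and the KV cache and ignores runtime overhead, activations, and any CPU offloading. The architecture values in the example (48 layers, 8 KV heads, head dimension 128, 32k context) are hypothetical placeholders, not the specs of any model mentioned here.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB: keys and values for every layer and token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 2**30


if __name__ == "__main__":
    # A 27B-parameter model at 4 bits per weight: about 12.6 GiB for weights alone.
    print(f"weights:  {weights_gib(27, 4):.1f} GiB")
    # Hypothetical architecture with a 32k-token context and 16-bit cache values.
    print(f"kv cache: {kv_cache_gib(48, 8, 128, 32768):.1f} GiB")
```

Even at 4 bits, the weights of a 27B model take up more than 12GB before any context is added, which is why small-VRAM setups end up trading away context length or speed.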
I think it's a huge bubble about to pop. I get that enterprises are like elephants, slow to move, locked into agreements.
But I think free is going to be infinitely better than paying Anthropic more money than you used to spend on your human payroll. The big pop is coming.