LibreChat was my favorite open source frontend/backend for interacting with LLMs. I know ClickHouse says it will stay that way but I find this acquisition odd to say the least. The overlap seems tenuous at best and I worry this will be abandoned along the way. I hope I'm wrong but the whole thing just is odd, a database acquiring an open source AI tool?
A database in isolation is meaningless.
Rather, people have to ask questions of it, and interact with the data. Increasingly, that is via AI tooling.
We've had a long-standing demo at llm.clickhouse.com (librechat, bedrock, anthropic).
(disclaimer: work at ClickHouse)
> Rather, people have to ask questions of it, and interact with the data. Increasingly, that is via AI tooling.
That is, given all my own experiences on that front, terrifying if "increasingly" people are interacting with their data via AI tooling. In all the testing I've done, it can seem like magic "Look, it just told us XXX piece of data and we just asked a simple question!" but LLMs, even with copious amounts of context, are not good at understanding your business rules for understanding your data. And that goes for just about any company with more than "Pet Store"-level complexity (especially after years or decades of the data growing/changing).
Perhaps this has improved/changed but I use LLMs daily and nothing indicates to me that it's improved enough to make this worthwhile. Any AI-only interface to data I would assume is either dealing with a laughably simple dataset/schema (or super new) or lying to you constantly.
(Ryadh from ClickHouse here) Your comment is spot-on. This is the main challenge with Agentic Analytics and there are known limitations. It is also where we are orienting our investments atm.
Our own experience running internal agents taught us that the best remediation comes from providing the LLMs with the maximum and most accurate context possible. Robust evaluations are also critical to measure accuracy, detect regressions, and improve. But there is no silver bullet.
SOTA LLMs are increasingly better at generating SQL and notoriously bad with math and numbers in general. Combining them with powerful querying capabilities bridges that gap and makes the overall experience a useful one.
IMO, we'll always have to deal with the stochastic nature of these models and hallucinations, which calls for caution and requires raising awareness within the user base. What I found watching our users internally is that, while it's not magical, it allows users to request data more often, and that compounds in data-driven decision-making, assuming the users are trained to interpret the interactions.
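To make the point about evaluations concrete, here is a minimal sketch of what such a check could look like. It is not ClickHouse's actual setup: it uses an in-memory SQLite database as a stand-in, and `fake_generator`, `CASES`, and the `signups` table are hypothetical placeholders for a real LLM call and real reference queries. The idea is to compare the rows returned by generated SQL against the rows returned by a hand-written reference query, so that "valid SQL, wrong data" counts as a failure.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows sorted, so row order does not affect comparison."""
    return sorted(conn.execute(sql).fetchall())


def evaluate(conn, cases, generate_sql):
    """Score a SQL generator against hand-written reference queries.

    cases: list of (question, reference_sql) pairs.
    generate_sql: callable mapping a question to a SQL string
                  (hypothetical stand-in for the LLM).
    Returns (accuracy, failures).
    """
    failures = []
    for question, reference_sql in cases:
        expected = run_query(conn, reference_sql)
        try:
            actual = run_query(conn, generate_sql(question))
        except sqlite3.Error as exc:
            # The generated SQL did not even execute.
            failures.append((question, f"invalid SQL: {exc}"))
            continue
        if actual != expected:
            # The SQL executed but returned different data than the reference.
            failures.append((question, "wrong result"))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


def build_demo_db():
    """Create a tiny illustrative table (assumed schema, not from the thread)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signups (user_id INTEGER, day TEXT)")
    conn.executemany(
        "INSERT INTO signups VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")],
    )
    return conn


CASES = [
    ("How many users signed up on 2024-01-01?",
     "SELECT COUNT(*) FROM signups WHERE day = '2024-01-01'"),
    ("How many users signed up in total?",
     "SELECT COUNT(*) FROM signups"),
]


def fake_generator(question):
    """Canned answers standing in for a model; the second one is valid SQL but wrong."""
    canned = {
        CASES[0][0]: "SELECT COUNT(*) FROM signups WHERE day = '2024-01-01'",
        CASES[1][0]: "SELECT COUNT(*) FROM signups WHERE day = '2024-01-02'",
    }
    return canned[question]


if __name__ == "__main__":
    accuracy, failures = evaluate(build_demo_db(), CASES, fake_generator)
    print(accuracy, failures)
```

Running the same cases after every prompt, schema, or model change is one way to spot regressions; comparing result rows rather than SQL text avoids penalizing queries that are written differently but return the same data.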
I'll freely admit you have more data (experience) to work with on this than I did in the tests I ran almost a year ago. I spent a lot of time documenting my schemas, feeding the LLM sample rows, etc and the final results were not useful enough even as a starting point for a static query that a developer would improve on and "hard code" into a UI. I approached it as both:
- Wouldn't it be cool to let my users chat with their data? ("How many new users signed up today/this event/this month/etc?" or "How much did we make yesterday?")
- An internal tool to use as a starting point for analytics dashboards
I still use LLMs to help write queries if it's something I know can be done but can't remember the syntax but I scrapped the project to try and accomplish both the above goals due to too many mistakes. Maybe my data is just too "dirty" (but honestly, I've never _not_ seen dirty data) and/or I should have cleaned up deprecated columns in my tables that confused the models (even with strict instructions to ignore them, I should have filtered them completely) but I spent way too much time repeating myself, talking in all caps, and generally fighting with the SOTA models to try to get them to understand my data so that they could generate queries that actually worked (worked as in returned valid data, not just valid SQL). I wasn't doing any training/fine-tuning (which may be the magic needed) but I felt like it was a dead end (given current models). I'll also stress that I haven't re-tested those theories on newer models and my results are at least a year out of date (a lifetime in LLM/AIs) but the fundamental issues I ran into didn't seem to be "on the cusp" of being solved or anything like that.
I wish you all the best of luck in improving on this kind of thing.
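The "I should have filtered them completely" idea above can be sketched roughly as follows. This is an illustration under assumptions, not anyone's actual tooling: the `Column` class, the `deprecated` flag, and the `orders` columns are all hypothetical. The point is that deprecated columns are dropped before the schema text is built, so the model never sees them, rather than being listed and then forbidden in the instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical schema metadata for one column."""
    name: str
    type: str
    description: str = ""
    deprecated: bool = False


def schema_context(table, columns):
    """Render a table description for a prompt, omitting deprecated columns entirely."""
    lines = [f"Table {table}:"]
    for col in columns:
        if col.deprecated:
            continue  # never mention it, instead of asking the model to ignore it
        note = f" -- {col.description}" if col.description else ""
        lines.append(f"  {col.name} {col.type}{note}")
    return "\n".join(lines)


# Illustrative columns only; not taken from any real schema.
ORDERS = [
    Column("id", "INTEGER", "primary key"),
    Column("total_cents", "INTEGER", "order total in cents"),
    Column("total", "REAL", "old float total", deprecated=True),
]

if __name__ == "__main__":
    print(schema_context("orders", ORDERS))
```

Whether this would have been enough to rescue the project described above is an open question; it only removes one known source of confusion.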
Thanks for your detailed reply. It is great to see that you have been experimenting with this approach.
We published a public demo of the Agentic Data Stack, I'd love to hear your feedback https://clickhouse.com/blog/agenthouse-demo-clickhouse-llm-m...
Keep in mind that it's not fully "fair", since these public datasets are often documented on the internet and so are already present in the pre-training data of the underlying models (Claude Sonnet 4.5 in this case).
Yeah that's really bad news.
I too have LibreChat deployed for my personal use and now the only question is how long until it will inevitably be enshittified/monetized.
Must feel even worse for volunteers who worked on the project but don't get any benefit from the acquisition.
(Ryadh from ClickHouse here)
It's a fair concern, and I understand where you are coming from. What I can say is that it's not our first rodeo incorporating another OSS product in our family. I tried to summarize it in the post:
> "This proven playbook is the same one that we applied when joining forces with PeerDB to provide our ClickPipes CDC capabilities, and HyperDX, which became the UX of our observability product, ClickStack."
If you research both instances above, the result is that these projects got more traction and adoption overall.
I hope this helps, and thank you for using LibreChat!
The part I suspect most are here for:
- LibreChat remains 100% open-source under its existing MIT license
- Community-first development continues with the same transparency and openness
of course, no binding commitment to any of that in the long term.
Of course they wouldn’t announce acquisition and a license change at the same time but this is obviously the beginning of the end.
See Hashicorp and Elasticsearch for the same old story.
Luckily these kinds of products are a dime a dozen, i.e. zero technical complexity, and there are so many similar projects already out there. Hell, you can even vibe code this kind of project.
Yup, they will of course try to profit long-term, ClickHouse is not a charity project.
Within the next few years, it will introduce an additional enterprise edition, a SaaS offering, or a change in licensing terms.
We aren't a charity.
But, we are also a database provider.
Ryadh mentions some examples below where we have joined forces, incorporated code into ClickHouse Cloud (our commercial offering), and OSS has grown.
Time will tell (I can't predict the future)... but I'm excited about the future of OSS LibreChat.
(disclaimer: I work at ClickHouse)
Ryadh from ClickHouse here, happy to answer questions if folks have any.
So, why this move?
Basically, we noticed that the existing "agentic" open-source ecosystem is primarily focused on developer tools and SDKs, as developers are the early adopters who build the foundation for emerging technologies. Current projects provide frameworks, orchestration, and integrations. The idea behind the Agentic Data Stack is a higher-level integration to provide a composable software stack for agentic analytics that users can set up quickly, with room for customization.
Cool so your bet is that chat is essentially the new interface for BI... or ad hoc analytical inquiry... which opens up more dynamic BI... instead of asking an analyst in a slack conversation who then goes and runs a bunch of data pulls and munging, it's all handled agentically and a response is brought back to the user.. one thing is for sure: Tableau needs to be disrupted so happy to watch this one play out!
It actually comes from our own experience at ClickHouse. We deployed this stack internally 8 months ago, and since then very few people here have touched our legacy BI systems :) I have never seen an adoption curve like this one tbh. It's obviously not perfect and can hallucinate sometimes, which can be tricky, but with the right approach and awareness in place, the value it delivers is massive. What really happens is that more users get access to data instantly, and as a result, we make better, data-driven decisions overall.
My favourite use-case: our sales and support folks systematically ask DWAINE (our dwh agent) to produce a report before important meetings with customers, something along the lines of: "I'm meeting with <customer_name> for a QBR, what do I need to know?". This will pull usage data, support interactions, billing, and many other dimensions, and you can guess that the quality of the conversation is greatly improved.
My colleague Dmitry wrote about it when we first deployed it: https://www.linkedin.com/pulse/bi-dead-change-my-mind-dmitry...
This is really cool; does this mean Danny gets a salary to work on his open source project; would you consider this a "sponsorship" or would he have other jobs within ClickHouse / have a manager etc?
The LibreChat folks are now my colleagues, and it's exciting
My biggest question and concern is whether or not LibreChat will end up introducing the SSO tax or other "enterprise tier" features. Is this something you can speak on?
Interestingly, LibreChat has a broad range of applications already and we'll continue to support them. The investment area we want to prioritize is the analytics use-case specifically. In that space, I don't see an SSO-tax scheme unfolding tbh, it's really about better visualizations, semantic layers and anything that can improve the quality of the insights produced on top of analytics data.
That's good to hear! I'll be honest, I was a bit concerned about how LibreChat was going to support long term development and definitely see that this could be a good thing.
Full Disclosure. I am the author of https://github.com/gitsense/chat
> The idea behind the Agentic Data Stack is a higher-level integration to provide a composable software stack for agentic analytics that users can set up quickly, with room for customization.
I agree with this. For those who have been programming with LLMs, the difference between something working and not working can be a simple "sentence" conveying the required context. I strongly believe data enrichment will be one of the main ways we can make agents more effective and efficient. Data enrichment is the foundation for my personal assistant feature https://github.com/gitsense/chat/blob/main/packages/chat/wid...
Basically instead of having agents blindly grep for things, you would provide them with analyzers that they can use to search with. By making it dead simple for domain experts to extract 'business logic' from their codebase/data, we can solve a lot of problems, much more efficiently. Since data is the key, I can see why ClickHouse will make this move since they probably want to become the storage for all business logic.
Note: I will be dropping a massive update to how my tool generates and analyzes metadata this week, so don't read too much into the demo or if you decide to play with it. I haven't really been promoting it because the flow hasn't been right, but it should be this week.
Was this acquisition strategic or because it feels good? How does this help ClickHouse make more money?
I hope the librechat dev got a nice payout, I've been selfhosting for about a year.
Check the comments below or the OP. I think Ryadh does a great job of explaining the “why”
As a LibreChat user, I'm concerned. I've seen open source projects get acquired like that, and very soon they start to have some kind of paid features, telemetry, etc. Might have to start looking for alternatives soon.
Yep embrace, extend, extinguish.
Ryadh from ClickHouse here, I commented below about the overall intent. Let me know if anything needs clarifying!
What does it mean to acquire an open source project?
(disclaimer: I work at ClickHouse)
At its simplest, the team who was building that rad thing called LibreChat now works at ClickHouse and builds that rad thing called LibreChat.
Even simpler, the LibreChat team works at ClickHouse and are now my colleagues.
More complex, acquisitions can take a variety of "forms"...most importantly in these scenarios (and now I speak without knowledge of the deal structure) is making sure the team is paid, that copyright/trademark stuff is worked out, that OSS plans are discussed, and that everyone is excited to work together.
intellectual property has value and an owner. that ownership can change hands. for money.
open source means you have a license to freely use and commercialize it. not that it has no owners.
I find it more intuitive to talk about it in terms of project governance and control.
An OSS project has contributing members and they exert control over the project by working on the codebase and approving/rejecting contributions. If they act as an entity, you can buy their control over the project. The entity that owns the project can now change the license, “sponsor” the contributors by giving them salary, “fire” the contributors, etc.
That there will soon be a new fork of it.
Why is that?
From the post
> - LibreChat remains 100% open-source under its existing MIT license
> - Community-first development continues with the same transparency and openness
> - Expanded roadmap to bring an even more enterprise-ready analytics experience.
>
> This proven playbook is the same one that we applied when joining forces with PeerDB to provide our ClickPipes CDC capabilities, and HyperDX, which became the UX of our observability product, ClickStack.
Also, I work at ClickHouse, my email is super easy to figure out. Would love to alleviate concerns where I can.
Congrats to Danny!
Nearly had a heart attack until I re-read it. I thought Libera Chat[0] was acquired
[0] https://libera.chat/
Nothing worse than having to update my IRC clients again ;)
I was wondering how it could acquire something that was formed as a reaction against freenode getting acquired...
History repeats itself. I wouldn't be surprised.