What is with the negativity in these comments? This is a huge, huge surface area that touches a large percentage of white collar work. Even just basic automation/scaffolding of spreadsheets would be a big productivity boost for many employees.
My wife works in insurance operations - everyone she manages, from the top down, lives in Excel. For line employees, a large percentage of the job is something like "Look at this internal system, export the data to Excel, combine it with data from some other internal system, do some basic interpretation, verify it, make a recommendation". Computer Use + Excel Use isn't there yet...but these jobs are going to be the first on the chopping block as these integrations mature. No offense to these people, but Sonnet 4.5 is already capable of replicating or beating the analysis they typically provide.
I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.
It's one thing to fudge the language in a report summary, since that can be subjective, but numbers are not subjective. It's widely known that LLMs are terrible at even basic maths.
Even Google's own AI summary admits it, which surprised me; marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
Seems like you're very confused about what this work typically entails. The job of these employees is not mental arithmetic. It's closer to:
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
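To make that concrete, here's a rough sketch of the policy/payment reconciliation in pandas rather than Excel (the file names, column names, and the 5% threshold are made up for illustration):

    import pandas as pd

    # Hypothetical exports from the two internal systems
    policies = pd.read_csv("policies_bound_last_30_days.csv")   # policy_id, premium
    payments = pd.read_csv("payments.csv")                       # policy_id, amount_paid

    # Match each bound policy to its payment and measure the divergence
    merged = policies.merge(payments, on="policy_id", how="left")
    merged["divergence_pct"] = (
        (merged["amount_paid"] - merged["premium"]).abs() / merged["premium"] * 100
    )

    # Flag missing payments or divergences above the threshold for accounting to follow up on
    THRESHOLD = 5.0
    flagged = merged[merged["amount_paid"].isna() | (merged["divergence_pct"] > THRESHOLD)]
    flagged.to_csv("flagged_for_accounting.csv", index=False)

The Excel version is the same logic with XLOOKUPs and an IF column; either way it's basic, mechanical data plumbing.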
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
Checking someone else's spreadsheet is a fucking nightmare. If your company has extremely good standards it's less miserable, because at least the formatting etc. will be consistent...
The one thing LLMs should consistently do is ensure that formatting is correct. Which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulation. Not a week ago GPT-5 got confused about whether a plus or a minus was necessary in a basic question of "I'm 323 days old, when is my birthday?"
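(The deterministic version of that question is, of course, a one-liner; a quick sketch in Python:

    from datetime import date, timedelta

    # If someone is 323 days old today, their birth date is today minus 323 days,
    # and their birthday is that same month/day each year.
    birth_date = date.today() - timedelta(days=323)
    print(birth_date)

No plus-or-minus confusion required.)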
I think you have a misunderstanding of the types of things that LLMs are good at. Yes you're 100% right that they can't do math. Yet they're quite proficient at basic coding. Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.
My concern would be more with how to check the work (ie, make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
I've built spreadsheet diff tools on Google Sheets multiple times. As the need grows, I think we will see diffs, commits, and review tools reach customers.
> The one thing LLMs should consistently do is ensure that formatting is correct.
In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I’m not aware of any LLM-based tool that performs static analysis with results as good as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are the tools that do static analysis on spreadsheets subpar, or do they have some disadvantage not seen in other programming languages? And if so, are LLMs any better?
Sysadmin of a small company. I get asked pretty often to help with a pivot table, vlookup, or just general excel functions (and smartsheet, these users LOVE smartsheet)
Indeed, in a small enough org, the sysadmin/technologist becomes support of last resort for all the things.
> these users LOVE smartsheet
I hate smartsheet…
Excel or R. (Or more often, regex followed by pen and paper followed by more regex.)
Last time, I gave Claude an invoice and asked it to change one item on it; it did so nicely and gave me the new invoice. Good thing I noticed it had also changed the bank account number...
The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.
We’ve tried to gently use them to automate some of our report generation and PDF->Invoice workflows, and it’s a nightmare of silent changes and absence of logic... basic things like specifically telling it “debits need to match credits” and “balance sheets need to balance” that are ignored.
They're not great at arithmetic but at abstract mathematics and numerical coding they're pretty good actually.
"I don't trust LLMs to do the kind of precise deterministic work" => I think LLM is not doing the precise arithmetic. It is the agent with lots of knowledge (skills) and tools. Precise deterministic work is done by tools (deterministic code). Skills brings domain knowledge and how to sequence a task. Agent executes it. LLM predicts the next token.
Sure, but this isn't requiring that the LLM do any math. The LLM is writing formulas and code to do the math. They are very good at that. And like any automated system you need to review the work.
Exactly, and if it can be done in a way that helps users better understand their own spreadsheets (which are often extremely complex codebases in a single file!) then this could be a huge use case for Claude.
I don't trust humans to do the kind of precise deterministic work you need in a spreadsheet!
Right, we shouldn’t use humans or LLMs. We should use regular deterministic computer programs.
For cases where that is not available, we should use a human and never an LLM.
"regular deterministic computer programs" - otherwise known as the SUM function in Microsoft Excel
>I don't trust LLMs to do the kind of precise deterministic work
not just in a spreadsheet, any kind of deterministic work at all.
find me a reliable way around this. i don't think there is one. mcp/functions are a band aid and not consistent enough when precision is important.
after almost three years of using LLMs, i have not found a single case where i didn't have to review its output, which takes as long or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is not deep nor technical. this is just my experience. do we need a new architecture to solve these problems?
I couldn’t agree more. I get all my perfectly deterministic work output from human beings!
If only we had created some device that could perform deterministic calculations and then wrote software that made it easy for humans to use such calculations.
ok but humans are idiots, if only we could make some sort of Alternate Idiot, a non-human but every bit as generally stupid as humans are! This A.I would be able to do every stupid thing humans did with the device that performed deterministic calculations only many times faster!
What's with claiming negativity when most of the comments here are positive?
I have to admit that my first thought was “April Fools'”. But you are right. It makes a lot of sense (if they can get it to work well). Not only is Excel the world’s biggest “programming language”, it’s probably also one of the most unintuitive ways to program.
Why unintuitive?
What does scaffolding of spreadsheets mean? I see the term scaffolding frequently in the context of AI-related articles, but I’m not familiar with this method and I’m hesitant to ask an LLM.
Scaffolding typically just refers to a larger state machine style control flow governing an agent's behavior and the suite of external tools it has access to.
Yeah, this could be a pretty big deal. Not everyone is an excel expert, but nearly everyone finds themselves having to work with data in excel at some time or other.
My concern is that my insurance company will reject a claim, or worse, because of something an LLM did to a spreadsheet.
Now, granted, that can also happen because Alex fat-fingered something in a cell, but that's something that's much easier to track down and reverse.
They're already doing that with AI, rejecting claims in higher numbers than before.
Privatized insurance will always find a way to pay out less if they can get away with it. It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
> They're already doing that with AI, rejecting claims in higher numbers than before
Source?
> It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
It's the nature of everything. They agree to pay you for something. It's nothing specific to "profit motive" in the sense you mean it.
> How teams use Claude for Excel
Who are these teams that can get value from Anthropic? One MCP and my context window is used up and Claude tells me to start a new chat.
I used to live in excel.
The issue isn’t in creating a new monstrosity in excel.
The issue is the poor SoB who has to spelunk through the damn thing to figure out what it does.
Excel is the sweet spot of just enough to be useful, capable enough to be extensible, yet gated enough to ensure everyone doesn’t auto run foreign macros (or whatever horror is more appropriate).
In the simplest terms - it’s not excel, it’s the business logic. If an excel file works, it’s because there’s someone who “gets” it in the firm.
I used to live in Excel too. I've trudged through plenty of awful worksheets. The output I've seen from AI is actually more neatly organized than most of what I used to receive in outlook. Most of that wasn't hyper-sophisticated cap table analyses. It was analysis from a Jr Analyst or line employee trying to combine a few different data sources to get some signal on how XYZ function of the business was performing. AI automation is perfectly suitable for this.
How?
Neat formatting didn't save any model from having the wrong formula pasted in.
Being neat was never a substitute for being well rested, or sufficiently caffeinated.
Have you seen how AI functions in the hands of someone who isn't a domain expert? I've used it for things I had no idea about, like Astro+ web dev. User ignorance was magnified spectacularly.
This is going to have Jr Analysts dumping well formatted junk in email boxes within a month.
It's actually really cool. I will say that "spreadsheets" remain a bandaid over dysfunctional UIs, processes, etc., and engineering spends a lot of time enabling these bandaids versus someone just saying "I need to see number X" rather than "I need BI analytics data in a realtime spreadsheet!", etc.
> What is with the negativity in these comments?
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
Claude Excel is leaning deeply into this garbage.
It seems to me the answer is more like: "People on HN are so far removed from the real use cases for this kind of automation that they simply have no idea what they're talking about".
This is so correct it hurts
Seems everyone is speculating about features instead of just reading TFA, which does in fact list features:
- Get answers about any cell in seconds: Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
- Test scenarios without breaking formulas: Update assumptions across your entire model while preserving all dependencies. Test different scenarios quickly—Claude highlights every change with explanations for full transparency.
- Debug and fix errors: Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.
- Build models or fill existing templates: Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.
I'm not excited about having LLMs generate spreadsheets or formulas. But, I think LLMs could be particularly useful in helping me find inconsistent formulas or errors that are challenging to identify. Especially in larger, complex spreadsheets touched by multiple people over the course of months.
For once in my life, I actually had a delightful interaction with an LLM last week. I was changing some text in an Excel sheet in a very programmatic way that could have easily been done with the regex functions in Excel. But I'm not really great with regex, and it was only 15 or so cells, so I was content to just do it manually. After three or four cells, Copilot figured out what I was doing and suggested the rest of the changes for me.
This is what I want AI to do, not generate wrong answers and hallucinate girlfriends.
IMO, a real solution here has to be hybrid, not full LLM, because these sheets can be massive and have very complicated structures. You want to be able to use the LLM to identify / map column headers, while using non-LLM tool calling to run Excel operations like SUMIFs or VLOOKUPs. One of the most important traits in these systems is consistency with slight variation in file layout, as so much Excel work involves consolidating / reconciling between reports made on a quarterly basis or produced by a variety of sources, with different reporting structures.
Disclosure: My company builds ingestion pipelines for large multi-tab Excel files, PDFs, and CSVs.
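To make the split concrete, a rough sketch of what I mean (the names here are hypothetical; the only probabilistic step is the header mapping, and everything after it is plain deterministic code):

    import pandas as pd

    def map_headers_with_llm(raw_headers: list[str]) -> dict[str, str]:
        # The one probabilistic step: ask the model to map messy source headers
        # ("Prem. Amt", "Premium ($)", "prem_usd") onto canonical names.
        # Stubbed here; in practice it's a single constrained LLM call.
        return {"Prem. Amt": "premium", "Policy #": "policy_id"}

    df = pd.read_excel("quarterly_report.xlsx")  # layout varies by source and quarter
    df = df.rename(columns=map_headers_with_llm(list(df.columns)))

    # From here on it's deterministic and repeatable, no LLM in the loop
    totals = df.groupby("policy_id")["premium"].sum()  # the SUMIF equivalent

The consistency comes from keeping the aggregation logic fixed and only letting the model handle the part that genuinely varies between files.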
That's exactly what they're doing.
https://www.anthropic.com/news/advancing-claude-for-financia...
"This won't work because (something obvious that engineers at Anthropic clearly thought of already)"
So more or less like what AI has been doing for the last couple of years when it comes to writing code?
I guess Claude may be useful for finding errors in large Excel workbooks. It may also help beginners learn the more complex Excel functions (which are still pretty easy). But if you are proficient at building Excel models I don't see any benefit. Excel already has a superb, very efficient UI for entering formulas, ranges, tables, data sources, etc. I'm sceptical that a different UI, especially a text-based one, can improve on this.
I understand the sentiment about a skilled user not needing this, but I think having a little buddy that I can use to offload some menial tasks would be helpful for me to iterate through my models more efficiently; even if the AI is not perfect. As a highly skilled excel user, I admit the software has terrible ergonomics. It would be a productivity boon for me if an AI can help me stay focused on model design vs model implementation.
For some reason, I find that these tools are TERRIBLE at helping someone learn. I suspect it's because turning one on results in turning the problem-solving part of one's brain off.
It's obviously not the same experience for everyone. (If you are one of those people who are energized while working in a chat window, you might be in a minority, given what we see from the ongoing massacre of brains in education.)
Paraphrasing something I read here: "people don't use ChatGPT to learn more, they use it to study less".
Maybe some folk would be better off.
This is going to be massive if it works as well as I suspect it might.
I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
It's much less about 'greenfield' new excel sheets and much more about fixing/improving existing ones. If it works as well as Claude Code works for code, then it will get pretty crazy adoption I suspect (unless Microsoft beats them to it).
> I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
So they can fire the two dudes that take care of it, lose 15 years of in house knowledge to save 200k a year and cry in a few months when their magic tool shits the bed ?
Massive win indeed
If the company is half baked, those "two dudes" will become indispensable beyond belief. They are the ones who understand, at a far deeper level, how the Excel actually works; paired with Claude for Excel they become far, far more valuable.
> This is going to be massive if it works as well as I suspect it might.
Until Microsoft does its anti-competitive thing and finds a way to break this in the file format, because this is exactly what Copilot in Excel does.
That said, Copilot in Excel is pretty much hot garbage still so anything will be better than that.
I have just launched a product (easyanalytica.com) to create dashboards from spreadsheets, and Excel is on my to-do list of formats to be supported. However, I'm having second thoughts. Although, from the description, it seems like it would be more helpful on the modeling side rather than the presentation side. I guess I'll have to wait until it's publicly available
Why second thoughts?
everyone will use claude if they support it, so why would they use my product? i will have to find some other angle to differentiate.
Tough day to be an AI Excel add-in startup
it's a great time for your ai excel add-in to start getting acquired by a claude competitor though
That seems to be true for any startup that offers a wrapper to existing AIs rather than an AI on their own. The lucky ones might be bought but many if not most of them will perish trying to compete with companies that actually create AI models and companies large enough to integrate their own wrappers.
Actually just wrote about this: https://aimode.substack.com/p/openai-is-below-above-and-arou...
not sure if it's binary like that, but as startups we will probably just collect the leftover scraps instead
Gemini already has its hooks in Google Sheets, and to be honest, I've found it very helpful in constructing semi-complicated Excel formulas.
Being able to select a few rows and then use plain language to describe what I want done is a time saver, even though I could probably muddle through the formulas if I needed to.
I would recommend trying TabTabTab at https://tabtabtab.ai/
It is an entire agent loop. You can ask it to build a multi sheet analysis of your favorite stock and it will. We are seeing a lot of early adopters use it for financial modeling, research automation, and internal reporting tasks that used to take hours.
Last time I tried using Gemini in Google Sheets it hallucinated a bunch of fake data, then gave me a summary that included all that fake data. I'd given it a bunch of transaction data, and asked it to group the records into different categories for budgeting. When asking it to give the largest values in each category, all the values that came back were fake. I'm not sure I'd really trust it to touch a spreadsheet after that.
you should:
- stop using the free plan
- don't use gemini flash for these tasks
- learn how to do things over time and know that all ai models have improved significantly every few months
I have had the opposite experience. I've never had Gemini give me something useful in sheets, and I'm not asking for complicated things. Like "group this data by day" or "give me p50 and p90"
Gemini integrations into Google Workspace feel like they're using Gemini 1.5 Flash; it's so comically bad at understanding and generating.
It’s interesting to me that this page talks a lot about “debugging models” etc. I would’ve expected (from the title) this to be going after the average Excel user, similar to how ChatGPT went after everyday people.
I would’ve expected “make a vlookup or pivot table that tells me x” or “make this data look good for a slide deck” to be easier problems to solve.
The issue is that the average Excel user doesn’t quite have the skills to validate and double-check the Excel formulas that Claude would produce, and to correct them if needed. It would be similar to a non-programmer vibe-coding an app. And that’s really not what you want to happen for professionally used Excel sheets.
IMO that is exactly what people want. At my work everyone uses LLMs constantly and the trade-off of imperfect information is known. People double-check it, etc., but the information search is so much faster; even if it finds the right Confluence page but misquotes it, it still sends me the link.
For easy spreadsheet stuff (which is what 80% of average white-collar workers are doing when they use Excel) I’d imagine the same approach. Try to do what I want, and even if you’re half wrong, the good 50% is still worth it and a better starting point.
Vibe coding an app is like vibe coding a “model in excel”. Sure you could try, but most people just need to vibe code a pivot table
I think actually Anthropic themselves are having trouble with imagining how this could be used. Coders think like coders - they are imagining the primary use case being managing large Excel sheets that are like big programs. In reality most Excel worksheets are more like tiny, one-off programs. More like scripts than applications. AI is very very good at scripts.
I think this is aiming to be Claude Code for people who use Excel as a programming environment.
George Hotz said there's 5 tiers of AI systems, Tier 1 - Data centers, Tier 2 - fabs, Tier 3 - chip makers, Tier 4 - frontier labs, Tier 5 - Model wrappers. He said Tier 4 is going to eat all the value of Tier 5, and that Tier 5 is worthless. It's looking like that's going to be the case
Tier 5 requires domain expertise until we reach AGI or something very different from the latest LLMs.
I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.
George Hotz says a lot of things. I think he's directionally correct but you could apply this argument to tech as a whole. Even outside of AI, there are plenty of niches where domain-specific solutions matter quite a bit but are too small for the big players to focus on.
That is a common refrain by people who have no domain expertise in anything outside of tech.
Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have good-functioning memory.)
This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.
yeah but if Anthropic/OpenAI dedicate resources to gaining domain expertise then any tier 5 is dead in the water. For example, they recently hired a bunch of finance professionals to make specialized models for financial modeling. Any startup in that space will be wiped out
People were saying the same thing about AWS vs SaaS ("AWS wrappers") a decade ago and none of that came to pass. Same will be true here.
Claude is a model wrapper, no?
Anthropic is a frontier lab, and Claude is a frontier model
Interesting. I found a reference to this in a tweet [1], and it looks to be a podcast. While I'm not extremely knowledgeable, I'd put it like this: Tier 1 - fabs, Tier 2 - chip makers, Tier 3 - data centers, Tier 4 - frontier labs, Tier 5 - model wrappers.
However I would think more of elite data centers rather than commodity data centers. That's because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed their data centers. I wouldn't be so inclined to throw in my opinion immediately if I found an article showing this ordering of the tiers, but being a tweet of a podcast it might have just been a rough draft.
1: https://x.com/tbpn/status/1935072881425400016
I'm excited to see what national disasters will be caused by auto-generated Excel sheets that nobody on the planet understands. A few selections from past HN threads to prime your imagination:
Thousands of unreported COVID cases: https://news.ycombinator.com/item?id=24689247
Thousands of errors in genetics research papers: https://news.ycombinator.com/item?id=41540950
Wrong winner announced in national election: https://news.ycombinator.com/item?id=36197280
Countries across the world implement counter-productive economic austerity programs: https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#Metho...
Especially combined with the dynamic array formulas that have recently been added (LET, LAMBDA etc). You can have much more going on within each cell now. Think whole temporary data structures. The "evaluate formula" dialog doesn't quite cut it anymore for debugging.
from my experience in the corporate world, i'd trust an excel generated / checked by an LLM more than i would one that has been organically grown over years in a big corporation where nobody ever checks, or even can check, anything because it's one big growing pile of technical debt people just accept as working
Ok, they weren't confident enough to let the model actually edit the spreadsheet. Phew..
Only a matter of time before someone does it though.
How well does change tracking work in Excel... how hard would it be to review LLM changes?
AFAIK there is no 'git for Excel to diff and undo', especially not built-in (aka 'for free' both cost-wise and add-ons/macros not allowed security-wise).
My limited experience has been that it is difficult to keep LLMs from changing random things besides what they're asked to change, which could cause big problems if untrackable in Excel.
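There's no built-in git for .xlsx, but a crude cell-level diff isn't hard to script yourself, e.g. with openpyxl (a rough sketch, not a polished review tool):

    from openpyxl import load_workbook

    def diff_workbooks(path_a: str, path_b: str) -> None:
        """Print every cell whose stored value or formula differs between two .xlsx files."""
        wb_a, wb_b = load_workbook(path_a), load_workbook(path_b)
        for name in wb_a.sheetnames:
            if name not in wb_b.sheetnames:
                print(f"sheet removed: {name}")
                continue
            ws_a, ws_b = wb_a[name], wb_b[name]
            for row in ws_a.iter_rows():
                for cell in row:
                    other = ws_b[cell.coordinate]
                    if cell.value != other.value:
                        # without data_only=True, .value holds the formula text, so formula edits show up too
                        print(f"{name}!{cell.coordinate}: {cell.value!r} -> {other.value!r}")

    diff_workbooks("model_before.xlsx", "model_after.xlsx")

It misses formatting changes and sheets added only to the second file, but it's enough to spot an LLM quietly rewriting cells it wasn't asked to touch.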
When I think about how easily I can misclick and stuff up a spreadsheet, I can't begin to imagine all the subtle ways LLMs will screw them up.
Unlike code, where it's all on display, all these formulas are hidden in each cell; you won't see the problem unless you click on the cell, so you'll have a hard time finding the cause.
I wish Gemini could edit more in Google sheets and docs.
Little stuff like splitting text more intelligently or following the formatting seen elsewhere would be very satisfying.
As an inveterate Excel lover, I can just sense the blinding pain wafting off the legions of accountants, associates, seniors, and tech people who keep the machine spirits placated.
lies, damn lies, statistics, and then Excel deciding cell data types.
This could be huge! Very exciting!
I just want Claude inside of Metabase.
https://www.metabase.com/features/metabot-ai
Yet more evidence of the bubble burst being imminent. If any of these companies really had some almost-AGI system internally, they wouldn’t be spending any effort making f’ing Excel plugins. Or at the very least, they’d be writing their own Excel because AI is so amazing at coding, right?
The fine tuning will continue until we reach AGI.
You wouldn't believe the amount of shit that runs on Excel.
Yes. I once interviewed a developer whose previous job was maintaining the .NET application that used an Excel sheet as the brain for decisions about where to drill for oil on the sea floor. No one understood what was in the Excel sheet. It was built by a geologist who was long gone. The engineering team understood the inputs and outputs. That’s all they needed to know.
Years ago when I worked for an engineering consulting company we had to work with a similarly complex, opaque Excel spreadsheet from General Electric modeling the operation of a nuclear power plant in exacting detail.
Same deal there -- the original author was a genius and was the only person who knew how it was set up or how it worked.
I think you’re misunderstanding me. This might be something somewhat useful, I don’t know, and I’m not judging it based on that.
What I’m saying is that if you really believed we were 2, maybe 3 years tops from AGI or the singularity or whatever, you would spend zero effort serving a domain that already seems well covered by third parties using your models! An Excel wrapper for an LLM isn’t exactly cutting-edge AI research.
They’re desperate to find something that someone will pay a meaningful amount of money for that even remotely justifies their valuation and continued investment.
I spotted a custom dialog in an Excel spreadsheet in a medical context the other day, I was horrified.
Sic
This. I work in Pharma. Excel and faxes.
A program that can do excel for you is almost AGI
Cool, but now companies' POs will be like "you must add the Excel export for all the user data!" and when asked why, the answer will basically be "so I can do this roundabout query of data for some number in a spreadsheet using AI (instead of just putting the number or chart directly in the product with a simple db call)"
[flagged]
"Eschew flamebait. Avoid generic tangents."
https://news.ycombinator.com/newsguidelines.html
Okay. But then you could say the same for a human, isn't your brain just a cloud of matter and electricity that just reacts to senses deterministically?
> isn't your brain just a cloud of matter and electricity that just reacts to senses deterministically?
LLMs are not deterministic.
I'd argue over the short term humans are more deterministic. I ask a human the same question multiple times and I get the same answer. I ask an LLM and each answer could be very different depending on its "temperature".
If you ask a human the same question repeatedly, you'll get different answers. I think by the third time you'll get "I already answered that", etc.
We hardly react to things deterministically.
But I agree with the sentiment. It seems it is more important than ever to agree on what it means to understand something.
I'm having a bad day today. I'm 100% certain that today I'll react completely differently to any tiny issue compared to how I did yesterday.
OK then. Groks?
I mean - try clicking the CoPilot button and see what it can actually do. Last I checked, it told me it couldn't change any of the actual data itself, but it could give you suggestions. Low bar for excellence here.