We have many languages, and senior people who know how to use them, who enjoy coding, and who don't have a "lack of productivity" problem. I don't feel the need to throw away everything we have to embrace what is supposed to be "the future". And since we need good devs to read LLM-generated code, how do we remain good devs if we don't write code anymore? What's the point of being up to date in language X if we don't write code? Remaining good at something without doing it is a mystery to me.
A formal language is executable. It might need some translation pass to be eventually executable on a particular system, but it is executable nevertheless.
The thing is that you paid that debt once. The mappings are well defined and deterministic.
The whole purpose of an abstraction is to not have to look underneath it to make sure what you did with it is still correct. You can be sure because you, or someone you trust, did the work of paying that debt once.
With LLMs you always need to verify the output, for every generation you need to pay that debt. So it is not an abstraction.
> you didn't think through how to manipulate the bits on the hardware, you just allowed the interpreter to do it
If you are thinking through deterministic code, you are thinking through the manipulation of bits in hardware. You are just doing it in a language which is easier for humans to understand.
> If you are thinking through deterministic code, you are thinking through the manipulation of bits in hardware.
No I'm not. If I want the machine to evaluate 2+2, I don't know or care what bits in hardware it uses to do that (as long as it doesn't run out of memory), I just want the result to come back as 4.
When you press the 2 button, the plus button, the 2 button and the equals button, you are translating your question into bits and operations which are logically guaranteed to yield bits that represent your answer.
When you think through what will happen as a result of deterministic code, you are also thinking through what the bits will do, albeit at a higher level of abstraction.
When you ask an LLM to do something, you have no guarantee that the intent you provide is accurately translated, and you have no guarantee you’ll get the result you want. If you want your answer to 2+2 to always be 4, you shouldn’t use a non-deterministic LLM. To get that guarantee, the bit manipulation a machine does needs to be logically equivalent to the way you evaluate the question.
That doesn’t mean you can’t minimize intent distortion or cognitive debt while using LLMs, or that you can’t think through the logic of whatever problem you’re dealing with in the same structured way a formal language forces you to while using them. But one of my pet peeves is comparing LLMs to compilers. The nondeterminism of LLMs and lack of logical rigidity makes them fundamentally different.
I like the word intent, but Martin Fowler’s essay made me think more carefully about it. When Thomas Kuhn talked about paradigm shifts, “paradigm” ended up carrying more than twenty different meanings. In the same way, I think intent has recently become one of the most polluted and overused words in programming. My own toy language project uses the word intent, so I am not really in a position to criticize others too harshly.
Reading the Hacker News comments, I kept thinking that programming is fundamentally about building mental models, and that the market, in the end, buys my mental model.
If we start from human intent, the chain might look something like this:
human intent
-> problem model
-> abstraction
-> language expression
-> compilation
-> change in hardware
But abstraction and language expression are themselves subdivided into many layers. How much of those layers a programmer can afford not to know has a direct effect on that programmer’s position in the market. People often think of abstraction as something clean, but in reality it is incomplete and contextual. In theory it is always clean; in practice it is always breaking down.
Depending on which layer you live in, even when using the same programming language, the form of expression can become radically different. From that point of view, people casually bundle everything together and call it “abstraction” or “intent,” but in reality there is a gap between intent and abstraction, and another gap between abstraction and language expression. Those subtle friction points are not fully reducible.
Seen from that perspective, even if you write a very clear specification, there will always be something that does not reduce neatly. And perhaps the real difference between LLMs and humans lies in how they deal with that residue.
Martin frames the issue in a way that suggests LLM abstractions are bad, but I do not fully agree. As someone from a third-world country in Asia, I have seen a great deal of bad abstraction written in my own language and environment. In that sense, I often feel that LLM-generated code is actually much better than the average abstractions produced by my Asian peers. At the same time, when I look at really good programming from strong Western engineers, I find myself asking again what a good abstraction actually is.
The essay talks about TDD and other methodologies, but personally I think TDD can become one of the worst methodologies when the abstraction itself is broken. If the abstraction is wrong, do the tests really mean anything? I have seen plenty of cases where people kept chasing green tests while gradually destroying the architecture. I have seen this especially in systems involving databases.
The biggest problem with methodology is that it always tends to become dogma, as if it were something that must be obeyed. SOLID principles, for example, do not always need to be followed, but in some organizations they become almost religious doctrine. In UI component design, enforcing LSP too rigidly can actually damage the diversity and flexibility of the UI. In the end, perhaps what we call intent is really the ability to remain flexible in context and search for the best possible solution within that context.
From that angle, intent begins to look a lot like the reward-function-based learning of LLMs.
You are right in that the code (or the formal model) alone isn’t sufficient, in that it doesn’t specify the context, requirements, design goals and design constraints. The formal and the informal level complement each other. But that’s also why it’s necessary to think at both levels when developing software. Withdrawing to just the informal level and letting LLMs handle the mapping to the formal level autonomously doesn’t work.
That being said, even model-based design (MBD) has largely been a failure, despite it being about mapping formal models to (formal-language) program code.
Architecture is about the choices you will regret in the future if you get them wrong today. You will regret not having testable code, so TDD isn't bad, but that is not the whole story, and there are many things you will regret that TDD won't help with.
There is the famous bowling game TDD example where the result doesn't have a Frame object, and they argue that this proves you don't need one. That is wrong, though: the example took just a couple of hours, and there is nothing so bad in a two-hour program that you will regret it. If you were building a real bowling system with pin setters, support for 50 lanes, and a bunch of other things that I, who don't work in that area, don't even know about, you would find places to regret things.
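For reference, the kata's usual end state scores a game from a flat list of rolls, with no Frame class at all. A minimal Python sketch in that style (a reconstruction for illustration, not the original authors' code), assuming the input is a complete, valid ten-frame game:

```python
def score(rolls: list[int]) -> int:
    """Score a complete ten-pin bowling game given as a flat list of pin counts."""
    total = 0
    i = 0  # index of the first roll of the current frame
    for _frame in range(10):
        if rolls[i] == 10:  # strike: 10 plus the next two rolls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:  # spare: 10 plus the next roll
            total += 10 + rolls[i + 2]
            i += 2
        else:  # open frame: just the pins knocked down
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total


print(score([10] * 12))  # perfect game
```

Nothing here models pins, lanes, or per-frame state, which fits the point above: a program this small has nothing in it worth regretting.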
In Tidy First?, Kent Beck explains that the main tradeoff is between what we can get now and what we will be able to do later. A hacky decision can keep the company afloat, but it can reduce velocity to a snail's pace in the future.
It’s easier to keep the balance by keeping everything simple and maintaining a good hygiene in your codebase.
> Assembly to Python creates a lot of Intent & Cognitive debt by his definition, because you didn't think through how to manipulate the bits on the hardware, you just allowed the interpreter to do it
I agree! You often see this realized when projects slowly migrate to using more and more ctypes code to try and back out of that pit.
In a previous job, a project was spun up using Python because it was easier and the performance requirements weren't understood at that time. A year or two later it had become a bottleneck for tapeout, and when it was rewritten most of the abstract architecture was thrown out with it, since it was all Pythonic in a way that required a different approach in C++.
Unfortunately, large parts of the Wharton School paper he linked to are entirely AI-generated, and it has yet to be peer reviewed.
I realize that most researchers use AI to assist with writing, but when the topic of your paper is "cognitive surrender", I struggle to take any content in there seriously.
I think Martin isn't wrong here, but I've first hand seen AI produce "lazy" code, where the answer was actually more code.
A concrete example: I had a set of Python models that defined a database schema for a given set of logical concepts.
I added a new logical concept to the system, very analogous to the existing logical set. Claude decided that it should just re-use the existing model set, which worked in theory, but caused the consumers to have to do all sorts of gymnastics to do type inference at runtime. It "worked", but it was definitely the wrong layer of abstraction.
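A minimal, hypothetical sketch of that trade-off. All names (`Document`, `Invoice`, `CreditNote`, `kind`) are invented for illustration, and plain dataclasses stand in for whatever ORM models were actually involved:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


# Option A (hypothetical): one model reused for two concepts.
# Every consumer must check `kind` at runtime to know which fields apply.
@dataclass
class Document:
    kind: str  # "invoice" or "credit_note"
    amount: Decimal
    original_invoice_id: Optional[int] = None  # only meaningful for credit notes


def signed_total_shared(docs: list[Document]) -> Decimal:
    total = Decimal("0")
    for doc in docs:
        if doc.kind == "invoice":
            total += doc.amount
        elif doc.kind == "credit_note":
            total -= doc.amount
        else:
            raise ValueError(f"unknown document kind: {doc.kind!r}")
    return total


# Option B (hypothetical): one model per concept. More code, but the type
# itself says what a record is, so consumers need no runtime dispatch.
@dataclass
class Invoice:
    amount: Decimal


@dataclass
class CreditNote:
    amount: Decimal
    original_invoice_id: int


def signed_total(invoices: list[Invoice], credit_notes: list[CreditNote]) -> Decimal:
    billed = sum((inv.amount for inv in invoices), Decimal("0"))
    credited = sum((cn.amount for cn in credit_notes), Decimal("0"))
    return billed - credited
```

Option A "works", but the branching on `kind` leaks into every consumer. Option B is more code, yet it keeps each concept at its own layer.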
Is more code really bad? For humans, yes, we want things abstracted, but sometimes it may make more sense to actually repeat yourself. If a machine is writing and maintaining the code, do we still need that extra layer?
In the olden days we used Duff's devices and manually unrolled loops with duplicated code that we wrote ourselves.
Now, the compiler is "smart" enough to understand your intent and actually generates the duplicated assembly code itself. You don't care that it's duplicated because the compiler is doing it for you.
I've had some projects recently where I was using an LLM where I needed a few snippets of non-trivial computational geometry. In the old days, I'd have to go search for a library and get permission from compliance to import the library and then I'd have to convert my domain representations of stuff into the formats that library needed. All of that would have been cheaper than me writing the code myself, but it was non-trivial.
Now the LLM can write for me only the stuff I need (no extra big library to import) and it will use the data in the format I stored it in (no needing to translate data structures). The canon says the "right" way to do it would be to have a geometry library to prevent repeated code, but here I have a self contained function that "just works".
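As a hypothetical illustration of that kind of self-contained snippet: a polygon area computed with the shoelace formula directly over an assumed in-house `Point` type, with no geometry library and no conversion step. Both the `Point` type and the choice of algorithm are assumptions for the example, not the commenter's code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Hypothetical domain type: whatever the application already stores.
    x: float
    y: float


def polygon_area(vertices: list[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    Vertices may be listed clockwise or counter-clockwise; taking the absolute
    value makes the result independent of orientation.
    """
    if len(vertices) < 3:
        return 0.0
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping the last back to the first.
    for a, b in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += a.x * b.y - b.x * a.y
    return abs(twice_area) / 2.0


square = [Point(0, 0), Point(2, 0), Point(2, 2), Point(0, 2)]
print(polygon_area(square))
```

A library would also handle holes, self-intersections, and precision issues; the trade-off the comment describes is accepting a narrower function in exchange for no dependency and no data translation.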
This kind of thinking only works as long as the machine can actually fix its own errors.
I've had several bugs that required manual intervention (yes, even with $YOUR_FAVORITE_MODEL -- I've tried them all at this point). After the first few sessions of deleting countless lines of pointless cruft, I quickly learned the benefits of preemptively trimming down the code by hand.
We have confidence in the extra code a compiler generates because it’s deterministic. We don’t have that with LLMs, neither for the ones that wrote the code nor for the ones that read it.
I'm making experimental tooling that automates the boring parts around those transitions, while keeping humans focused on validating that intent survived each step.
I like the "between artifacts" framing. One layer I'd add is proxies/metrics. In a lot of analytics-heavy systems, the real loss isn't spec -> code, it's question -> proxy. Once the proxy gets baked into acceptance criteria, dashboards, or evals, people optimize that and gradually forget it was only a proxy.
I'm kidding. But yes, I explicitly didn't model it yet. The bigger vision is there's a reason for Spec to exist, right?
And that would be Outcome.
> "We observed that users too often share links longer than 100 characters, and they are frustrated when these don't work / get cropped / hit browser address bar limitations"
So the outcome is: "Users no longer have to worry about long URLs". And then you have idea, a spec: "what if we let them create and use short URLs for sharing?" -> URL shortener app.
And yes, this ERD is easily expandable. I'd rather not add more fields but keep the "core" schema short and nice.
Things like outcome, observations, and analytics can simply be extra tables linking to Spec, ACs, etc.: Jira tickets, Datadog dashboards, Tableau analytics, whatever makes sense to teams. And it doesn't require you to set up a Postgres instance; an MVP would run on sqlite3.
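A minimal sketch of what such a core schema could look like on sqlite3, with an outcome linked to specs and specs linked to acceptance criteria. Every table and column name here is an assumption for illustration; the actual ERD isn't shown in the thread:

```python
import sqlite3

# Hypothetical core schema: Outcome -> Spec -> Acceptance criterion.
SCHEMA = """
CREATE TABLE outcome (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL   -- e.g. 'Users no longer have to worry about long URLs'
);
CREATE TABLE spec (
    id          INTEGER PRIMARY KEY,
    outcome_id  INTEGER REFERENCES outcome(id),
    description TEXT NOT NULL   -- e.g. 'Let users create short URLs for sharing'
);
CREATE TABLE acceptance_criterion (
    id          INTEGER PRIMARY KEY,
    spec_id     INTEGER NOT NULL REFERENCES spec(id),
    description TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def criteria_for_outcome(conn: sqlite3.Connection, outcome_id: int) -> list[str]:
    """All acceptance criteria of all specs that serve one outcome."""
    rows = conn.execute(
        """
        SELECT ac.description
        FROM acceptance_criterion AS ac
        JOIN spec ON spec.id = ac.spec_id
        WHERE spec.outcome_id = ?
        ORDER BY ac.id
        """,
        (outcome_id,),
    ).fetchall()
    return [description for (description,) in rows]
```

Extensions such as Jira tickets or dashboards would be further tables holding a foreign key to `spec` or `acceptance_criterion`, leaving the core three tables unchanged.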
I've also seen a lot of effort go into linking different systems together specifically to give agents simpler access to context. "RAG enterprise intelligent search" it is.
What's concerning to me is that even Sourcegraph hasn't thought about what I've been thinking about since 2015: linking specs to code directly, via SCIP. I should be able to press a "find specs" button, in addition to "find references" and "find implementations". And I strongly believe they are sitting on a gold mine right now.
From my experience, it all comes down to code, and so code was the first-class artifact for a long time. Up until I realized that code is only a lossy representation of the spec artifacts. And if nobody ever records spec as an artifact...
What I'm saying is that the pain is real, I've been here for a long enough time. And I should be able to at least use something like this even if the industry doesn't want to.
I completely agree. This is one of the most annoying things about LLMs. I always see them fixing linter errors by adding ignore comments, typing many things as "Any", duplicating test fixtures instead of extracting them, sometimes deleting tests they don't like, etc.
My most recent Claude Code fix consisted of one line: calling `third_party_lib._connect()`. It reaches into the internals of an external library. The fix worked, but it is improper to depend on the specific implementation. The correct fix was about 20 lines.
(Tangentially, this is why I think LLMs are more useful for senior developers because junior developers tend to not have a sense for what's good quality and accept whatever works.)
I've seen them adding "ignore" for linter errors I don't even catch in my CI. For example, adding Pylint stuff when I don't even use Pylint. This isn't laziness. It's just parrot-fashion regurgitation of code, which is by design for LLMs.
> ...to develop the powerful abstractions that then allow us to do much more, much more easily. Of course, the implicit wink here is that it takes a lot of work to be lazy
This lines up with YAGNI, but most people believe the opposite, often using YAGNI to justify NOT building the necessary abstractions.
The counter-argument is that people build abstractions they deem necessary but aren't, and then they're married to that premature architecture quite often. That's what YAGNI is there to advise against.
I don't think what Fowler says here argues for saddling the early versions of your system with abstractions before you've actually seen its use in practice, and its needs over time as requirements and conditions change.
From this "Laziness drives us to make the system as simple as possible (but no simpler!) — to develop the powerful abstractions that then allow us to do much more, much more easily." it's clear that when he talks of abstractions he means of very basic, and as simple as possible, building blocks. Like having core, orthogonal, principles in the system.
Not the kind of piled-up software and design-pattern abstractions that, e.g., Java land used to build in the past.
How does it (if it does) relate to your idea of "affordability porn" :)
That line (between your other values?) was uproarious; I apologise for not u*voting it, partially because I couldn't vocalise my peculiar fetish att (+ "gnarliness-pornstar" doesn't sound nearly as enticing as "AI-affordability-pornstar" X)
This isn't one article, it's a "fragments" post with five separate small thoughts. They happen to all be about LLMs so I can see why it would read as a single article, but it's not.
the framing as "debt" is fair but in our case the bigger pain isn't lazy code, it's overzealous code. claude will happily refactor three unrelated files because it spotted a "pattern". we've ended up with a CLAUDE.md that's basically a list of "do not touch unless asked". probably says more about us than the model but yeah.
LLMs don't lack the virtue of laziness: they have it if you want them to, just by having a base prompt that matches your intent. I've had good success convincing Claude-backed agents to aim for minimal code changes, make deduplication passes, and follow basically every other reasonable "instinct" of a very senior dev. It's not knowledge the models haven't integrated; it's just not at the forefront for many of them with default settings. I bet we've all seen models that over-edit everything and act like the crazy mid-level dev who fiddles with the entire codebase without caring one bit about anyone else's changes, or about the risk of knowledge loss from over-fiddling.
And on Jess' comments on validating docs vs. generating them: it's a traditional locking problem, with traditional solutions. And it's not as if the agent can't read git and realize when one thing was done first, by convention, in anticipation of the other.
I'm quite senior: in fact, I have been a teammate of a couple of the people mentioned in this article, and I suspect they wouldn't question my engineering standards. And yet I've not seen any of that kind of debt in my LLM workflows: if anything, by most traditional forms of evaluating software quality, the projects I work on are better than they were 5 or 10 years ago, using the same metrics as back then. It's not magic; it's making sure the agents I run share those quality priorities. But I am getting work done, instead of spending time looking for attention at conferences.
I agree with your sentiment here. However:
> if anything, by most traditional forms of evaluating software quality, the projects I work on are better than what they were 5, 10 years ago, using the same metrics as back then.
In this side sentence you're introducing so much vagueness. Can you share anything that would help validate your claim? What metrics are you using, and how does your code from 10, 5, and 0 years ago perform on them?
I feel throwing in a vague claim like that unnecessarily dilutes your message and distracts from the point. But, if you do have more to share I'd be curious to learn more.
My anecdotes from using LLMs to modernize legacy (20-year-old) systems:
- 40x speed improvement
- Painless env setup
- 20-second deploy
- 90+% test coverage
- Ability to quickly refactor
- Documentation
(The original system that I wrote with one other programmer 20 years ago took 1.5+ years to write. Modern rewrite: 2 days)
Presumably the 1.5 years for the first version involved work other than coding that the LLM rewrite didn’t entail?
Business logic is usually the most substantial part of legacy systems in my experience, so I imagine so.
Not to be too negative but a lot of modern software complexity is a prison of our own making, that we had time to build because our programs are actually pretty boring CRUD apps with little complex business logic.
I can only assume there's a ton of domain knowledge accrued over those years and beyond baked into the legacy code, that an LLM can just scoop up in a minute.
Not the poster you replied to but I’m sure it did. But still manual rewrite under the same constraints would be much less feasible.
Yep, coders do more than just code.
The anecdote the GP is providing there rings true for me too, although I'm not sure I'm going to offer better detail.
I'm a proponent of architectural styles like MVC, SOLID, hexagonal architecture, etc, and in pre-LLM workflows, "human laziness" often led to technical debt: a developer might lazily leak domain logic into a controller or skip writing an interface just to save time.
The code I get the LLM to emit is a lot more compliant with those BUT there is a caveat that the LLMs do have a habit of "forgetting" the specific concerns of the given file/package/etc, and I frequently have to remind it.
The "metric" improvement isn't that the LLM is a better architect than a senior dev; it's that it reduces the cost of doing things the right way. The delta between "quick and dirty" and "cleanly architected" has shrunk to near zero, so the "clean" version becomes the path of least resistance.
I'm seeing fewer "temporary" kludges because the LLM almost blindly follows my requests.
I don't think I'd like your code. But apparently there's enough implied YAGNI in my CLAUDE.md to prevent the unnecessary interfaces and layers of separation that you apparently like. So I guess there is a flavor for everyone.
Mind sharing the instructions you give Claude to go for minimal code changes etc?
I regularly prompt and re-prompt the clanker with esoteric terms like "subtractive changes", "create by removing" and more common phrases like "make the change easy, then make the easy change", "yagni", and "vertical slices", and "WET code is desirable".
It mostly works. CC's plan mode creates a plan by cleaning up first, then defining narrow, integrated steps. Mentioning "subtractive" and "yagni" appears to be a reliable enough way for an LLM to choose a minimal path.
To my mind these instructions remain incantations and I feel like an alchemist of old.
Was just listening to the Lenny’s Podcast interview with Simon Willison, who mentioned another such incantation: red/green TDD. The model knows what this means and it just does it, with a nice bump in code quality apparently.
I’m trying out another, what I call the principle of path independence. It’s the idea that the code should reflect only the current requirements, and not the order in which functionality was added — in other words, if you should decide to rebuild the system again from scratch tomorrow, the code should look broadly similar to its current state. It sort of works even though this isn’t a real thing that’s in its training data.
"Claude, write maintainable code that builds me generational wealth. Make no mistakes."
works for me.
I often say to Claude "you're doing X when I want Y, how can I get you to follow the Y path without fail" and Claude will respond with "Edit my claude.md to include the following" which I then ask Claude to do.
Not sure this is a great idea. The model only internalized what it was trained on, and writing prompts/context for itself isn't part of that. I try to keep my context as clean as possible; today's models mostly seem smart/aligned enough to be steered by a couple of keywords.
Ah yea I do that too. I often have reflection sessions with Claude where I ask it "how can I make sure you do behavior X so we get outcome Y?"
It works relatively well but not always.
You should go get attention at conferences. You could write a book called
Practical LLM Coding
Do you think O'Reilly would still put an animal on the cover?
Sure, K9 - https://cms.doctorwho.tv/sites/default/files/2022-02/2et2o00...
A mockingbird?
I see what Martin is saying here, but you could make that argument for moving up the abstraction layers at any point. Assembly to Python creates a lot of Intent & Cognitive debt by his definition, because you didn't think through how to manipulate the bits on the hardware, you just allowed the interpreter to do it.
My counter is that technical intent, in the way he is describing it, only exists because we needed to translate human intent into machine language. You can still think deeply about problems without needing to formulate them as domain-driven abstractions in code. You could mind map it, journal about it, or put post-it notes all over the wall. Creating object-oriented abstractions isn't magic.
Translating your intent into a formal language is a tool of thought in itself. It’s by that process that you uncover the ambiguities, the aspects and details you didn’t consider, maybe even that the approach as a whole has to be reconsidered. While writing in natural language can also be a tool of thought, there is an essential element in aligning one’s thought process with a formal language that doesn’t allow for any vagueness or ambiguity.
It’s similar to how doing math in natural language without math notation is cumbersome and error-prone.
Agree: house architects have their own language (architectural plans) to translate people's needs into unambiguous information that will be useful to those who build the house. Musicians use musical notation, physicists use diagrams to represent molecules, etc. And programmers use programming languages: when we write a line of code, we don't just hope that the compiler will understand what we wrote. Musical notation is a kind of abstraction: higher level than audio frequencies but lower level than natural language. The same goes for programming languages. Getting rid of all formal languages would take us back 2,000 years.
Using a formal language also helps you enter a kind of flow, and then details you hadn't thought about before may appear. Not everything can be prompted: Alex Honnold prepared his climb of El Capitan very carefully, but it was only when he was on the rock that he made the real decisions. Same for Lindbergh when he crossed the Atlantic. The map is not the territory.
I agree, but that formal language doesn't need to be executable code.
So you need to find something better. In the article "How NASA writes 'perfect' software (1996) (fastcompany.com)" (discussed on HN), the author explains that adding GPS support required 1,500 pages of spec, and to avoid ambiguity the spec used pseudocode to describe the expected features and behaviors.
If you invent a formal language that is easy to read and easy to write, it may look like Python... Then someone will probably write an interpreter.
We have many languages, and senior people who know how to use them, who enjoy coding, and who don't have a "lack of productivity" problem. I don't feel the need to throw away everything we have to embrace what is supposed to be "the future". And since we need good devs to read LLM-generated code, how do we remain good devs if we don't write code anymore? What's the point of staying up to date in language X if we don't write code? Remaining good at something without doing it is a mystery to me.
A formal language is executable. It might need some translation pass to be eventually executable on a particular system, but it is executable nevertheless.
"A sufficiently detailed specification is code"
The thing is that you paid that debt once. The mappings are well defined and deterministic.
The whole purpose of an abstraction is to not have to look underneath it to make sure what you did with it is still correct. You can be sure because you, or someone you trust, did the work of paying that debt once.
With LLMs you always need to verify the output, for every generation you need to pay that debt. So it is not an abstraction.
> you didn't think through how to manipulate the bits on the hardware, you just allowed the interpreter to do it
If you are thinking through deterministic code, you are thinking through the manipulation of bits in hardware. You are just doing it in a language which is easier for humans to understand.
There is a direct mapping of intent.
> If you are thinking through deterministic code, you are thinking through the manipulation of bits in hardware.
No I'm not. If I want the machine to evaluate 2+2, I don't know or care what bits in hardware it uses to do that (as long as it doesn't run out of memory), I just want the result to come back as 4.
When you press the 2 button, the plus button, the 2 button and the equals button, you are translating your question into bits and operations which are logically guaranteed to yield bits that represent your answer.
When you think through what will happen as a result of deterministic code, you are also thinking through what the bits will do, albeit at a higher level of abstraction.
When you ask an LLM to do something, you have no guarantee that the intent you provide is accurately translated, and you have no guarantee you’ll get the result you want. If you want your answer to 2+2 to always be 4, you shouldn’t use a nondeterministic LLM. To get that guarantee, the bit manipulation a machine does needs to be logically equivalent to the way you evaluate the question.
That doesn’t mean you can’t minimize intent distortion or cognitive debt while using LLMs, or that you can’t think through the logic of whatever problem you’re dealing with in the same structured way a formal language forces you to while using them. But one of my pet peeves is comparing LLMs to compilers. The nondeterminism of LLMs and lack of logical rigidity makes them fundamentally different.
Most programmers who write reasonably deterministic code don't even know how many bits it allocates.
> you didn't think through how to manipulate the bits on the hardware, you just allowed the interpreter to do it
The interpreter is deterministic but LLMs aren't.
AI is not an abstraction layer.
I like the word intent, but Martin Fowler’s essay made me think more carefully about it. When Thomas Kuhn talked about paradigm shifts, “paradigm” ended up carrying more than twenty different meanings. In the same way, I think intent has recently become one of the most polluted and overused words in programming. My own toy language project uses the word intent, so I am not really in a position to criticize others too harshly.
Reading the Hacker News comments, I kept thinking that programming is fundamentally about building mental models, and that the market, in the end, buys my mental model.
If we start from human intent, the chain might look something like this:
human intent -> problem model -> abstraction -> language expression -> compilation -> change in hardware
But abstraction and language expression are themselves subdivided into many layers. How much of those layers a programmer can afford not to know has a direct effect on that programmer’s position in the market. People often think of abstraction as something clean, but in reality it is incomplete and contextual. In theory it is always clean; in practice it is always breaking down.
Depending on which layer you live in, even when using the same programming language, the form of expression can become radically different. From that point of view, people casually bundle everything together and call it “abstraction” or “intent,” but in reality there is a gap between intent and abstraction, and another gap between abstraction and language expression. Those subtle friction points are not fully reducible.
Seen from that perspective, even if you write a very clear specification, there will always be something that does not reduce neatly. And perhaps the real difference between LLMs and humans lies in how they deal with that residue.
Martin frames the issue in a way that suggests LLM abstractions are bad, but I do not fully agree. As someone from a third-world country in Asia, I have seen a great deal of bad abstraction written in my own language and environment. In that sense, I often feel that LLM-generated code is actually much better than the average abstractions produced by my Asian peers. At the same time, when I look at really good programming from strong Western engineers, I find myself asking again what a good abstraction actually is.
The essay talks about TDD and other methodologies, but personally I think TDD can become one of the worst methodologies when the abstraction itself is broken. If the abstraction is wrong, do the tests really mean anything? I have seen plenty of cases where people kept chasing green tests while gradually destroying the architecture. I have seen this especially in systems involving databases.
The biggest problem with methodology is that it always tends to become dogma, as if it were something that must be obeyed. SOLID principles, for example, do not always need to be followed, but in some organizations they become almost religious doctrine. In UI component design, enforcing LSP too rigidly can actually damage the diversity and flexibility of the UI. In the end, perhaps what we call intent is really the ability to remain flexible in context and search for the best possible solution within that context.
From that angle, intent begins to look a lot like the reward-function-based learning of LLMs.
You are right in that the code (or the formal model) alone isn’t sufficient, in that it doesn’t specify the context, requirements, design goals and design constraints. The formal and the informal level complement each other. But that’s also why it’s necessary to think at both levels when developing software. Withdrawing to just the informal level and letting LLMs handle the mapping to the formal level autonomously doesn’t work.
That being said, even model-based design (MBD) has largely been a failure, despite it being about mapping formal models to (formal-language) program code.
architecture is about the choices you will regret in the future if you get them wrong today. You will regret not having testable code, so tdd isn't bad, but that is not the whole story, and there are many things you will regret that tdd won't help with.
there is the famous bowling game tdd example where their result doesn't have a frame object and they argue they proved you don't need one. That is wrong, though: the example took just a couple of hours, and there is nothing so bad in a two-hour program that you will regret it. If you were doing a real bowling system with pin setters, support for 50 lanes, and a bunch of other things that I, who don't work in that area, don't even know about, you would find places to regret things.
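For readers who haven't seen the kata: the style of solution under discussion scores a game with one loop over a flat list of rolls and no Frame class at all. The following is a minimal sketch of that shape, not the published solution, and it assumes valid input for one complete ten-frame game (no validation, no partial games, none of the lanes or pin setters a real system would need):

```python
def bowling_score(rolls: list[int]) -> int:
    """Score a complete ten-pin game from a flat list of knocked-down pins."""
    score = 0
    i = 0  # index of the first roll of the current frame
    for _ in range(10):
        if rolls[i] == 10:  # strike: 10 plus the next two rolls
            score += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:  # spare: 10 plus the next roll
            score += 10 + rolls[i + 2]
            i += 2
        else:  # open frame: just the pins knocked down
            score += rolls[i] + rolls[i + 1]
            i += 2
    return score


print(bowling_score([10] * 12))  # perfect game → 300
```

At this size the index arithmetic is easy to follow; the argument above is that the absence of a Frame abstraction only starts to hurt once the system grows well beyond a scoring function.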
I don't deny that TDD is generally useful. I like it as well.
What I meant is that, like any powerful tool, there are situations where it shouldn't be used.
Thanks for the thoughtful comment.
in Tidy First?, Kent Beck explains that the main tradeoff is between what we can get now and what we will be able to do later. A hacky decision can keep the company afloat, but it can slow velocity to a snail's pace in the future.
It’s easier to keep the balance by keeping everything simple and maintaining a good hygiene in your codebase.
> Assembly to Python creates a lot of Intent & Cognitive debt by his definition, because you didn't think through how to manipulate the bits on the hardware, you just allowed the interpreter to do it
I agree! You often see this realized when projects slowly migrate to using more and more ctypes code to try and back out of that pit.
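For context on what "more and more ctypes code" looks like in practice: ctypes lets Python call functions in compiled shared libraries directly. A minimal sketch, assuming a POSIX system, using libc's `strlen` purely as a stand-in for whatever hot path a real project would move into its own compiled library:

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# falls back to symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature, size_t strlen(const char *s), so ctypes
# converts the argument and the return value correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call libc's strlen on a bytes object through ctypes."""
    return libc.strlen(data)


print(c_strlen(b"tapeout"))  # → 7
```

Each such call is a hand-written bridge with manually declared types, which is part of why the comment describes it as backing out of a pit rather than as a clean abstraction.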
In a previous job, a project was spun up using Python because it was easier and the performance requirements weren't understood at that time. A year or two later it had become a bottleneck for tapeout, and when it was rewritten most of the abstract architecture was thrown out with it, since it was all Pythonic in a way that required a different approach in C++.
Unfortunately, large parts of the paper he linked to from the Wharton School are entirely AI-generated, and it has yet to be peer-reviewed.
I realize that most researchers use AI to assist with writing, but when the topic of your paper is "cognitive surrender", I struggle to take any content in there seriously.
It uses "not merely" 7 times! I wonder if an LLM would repeat a phrase that often. It could be the author starting to write like LLMs, instead.
Good thing I didn't read it but used an LLM to summarize it for me!
I didn't even read a summary, I just used an LLM to write inane comments on hacker news for me.
> I realize that most researchers use AI to assist with writing
This is disgusting
Unfortunately this is the new reality, it’s everywhere.
I think Martin isn't wrong here, but I've seen first-hand AI produce "lazy" code where the answer was actually more code.
A concrete example: I had a set of Python models that defined a database schema for a given set of logical concepts.
I added a new logical concept to the system, very analogous to the existing logical set. Claude decided that it should just re-use the existing model set, which worked in theory, but caused the consumers to have to do all sorts of gymnastics to do type inference at runtime. It "worked", but it was definitely the wrong layer of abstraction.
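The trade-off can be illustrated with plain dataclasses. This is a hypothetical sketch, not the commenter's schema (which isn't shown): the concept names `Subscription` and `Trial` are invented, and `functools.singledispatch` is just one way to let the type drive behavior. The first half reuses one model with a runtime discriminator; the second gives each concept its own model at the cost of a bit more code:

```python
from dataclasses import dataclass
from functools import singledispatch


# "Reuse" approach (hypothetical): one model stands in for two concepts,
# so every consumer has to branch on a discriminator field at runtime.
@dataclass
class Record:
    kind: str  # "subscription" or "trial"
    days: int


def label_reused(r: Record) -> str:
    if r.kind == "subscription":
        return f"subscription, {r.days} days"
    if r.kind == "trial":
        return f"trial, {r.days} days left"
    raise ValueError(f"unknown kind {r.kind!r}")


# "More code" approach (hypothetical): one model per concept, so the type
# itself tells consumers what they are holding.
@dataclass
class Subscription:
    days: int


@dataclass
class Trial:
    days: int


@singledispatch
def label(obj) -> str:
    raise TypeError(f"unsupported type {type(obj).__name__}")


@label.register
def _(obj: Subscription) -> str:
    return f"subscription, {obj.days} days"


@label.register
def _(obj: Trial) -> str:
    return f"trial, {obj.days} days left"
```

In the first version a typo in `kind` only fails when that branch runs; in the second, each consumer is written against a concrete type.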
Is more code really bad? For humans, yes, we want things abstracted, but sometimes it may make more sense to actually repeat yourself. If a machine is writing and maintaining the code, do we still need that extra layer?
In the olden days we used Duff's devices and manually unrolled loops with duplicated code that we wrote ourselves.
Now, the compiler is "smart" enough to understand your intent and actually generates duplicated assembly code. You don't care that it's duplicated because the compiler is doing it for you.
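To make the "duplicated code" concrete, here is a hand-unrolled loop next to the plain version. It is written in Python only to keep one language across examples; in Python it brings no speed benefit, and the C idiom it imitates (Duff's device) folds the leftover iterations into a `switch` instead of a tail loop. Optimizing compilers can perform this kind of unrolling on the plain loop themselves:

```python
def sum_plain(xs: list[int]) -> int:
    """The loop as you'd write it today."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled(xs: list[int]) -> int:
    """The same loop, manually unrolled by a factor of 4."""
    total = 0
    n = len(xs)
    i = 0
    # Main body: four elements per iteration, duplicated by hand.
    while i + 4 <= n:
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
        i += 4
    # Tail: the remaining 0 to 3 elements.
    while i < n:
        total += xs[i]
        i += 1
    return total
```

Both functions return the same result for any list; the unrolled one just trades readability for fewer loop-condition checks.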
I've had some projects recently where I was using an LLM where I needed a few snippets of non-trivial computational geometry. In the old days, I'd have to go search for a library and get permission from compliance to import the library and then I'd have to convert my domain representations of stuff into the formats that library needed. All of that would have been cheaper than me writing the code myself, but it was non-trivial.
Now the LLM can write for me only the stuff I need (no extra big library to import) and it will use the data in the format I stored it in (no needing to translate data structures). The canon says the "right" way to do it would be to have a geometry library to prevent repeated code, but here I have a self contained function that "just works".
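As an illustration of the kind of small, self-contained geometry function described here (not the commenter's actual code, which isn't shown), this sketch computes a polygon's area with the shoelace formula directly on a hypothetical in-house representation, a plain list of `(x, y)` tuples, with no library import and no format conversion:

```python
Point = tuple[float, float]


def polygon_area(vertices: list[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    Works on vertices listed in either clockwise or counter-clockwise order.
    """
    n = len(vertices)
    if n < 3:
        return 0.0  # degenerate: no enclosed area
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


print(polygon_area([(0, 0), (4, 0), (0, 3)]))  # → 6.0
```

It is also exactly the kind of code that someone still has to review and maintain, which is the trade-off raised in the replies.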
This kind of thinking only works as long as the machine can actually fix its own errors.
I've had several bugs that required manual intervention (yes, even with $YOUR_FAVORITE_MODEL -- I've tried them all at this point). After the first few sessions of deleting countless lines of pointless cruft, I quickly learned the benefits of preemptively trimming down the code by hand.
But someone needs to review and maintain the computational geometry code; or edit it to use a novel algorithm/optimize it.
And even if they didn't, every extra line of code without sufficient abstraction adds cognitive overload.
We have confidence in the extra code a compiler generates because it’s deterministic. We don’t have that confidence in LLMs, whether they are the ones writing the code or reading it.
This is my current visualization of the problem: https://excalidraw.com/#json=y1fSSx2z8-0nFs7CDnqhp,d9Di8JdGU...
I think the "cognitive bottlenecks" in software engineering live between artifacts, where code is simply one of them.
outcome → requirements → spec → acceptance criteria → executable proof → review
I'm making experimental tooling that automates the boring parts around those transitions, while keeping humans focused on validating that intent survived each step.
I like the "between artifacts" framing. One layer I'd add is proxies/metrics. In a lot of analytics-heavy systems, the real loss isn't spec -> code, it's question -> proxy. Once the proxy gets baked into acceptance criteria, dashboards, or evals, people optimize that and gradually forget it was only a proxy.
You're absolutely right! (c)
I'm kidding. But yes, I explicitly didn't model it yet. The bigger vision is there's a reason for Spec to exist, right?
And that would be Outcome.
> "We observed that users share 100+ characters long links too often and they are frustrated when it doesn't work / crop / browser address bar limitations"
So the outcome is: "Users no longer have to worry about long URLs". And then you have an idea, a spec: "what if we let them create and use short URLs for sharing?" -> URL shortener app.
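To ground the URL shortener example: one common way (not the only one) to implement the spec is to store each long URL under an auto-incrementing integer ID and expose that ID as a short base62 code. A minimal sketch of the encoding step, assuming that design; the alphabet ordering is an arbitrary choice:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Turn a non-negative integer row ID into a short base62 code."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Inverse of encode: map a short code back to the integer row ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(62))  # → "10"
```

An acceptance criterion such as "every short code resolves back to its original URL" then maps naturally onto a round-trip check of `decode(encode(n)) == n`.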
And yes, this ERD is easily expandable. I'd rather not add more fields but keep the "core" schema short and nice.
Things like outcome, observations, analytics, they can be simply extra tables linking to Spec, ACs, etc. Jira tickets, Datadog dashboards, Tableau analytics, whatever makes sense to teams. And it doesn't require you to set up a Postgres instance. The MVP would run on sqlite3.
I've also seen a lot of effort go into linking different systems together, specifically to simplify context access for agents. "RAG enterprise intelligent search" it is.
What concerns me is that even Sourcegraph hasn't thought about what I've been thinking about since 2015: linking specs directly to code via SCIP. I should be able to press a "find specs" button, in addition to "find references" and "find implementations". And I strongly believe they are sitting on a gold mine right now.
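A rough sketch of what such a small core schema plus optional link tables might look like in sqlite3. The table and column names here are hypothetical and are not taken from the diagram; the point is that the outcome → spec → acceptance criterion chain stays in a few narrow tables, while links to external systems (tickets, dashboards) live in a separate table instead of widening the core:

```python
import sqlite3

# Hypothetical core schema; names are illustrative only.
SCHEMA = """
CREATE TABLE outcome (
    id INTEGER PRIMARY KEY,
    statement TEXT NOT NULL
);
CREATE TABLE spec (
    id INTEGER PRIMARY KEY,
    outcome_id INTEGER REFERENCES outcome(id),
    idea TEXT NOT NULL
);
CREATE TABLE acceptance_criterion (
    id INTEGER PRIMARY KEY,
    spec_id INTEGER NOT NULL REFERENCES spec(id),
    description TEXT NOT NULL
);
-- Extension table: links to external systems without touching the core.
CREATE TABLE external_link (
    id INTEGER PRIMARY KEY,
    spec_id INTEGER NOT NULL REFERENCES spec(id),
    system TEXT NOT NULL,
    url TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO outcome (id, statement) VALUES (1, 'Users no longer have to worry about long URLs')")
conn.execute("INSERT INTO spec (id, outcome_id, idea) VALUES (1, 1, 'Let users create and share short URLs')")
conn.execute("INSERT INTO acceptance_criterion (spec_id, description) VALUES (1, 'A short URL redirects to the original URL')")

# Trace an acceptance criterion back to the outcome it exists to serve.
row = conn.execute("""
    SELECT ac.description, o.statement
    FROM acceptance_criterion AS ac
    JOIN spec AS s ON s.id = ac.spec_id
    JOIN outcome AS o ON o.id = s.outcome_id
""").fetchone()
print(row)
```

The final query is the "why does this criterion exist?" lookup, answered with two joins rather than by reading the code.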
From my experience, it all comes down to code, and so code was the first-class artifact for a long time. Up until I realized that code is only a lossy representation of the spec artifacts. And if nobody ever records spec as an artifact...
What I'm saying is that the pain is real, I've been here for a long enough time. And I should be able to at least use something like this even if the industry doesn't want to.
cool dataviz, but it's editable! trying to pan, zoom, and scroll on my phone moved elements around on the canvas
My bad. Well, it's Excalidraw, and you can download png / svg from it.
Here's an image you can open on the phone https://pbs.twimg.com/media/HGjHvSsWIAAkhHL?format=jpg&name=...
I also did a post explaining reasoning behind this diagram: https://x.com/br11k_dev/status/2047105958451507268
But I'll make a proper post on HN once I have all ingredients ready!
- Minimal CLI tooling
- Jupyter Lab you can go through step by step, on example greenfield project (URL shortener app)
- Blog post on what I've been doing for the last 2 months
Activate the “Hand” tool to avoid that.
Intent debt is a useful framing. A few comments explaining "why" instead of "what" would have saved hours of guessing.
> The problem is that LLMs inherently lack the virtue of laziness.
I assure you, they do not.
I completely agree. This is one of the most annoying things about LLMs. I always see them fixing linter errors by adding ignore comments, typing many things as "Any", duplicating test fixtures instead of extracting them, sometimes deleting tests they don't like, etc.
My most recent Claude Code fix consisted of one line: calling `third_party_lib._connect()`. It reaches into the internals of an external library. The fix worked, but it is improper to depend on the specific implementation. The correct fix was about 20 lines.
(Tangentially, this is why I think LLMs are more useful for senior developers because junior developers tend to not have a sense for what's good quality and accept whatever works.)
I've seen them adding "ignore" for linter errors I don't even catch in my CI. For example, adding Pylint stuff when I don't even use Pylint. This isn't laziness. It's just parrot-fashion regurgitation of code, which is by design for LLMs.
> ...to develop the powerful abstractions that then allow us to do much more, much more easily. Of course, the implicit wink here is that it takes a lot of work to be lazy
This lines up with YAGNI, but most people believe the opposite, often using YAGNI to justify NOT building the necessary abstractions.
The counter-argument is that people build abstractions they deem necessary but aren't, and then they're married to that premature architecture quite often. That's what YAGNI is there to advise against.
I don't think what Fowler says here is in favor of saddling the early versions of your system with abstractions before you've actually seen its use in practice and its needs over time as requirements and conditions change.
From this "Laziness drives us to make the system as simple as possible (but no simpler!) — to develop the powerful abstractions that then allow us to do much more, much more easily." it's clear that when he talks of abstractions he means of very basic, and as simple as possible, building blocks. Like having core, orthogonal, principles in the system.
Not the kind of piled-up software and design-pattern abstractions that Java land, for example, used to build.
Hits the spot for me. I am always pushing back on AI to simplify and improve concision.
How does it --if it does-- relate to your idea of "affordability porn" :)
That line (between your other values?) was uproarious; I apologise for not u*voting it, partially because I couldn't vocalise my peculiar fetish att (+ "gnarliness-pornstar" doesn't sound nearly as enticing as "AI-affordability-pornstar" X)
Where's the other half of the article? What an abrupt ending...
This isn't one article, it's a "fragments" post with five separate small thoughts. They happen to all be about LLMs so I can see why it would read as a single article, but it's not.
There is a paper linked which is describing the whole thing!
the framing as "debt" is fair but in our case the bigger pain isn't lazy code, it's overzealous code. claude will happily refactor three unrelated files because it spotted a "pattern". we've ended up with a CLAUDE.md that's basically a list of "do not touch unless asked". probably says more about us than the model but yeah.
I heard that LLMs imitate humans. Let's add laziness, impatience, and arrogance—the virtues of programmers—to AGENTS.md and improve it.
Wrong link. Technical, Cognitive and Intent Debt was discussed here: https://martinfowler.com/fragments/2026-04-02.html
Thanks! we updated the link.