Author here: To be honest, I know there are like a bajillion Claude Code posts out there these days.
But there are a few nuggets we figured were worth sharing, like Anchor Comments [1], which have really made a difference:
——
# CLAUDE.md
### Anchor comments
Add specially formatted comments throughout the codebase, where appropriate, for yourself as inline knowledge that can be easily `grep`ped for.
- Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or `AIDEV-QUESTION:` as the prefix, as appropriate.
- *Important:* Before scanning files, always grep for existing `AIDEV-…` anchors first.
- Update relevant anchors after finishing any task.
- Make sure to add relevant anchor comments whenever a file or piece of code:
  * is too complex, or
  * is very important, or
  * could have a bug
——
[1]: https://diwank.space/field-notes-from-shipping-real-code-wit...
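To make that concrete, here's roughly what an anchor ends up looking like in a file (illustrative snippet, not from our actual codebase):

    # AIDEV-NOTE: perf-critical path; runs for every invoice in the nightly batch.
    # Avoid adding per-item network calls here.
    def reconcile_invoice(invoice, ledger_entries):
        # AIDEV-TODO: handle multi-currency invoices once the ledger supports them.
        matched = [e for e in ledger_entries if e.invoice_id == invoice.id]
        return sum(e.amount for e in matched) == invoice.total

Claude (and anyone else on the team) can then `grep -rn "AIDEV-"` to pull up every anchor before touching nearby code.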
Great post. I'm fairly new to the AI pair programming thing (I've been using Aider), but with 20 years of coding behind me I can see where things are going. You're dead right in the conclusion about now being the time to adopt this stuff as part of your flow -- if you haven't already.
And regarding the HN post getting buried for a while there...[1] Somewhat ironic that an article about using AI to help write code would get canned for using an AI to help write it :D
[1]: https://news.ycombinator.com/item?id=44214437
Just to provide a contrast to some of the negative comments…
As a very experienced engineer who uses LLMs sporadically* and not in any systematic way, I really appreciated seeing how you use them in production in a real project. I don’t know why people are being negative; you just mentioned your project in detail where it was appropriate to talk about its structure. It doesn’t strike me as gratuitous self-promotion at all.
Your post is giving me motivation to empower the LLMs a little bit more in my workflows.
*: They absolutely don’t get the keys to my projects but I have had great success with having them complete specific tasks.
Really appreciate the kind words! I didn’t intend the post to be so much about our company; it’s just the codebase I mostly hack on. :)
Q: How do you ensure tests are only written by humans? Basically just the honor system?
You can:
1. Add instructions in CLAUDE.md to not touch tests.
2. Disallow the Edit tool for test directories in the project’s .claude/settings.json file
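For (2), the settings file looks roughly like this; I’m going from memory on the exact permission-rule syntax, so double-check the Claude Code docs, and the paths here are purely illustrative:

    {
      "permissions": {
        "deny": [
          "Edit(tests/**)",
          "Edit(migrations/**)"
        ]
      }
    }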
Disallowing edits in test dirs is a good tip, thanks.
I meant in the wider context of the team, though: everyone uses it, but not everyone will work the same way or use the same underlying prompts. So how do you ensure everyone keeps to that agreement?
Honest question: approx what percent of the post was human vs machine written?
I’d say it’s around 40% me: the ideating, editing, citations, and images are all mine; the rest is Opus 4 :)
I typically try to also include the original Claude chat’s link in the post but it seems like Claude doesn’t allow sharing chats with deep research used in them.
Update: here’s an older chatgpt conversation while preparing this: https://chatgpt.com/share/6844eaae-07d0-8001-a7f7-e532d63bf8...
Thanks for being transparent about this, but we’re not wanting substantially LLM-generated content on HN.
We’ve been asking the community to refrain from publicly accusing authors of posting LLM-generated articles and comments. But the other side of that is that we expect authors to post content that they’ve created themselves.
It’s one thing to use an LLM for proof-reading and editing suggestions, but quite another for “40%” of an article to be LLM-generated. For that reason I’m having to bury the post.
Shouldn’t the quality of the content be what matters? Avoiding articles with low-grade effort or no genuine content, whether made with or without LLMs, would seem to be a better goal.
I completely understand. Just to clarify, when I said it was ~40%, I didn’t mean the content was written by Claude/ChatGPT but that I took its help in deep research and writing the first drafts. The ideas, all of the code examples, the original CLAUDE.md files, the images, citations, etc are all mine.
Ok, sure, these things are hard to quantify. The main issue is that we can't ask the community to refrain from accusing authors of publishing AI-generated content if people really are publishing content that is obviously AI-generated. What matters to us is not how much AI was used to write an article, but rather how much the audience finds that the article satisfies intellectual curiosity. If the audience can sense that the article is generated, they lose trust in the content and the author, and also lose trust in HN as a place they can visit to find high-quality content.
Edit: On reflection, given your explanation of your use of AI and given another comment [1] I replied to below, I don't think this post is disqualified after all.
[1] https://news.ycombinator.com/item?id=44211417#44215719
Surely you're missing the wood for the trees here: isn't the point of asking for no 'AI' to avoid low-effort slop? This is a relatively high-value post about adopting new practices and human-LLM integration.
Tag it, let users decide how they want to vote.
Meta aside: if you're speaking on behalf of HN, you should indicate that in the post (really with a marker outside of the comment).
Indeed, and since the author has clarified what they meant by "40%", I've put the post back on the front page. Another relevant factor is that they don't seem to speak English as a first language, and I think we can make allowances for such people to use LLMs to polish their writing.
Regarding your other suggestion: it's been the case ever since HN started 18 years ago that moderators/modcomments don't have any special designation. This is due to our preference for simple design and an aversion to seeming separate from the community. We trust people to work it out and that has always worked well.
Thanks. To be clear, I'm not asking the question to be particularly negative about it. It's more just curiosity, mixed with a trade-off in effort: if you wrote it 100% yourself, I'm more inclined to read the whole thing, versus just feeding it back to the LLM to extract the condensed nuggets.
Very interesting, I'm going to use some of these ideas in my CLAUDE.md file.
> One of the most counterintuitive lessons in AI-assisted development is that being stingy with context to save tokens actually costs you more
Something similar I've been thinking about recently: for bigger projects and more complicated code, I really do notice a big difference between Claude Opus and Claude Sonnet. Sonnet sometimes just wastes so much time on ideas that never pan out, or makes things worse. So I wonder: wouldn't it make more sense for Anthropic not to differentiate between Opus and Sonnet for people with a Max subscription? It seems like Sonnet takes 10-20 turns to do what Opus can do in 2 or 3, so forcing people over to Sonnet would ultimately cost them more.
One of the exciting things to me about AI agents is how they push you (and free you up) to build processes that we’ve always known were important but that frequently weren’t prioritized in the face of shipping the system.
You can use how uncomfortable you are with the AI doing something as a signal that you need to invest in systematic verification of that thing. For instance, in the linked post, the team could build a system for verifying and validating their data migrations. That would move a whole class of changes into the AI realm.
This is usually much easier to quantify and explain externally than nebulous talk about tech debt in that system.
For sure. Another interesting trick I found to be surprisingly effective is to ask Claude Code to “Look around the codebase, and if something is confusing or weird/counterintuitive, drop an AIDEV-QUESTION: … comment so I can document that bit of code and/or improve it”. We found some really gnarly things that had been forgotten in the codebase.
Agreed, my hunch is that you might use higher abstraction-level validation tools like acceptance and property tests, or even formal verification, as the relative cost of boilerplate decreases.
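As a rough sketch of the property-test end of that, assuming a Python codebase and the hypothesis library (the migration function and record shape here are made up):

    from hypothesis import given, strategies as st

    # Strategy generating "old-format" user records.
    old_users = st.fixed_dictionaries({
        "id": st.integers(min_value=1),
        "email": st.emails(),
        "full_name": st.text(min_size=1),
    })

    def migrate_user_record(old: dict) -> dict:
        """Toy stand-in for a real data migration."""
        first, _, last = old["full_name"].partition(" ")
        return {"id": old["id"], "email": old["email"],
                "first_name": first, "last_name": last}

    @given(old_users)
    def test_migration_preserves_identity_fields(old):
        new = migrate_user_record(old)
        # Invariants no migration (human- or AI-written) should ever break.
        assert new["id"] == old["id"]
        assert new["email"] == old["email"]

Once invariants like these are encoded, letting an agent draft the migration itself feels a lot less scary.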
Some thoughts:
- Is there a more elegant way to organize the prompts/specifications for LLMs in a codebase? I feel like CLAUDE.md, SPEC.mds, and AIDEV comments would get messy quickly.
- What is the definition of "vibe-coding" these days? I thought it refers to the original Karpathy quote, like cowboy mode, where you accept all diffs and hardly look at code. But now it seems that "vibe-coding" is catch-all clickbait for any LLM workflow. (Tbf, this title "shipping real code with Claude" is fine)
- Do you obfuscate any code before sending it to someone's LLM?
> - Is there a more elegant way to organize the prompts/specifications for LLMs in a codebase? I feel like CLAUDE.md, SPEC.mds, and AIDEV comments would get messy quickly.
Yeah, the comments do start to pile up. I’m working on a VS Code extension that automatically turns them into tiny visual indicators in the gutter instead.
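The scanning half is easy to prototype; a rough sketch, in Python just for illustration and only looking at .py files:

    import re
    from pathlib import Path

    ANCHOR = re.compile(r"AIDEV-(NOTE|TODO|QUESTION):\s*(.*)")

    def find_anchors(root: str = "."):
        """Yield (path, line_number, kind, text) for every anchor comment."""
        for path in Path(root).rglob("*.py"):  # widen the glob per language
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if m := ANCHOR.search(line):
                    yield path, lineno, m.group(1), m.group(2)

    for path, lineno, kind, text in find_anchors():
        print(f"{path}:{lineno} [{kind}] {text}")

Handy as a standalone report even without the editor integration.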
> - What is the definition of "vibe-coding" these days? I thought it refers to the original Karpathy quote, like cowboy mode, where you accept all diffs and hardly look at code. But now it seems that "vibe-coding" is catch-all clickbait for any LLM workflow. (Tbf, this title "shipping real code with Claude" is fine)
Depends on who you ask, I guess. For me it hasn’t been a panacea, and I’ve often run into issues (3.7 Sonnet and Codex have had ~60% success rates for me, but Opus 4 is actually very good).
> - Do you obfuscate any code before sending it to someone's LLM?
In this case, all of it was open source to begin with, but it’s a good point to think about.
I finally decided a few days ago to try this Claude Code thing on my personal project. It's depressingly efficient. And damn expensive: I used over 10 dollars in one day. But I'm afraid it is inevitable; I will have to pay a tax to the AI overlords just to be able to keep my job.
I was looking at $2,000 a year and climbing before Anthropic announced the $100 and $200 Max subscriptions that bundle Claude Console and Claude Code. There are limits per five-hour window, but one can toggle back to the metered API with the /login command, or just walk the dog. $100 a month has done me fine.
Same. I ran out on the $200 one too yesterday. It’s skyrocketed after Opus 4. Nothing else comes close
I had been musing over this. Will devs in very cheap countries still be an attractive option, just because they'd still be cheaper per month than Claude?
A lot of visual noise because of the model-specific comments. Or maybe that's just the examples here.
But as a human, I do like the CLAUDE.md file. It's like documentation for dev reasoning and choices. I like that.
Is this faster than an old-style codebase where developers simply have the LLM chat open as they work? It seems like this ups the learning curve, and the code here doesn't look very approachable.
I think most of this is good stuff, but I disagree with not letting Claude touch tests or migrations at all. Hand-writing tests from scratch is the part I hate the most; having an LLM do a first pass on tests, which I then add to and adjust as I see fit, has been a big boon on the testing front. The difference between me and the author seems to be that I believe the human still takes ownership of and responsibility for code, whether or not it was generated by an LLM. Not letting Claude touch tests and migrations says you (rightfully) don't trust Claude, yet it hands ownership of Claude-generated code to Claude. Or perhaps he doesn't trust his employees not to blindly accept AI slop, and the strict rules around tests and migrations are there to keep that slop from breaking everything or causing data loss.
True, but in my experience a few major pitfalls came up:
1. We ran into really bad minefields when we tried to come back to manually edit the generated tests later on. Claude tended to mock everything because it didn’t have context about how we run services, build environments, etc.
2. And this was the worst: all of the devs on the team, including me, got really lazy with testing. Bugs in production increased significantly.
Did you try putting all of that (complex and external) context into the context (claude.md or whatever), with instructions on how to do proper TDD, before asking for the tests? I know that may be more work than just coding it yourself, since you know it all by heart and the external world is always bigger than the internal one. But in the long term, and for teams/codebases without good TDD practices, it might end up producing useful test iterations.
Of course, the developer committing the code is responsible for it either way, so what I would ban is putting “AI did it” in the commits; it may mentally work as a “get out of jail” card for some.
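On the first point, I'm imagining something along these lines in CLAUDE.md, purely as a sketch (the commands and paths are made up):

    ### Testing
    - Never mock internal services; start real dependencies with `docker compose up -d test-deps` first.
    - Use the shared database fixture in `tests/fixtures/db.py`; do not invent in-memory fakes.
    - Follow TDD: write the failing test, show it failing, then implement until it passes.
    - If a test needs context you don't have (build env, service topology), leave an `AIDEV-QUESTION:` comment instead of guessing.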
[flagged]
I’d say it’s around 40% me: the ideating, editing, citations, and images are all mine; the rest is Opus 4 :)
I typically try to also include the original Claude chat’s link in the post but it seems like Claude doesn’t allow sharing chats with deep research used in them.
See this series of posts, for example; I have included the link right at the beginning: https://diwank.space/juleps-vision-levels-of-intelligence-pt...
I completely get the critique and I already talked about it earlier: https://news.ycombinator.com/item?id=44213823
Update: here’s an older chatgpt conversation while preparing this: https://chatgpt.com/share/6844eaae-07d0-8001-a7f7-e532d63bf8...