I use AI to write specific types of unit tests that would be extremely tedious to write by hand but are easy to verify for correctness. That aside, it's pretty much useless to me: context windows are never big enough to cover anything that isn't a toy project, the costs build up fast, legacy projects have too many obscure, concurrently moving parts for the AI to understand correctly, or it simply takes more time to get the AI to generate something passable and double-check it than to do the work myself from the get-go. Usually it's some combination of these.
Rarely, I'm able to get the AI to generate function implementations for somewhat complex but self-contained tasks that I then copy-paste into the code base.
Interesting. I treat VS Code Copilot as a junior-ish pair programmer and get really good results for function implementations. Walking it through the plan in smaller steps, noting in advance that we'll build up to the end state (e.g. "first let's implement attribute x, then we'll add filtering for x later"), and explicitly using planning modes and prompts all let me go much faster, keep a good understanding of how the code works, and produce much higher-quality work (tests, documentation, commit messages).
I feel like if a prompt for a function implementation doesn't produce something reasonable, the task should be broken down further.
I don't know how others define "vibe-coding", but this feels like a lower-level approach. The times I've tried automating more and letting the models run longer, I haven't liked the results. I'm not interested in going more hands-free yet.
Can you set up automated integration/end-to-end tests and find a way to feed the results back to your AI agents before a human looks at them? Either via an MCP server or just a comment on the pull request, if the AI has access to PR comments. Not only is your lack of an integration testing pipeline slowing you down, it's also slowing your AI agents down.
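For the PR-comment route, the glue is small. Here's a minimal sketch, assuming a GitHub Actions run that exposes GITHUB_TOKEN and GITHUB_REPOSITORY, a workflow that exports a PR_NUMBER variable, and an e2e runner that wrote a plain-text summary to a path I've made up:

```ts
// post-test-results.ts — run after the e2e suite in CI (paths and env var names are assumptions)
import { readFile } from "node:fs/promises";
import { Octokit } from "@octokit/rest";

async function main() {
  // Hypothetical location where the e2e runner dumped its summary.
  const summary = await readFile("e2e-results/summary.txt", "utf8");

  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const [owner, repo] = process.env.GITHUB_REPOSITORY!.split("/");
  const prNumber = Number(process.env.PR_NUMBER); // exported by the workflow (assumed)

  // A PR comment is something both a human reviewer and a coding agent
  // with PR access can read on its next pass.
  await octokit.rest.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body: ["### E2E results", "```", summary, "```"].join("\n"),
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```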
"AFAICT, there’s no service that lets me"... Just make that service!
We do integration testing in a preview/staging env (and locally). We can also do it via docker compose with some GitHub workflow magic, and we used to, but that setup really slowed us down.
What I want is a remote dev env that comes up when I create a new agent and behaves just like local. I could build that service, but it isn't the priority right now (as much as I would enjoy building it; I personally love making dev tooling).
I generally vibe code with vim and my playlist in Cmus.
Man I was vim for life until cursor and the LLMs. For personal stuff I still do claude + vim because I love vim. I literally met my wife because I had a vim shirt on and she was an emacs user.
Your setup is interesting. I’ve had my mind on this space for a while now but haven’t done any deep work on a setup that optimizes the things I’m interested in.
At a fundamental level, I expect we can produce higher-quality software under budget, and I really liked how clearly you were thinking about cost-benefit in your setup. I've encountered far too many developers who just want to avoid as much cognitive work as possible, and too many junior and mid-level devs who are more interested in doing as they're told than in thinking about the problem for themselves. In my part of the world at least, junior and mid-level devs can indeed be replaced by a Claude Code Max subscription of around $200 per month, and you'd probably get more done in a week than with four such devs, who basically end up using an LLM to do work they might not even thoroughly explore.
So I've been thinking a lot about all the aspects of the software development lifecycle that could be improved with some sort of LLM.
## Requirements. How can we use LLMs not only to organize requirements but to break them down into executable units of work, sequenced in a way that makes sense? How do we go further and integrate an LLM into our software development processes, be it a sprint or whatever? In a lot of greenfield projects, after designing the core components of the system, we need to create tasks, group them, sequence them, and work out how to assign them and keep various boards or issue trackers updated. There is a lot of grunt work involved in this. I've seen people use MCP servers to automatically create tasks in some of these issue trackers based on a PDF of the requirements together with a design document.
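As a rough shape of that grunt-work automation, here's a sketch that feeds a requirements file to an LLM and turns the reply into tracker issues. The model name, prompt, file path, and repo are all placeholders, not a real tool:

```ts
// requirements-to-issues.ts — requirements text in, draft tracker issues out (sketch only).
import { readFile } from "node:fs/promises";
import OpenAI from "openai";
import { Octokit } from "@octokit/rest";

interface Task {
  title: string;
  description: string;
  dependsOn: string[]; // titles of tasks that should land first
}

async function main() {
  const requirements = await readFile("docs/requirements.md", "utf8"); // hypothetical path

  const llm = new OpenAI(); // reads OPENAI_API_KEY from the environment
  const completion = await llm.chat.completions.create({
    model: "gpt-4o", // placeholder model
    messages: [
      {
        role: "system",
        content:
          "Split the requirements into small, independently shippable tasks, sequenced sensibly. " +
          'Reply with a JSON object of the form {"tasks": [{"title", "description", "dependsOn"}]}.',
      },
      { role: "user", content: requirements },
    ],
    response_format: { type: "json_object" },
  });

  // The sequencing still deserves a human pass; this only drafts it.
  const tasks: Task[] = JSON.parse(completion.choices[0].message.content ?? "{}").tasks ?? [];

  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  for (const task of tasks) {
    await octokit.rest.issues.create({
      owner: "my-org", // placeholder
      repo: "my-repo", // placeholder
      title: task.title,
      body: `${task.description}\n\nDepends on: ${task.dependsOn.join(", ") || "nothing"}`,
    });
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```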
## Code Review - I effectively spend 40% of my time reviewing code written by other developers, and I mostly end up fixing the issues I consider "minor" myself, which is about 60% of the time. I could really spend less time reviewing with the help of an LLM code reviewer that simply does a "first pass", to at least give me an idea of where to spend more of my time, i.e. on the things that are more nuanced.
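The "first pass" could be as small as something like this: a sketch that pipes a diff to an LLM and asks for a triaged list. The model name and the severity buckets are just assumptions:

```ts
// review-first-pass.ts — pipe a diff in, get a triaged list out (sketch only).
// Assumed usage: git diff main...HEAD | npx tsx review-first-pass.ts
import OpenAI from "openai";

async function readStdin(): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  return Buffer.concat(chunks).toString("utf8");
}

async function main() {
  const diff = await readStdin();
  const llm = new OpenAI(); // reads OPENAI_API_KEY from the environment

  const completion = await llm.chat.completions.create({
    model: "gpt-4o", // placeholder model
    messages: [
      {
        role: "system",
        content:
          "You are doing a first-pass code review. For each problem, output one line: " +
          "[minor|major|needs-human] file:line — issue. Flag anything nuanced as needs-human.",
      },
      { role: "user", content: diff },
    ],
  });

  // The point is triage, not judgement: 'minor' items can be batch-fixed,
  // 'needs-human' items are where the reviewer actually spends time.
  console.log(completion.choices[0].message.content);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```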
## Software Design - This is tricky. Chatbots will probably lie to you if you are not a domain expert. They are mostly useful for diagnosing your designs and pointing out potential problems that someone else would have seen if they were also a domain expert in whatever you are building. We can also explore a lot of LLM-generated alternative approaches and improve on them.
## Bugfixes - This is probably a big win for LLMs. There used to be a platform where I could earn $30 or $50 fixing GitHub bugs; that work has now been almost entirely outsourced to LLMs, and losing revenue in that space was the clearest practical sign of their usefulness I've had. After a typical greenfield project has been worked on for about two months, bugs start creeping in. For apps that were properly architected, I expect these bugs to be fixable by following existing patterns in the codebase, whether that's replacing a custom implementation with a shared utility or simply using the design system's colors instead of a hardcoded one. In fact, for most bugs LLMs can probably get you about 50% of the way most of the time.
## Writing actual (PLUMBING) code - This is often not as much of a bottleneck as most would like to think, but it helps when developers don't have to do the grunt work of creating source files, following codebase conventions, writing boilerplate, and moving things around. This is an incredible use of LLMs that is hardly mentioned because it is not that "hot".
## Testing - In most of the projects we worked on at a consulting firm, writing tests, whether UI or API, was never part of the agreement because of the economics of most of our gigs, and the clients never really cared: all they wanted was working software. For a firm developing its own products, however, testing can be immensely valuable, especially when using LLMs. Tests provide guardrails that catch a model doing something it wasn't asked to do, and they can also be used to create and enforce system boundaries, especially in a pseudo type system like TypeScript, where JavaScript's escape hatches can be used as a loophole.
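For the boundary-enforcement part, even a dumb script in CI goes a long way. A minimal sketch, where the paths and the rules are examples rather than a real project's layout:

```ts
// boundary-check.ts — a cheap guardrail against TypeScript escape hatches (sketch).
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Walk a directory tree and collect .ts files.
function tsFiles(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    if (statSync(full).isDirectory()) return tsFiles(full);
    return full.endsWith(".ts") ? [full] : [];
  });
}

const violations: string[] = [];

for (const file of tsFiles("src")) {
  const text = readFileSync(file, "utf8");

  // Escape hatches that erase the type system.
  if (/\bas any\b|@ts-ignore|@ts-expect-error/.test(text)) {
    violations.push(`${file}: uses an escape hatch (as any / ts-ignore)`);
  }

  // Example boundary: only the API layer may import the database module.
  if (!file.startsWith(join("src", "api")) && text.includes('from "../db"')) {
    violations.push(`${file}: imports the db layer directly`);
  }
}

if (violations.length > 0) {
  console.error(violations.join("\n"));
  process.exit(1); // fail CI, which also fails whatever agent produced the change
}
console.log("boundary check passed");
```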
## DEVOPS. I remember there was a time when we used to manually invalidate CloudFront distributions after deploying our UI build to some S3 bucket. We've since added a pipeline stage to invalidate the distribution. But I expect there is a lot of grunt DevOps work that could really be delegated. Of course, this is a very scary use of LLMs, but I daresay we can find ways to use them safely.
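For reference, that kind of pipeline stage boils down to very little code. A sketch using the AWS SDK v3 CloudFront client (the env var name is made up):

```ts
// invalidate-cdn.ts — sketch of a post-deploy invalidation step.
import {
  CloudFrontClient,
  CreateInvalidationCommand,
} from "@aws-sdk/client-cloudfront";

async function main() {
  const client = new CloudFrontClient({});
  await client.send(
    new CreateInvalidationCommand({
      DistributionId: process.env.CLOUDFRONT_DISTRIBUTION_ID!, // hypothetical env var
      InvalidationBatch: {
        // CallerReference must be unique per invalidation; a timestamp is enough.
        CallerReference: `deploy-${Date.now()}`,
        Paths: { Quantity: 1, Items: ["/*"] },
      },
    })
  );
  console.log("CloudFront invalidation requested");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```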
## OBSERVABILITY - A lot of observability platforms already have features where LLMs review ingested error logs, diagnose the issue, create an issue on GitHub or Jira (or wherever), open a draft PR, review and test it in some container, iterate on a solution X times, notify someone to review, and so on and so forth. Some also attach a priority level and dispatch messages to the relevant developers or teams. LLMs in this loop simply supercharge the whole observability/instrumentation story for production applications.
But yeah, that is just my two cents. I don't have any answers yet; I just ponder this every now and then at a keyboard.