I've noticed a huge gap between AI use on greenfield projects and brownfield projects. On the first day of a greenfield project I can accomplish a week's worth of work. By the second day it's down to a few days' worth. By the end of the first week I'm getting a 20% productivity gain.
I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move as quickly as before.
The interesting bit is going to be whether we see AI being used to mature those small systems into big complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, particularly while still moving. I've not seen any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc along a well-defined set of rails.
A friendly counterpoint: my test of new models and agentic frameworks is to copy one of my half a zillion old open source git repos, usually for some very old project or experiment, and see how effective the new infra is at refactoring and cleaning up my ancient code. After testing I usually update the old projects, and I get the same warm fuzzy feeling as I do from spring cleaning my home.
I also like to generate greenfield codebases from scratch.
Enterprise IT dinosaur here, seconding this perspective and the author’s.
When I needed to bash out a quick HashiCorp Packer buildfile without prior experience beyond a bit of Vault and Terraform, local AI was a godsend at getting me 80% of the way there in seconds. I could read it, edit it, test it, and move much faster than Packer’s own thin “getting started” guide allowed. The net result was going from zero prior knowledge to a hardened OS image and a repeatable pipeline in under a week.
On the flip side, asking a chatbot about my GPOs? Or trusting it to change network firewalls and segmentation rules? Letting it run wild in the existing house of cards at the core of most enterprises? Absolutely hell no the fuck not. The longer something exists, the more likely a chatbot is to fuck it up by simple virtue of how they’re trained (pattern matching and prediction) versus how infrastructure ages (the older it is or the more often it changes, the less likely it is to be predictable), and I don’t see that changing with LLMs.
LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.
Yeah, I use it to get some basic info about topics I know little about (as Google search is getting worse by the day..). Then I check what it gives me.
Honestly the absolute revolution for me would be if someone managed to make an LLM say "sorry, I don't know enough about this topic". One time I made a typo in the name of a project I wanted some info on, and it outright invented commands and usages out of thin air (ones that were also different from the project I was looking for, so it hadn't "corrected the typo")...
>LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.
This is essentially what I'm doing too, though I expect in a different country. I'm finding it incredibly difficult to successfully speak to people. How are you making headway? I'm very curious how you're framing AI in your messaging to clients/prospective clients so that it doesn't just come across as "I farm out work to an AI and yolo".
Which local AI do you use? I am local-curious, but don’t know which models to try, as people mention them by model name much less than their cloud counterparts.
I let Claude configure and set up entire systems now. It requires some manual auditing and steering once in a while, but managing bare-bones servers without any management software has become pretty feasible and cheap. I managed to configure a 50+ Debian server cluster simultaneously with just ssh and Claude. Yes, it's cowboy 3.0. But so are our products/sites.
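For a sense of what "just ssh" meant at that scale, here's a minimal sketch of the fan-out (hosts, user, and command are made up, and in practice Claude was driving the ssh sessions interactively rather than a script like this):

    # Hypothetical sketch: run one idempotent command across a fleet over ssh.
    # Assumes key-based auth is already in place; hosts/user/command are invented.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    HOSTS = [f"deb-{i:02d}.example.internal" for i in range(1, 51)]
    COMMAND = "sudo apt-get update && sudo apt-get -y upgrade"

    def run(host: str) -> tuple[str, int]:
        # BatchMode avoids hanging on a password prompt if a key is missing.
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", f"admin@{host}", COMMAND],
            capture_output=True, text=True, timeout=600,
        )
        return host, proc.returncode

    with ThreadPoolExecutor(max_workers=10) as pool:
        for host, code in pool.map(run, HOSTS):
            print(f"{host}: exit {code}")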
Isn't this true of any greenfield project, with or without generative models? The first few days are amazingly productive, and then features and fixes get slower and slower. And you get to see how good an engineer you really are, as your initial architecture starts straining under the demands of changing real-world requirements and you hope it holds together long enough to ship something.
"I could make that in a weekend"
"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"
That is a good point and true to some extent. But IME with AI, both the initial speedup and the eventual slowdown are accelerated vs. a human.
I've been thinking that one reason is that while AI coding generates code far faster (on a greenfield project I estimate about 50x), it also generates tech-debt at a hyperastonishing rate.
It used to be that tech debt started to catch up with teams after a few years, but with AI-coded software it's only a few months in that tech debt becomes so massive it slows progress down.
I also find that I can keep the tech debt in check by using the bot only as a junior engineer: I specify the architecture and the design precisely, down to object and function definitions, and only let the bot write one function at a time (roughly as sketched below).
That is much slower, but also much more sustainable. I'd estimate my productivity gains are "only" 2x to 3x (instead of ~50x) but tech debt accumulates no faster than a purely human-coded project.
This is based on various projects that are only about a year in, so time will tell how it evolves longer term.
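To make "junior engineer mode" concrete, the hand-off looks roughly like the following (module and names invented for the example). I write the types, signature, and rules; the bot only fills in the marked body, which I review like a junior's PR.

    # Hypothetical scaffold: the architecture, types, and contracts are mine;
    # the bot is asked to implement one function body at a time against them.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Invoice:
        customer_id: str
        amount_cents: int
        currency: str

    def validate_invoice(invoice: Invoice) -> list[str]:
        """Return human-readable validation errors (empty list if valid).

        Rules: amount_cents must be positive, currency must be a 3-letter
        alphabetic code, customer_id must be non-empty.
        """
        # --- the only part the bot writes ---
        errors: list[str] = []
        if invoice.amount_cents <= 0:
            errors.append("amount must be positive")
        if len(invoice.currency) != 3 or not invoice.currency.isalpha():
            errors.append("currency must be a 3-letter code")
        if not invoice.customer_id:
            errors.append("customer_id must be non-empty")
        return errors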
In your experience, can you take the tech-debt-riddled code and ask Claude to come up with an entirely new version that fixes the tech debt/design issues you've identified? Presumably there's a set of tests that you'd keep the same, but you could leverage the power of AI in greenfield scenarios to just do a rewrite (while letting it see the old code). I don't know how well this would work; I haven't got to the heavy-tech-debt stage in any of my projects as I do mostly prototyping. I'd be interested in others' thoughts.
Perhaps it depends on the nature of the tech debt. A lot of the software we create has consequences beyond a particular codebase.
Published APIs cannot be changed without causing friction on the client's end, which may not be under our control. Even when the API is properly versioned, users will be unhappy if they are asked to adopt a completely changed version of the API on a regular basis.
Data that was created according to a previous version of the data model continues to exist in various places and may not be easy to migrate.
User interfaces cannot be radically changed too frequently without confusing the hell out of human users.
> ask claude to come up with an entirely new version that fixes the tech debt/design issues you've identified?
I haven't tried that yet, so not sure.
Once upon a time I was at a company where the PRD specified that the product needed a toggle to enable a certain feature temporarily. Engineering implemented it literally, and it worked perfectly. But it was also vital to be able to disable the feature, which should've been obvious to anyone. Since the PRD didn't mention that, it was not implemented.
In that case, it was done as a protest. But AI is kind of like that, although out of sheer dumbness.
The story is meant to say that with AI it is imperative to be extremely prescriptive about everything, or things will go haywire. So a full rewrite will probably work well only if you manage to have very tight test coverage for absolutely everything. Which is pretty hard.
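To make "tests as the contract" concrete, this is roughly the kind of behaviour-level suite I'd want pinned down before letting it rewrite anything. Everything here is invented for the example, and the tiny stand-in implementation is inlined only so the sketch is self-contained; in a real project the tests would import the production code and survive the rewrite unchanged.

    # Hypothetical sketch: tests that only exercise observable behaviour, so a
    # full rewrite can replace the implementation and the suite stays meaningful.
    import pytest

    def quote_total(items):
        """Stand-in for the function under test; a rewrite would replace this."""
        if any(item["qty"] <= 0 for item in items):
            raise ValueError("quantity must be positive")
        return sum(item["unit_price"] * item["qty"] for item in items)

    def test_empty_cart_costs_nothing():
        assert quote_total([]) == 0

    def test_quantity_is_respected():
        assert quote_total([{"unit_price": 250, "qty": 3}]) == 750

    @pytest.mark.parametrize("bad_qty", [-1, 0])
    def test_non_positive_quantities_are_rejected(bad_qty):
        with pytest.raises(ValueError):
            quote_total([{"unit_price": 250, "qty": bad_qty}])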
Sometimes the start of a greenfield project has a lot of questions along the lines of "what graph plotting library are we going to use? we don't want two competing libraries in the same codebase so we should check it meets all our future needs"
LLMs can select a library and produce a basic implementation while a human is still reading reddit posts arguing about the distinction between 'graphs' and 'charts'.
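For what it's worth, the "basic implementation" it hands back usually looks something like this (assuming it happened to pick matplotlib; the data and labels are placeholders, not from any real project):

    # Hypothetical first-pass chart an LLM might produce after choosing matplotlib.
    import matplotlib.pyplot as plt

    def plot_weekly_signups(weeks: list[str], signups: list[int],
                            out_path: str = "signups.png") -> None:
        fig, ax = plt.subplots(figsize=(8, 4))
        ax.plot(weeks, signups, marker="o")
        ax.set_xlabel("Week")
        ax.set_ylabel("Signups")
        ax.set_title("Weekly signups")
        fig.tight_layout()
        fig.savefig(out_path)
        plt.close(fig)

    plot_weekly_signups(["W1", "W2", "W3", "W4"], [120, 180, 175, 240])

Whether that's actually the right library for the codebase is still the part that needs a human opinion.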
All of this speed-running hits a wall at the context window. As long as the project fits into 200k tokens, you're flying. The moment it outgrows that, productivity doesn't drop by 20% - it drops to zero.
You start spending hours explaining to the agent what you changed in another file that it has already forgotten. Large organizations win in the long run precisely because they rely on processes that don't depend on the memory of a single brain - even an electronic one.
I call it the day50 problem; I coined that about a year ago and have been building tools to address it since. I quit the day job 7 months ago and have been doing it full time since then.
Essentially there's a delta between what the human does and the computer produces. In a classic compiler setting this is a known, stable quantity throughout the life-cycle of development.
However, in the world of AI coding this distance increases.
There are various barriers with labels like "code debt" that the line can cross. There are three mitigations right now: start the lines closer together (a PRD is the current en vogue method), push out the frontier of how many shits someone gives (the TDD agent method), or try to bend the curve so it doesn't fly out so much (the coworker/colleague method).
Unfortunately I'm just a one-man show, so the fact that I was ahead and have working models to explain this brings no rewards because, you know, good software is hard...
I've explained this in person at SF events so many times (probably 40-50) that someone reading this might have actually heard it from me...
It seems to be fantastic up to about 5k loc and then it starts to need a lot more guidance, careful supervision, skepticism, and aggressive context management. If you’re careful, it only goes completely off the rails once in a while and the damage is only a lost hour or two.
Overall it's still a 4x productivity gain, so I'm not complaining for $20 a month. It's especially good at managing the complicated aspects of C, so I can focus on the bigger picture rather than the symbol contortions.
Yes, I see the same thing. My working thesis is that if I keep the codebase modular with clear separations, I can hold the entire context myself while Claude Code only needs to focus on one module at a time, and I can keep up the speed and quality. But if I give it tasks that cover the entire codebase it will have issues, no matter how I manage context and give directions. And again, this is not surprising; humans do the same, they need to break the task apart into smaller pieces. Have you found the same?
Yeah, my observation is that for my usual work I can maybe get a 20% productivity boost, probably closer to 10% tbh, and for the team's overall productivity it feels like it has done nothing, as seniors use their small productivity gains to fix the tons of issues in PRs (or in prod when we miss something).
But last week I had two days with no real work to do, so I created CLI tools to help with organisation and cleaning up, and there I think AI boosted my productivity by at least 200%, if not 500%.
Yup. My biggest issue with designing software is usually designing the system architecture/infra. I am very opposed to just shoving everything into AWS and calling it a day: you don't learn anything from that, cloud performance stinks for many things, and I don't want to get a random $30k bill because I accidentally left some instance of something running.
AI sucks at determining what kind of infrastructure would be great for scenario X, because the cloud is the go-to solution for the lazy dev. I tried to get it to recommend a way to self-host stuff, but what it comes up with is just a general security hazard.
Similar experience. I love using Gemini to set up my home server, it can debug issues and generate simple docker compose files faster than I could have done myself. But at work on the 10 year old Rails app, I find it so much easier to just write all the code myself than to work out what prompt would work and then review/modify the results.
This makes me think about how AI turns SW development upside down. In traditional development we write code, which is the answer to our problems. With AI we write questions and get the answers. Neither is easy: finding the correct questions can be a lot of work, whereas if you have some existing code you already have the answers, but you may not have the questions (= "specs") written down anywhere, at least not very well, typically.
I’ve done this in pure Python for a long time. Single file prototype that can mostly function from the command line. The process helps me understand all the sub problems and how they relate to each other. Best example is when you realize behaviors X, Y, and Z have so much in common that it makes sense to have a single component that takes a parameter to specify which behavior to perform.
It’s possible that already practicing this is why I feel slightly “meh” compared to others regarding GenAI.
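A toy version of that consolidation, with everything invented for the example: three behaviours that turned out to share a shape, collapsed into one component selected by a parameter, in a single file driven from the command line.

    # Hypothetical single-file prototype: one component parameterised by the
    # behaviour that used to be X/Y/Z, reading JSON lines from stdin.
    import argparse
    import json
    import sys

    def transform(records: list[dict], mode: str) -> list[dict]:
        if mode == "dedupe":
            seen, out = set(), []
            for r in records:
                key = json.dumps(r, sort_keys=True)
                if key not in seen:
                    seen.add(key)
                    out.append(r)
            return out
        if mode == "sort":
            return sorted(records, key=lambda r: json.dumps(r, sort_keys=True))
        if mode == "reverse":
            return list(reversed(records))
        raise ValueError(f"unknown mode: {mode}")

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="single-file prototype")
        parser.add_argument("mode", choices=["dedupe", "sort", "reverse"])
        args = parser.parse_args()
        records = [json.loads(line) for line in sys.stdin if line.strip()]
        print(json.dumps(transform(records, args.mode), indent=2))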
I find that setting up proper structure while everything still fits in a single Claude Code context window, as well as splitting as much as possible into libraries, works pretty well for staving off that moment.
I have experienced much of the opposite. With an established codebase to copy patterns from, AI can generate code that needs a lot less iteration to clean up than on greenfield projects.
That's a fair observation, there's probably a sweet spot. The difference I've found is that I can reliably keep the model on track with patterns through prompting and documentation if the code doesn't have existing examples, whereas I can't document every single nuance of a big codebase and why it matters.
My observations match this. I can get fresh things done very quickly, but when I start getting into the weeds I eventually get too frustrated with babysitting the LLM to keep using it.