This is great and all but, who can actually afford to let these agents run on tasks all day long? Is anyone here actually using this or are these rollouts aimed at large companies?
I'm burning through so many tokens on Cursor that I've had to upgrade to Ultra recently, and I'm convinced they're tweaking the burn rate behind the scenes; the usage allowance doesn't seem proportional.
Thank god the open source/local LLM world isn't far behind.
Real numbers from today. FastAPI codebase, ~50k LOC. 4 agents, 6 tasks, ~6 min wall clock vs ~18-20 min sequential. 24 tests, 0 file conflicts.
Token cost: roughly 4x a single session.
To your cost question — agent teams are sprinters, not marathon runners. You use them for a 6-minute burst of parallel work, not all day. A 6-minute burst at 4x cost is still cheaper than 20 minutes at 1x if your time matters more than tokens.
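To make that tradeoff concrete, here's a quick back-of-the-envelope sketch. The 4x token multiple and the 6-vs-20-minute wall-clock numbers are the ones above; the dollar figures for tokens and developer time are made-up placeholders.

```python
# Break-even sketch: does a 4x-token parallel burst beat a sequential run
# once you price in the developer's waiting time? All dollar amounts are
# illustrative placeholders, not real pricing.

def total_cost(wall_minutes, token_cost_usd, dev_rate_usd_per_hour):
    """Cost of a run = tokens spent + developer time spent waiting."""
    return token_cost_usd + dev_rate_usd_per_hour * wall_minutes / 60

# Sequential: 20 min at 1x tokens; parallel: 6 min at 4x tokens.
sequential = total_cost(20, token_cost_usd=1.00, dev_rate_usd_per_hour=100)
parallel   = total_cost(6,  token_cost_usd=4.00, dev_rate_usd_per_hour=100)

print(f"sequential ~${sequential:.2f}, parallel ~${parallel:.2f}")
```

At a $100/hr rate the time saved dwarfs the extra token spend; the conclusion flips only when tokens are expensive relative to your hourly value.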
The constraint nobody mentions: tasks must be file-disjoint. Two agents editing the same file means overwrites. Plan decomposition matters more than the agents themselves.
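A cheap pre-flight check makes that decomposition explicit. This is a sketch (the task names and file paths are invented) that verifies the per-agent file sets are pairwise disjoint before launching anything:

```python
# Hypothetical pre-flight check: confirm the file sets assigned to each
# task are pairwise disjoint, so no two agents can write the same file.
# Task names and paths are illustrative.
from itertools import combinations

def find_conflicts(assignments):
    """Return every pair of tasks whose file sets overlap."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(assignments.items(), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((a, b, shared))
    return conflicts

tasks = {
    "add-auth":    {"app/auth.py", "app/models.py"},
    "add-billing": {"app/billing.py"},
    "refactor-db": {"app/models.py", "app/db.py"},  # overlaps with add-auth
}

for a, b, shared in find_conflicts(tasks):
    print(f"conflict: {a} and {b} both touch {sorted(shared)}")
```

If this prints anything, split the offending files into one task before spinning up the team.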
One thing to watch: Claude Code crashed mid-session with a React reconciler error (#23555). 4 agents + MCP servers pushes the UI past its limits.
Need it be actually disjoint? Interested in learning about the limitation here because apparently the agents can coordinate.
Otherwise what’s the difference between what they are providing vs me creating two independent pull requests using agents and having an agent resolve merge conflicts?
It does need to be disjoint. The docs (https://code.claude.com/docs/en/agent-teams) are explicit:
"Two teammates editing the same file leads to overwrites. Break the work so each teammate owns a different set of files."
File locking is for task claiming (preventing two agents from grabbing the same task), not for file writes:
"Task claiming uses file locking to prevent race conditions when multiple teammates try to claim the same task simultaneously."
The coordination layer (TaskList, blockedBy, SendMessage) handles logical task sequencing, not concurrent file access. You can make agent B wait for agent A via dependencies, but that serializes the work and kills the parallelism benefit.
"To prevent two agents from trying to solve the same problem at the same time, the harness uses a simple synchronization algorithm:
Claude takes a "lock" on a task by writing a text file to current_tasks/ (e.g., one agent might lock current_tasks/parse_if_statement.txt, while another locks current_tasks/codegen_function_definition.txt). If two agents try to claim the same task, git's synchronization forces the second agent to pick a different one.
Claude works on the task, then pulls from upstream, merges changes from other agents, pushes its changes, and removes the lock. Merge conflicts are frequent, but Claude is smart enough to figure that out."
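The pattern in that quote is just atomic lock-file creation. A minimal sketch, assuming a shared filesystem (in the quoted setup, git push/pull is the actual synchronization point; the current_tasks/ directory and task name come from the quote):

```python
# Sketch of lock-file task claiming: creating the lock file is atomic, so
# if two agents race for the same task, exactly one succeeds and the other
# must pick a different task.
import os

def try_claim(task, lock_dir="current_tasks"):
    """Atomically create a lock file; return False if another agent holds it."""
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, f"{task}.txt")
    try:
        # O_CREAT | O_EXCL fails if the file already exists -> atomic claim.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def release(task, lock_dir="current_tasks"):
    """Remove the lock file so the task can be claimed again."""
    os.remove(os.path.join(lock_dir, f"{task}.txt"))
```

Note this protects the task queue, not the files the task touches, which is why the disjointness requirement above still stands.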
A Claude max 20x plan and you’ll be fine. I’d been doing my normal process of running 4 Claude sessions in parallel because that was about the right amount of concurrent sessions for me to watch what’s going on and approve/deny plans and code… and this blows it out of the water. With an agent swarm it’s so fast at executing and testing I’m limited by my idea and review capabilities now. I tried running 2 and I can’t keep up, I’m defining specs and the other window is done, tested, validated and waiting for me.
If it could do anything that a junior dev could, that’d be a valid point of comparison. But it continually, wildly performs slower and falls short every time I’ve tried.
Trying to make a media player and media server, using ffmpeg and a pre-built media streaming engine as its core. Python and SQLite. About a week's worth of effort every time, until it begins to go too far off the rails to be reliable enough to keep developing with. It never did get the ffmpeg commands right (I had to go back to crafting those by hand), and it never did get the streaming engine to play in the browser's video player in the supported HLS and DASH formats. I asked it to build a file and file-metadata caching layer and then had to keep re-prompting it to poll the caching layer before trying to get values from the database. It never even got to the library, metadata, or library-image functionality. And I had to ask it to create the RBAC permissions model I wanted, despite it being very junior-level common sense (super-admin, user-admin, metadata admin, image admin).
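For reference, the hand-crafted commands look roughly like this: a minimal HLS packaging invocation built as an argument list. The flags are standard ffmpeg options, but the codec choices, segment length, and paths are placeholders to adapt, not a drop-in fix.

```python
# Sketch of a minimal ffmpeg HLS packaging command. Flags are real ffmpeg
# options; source path, output dir, and tuning values are placeholders.
def hls_command(src, out_dir):
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-c:a", "aac",      # browser-friendly codecs
        "-f", "hls",
        "-hls_time", "6",                      # ~6-second segments
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%03d.ts",
        f"{out_dir}/index.m3u8",
    ]
```

This is the sort of thing that's easy to verify by hand and easy for a model to get subtly wrong (wrong encoder name, missing segment filename pattern, playlist in the wrong place).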
I recently built something in the same universe: using ffmpeg to receive streams from OBS to capture audio and video. I don't want to get into details except to say it involved a fairly involved pipeline of Ray actors and a significant admin interface with NiceGUI. I had no problem doing this with Claude. You need to give it access to look up how to do things, like Context7. If you are doing something very specific, you need to have a session that does research to build a skill so it doesn't need to redo that research every time. And yes, you do need to tell it the architecture and be fairly detailed about things like how you want RBAC.
Using these tools takes quite a bit of effort, but even after doing all those steps to use the tool well, I still got this project done in a few days when it otherwise would have taken me 1-2 months and likely would simply never have happened at all.
I'm curious which harness and which model(s) you've been using.
And whether you have a decent PRD or spec. Are you trying to prompt the harness with one bit at a time, or did you give it a complete spec and ask it to analyze it and break it down into individual issues with dependencies (e.g. using beads and beads_viewer)?
I'm not looking for reasons to criticize your approach or question your experience, but your answers may point to opportunities for you to get more out of these tools.
> A. You're working on some really deep thing that only world-class experts can do, like optimizing graphics engines for AAA games.
This is a relatively common skill. One thing I always notice about the video game industry is it's much more globally distributed than the rest of the software industry.
Being bad at writing software is Japan's whole thing but they still make optimized video games.
It’s a simple compiler optimization over bayesian statistics. It’s masters-level stuff at best, given that I’m on it instead of some expert. The codebase is mixed python and rust, neither of which are uncommon.
The issues I ran into are primarily “tail-chasing” ones - it gets into some attractor that doesn’t suit the test case and fails to find its way out. I re-benchmark every few months, but so far none of the frontier models have been able to make changes that have solved the issue without bloating the codebase and failing the perf tests.
It’s fine for some boilerplate dedup or spinning up some web api or whatever, but it’s still not suitable for serious work.
When really solid programmers who started skeptical (and even have a ban policy if PR submitters don’t disclose they used AI) now show how their workflows have been improved by AI agents, it may be worth trying to understand what they are doing and you are not.
Claude would be worse than an expert at this, but this is a benchmarkable task. Claude can do experiments a lot quicker than a human can. The hard part would be ensuring that the results aren't just gaming your benchmark.
Companies are not comparing it straight to juniors. They're more making a comparison between a Senior with the assistance of one or more juniors, vs a Senior with the assistance of AI Agents.
I feel like comparing it just to a junior developer is also becoming fairly outdated. Yes, it is worse in some ways, but also VASTLY superior in others.
It’s funny so many companies making people RTO and spending all this money on offices to get “hallway” moments of innovation, while emptying those offices of the people most likely to have a new perspective.
I guarantee you that price will double by 2027. Then it’ll be a new car payment!
I’m really not saying this to be snarky, I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.
I pay less for Autocad products!
This whole product release is about maximizing your bill, not maximizing your productivity.
I don’t need agents to talk to each other. I need one agent to do the job right.
$200/month is peanuts when you are a business paying your employees $200k/year. I think LLMs make me at least 10% more effective and therefore the cost to my employer is very worth it. Lots of trades have much more expensive tools (including cars).
I think it depends on the tasks you use it for. Bootstrapping or translating projects between languages is amazing. New feature development? Questionable.
I don’t write frontend stuff, but sometimes need to fix a frontend bug.
Yesterday I fed claude very surgical instructions on how the bug happens, and what I want to happen instead, and it oneshot the fix. I had a solution in about 5 minutes, whereas it would have taken me at least an hour, but most likely more time to get to that point.
Literally an hour or two of my day was saved yesterday. I am salaried at around $250/hour, so in that one interaction AI saved my employer $250-500 in wages.
AI allows me to be a T shaped developer, I have over a decade of deep experience in infrastructure, but know fuck all about front end stuff. But having access to AI allows me as an individual who generally knows how computers work to fix a simple problem which is not in my domain.
Maybe this is a gray area, but that's kind of my experience with it too. I understand what I want to happen, but don't understand the language and it produces a language specific result that is close enough, maybe even one-shot, for me to use. I categorize this under translation.
My process, which probably wouldn't work with concurrent agents because I'm keeping an eye on it, is basically:
- "Read these files and write some documentation on how they work - put the documentation in the docs folder" (putting relevant files into the context and giving it something to refer to later on)
- "We need to make change X, give me some options on how to do it" (making it plan based on that context)
- "I like option 2 - but we also need to take account of Y - look at these other files and give me some more options" (make sure it hasn't missed anything important)
- "Revised option 4 is great - write a detailed to-do list in the docs/tasks folder" (I choose the actual design, instead of blindly accepting what it proposes)
- I read the to-do list and get it rewritten if there's anything I'm not happy with
- I clear the context window
- "Read the document in the docs folder and then this to-do list in the docs/tasks folder - then start on phase 1"
- I watch what it's doing and stop if it goes off on one (rare, because the context window should be almost empty)
- Once done, I give the git diffs a quick review - mainly the tests to make sure it's checking the right things
- Then I give it feedback and ask it to fix the bits I'm not happy with
- Finally commit, clear context and repeat until all phases are done
Most of the time this works really well.
Yesterday I gave it a deep task that touched many aspects of the app. This was a Rails app with a comprehensive test suite, so it had lots of example code to read, plus it could give itself definite end points (they often don't know when to stop). I estimated it would take me 3-4 days to complete the feature by hand. It made a right mess of the UI, but it completed the task in about 6 hours, and I spent another 2 hours tidying it up and making it consistent with the visuals elsewhere (the logic and back-end code were fine).
So either my original estimate is way off, or it has saved me a good amount of time there.
New feature development in web and mobile apps is absolutely 10% more productive with these tools, and anyone who says otherwise is coping. That's a large fraction of software development.
Yes, the research is wrong. And in science, it's not taboo to call that out.
It's outdated, and it doesn't differentiate between people trying to incorporate it into their current workflow and people who apply themselves to entirely new ones. It doesn't represent me in any way, and I am releasing features to my platform daily now, instead of weekly. So I can wholeheartedly disagree with its conclusion.
The earth is either flat or it isn't. It's easy to prove it's not flat. It's not easy to conclude that the results of a study in a field that changes daily represent all the people working in it, including the ones who did not participate.
If it is so self-evident that the research is wrong, that means there should be some research that supports the opposite conclusion then? Maybe you can link it?
The reason we don’t see any other research is because it’s neigh impossible to study a moving field. Especially at this pace.
If you have any ideas on how to measure objectively while this landscape changes daily, please share them with us. Maybe a researcher will jump on this bandwagon and prove you right.
I proposed a logically consistent perspective where both my experience and the study are true at the same time. What is your response to that, other than comparing me to a flat-earther? Do you have something useful to contribute?
Honestly, that is a “skill issue” as the kids these days say. When used properly and with skill, agents can increase your productivity. Like any tool, use it wrong and your life will be worse off. The logically consistent view if you want to believe this study and my experience is that the average person is hindered by using AI because they do not have the skills, but there are people out there who gain a net benefit.
It drives me nuts that people take the mean of AI code generation results and use that to make claims about what AI code generation is capable of. It's like using the mean basketball player to argue that people like LeBron and Jordan don't exist.
For sure. I like having discussions with nuanced takes, these are tools with strengths and weaknesses and being a good tool user includes knowing when not to pick it up.
It’s a skill issue, which means you can’t fire any of your highly skilled employees, which means it has the same value as any other business organization tool like Jira or Microsoft Excel, approximately $10-20 per user per month.
Autodesk Fusion for manufacturing costs less than Claude Max and you literally can’t do your job without it.
So Autodesk takes you from 0 to 100% productivity for under $200 a month and companies are expected to pay $200+ to gain an extra 10-20%?
That math isn’t how it works with any other business logic tools.
I pay $200/month, don’t come near the limits (yet), and if they raised the price to $1000/month for the exact same product I’d gladly pay it this afternoon (Don’t quote me on this Anthropic!)
If you’re not able to get US$thousands out of these models right now either your expectations are too high or your usage is too low, but as a small business owner and part/most-time SWE, the pricing is a rounding error on value delivered.
As a business expense to make profit, I can understand being ok with this price point.
But as an individual with no profit motive, no way.
I use these products at work, but not as much personally because of the bill. And even if I decided I wanted to pursue a for-profit side project, I'd have to validate its viability before even considering a $200 monthly subscription.
I'm paying $100 per month even though I don't write code professionally. It is purely personal use. I've used the subscription to have Claude create a bunch of custom apps that I use in my daily life.
This did require some amount of effort on my part, to test and iterate and so on, but much less than if I needed to write all the code myself. And, because these programs are for personal use, I don't need to review all the code, I don't have security concerns and so on.
$100 every month for a service that writes me custom applications... I don't know, maybe I'm being stupid with my money, but at the moment it feels well worth the price.
With US salaries for SWEs, $1,000/month is not a rounding error for all, but it definitely is for some. Say you make $100/hr and CC saves you, say, 30 hrs/month: not a rounding error, but a no-brainer. If you make $200+/hr, it starts to become a rounding error. I have multiple Max accounts at my disposal and at this point would for sure pay $1,000/month for the Max plan. It comes down to simple math.
1. 1-3 LLM vendors are substantially higher quality than the other vendors, and none of those are open source. This is an oligopoly, and the scenario you described will play out.
2. >3 LLM vendors are all high quality and suitable for the tasks. At least one of these is open source. This is the "commodity" scenario, and we'll end up paying roughly the cost of inference. This still might be hundreds per month, though.
3. Somewhere in between. We've got >3 vendors, but 1-3 of them are somewhat better than the others, so the leaders can charge more. But not as much more than they can in scenario #1.
It's clear what's gonna play out. Chinese open-source labs are slowly closing the gap, and as American frontier labs hit diminishing returns on various tasks, the Chinese models are going to be good enough for the vast majority of use cases. This is going to strip American labs' ability to do monopoly plays and force them into open behavior.
The only place frontier labs will be able to profit-take is niche models for specific purposes, where they can tightly control who has access to traces. Any general-purpose LLM with highly available traces is gonna get distilled down instantly.
> I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.
Traditional SaaS products don't write code for me. They also cost much less to run.
I'm having a lot of trouble seeing this as enshittification. I'm not saying it won't happen some day, but I don't think we're there. $200 per month is a lot, but it depends on what you're getting. In this case, I'm getting a service that writes code for me on demand.
We can see, especially in the case of Claude Max, that while it sounds like you're getting better value than the cheaper plans, the company is now encouraging less efficient use of the tool (having multiple agents talk to each other, rather than improving models so that one agent does the work correctly).
> Traditional SaaS products literally “write code” for you (they implement business logic). See: Zapier, Excel.
Eh, I'd call those a sort of programming language. The user is still writing code, albeit in a "friendlier" manner. You can't just ask for what you want in English.
> The enshittification is that the costs are going up faster than inflation and companies like OpenAI are talking about adding advertisements.
In 1980, IT would have cost $0 at most companies. It's okay for costs to go up if you're getting a service you were not getting before.
In 1980, the costs associated with what we today call IT were not $0, they were just spread around in administrative clerical duties performed by a lot of humans.
Okay, but I think the analogy still works with that framing. These AI products can do tasks that would previously have been performed by a larger number of humans.
I could write an essay about how almost everything you wrote either is extremely incorrect or is extremely likely to be incorrect. I am too lazy to, though, so I will just have to wait for another commenter to do the equivalent.
Because, while I have been a huge AI optimist for decades, I generally don't like their current writing output. And even if I did, it would feel like plagiarism unless I prepended it with "an AI responded with this:", which would make me seem lazy. (Though I did already just admit I am very lazy in my first post, so perhaps that is what I will do going forward once they become better writers.)
I mean, what you get with Claude Code Max is insane: something like 30x the equivalent API token price. If you don't spend it all, that's your own fault. That must be below electricity cost.