Enterprise IT dinosaur here, seconding this perspective and the author’s. When I...

PunchyHamster · 2026-02-02T08:30:18 1770021018

Yeah, I use it to get some basic info about topics I know little about (as Google search is getting worse by the day...), which I then check.

Honestly, the absolute revolution for me would be if someone managed to make an LLM say "sorry, I don't know enough about the topic". One time I made a typo in a project name I wanted some info on, and it outright invented commands and usages out of thin air (they were also different from those of the project I was actually looking for, so it didn't just "correct the typo")...

xmcqdpt2 · 2026-02-02T11:58:51 1770033531

> Honestly, the absolute revolution for me would be if someone managed to make an LLM say "sorry, I don't know enough about the topic"

https://arxiv.org/abs/2509.04664

According to that OpenAI paper, models hallucinate in part because they are optimized on benchmarks that involve guessing. If you make a model that refuses to answer when unsure, you will not get SOTA performance on existing benchmarks and everyone will discount your work. If you create a new benchmark that penalizes guessing, everyone will think you are just creating benchmarks that advantage yourself.
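To make the incentive concrete, here is a minimal sketch of the arithmetic, assuming a simplified grading scheme that is my own illustration and not the paper's exact setup: a correct answer scores +1, an abstention scores 0, and a wrong answer costs `wrong_penalty` (a hypothetical parameter). With no penalty, guessing never scores worse than abstaining, so a model tuned for that benchmark should always guess.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering, given the chance of being right.

    Assumed scheme: +1 if correct, -wrong_penalty if wrong, 0 if abstaining.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if guessing beats the abstention score of 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining score the same."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # Binary grading (no penalty): even a 10% shot is worth taking.
    print(should_answer(0.10, wrong_penalty=0.0))
    # Penalized grading: the same 10% shot is now a losing bet.
    print(should_answer(0.10, wrong_penalty=1.0))
    # With a penalty of 3, answering only pays off above 75% confidence.
    print(break_even_confidence(3.0))
```

Under this toy scheme the break-even confidence rises with the penalty, which is the behavior a "refuse when unsure" benchmark would be trying to reward.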

snovv_crash · 2026-02-02T20:43:33 1770065013

That is such a cop-out. If there were a really good benchmark for getting rid of hallucinations, it would be included in every eval comparison graph.

The real reason is that every bench I've seen has Anthropic with lower hallucinations.

KellyCriterion · 2026-02-02T17:42:16 1770054136

...or they hallucinate because of floating point issues in parallel execution environments:

https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
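For anyone wondering what the floating point issue looks like: float addition is not associative, so the order in which a parallel reduction combines partial sums can change the result. A minimal sketch in plain Python, where the helper names and the chunking are hypothetical stand-ins for a real parallel reduction:

```python
def sequential_sum(values):
    """Add values left to right, like a single-threaded loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum fixed-size chunks first, then add the partial sums.

    This mimics (very loosely) a parallel reduction where each
    worker sums its own slice before the results are combined.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        partials.append(sequential_sum(values[start:start + chunk_size]))
    return sequential_sum(partials)


if __name__ == "__main__":
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c)  # 0.6000000000000001
    print(a + (b + c))  # 0.6

    # Same numbers, different grouping, different answer.
    values = [1e16, 1.0, -1e16, 1.0]
    print(sequential_sum(values))
    print(chunked_sum(values, chunk_size=2))
```

The explicit loops are deliberate: recent Python versions make the built-in `sum()` more accurate for floats, which would hide the effect.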

cbdevidal · 2026-02-02T12:15:18 1770034518

Holy perverse incentives, Batman

Incipient · 2026-02-02T05:02:55 1770008575

>LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.

This is essentially what I'm doing too but I expect in a different country. I'm finding it incredibly difficult to successfully speak to people. How are you making headway? I'm very curious how you're leveraging AI messaging to clients/prospective clients that doesn't just come across as "I farm out work to an AI and yolo".

Edit - if you don't mind sharing, of course.

stego-tech · 2026-02-02T21:18:05 1770067085

Oh, I should be more clear: I only use local models to help me accelerate non-production work. I never use them for communications, because A) I want to improve my skills there, and B) I don't want my first (or any) outreach to another human to be a bot.

Mostly it's been an excellent way to translate vocabulary between products or technologies for me. When I'm working on something new (e.g., HashiCorp Packer) and lack the specific vocabulary, I may query Qwen or Ministral with what I want to do ("Build a Windows 11 image that executes scripts after startup but before sysprep"), then use its output as a starting point for what I actually want to accomplish. I've also tinkered with it at home for writing API integrations or parsing JSON with RegEx for Home Assistant uses, and found it very useful in low-risk environments.

Thus far, they don't consistently spit out functional code. I still have to do a back-and-forth to troubleshoot the output and make it secure and functional within my environments, and that's fine - it's how I learn, after all. When it comes to, say, SQL (which I understand conceptually, but not necessarily specifically), it's a slightly bigger crutch until I can start running on my own two feet.

Still cheaper than a proper consultant or SME, though, and for most enterprise workloads that's good (and cheap) enough once I've sanity checked it with a colleague or in a local dev/sandbox environment.

judahmeek · 2026-02-02T06:25:08 1770013508

I interpreted his statement as LLMs being valuable for the actual marketing itself.

vages · 2026-02-02T07:25:07 1770017107

Which local AI do you use? I am local-curious, but don’t know which models to try, as people mention them by model name much less than their cloud counterparts.

stego-tech · 2026-02-02T21:22:13 1770067333

I'm frequently rotating and experimenting specifically because I don't want to be dependent upon a single model when everything changes week-to-week; focusing on foundations, not processes. Right now, I've got a Ministral 3 14B reasoning model and Qwen3 8B model on my MacBook Pro; I think my RTX 3090 rig uses a slightly larger parameter/less quantized Ministral model by default, and juggles old Gemini/OpenAI "open weights" models as they're released.

charcircuit · 2026-02-02T10:39:33 1770028773

I've had good luck having Claude Code use ssh with root to deploy code, change configurations, and debug bad configs / SELinux policy / etc. Debugging servers is not that different from debugging code. You just need to give it a way to test.

holoduke · 2026-02-02T06:21:46 1770013306

I let Claude configure and set up entire systems now. It requires some manual auditing and steering once in a while, but managing barebones servers without any management software has become pretty feasible and cheap. I managed to configure a cluster of 50+ Debian servers simultaneously with just ssh and Claude. Yes, it's cowboy 3.0. But so are our products/sites.

GrinningFool · 2026-02-02T15:20:28 1770045628

When you use phrases like "managed to configure" to describe your production systems, it does not inspire confidence in long-term sustainability of those systems.