One of the perennial problems I encountered with on-call situations was that at some point everyone knew a production incident was going on, and people trying to help (or learn by following along) would run the same diagnostics the on-point people were running, exhausting the very resources needed to diagnose the problem.
Splunk was a particular problem that way, but I also started seeing it with Grafana, at least in extremis, once we migrated from a vendor to self-hosted on AWS. Most of the time it was fine, but if we had a bug that none of the teams could quickly disavow as theirs, we had a lot of chefs in the kitchen and things would start to hiccup.
There can be thundering herds in dev, and a bunch of people trying a repro case in a thirty-second window can be one of them. The question is whether anyone has the spare bandwidth to notice it's happening, or whether everyone trudges along making the same mistakes every time.
When Covid hit I wasn't the only one working remotely at my company, but I was the only one working remotely in North America, and apparently the only one trying to Work Smarter. By then I had implemented a handful of feature toggles that I quickly set to always-on in development; chief among them was gzipping service calls, which was a net loss inside AWS but very, very handy while working from home.
I had also switched a head-of-line service call that, for reasons I never sorted out, was costing us 30ms of TTFB per request for basically fifty bytes of data, over to a long poll against Consul, since the data was only meant to change at most once every half hour and in practice changed twice a week. That hid the latency in the dev sandbox except at startup, where we already had several Consul keys being fetched in parallel and applied in order, so one more was hardly noticeable.
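A minimal sketch of the long-poll pattern (not the original code) using Consul's blocking-query API; the agent address and key name here are assumptions:

    import requests

    CONSUL = "http://127.0.0.1:8500"   # assumed local Consul agent
    KEY = "config/routing"             # hypothetical key name

    def watch(index=0):
        while True:
            # Blocking query: Consul holds the request open until the key
            # changes or the wait window expires, then responds.
            resp = requests.get(
                f"{CONSUL}/v1/kv/{KEY}",
                params={"index": index, "wait": "5m"},
                timeout=330,  # a bit longer than the 5-minute wait window
            )
            resp.raise_for_status()
            new_index = int(resp.headers["X-Consul-Index"])
            if new_index != index:
                index = new_index
                yield resp.json()[0]["Value"]  # base64-encoded payload

On the hot path the service reads the value from memory; the extra round trip only happens in the background when the key actually changes, which is why the per-request latency stops mattering.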
The nasty one, though, was that Artifactory didn't compress its REST responses, and when you have a CI/CD pipeline that's been running for six years with half a hundred devs, that metadata response is huge, because npm is teh dumb. So our poor UI lead kept having npm install time out, and the UI team's answer to "my environment isn't working" started with clearing your downloaded deps and starting over.
They finally fixed it after we (and presumably half of their other customers) complained, but in the meantime I was on the back nine of migrating our entire deployment pipeline to Docker, so I had nginx config fairly fresh in my brain and set them up a forward proxy to do compression termination. It still blew up once a week, but that was better than him spending half his day praying to the gods of chaos.
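The gist, as a minimal sketch and not the original config (hostname, port, and MIME types are assumptions): an nginx in front of Artifactory that gzips the metadata responses so they cross the slow link compressed, with npm's registry URL pointed at the proxy.

    server {
        listen 8081;

        # Compress the JSON that Artifactory sends uncompressed.
        gzip on;
        gzip_min_length 1024;
        gzip_types application/json application/vnd.npm.install-v1+json;

        location / {
            proxy_pass https://artifactory.internal.example;
            proxy_set_header Host artifactory.internal.example;
        }
    }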
One of the most dangerous ideologies is "all good things come to those who wait", or that waiting is a virtue. Applied by people working at every level of a system for years and years, it leads to steps that could take 30ms taking 30s.
There was a hacked driver you could get that would tighten up the tolerances of the stepper motor and get 1.5 to 1.9 MB of data onto a single floppy by sliding the tracks closer together.
There was, I believe, at some point a game that shipped on 1.5MB disks as a copy-protection mechanism. But if you had this tool you could copy them anyway.
Are you referring to 2M/2MGUI? That didn't change the track spacing (which is fixed) but used bigger sector sizes (similar to how HDDs went from 512B to 4K physical sectors).
The first time I installed Slackware I didn't have enough spare floppies for the whole thing, so I had to delete some stuff to make it fit, and then copying it in the computer lab led to several dead disks. The installer didn't yet have a retry feature, so every time a disk turned out to be bad I had to make a new copy and start at the beginning. And sometimes that disk would be bad too. So the first time I installed Slackware I really installed it ten times.
Up until a few years ago my Slackware install was broken up across 4 flash drives; as Slackware grew I never bothered to buy a new flash drive big enough for it. It was a lot like the old floppy install. Eventually I realized I could just put all the packages on an external drive and greatly simplify things, and then I snapped out of the old habit and just bought a few new flash drives.
The first program I ever started on one day and finished on another was saved onto an audio cassette. And I thought that was pretty weird.
But like the vinyl it has really terrible random access behavior.
It would be sorta cool if someone used an auto-repeat record and several copies in order to do a multi-track streaming solution. With six players you can load the file in 1:02 instead of 6:10. Or perhaps 1:33 on average if you don't assume the record is at the start right when you're ready to read and you have to wait ~31s average seek time.
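Back-of-the-envelope for those numbers, assuming a 6:10 side split evenly across six players:

    # Rough arithmetic behind the 1:02 and 1:33 figures.
    side = 6 * 60 + 10          # one full playthrough: 370 s
    players = 6
    segment = side / players    # ~61.7 s -> the "1:02" figure
    avg_seek = segment / 2      # the ~31 s average seek assumed above
    print(segment, segment + avg_seek)  # ~61.7 s and ~92.5 s (~1:33)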