It may be one basket, but IBM high-end kit is the nuclear fallout shelter of baskets.
E.g. we used to have an IBM Enterprise Storage Server (aka the Shark) back in the day (around 2000), and it was the size of an American fridge, full of drawers of drives. You could yank any drawer safe in the knowledge that all the RAID volumes were distributed over multiple drawers. If a SCSI controller failed, you could yank a drawer of SCSI controllers and hot-swap them safe in the knowledge they were fully redundant.
The "brains" of the thing consisted of a fully redundant pair of AIX RS/6000 servers, and you could yank either one without losing data (all writes were committed to at least non-volatile memory on both servers before being acknowledged). Each server also had at least hot-swap RAM (a RAID-like memory controller) and may have had hot-swap CPUs (tell the OS to move all threads off a CPU, swap it, move them back).
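The write-ack discipline described here can be sketched roughly like this. This is a hypothetical illustration of the general technique (acknowledge only after the data is staged in non-volatile cache on both controllers), not IBM's actual firmware; the names `Controller`, `stage` and `write` are made up for the example.

```python
# Toy model: a write is acknowledged only once it sits in non-volatile
# cache on BOTH controllers, so either one can be yanked without data loss.
# Illustrative only -- not the ESS's real internals.

class Controller:
    def __init__(self, name):
        self.name = name
        self.nvram = {}  # stands in for battery-backed write cache

    def stage(self, block_id, data):
        self.nvram[block_id] = data  # persist to NVRAM before acking
        return True

def write(block_id, data, primary, partner):
    """Acknowledge only after both controllers hold the write."""
    if primary.stage(block_id, data) and partner.stage(block_id, data):
        return "ACK"    # safe: survives the loss of either controller
    return "RETRY"      # degraded single-controller path not modelled here

a, b = Controller("A"), Controller("B")
print(write(42, b"payload", a, b))  # -> ACK; block 42 now lives in both caches
```

The point of the scheme is that at the moment the host sees the acknowledgment, no single component failure can lose the write.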
On top of that, it had a phone connection and would dial out to report early warnings of problems directly to IBM, who'd send out a technician before anything even failed, as long as you kept paying for your support plan.
So yes, you can get that density with standard kit easily, and probably much cheaper too, assuming you have enough skilled staff to manage it. The reason IBM still manages to sell this kind of kit, on the other hand, is that what they are really selling is peace of mind that most issues are Someone Else's Problem. For some people it makes sense to pay a lot for that.
Back in the late 1990s I was involved in provisioning a large Sun e15k. Not indestructible, but nearly.
It broke. You know what happened? The factory roof leaked and poured water into the DC sub-building, whose roof then collapsed onto the e15k, which promptly blew up and caused a spectacular fire, a halon dump, and about a month of arguing with insurance companies and guys with shovels.
In that circumstance, it doesn't matter what promises the vendor makes. That's still all your eggs in one basket.
"Buy two and keep one somewhere else" didn't help either, as the network termination, switching and routing layers were down, and all the people using it were about 300 miles away from the backup location anyway. So some poor fucker had to dismantle the backup e15k and disk arrays, haul them in a large truck[1] to the original location and erect a temp DC in a portakabin outside the building.
Edit: We would have been better served with two smaller DCs with off the shelf kit on the same site but different buildings running a mirrored arrangement. All for pocket change compared to a zSeries...
That's what the company I work for now does. We have off the shelf kit, SAN replication, ESX, redundant routing and multiple peers in different locations.
That's why you never deploy to just one location no matter how reliable the actual kit is.
You'd be in exactly the same situation if you had off the shelf "normal" servers in a rack. The point is one IBM mainframe is generally going to be more reliable than the vast majority of "homegrown" setups in a single location.
If you're comparing against a setup in multiple locations, then you should compare against two or more of these.
And there too, these kinds of solutions are far more reliable if you are willing to pay the money. E.g. IBM provides a range of options up to full synchronous mirroring of mainframe setups over distances up to about 200km, where both systems can be administered as one identical unit (the distance limit is down to latency). They also provide a range of other options trading off performance, the amount of data you can potentially lose, and cost.
> Buy two and keep one somewhere else didn't help either as the network termination, switching and routing layers were down and all the people using it were about 300 miles away from the backup location anyway.
And this wouldn't have been any better if you had two racks of kit instead of two mainframes.
> All for pocket change compared to a zSeries...
There we agree. I'll likely never buy or recommend one of these, for the reason that I tend to work on cost sensitive projects.
Except that you're going to pay a lot more than you would for those off the shelf "normal" servers in a rack. Probably enough that you can afford doubly-redundant normal servers for the cost of a non-redundant IBM mainframe, with quite a bit of cash left over.
Yes but when that system falls over, your boss is yelling at you, and you're on the hook. With IBM, you can all yell at IBM. And that's why big enterprise companies buy IBM.
This story is obviously a bit ridiculous nowadays, since no one that can afford an ESS is running just a single site. In fact, most can't due to legal regulations about data redundancy. These regulations typically lead to having a secondary site across town and a tertiary site across the country.
We have an AS/400 (excuse me, iSeries) and the damn thing is rock solid. It also alerts us and IBM when it needs maintenance. It's basically a tank with a logistics chain.
Designed for business apps, future-proofing, integrated database, largely self-managing, capability-security, continuing on solid (POWER) hardware... it did about everything right. That's why we regularly fix crashed Windows and 'NIX machines while my company's AS/400 has been running for around 10 years.
I've always wanted a modern, clean-slated version of the System/38 w/out relics from that time and with any tricks we've learned since. Throw in hardware acceleration for garbage collection and some NonStop-style tricks for fault-tolerance to have a beast of a machine.
A similarly amazing machine that IBM's System/38 learned from a little bit. Somebody posted a link to an emulator but honestly I don't want to dredge through that. Like you said, a modern system that reimplemented its best attributes without the limitations or baggage would be nice.
Mainframes are complex enough that there are rarely projects to implement them, but there's lots of work on safer CPUs. See crash-safe.org's early publications for a CPU that combined Burroughs-style checks, the Alpha ISA, and functional programming at the system level. Given your stack preference, you might like these:
I started out my career in IT as an AS/400 operator / Netware 3.12 admin, and while AS/400 / iSeries aren't "en vogue" these days, I have a lot of respect for those machines. As you say, they are rock solid. One of the places I worked for had an even older machine, an IBM S/36 (predecessor to the AS/400) and while ancient, it just kept plugging away, day after day after day after day...
OTOH, you couldn't pay me to program in RPG/400 using SEU. Building menus, or playing around with a little CL on the '400, is one thing. But RPG programming sucks. Well, it did anyway. Maybe things have gotten better. I understand the ILE stuff made RPG less column-oriented and closer to a free-form language, but I never had a chance to use that.
I remember original RPG as being the electronic descendant of the old IBM unit record machines, with their plug boards and mechanical processing cycles. That heritage likely predates even COBOL. IBM added many extensions over the years, and at one of my mainframe workplaces we even did online CICS programming with RPG (not fun at all!).
Who are you and how have you stolen my[1] early career history?!
Hahaha... well, it's a long story, regarding the early part of my career. Especially the whole bit about exactly how I got involved with AS/400's in the first place.
Let me guess you started in the early 90's, right?
Almost. I graduated H.S. in '91, started programming in '92 or so, but didn't start my first IT job until 1997.
Except that the OS is still provided by a different company from the one providing your actual hardware, so there's plenty of room for blame-passing, and most of the OS development is done on x86 machines by people with no access to IBM Power hardware of any kind.
If I never hear "one throat to choke" again, I'll die a happy man.
This only works well if you can negotiate an acceptable SLA, your main vendor doesn't balk at integrating with subcontractors or other vendors, and you have a rock-solid vendor manager on your side enforcing the SLA.
Oh, that works great when you lose power to the rack. Or the datacentre. Or the SAN fails. Or the core routers. Or any of the many other SPOFs that can and do occur in a datacentre.
All of which are accounted for by having two or more of these, combined with the feature they call (I'm not kidding) Geographically Dispersed Parallel Sysplex (GDPS).
You can hook up multiple IBM mainframes remotely and set them up to automatically ensure consistent replication of machine state to various extents depending on your reliability vs. performance tradeoffs and replication distance (latency being the issue), all the way up to active-active operation across systems.
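The reliability-vs-performance tradeoff being described boils down to two numbers: synchronous mirroring adds the remote round trip to every acknowledged write (which is why distance is capped), while asynchronous replication keeps writes fast but can lose the in-flight tail on failover. A back-of-the-envelope sketch, with made-up function names and a rough figure for the speed of light in fibre (this illustrates the general tradeoff, not GDPS's actual implementation):

```python
# Rough speed of light in fibre: ~200 km per millisecond.
SPEED_OF_LIGHT_FIBRE_KM_PER_MS = 200

def sync_write_latency_ms(local_ms, distance_km):
    # Synchronous mirroring: every acknowledged write pays the
    # round trip to the remote site on top of local commit time.
    rtt = 2 * distance_km / SPEED_OF_LIGHT_FIBRE_KM_PER_MS
    return local_ms + rtt

def async_rpo_writes(write_rate_per_s, replication_lag_s):
    # Asynchronous replication: writes potentially lost if the
    # primary dies while the replica is this many seconds behind.
    return int(write_rate_per_s * replication_lag_s)

# At ~200 km, sync mirroring adds ~2 ms to every write -- roughly
# why that distance is a practical ceiling for synchronous setups.
print(sync_write_latency_ms(0.5, 200))  # -> 2.5
print(async_rpo_writes(10000, 3))       # -> 30000 writes at risk
```

Zero data loss costs you latency proportional to distance; cheap latency costs you a window of potentially lost writes. The product range mentioned above is essentially different points on that curve.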
So in other words: it works far better than the failover options most people deploy on their off the shelf servers in their self-wired racks (and yes, I run my own setup across off the shelf servers; and no, it's not nearly as redundant as a pair of IBM mainframes).
Problem is, we kitted out two 42U racks in two DCs with HP and EMC kit on VMware, and got four humans for five years, for less than the comparable quote from IBM. And we've tested replication and failover to the same extent, and didn't have to rewrite the 2 million lines or so of code we have...
> All of which are accounted for by having two or more of these, combined with the feature they call (I'm not kidding) Geographically Dispersed Parallel Sysplex (GDPS).
And it is an awesome thing - although I didn't realise it supported z/VM these days, rather than just z/OS.
In any case, you've still got two baskets, which was my point.