I did an upgrade with the team from 1.7 to 7.5.2 a few years ago; we used Terraform to build the 7.5.2 cluster with about 28 nodes. First we took a snapshot to move the data from 1.7 to 2.4, and we kept the two clusters in sync by having our applications write to both. To get them to a synced state right before snapshotting, we set a Redis key that told our application servers to start writing every document changed or created to a Redis set, so we would have a set of everything changed since the snapshot. This was to account for the time between snapshotting and getting the new cluster up. Once we had the set of changes synced, we could test queries by switching a customer account to read from 2.4 via another Redis set of upgrade accounts. Once we were confident and saw no new deprecations, we repeated the process for 5.6 and then 7.5... as I recall we could skip 6.x. It was an intense few weeks but definitely worth it for us. We also cleaned up our deployment to have dedicated sets of master, data and client nodes.
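A hedged sketch (not our actual code) of that change-tracking idea: while the snapshot/restore is in flight, every create or update also records the document ID in a Redis set, so the delta can be replayed into the new cluster afterwards. Key names and client setup here are invented for illustration.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

DUAL_WRITE_FLAG = "es_upgrade:dual_write_enabled"   # hypothetical flag the app servers watch
CHANGED_DOCS_SET = "es_upgrade:changed_doc_ids"     # hypothetical set of changed document IDs

def on_document_write(doc_id: str) -> None:
    """Called by the application after any document create or update."""
    if r.get(DUAL_WRITE_FLAG):
        r.sadd(CHANGED_DOCS_SET, doc_id)  # remember it for the post-snapshot sync

def replay_changes(reindex_fn) -> None:
    """Once the new cluster is restored, re-index everything changed since the snapshot."""
    for doc_id in r.smembers(CHANGED_DOCS_SET):
        reindex_fn(doc_id.decode())
```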
FogBugz was still on twelve ElasticSearch 1.6 nodes when I left in 2018. We also had a custom plugin (essentially requesting facets that weren't stored in ElasticSearch back from FogBugz), which was the main reason we hadn't spent much time thinking about upgrading it. To keep performance adequate, we scheduled cache flush operations that, even at the time, we knew were pants-on-head crazy to be doing in production. I can't remember if we were running 32-bit or 64-bit with Compressed OOPs.
Kiln was on an even older version, v1.4 if I remember correctly. And one of the shards had a corruption warning, yet it didn't seem to affect stability or results. But that wasn't a fun cluster to operate, since it refused to do certain types of maintenance because of the supposed corruption.
Hopefully the newer versions are easier to migrate between. I don't remember what exactly was preventing us from upgrading, but I'm sure part of it was wanting to avoid a full reindex.
It's good to hear stories of real-world systems. If you only look at blog posts you get the idea that everyone is doing everything perfectly, but of course it's not really like that at all...
I've heard horror stories from friends about working at meltwater. Setting that aside for a moment, this is an amazing software engineering achievement.
Pulling off this level of scale with Elasticsearch is no easy feat and very impressive from a technical perspective. When you're running ES with petabytes of mission critical data as a core service powering the universe of a business, cluster rebuilds aren't an option (or maybe they are, as a last resort, but absolutely will not be acceptable on an ongoing basis).
Relying on Elasticsearch mega-clusters in this manner is akin to running an ultra-marathon with a really sharp pair of scissors glued in each hand. Or maybe even more extreme than my (admittedly lame) analogy.
Running nodes with such high shard counts is an appreciably precarious proposition, because there is a fair amount of overhead in the Elasticsearch management protocol. I wonder what the performance testing strategy entailed.
I have a lot of respect for the engineers working to make this project and service a success story. When it comes to Elasticsearch at scale, such outcomes are the exception.
I usually don't explain my downvotes, but I thought your comment was good overall; the "horror stories from friends about working at meltwater" bit, without explaining what they are, just makes it a bit unfair.
As criticism, it's very vague, and as someone who doesn't work at Meltwater (for the last 5 years or so at least) it doesn't give me any information either. Well except that there are rumors about Meltwater, but that would be true about any large corporation.
Maybe I misunderstood and the horror stories were about ES, but I got it as being about the company itself. Could you expand? What type of stories? :)
One time, a few years ago, a particularly nasty query was executed over and over again, and it took a few hours to find it and then block it.
And during that time so many nodes had become slow and unresponsive that another (for us) previously unseen memory leak started to occur.
Nodes kept building up queues of unanswered ping requests, and those requests contained our 100 MB cluster state, so the heaps filled up and even more nodes became unresponsive.
And from then on the whole thing turned into a death spiral of doom.
After trying, and failing, to get it under control for 48 hours, we gave up and rebuilt the whole cluster from scratch, using the snapshots we store on S3.
The recovery took another 90 hours or so. That was not a fun week.
Non-technically, it was a horror show. I worked at the company from 2005/06 to 2012, when I quit after witnessing shameful behaviour towards women, a party culture that literally led to rape allegations, and a CEO who looted the company for money and shipped it to tax havens.
One of the area managers - Kaveh, IIRC - also had a double standard in line with Trump's. He was very "don't put your pen in the company ink", and proceeded to get one of his subordinates pregnant.
I remember there was an "anti Meltwater blog" at some point, but I can't find it now. I don't remember the URL either, so I can't look it up on archive.org. However, this site[0] seems to contain copy-pasted stuff from it.
As I said, my experiences are 10+ years old, and hopefully things have improved.
Is there no other search database with persistent storage besides Elastic/Lucene/Solr?
I get that there's little money to be made in these things, but it's surprising. It seems like most full-text search offerings are either relatively simple plug-ins to existing databases or in-memory only.
Yes, there is. We moved from ES to Vespa (vespa.ai) and never looked back. We got better results, better speed and WAY lower maintenance costs. I really don't understand how underrated this project is.
Vespa seems like a great match for Elastic's text and vector search, but not for classic "OLTP"-style queries.
For example, until very recently Vespa did not even have case-sensitive string field matching [1]; doing a strict equality query on a string field was not possible, and the authors did not seem to see why it would be useful. Vespa lacks a lot of this kind of basic search functionality, making it less general-purpose than Elasticsearch.
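For comparison, here's roughly what that kind of strict-equality lookup looks like against Elasticsearch; the index name, field and host are made up for this sketch, and it assumes the field has a keyword mapping (as the default dynamic mapping provides).

```python
import requests

query = {
    "query": {
        "term": {  # exact, case-sensitive match: no analysis, no lowercasing
            "author.keyword": "Alice Smith"
        }
    }
}

resp = requests.get("http://localhost:9200/documents/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"])
```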
It's clear that it's a very powerful search engine, though it also feels antiquated in many ways. It's very obvious that it's an ancient project that has been worked on by many different people throughout the years, with no cohesive vision or design, though it does seem like they're slowly cleaning things up. (The documentation used to be much worse, for one.)
Nearly a decade ago (oh god) I converted some overdesigned five node ES mess to https://github.com/mchaput/whoosh. It's (obviously) not the fastest or anything, but it was more than good enough for low-dozens of GBs of mostly static data.
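For anyone curious, a minimal Whoosh setup along those lines looks roughly like this; the schema, paths and documents are made up for the example, not the original index.

```python
import os
from whoosh.index import create_in
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser

# On-disk index with a stored ID field and a full-text body field.
schema = Schema(doc_id=ID(stored=True, unique=True), body=TEXT)
os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", schema)

# Index a couple of documents.
writer = ix.writer()
writer.add_document(doc_id="1", body="mostly static reference data")
writer.add_document(doc_id="2", body="a few dozen gigabytes of documents")
writer.commit()

# Query them back.
with ix.searcher() as searcher:
    query = QueryParser("body", ix.schema).parse("documents")
    for hit in searcher.search(query):
        print(hit["doc_id"])
```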
> In order to control how queries are executed, we have built a plugin which exposes a set of custom query types. We use these query types to provide functionality and performance optimisations not available in stock Elasticsearch. For example, we have implemented wildcards within phrases, with support for executing within SpanNear queries. We optimise “*” to a match-all-query. And a whole lot of other things.
Did you port the in-house plugins? Seems like a big blocker.
I don't want to spoil the other blog posts but we managed to solve almost all of our custom use cases without modifying elasticsearch itself. We still have one custom plugin but only to enhance functionality, not for performance and stability reasons.
While I fully understand why you run this thing with 300+ nodes as you do, I have to wonder, just for fun - could you actually fit this whole thing on a single large server? Looks like something with 16 TiB RAM and 2 PiB SSD storage is actually a server you could theoretically buy today?