I played around with wait-free algorithms some time ago but never got anywhere meaningful.
A wait-free queue for multiple consumers and multiple producers would be kind of the holy grail when it comes to wait-free data structures. Is it MPMC though? I skimmed through the PDF and could not find a definitive claim…
Yes, it's MPMC. Progress guarantees are only really meaningful for multiple readers/writers. (If you have only one reader/writer, and it dies, who cares if the writers/readers on the other side keep working? Your system's hosed anyway.)
He previously saved my bacon with `node-finish` - a Node.js library that lets you fire off a bunch of tasks, and wait for _all_ to finish before executing a callback.
So I just recently learned that a rather simple ring buffer implementation can be lock-free without even needing primitives like CAS (although it does need memory barriers).
In spite of this, all the literature about lock-free queues that I see (including this one) is for some kind of linked structure. Any reason for that?
I guess one big limitation of the ring buffer is that it only works for a single reader and single writer. Although maybe you can work around that using CAS primitives...
I became aware of the distinction when I skimmed the paper. But I couldn't figure out what it meant in this case.
Specifically, does the every-thread requirement rule out algorithms where only one reader and one writer are allowed? If not, then this ring buffer is wait-free, since there is no CAS-and-retry step. (Of course an operation can fail because the queue is empty/full, but that still counts as completion.)
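A minimal sketch of such an SPSC ring buffer in Go (the type and function names are my own invention for illustration; Go's `sync/atomic` operations supply the required ordering, so the memory barriers are implicit):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spscQueue is a hypothetical minimal single-producer single-consumer ring.
// Only the producer writes tail; only the consumer writes head. Neither side
// ever retries a CAS, so every operation finishes in a bounded number of steps.
type spscQueue struct {
	buf  []int
	head atomic.Uint64 // advanced only by the consumer
	tail atomic.Uint64 // advanced only by the producer
}

func newSPSC(capacity int) *spscQueue {
	return &spscQueue{buf: make([]int, capacity)}
}

// Push returns false if the queue is full -- still a completed operation.
func (q *spscQueue) Push(v int) bool {
	t := q.tail.Load()
	if t-q.head.Load() == uint64(len(q.buf)) {
		return false // full
	}
	q.buf[t%uint64(len(q.buf))] = v
	q.tail.Store(t + 1) // publishes the element to the consumer
	return true
}

// Pop returns false if the queue is empty -- also a completed operation.
func (q *spscQueue) Pop() (int, bool) {
	h := q.head.Load()
	if h == q.tail.Load() {
		return 0, false // empty
	}
	v := q.buf[h%uint64(len(q.buf))]
	q.head.Store(h + 1) // frees the slot for the producer
	return v, true
}

func main() {
	q := newSPSC(2)
	fmt.Println(q.Push(1), q.Push(2), q.Push(3)) // true true false
	v, ok := q.Pop()
	fmt.Println(v, ok) // 1 true
}
```

Note that Push and Pop each do a fixed number of loads and stores with no retry loop, which is exactly why the SPSC case qualifies as wait-free.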
* memory reclamation being dependent on a GC or not
* which atomic operations they need (e.g. some require double-CAS)
What you mention is an SPSC bounded queue; this is about an MPMC unbounded queue. And from skimming the paper, it seems it does not require a GC either.
So it's far more powerful.
Of course SPSC queues still have their uses even if we have a good MPMC queue; they generally incur less overhead.
The single-writer, single-reader limitation is a big deal. You can work around it, but then it's no longer the same thing, as what makes such queues so fast is that they don't do CAS or other kinds of atomic read-modify-write operations at all. Such queues are extremely useful when you can ensure a single reader and a single writer, but that does not remove the need for more general-purpose solutions. More general solutions will require more synchronization, and if we don't want them to be fixed-size structures, they will usually require some kind of linked structure.
A ring buffer can work for a single writer and multiple readers, in a model where each element is received by all the readers. Each reader just needs its own read index, and the writer probably needs to check all the read indices before writing a new element, to ensure it doesn't overwrite data still being read.
And indeed, if you want all readers to get all elements, I think a ring buffer is easier than a linked structure.
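A sketch of that idea in Go (names are my own; this is a single-threaded demo of the index bookkeeping only -- a real version would also need care around cache-line padding of the per-reader cursors):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// broadcastRing is a hypothetical single-writer ring where every reader
// receives every element. Each reader owns one cursor in read; the writer
// checks all cursors before claiming a slot, so it never overwrites data
// the slowest reader hasn't consumed yet.
type broadcastRing struct {
	buf   []int
	write atomic.Uint64
	read  []atomic.Uint64 // one cursor per reader
}

func newBroadcast(capacity, readers int) *broadcastRing {
	return &broadcastRing{
		buf:  make([]int, capacity),
		read: make([]atomic.Uint64, readers),
	}
}

// Publish fails if the slowest reader still needs the oldest slot.
func (b *broadcastRing) Publish(v int) bool {
	w := b.write.Load()
	for i := range b.read {
		if w-b.read[i].Load() == uint64(len(b.buf)) {
			return false // reader i hasn't seen this slot yet
		}
	}
	b.buf[w%uint64(len(b.buf))] = v
	b.write.Store(w + 1)
	return true
}

// Receive advances only the caller's own cursor, so readers never contend.
func (b *broadcastRing) Receive(reader int) (int, bool) {
	r := b.read[reader].Load()
	if r == b.write.Load() {
		return 0, false // this reader is caught up
	}
	v := b.buf[r%uint64(len(b.buf))]
	b.read[reader].Store(r + 1)
	return v, true
}

func main() {
	b := newBroadcast(2, 2)
	b.Publish(10)
	b.Publish(20)
	fmt.Println(b.Publish(30)) // false: neither reader has consumed slot 0
	v0, _ := b.Receive(0)
	v1, _ := b.Receive(1)
	fmt.Println(v0, v1) // 10 10: both readers get every element
}
```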
I think the fixed-size limitation is the least important thing in practice.
If your queue is filling up arbitrarily far, then you are almost certainly in trouble anyway, and want some limitation strategy. The ring buffer gives you this quite naturally.
I suppose there could be cases where you did not want to pre-commit to a buffer size. E.g. you have lots of queues which you expect to be short, but which might grow large in rare cases. But I have never actually seen such a case.
I would expect it to be a common use case in actor-based systems like Erlang, Akka, etc. It's perfectly supported to have 100k+ processes running in an Erlang VM, but the vast majority of them will have an empty message queue at any given time.
Indeed, large numbers of small queues are an important case. But large numbers of small queues, every one of which might unexpectedly but legitimately need to grow large, is not.
I don't know Erlang or Akka first hand, but I know that Go channels are inspired by Erlang. Channels can be very numerous and are usually empty; but they are fixed size. In the rare occasion that you think a channel needs more buffering, you must say so explicitly.
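For example (unbuffered vs. explicitly buffered channels; the `select`/`default` just makes the would-be-blocking send observable without hanging the program):

```go
package main

import "fmt"

func main() {
	unbuffered := make(chan int)   // capacity 0: a send blocks until a receiver is ready
	buffered := make(chan int, 64) // buffering must be requested explicitly, with a fixed size

	// A send on the unbuffered channel has no waiting receiver, so the
	// default branch runs instead of blocking.
	select {
	case unbuffered <- 1:
		fmt.Println("sent")
	default:
		fmt.Println("would block")
	}

	buffered <- 1 // succeeds immediately: the buffer has space
	fmt.Println(<-buffered)
}
```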
For context, the traditional wait-free queues are only wait-free for one side, writing or reading. The other side has to spin in an atomic fetch loop, waiting for the wait-free multi-step operation to complete.
The double wait-free queue from the paper is using a technicality to achieve its status. The spinning operation is replaced with a loop over all the threads doing opposite (multi-step) operations, finishing them for threads that may be blocked in the middle of the steps. It's not that it doesn't spin, but the spinning is bounded by the number of threads. Edit: Actually, spinning is bounded by O(#threads^4) according to the paper.
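The helping pattern looks roughly like this (a deterministic, single-threaded toy with invented names, showing only the control flow -- it is NOT the paper's algorithm and is not thread-safe as written; a real implementation would use CAS so that each announced operation is applied exactly once by whichever thread gets there first):

```go
package main

import "fmt"

// op is a hypothetical announced enqueue that may stall mid-operation.
type op struct {
	val  int
	done bool
}

// helpQueue sketches the helping idea: before doing its own work, a thread
// scans every other thread's announcement slot and completes any pending
// operation it finds there.
type helpQueue struct {
	items    []int
	announce []*op // one announcement slot per thread
}

// announceEnqueue models a producer that publishes its intent and then
// stalls before finishing (e.g. the thread was preempted mid-operation).
func (q *helpQueue) announceEnqueue(tid, v int) {
	q.announce[tid] = &op{val: v}
}

// helpAll is the bounded loop: one pass over #threads slots, so the
// "spinning" is limited by the number of threads rather than unbounded.
func (q *helpQueue) helpAll() {
	for tid, o := range q.announce {
		if o != nil && !o.done {
			o.done = true
			q.items = append(q.items, o.val)
			q.announce[tid] = nil
		}
	}
}

// dequeue helps stalled producers first, then takes an item if any.
func (q *helpQueue) dequeue() (int, bool) {
	q.helpAll()
	if len(q.items) == 0 {
		return 0, false
	}
	v := q.items[0]
	q.items = q.items[1:]
	return v, true
}

func main() {
	q := &helpQueue{announce: make([]*op, 4)}
	q.announceEnqueue(2, 42) // thread 2 stalls mid-enqueue
	v, ok := q.dequeue()     // the consumer finishes thread 2's work itself
	fmt.Println(v, ok)       // 42 true
}
```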
Dmitry Vyukov's queue is awesome, but as the author himself notes, a slow writer can prevent the reader from making progress, so it is not even non-blocking.
Hmm... why bother with a complicated approach now that transactional memory (Intel TSX) is available in hardware? What are the advantages and disadvantages of this approach vs. using TSX?
Hardware transactions as implemented by TSX are not wait free. There is no guarantee that any TSX transaction will ever commit, so you need a non-transactional fallback path to guarantee forward progress.
Do you know the relative performance of POWER7 and Haswell? The implementation is not wait-free on POWER7 because of a hardware limitation, and has significantly lower throughput than on Haswell in the benchmarks.
Not "recent". It's disabled in Haswell, where it debuted, but it's OK from Broadwell on. From Wikipedia, with citation: "The bug was fixed in F-0 steppings of the vPro-enabled Core M-5Y70 Broadwell CPU in November 2014."[22] We're talking about the end of 2014.
Also, big thumbs up to Chaoran for putting the implementation on GitHub: https://github.com/chaoran/fast-wait-free-queue The code looks like it is easy to follow and it's MIT licensed.