4
Wisecrack
20h

Non-Cooperative Threading With Lock-free, Wait-free Read AND Write
(without compare-And-Swap (CAS))

Premise is straight-forward:

You spawn multiple threads. Each is given a prime number assigned and a counter updated each full execution of the thread.

Any time a thread wants to read or write to a shared resource, it checks its counter modulo its prime number against all others. Wherever counter modulo prime == 0 and > 0 for all other prime keys in a given set, it is safe to read/write the shared resource.

Some limit is set on the counter so it periodically resets.

You may also assign these to say, numbered priority enums or somesuch.

So for example a "priority 0" thread would use prime=2, because this is going to match the most frequently against the counter.

And this is how you get non-cooperative threading with lock-free read and write.

Additionally, by carefully tuning the size of the set of primes, relative to the maximum of the counter before reset, you can minimize latency.

Finally with careful thread management, you could, say, have a framework such that if some threads counter mod its key == 0 while the mod of other keys == 1, those threads may be strictly reading a resource rather than writing it, decreasing read/write latency further, if I haven't misunderstood how reading a shared resource works.

Is this novel? Idk, but maybe it'll help someone eventually.

Comments
  • 4
    I wish I could remember how to do maths.
  • 2
    @donkulator its actually pretty straightforward.

    Every thread has a counter that gets increment each execution, on a timer.

    A thread will, when spawned have some argument passed to it thats a prime number, along with the current global counter value.

    You'd have to account of initialization and spawning latency so theres some epsilon of accuracy on the timer itself for the counter.

    Anyway thats not important. Important bit is, if you have three threads, A, B, and C, lets say A is the highest priority, you give it the key of 2 which is prime, and B you give say the key of 3, and C you give the key of 5.

    Each tick on each thread, they check their counters.

    Modulo is just the remainder of a division. So for example 10 mod 7 == 3, because 7 goes into 10 once, with 3 left over.

    continued...
  • 2
    @donkulator At ticks 2, 4, and 8, and thread A knows, without any communication, it can write/read to some global resource/thread without encountering race conditions because the only numbers that divide into 2, 4, and 8, is 2.

    If say you thread B tries to execute at tick counter 4, it will skip accessing the resource (and perhaps wait, or do some other work) because 4 mod 3 = 1 (not 0).

    Thread A has the most priority here because half of all numbers are divisible evenly (remainder of 0) by 2.
  • 1
    Interesting idea. Reminds me a bit of token passing networking (except the token is implicit)

    This does require you to pool up reads and writes until it's your thread's turn though, and you *do* still have to wait for other threads to increase the global counter

    So I'm not sure wait-free is accurate
  • 2
    Hmm actually are you sure this works?

    Lets say global counter is 3 which is my prime. I start writing to the resource. Meanwhile two threads want to read it. Do they now have to wait until it's their turn? If not, thread 3 (with prime 5) now thinks he can read the resource, but thread 1 is still writing!

    If on the other hand threads only increment the global counter after a successful access, then this scheme gets pretty bogged down since each thread has to read or write to move the counter along so that *other* threads can access the resource. If one thread stops accessing, nobody can move forward and everybody waits forever I think
  • 3
    "Wait-free" so that was a fucking lie. Doesn't the thread need to wait until its counter satisfies the condition?
  • 3
    There is no such thing as concurrent writes without waiting or CAS or some other sync method (such as eventual consistency with databases). It's not physically possible, and people just tend to end up inventing queues in more and more complicated manners.
  • 4
    Another thing to consider with concurrency is "is the juice worth the squeeze?"

    Concurrency generally implies high performance computing, so every op counts. Modulo is not free. Iterating a counter is essentially free, but still not. Comparisons for each read/write are not free. Traditional sync methods are probably still out-performing this prime queue scheme.
  • 2
    @AlgoRythm True. I've spend so much time myself to come up with some concurrent data structure but at the end of the day, you can either read or you can write. Ya can't do both at the same time, bucko!

    I think the only real solutions are either

    A.) Better scheduling like in bevy, where threads don't have locking overhead because some external scheduler makes sure they have proper access when the execute

    B.) Eventual consistency, where writes don't disturb concurrent reads until some synchronization point happens. left_right (https://docs.rs/left-right/latest/...) is an example of such a data structure
  • 2
    Also: isn't this a cooperative scheme because all of the threads need to cooperate upon the same counter?

    @12bitfloat I tend to solve concurrency by pre-allocating memory and allowing my threads to consume as much is needed, then syncing all the results back together at the end.

    For example, need to sum a list of numbers? Allocate an array that is n items long, where n is the number of logical units on the processor. Each LU will sum a fair portion of the input array and place it into its reserved spot in the memory (thread 1 gets spot 1, etc). Then, after all threads are joined back with the main thread, sum the mini array.

    This is in opposition to something like a mutex or semaphore where each thread might want to update the shared total sum, creating a bottleneck. In this example, it's obvious. In real-world examples it can be less so. In fact, I can't think of any time I've actually used a thread lock to solve a concurrency problem.
  • 1
    @AlgoRythm That works well for reduce-problems but not when you really need concurrent mutable access (multi threaded video games for example)
  • 2
    I can advise against a thread only concurrency model. Other then that I might misunderstand how the counter is updated/read. Why does the counter not already need locking?

    Also modulo is quite heavy. Did you benchmark this?

    Edit: I see others have similar concerns
  • 1
    @12bitfloat that's what I'm trying to say, it genuinely works well for just about any concurrency problem. My example reduces and joins data, but other applications might give each thread a bucket of data for queues, requests, messages, whatever - and a parent thread just glues it all together.

    Multiple threads rarely need to communicate back and forth with high bandwidth. Typically, they just do their work and join back with a main thread when the work is done. Video games are the same; you have a thread or two running physics, graphics, network, windowing, etc. They can all orchestrate just before a frame buffer swap (screen update). Typically the graphics and physics don't need to be in constant communication, for example.
  • 1
    not sure if I misunderstood, but I already do processing like this in some of apps on a higher level. Isn't this pretty much how SIMD architectures handle stuff anyway? You load a bunch of data into the memory, and then each thread operators on it's own bit/stride of the data.

    Not sure if this is different, or if you want to make this into a more low-level concept in compilers or in hardware or something, but as far as I can tell the technique isn't new as a concept... unless I'm not getting it :D
  • 1
    @12bitfloat yeah wait free was a bit of a reach.

    I was operating on 3 hours sleep over two days, as you can tell from all the fucked grammar though.
  • 1
    @12bitfloat "the counter along so that *other* threads can access the resource"

    Each thread keeps its own internal counter, initialized by some argument passed in when it was spawned. The counter is incremented based on a timer, not on a full pass over the thread function.
  • 2
    @AlgoRythm "It's not physically possible, and people just tend to end up inventing queues in more and more complicated manners."

    This is what you get when amateurs like me who are unfamiliar with a topic try to be inventive.

    Oh well. It was an idea.
  • 0
    @hjk101 depends on how many threads you have, and the maximum value of the counter.

    Part of the reason I wrote it might need tuning, but you bring up some solid concerns.
  • 0
    @Wisecrack But if its incremented on a timer, how do you make sure who ever is writing is finished by the time another thread can read it?
  • 0
    @12bitfloat I'd assume you'd have to have a chunking system, or a fixed amount of time a read or write can go on, determined empirically.

    Fuck, maybe everyone is right that its reinventing the wheel.
  • 0
    @Wisecrack I may have been overly-critical. I didn't intend for you to catch the flak of all other parallel processing n00bs. You're a smart guy and your ideas are good. Concurrency is a topic that you need to get a feel for, a lot of things in the field that seem logical are actually counter-intuitive. For example, parallel processing actually has a ton of very heavy overhead so even if you parallelize a problem properly you might see performance loss compared to single-threaded computation. It's a bit wild sometimes frankly.

    The good rule to remember (both in life and especially computer performance): there is no free lunch!

    Your idea is neat. I HIGHLY recommend a channel called "Core Dumped" on YouTube. He has some parallel processing videos and you'll see that your ideas are not actually that far off schemes from the past.
  • 0
    Relevant YouTube link (I have personally watched and recommend this video): https://youtube.com/watch/...

    Edit: the AI voiceover might be jarring. He does it because he has a strong accent, this channel is NOT AI slop at all.
Add Comment