Posts tagged with 'stack'

Nasty ABA problem in array-based lock-free stack

Stack of books and laptopIn the comments to my post on testing the lock-free stack, Dmitriy V'jukov wondered if my code could test a broken implementation of a stack built on a preallocated array. He provided a link to the code on a Russian forum (yes, I used Google Translate a lot). It seems it’s not his, but a friend’s or a colleague’s.

I downloaded it and cleaned it up a bit for here (the volatile keywords aren’t needed for example since there are memory fences all over the place, so that gets rid of the #pragmas, and I really wanted better names for the variables):

  class SafeStackItem<T> {
    public T Value;
    public Int32 Next;
  }

  class SafeStack<T> where T : class {
    private Int32 head;
    private Int32 count;
    private SafeStackItem<T>[] array;

    public SafeStack(SafeStackItem<T>[] array, int pushFrom, int pushCount) {
      this.head = -1;
      this.array = array;

      count = pushCount;

      head = pushFrom;
      for (int i = 0; i < pushCount - 1; i++)
        array[i + pushFrom].Next = pushFrom + i + 1;
      array[pushFrom + pushCount - 1].Next = -1;
    }

    public Int32 Pop() {
      while (count > 1) {
        Int32 oldHead = head;
        Int32 oldHeadNext = Interlocked.Exchange(ref array[oldHead].Next, -1);

        if (oldHeadNext >= 0) {
          if (Interlocked.CompareExchange(ref head, oldHeadNext, oldHead) == oldHead) {
            Interlocked.Decrement(ref count);
            return oldHead;
          }
          else
            Interlocked.Exchange(ref array[oldHead].Next, oldHeadNext);
        }
      }

      return -1;
    }

    public void Push(Int32 index) {
      Int32 oldHead;
      do {
        oldHead = head;
        array[index].Next = oldHead;
      } while (oldHead != Interlocked.CompareExchange(ref head, index, oldHead));

      Interlocked.Increment(ref count);
    }
  }

My first comments were essentially summed up by the word ‘horror’. The array is public, the indexes into the array are public (the push and pop operations work on array indexes, for heaven’s sake), and the elements are all public. I also pointed out that the stack was just completely broken: even in a single-threaded app you could never pop off the last item on the stack. He came back and asked whether I could find the bug in the algorithm. I replied with a negative, pointing out that

I'm not even sure how it's supposed to be used, so a test app would be more than welcome. In using a multithreaded stack, I'm more used to having 'producers' that push 'work' items on the stack, and 'consumers' that pop them off. Here I'm not even sure how a consumer would signal to a producer that it had completed the work on index N and hence that index can now be reused. Yuk, even writing that down gives me the shivers.

In translating and reading the thread on the Russian forum, I got the impression that he tested it with each thread popping off an item, “processing” it in some way, and then pushing it back. Why, I have absolutely no idea; I just think this stack is broken as a usable multi-threaded container for the reason given above, and that’s even before you get to the problem of it suffering from an ABA problem. (Or maybe you should have two such stacks, equal in size, pop off from one and push onto the other. Yuk.)

The ABA problem occurs because of the reuse of ‘nodes’ without making sure that all threads have relinquished control of a given node before reusing it. It’s a classic ABA scenario and after I’d thought about it for a little while, I played the game of “hunt the node reuse”. In the end, it wasn’t that hard to find.

Let us suppose that slow thread A ‘sees’ the stack 3 -> 4 -> 5 -> etc; that is, the head of the list is node 3. It calls Pop(). The first step is to read the head index into oldHead, setting it to 3. In between this step and the next, it falls asleep, and another thread, X, pops node 3. The oldHead node (that is, node 3) will have its Next index set to -1 by this pop operation.

Thread X then starts to push node 3 back onto the stack. However, it has some extraordinary difficulty because some rogue super-fast thread is pushing a bunch of nodes back onto the stack, bam, bam, bam. X tries, node 3’s Next gets set to 42, the current head, but the compare-exchange fails because the head has changed right under its nose. Damn. It tries again, node 3’s Next gets set to 43. No go, fails again. It tries again, setting node 3’s Next to 44 and finally succeeds.

The stack now should look like 3 -> 44 -> 43 -> 42 -> 4 -> 5 -> etc.

But in the middle of all that, thread A wakes up. It sees the momentary scenario of 3 -> 42 that is about to fail in the middle of thread X’s Push() loop. Its own exchange succeeds (the second step of its loop), and node 3 now points to -1 and oldHeadNext is 42. Thread A falls asleep again...

...and a little time passes. The stack now looks like this: 44 -> 43 -> 42 -> 4 -> etc. Thread X, in its attempt to push node 3 back onto the list has just set its Next pointer to point to 44, since that’s the current head of the stack (that -1 from the previous paragraph has long gone). It’s about to call the compare-exchange when...

...thread A awakes with a start. It fails its compare-exchange and so resets oldHead.Next to 42 (that is, node 3’s Next to 42) and goes round its loop again.

And thread X completes its compare-exchange successfully to push element 3 back onto the stack. Unfortunately, the stack now looks like this: 3 -> 42 -> 4  -> 5 -> etc, and elements 44 and 43 have been lost. ABA!

So, in summary, rule 1 of lock-free containers is “don’t reuse a node until every thread has relinquished that node’s reference”. With ‘my’ lock-free stack, the garbage collector enforces that rule in spades.

And before Dmitriy asks, this array-based stack is broken beyond repair. There is no fix. An array of nodes just implies “node reuse”.

Album cover for HeligolandNow playing:
Massive Attack - Girl I Love You (She Is Danger Remix)
(from Heligoland)


Testing the lock-free stack

Stack of books with laptop on topRecently a reader was having some issues with my lock-free queue implementation. In investigating the problem, I decided to throw out my previous “stack and queue” multithreaded test program as being way too hard to modify and started out afresh. In doing so, I also cleaned up the implementations of the lock-free stack and queue, making extensive use of CodeRush’s code issues functionality.

For details on the lock-free stack, read here. For details on the lock-free queue, see here.

In this post. I’ll talk about testing the lock-free stack in a multithreaded application.

First of all, here’s the latest version of the lock-free stack code. There’s not that much change from the original:

using System;
using JmBucknall.Threading;

namespace JmBucknall.Structures {

  public class LockFreeStack<T> {
    SingleLinkNode<T> head;

    public LockFreeStack() {
      head = new SingleLinkNode<T>();
    }

    public void Push(T item) {
      SingleLinkNode<T> newNode = new SingleLinkNode<T> { Item = item };
      do {
        newNode.Next = head.Next;
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, newNode.Next, newNode));
    }

    public bool Pop(out T item) {
      SingleLinkNode<T> node;
      do {
        node = head.Next;
        if (node == null) {
          item = default(T);
          return false;
        }
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, node, node.Next));
      item = node.Item;
      return true;
    }

    public T Pop() {
      T result;
      Pop(out result);
      return result;
    }
  }
}

First of all, I tested the stack to make sure it did the standard push and pop properly in a single-threaded environment. All that’s needed is a few unit tests and, to be frank, these tests are boring and tedious and I won’t show them here. Of more interest is the use of the stack in a multithreaded application.

Back in the old days, I tested on a dual core machine, but now I have a quad-core desktop, so I wanted to test the code using all four cores. In essence I wanted to have one or more “pusher” threads and one or more “popper” threads. With four cores, to avoid as much CPU contention as possible, I could have one pusher and three poppers, two pushers and poppers each, or three pushers and one popper. (Of course I could increase those numbers as far as I wanted to go, but I would run into thread scheduling on each CPU and I rather wanted to stress the concurrency code as much as possible. Having each thread on its own core would certainly do that.)

The flow of the test app would work like this. I created the shared lock-free stack; for ease of testing, making it a stack of ints. To make sure that the pop code didn’t (a) miss an item or (b) pop the same item twice due to any unforeseen multithreaded issues, I decided to check off each item as it was popped. For that I needed an array of results (essentially booleans) to show which values had been popped or not. Since I wanted to use the thread pool in .NET, I needed a way to know in my main thread when each subsidiary thread, be it a pusher or a popper, had finished. The simplest way to do that is to use a manual reset event (ManualResetEvent) for each thread and then signal it (that is, call Set())  at the end of the thread’s method.

For the pushers, I decided to push all the numbers from 1 to some large value (here, 10 million). To make sure that I didn’t push a value twice, I used the Interlocked.Increment() method ensuring that each pusher got its own values.

  static class Globals {
    public const int TopValue = 10000000;
    public const int PusherCount = 1;
    public const int PopperCount = 3;
    public static int PushValue;
    public static bool NoMoreData;
  }

  public class ThreadData {
    public LockFreeStack<int> stack;
    public ManualResetEvent completion;
    public int[] results;

    public ThreadData(LockFreeStack<int> stack, ManualResetEvent completion, int[] results) {
      this.stack = stack;
      this.completion = completion;
      this.results = results;
    }
  }

  static class PusherEngine {
    static public void Execute(Object stateInfo) {
      ThreadData data = stateInfo as ThreadData;
      int pushCount = 0;

      int nextValue = Interlocked.Increment(ref Globals.PushValue);
      while (nextValue <= Globals.TopValue) {
        data.stack.Push(nextValue);
        pushCount++;
        nextValue = Interlocked.Increment(ref Globals.PushValue);
      }

      Console.WriteLine(String.Format("Push count: {0}", pushCount));
      data.completion.Set();
    }
  }

The Globals class is just a holder for some constants and global values. The ThreadData class is merely a simple record-type class to hold the data for the threads. In essence, the only thread-specific member of that class is the completion reset event, so the other members could have been put in the Globals class. Hey, ho.

For the poppers, I decided to let them run, just popping values off the stack until they got a special “close-down” value. After that they just terminated.

  static class PopperEngine {
    static public void Execute(Object stateInfo) {
      ThreadData data = stateInfo as ThreadData;

      int popCount = 0;
      while (true) {
        int value = data.stack.Pop();
        if (value == 0) {
          if (Globals.NoMoreData)
            break;
          Thread.Sleep(1); //minor wait when stack is empty
        }
        else {
          popCount++;
          int oldValue = Interlocked.CompareExchange(ref data.results[value], 1, 0);
          if (oldValue != 0) {
            Console.WriteLine(String.Format("Error: already popped {0}", value));
          }
        }
      }

      Console.WriteLine(String.Format("Pop count: {0}", popCount));
      data.completion.Set();
    }
  }

Notice here the checking to see whether a value has already been popped or not. A value of 0 popped off the stack means that the stack is temporarily empty, and so the popper just waits a smidgeon before going round and trying again. If the stack was empty and the Globvals.NoMoreData item was set, it means that all the pushers have finished adding items to the stack and so it’s time for the popper to terminate.

Now the actual console application:

  class Program {
    private static ManualResetEvent[] QueueWorkItems(int count, LockFreeStack<int> stack, int[] results, WaitCallback callback) {
      ManualResetEvent[] events = new ManualResetEvent[count];
      for (int i = 0; i < count; i++) {
        events[i] = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem(callback, new ThreadData(stack, events[i], results));
      }
      return events;
    }

    static void Main() {
      DateTime start = DateTime.Now;

      Console.WriteLine("create the shared stack");
      LockFreeStack<int> stack = new LockFreeStack<int>();

      Console.WriteLine("create the shared results array");
      int[] results = new int[Globals.TopValue + 1];

      Console.WriteLine("create the completion events");
      var pushersDone = QueueWorkItems(Globals.PusherCount, stack, results, PusherEngine.Execute);
      var poppersDone = QueueWorkItems(Globals.PopperCount, stack, results, PopperEngine.Execute);

      Console.WriteLine("wait for the pushers to be done");
      ManualResetEvent.WaitAll(pushersDone);

      Console.WriteLine("signal the poppers to stop, and wait");
      Globals.NoMoreData = true;
      ManualResetEvent.WaitAll(poppersDone);

      Console.WriteLine("check that all values were popped");
      for (int i = 1; i <= Globals.TopValue; i++) {
        if (results[i] != 1)
          Console.WriteLine(String.Format("Error: {0} was never popped.", i));
      }

      DateTime end = DateTime.Now;
      Console.WriteLine("Done");
      Console.WriteLine(end - start);

      Console.ReadLine();
    }
  }

The QueueWorkItems method creates the manual reset event for each thread and queues the thread up in the thread pool. It returns an array of the manual reset events created so that the main thread can wait on them all to be signaled with WaitAll(). Firstly the main thread waits for all the pushers to finish, sets the “no more data” boolean, and then waits for all the popper threads to complete. Finally it makes sure that all the integer values were popped.

Using this application over several runs with differing numbers of pushers and poppers, I checked that the stack functioned properly. And it did. All four cores were pegged at the max for the few seconds of the test run.

Next time, the queue.

Album cover for Far Away Trains Passing ByNow playing:
Schnauss, Ulrich - nobody's home
(from Far Away Trains Passing By)


More on the lock-free stack/queue: memory fences

Last time I talked about the lock-free stack and queue, I was more concerned about proving that my code was free from the ABA problem than anything else. In making my argument I naturally glossed over such niceties as the .NET memory model and so assumed (pretty much) a sequential memory model, because, to be honest, that's how we think as programmers.

Spring path with picket fenceConsider the following set of statements in C#:

      int foo;
      bool fooHasBeenSet = false;
      // ...
      foo = 42;
      fooHasBeenSet = true;

Now, you and I when we read this will naturally assume that the assignment statements occur in the order shown. In fact, that assumption is more along the lines of a tautology, a duh! statement. It can't be otherwise, surely? The foo variable will be set to 42 and then the boolean variable will be set to true. This assumption is known as the sequential memory model, and it is the strictest memory model we can use.

As it happens, on modern machines, this assumption could be false. .NET allows the compiler or the JITter or the hardware to rearrange statements as they see fit, so long as the rearrangement has the same overall semantics as the code as written. The reason for allowing this is to improve the overall efficiency of the code given the presence of pipelined CPUs, caches, and the like. So in fact the boolean variable could be set before the int variable. The boolean variable may not be set until we actually want to read it, ditto foo. We couldn't tell because as soon as either variable was read after the relevant assignment, we'd see the correct value.

Now in a single-threaded program, there's absolutely no way to find out if the order was changed. We can't throw in a Console.Write(value); immediately after the assignment to the int variable, since we would just see the correct value of the variable (but note, the assignment to the boolean could still occur prior to that). In essence, in a single-threaded application, all this talk of rearranging control flow to suit the compiler or JITter or hardware is just navel-gazing. We can't tell, and so it doesn't matter.

In a multi-threaded application though, it does make a difference. A big one.

In a multithreaded environment, one thread could set the boolean before the int variable due to this allowed rearrangement in control flow. Another thread could then see fooHasBeenSet as true, but still see the old value of foo, with presumably dire results. But, wait, there's more! In a modern PC, main memory is too slow and so there's one, generally two, levels of memory cache, known by the helpful names of L1 and L2. L1 is hyper fast, very small and is close to the CPU. L2 is much faster than RAM but slower than L1, and is a few megabytes in size. (For example, on my desktop, L1 cache is 32KB for data and 32KB for code, and the L2 cache is 4MB. I have 8GB of RAM.) The issue here is each core has separate caches. When you read data, you may be reading it from the cache for that CPU, not the shared memory for the machine. One thread may update a variable (that is, some memory slot), and another thread may not even see that it has been changed since the old value is in the cache of the core that's executing that thread.

How the heck do multithreaded applications work then? it sounds like they should all completely fail with invalid unsynchronized data.

In particular, going back to the lock-free stack and its Pop() method,

    public bool Pop(out T item) {
      SingleLinkNode<T> node;
      do {
        node = head.Next;
        if (node == null) {
          item = default(T);
          return false;
        }
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, node, node.Next));
      item = node.Item;
      return true;
    }

head.Next could be in the L1/L2 cache for a particular thread, but in fact be altered by another thread. The test to see whether head.Next to be equal to node in the CAS method could just be true for that particular thread because both variables happen to be in the same cache. We could pop the same node off the stack twice! Man, this is even worse than ABA, we are so hosed.

Luckily for us, all is not lost. Without going into too much detail (see this MSDN article if you want more), the hardware contains primitives to ensure data coordination and synchronization. These primitives go under the name of memory fences (or barriers) and since .NET only supports full memory fences, that's all I'll talk about here. To put it in layman's terms, a memory fence resets the caches and ensures that all pending memory writes and reads have completed. Deferred reads can't cross the memory fence (the fence forces them to happen), and memory writes must happen before leaving the fence. That is, a memory fence will ensure that, in our original code example, the second thread will "see" the assignments of the data in the correct order, and that in the Pop() code that head.Next is read from RAM.

A memory fence is therefore a slow operation. By slow, I mean in comparison with the normal running of code and fetching data that a core CPU does. It's certainly not slow like reading from a disk. A few hundred cycles, perhaps; nothing you'd notice. It forces everything, at least for that moment, to be synchronized across all threads.

How do we impose a memory fence to ensure synchronization? By far the most traditional way is to use a lock. This is the granddaddy of all memory fences in a way, since a lock is all about the explicit synchronization of threads. I'm sure, just like me, you never even thought about a lock including data synchronization — you just assumed it was so and never thought about any other possibility. But my lock-free stack and queue don't use locks.

Another memory fence is provided by the interlocked primitives, including the .NET methods that call them. And, since it uses the Interlocked.CompareExchange method, the CAS method I wrote includes a memory fence. The global value of head.Next is forced to be updated by the completion of the CAS method, and the CAS method is forced to read its value anew from shared RAM. Phew. The lock-free stack code still works.

Album cover for Blade Runner Now playing:
Vangelis - Blade Runner Blues
(from Blade Runner)


Someone is wrong on the internet — the lock-free stack edition

Getting on for 4 years ago, I wrote a series of posts on lock-free containers in .NET, like the lock-free stack and the lock-free queue. They were fun to write, mainly because of the difficulty of the topic and trying to rationalize why early versions didn't work, and so on so forth. Along the way I learned about the .NET memory model, volatile, memory fences, and other arcana, but in the end I had something that worked and worked well. Over the years since then I've had many emails from readers about lock-free containers and I've been happy to have been linked from many different places, so much so that currently I'm number 1 for "lock free stack" and queue on Google.

Dunce's Chair And then yesterday, I suffered from "Someone is wrong on the internet" syndrome. Big time.

Someone, let's call him GC after the garbage collector in .NET (since as you'll see he demonstrated that he doesn't understand it) sent me an email. It seems GC had been reading up on the ABA problem in wikipedia as part of learning about lock-free containers and then found my lock-free stack and queue implementations. Without taking any time to understand my implementation or even consider the differences between the environment for the code in the wikipedia article and that for my code, someone was wrong on the internet, and that person was me. An email was duly written to me saying my implementation was broken. You see, the wikipedia article had a broken lock-free stack, therefore all lock-free stacks were broken.

Huh? Let's see: we have an anonymous author on wikipedia demonstrating some code that was broken and a named author (that would be me) showing off some similar code that purported to be working, and therefore the named author was automatically wrong? Wow.

Sure the person who's willing to put his name to some technical article could be wrong, but isn't it likely that, you know, he could be right (by naming himself he's staking his reputation on what he writes after all — that stake isn't so visible on wikipedia) and that you were missing some information? So, wouldn't you be, like, "I think I'm misunderstanding something. I read this on wikipedia and your code doesn't seem to do anything about this problem. Could you help me understand why?" And I'd be all, sure, let me explain why the wikipedia problem, although a valid problem, doesn't apply to the code I wrote, and we'd have this illuminating conversation. It's happened in the past, and it'll happen in the future, and I'm happy to do it.

But no. Someone was wrong on the internet and it was obviously me.

GC sends me this email that said things like "[Your implementations of the lock-free stack and queue] are busted.  You have not solved the ABA problem." and then rehashed the wikipedia article to prove I was wrong, only to terminate with a snide "You might consider a disclaimer on your web site . People trying to use your algorithms may believe they are getting atomic lock-free algorithms, only to discover their applications crash (in the field) in ways that are almost impossible to reproduce deterministically." (No, I'm not sure what the word atomic means in there, either.)

To put it mildly I was shocked that someone who obviously hadn't understood any of the issues with lock-free programming (and I don't pretend to know them all) or the ABA problem was willing to spend five minutes reading up on it and then ten minutes writing an email to a stranger saying that said stranger was wrong despite the fact he could have downloaded and run the code that showed said stranger was right. It blew me away.

Take a look at the wikipedia article on the ABA problem, especially the example. It's a not very well written lock-free stack (there's hardly any encapsulation) written in C++ and exemplifies a standard ABA problem. To recap, ABA is all about the invalid reuse of memory before all threads have finished with that memory. The thing about the lock-free stack code on wikipedia is that, in order to show off ABA easily, it exposes the nodes that go to make up the stack to the callers. Obviously the callers can then do silly things to said nodes, including deleting them (that is, deallocating them) and reusing them inappropriately. And by doing so, it's then very easy to show the ABA problem. If you like, the code was a pedagogical device, not an example of how to write a lock-free stack or that all lock-free stacks are flawed. Unfortunately for my friend GC he wasn't experienced or equipped enough to understand the difference.

Here's the full code for the lock-free stack that I wrote.

  public class LockFreeStack<T> {

    private SingleLinkNode<T> head;

    public LockFreeStack() {
      head = new SingleLinkNode<T>();
    }

    public void Push(T item) {
      SingleLinkNode<T> newNode = new SingleLinkNode<T>();
      newNode.Item = item;
      do {
        newNode.Next = head.Next;
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, newNode.Next, newNode));
    }

    public bool Pop(out T item) {
      SingleLinkNode<T> node;
      do {
        node = head.Next;
        if (node == null) {
          item = default(T);
          return false;
        }
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, node, node.Next));
      item = node.Item;
      return true;
    }

    public T Pop() {
      T result;
      Pop(out result);
      return result;
    }
  }

It's slightly different from the code in my original post since it implements two different Pop() methods, one that returns true/false plus the item, the other that just returns the item or default<T> if the pop failed. Also the code shown in my article simplifies the CAS call syntax. Nevertheless all the main features are there.

Notice that it is the stack that is in charge of allocating nodes (this is in Push()). There is no "deallocation" of nodes because .NET doesn't do deallocation; instead it's the garbage collector (the real "GC") that comes along every now and then and compacts the heap, removing all objects that are no longer referenced. (A point that my friend GC was unable to appreciate given that he was fixated on the correctness of the wikipedia article.) The main cause of ABA in the wikipedia code is simply not there since the nodes are completely encapsulated inside the stack code. Nevertheless, there could be an ABA problem solely inside the stack. Let's destroy that notion.

Imagine the following scenario. The stack looks like this: head -> A -> B -> C -> etc, and thread 1 is in the middle of a pop operation. Unfortunately, it's a really slow thread. It takes a copy of the value of head.Next; in this case a reference to the node A. Slowly, but surely, it enters the CAS. The code evaluates the address of the head.Next field (it's a ref parameter), the local variable node (which is a reference to the node A), and whatever node.Next evaluates to, which is a reference to the node B, we hope. Leisurely, it executes the underlying operation to the CAS method (which happens to be cmpxchg, another point that old GC was sure to point out to me to show off his knowledge).

Meanwhile, other threads are whipping through their execution, pushing and popping items (and therefore nodes) off the stack. The node A gets removed, long before thread 1's execution actually does the cmpxchg. Or, even. horror, before thread 1 evaluates A.Next.

We're obviously well into the doom scenario from the wikipedia article, surely? Nope, not even close. This was bubba GC's argument too: the memory for node A could get reused! It could be added back onto the stack! Thread 1's view of what its reference to A points to could be completely different! It could point to another node on the stack! If thread 1 continued you could lose many other items on the stack! YOU HAVE THE ABA PROBLEM, DOOFUS!

OK, perhaps he didn't quite go so wild, but that was the thrust of his argument (and indeed the wikipedia argument too). Unfortunately, in the C# code I presented (and present above) the basis of the argument is specious and fallacious. Why? Because node A never gets deallocated and therefore cannot be allocated again. The reason for that is that thread 1 still contains a reference to A and the object is live. The GC will therefore never collect it, even if I put code like if (CurrentThread == thread1) Sleep(ABloodyLongTime); in the middle of the Pop() method. The reference to A only goes away if we go round the loop and read head.Next again, or if we succeed in the CAS and the reference is discarded at the final brace for the method (that is, goes out of scope). It's only when all references to A in all threads have gone that the GC will collect its memory.

However, to be fair, there could be other ABA problems. Even though A won't get freed/reused while thread 1 has it, the data within it (Next and Item) may change with similar ABA results. Let's consider Next. Apart from head.Next, the only place in the stack code where a node's Next field gets set is in Push(), and that only ever happens on a brand new, just-newed-up node. Since I've shown that the memory for A cannot get freed while thread 1 has a reference to it, it can't be reallocated, and so this scenario can't happen either. (In fact, there's an even stronger conclusion: A.Next will always reference B.)  A similar argument can be made for a node's Item field. The final case is that the outside world gets hold of nodes and does something silly with them. As I pointed out above, that doesn't happen here either.

All in all, there is no ABA problem with the code as written above. The main reason for this avoidance of such a large issue is .NET's garbage collection. It ensures that we completely avoid the problems of reusing memory that one or more threads could still be using.

But, sadly, my friend GC refused to see this reasoning. He ignored it completely, because it didn't fit into his wikipedia worldview. In fact he helpfully wrote back a final time to say "Wow, what a fountain of crap.", and "Perhaps it’s difficult for you to accept that all of that work [that is, my articles and code] is so fundamentally flawed. But, I assure you, it is.", and "Your simple stack class is the poster child for the ABA problem." (Yes, he discovered bold.) And then, for good measure, he went on to mention his "co-workers at Microsoft", especially those in the "Many Core" team who seem to be uproariously laughing away at me whenever they take a break. (I'm guessing he thought I'd be intimidated by this or something.)

Sad to say, GC does indeed work at Microsoft. Happy to say, though, he's not an actual employee just a contractor in the Project Management Excellence (PMX) group, although he's obviously falling short on the excellence part.

Now playing:
Vegas - Wise guy
(from Vegas)

PCPlus 262: Best Data Use

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there's an Xmas issue as well — so it's a bit more than monthly). The column is called Theory Workshop and appears in the back of every issue. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. After all, the PDFs do appear on each issue's DVD after a couple of months.

PCPlus logo (Before you ask: no article of mine appeared in PCPlus 261. There was some kind of delay on my side — probably our annual vacation to England — and I didn't complete this article in time for that issue, despite the simplicity of the topic. So, it got pushed over to issue 262.)

OK, wacky article title, not mine, for what is essentially an article on writing an efficient stack and queue. This discussion came about from writing a lock-free stack and lock-free queue: these particular structures are the best way of implementing a stack and queue if you want to make them lock-free.

But, really, when all's said and done, very much a beginner's article aimed at the hobby programmer.

This article first appeared in issue 262, December 2007.

You can download the PDF here.

Album cover for Greatest Hits Now playing:
Mott the Hoople - All the Young Dudes
(from Greatest Hits)

Search

About Me

I'm Julian M Bucknall, the M because it's my middle initial and because I and the other Julian Bucknall (the movie guy) would like to differentiate ourselves.

I'm a programmer by trade, an actor by ambition, and an algorithms guy by osmosis. I write articles for PCPlus in my spare time, not that there's much of that.

Julian M Bucknall Apart from that, an ex-pat Brit, atheist, microbrew enthusiast, Pet Shop Boys fanboy, slide rule and HP calculator collector, amateur photographer, Altoids muncher.

DevExpress

I'm Chief Technology Officer at Developer Express, a software company that writes some great controls and tools for .NET and Delphi. I'm responsible for the technology oversight and vision of the company.

Validation

Validate markup as HTML5 (beta)     Validate CSS

Bottom swirl

Archives

February 2012 (4)
SMTWTFS
« Jan  
1234
567891011
12131415161718
19202122232425
26272829

Like this Archive Calendar widget? Download it here.

Social networking

Google ads

The OUT Campaign

The OUT Campaign

My Tweets

Bottom swirl