Making a stack persistent or immutable

A while ago I wrote a book on algorithms and data structures, in my case for Delphi. It’s still on sale, but regrettably somewhat out of date, given the changes to the language in the last 15 years (I’m thinking of the new support for generics in particular). While on my vacation over the last couple of weeks I started thinking about writing a new series of blog posts on data structures, and what easier than updating the book into a new generics-capable one? There was one drawback: I haven’t done much Delphi at all over the last decade. Did I really want to go back?

And then something I read online brought up the concept of persistent data structures. In functional programming, there is a strong tenet to which we should adhere: data immutability, and persistent data structures are all about creating immutable collections and other structures. Given that I’ve talked a bit about functional programming in JavaScript both here and within my capacity of CTO at DevExpress (webinar), the answer seemed to be publishing an occasional series on writing persistent data structures. Using JavaScript as my functional language du choix as well. Score!

First, a definition. A persistent data structure is one that is immutable: as you add or remove items from the data structure, previous versions of that data structure are still accessible. In other words, an operation on the data structure will result in a new data structure being created. After the operation, there will be two instances of the data structure available: the one before the change and the one after the change. The fun part of designing/writing persistent data structures is minimizing the memory footprint; that is, avoiding the simple method of “Copy All The Things” when altering the data structure.

An example: suppose we have a binary search tree, a nice simple data structure for storing objects. We assume that each object has a key, which is used to identify that object. In general, the main operation for the tree would be to find objects in that tree via their keys in an efficient manner as possible. We’d create a tree of objects (through repeatedly inserting item-key pairs into the tree) and then use it repeatedly to find those objects via their keys. In that second phase, we could share that tree instance among various processes in our application in the full knowledge that it was a static collection of our objects. It could be thought of as immutable or persistent. If however, one function in our code decided to add an object that only it cared about to that tree (or worse, delete one), all other functions would then “see” the change too. The tree would have been mutated.

Let’s take the simplest example of a data structure: the stack. It has a couple of main operations: push (where a new object is added to the top of the stack) and pop (where the topmost item is removed). The stacks we all wrote with our non-functional languages (C#, Java, Delphi, to name a few) were mutable: pushing an item onto a stack resulted in the stack having one more item. The stack will have been changed. If we wanted the stack that existed prior to that state change, tough. It had gone. (The same applies, by the way to JavaScript arrays: they have push and pop methods to make the array act like a stack, but that alter the array when used.)

So how do we change the traditional stack to make it persistent?

I’ll start off by defining the basic operations for a stack: there’s the constructor (or constructors) which are all about creating a new stack and don’t affect an existing stack; there’s an isEmpty() method that returns true if the stack is empty (or false otherwise), with no attendant changes to the stack instance; and there’s the pop() and push() methods, which will (in theory) modify the stack instance. For a traditional stack that’s usually it, although some do define a peek() method as well, to return the item at the top of the stack without popping it (and again this would be an example of an operation that maintains the persistence of the stack). We’ll come back to this latter method in a moment.

Here’s an example implementation of this kind of traditional stack, written in JavaScript.

var createStack = function () {
  var top = null;

  var makeNode = function (item, next) {
    return {
      item: item,
      next: next
    };
  };

  var isEmpty = function () {
    return top === null;
  };

  var peek = function () {
    if (isEmpty()) {
      return undefined;
    }
    return top.item;
  };

  var push = function (item) {
    top = makeNode(item, top);
  }

  var pop = function () {
    if (isEmpty()) {
      return undefined;
    }
    var result = top.item;
    top = top.next;
    return result;
  }

  return {
    isEmpty: isEmpty,
    peek: peek,
    push: push,
    pop: pop
  }
};

Nothing too complicated to be sure. Notice though the peculiar thing about the pop() method: it implements two completely separate operations. The first is to return the item at the top of the stack, and the second is to remove that item from the stack (thereby altering it). Hold that thought.

Now let’s think about how we can make this stack persistent. The constructor is fine since it returns a new stack instance. The isEmpty() and peek() methods are also immutable in nature. The push() method, though, alters the stack. Since that is not allowed, what we shall do instead is to get push() to create a new stack that comprises the old stack with the new item at the top, and return that to the caller.

var newStack = oldStack.push(myItem);
// oldStack is still valid and doesn't contain myItem
// newStack has all the same items as oldStack, plus myItem on top

We certainly do not want to duplicate all of the items in the original stack when we do so, hence we’ll have to be a little clever in our implementation.

As for pop(), we’re just going to have to reduce the number of things it does. In essence, like push(), pop() will just return a new stack – this time, a copy of the original stack without the top item. Instead we will have to rely on peek() to get at the top item before we pop the stack.

So how is this all done efficiently? Think of a stack recursively this time: a stack comprises two fields, the first is the top item, and the second is a stack of the other items. Suddenly, using this recursive definition plus the restricted pop() method, implementing a persistent stack becomes much simpler.

var createStack = function () {
  var createInternalStack = function (top, rest) {
    var isEmpty = function () {
      return top === undefined;
    };

    var peek = function () {
      return top;
    };

    var push = function (item) {
      return createInternalStack(item, this);
    }

    var pop = function () {
      return rest;
    }

    return {
      isEmpty: isEmpty,
      peek: peek,
      push: push,
      pop: pop
    }
  };

  return createInternalStack(undefined, undefined);
};

Amazingly enough, as you can see the code is actually more straightforward because of the recursive definition.

There is one caveat to all this. Because the stack is now persistent and is recursively defined, we are relying a great deal on automatic garbage collection to reclaim stack instances that are no longer being used. Trying to do this in a language without automatic garbage collection is possible, but way more tedious and obfuscating. As an example, suppose we execute this code:

// create myStack with some items pushed onto it
// then...
myStack = myStack.pop();

Assuming no other code has a reference to it, what happens to the stack instance on which we call pop()? After all, the pop operation returns a new stack instance. With automatic garbage collection, the memory will be reclaimed at some point, but for a non-garbage collected environment, we’re in deep trouble. I’ll be showing further examples of this requirement as we explore other, more complex, data structures.

Until next time when we’ll discuss a persistent linked list, don’t change!

Stack of Plates

Album cover for She's the BossNow playing:
Jagger, Mick - Lonely at the Top
(from She's the Boss)


Loading similar posts...   Loading links to posts on similar topics...

No Responses

Feel free to add a comment...

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

  •  Emphasize with italics: surround word with underscores _emphasis_
  •  Emphasize strongly: surround word with double-asterisks **strong**
  •  Link: surround text with square brackets, url with parentheses [text](url)
  •  Inline code: surround text with backticks `IEnumerable`
  •  Unordered list: start each line with an asterisk, space * an item
  •  Ordered list: start each line with a digit, period, space 1. an item
  •  Insert code block: start each line with four spaces
  •  Insert blockquote: start each line with right-angle-bracket, space > Now is the time...
Preview of response