Posts tagged with 'inheritance'

JavaScript for C# programmers: refactoring the expression evaluator

Another in the series on learning JavaScript from the viewpoint of a C# programmer, using Firebug as our test engine.

In this episode, we take the functioning expression evaluator from the last post and clean it up JavaScript style.

Wrap in an object

The first step is to wrap this global-properties-all-over-the-place code tidily in a single global object. This happens to be quite easy, and I used an auto-executing function to create a single object called expressionEvaluator that has a single method called exec. At the top was this:

var expressionEvaluator = function() {

And then at the bottom of the original code, I replaced the old evaluateExpression method with this code:

    return {
        exec: function(expression) {
            var state = makeParserState(expression);
            var result = parseExpression(state);
            if (result.failed) { return NaN; }
            return result.expression.evaluate();
        }
    };
}();

As you can see, the function is executed immediately (that's the () call operator on the last line), and it returns an object (using the object literal syntax) that has one property, a method called exec. Everything else is private, hidden from view by the closure created by the auto-executing function; it is a black box, and we are free to modify it completely so long as we retain the exec method.

I can't stress this enough: these patterns (closures and auto-executing functions to create other objects/functions) are very common in JavaScript. You might call these patterns idiomatic JavaScript. I'll agree that they can be hard to spot (heck, it's not until 200 lines later that you realize that expressionEvaluator is not a function after all but an object returned by a function that's executed immediately), but it's worth learning the pattern and how the natives speak.
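To see the pattern in isolation, here is a minimal sketch that has nothing to do with the evaluator. The counter object and its members (count, next, reset) are names invented for this illustration only.

```javascript
// Hypothetical example: `counter`, `count`, `next`, and `reset` are made-up names.
var counter = function() {
    var count = 0; // private: only reachable through the closure

    return {
        next: function() {
            count += 1;
            return count;
        },
        reset: function() {
            count = 0;
        }
    };
}(); // the trailing () runs the function immediately

console.log(typeof counter);  // "object", not "function"
console.log(counter.next());  // 1
console.log(counter.next());  // 2
console.log(counter.count);   // undefined: the variable is not a property
```

The only way to touch count is through next and reset, which is the same arrangement expressionEvaluator has with exec.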

So with this change we can check off that particular bug: we have one global object and no longer a dozen or so. The code to call it hasn't changed that much, and it's easy to see that everything still works properly:

console.log(expressionEvaluator.exec("1+2")); // outputs 3
console.log(expressionEvaluator.exec("1+2*3")); // outputs 7
console.log(expressionEvaluator.exec("((1+2)*(3-4))+((5*6)-(7/8))")); // outputs 26.125

State machine

Now that we have a single object (since the auto-executing function is only run once, there can only ever be a single object, so it's essentially a singleton), we can get rid of the state object (there was only ever one of those too), and place all its data in private variables of our evaluator's closure. This means we can get rid of the getCurrent method and generally clean things up.

    // State maintenance
    var expr, // the original expression string
        ch,   // the current character
        at,   // where we're at in the expression string
        advance = function() {
            ch = expr.charAt(++at);
        },
        skipWhite = function() {
            while (ch && (ch <= ' ')) { advance(); }
        },
        initialize = function(expression) {
            expr = expression;
            at = -1;
            advance();
        };

I've written this as a connected set of variables, connected in the sense of a single var statement whose declarations are separated by commas; it's another JavaScript idiom that you may come across. The reason is that JavaScript is often minified before use, that is, all comments and unnecessary white space are removed so that there is less to download and parse. Since a comma is one character, it's often used like this instead of a series of separate var declarations, which is more verbose. Notice I've added a skipWhite function to skip over any white space.

The first statement in exec now becomes this:

initialize(expression);

...but we're in a load of hurt because everything expects a state object. Nothing works. Let's forge on.

The tokens

Next up is the whole rpnToken thing, with its encapsulation-breaking isToken and isOperator. It's just nasty, people. It's a poor man's way of creating something like the is operator in C#. I hang my head in shame, but at least I could say I did it on purpose so I could show how to refactor it. Yeah, that's the reason. Anyway...

The thing to do here is to push down into the rpnToken object hierarchy the functionality that's being exposed at a higher level. For example, the isOperator function is only used in one place: when an RPN expression is being evaluated:

if (token.isOperator()) {
    var rhs = stack.pop();
    var lhs = stack.pop();
    stack.push(token.evaluate(lhs, rhs));
}
else {
    stack.push(token.value);
}

Better would be to get rid of this altogether, by declaring an evaluate method on the number token, and then making both evaluate functions accept a stack parameter. This way the operator token would pop, pop, calculate, push, and the number token would just push. But, at this higher level, all we'd do is call token.evaluate(stack); and be done with it. Nice one. Even better, we can now create a unaryOperator token as well that will pop, negate, push for unary minus, and nothing much at all for unary plus.

Having done that, let's look at the whole RPN expression thing. I wrote it originally to be an exact representation of how you'd write down an RPN expression: a string of simple tokens, some of them numbers, some of them operators. But we don't have to be that literal. I could rephrase the definition to be recursive: an RPN expression is one or more operands, followed by an operator. The operator will determine how many operands there are. An operand could either be a number or another RPN expression.

Whoa. An RPN expression is then merely a token in another RPN expression. To evaluate an RPN expression, we evaluate its operands, and then its operator. This is great: we now have four types of tokens: number, unary operator, binary operator, RPN expression. They will all have an evaluate method.

    // Token hierarchy
    // ...base object
    var rpnToken = {
        evaluate: function(stack) {
            throw {
                type: "Abstract",
                message: "evaluate is an abstract method"
            };
        }
    };
    // ...number object
    var makeNumber = function(value) {
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) { stack.push(value); };
        return token;
    };
    // ...unary operator object
    var makeUnaryOp = function(compute) { // named compute so it doesn't shadow the built-in eval
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            stack.push(compute(stack.pop()));
        };
        return token;
    };
    // ...binary operator object
    var makeBinaryOp = function(compute) { // named compute so it doesn't shadow the built-in eval
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            var rhs = stack.pop();
            var lhs = stack.pop();
            stack.push(compute(lhs, rhs));
        };
        return token;
    };

    // ...the pre-built operators
    var operator = {
        "unary-": makeUnaryOp(function(value) { return -value; }),
        "unary+": makeUnaryOp(function(value) { return +value; }),
        "+": makeBinaryOp(function(lhs, rhs) { return lhs + rhs; }),
        "-": makeBinaryOp(function(lhs, rhs) { return lhs - rhs; }),
        "*": makeBinaryOp(function(lhs, rhs) { return lhs * rhs; }),
        "/": makeBinaryOp(function(lhs, rhs) { return lhs / rhs; })
    };

I've left out the one for the RPN expression for a moment, but look at how all of these functions create objects that, first, descend from rpnToken and, second, depend implicitly on closures. Taking the number token as an example, the evaluate method uses the value parameter. The only way it gets that is through the closure since the execution of makeNumber will have long been completed by the time evaluate will run. Notice I've also explicitly made rpnToken.evaluate an abstract method by throwing an exception if it's ever called.

The RPN expression creation function is a little special:

    // ...RPN expression object
    var makeExpression = function() {
        var expr = arguments;

        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            for (var i = 0; i < expr.length; i++) {
                expr[i].evaluate(stack);
            }
        };
        return token;
    };

It looks like it's been declared so that it accepts no parameters. Not quite. It's going to be called with either two parameters (operand, operator) for a unary operator, or three parameters (operand1, operand2, operator) for a binary operator. To get around this in C#, we'd have to either write overloaded methods or use parameter arrays, but in JavaScript we merely use the arguments object (it's array-like: it has a length and indexed elements, but it isn't a true array) and keep a reference to it in a local variable called expr. Later on, in the evaluate method, we evaluate each argument in sequence as I discussed above, using the saved arguments object provided by the closure.
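To make the recursive definition concrete, here is a sketch that builds the RPN for 1 + 2 * 3 by hand and evaluates it. It restates the factories in condensed form (without the rpnToken base object, for brevity) so that it runs on its own; the names compute, parts, add, multiply, product and sum are chosen for this illustration.

```javascript
// Condensed restatement of the token factories, so this runs on its own.
var makeNumber = function(value) {
    return { evaluate: function(stack) { stack.push(value); } };
};
var makeBinaryOp = function(compute) {
    return {
        evaluate: function(stack) {
            var rhs = stack.pop();
            var lhs = stack.pop();
            stack.push(compute(lhs, rhs));
        }
    };
};
var makeExpression = function() {
    var parts = arguments; // array-like: has length and indexed elements
    return {
        evaluate: function(stack) {
            for (var i = 0; i < parts.length; i++) {
                parts[i].evaluate(stack);
            }
        }
    };
};

var add = makeBinaryOp(function(lhs, rhs) { return lhs + rhs; });
var multiply = makeBinaryOp(function(lhs, rhs) { return lhs * rhs; });

// 1 + 2 * 3 in RPN is: 1 (2 3 *) +
var product = makeExpression(makeNumber(2), makeNumber(3), multiply);
var sum = makeExpression(makeNumber(1), product, add);

var stack = [];
sum.evaluate(stack);
console.log(stack.pop()); // 7
```

Note that product is passed to makeExpression exactly as a number token would be: the outer expression neither knows nor cares that one of its operands is itself an expression.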

Parsing functions

We're now ready for some parsing action.

    // parse a binary operator (either the adds or the multiplys)
    var parseOperator = function(ops) {
        skipWhite();
        if ((ch === ops[0]) || (ch === ops[1])) {
            var op = operator[ch];
            advance();
            return op;
        }
        return null;
    };

This method parses a binary operator. The operators come in pairs (plus/minus and multiply/divide) so I wrote a generic method that gets passed a two-element array containing the operator characters in question. Notice that the parsing functions now return an RPN token or expression and not one of the success/fail result objects (they're gone, toast). If there's a failure we just pass back null.

Next up is parsing a parenthesized expression:

    // parse a parenthesized expression
    var parseParens = function() {
        advance();

        var result = parseExpression();
        if (!result) { return null; }

        skipWhite();
        if (ch !== ')') { return null; } // missing right paren
        advance();

        return result;
    };

It is assumed here that the function will be called with the state machine already positioned on the opening parenthesis (which is how it's done in the only place it's called), so we can just move past it. This function also has an identifiable error: the missing right parenthesis. Later on, we can hook up an error function here to report that back to the caller of the expression evaluator, but for now we'll just note it as a comment.

Parsing a number:

    // parse a number
    var parseNumber = function() {
        var value = '';
        while (('0' <= ch) && (ch <= '9')) {
            value += ch;
            advance();
        }

        if (ch === '.') {
            value += '.';
            advance();
            while (('0' <= ch) && (ch <= '9')) {
                value += ch;
                advance();
            }
        }

        if (!value) { return null; } // number missing
        var number = +value; // force conversion to number type
        if (isNaN(number)) { return null; } // invalid number
        return makeNumber(number);
    };

Again we have a couple of identifiable errors (a number is expected but is missing, a number couldn't be converted from the string); again points at which we can add an error function.
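The post leaves the error function for later. Purely as an assumption about what such a hook might look like, here is a sketch in which a hypothetical fail function records a message and the current position, and then returns null so the parsers' existing "null means failure" convention still holds. None of lastError, fail or expectRightParen appears in the real code.

```javascript
// Hypothetical sketch: `lastError`, `fail`, and `expectRightParen` are not part of the post's code.
var at = 0;           // stand-in for the state machine's position
var lastError = null; // most recent failure, or null

var fail = function(message) {
    lastError = { message: message, position: at };
    return null; // parsers keep returning null on failure
};

// A cut-down parser step that expects a closing parenthesis at `at`.
var expectRightParen = function(text) {
    if (text.charAt(at) !== ')') { return fail("missing right parenthesis"); }
    at += 1;
    return true;
};

at = 4;
console.log(expectRightParen("(1+2"));  // null
console.log(lastError.message);         // "missing right parenthesis"
console.log(lastError.position);        // 4
```

With something like this in place, each commented failure point in the parsers would become a call such as return fail("number missing"), and exec could hand lastError back to its caller.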

Parsing a factor:

    // parse a factor (either expression in parentheses or number)
    var parseFactor = function() {
        skipWhite();
        if (ch === '(') { return parseParens(); }
        return parseNumber();
    };

As you can see, when parseParens is called, the state machine is pointing at the open parenthesis.

Now parsing a unary expression (yes, I added them):

    // parse a unary expression (factor or +/- factor)
    var parseUnaryExpr = function() {
        skipWhite();
        if ((ch === '-') || (ch === '+')) {
            var op = operator["unary" + ch];
            advance();
            var operand = parseFactor();
            if (!operand) { return null; }
            return makeExpression(operand, op);
        }
        return parseFactor();
    };

Notice how I index into the operator object to differentiate the unary minus and plus from their binary brethren. Also I could "throw away" the unary plus here, if I wanted: it has no effect when evaluating an expression. Also note that this is where I call makeExpression with only two parameters for the unary operators.

And finally the remaining parser functions.

    // parse a binary expression (operand operator operand)
    var parseBinaryExpr = function(parseOperand, operators) {
        var operand = parseOperand();
        if (!operand) { return null; }
        var rpn = operand;

        var operator = parseOperator(operators);
        while (operator) {
            operand = parseOperand();
            if (!operand) { return null; }
            rpn = makeExpression(rpn, operand, operator);
            operator = parseOperator(operators);
        }

        return rpn;
    };
    // parse a term (unaryexpression multop unaryexpression)
    var parseTerm = function() {
        return parseBinaryExpr(parseUnaryExpr, ['*', '/']);
    };
    // parse an expression (term addop term)
    var parseExpression = function() {
        return parseBinaryExpr(parseTerm, ['+', '-']);
    };

parseBinaryExpr does most of the work: it's called with a function that parses an operand, and with an array containing the binary operators to look out for. makeExpression is called with three parameters here.

After all these changes, we can now look at the refactored exec function:

        exec: function(expression) {
            initialize(expression);

            var result = parseExpression();
            if ((!result) || (ch !== '')) { return NaN; } // badly terminated expression

            var stack = [];
            result.evaluate(stack);
            return stack.pop();
        }

Here we see the final recognizable error where we've parsed the expression but there's still more left (an example would be "(1+2)3"). We also see where the stack gets created, the RPN expression evaluated, leaving the result on the stack, which can then be popped and returned.

Summary

Now that we've seen a non-trivial conversion of a C# project to JavaScript, what conclusions can we derive?

The first thing is closures are important. Really important. You have to understand closures and how they work because you'll see them all over the place. In this small example, we have the large closure that creates the expressionEvaluator object (where only one property is public and everything else, the vast majority of the code in fact, is private and hidden); we have the smaller closures that create the token objects.

Second, class hierarchies are not as important as in a classical language like C#. I had to work hard to even get one: the token hierarchy. In fact, we could remove the base token object altogether and the evaluator would work just as well. The reason for this is, of course, that JavaScript only looks up a property at run-time; there's no need to worry about it at coding time. If every token object has an evaluate method, it doesn't matter whether they're descended from a base object with that method, or whether they all independently define one: JavaScript will just call it. In fact, although this example didn't show it particularly, creating an object with some behavior (such as the expressionEvaluator object) is virtually a zero-energy exercise, compared with C#, where we have to write a full-blown class first.
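A small sketch of that point: the three tokens below are plain object literals with no common ancestor, yet the evaluation loop treats them identically. The names two, five and subtract are invented for the illustration.

```javascript
// Hypothetical tokens built as plain object literals, with no shared base object.
var two = { evaluate: function(stack) { stack.push(2); } };
var five = { evaluate: function(stack) { stack.push(5); } };
var subtract = {
    evaluate: function(stack) {
        var rhs = stack.pop();
        var lhs = stack.pop();
        stack.push(lhs - rhs);
    }
};

var tokens = [five, two, subtract]; // RPN for 5 - 2
var stack = [];
for (var i = 0; i < tokens.length; i++) {
    tokens[i].evaluate(stack); // looked up on each object at run time
}
console.log(stack.pop()); // 3
```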

Third, functions are objects. Create them when you need them, pass them around like candy. In this example, the operator tokens are created with function objects that know how to evaluate the operator.

Fourth, learn the idiom. This code had only a few examples. The first, and biggest, is the auto-executing function that creates the whole evaluator object in the first place (those two parentheses at the end are easy to miss when you're scanning code). Second, learn what values evaluate to false in a conditional expression. In this code, I was using the fact that null evaluates to false (an example: if (!operand) instead of the more long-winded if (operand === null)), as does the empty string (the code in skipWhite checks for an empty character that way).
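For reference, here is a short sketch of the common values that count as false in a condition, plus a few that a C# programmer might expect to be false but aren't.

```javascript
// The common values that count as false in a condition.
var falsy = [false, null, undefined, 0, NaN, ''];
for (var i = 0; i < falsy.length; i++) {
    console.log(String(falsy[i]) + ' -> ' + Boolean(falsy[i]));
}

// Surprises for a C# programmer: these all count as true.
console.log(Boolean('0'));  // true: a non-empty string
console.log(Boolean([]));   // true: any object, even an empty array
console.log(Boolean({}));   // true
```

The if (!operand) shortcut is safe in the evaluator because a successful parse always returns an object. It would be a bug if a valid result could be 0 or the empty string, since those would be mistaken for failure.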

Fifth, avoid global object pollution. My original code had umpteen methods created on the global object; the code in this post has just one. As in any language, global variables and methods are bad.


JavaScript for C# programmers: object inheritance (part 2)

Continuing to learn JavaScript from the viewpoint of a die-hard C# programmer, using Firebug as our test engine.

In this episode, we continue writing the expression evaluator. (Please review part 1 before continuing.) You might want to have an extra browser open at the C# code, so you can follow along.

The next thing on the agenda is the result objects. In reality, the way I wrote the original code, there is only one failed result, whereas the successful result contains the RPN expression (or token, come to that) of the bit of the algebraic expression we got to. So let's code that up:

var failedResult = {
    failed: true,
    expression: null
};

var makeSuccessfulResult = function(expression) {
    var result = Object.create(failedResult);
    result.failed = false;
    result.expression = expression;
    return result;
};

Having coded it up, I'm not that happy with it. It seems as if I'm trying way too hard to force a class inheritance model approach to what are, after all, very simple objects. It would work equally well without the call to Object.create in there. I'll ignore my doubts for the moment and forge on.
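To make that doubt concrete, here is a sketch comparing the two approaches side by side. The variable names viaCreate and viaLiteral and the placeholder expression string are assumptions made for the illustration.

```javascript
var failedResult = {
    failed: true,
    expression: null
};

// As in the post: inherit from failedResult, then override both members.
var viaCreate = Object.create(failedResult);
viaCreate.failed = false;
viaCreate.expression = "1 2 +"; // placeholder value for illustration

// The alternative hinted at: a plain object literal, no inheritance.
var viaLiteral = { failed: false, expression: "1 2 +" };

console.log(viaCreate.failed, viaLiteral.failed);   // false false
console.log(failedResult.failed);                   // true: the prototype is untouched
console.log(Object.getPrototypeOf(viaCreate) === failedResult);      // true
console.log(Object.getPrototypeOf(viaLiteral) === Object.prototype); // true
```

Since the successful result overrides every member it inherits, the prototype link buys nothing here; callers that only read failed and expression can't tell the two objects apart.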

Next is the parserState object. There's only one of these, so no inheritance semantics to worry about.

var makeParserState = function(expression) {
    var expr = expression;
    var position = -1;
    var current;
    var advancePosition = function() {
        position++;
        if (position >= expression.length) {
            current = '';
        }
        else {
            current = expression.charAt(position);
        }
        return current;
    };

    advancePosition();

    return {
        getCurrent: function() { return current; },
        advance: function() { return advancePosition(); }
    };
};

I've coded this as a standard closure type function to give the state some privacy. The object that's returned has two public methods, getCurrent and advance, but a whole set of private members from the closure. There's the original expression string that we're going to read through, the current position, the current character, and a method to advance the string pointer and read the next character. (Remember that JavaScript has no character type; a character is represented as a one-character string.)

After I wrote it like this I discovered that charAt will return the empty string if the index is out of bounds; I had my C# hat on, obviously: I was assuming that it would throw an exception. So the private advancePosition method could be rewritten as

    var advancePosition = function() {
        position++;
        current = expression.charAt(position);
        return current;
    };
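A quick sketch to confirm the charAt behavior the simplification relies on; it's easy to paste into the Firebug console. The text variable is just a sample string.

```javascript
var text = "1+2";
console.log(text.charAt(0) === '1');   // true
console.log(text.charAt(3) === '');    // true: one past the end
console.log(text.charAt(-1) === '');   // true: before the start
console.log(text[3] === undefined);    // true: bracket indexing differs
console.log(typeof text.charAt(0));    // "string": there is no char type
```

Because the empty string is also false in a condition, "have we run out of input?" can later be written as a simple truthiness test on the current character.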

Now we get to the fun stuff, the actual parsing. In the original C# code, I wrote it all as a static class with static methods, but for now I wrote it in JavaScript as a set of methods. This is certainly not the best way, but I will get to that later.

First, parsing the operators:

var parseAdd = function(state) {
    var current = state.getCurrent();
    if ((current === '+') || (current === '-')) {
        state.advance();
        return makeSuccessfulResult(operator[current]);
    }
    return failedResult;
};

var parseMultiply = function(state) {
    var current = state.getCurrent();
    if ((current === '*') || (current === '/')) {
        state.advance();
        return makeSuccessfulResult(operator[current]);
    }
    return failedResult;
};

Notice something subtle going on. The call to makeSuccessfulResult is being passed a token and not an expression, as the implementation of that method would indicate. Keep that thought at the back of your mind for now.

Parsing a parenthesized expression:

var parseParenthesizedExpression = function(state) {
    if (state.getCurrent() !== '(') { return failedResult; }
    state.advance();

    var result = parseExpression(state);
    if (result.failed) { return result; }

    if (state.getCurrent() !== ')') { return failedResult; }
    state.advance();

    return result;
};

This makes a call to parseExpression that we haven't written yet, to parse the bits in between the parentheses.

Parsing a number:

var parseNumber = function(state) {
    var current = state.getCurrent();
    var value = '';
    
    while (('0' <= current) && (current <= '9')) {
        value += current;
        current = state.advance();
    }
    
    if (current === '.') {
        value += '.';
        current = state.advance();
        while (('0' <= current) && (current <= '9')) {
            value += current;
            current = state.advance();
        }
    }

    var number = +value; // force conversion to number type
    if (isNaN(number) || !value) { return failedResult; }
    return makeSuccessfulResult(makeNumber(number));
};

This involves a couple of tricky bits, so follow along as I describe them. The value variable is going to be a string that we'll grow with the number-like characters we find in the expression. (Number-like in this respect means the digits and the decimal point — sorry, no internationalization yet). So we first gather all the digits we can. If there's then a decimal point, we add that, and then gather all the digits after the decimal point. Fun bit next: we force the string to be converted into a number type. We do this by using JavaScript's interpreter: we start the expression off with a plus sign. JavaScript will decide that the expression is going to be a number since this plus sign could only be a unary plus operator. We then give it the value string. Since the interpreter is in "making a number" mode, it will convert the string to a number which is what we want. (The alternative is to use parseFloat, or to use something like "1 * value".)

The big problem with this is if we start this method off with the current character not being a digit (say, a letter), then the string-to-number conversion will produce 0 without error. So we fail the parse if either the conversion failed (the value of number will then be NaN) or if the string is empty (remember an empty string is equivalent to false, so !value would evaluate to true).
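Here is a short sketch of how the unary plus conversion behaves on the kinds of strings parseNumber can build, including the empty-string case that makes the extra !value check necessary.

```javascript
console.log(+"12.5");        // 12.5
console.log(+"");            // 0: the case the !value check guards against
console.log(isNaN(+"."));    // true: a lone decimal point is not a number
console.log(+"7.");          // 7: a trailing decimal point is accepted
console.log(+".5");          // 0.5
console.log(parseFloat("")); // NaN: parseFloat treats the empty string differently
```

So with parseFloat the empty-string check would be unnecessary, at the cost of parseFloat's more lenient handling of trailing garbage; since parseNumber only ever collects digits and a decimal point, either would do here.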

Parsing a factor is easy, it's either a parenthesized expression or a number:

var parseFactor = function(state) {
    if (state.getCurrent() === '(') {
        return parseParenthesizedExpression(state);
    }
    else {
        return parseNumber(state);
    }
};

Parsing a term and parsing the expression are roughly the same (and I haven't yet extracted out the commonality, like I did with the C# code).

var parseTerm = function(state) {
    var operand = parseFactor(state);
    if (operand.failed) { return operand; }
    var rpn = operand.expression;

    var operator = parseMultiply(state);
    while (!operator.failed) {
        operand = parseFactor(state);
        if (operand.failed) { return operand; }
        rpn = joinRpnParts(rpn, operand.expression, operator.expression); 
        operator = parseMultiply(state);
    }

    return makeSuccessfulResult(rpn);
};

var parseExpression = function(state) {
    var operand = parseTerm(state);
    if (operand.failed) { return operand; }
    var rpn = operand.expression;

    var operator = parseAdd(state);
    while (!operator.failed) {
        operand = parseTerm(state);
        if (operand.failed) { return operand; }
        rpn = joinRpnParts(rpn, operand.expression, operator.expression);
        operator = parseAdd(state);
    }

    return makeSuccessfulResult(rpn);
};

Both make use of a routine called joinRpnParts to stitch together the operands and operator postfix style.

var joinRpnParts = function(first, second, operator) {
    var result = makeExpression();

    var addTokens = function(operand) {
        if (operand.isToken) {
            result.add(operand);
        }
        else {
            operand.forEach(function(token) { result.add(token); });
        }
    };

    addTokens(first);
    addTokens(second);
    addTokens(operator);

    return result;
};

makeExpression is a slightly changed version of rpnExpression from last time. I added an isToken field to both the token ancestor and the RPN expression objects so that I could tell them apart. This is where I tried to resolve the token versus expression problem I alluded to before: I thought I was being clever here in using the lack of type safety to help me (and feeling all JavaScripty about it) and then having to hack this "well, is it a token or not" boolean in two different places (which to me says I'm not being JavaScripty enough). We'll sort it out later.

addTokens is just a helper function to add a part onto the RPN expression. Before you get excited by the forEach call there, it's a function I wrote for the RPN expression object:

var makeExpression = function() {
    var expr = [];
    return {
        isToken: false,

        add: function(node) {
            expr.push(node);
        },

        clear: function() {
            expr = [];
        },

        evaluate: function() {
            var stack = [];
            for (var i = 0; i < expr.length; i++) {
                var token = expr[i];
                if (token.isOperator()) {
                    var rhs = stack.pop();
                    var lhs = stack.pop();
                    stack.push(token.evaluate(lhs, rhs));
                }
                else {
                    stack.push(token.value);
                }
            }
            return stack.pop();
        },

        forEach: function(action) {
            for (var i = 0; i < expr.length; i++) {
                action(expr[i]);
            }
        }
    };
};

As you can see, the forEach method takes an action function to call for each token in the expression array, and that for joinRpnParts just adds it to the expression being stitched together.

Finally we need a function to tie it all together:

var evaluateExpression = function(expression) {
    var state = makeParserState(expression);
    var result = parseExpression(state);
    if (result.failed) { return NaN; }
    return result.expression.evaluate();
};

console.log(evaluateExpression("1+2")); // outputs 3
console.log(evaluateExpression("1+2*3")); // outputs 7
console.log(evaluateExpression("((1+2)*(3-4))+((5*6)-(7/8))")); // outputs 26.125

Now that I've shown you all the code, I can reveal that I'm just not happy about it. Yes, it's a pretty close translation of the C# code to JavaScript (and I did copy/paste code from the C# implementation to help move things along — just a case of removing all the type identifiers, mostly), but it's just not good JavaScript to me.

Let me enumerate the problems as I see them:

  • Global properties and methods everywhere. This should make you worried; it certainly does me.
  • It's too wordy. JavaScript is shipped as source and interpreted at run-time, so shorter identifiers mean less to download and parse. However, shorter identifiers also make it less legible to us humans; still, it should be possible to reduce it all a bit.
  • There's way too much code duplication. I seem to have some parse methods in pairs, it should be possible to extract out a common method for each pair. The evaluate method for the RPN expression should probably use the new forEach method, and so on.
  • I have a code smell between RPN tokens and expressions: I'm having to work out which is which in a higher method; surely that should be pushed down into the objects themselves.
  • The parser state object seems way overkill (it's a remnant from a back-tracking parser implementation I once did).
  • Ditto the result objects. Plus the failed result object doesn't tell us where the problem occurred.
  • Whitespace removal? Surely since it's easier for us to read "3 + 4" than "3+4", the evaluator should discard white space when it needs to.

So, next time, some major refactoring. We're going to be JavaScripty if it kills us.


JavaScript for C# programmers: object inheritance (part 1)

More in the series for C# programmers learning JavaScript, with Firebug as our scratchpad.

In this episode, we let go of class inheritance. Bye!

I know, it's taken you years to understand class inheritance until you could do it in your sleep. It took me years as well. We've had it engrained in us for a while: design a class to model some object in our problem space, worry about encapsulation and behavior, think about creating descendants that increase the specificity of our first more-general class, and perhaps keep on going to produce higher, more-specific classes. We exult in those class models and in the type-safety they give us.

Some of us, as we learn more, start to find that the implementation inheritance model gets to be too restrictive and too wordy. We start experimenting with the interface inheritance model where, essentially, we think about inheritance of behavior rather than of behavior plus data. Our class models become shallower and are no longer filled with classes that descend from classes which descend from others and so on all the way down. We use delegation of the interface as a coding model.

But JavaScript doesn't do all that. It has objects inheriting from objects. Period. Furthermore those objects are dynamic in nature: we can add new members or remove them at a moment's notice. No, this is not the description of anarchy, but a realization that perhaps classes are just too restrictive. Certainly, you can use libraries like Prototype that try to provide you with a class model type structure to your applications, but it is far better to just embrace the way that JavaScript works.

The problem is that JavaScript is conflicted in what you do for inheritance. Despite the speech from the high ground just then that JavaScript objects inherit from other JavaScript objects, there's no simple support for it in the language. Instead we have this unholy trinity of constructors, prototypes and the new keyword that hide object inheritance in order to smooth the way for us developers coming from a classical object-oriented language.

Duping an object

So let's add a new function to give us a direct way to create an object from another. Here's Douglas Crockford's Object.create method (so named because this is going to be part of ECMAScript 3.1 (ES3.1), the new version of JavaScript, due "soon").

if (typeof Object.create !== 'function') {
    Object.create = function(o) {
        function F() { }
        F.prototype = o;
        return new F();
    };
}

Here, Object is the constructor for objects (which we tend not to use as such, since the object literal syntax is so much more convenient), and we're going to try to add a method called create to it. So we test to see if it exists yet (so the if block won't execute once ES3.1 hits the streets), and, if not, we create it. The function takes an object o, and encapsulates declaring a do-nothing constructor, setting the prototype object of this new constructor to the passed in object, calling the constructor to create a new object, and returning it. What we get in the end of these shenanigans is a new object that inherits from the source object.

At a stroke we avoid several problems. We never have to call a constructor again and remember to use the new keyword (else we clobber the global object). We can just forget about new altogether. We can equally forget about constructors. We've boiled inheritance down to defining an object to do some interesting things, and then using Object.create to create dupes of that object that we can use however we want (including adding new members to a dupe and then passing it to Object.create in turn).
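
To see the delegation at work, here is a minimal sketch. The animal and dog objects are invented purely for illustration, and the snippet assumes an engine where Object.create is available (natively or through the shim above).

```javascript
// Hypothetical objects, for illustration only.
var animal = {
    legs: 4,
    describe: function() { return "I have " + this.legs + " legs"; }
};

// dog inherits from animal: no constructor, no new.
var dog = Object.create(animal);
dog.name = "Rex";

// describe is not a member of dog itself; it is found on animal.
var ownsDescribe = dog.hasOwnProperty("describe");
var description = dog.describe();

// Objects are dynamic: a member added to animal later shows up through dog.
animal.sound = "generic noise";
var inheritedSound = dog.sound;

console.log(ownsDescribe, description, inheritedSound);
```

The dupe holds only what was added to it directly (name here); everything else is looked up on the source object at the moment it is read.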

Example: Expression Parser

I wrote a simple expression parser for PCPlus a while back, and posted the article here together with the C# code I'd used. I recommend you go read it (it won't take long) and briefly browse the source code there so that you are familiar with what I'm going to be talking about.

We're going to rewrite it in JavaScript.

Writing RPN expression token objects

The first thing that I found restrictive with the C# code is that I wrote it to deal with single digit numbers. Wow, indeed, talk about restrictive. Whew! The reason for this was that it meant I could use a simple string to store the RPN expression and it made the whole article easier to understand without drowning the reader in unnecessary complications. So let's fix that to begin with.

An RPN expression is a sequential list of tokens, read from left to right, of which there are only two types: a number and an operator. There are no parentheses, or anything like that. We start off creating a token object that can report back what it is:

var rpnToken = {
    isOperator: function() { return false; }
};

Since we'll be dealing with numbers more often than operators, I made the isOperator method return false by default. (To recap on the syntax: rpnToken is an object containing a single member — a method — and I'm initializing it as an object literal.)

The next thing I wrote was a function that creates a number token:

var makeNumber = function(value) {
    var node = Object.create(rpnToken);
    node.value = value;
    return node;
};

It takes a value, creates an object that inherits from the rpnToken object, and defines a new member called value that holds the value passed in. Since the returned object inherits from rpnToken, it will inherit the isOperator method, and we already know that returns false by default.

Now let's create a similar function that creates an operator token:

var makeOperator = function(func) {
    var node = Object.create(rpnToken);
    node.isOperator = function() { return true; };
    node.evaluate = func;
    return node;
};

This works in the same way as makeNumber, but is slightly more complicated. Again we create a new object that descends from the rpnToken object. We override the isOperator method to return true. We pass in a function that will perform whatever operation this operator does and we save that as the evaluate method.

Since we're going to be dealing with the same four operators over and over again, I decided to create an object to store the canonical four as constants:

var operator = {
    "+": makeOperator(function(lhs, rhs) { return lhs + rhs; }),
    "-": makeOperator(function(lhs, rhs) { return lhs - rhs; }),
    "*": makeOperator(function(lhs, rhs) { return lhs * rhs; }),
    "/": makeOperator(function(lhs, rhs) { return lhs / rhs; })
};

This is slightly weird in that I'm using the actual operators as property names (so I can reference them later on when parsing the algebraic expression, a bit like this: op = operator[token];). Note how I create the functions to pass to the makeOperator function — you can guess that rhs and lhs in each case are going to be normal numbers.
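
The lookup by token can be sketched as follows. The definitions are repeated so the snippet runs on its own; the token value and the variable names product and unknown are just for illustration.

```javascript
// Repeats the definitions above so the snippet is self-contained.
var rpnToken = {
    isOperator: function() { return false; }
};
var makeOperator = function(func) {
    var node = Object.create(rpnToken);
    node.isOperator = function() { return true; };
    node.evaluate = func;
    return node;
};
var operator = {
    "+": makeOperator(function(lhs, rhs) { return lhs + rhs; }),
    "-": makeOperator(function(lhs, rhs) { return lhs - rhs; }),
    "*": makeOperator(function(lhs, rhs) { return lhs * rhs; }),
    "/": makeOperator(function(lhs, rhs) { return lhs / rhs; })
};

// Look an operator up by the character read from the input.
var token = "*";
var op = operator[token];
var product = op.evaluate(6, 7);

// A character that is not one of the four keys yields undefined.
var unknown = operator["%"];

console.log(op.isOperator(), product, unknown);
```

Because each operator is created once and stored, every "+" in every expression refers to the same token object.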

Writing an RPN expression object

To make sure I was at least getting this all correct, I wrote my first cut at the rpnExpression object:

var rpnExpression = function() {
    var expr = [];
    return {
        add: function(token) {
            expr.push(token);
        },

        evaluate: function() {
            var stack = [];
            for (var i = 0; i < expr.length; i++) {
                var token = expr[i];
                if (token.isOperator()) {
                    var rhs = stack.pop();
                    var lhs = stack.pop();
                    stack.push(token.evaluate(lhs, rhs));
                }
                else {
                    stack.push(token.value);
                }
            }
            return stack.pop();
        }
    };
}();

Although it looks at first glance that rpnExpression is a function, it is not. It is an object. It is being set to the return value from a function: notice the execution operator (the parentheses) on the last line. The function is merely there as a one-off anonymous function that returns an object, and furthermore one that is executed immediately. I did it this way so that I could gain the private member called expr.

So, the function declares a private field and then returns an object with two methods. The first method merely pushes new tokens onto the end of the private expression field. I'm making use here of the standard methods on an array that make it look like a stack.

The second method is evaluate. First of all it declares a stack (yes, it's an ordinary array: you weren't fooled), and then starts reading the tokens present in the expression. If the current token is an operator, two numbers are popped off the stack, the result is evaluated according to the operator, and it's pushed back onto the stack. If, on the other hand, the token is a number, we push its value onto the stack.

At the end of the expression, we pop the final number off the stack and return it.

Testing the code so far

We really should make sure that all works, just as we would if we'd been writing in C#, so I wrote a quick set of tests:

var numberEquals = function(x, y) {
    return Math.abs(x - y) < 0.00001;
};

rpnExpression.add(makeNumber(1.2));
rpnExpression.add(makeNumber(3.4));
rpnExpression.add(operator["+"]);
console.log(numberEquals(rpnExpression.evaluate(), 4.6));
rpnExpression.clear();

rpnExpression.add(makeNumber(1.2));
rpnExpression.add(makeNumber(3.4));
rpnExpression.add(operator["-"]);
console.log(numberEquals(rpnExpression.evaluate(), -2.2));
rpnExpression.clear();

rpnExpression.add(makeNumber(1.2));
rpnExpression.add(makeNumber(3.4));
rpnExpression.add(operator["*"]);
console.log(numberEquals(rpnExpression.evaluate(), 4.08));
rpnExpression.clear();

rpnExpression.add(makeNumber(1.2));
rpnExpression.add(makeNumber(3.4));
rpnExpression.add(operator["/"]);
console.log(numberEquals(rpnExpression.evaluate(), 0.3529412));
rpnExpression.clear();

(The clear method is an extra method that merely resets the private expr field.) Notice that I can't compare the results of the calculations to the expected values directly, since the number type in JavaScript is a double: we have to check that the two values are equal to within a small tolerance.
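
The listing of rpnExpression earlier doesn't include clear, so here is a sketch of how it might look. The body of clear is an assumption (all we know is that it resets the private expr array); the rest repeats the code above, and the longer expression at the end is an extra test of my own choosing.

```javascript
var rpnToken = { isOperator: function() { return false; } };
var makeNumber = function(value) {
    var node = Object.create(rpnToken);
    node.value = value;
    return node;
};
var makeOperator = function(func) {
    var node = Object.create(rpnToken);
    node.isOperator = function() { return true; };
    node.evaluate = func;
    return node;
};
var plus = makeOperator(function(lhs, rhs) { return lhs + rhs; });
var times = makeOperator(function(lhs, rhs) { return lhs * rhs; });

var rpnExpression = function() {
    var expr = [];
    return {
        add: function(token) { expr.push(token); },
        // Assumed implementation: throw the old array away.
        clear: function() { expr = []; },
        evaluate: function() {
            var stack = [];
            for (var i = 0; i < expr.length; i++) {
                var token = expr[i];
                if (token.isOperator()) {
                    var rhs = stack.pop();
                    var lhs = stack.pop();
                    stack.push(token.evaluate(lhs, rhs));
                }
                else {
                    stack.push(token.value);
                }
            }
            return stack.pop();
        }
    };
}();

// (1.2 + 3.4) * 2 in RPN is: 1.2 3.4 + 2 *
rpnExpression.add(makeNumber(1.2));
rpnExpression.add(makeNumber(3.4));
rpnExpression.add(plus);
rpnExpression.add(makeNumber(2));
rpnExpression.add(times);
var result = rpnExpression.evaluate();

rpnExpression.clear();
var afterClear = rpnExpression.evaluate(); // nothing left to evaluate

console.log(Math.abs(result - 9.2) < 0.00001, afterClear);
```

With this version, evaluating an empty expression returns undefined, because popping an empty array yields undefined.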

Next time we'll continue the work, but already you should be seeing how the code I've written doesn't use a class-based inheritance model despite the fact that we would have written it as such in C#. Notice how the JavaScript code is made much cleaner and more legible as a result. It's also way less wordy and noisy (I'd have probably been using generics for the RPN expression class).

To recap: object inheritance involves making some object that has interesting behavior (the rpnToken above), and then creating other objects that inherit from it (the number and operator tokens above) and that can have other properties and methods defined on them.


JavaScript for C# programmers: prototypes and privacy

Continuing my series about learning JavaScript when you're a C# programmer, using Firebug in Firefox as our testing ground.

In this episode, overriding, privacy, and class models.

Last time we saw how to create inheritance from JavaScript's constructors and prototypes, the so-called prototypal inheritance. In our example, we ended up with this:

var Point = function(x, y) {
    this.x = x;
    this.y = y;
    return this;
};
Point.prototype.move = function(x, y) {
    this.x += x;
    this.y += y;
};

This code snippet shows a couple of things I'd like to reinforce. First of all, because move is defined on the prototype for Point, it is visible to all objects we create from that constructor. If you like, since the prototype is the template for Point objects, all Point objects will have a move function. Second, the x and y fields are not shared between Point objects, but nevertheless all Point objects will have their own copies of these fields.

The concept of a class as we know it in C# is essentially split between the constructor and the prototype. If some variable (including a function) for an object is defined in the constructor, all objects created from it will have their own copy of that variable. If some variable for an object is declared on the prototype, all objects created using it will share the one copy from the prototype.

Overriding fields

Note, however, that there is a gotcha with that last sentence: it is only true when you read from the object. Let's investigate using my overworked Point example. First I'll declare a new field for the prototype called color, and I'll set it to "Red":

Point.prototype.color = "Red";

Now let's create a couple of points:

var first = new Point(1, 3);
var second = new Point(4, 2);
console.log(first.color);  // outputs Red
console.log(second.color); // outputs Red

When I read the value of first.color, JavaScript will first go to the object and see if it has a property called color. It does not. The interpreter then goes to the object's prototype object to see if it has a property called color (remember how it does this: when new created the object, it gave it a hidden internal link to Point.prototype, the prototype object of the constructor that was used). It does, so the value of color, "Red", is returned. The same exact process happens when I read second.color. All is good; it makes sense.

What happens if I now set first.color to "Blue"? What does second.color return now?

first.color = "Blue";
console.log(second.color);

There are two possible answers, "Red" or "Blue", and it hinges on what happens when first.color is set. Pat yourself on the back if you said "Red", and here's why. The chaining to the prototype only happens on a read operation. If you are writing a value, you will be modifying the object itself. So, since there is no property called color in the first object, JavaScript will create one and set it to "Blue". The common prototype.color property is not changed at all. Hence, when you read second.color, you get the chaining to the prototype operation, and "Red" gets returned.

You are, in effect, overriding a field from the object's prototype. The same thing happens when you set a function with the same name as a function in the prototype: you will override the prototype's function.
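
You can watch the override appear and disappear with a short sketch. It uses the standard hasOwnProperty method and the delete operator, neither of which appears elsewhere in this post.

```javascript
var Point = function(x, y) {
    this.x = x;
    this.y = y;
};
Point.prototype.color = "Red";

var first = new Point(1, 3);
var second = new Point(4, 2);

// Before the write, color is only on the prototype.
var ownBefore = first.hasOwnProperty("color");

// The write creates a color property on first itself.
first.color = "Blue";
var ownAfter = first.hasOwnProperty("color");
var secondColor = second.color;

// Deleting the override lets the prototype's value show through again.
delete first.color;
var restored = first.color;

console.log(ownBefore, ownAfter, secondColor, restored);
```

The prototype's value was never touched; the object's own property merely hid it for a while.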

Constructing objects with private fields

In playing around with this Point example over the past few articles, I've gone from an object that had private fields but that didn't use a constructor/prototype, to an object that's lost its privacy but does have this notion of classical inheritance. Can we get the privacy back?

Remember that privacy comes from closure. Which is the only function we have that can supply privacy? The constructor. Here's a version that implements x and y as private variables:

var Point = function(x, y) {
    this.getX = function() { return x; };
    this.getY = function() { return y; };
    this.setX = function(value) { x = value; };
    this.setY = function(value) { y = value; };
    return this;
};
Point.prototype.move = function(x, y) {
    this.setX(this.getX() + x);
    this.setY(this.getY() + y);
};

Oh wow, it's suddenly a lot more complicated. The first thing to realize is that x and y are parameters to the constructor function, so they are automatically local variables and therefore private. Even more restrictive, since they are local to the constructor, they can't be seen outside the constructor, in particular by the common properties and methods of the prototype. So, the prototype's move method can't reference x and y (at least not without getting an undefined error). We have to therefore write some functions that are public and that can reference the private variables. Those functions by necessity must be defined in the constructor. And so I defined a set of getters and setters.

Note that these getter and setter functions are defined in each object, and not on the prototype. This means we're duplicating the code, but there's no way round it. It also means they can't participate in inheritance: they're defined on individual objects.

Notice also that the getters and setters are public. We are declaring them on the newly created object, and so the move method can make use of them. Douglas Crockford (the inventor of private properties in JavaScript) calls these kinds of functions privileged. A privileged function is a public method on an object that can get at the private data of the object.
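
Here is a quick sketch to confirm that the coordinates really are hidden. The displacement parameters of move are renamed dx and dy purely for readability; otherwise this is the constructor from above.

```javascript
var Point = function(x, y) {
    // Privileged methods: public, but they close over the private x and y.
    this.getX = function() { return x; };
    this.getY = function() { return y; };
    this.setX = function(value) { x = value; };
    this.setY = function(value) { y = value; };
};
Point.prototype.move = function(dx, dy) {
    this.setX(this.getX() + dx);
    this.setY(this.getY() + dy);
};

var point = new Point(1, 2);
point.move(3, 4);
var coords = "(" + point.getX() + "," + point.getY() + ")";

// There is no x field on the object to read...
var direct = point.x;

// ...and assigning one creates an unrelated public property;
// the private x inside the closure is unaffected.
point.x = 100;
var stillPrivate = point.getX();

console.log(coords, direct, stillPrivate);
```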

Defining descendants

We've now gone into some depth about how to "do" classes by defining a constructor and by defining properties and methods on the constructor's prototype. We can define, in essence, a template for stamping out a whole set of similar objects and we know how to override properties and methods in our newly created objects. But what about further inheritance, building up a class model?

Suppose, now that we have a Point "class", we need a descendant, a ColoredPoint class, which knows its color. How is that done? We obviously need a new constructor, but that doesn't define the inheritance pattern; it's the prototype of that constructor that does. (We'll go back here to our non-privatized version of the Point "class" to avoid the noisy getters and setters.)

var ColoredPoint = function(x, y, color) {
    this.x = x;
    this.y = y;
    this.color = color;
    return this;
};
ColoredPoint.prototype = new Point(0, 0);

Notice that I've thrown away the automatic prototype object that's created with the constructor and replaced it with a fresh new Point object. I'm not particularly bothered about what values I pass when constructing this Point object, since I won't be using them.

var coloredPoint = new ColoredPoint(1, 3, "Red");
coloredPoint.move(2, 2);

console.log("Point = (" +
        coloredPoint.x + "," +
        coloredPoint.y + "," +
        coloredPoint.color + ")");

Here I'm constructing a new ColoredPoint object and then I call move on it. What happens here? First, the coloredPoint object has no method called move. So JavaScript goes to the prototype object. The prototype object doesn't have a move method either. So JavaScript goes to its prototype, finally, which does have a move method. Notice how the process continues up the prototype chain, and that at the end, the function is called on the original object (which defines the value of this). The output therefore is Point = (3,5,Red) which is what we wanted and expected.
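
The hops along the chain can be made visible with a small sketch. It leans on Object.getPrototypeOf, a standard function from a later edition of the language that isn't otherwise used in this series, so treat it as an illustration rather than part of the original code.

```javascript
var Point = function(x, y) {
    this.x = x;
    this.y = y;
};
Point.prototype.move = function(x, y) {
    this.x += x;
    this.y += y;
};
var ColoredPoint = function(x, y, color) {
    this.x = x;
    this.y = y;
    this.color = color;
};
ColoredPoint.prototype = new Point(0, 0);

var coloredPoint = new ColoredPoint(1, 3, "Red");

// First hop: the Point object we assigned to ColoredPoint.prototype.
var link1 = Object.getPrototypeOf(coloredPoint);
// Second hop: Point.prototype, where move actually lives.
var link2 = Object.getPrototypeOf(link1);

var hasMove = [
    coloredPoint.hasOwnProperty("move"),
    link1.hasOwnProperty("move"),
    link2.hasOwnProperty("move")
];

console.log(hasMove.join(","), coloredPoint instanceof Point);
```

Only the last object in the chain owns move, yet calling coloredPoint.move still runs it with this set to coloredPoint.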

Next time, we'll throw all that away. We don't need this lookalike classical class model. Let's embrace objects!


JavaScript for C# programmers: prototypes, the basics

Another post in the series that discusses JavaScript for those more familiar with C#. In this episode, the first of a couple on the topic, we look at prototypes and prototypal inheritance.

Despite the fact that the keyword class is reserved, there are no classes in JavaScript as you'd understand them from C#. And, yet, it is an object-oriented programming language: there are objects, after all. It's all made even more confusing since JavaScript has a new keyword. How does it new up an object instance, if there are no classes?

In short, instead of using classes as a template from which you can create (instantiate) objects, in JavaScript you use other objects. These objects are known as prototypes, and JavaScript uses what's known as prototypal inheritance, rather than class inheritance. The biggest problem that people have with OOP in JavaScript is that they start off from a class-based model and try and fit that model to JavaScript. Due to the flexibility of JavaScript you can get part of the way there, but it all feels a little contrived.

So, in this series, I want you to forget all about class-based OOP and start over.

Back to basics

We've already seen how to create an object by using the object literal syntax:

var point = { x: 0, y: 0 };

However, the problem with this is that the object is a one-off. What if we needed a whole bunch of point objects? They would all look the same, that is have X and Y coordinates, and possibly have methods like move() that would translate the point in some direction on the plane, or rotate() to rotate the point a certain angle around another point, and so on. That is, we need some kind of template that defines the basic data and behavior and then use that template to stamp out point objects. In C# this is what a class does, but how's this done in JavaScript?

Let's take a look. The first thing we need to do is to write a special function known as a constructor. This will construct (that is initialize) a new object.

var Point = function(x, y) {
    this.x = x;
    this.y = y;
    return this;
};

But where is the this variable coming from? In fact, what does this mean anyway?

Digression on this

In C#, this is easy: it's a reference to the current object. Since the object is an instance of a class and the method is a member of that class, when the method is called we can say with certainty which object this refers to, what members it has, and so on and so forth.

With JavaScript it's not so easy. Or rather it's just as easy: this is a reference to the object the function was called on.

Here's a point object with a move method.

var point = {
    x: 0,
    y: 0,
    move: function(x, y) {
        this.x += x;
        this.y += y;
    }
};

As you can see, it's declared using an object literal. There are two fields, x and y, and the move method translates the coordinates by the displacement values passed in. It works as you'd expect from a C# viewpoint: because move is declared within point, it must always refer to that object.

point.move(1, 2);
console.log("Point = (" + point.x + "," + point.y + ")");

And the output is Point = (1,2). So here the this variable is identical to the point variable.

Now check this out:

var point3d = { x: 0, y: 0, z: 0 };
point3d.move = point.move;

point3d.move(1, 2);
console.log("Point3D = (" + point3d.x + "," + point3d.y + "," + point3d.z + ")");

Here we're defining a brand new object called point3d, with three coordinates. We set its move method equal to point's move method. (Remember, functions are objects, they are not glued forevermore to a particular definition like they are with classes in C#.) We then call point3d.move(). I think you will have worked out by now that this call will not have changed point but instead will have acted on point3d. Since the function move will have been called on point3d, the this variable will be referring to point3d. The output is Point3D = (1,2,0).

Now you're comfortable with that, check out this next bit of code.

var spacetime = { x: 0, y: 0, z: 0, t: 0 };

point.move.call(spacetime, 1, 2);
console.log("spacetime = (" + spacetime.x + "," + spacetime.y + "," + spacetime.z + "," + spacetime.t + ")");

This is downright weird, so let us take it step by step. We define a new object called spacetime. We don't define a method on it called move. Instead we call the point.move method on it and pass in the usual parameters. The call method is a method that's defined on all function objects (yes, since functions are objects they can have properties and methods on them just like any other object; this is going to be important in a moment), and it takes an extra initial parameter in addition to the function's usual parameters. This initial parameter is the object on which the function gets called. This object therefore becomes the this variable inside the function for that call.

So the output is spacetime = (1,2,0,0).

Now that you've reread that a couple of times, all you need to remember is that this refers to the object the function was called on. In particular, if you have a function inside another function (and you're already aware that this is how scope works), then the nested function's this may not be the same as the outer function's this. If you like, the scope for this is the function itself and that's it. this is always a local variable. There's no following the scope chain to try and resolve it.

By the way, if you can't "see" the object at the function's call site, it's going to be the global object, window.
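
The nested-function case is the one that bites most often, so here is a minimal sketch. The counter object and its method names are invented for illustration; the self variable is the usual workaround of capturing the outer this in an ordinary local that the inner function can see through its closure.

```javascript
// Hypothetical object, for illustration only.
var counter = {
    count: 0,

    // The inner function is called with no object at its call site,
    // so its this is not the same as the outer function's this.
    innerSeesCounter: function() {
        var outerThis = this;
        var inner = function() { return this === outerThis; };
        return inner();
    },

    // Workaround: capture this in a local variable and use that instead.
    incrementTwice: function() {
        var self = this;
        var bump = function() { self.count += 1; };
        bump();
        bump();
        return self.count;
    }
};

var seen = counter.innerSeesCounter();
var total = counter.incrementTwice();

console.log(seen, total);
```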

Back to constructors

Looking back at our Point constructor, you can see that it refers to this. What is the value of this here? Well, if the function is called in a normal fashion:

var point = Point(1, 2);

we are going to be clobbering the global object. Why? Firstly, the object that the Point function is called on is not specified, so it is taken to be the global object, window. So, inside this call to Point, this refers to window. We'll add two new fields, x and y, and set them, and then we'll return this (which is window, remember). We then set point to this return value. Although point has the required coordinates and so looks like a point object, it's actually window, and window will have been altered. Nasty.

Instead we want to call Point, not as a normal function, but as a constructor. Enter the new keyword.

var point = new Point(1, 2);

This does what you'd expect: a new empty object is created by JavaScript and then Point is called on it. The this variable now refers to the new empty object and the statements in the function will create the new coordinate fields.

Constructors are ordinary functions

The above argument should have drilled something into you: constructors are ordinary functions, but it's the way they're called with the new keyword that makes them special. This is insane, to put it mildly. If you miss off the new keyword, you're going to be clobbering the global object and you may not even notice straight away. That's why there's a convention in JavaScript to name constructors with an initial capital letter: it's a hint to the reader that this particular function must only be called with new. Unfortunately, there's no keyword to mark a function as a constructor, nor is there some compiler/interpreter option to flag misuses for us.

Even better, there's no need to have the return statement in a constructor as I have in the Point constructor above. If there is no such statement, JavaScript will return this for you. This is yet another difference between constructors and functions: if you don't return anything from an ordinary function, undefined is returned for you.
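
Since nothing in the language stops a caller from forgetting new, some people add a guard to the constructor itself. This is a common defensive idiom rather than anything used elsewhere in this series, so take it as a sketch: if this is not a Point, the function was not called with new, and it simply redoes the call properly.

```javascript
var Point = function(x, y) {
    // Called without new: this is not a Point, so construct one and return it.
    if (!(this instanceof Point)) {
        return new Point(x, y);
    }
    this.x = x;
    this.y = y;
};

var withNew = new Point(1, 2);
var withoutNew = Point(3, 4); // the guard rescues the missing new

console.log(withNew.x, withNew.y, withoutNew.x, withoutNew.y);
```

The capital-letter convention is still worth keeping; the guard only papers over the mistake at run time.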

Inheritance

If we wanted to add the move method to our constructor, we may be tempted to do this:

var Point = function(x, y) {
    this.x = x;
    this.y = y;
    this.move = function(x, y) {
        this.x += x;
        this.y += y;
    };
    return this;
};

This would work, in that we'd get point objects with a move method, but it contains an inefficiency. We are in effect declaring a move method for every single point object we create with the constructor, which is wasteful of memory (the function expression is evaluated afresh on every call to the constructor, so each point object ends up holding its own separate copy of what is otherwise the same function).
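
The duplication is easy to observe. In this sketch the two points are built by the constructor above (minus the redundant return), and their move members turn out to be distinct function objects.

```javascript
var Point = function(x, y) {
    this.x = x;
    this.y = y;
    this.move = function(x, y) {
        this.x += x;
        this.y += y;
    };
};

var first = new Point(1, 3);
var second = new Point(4, 2);

// Each object carries its own move function object.
var sameFunction = first.move === second.move;
var firstOwnsMove = first.hasOwnProperty("move");

console.log(sameFunction, firstOwnsMove);
```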

We'd prefer to have the method shared, that is, declared on the template for the point objects. So far, we don't have a template, per se, just a special function that knows how to construct point objects. Enter the prototype object.

The prototype object is the template we need. It is shared amongst all objects created from it, so if there is a method declared on the prototype object it will be visible to all such objects. But where is this prototype object declared, and how can we access it?

Every function object has a property called prototype. Normally it's an empty object and is usually ignored, but it gains a special significance for constructors. A constructor's prototype is the template from which the new keyword creates the new object before calling the constructor. In my discussion above, I glossed over this and said that the new keyword would create a new empty object. Not quite; instead it creates a new object that looks like the prototype.

Let's investigate.

var Point = function(x, y) {
    this.x = x;
    this.y = y;
    return this;
};
Point.prototype.move = function(x, y) {
    this.x += x;
    this.y += y;
};

What we have here is our original Point constructor definition, and then a statement that creates a property called move on the Point.prototype object. This method does the usual translation thing. We didn't need to declare prototype, since JavaScript creates one automatically for every single function. Now let's call it:

var point = new Point(1, 2);
console.log("Point = (" + point.x + "," + point.y + ")");
point.move(3, 4);
console.log("Point = (" + point.x + "," + point.y + ")");

Here we're creating a new point object and logging its value. We then call the move method on point. Unfortunately, point does not have a move method. On discovering that, JavaScript will then check the prototype for the method.

How does it find the prototype? Luckily for us, the new keyword has set up a hidden internal link in the new object that references the prototype object of the constructor function that was used to create it; in our case, Point.prototype. (The prototype object in turn has a property called constructor that points back to the constructor function, which is why point.constructor gives you Point.)

So, if the method is not found in the object, JavaScript will go take a look at the prototype object to see if it has the method, instead. If so, it calls it. If it hasn't, JavaScript will check to see if the prototype object was constructed and go find its prototype object to see if it has such a method. And so on, up the prototype chain.

Yes, you've guessed it: this is JavaScript's inheritance. Constructed objects inherit their behavior from their prototypes. Even better, given the description of how a method is called on an object, we can see that if I declare a move method on a newly created point object, it will get called in preference to the prototype object's. In fact, it hides the prototype's version. That's usually known as polymorphism and we're overriding a method.

Welcome to prototypal inheritance and polymorphism, all without a class in sight.
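
As a sketch of that overriding, here one point gets its own move while another keeps using the prototype's. The names ordinary and stubborn, and the do-nothing override, are invented for illustration.

```javascript
var Point = function(x, y) {
    this.x = x;
    this.y = y;
};
Point.prototype.move = function(x, y) {
    this.x += x;
    this.y += y;
};

var ordinary = new Point(0, 0);
var stubborn = new Point(0, 0);

// Hypothetical override: this one point refuses to move.
stubborn.move = function(x, y) { /* stay put */ };

ordinary.move(1, 2);
stubborn.move(1, 2);
var afterOverride = "(" + stubborn.x + "," + stubborn.y + ")";

// The hidden prototype version is still reachable explicitly,
// rather like calling the base method in C#.
Point.prototype.move.call(stubborn, 5, 5);
var afterBaseCall = "(" + stubborn.x + "," + stubborn.y + ")";

console.log(ordinary.x, ordinary.y, afterOverride, afterBaseCall);
```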

Oh, by the way, remember the call method above that we used to set an explicit this variable on calling a function? It's a method that's defined on all functions; or, translating into what we now know, it's a method that's defined on the prototype that all functions inherit from. The constructor for a function object is Function, so that prototype is Function.prototype.
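
A short sketch to check that claim, using the point object from earlier in the post:

```javascript
var point = {
    x: 0,
    y: 0,
    move: function(x, y) {
        this.x += x;
        this.y += y;
    }
};

// The move function has no call member of its own...
var ownsCall = point.move.hasOwnProperty("call");
// ...it inherits the one on Function.prototype...
var inheritedCall = point.move.call === Function.prototype.call;
// ...because function objects are constructed by Function.
var madeByFunction = point.move.constructor === Function;

console.log(ownsCall, inheritedCall, madeByFunction);
```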

We'll continue with this topic next time.


About Me

I'm Julian M Bucknall, the M because it's my middle initial and because I and the other Julian Bucknall (the movie guy) would like to differentiate ourselves.

I'm a programmer by trade, an actor by ambition, and an algorithms guy by osmosis. I write articles for PCPlus in my spare time, not that there's much of that.

Apart from that, an ex-pat Brit, atheist, microbrew enthusiast, Pet Shop Boys fanboy, slide rule and HP calculator collector, amateur photographer, Altoids muncher.

DevExpress

I'm Chief Technology Officer at Developer Express, a software company that writes some great controls and tools for .NET and Delphi. I'm responsible for the technology oversight and vision of the company.
