Posts filed under the 'JavaScriptLessons' category

JavaScript for C# developers: date basics

A scenic diversion on the road to understanding JavaScript when you're a C# programmer.

In this episode, we'll look at dates in JavaScript.

Dates are implemented in JavaScript by the Date() constructor function, which acts like a class. You can new up a Date object in much the same way as you new up a DateTime object in C#. For instance, here's how to create a date that's equal to the date I wrote this post (Mon, Apr 27, 2009):

var today = new Date(2009, 3, 27);
console.log(today.toDateString()); // outputs Mon Apr 27 2009

If you look at this code a little more carefully, you'll notice that I used 3 for the month and not 4. Yes, this is gotcha number 1: the date library in JavaScript uses zero-based month numbers. The day numbers are one-based, as you'd imagine: it's just the months that are zero-based. This is, to put it mildly, confusing.
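One way to sidestep the gotcha is to hide the subtraction in a tiny wrapper. This is only a sketch: makeDate is a name I've made up for illustration, not something the Date library provides.

```javascript
// Hypothetical helper (not part of the Date API): accepts a one-based month,
// as a C# developer would expect, and converts it for the Date constructor.
var makeDate = function(year, month, day) {
    return new Date(year, month - 1, day);
};

var posted = makeDate(2009, 4, 27);
console.log(posted.getMonth());     // outputs 3 (April, zero-based)
console.log(posted.getDate());      // outputs 27
console.log(posted.toDateString()); // outputs Mon Apr 27 2009
```

Note that getMonth() still reports the zero-based value, so the conversion only helps on the way in.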

Date objects you create are of type object. Their prototype is Date.prototype, the object hanging off the Date() constructor. This prototype implements several nice methods you can use to manipulate dates, of which we've already seen one (toDateString()):

var today = new Date(2009, 3, 27);
console.log(today.getDay()); // outputs 1 (the day of the week of the date)
console.log(today.getDate()); // outputs 27 
console.log(today.getMonth()); // outputs 3
console.log(today.getYear()); // outputs 109 (the years since 1900)
console.log(today.getFullYear()); // outputs 2009
console.log(today.toDateString()); // outputs "Mon Apr 27 2009"
console.log(today.toLocaleDateString()); // outputs "Monday, April 27, 2009"
console.log(today.toUTCString()); // outputs "Mon, 27 Apr 2009 06:00:00 GMT"

There are a couple of points to note here. First, the getDay() method returns the day of the week as an integer, with Sunday as 0, Monday as 1, and so on. This can be a little confusing for C# developers, because the equivalent in .NET is the DayOfWeek property. The .NET Day property, on the other hand, is the equivalent of JavaScript's getDate() method. Note that the month is returned as a zero-based value again.
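To make the getDay()/getDate() distinction concrete, here's a small sketch that maps the day-of-week number to a name. The dayNames array is my own illustrative lookup table, not a built-in.

```javascript
// Illustrative lookup table, indexed the same way getDay() counts (Sunday is 0).
var dayNames = ["Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday"];

var today = new Date(2009, 3, 27);
var dayOfWeek = dayNames[today.getDay()]; // like DateTime.DayOfWeek in .NET
var dayOfMonth = today.getDate();         // like DateTime.Day in .NET

console.log(dayOfWeek + " the " + dayOfMonth + "th"); // outputs Monday the 27th
```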

The getYear() method I include for completeness only, since different browsers implement it in different ways (despite the ECMAScript standard being very explicit about what it should do). As you can see, Firefox returns the number of years since 1900 (that's what the standard says too), whereas IE7 and IE8 return the full year value (that is, 2009 in this example). So, I'd advise you to avoid it completely and use getFullYear() instead.

The various "toString" methods return the date in various string representations. The results are very similar between Firefox and IE here, although I'd note that toUTCString() in Firefox uses the confusing GMT suffix, whereas IE uses the more correct UTC. (There is a similar method called toGMTString(), fully deprecated in 1999 when the ECMA standard was released. This is set equal to toUTCString(), so they do the same thing, but you might see the older version in old code.) All in all, I'd say don't depend on the output of these "toString" methods. As the standard states, "The contents of the string are implementation-dependent", but note that they would use the browser's locale (that is, the OS's locale) for the various day and month names.

As you may have guessed from the last method there, date objects also contain a time portion, that is they are DateTimes in the .NET vernacular. To create a date object with a time part you would use an overloaded constructor call:

var today = new Date(2009, 3, 27, 15, 24, 23, 300);
console.log(today.getHours()); // outputs 15
console.log(today.getMinutes()); // outputs 24
console.log(today.getSeconds()); // outputs 23
console.log(today.getMilliseconds()); // outputs 300

If you need the date/time value for right now, the equivalent of DateTime.Now, you'd use the Date constructor with no parameters:

var today = new Date();
console.log(today.toLocaleDateString()); // outputs "Monday, April 27, 2009"
console.log(today.toLocaleTimeString()); // outputs "8:14:38 PM"

There is also a set of methods that deal with setting the various parts of a date/time object: the year, the month, and so on (in essence, the same names as the getters but with "set" instead), but there are no methods that deal with date computations, such as adding a number of days to a date and so on. There is an open source library out there called Datejs which uses a "fluent" API (that is, you can write things like: Date.today().add(3).days();), but there's nothing particularly geared to C# developers used to DateTime who are coding in JavaScript. We'll take a look at that next time.
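In the meantime, the setters get you surprisingly far, because setDate() rolls over into the next month when you hand it a day number that's too big. Here's a rough sketch of an AddDays-style helper built on that; addDays is my own hypothetical name, loosely modelled on DateTime.AddDays, and it's not part of any library.

```javascript
// Hypothetical helper: returns a new Date and leaves the original untouched,
// since the set methods mutate the object they are called on.
var addDays = function(date, days) {
    var result = new Date(date.getTime()); // copy the original
    result.setDate(result.getDate() + days); // overflow rolls into the next month
    return result;
};

var today = new Date(2009, 3, 27);
var later = addDays(today, 5);
console.log(later.toDateString()); // outputs Sat May 02 2009
console.log(today.toDateString()); // outputs Mon Apr 27 2009
```

Copying first is a deliberate choice: DateTime is immutable in .NET, so a helper that silently changed its argument would surprise a C# developer.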

Another warning before I close this post: although Date() is a constructor function, you can actually use it as an ordinary function:

var nowAsString = Date();
console.log(nowAsString); // outputs the current date/time as a string

No matter what parameters you pass in the call, it'll ignore them and return a string representing the current date and time. Pretty useless, and it's dead confusing to boot (is it a constructor or isn't it?). My advice is don't use Date() in this way.
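A quick sketch of the difference, using typeof to show what each call hands back:

```javascript
var asString = Date(2009, 3, 27);     // called as a function: arguments ignored
var asObject = new Date(2009, 3, 27); // called as a constructor

console.log(typeof asString);        // outputs string
console.log(typeof asObject);        // outputs object
console.log(asObject.getFullYear()); // outputs 2009
```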

Now playing:
Aim - Fall Break
(from Hinterland)



JavaScript for C# developers: functions are objects

Another stop on the long road to JavaScript understanding from a C# developer's perspective.

This time we really look at what it means to say that functions are not only objects, but first-class objects in JavaScript.

Over this series I've constantly declared functions using function expressions instead of function statements. For example, I've used:

var isFiniteNumber = function(value) {
    return ((typeof value === "number") && isFinite(value));
};

instead of this:

function isFiniteNumber(value) {
    return ((typeof value === "number") && isFinite(value));
}

There is no real difference between the two when you call them (if you like, the second declaration is compiled as the first, although a function declaration is hoisted whole to the top of its scope, so it can be called before the line that declares it), but the reason for doing so is that I wanted to emphasize that a function is merely another kind of object. The second form, the function declaration, looks like a C# method, and in reading it you might fall into the trap of treating it like a method: somehow fixed, immutable, and locked to a class. Functions in JavaScript are not like that.
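The two forms behave the same once execution reaches them, but there is one wrinkle that's easy to miss, sketched here with made-up names (declared and expressed): a function declaration is available from the top of its scope, whereas with a function expression only the variable exists early, holding undefined until the assignment runs.

```javascript
var declaredBefore = typeof declared;   // the whole declaration is hoisted
var expressedBefore = typeof expressed; // only the variable is hoisted

function declared(value) { return value; }
var expressed = function(value) { return value; };

console.log(declaredBefore);   // outputs function
console.log(expressedBefore);  // outputs undefined
console.log(typeof expressed); // outputs function
```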

Let's investigate "functions as objects" using a recursive function. Here's a factorial function written recursively, rather than, say, iteratively.

var factorial = function(value) {
    if (value <= 1) {
        return 1;
    }
    return value * factorial(value - 1);
};
console.log(factorial(5));

Looks simple enough (and do note that I've removed a bit of parameter error checking for simplicity): if the value is 0 or 1, return 1, otherwise return the value times the factorial of the value minus 1.

But note one thing in particular: the function refers to itself. Well, duh, of course it does, that's what recursion means. But, remember, functions are objects, so I can do this:

var factorial = function(value) {
    if (value <= 1) {
        return 1;
    }
    return value * factorial(value - 1);
};
var anotherFactorial = factorial;
console.log(anotherFactorial(5));

Think about what this code is now doing. Executing anotherFactorial is going to call factorial; however it is not recursive in and of itself. It'll still work though, so let's really gum up the works:

var factorial = function(value) {
    if (value <= 1) {
        return 1;
    }
    return value * factorial(value - 1);
};
var anotherFactorial = factorial;
factorial = undefined;
console.log(anotherFactorial(5));

Crash: factorial is not a function. Note that anotherFactorial still is a function: it's the original code for factorial. It's just that factorial no longer exists.
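If you'd rather see the failure without taking down the rest of your script, you can wrap the call in a try/catch. This sketch just repeats the code above and records what was thrown:

```javascript
var factorial = function(value) {
    if (value <= 1) {
        return 1;
    }
    return value * factorial(value - 1);
};
var anotherFactorial = factorial;
factorial = undefined;

var caught = null;
try {
    anotherFactorial(5); // the body looks up factorial, which is now undefined
} catch (e) {
    caught = e;
}
console.log(caught instanceof TypeError); // outputs true
console.log(anotherFactorial(1));         // outputs 1 (no recursive call needed)
```

The second call succeeds only because a value of 1 never reaches the recursive line, which makes this kind of bug even easier to overlook.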

This, to put it mildly, is annoying: since any named function object can be thought of as transient, we can't really reference the function name in the body of the function. If you like, the function code is an anonymous object that we can assign to whatever variable we like. How can we write a recursive anonymous function? It needs to refer to itself after all, and since it's anonymous, that's a little difficult. Let's see how we can.

The best idea is to pass in the recursive anonymous function as a parameter, for then we can name it. The idea we're going to explore is to create a function that can make a factorial function.

var makeFactorial = function(recurse) {
    return function(value) {
        if (value <= 1) {
            return 1;
        }
        return value * recurse(value - 1);
    };
};

OK, take this slowly. We're defining a function called makeFactorial. It takes a recursive function called recurse as a parameter and it returns another function. This other function takes a single parameter called value and returns the factorial of value, providing recurse works properly. You'd call it like this:

var factorial = makeFactorial(someFunction);
var result = factorial(5);

Or, in one line:

var result = makeFactorial(someFunction)(5);

In other words, execute makeFactorial on someFunction to return another function, which is immediately executed with parameter 5. But what is someFunction? Well, it's certainly the factorial function in some sense and the only way we know how to get that with this code is to use makeFactorial.

Somehow.

The problem is that makeFactorial is a function accepting another function, not a function accepting a single numeric value. So the only way we could write this would be:

var makeFactorial = function(recurse) {
    return function(value) {
        if (value <= 1) {
            return 1;
        }
        var factorial = recurse(recurse);
        return value * factorial(value - 1);
    };
};
var factorial = makeFactorial(makeFactorial);
var result = factorial(5); // result === 120

Wow, this is getting tautological, and definitely recursive. What does makeFactorial do now? It still takes a function and returns another, that much is clear. The returned function, when executed, will call the original passed in function, passing in itself. That returns a function which we then execute and bingo it produces the factorial result via a recursive call.

It's instructive to pause here in our discussion and work out what happens when this code is executed. First of all the outer factorial variable is set to the result of calling makeFactorial on itself. So, if you like, the outer factorial will be set to:

var outerFactorial = function(value) {
    if (value <= 1) {
        return 1;
    }
    var factorial = makeFactorial(makeFactorial);
    return value * factorial(value - 1);
};

We then call it passing in 5. The first thing that happens is that we check the value to be less than or equal to 1 (it isn't) and then we set the inner factorial variable to:

var factorial2 = function(value) {
    if (value <= 1) {
        return 1;
    }
    var factorial = makeFactorial(makeFactorial);
    return value * factorial(value - 1);
};

This is the second factorial function we've created, so for clarity I've labeled it with a 2. Still in the outer factorial function, we now call that inner factorial function, factorial2, passing in 4. The first thing factorial2 does is to check the value passed in to be 1 or less, and then to call makeFactorial on itself (the third time):

var factorial3 = function(value) {
    if (value <= 1) {
        return 1;
    }
    var factorial = makeFactorial(makeFactorial);
    return value * factorial(value - 1);
};

and then it calls this third factorial function.

This recursion continues until we get to the fifth level. This time the factorial function is not created, since we can return 1. We then unwind the stack, passing return values back and doing all the multiplications until we reach the outer function again, to give the result 120. (Notice in all of this, how much we owe to function closures.)
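If you want to check that walkthrough for yourself, one option is to count how many times makeFactorial gets called. The makerCalls counter below is instrumentation I've added purely for this sketch; otherwise the code is the same as before.

```javascript
var makerCalls = 0; // added for this sketch only

var makeFactorial = function(recurse) {
    makerCalls++;
    return function(value) {
        if (value <= 1) {
            return 1;
        }
        var factorial = recurse(recurse);
        return value * factorial(value - 1);
    };
};

var factorial = makeFactorial(makeFactorial);
var result = factorial(5);
console.log(result);     // outputs 120
console.log(makerCalls); // outputs 5
```

Five factorial functions get made: the outer one plus one for each of the values 5, 4, 3, and 2. The call with value 1 returns straight away without making another.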

Seems pretty good, apart from one fact: we make a reference to makeFactorial as a parameter the first time it's called. So we're still self-referencing. All right, let's remove it.

var factorial = function(maker) {
    return function(value) {
        if (value <= 1) {
            return 1;
        }
        var factorial = maker(maker);
        return value * factorial(value - 1);
    };
}(function(maker) {
    return function(value) {
        if (value <= 1) {
            return 1;
        }
        var factorial = maker(maker);
        return value * factorial(value - 1);
    };
});
var result = factorial(5); // result === 120

OK, before you freak out, all I've done is to replace makeFactorial(makeFactorial) with the actual code for makeFactorial, and if you look closely, you can see the call parentheses surrounding the second function declaration. Although it's a ruddy mess, it still works. (I also took the opportunity to rename the passed-in recursive function maker, since it's a function that makes another.)

But, note there's no longer a makeFactorial anywhere. We've created an anonymous function that's recursive; albeit at the expense of some pretty horrible duplicated code. Let's clean it up.

First is to clean up the bit that does the factorial calculation (it'll be easier to understand what's going on if the body of the returned factorial function is one line):

var factorial = function(maker) {
    return function(value) {
        return (value <= 1) ? 1 : value * maker(maker)(value - 1);
    };
}(function(maker) {
    return function(value) {
        return (value <= 1) ? 1 : value * maker(maker)(value - 1);
    };
});
var result = factorial(5); // result === 120

Now we should extract this duplicate code somehow, without naming things again.

First a lemma, as we say in mathematics. Using an anonymous function, we can write

f(value);

(that is, calling f with a parameter value) like this:

(function(x) {
  return f(x);
})(value);

We're declaring an anonymous function that takes a single parameter. The function returns the value of f applied to that parameter. So, overall, it's equivalent, although not something I recommend you do in normal JavaScript programming.
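Here's the lemma as a runnable sketch. The function f is just a stand-in I've picked for illustration, and findF is a hypothetical helper that counts lookups, to show the one thing the wrapper buys us: nothing inside it is evaluated until it's actually called.

```javascript
var f = function(x) { return x * 2; }; // stand-in for any one-parameter function
var value = 21;

var direct = f(value);
var wrapped = (function(x) {
    return f(x);
})(value);
console.log(direct, wrapped); // outputs 42 42

// The wrapper defers work: findF is not called when deferred is defined.
var lookups = 0;
var findF = function() { lookups++; return f; };
var deferred = function(x) { return findF()(x); };
console.log(lookups);      // outputs 0
console.log(deferred(21)); // outputs 42
console.log(lookups);      // outputs 1
```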

Back to the problem. We'd like to extract this function somehow:

var F = function(value) {
    return (value <= 1) ? 1 : value * maker(maker)(value - 1);
};

But we're left with this awful maker(maker) thing. We can use our lemma to convert it:

var F = function(value) {
    return (value <= 1) ? 1 : value * (function(x){return maker(maker)(x);})(value - 1);
};

Applying the lemma yet again, this time making the internal anonymous function ((function(x){return maker(maker)(x);})) a parameter, we get:

var F = function(mm) {
    return function(value) {
        return (value <= 1) ? 1 : value * mm(value - 1);
    };
}(function(x){return maker(maker)(x);});

That is, F is set to the result of calling an anonymous function taking a single parameter, and the result of that function looks pretty much like a factorial function.

Nearly there! My next step is going to remove the parameter of this expression and create a function I'm going to call genFactorial, because it's going to generate the actual recursive factorial function in a moment:

var genFactorial = function(mm) {
    return function(value) {
        return (value <= 1) ? 1 : value * mm(value - 1);
    };
};

I can now do some substituting into my "cleaned-up" function declaration above:

var factorial = function(maker) {
    return genFactorial(function(x){return maker(maker)(x);});
}(function(maker) {
    return genFactorial(function(x){return maker(maker)(x);});
});

var result = factorial(5); // result === 120

Nice, it still works. The next step is to generalize this a little bit. I'm going to declare a new function that can take the genFactorial function as a parameter and produce the factorial function as we just wrote it:

var Y = function(generator) {
    return function(maker) {
        return generator(function(x){return maker(maker)(x);});
    }(function(maker) {
        return generator(function(x){return maker(maker)(x);});
    });
};

And then we can easily create the factorial function as follows:

var factorial = Y(genFactorial);
var result = factorial(5); // result === 120

That's it. We can delete genFactorial with impunity after creating factorial if we so wish, and factorial will continue to work because the current value of genFactorial will have been saved due to the closure inside the function Y. We have created a recursive function for calculating factorials without self-referencing the function name.
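Here's the whole thing in one piece as a sketch, including the "delete genFactorial" step, so you can confirm that factorial survives it:

```javascript
var Y = function(generator) {
    return function(maker) {
        return generator(function(x){return maker(maker)(x);});
    }(function(maker) {
        return generator(function(x){return maker(maker)(x);});
    });
};

var genFactorial = function(mm) {
    return function(value) {
        return (value <= 1) ? 1 : value * mm(value - 1);
    };
};

var factorial = Y(genFactorial);
genFactorial = undefined; // Y's closure still holds the original function

var result = factorial(5);
console.log(result); // outputs 120
```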

Along the way we created this rather bizarre function called Y that can convert any specially-written generator function to produce a recursive function. This Y function that we derived from first principles is better known as the Y combinator, and is well-known in functional programming circles (as well as being a Venture Capital investor and owner of Hacker News). I call it bizarre because just by looking at the finished result, it's hard to understand how it actually works; you have to go through the derivation as we did to get a better comprehension. The Y combinator was originally derived by American mathematician Haskell Curry (after whom we have the functional language called Haskell and the process known as currying) using lambda calculus.

As a further example of the usefulness of the Y combinator, here's the generator function for a Fibonacci number (we assume fib(0) === 1, and fib(1) === 1, and fib(n) = fib(n-2) + fib(n-1)):

var genFibonacci = function(mm) {
    return function(value) {
        if (value === 0) return 1;
        if (value === 1) return 1;
        return mm(value - 2) + mm(value - 1);    
    };
};

var fib = Y(genFibonacci);
var result = fib(7); // result === 21

Notice that none of this exploration into lambda calculus would have worked if JavaScript did not support functions as first-class objects (that is, functions can be passed as parameters to other functions, and functions can be returned from functions).
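One limitation of Y as written is that the recursive function can only take a single parameter. As a sketch of how you might lift that restriction, here's a variant that forwards whatever arguments it receives using apply. The names Yn and genGcd are my own inventions for this example; this is one possible extension, not part of the derivation above.

```javascript
// Variant of Y (an assumption of this sketch) that forwards all arguments.
var Yn = function(generator) {
    return (function(maker) {
        return generator(function() {
            return maker(maker).apply(this, arguments);
        });
    })(function(maker) {
        return generator(function() {
            return maker(maker).apply(this, arguments);
        });
    });
};

// Generator for Euclid's greatest common divisor, which needs two parameters.
var genGcd = function(recurse) {
    return function(a, b) {
        return (b === 0) ? a : recurse(b, a % b);
    };
};

var gcd = Yn(genGcd);
console.log(gcd(48, 18)); // outputs 6
```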

Next time in our journey, we'll look at something a little simpler.

Now playing:
Terry, Blair and Anouchka - Ultra Modern Nursery Rhymes
(from Ultra Modern Nursery Rhymes)



JavaScript for C# programmers: a message from our sponsor

Mr Closure was kind enough to record this video for all my readers.

Now playing:
Aim - Birchwood
(from Flight 602)



JavaScript for C# programmers: convert string to integer

A hotdog stand on the road to JavaScript enlightenment for C# developers. Firebug is our companion.

In this episode we take a quick look at converting a string to a number and the parseInt function.

Every now and then we'd like to convert a number expressed as a string into a variable of type number so that we can do some calculations with it. There are several ways to go about this, and we saw one a while back when we were looking at the expression evaluator.

var stringValue = "42";
var value = +stringValue;
console.log(typeof value); // outputs "number"

Here we set stringValue to a string, and then convert it to a number by the simple expedient of using the unary plus operator. Since that's a little hard to spot sometimes, we can also use other numeric expressions that force the conversion:

var stringValue = "42";
var value2 = 0 + stringValue;
var value3 = 1 * stringValue;
console.log(typeof value2); // outputs "string"
console.log(typeof value3); // outputs "number"

Except… the first one here produces a string variable with value "042", not a number. Why? It certainly looks like it should produce a numeric expression. Unfortunately, when one of the terms in an additive expression is a string, JavaScript makes the result of the addition a string and not the other way around. So the 0 gets converted to '0' and is then concatenated with the other string. This is one of the gotchas of working with dynamically typed JavaScript: the plus operator can either mean numeric addition (as we're used to) or string concatenation (as we're also used to in C#), but JavaScript assumes the latter if there's a string term in the expression. Beware.

The second, multiplicative, expression works just fine as we'd expect, since there is no other meaning for the '*' operator but numeric multiplication. So, the string is coerced into a number and the multiplicative identity operation takes place.
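A few more cases, as a sketch, to show how the plus operator differs from the other arithmetic operators when a string is involved:

```javascript
var stringValue = "42";

var concatenated = 0 + stringValue;      // "042": plus concatenates
var subtracted = stringValue - 0;        // 42: minus is numeric only
var converted = 0 + Number(stringValue); // 42: convert first, then add
var leftToRight = 1 + 2 + stringValue;   // "342": 1 + 2 is added first

console.log(concatenated, typeof concatenated); // outputs 042 string
console.log(subtracted, typeof subtracted);     // outputs 42 number
console.log(converted, typeof converted);       // outputs 42 number
console.log(leftToRight, typeof leftToRight);   // outputs 342 string
```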

However, it all seems a little hit-and-miss. So JavaScript provides two utility functions: parseFloat and parseInt. Both take a string and return a number, with the second forcing the number returned to be an integer value (but, remember, not an integer proper since JavaScript numbers are always floating point values). To us C# developers, the way they act is a little disconcerting: they stop parsing when they reach an invalid character, and return the number found up until that point. So:

var stringValue = "3.14 is the value of PI";
var value = parseFloat(stringValue);
console.log(value); // outputs 3.14

parseFloat parsed the string up until the space character after 3.14 and returned that value back. parseInt works in the same way, but the arithmetic expressions I was showing above do not:

var stringValue = "3.14 is the value of PI";
var value = +stringValue;
console.log(value); // outputs NaN

Here, as you can see, the type coercion failed, and so value was set to NaN.
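If what you want is closer to the all-or-nothing behavior of double.TryParse, you can build it on the unary plus. This is a sketch under my own assumptions: toNumberOrNull is a hypothetical helper, and returning null for failure is just one possible convention.

```javascript
// Hypothetical helper: converts the whole string or gives up and returns null.
var toNumberOrNull = function(text) {
    if (/^\s*$/.test(text)) {
        return null; // empty or blank strings would otherwise coerce to 0
    }
    var value = +text;
    return isNaN(value) ? null : value;
};

console.log(toNumberOrNull("3.14"));                    // outputs 3.14
console.log(toNumberOrNull("3.14 is the value of PI")); // outputs null
console.log(toNumberOrNull(""));                        // outputs null
console.log(parseFloat("3.14 is the value of PI"));     // outputs 3.14
console.log(parseInt("3.99 or so", 10));                // outputs 3
```

The blank-string check is there because the unary plus quietly turns an empty string into 0, which is rarely what you want.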

parseInt has another quirk as well. In the browsers current as I write this, if the number in the string starts with a zero character, the value is assumed to be expressed in octal (engines that implement ECMAScript 5 or later no longer do this and treat the string as decimal). So:

var stringValue = "078";
var value = parseInt(stringValue);
console.log(value); // outputs 7 (78 in ECMAScript 5 and later engines)

Since the character '8' is not a valid octal digit, the parsing stops and the value 7 is returned. Similarly, if the string begins with '0x', the number is assumed to be represented in hexadecimal.

But, wait. There's even more. parseInt accepts a second parameter, the radix. If this is missing (that is, undefined), the string is assumed to be base 10, apart from the two exceptions noted. If you want to force it to be parsed in base 10, you should explicitly state 10 as the radix.

var stringValue = "078";
var value = parseInt(stringValue, 10);
console.log(value); // outputs 78

In fact, rather than assume anything about the string being parsed, it's best to specify the radix parameter at all times. Make your assumptions explicit.
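Here's a sketch of the radix parameter at work with a few different bases:

```javascript
var decimal = parseInt("078", 10);     // leading zero is just a digit
var octal = parseInt("17", 8);         // 1 * 8 + 7
var hex = parseInt("ff", 16);          // 15 * 16 + 15
var hexPrefixed = parseInt("0x1f", 16); // the 0x prefix is allowed with radix 16
var binary = parseInt("1010", 2);
var partial = parseInt("12", 2);       // stops at the invalid binary digit 2

console.log(decimal);     // outputs 78
console.log(octal);       // outputs 15
console.log(hex);         // outputs 255
console.log(hexPrefixed); // outputs 31
console.log(binary);      // outputs 10
console.log(partial);     // outputs 1
```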

Now playing:
New Order - The Beach
(from Power, Corruption & Lies)



JavaScript for C# programmers: getting caught out with closures

Another stop on the road to becoming a JavaScript developer when you know C#. Fire up Firebug in Firefox and follow along.

In this episode we look at some problems we might encounter when using closures.

Recall that, just like anonymous methods in C#, a closure is a binding between a function and the 'environment' in which it's declared. I've been using them a lot in this series, but here's a simple 'counter' example:

var makeCounter = function(start) {
    return {
        next: function() { start++; },
        value: function() { return start; }
    };
};

var counter = makeCounter(42);
counter.next();
console.log(counter.value()); // outputs 43

Nothing too difficult, we've seen many examples just like this before. The makeCounter function takes a single parameter, start, and then returns an object with two methods, next and value. next advances the internal value of the counter and value merely returns its current value. The closure happens because the two functions are referencing the start parameter (that is, a local variable) of the outer function, even after the outer function has terminated. They have both captured start: this is the closure.

You can see this working in the test code: we create a new counter with start value 42, increment it, and then display the current value.
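It's worth seeing, as a sketch, that every call to makeCounter gets its own start variable, so two counters made by two separate calls don't interfere with each other:

```javascript
var makeCounter = function(start) {
    return {
        next: function() { start++; },
        value: function() { return start; }
    };
};

var first = makeCounter(42);
var second = makeCounter(100);
first.next();
first.next();
second.next();

console.log(first.value());  // outputs 44
console.log(second.value()); // outputs 101
```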

Let's change it so that we return an array of counters:

var makeCounters = function(start, count) {
    var counters = [];
    for (var i = 0; i < count; i++) {
        counters[i] = {
            next: function() { start++; },
            value: function() { return start; }
        };
    }
    return counters;
};

var counters = makeCounters(42, 2);
counters[0].next();
console.log(counters[0].value()); // outputs 43

Not too much has changed, apart from rearranging the code to create the array of counters. The counter objects still have the same form as before (the two methods); all we're doing is defining a new array and then creating as many counter objects in that array as were requested.

Underneath that function, you can see from the test code that it works as before.

Or does it? Add the following test code, after the code above, to test counters[1]:

counters[1].next();
console.log(counters[1].value()); // outputs 44 ???

Something is wrong: the two counter objects in the array are supposed to be independent, and yet they don't seem to be. They seem to be sharing the same captured value.

That is exactly the problem: the closures are not capturing the "current" value of start, they are capturing the actual variable. If one of them makes a change to that captured variable, then the other closures will see the changed value. (In fact, if you look at the original makeCounter, you'll see that we're implicitly assuming this is how it works: both next and value are acting on the same captured variable.)

Now, with the start variable, it's pretty obvious. Let's make it slightly harder to spot the problem by adding an id method to the returned counter objects:

var makeCounters = function(start, count) {
    var counters = [];
    for (var i = 0; i < count; i++) {
        counters[i] = {
            id: function() { return i;},
            next: function() { start++; },
            value: function() { return start; }
        };
    }
    return counters;
};

The id of a counter object is just its position in the array. At least that's what we want it to be. Can you determine by inspection what the following lines will produce?

var counters = makeCounters(42, 2);
console.log(counters[0].id()); 
console.log(counters[1].id()); 

From the discussion we've just had, the answer is obviously not 0, 1. You're doing well if you recognize that they'll both output the same value, and very well if you work out that the value is 2. (Hint: the loop stops when i reaches 2.)

So what to do? We have to isolate the two local variables so that we can capture them separately for each counter object we create. The easiest way to do that is to use another anonymous function and pass the two values in as parameters.

var makeCounters = function(start, count) {
    var counters = [];
    for (var i = 0; i < count; i++) {
        counters[i] = function (start, id) {
            return {
                id: function() { return id;},
                next: function() { start++; },
                value: function() { return start; }
            };
        }(start, i);
    }
    return counters;
};

This is starting to get a little complicated, but bear with me. We're setting the element in the array, not to a function, but to the result of a function we're immediately going to execute. Here's the code with the noisy bits taken out; it'll be easier to see:

counters[i] = function (start, id) {
    // some code
}(start, i);

In other words, we have an anonymous function that takes two parameters called start and id. We immediately call it passing the current value of start for the start parameter, and the current value of i for the id parameter.

Inside this anonymous function, we merely return a new object. The id method returns the value of id passed in, and the other two methods do their stuff on the passed in value of start. Notice that the scoping rules for these methods say that they get their values from the immediate outer function, not from the outer outer function. Their closure is over the nested inner function. They don't "need" any local variables from the outer function and so don't form a closure over it.
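If the immediately executed function is too much to take in at a glance, an equivalent approach is to give it a name and call it from the loop. This is a sketch of that alternative (my own restructuring, not the code above), and each call to makeCounter creates a fresh start and id for its closures to capture:

```javascript
var makeCounter = function(start, id) {
    return {
        id: function() { return id; },
        next: function() { start++; },
        value: function() { return start; }
    };
};

var makeCounters = function(start, count) {
    var counters = [];
    for (var i = 0; i < count; i++) {
        counters[i] = makeCounter(start, i); // current values are copied in
    }
    return counters;
};

var counters = makeCounters(42, 2);
counters[0].next();
counters[1].next();
counters[1].next();

console.log(counters[0].id(), counters[0].value()); // outputs 0 43
console.log(counters[1].id(), counters[1].value()); // outputs 1 44
```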

The lesson to take away from this is that closures are over local variables, not the current value of those variables. Sometimes it's hard to see that in the thicket of braces and function keywords.

Having assimilated all that, you'll be in a great position (even knowing nothing about jQuery) to say well, duh! to this post.

Now playing:
Sade - Why Can't We Live Together
(from Diamond Life)



JavaScript for C# programmers: magic semicolons

Another fenced-off area on the road to being a JavaScript master dev when you know C#.

With this episode we look at JavaScript's charming ability to add semicolons when you forget them.

When you write C# code and forget a semicolon at the end of a statement, Visual Studio gives you a gentle hint with a red squiggly where it thinks a semicolon should be and the compiler throws a build-stopping (heart-stopping?) error if you miss the delicate hint. Basically, you have got used to adding those semicolons. Well, don't stop doing it.

You see, JavaScript has a pretty terrifying ability to add them for you, sometimes where you want them and, equally, sometimes where you don't. Let's explain.

There are certain statements in JavaScript that must be terminated with a semicolon, just as they are in C#. These statements include variable statements (essentially statements that declare and initialize variables), expression statements (for example, calls to functions as procedures), do-while statements, and the continue, break, return, and throw statements. These must be terminated, but JavaScript will automatically insert semicolons in the stream of tokens if you forget and when it believes it should.

The interpreter tokenizes the code in order to execute it. If it hits a token that's not allowed by the grammar, it'll insert a semicolon before the "bad" token and try again (there are a couple of conditions for this to happen — there must be a line break before the "bad" token or the "bad" token is the closing brace). If it reaches the end of the code and believes the code is incomplete, it'll add a semicolon at the end and try again. Those are what you might call the benign cases.

The non-benign cases involve things called "restricted productions". JavaScript won't allow a line break between the continue, break, return, or throw keyword and whatever follows it in the statement. If it does find one, it'll insert a semicolon. (There's a similar rule for the post-increment and post-decrement operators, but no one I know is crazy enough to write something like

counter
++;

so we'll ignore those — just don't do it, OK?)

Let's look at an example:

if (!wotsit.isValid) 
    throw 
    {
        message: "wotsit is not valid"
    };

Looks innocuous enough: if the wotsit object isn't valid, we throw an exception object that has the property and value shown.

Except that's not what it means.

The problem is the keyword throw cannot be separated from what it's going to throw by a line break. The rule is that the object being thrown must be defined (or you must start to define it) on the same line as the keyword. Since this code doesn't, JavaScript will add a semicolon like this:

if (!wotsit.isValid) 
    throw ;
    {
        message: "wotsit is not valid"
    };

Yikes is, I think, the least exclamation you can make. The whole intent of the code has been changed. Luckily, as it happens, the automatic addition of the semicolon also causes a syntax error, so maybe all is not lost (although the syntax error could be puzzling, if you didn't know what was happening).
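You can watch the syntax error happen without breaking your own script by handing the offending text to the Function constructor, which compiles a string as a function body. This is just a sketch to demonstrate the error; the source string is a cut-down version of the example.

```javascript
// The line break after throw is the important part of this string.
var source = "throw\n{ message: 'wotsit is not valid' };";

var error = null;
try {
    new Function(source); // compiles the text; the code itself never runs
} catch (e) {
    error = e;
}
console.log(error instanceof SyntaxError); // outputs true
```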

However, do the same with return inside a function:

var doSomething = function(wotsit) {
    if (!wotsit.isValid) 
        return
        {
            message: "wotsit is not valid"
        };
    // more code
};        

And you'll get no error, even though the automatic insertion of a semicolon actually produces this:

var doSomething = function(wotsit) {
    if (!wotsit.isValid) 
        return; // <-- automatic semicolon
        {
            message: "wotsit is not valid"
        };
    // more code
};        

Good luck with finding that error. Especially since JavaScript doesn't show you a nice little comment like I have.

You should have written it this way, essentially:

var doSomething = function(wotsit) {
    if (!wotsit.isValid) 
        return {
            message: "wotsit is not valid"
        };
    // more code
};        

In other words, providing you have been following along properly, the C# style choice of having the opening brace on the previous line and not on a new line may, one day, save your coding life and what remains of your hair in JavaScript.

So, my advice is continue to put in semicolons where you know they're expected. Do not rely on JavaScript's "convenience" feature of inserting them automatically and hoping the interpreter catches any syntax errors. (Besides which, if you don't, you slow down the interpreter since it has to stop and back up one token.)

And please put opening braces on the previous line.

(For those who are trying to work out what the interpreter understands by the code once it inserts the semicolon after return, here goes. The return returns undefined. The opening brace denotes the beginning of a block, not the start of an object literal. The identifier message is taken to be a label. The string is tokenized fine, but the interpreter determines that another automatic semicolon should be inserted after it, since the next token is a closing brace. That's an expression statement, and has the effect of the string being created and thrown away. The closing brace denotes the end of the block. The final semicolon, explicit this time, denotes an empty statement, which is allowed in JavaScript like in C#, and essentially does nothing. Voilà. It all disappears in a puff of smoke. No syntax error.)
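
To convince yourself, here is a small runnable sketch of both layouts. The function names brokenCheck and fixedCheck, and the "valid" return value, are made up for the demonstration.

```javascript
// Opening brace on its own line: a semicolon is inserted after return,
// and the braces that follow are parsed as a block containing a label.
var brokenCheck = function(wotsit) {
    if (!wotsit.isValid)
        return
        {
            message: "wotsit is not valid"
        };
    return "valid";
};

// Opening brace on the same line as return: an object literal is returned.
var fixedCheck = function(wotsit) {
    if (!wotsit.isValid)
        return {
            message: "wotsit is not valid"
        };
    return "valid";
};

var brokenResult = brokenCheck({ isValid: false });
var fixedResult = fixedCheck({ isValid: false });

console.log(brokenResult);        // outputs undefined
console.log(fixedResult.message); // outputs wotsit is not valid
```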

 


JavaScript for C# programmers: the with statement considered harmful

A quickie pitstop on the road to learning JavaScript when you know C#.

In this episode, the with statement. See it, and then forget all about it.

If you've ever used Pascal or VB, you'll be familiar with with. It introduces a block where unqualified identifiers can come from the object referenced in the with clause. So, for example, you could have

var point = { x: 10, y: 20 };
console.log('(' + point.x + ',' + point.y + ')');

Using the with statement, you could write this instead:

with (point) {
    console.log('(' + x + ',' + y + ')');
}

Which some could view as being easier to read, especially if the object name from the with clause were long.

However all is not rosy. The way with works is to insert the object being referenced at the front of the scope chain (getting worried yet? what were the rules about scope again?) whilst in the block. So any time you have to resolve some variable, JavaScript would start off with the referenced object and then, if not found, continue with the rest of the scope chain.

Let's look at some other code — but this time I'm not going to show you the object definition:

with (someOldThing) {
    color = defaultColor;
}

Your natural instinct is to read that code like this:

someOldThing.color = someOldThing.defaultColor;

That's what it says, doesn't it? Right? Wrong, I'm afraid. Sure if someOldThing has properties named color and defaultColor, that would be an acceptable way to read that with statement. But what if it didn't?

Let's suppose it didn't have a property called color. JavaScript, in executing that with statement, would look at someOldThing. Does it have a color property? No. So JavaScript would continue down the chain, eventually reaching the global object, window. If that didn't have such a property, JavaScript would create one there. Ouch. Double ouch. A bazillion ouches. By our mistake, the with statement just created a global variable.

The same argument can be made with defaultColor. In fact, just looking at the example with statement you have absolutely no idea which of these following four statements JavaScript will execute:

someOldThing.color = someOldThing.defaultColor;
someOldThing.color = defaultColor;
color = someOldThing.defaultColor;
color = defaultColor;

Where the unqualified variables, in the worst case, are assumed to be on window.
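
Here is a sketch of that worst case. Two things are my own assumptions: the shape of someOldThing (it has defaultColor but no color), and the use of the Function constructor, which I chose only because its body always runs as non-strict code, and strict mode rejects with outright as a syntax error. I also use globalThis rather than window so the sketch runs outside a browser.

```javascript
// Hypothetical object: it has a defaultColor property but no color property.
var someOldThing = { defaultColor: "red" };

// The body is compiled as non-strict code, where with is still allowed.
var applyDefault = new Function("target",
    "with (target) { color = defaultColor; }");

applyDefault(someOldThing);

var onObject = someOldThing.color; // the object never received a color
var leaked = globalThis.color;     // the assignment landed on the global object

console.log(onObject); // outputs undefined
console.log(leaked);   // outputs red

delete globalThis.color; // tidy up the accidental global
```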

So, all in all, given the fact that it can be so easily misread by programmers (especially programmers with varying levels of expertise), I recommend that you just forget about with completely.

Poof, it's gone.


JavaScript for C# programmers: refactoring the expression evaluator

Another in the series in learning JavaScript from the viewpoint of a C# programmer, using Firebug as our test engine.

In this episode, we take the functioning expression evaluator from the last post and clean it up JavaScript style.

Wrap in an object

The first step is to wrap this global-properties-all-over-the-place code tidily in a single global object. This happens to be quite easy, and I used an auto-executing function to create a single object called expressionEvaluator that has a single method called exec. At the top was this:

var expressionEvaluator = function() {

And then at the bottom of the original code, I replaced the old evaluateExpression method with this code:

    return {
        exec: function(expression) {
            var state = makeParserState(expression);
            var result = parseExpression(state);
            if (result.failed) { return NaN; }
            return result.expression.evaluate();
        }
    };
}();

As you can see the function is automatically executed (the execution () operator on the last line), and it returns an object (using the object literal syntax) that has one property, a method called exec. Everything else in this object is private, hidden from view from the closure created by the auto-executing function; it is a black box, and we are free to modify it completely so long as we retain the exec method.
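
Stripped of the parser, the pattern looks something like this. The counter object is a made-up example, not part of the evaluator.

```javascript
// The function runs once, immediately, and counter receives what it returns.
var counter = function() {
    var count = 0; // private: only reachable through the closure

    return {
        next: function() {
            count += 1;
            return count;
        }
    };
}();

var firstValue = counter.next();
var secondValue = counter.next();

console.log(firstValue, secondValue); // outputs 1 2
console.log(typeof counter);          // outputs object
console.log(counter.count);           // outputs undefined
```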

I can't stress this enough: these patterns (closures and auto-executing functions to create other objects/functions) are very common in JavaScript. You might call these patterns idiomatic JavaScript. I'll agree that they can be hard to spot (heck, it's not until 200 lines later that you realize that expressionEvaluator is not a function after all but an object returned by a function that's executed immediately), but it's worth learning the pattern and how the natives speak.

So with this change we can check off that particular bug: we have one global object and no longer a dozen or so. The code to call it hasn't changed that much, and it's easy to see that everything still works properly:

console.log(expressionEvaluator.exec("1+2")); // outputs 3
console.log(expressionEvaluator.exec("1+2*3")); // outputs 7
console.log(expressionEvaluator.exec("((1+2)*(3-4))+((5*6)-(7/8))")); // outputs 26.125

State machine

Now we have a single object (since the auto-executing function is only run once, there can only ever be a single object, so it's essentially a singleton), we can get rid of the state object (there was only ever one of those too), and place all its data as fields of our evaluator object. This means we can get rid of the getCurrent method and generally clean things up.

    // State maintenance
    var expr, // the original expression string
        ch,   // the current character
        at,   // where we're at in the expression string
        advance = function() {
            ch = expr.charAt(++at);
        },
        skipWhite = function() {
            while (ch && (ch <= ' ')) { advance(); }
        },
        initialize = function(expression) {
            expr = expression;
            at = -1;
            advance();
        };

I've written this as a connected set of variables, connected in the sense of a series of variable declarations separated with commas; it's another JavaScript idiom that you may come across. The reason is that JavaScript is often minified before use, that is, all comments and unnecessary white space are removed to make the script smaller and quicker to download and parse. Since a comma is one character, it's often used like this instead of having a set of var declarations, which is more verbose. Notice I've added a skipWhite method to skip over any white space.

The first statement in exec now becomes this:

initialize(expression);

...but we're in a load of hurt because everything expects a state object. Nothing works. Let's forge on.

The tokens

Next up is the whole rpnToken thing, with its encapsulation-breaking isToken and isOperator. It's just nasty, people. It's a poor man's way of creating something like the is operator in C#. I hang my head in shame, but at least I could say I did it on purpose so I could show how to refactor it. Yeah, that's the reason. Anyway...

The thing to do here is to push down into the rpnToken object hierarchy the functionality that's being exposed at a higher level. For example, the isOperator function is only used in one place: when an RPN expression is being evaluated:

if (token.isOperator()) {
    var rhs = stack.pop();
    var lhs = stack.pop();
    stack.push(token.evaluate(lhs, rhs));
}
else {
    stack.push(token.value);
}

Better would be to get rid of this altogether, by declaring an evaluate method on the number token, and then making both evaluate functions accept a stack parameter. This way the operator token would pop, pop, calculate, push, and the number token would just push. But, at this higher level, all we'd do is call token.evaluate(stack); and be done with it. Nice one. Even better, we can now create a unaryOperator token as well that will pop, negate, push for unary minus, and nothing much at all for unary plus.

Having done that, let's look at the whole RPN expression thing. I wrote it originally to be an exact representation of how you'd write down an RPN expression: a string of simple tokens, some of them numbers, some of them operators. But we don't have to be that literal. I could rephrase the definition to be recursive: an RPN expression is one or more operands, followed by an operator. The operator will determine how many operands there are. An operand could either be a number or another RPN expression.

Whoa. An RPN expression is then merely a token in another RPN expression. To evaluate an RPN expression, we evaluate its operands, and then its operator. This is great: we now have four types of tokens: number, unary operator, binary operator, RPN expression. They will all have an evaluate method.

    // Token hierarchy
    // ...base object
    var rpnToken = {
        evaluate: function(stack) {
            throw {
                type: "Abstract",
                message: "evaluate is an abstract method"
            };
        }
    };
    // ...number object
    var makeNumber = function(value) {
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) { stack.push(value); };
        return token;
    };
    // ...unary operator object
    var makeUnaryOp = function(compute) { // compute, not eval: strict mode forbids eval as a parameter name
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            stack.push(compute(stack.pop()));
        };
        return token;
    };
    // ...binary operator object
    var makeBinaryOp = function(compute) {
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            var rhs = stack.pop();
            var lhs = stack.pop();
            stack.push(compute(lhs, rhs));
        };
        return token;
    };

    // ...the pre-built operators
    var operator = {
        "unary-": makeUnaryOp(function(value) { return -value; }),
        "unary+": makeUnaryOp(function(value) { return +value; }),
        "+": makeBinaryOp(function(lhs, rhs) { return lhs + rhs; }),
        "-": makeBinaryOp(function(lhs, rhs) { return lhs - rhs; }),
        "*": makeBinaryOp(function(lhs, rhs) { return lhs * rhs; }),
        "/": makeBinaryOp(function(lhs, rhs) { return lhs / rhs; })
    };

I've left out the one for the RPN expression for a moment, but look at how all of these functions create objects that, first, descend from rpnToken and, second, depend implicitly on closures. Taking the number token as an example, the evaluate method uses the value parameter. The only way it gets that is through the closure since the execution of makeNumber will have long been completed by the time evaluate will run. Notice I've also explicitly made rpnToken.evaluate an abstract method by throwing an exception if it's ever called.

The RPN expression creation function is a little special:

    // ...RPN expression object
    var makeExpression = function() {
        var expr = arguments;

        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            for (var i = 0; i < expr.length; i++) {
                expr[i].evaluate(stack);
            }
        };
        return token;
    };

It looks like it's been declared so that it accepts no parameters. Not quite. It's going to be called with either two parameters (operand, operator) for a unary operator, or three parameters (operand1, operand2, operator) for a binary operator. To get around this in C#, we'd have to either write overloaded methods or use parameter arrays, but in JavaScript we're merely going to use the arguments array and save a reference to it in a local variable called expr. We need the local variable because the nested evaluate function gets an arguments object of its own. Later on, in the evaluate method, we're going to evaluate each argument in sequence as I discussed above, using the saved arguments object provided by the closure.
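
To see the recursive definition at work without any parsing, here is a sketch that builds the RPN expression for 1+2*3 by hand. The token makers are trimmed copies of the ones in this post (with the operator callback named compute), repeated only so the example runs on its own; the variables add, multiply, product, and sum are made up for the demonstration.

```javascript
// Trimmed copies of the token makers, so the sketch is self-contained.
var rpnToken = {
    evaluate: function(stack) {
        throw { type: "Abstract", message: "evaluate is an abstract method" };
    }
};
var makeNumber = function(value) {
    var token = Object.create(rpnToken);
    token.evaluate = function(stack) { stack.push(value); };
    return token;
};
var makeBinaryOp = function(compute) {
    var token = Object.create(rpnToken);
    token.evaluate = function(stack) {
        var rhs = stack.pop();
        var lhs = stack.pop();
        stack.push(compute(lhs, rhs));
    };
    return token;
};
var makeExpression = function() {
    var expr = arguments; // saved for the closure below
    var token = Object.create(rpnToken);
    token.evaluate = function(stack) {
        for (var i = 0; i < expr.length; i++) {
            expr[i].evaluate(stack);
        }
    };
    return token;
};

var add = makeBinaryOp(function(lhs, rhs) { return lhs + rhs; });
var multiply = makeBinaryOp(function(lhs, rhs) { return lhs * rhs; });

// 1 + 2 * 3 in RPN is: 1 (2 3 *) +
var product = makeExpression(makeNumber(2), makeNumber(3), multiply);
var sum = makeExpression(makeNumber(1), product, add);

var stack = [];
sum.evaluate(stack);
var result = stack.pop();
console.log(result); // outputs 7
```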

Parsing functions

We're now ready for some parsing action.

    // parse a binary operator (either the adds or the multiplys)
    var parseOperator = function(ops) {
        skipWhite();
        if ((ch === ops[0]) || (ch === ops[1])) {
            var op = operator[ch];
            advance();
            return op;
        }
        return null;
    };

This method parses a binary operator. The operators come in pairs (plus/minus and multiply/divide) so I wrote a generic method that gets passed a two-element array containing the operator characters in question. Notice that now the parsing functions are returning an RPN expression and not one of the success/fail result objects (they're gone, toast). If there's a failure we just pass back null.

Next up is parsing a parenthesized expression:

    // parse a parenthesized expression
    var parseParens = function() {
        advance();

        var result = parseExpression();
        if (!result) { return null; }

        skipWhite();
        if (ch !== ')') { return null; } // missing right paren
        advance();

        return result;
    };

It is assumed here that the function will be called with the state machine already positioned on the opening parenthesis (which is how it's done in the only place it's called), so we can just move past it. This function also has an identifiable error: the missing right parenthesis. Later on, we can hook up an error function here to report that back to the caller of the expression evaluator, but for now we'll just note it as a comment.

Parsing a number:

    // parse a number
    var parseNumber = function() {
        var value = '';
        while (('0' <= ch) && (ch <= '9')) {
            value += ch;
            advance();
        }

        if (ch === '.') {
            value += '.';
            advance();
            while (('0' <= ch) && (ch <= '9')) {
                value += ch;
                advance();
            }
        }

        if (!value) { return null; } // number missing
        var number = +value; // force conversion to number type
        if (isNaN(number)) { return null; } // invalid number
        return makeNumber(number);
    };

Again we have a couple of identifiable errors (a number is expected but is missing, a number couldn't be converted from the string); again points at which we can add an error function.

Parsing a factor:

    // parse a factor (either expression in parentheses or number)
    var parseFactor = function() {
        skipWhite();
        if (ch === '(') { return parseParens(); }
        return parseNumber();
    };

As you can see, when parseParens is called, the state machine is pointing at the open parenthesis.

Now parsing a unary expression (yes, I added them):

    // parse a unary expression (factor or +/- factor)
    var parseUnaryExpr = function() {
        skipWhite();
        if ((ch === '-') || (ch === '+')) {
            var op = operator["unary" + ch];
            advance();
            var operand = parseFactor();
            if (!operand) { return null; }
            return makeExpression(operand, op);
        }
        return parseFactor();
    };

Notice how I index into the operator object to differentiate the unary minus and plus from their binary brethren. Also I could "throw away" the unary plus here, if I wanted: it has no effect when evaluating an expression. Also note that this is where I call makeExpression with only two parameters for the unary operators.

And finally the remaining parser functions.

    // parse a binary expression (operand operator operand)
    var parseBinaryExpr = function(parseOperand, operators) {
        var operand = parseOperand();
        if (!operand) { return null; }
        var rpn = operand;

        var operator = parseOperator(operators);
        while (operator) {
            operand = parseOperand();
            if (!operand) { return null; }
            rpn = makeExpression(rpn, operand, operator);
            operator = parseOperator(operators);
        }

        return rpn;
    };
    // parse a term (unaryexpression multop unaryexpression)
    var parseTerm = function() {
        return parseBinaryExpr(parseUnaryExpr, ['*', '/']);
    };
    // parse an expression (term addop term)
    parseExpression = function() {
        return parseBinaryExpr(parseTerm, ['+', '-']);
    };

parseBinaryExpr does most of the work: it's called with a function that parses an operand, and with an array containing the binary operators to look out for. makeExpression is called with three parameters here.

After all these changes, we can now look at the refactored exec function:

        exec: function(expression) {
            initialize(expression);

            var result = parseExpression();
            if ((!result) || (ch !== '')) { return NaN; } // badly terminated expression

            var stack = [];
            result.evaluate(stack);
            return stack.pop();
        }

Here we see the final recognizable error where we've parsed the expression but there's still more left (an example would be "(1+2)3"). We also see where the stack gets created, the RPN expression evaluated, leaving the result on the stack, which can then be popped and returned.

Summary

Now that we've seen a non-trivial conversion of a C# project to JavaScript, what conclusions can we derive?

The first thing is closures are important. Really important. You have to understand closures and how they work because you'll see them all over the place. In this small example, we have the large closure that creates the expressionEvaluator object (where only one property is public and everything else, the vast majority of the code in fact, is private and hidden); we have the smaller closures that create the token objects.

Second, classical class hierarchies are not as important as in a classical class language like C#. I had to work hard to even get one: the tokens hierarchy. In fact, we could remove the base token object altogether and the evaluator would work just as well. The reason for this is, of course, that JavaScript only tries to resolve a property at run-time; there's no need to worry about it at coding time. If every token object has an evaluate method, it doesn't matter whether they're descended from a base object with that method or whether they all independently define one: JavaScript will just call it. In fact, although this example didn't show it particularly, creating an object with some behavior (such as the expressionEvaluator object) is virtually a zero-energy exercise, compared with C#, where we have to write a full-blown class first.

Third, functions are objects. Create them when you need them, pass them around like candy. In this example, the operator tokens are created with function objects that know how to evaluate the operator.

Fourth, learn the idioms. This code used only a few. The first, and biggest, is the auto-executing function that creates the whole evaluator object in the first place (those two parentheses at the end are easy to miss when you're scanning code). Second, learn what values evaluate to false in a conditional expression. In this code, I was using the fact that null evaluates to false (an example: if (!operand) instead of writing the more long-winded if (operand === null), or the code in skipWhite which checks for an empty character).

Fifth, avoid global object pollution. My original code had umpteen methods created on the global object; the code in this post has just one. As in any language, global variables and methods are bad.
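
On the fourth point, the values that count as false in a condition are few enough to list. The sketch below checks them, along with some values that look empty but count as true; isFalsy and the two arrays are names invented for the example.

```javascript
// Hypothetical helper: true when the value would fail an if () test.
var isFalsy = function(value) {
    return !value;
};

// The values that evaluate to false in a condition.
var falsyValues = [false, 0, "", null, undefined, NaN];

// Values that are easy to mistake for false but evaluate to true.
var truthyLookalikes = ["0", " ", [], {}];

var allFalsy = falsyValues.every(isFalsy);
var anyFalsy = truthyLookalikes.some(isFalsy);

console.log(allFalsy); // outputs true
console.log(anyFalsy); // outputs false
```

This is why if (!operand) works for a null result, and also why it would misfire if a parser could legitimately return 0 or an empty string.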


JavaScript for C# programmers: the arguments array

A slight diversion in our quest to learn JavaScript from a C# programmer's perspective.

This episode is about the arguments array.

In C#, if we want to write a method that takes an undefined (at least at compile-time) number of parameters, we use the params keyword and declare a parameter array.

    static void ListArgs(params string[] args) {
      Console.Write(string.Format("There are {0} parameters: ", args.Length));
      for (int i = 0; i < args.Length; i++) {
        Console.Write(args[i] + " ");
      }
      Console.WriteLine();
    }

    static void Main(string[] args) {
      string[] words = {"one", "two", "three"};

      ListArgs(words);
      ListArgs("hello", "world");
      ListArgs();

      Console.ReadLine();
    }

Here we have a ListArgs method whose only parameter is a parameter array of strings, called unimaginatively args. The method merely outputs the number of parameters, and the values of each of them. The Main method calls this method in three different ways, first with an array of strings, then with two string parameters, then with nothing at all. In the first case, the compiler ensures the method is called with the given array. In the second case, the compiler inserts code to create an array containing the two parameters, and then calls ListArgs. In the third case the compiler creates an empty array and passes it in.

In all cases, the output is as we expect.

In JavaScript, no matter how we declare the parameters to a function, when we call it we always get a variable called arguments automatically created for us that contains all the parameters passed in.

One reason for this is that JavaScript is very loose in how it treats parameters. If you declare a function with two parameters, say, you can call it with exactly two parameters as you'd expect, but you can also call it with fewer than two parameters, or with more than two parameters. With fewer, any parameters that were not provided by the caller will be set to undefined inside the function; with more, the extra parameters are just ignored.

Nevertheless, when extra parameters are passed, we can discover them by using the arguments array. JavaScript will construct this pseudo-array to contain all the parameters passed to the function.
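
A quick sketch of all three kinds of call, using a made-up function called describe that declares two parameters:

```javascript
// Declares two parameters, but reports how many were actually passed.
var describe = function(first, second) {
    return "first=" + first + ", second=" + second +
        ", passed=" + arguments.length;
};

var exact = describe("a", "b");
var fewer = describe("a");
var more = describe("a", "b", "c");

console.log(exact); // outputs first=a, second=b, passed=2
console.log(fewer); // outputs first=a, second=undefined, passed=1
console.log(more);  // outputs first=a, second=b, passed=3
```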

Here's the JavaScript equivalent of the ListArgs method:

var ListArgs = function() {
    var logstring = "There are " + arguments.length + " parameters: ";
    for (var i = 0; i < arguments.length; i++) {
        logstring += arguments[i] + " ";
    }
    console.log(logstring);
};

ListArgs("hello", "world");
ListArgs();

The call with a string array doesn't really translate: we'd have to check internally to the function to see if a particular parameter were an array and deal with it accordingly. Anyway, you can see that ListArgs is defined not to accept any formal parameters at all. Nevertheless, we can get at them by using the arguments pseudo-array.
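
If you did want the array call to behave as it does in C#, there are two ways I can think of; this is a sketch, not the one true answer. Either the function checks for a lone array argument itself (the listArgsFlexible name and that rule are my own assumptions), or the caller uses apply to spread an array across the parameters.

```javascript
var listArgsFlexible = function() {
    // Assumption: a single array argument means "use its elements as the
    // parameters", mirroring the C# call that passes a string[].
    var args = (arguments.length === 1 && Array.isArray(arguments[0]))
        ? arguments[0]
        : arguments;

    var logstring = "There are " + args.length + " parameters:";
    for (var i = 0; i < args.length; i++) {
        logstring += " " + args[i];
    }
    return logstring;
};

var words = ["one", "two", "three"];

var fromArray = listArgsFlexible(words);
var fromList = listArgsFlexible("hello", "world");
var fromApply = listArgsFlexible.apply(null, words); // caller spreads the array

console.log(fromArray); // outputs There are 3 parameters: one two three
console.log(fromList);  // outputs There are 2 parameters: hello world
console.log(fromApply); // outputs There are 3 parameters: one two three
```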

I want to stress that arguments, although it looks like an array, isn't. It's an object, certainly, and it has a length property and its property names are the numbers counting up sequentially from 0, but that's only as deep as the deception goes. A lot of times, it doesn't really matter that arguments is not an array (those properties I mentioned are usually enough to be getting on with), but sometimes you'll find yourself caught out and need some array methods. In that case you'll have to copy the elements to a real array, or you'll have to use the call or apply method to use an array method on the arguments object.
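
Both escape routes can be sketched in a few lines. The function names are invented for the example; the technique of borrowing methods from Array.prototype is the common one.

```javascript
// Copy the arguments into a real array by borrowing Array's slice method.
var toArray = function() {
    return Array.prototype.slice.call(arguments);
};

// Or borrow a single array method directly, without copying.
var joinArgs = function() {
    return Array.prototype.join.call(arguments, "-");
};

// Returns its own arguments object so we can inspect it.
var rawArguments = function() {
    return arguments;
};

var copied = toArray("x", "y", "z");
var joined = joinArgs("x", "y", "z");
var raw = rawArguments("x", "y", "z");

console.log(Array.isArray(raw));    // outputs false
console.log(Array.isArray(copied)); // outputs true
console.log(typeof raw.join);       // outputs undefined
console.log(joined);                // outputs x-y-z
```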

Because arguments (and this for that matter) are created when a function is called, they are locked in scope to that call of that function. They do not participate in the normal function scoping. Which is obvious, if you think about it: if you have a function B nested in function A, then any reference to arguments in B will use B's arguments object and not A's. If you want to share A's arguments with B, then you'll need to copy them to a local variable in A.
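
Here's a minimal sketch of that nesting, with made-up function names. The inner function sees its own arguments directly, and the outer function's only through the saved local variable.

```javascript
var outer = function() {
    var outerArgs = arguments; // saved so the nested function can reach them

    var inner = function() {
        return {
            own: arguments.length,   // inner's own arguments object
            shared: outerArgs.length // outer's, reached through the closure
        };
    };

    return inner("only one");
};

var counts = outer("a", "b", "c");

console.log(counts.own);    // outputs 1
console.log(counts.shared); // outputs 3
```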


JavaScript for C# programmers: object inheritance (part 2)

Continuing to learn JavaScript from the viewpoint of a die-hard C# programmer, using Firebug as our test engine.

In this episode, we continue writing the expression evaluator. (Please review part 1 before continuing.) You might want to have an extra browser open at the C# code, so you can follow along.

Next on the agenda are the result objects. In reality, the way I wrote the original code, there is only one failed result, whereas the successful result contains the RPN expression (or token, come to that) of the bit of the algebraic expression we got to. So let's code that up:

var failedResult = {
    failed: true,
    expression: null
};

var makeSuccessfulResult = function(expression) {
    var result = Object.create(failedResult);
    result.failed = false;
    result.expression = expression;
    return result;
};

Having coded it up, I'm not that happy with it. It seems as if I'm trying way too hard to force a class inheritance model approach to what are, after all, very simple objects. It would work equally well without the call to Object.create in there. I'll ignore my doubts for the moment and forge on.
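
To make the doubt concrete, here is a sketch comparing the inherited version with a plain object literal. makePlainResult is a hypothetical alternative that does not appear in the original code, and the "1 2 +" string merely stands in for a real expression.

```javascript
var failedResult = {
    failed: true,
    expression: null
};

// The version from the post: inherits from failedResult, then overrides both fields.
var makeSuccessfulResult = function(expression) {
    var result = Object.create(failedResult);
    result.failed = false;
    result.expression = expression;
    return result;
};

// Hypothetical alternative: no inheritance at all.
var makePlainResult = function(expression) {
    return { failed: false, expression: expression };
};

var inherited = makeSuccessfulResult("1 2 +");
var plain = makePlainResult("1 2 +");

console.log(Object.getPrototypeOf(inherited) === failedResult); // outputs true
console.log(Object.getPrototypeOf(plain) === Object.prototype); // outputs true
console.log(inherited.failed, plain.failed);                    // outputs false false
```

Callers that only read failed and expression cannot tell the two apart, which is why the inheritance feels like wasted effort here.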

Next is the parserState object. There's only one of these, so no inheritance semantics to worry about.

var makeParserState = function(expression) {
    var expr = expression;
    var position = -1;
    var current;
    var advancePosition = function() {
        position++;
        if (position >= expression.length) {
            current = '';
        }
        else {
            current = expression.charAt(position);
        }
        return current;
    };

    advancePosition();

    return {
        getCurrent: function() { return current; },
        advance: function() { return advancePosition(); }
    };
};

I've coded this as a standard closure type function to give the state some privacy. The object that's returned has two public methods, getCurrent and advance, but a whole set of private members from the closure. There's the original expression string that we're going to read through, the current position, the current character, and a method to advance the string pointer and read the next character. (Remember that JavaScript has no character type; a character is represented as a one-character string.)

After I wrote it like this I discovered that charAt will return the empty string if the index is out of bounds; I had my C# hat on, obviously: I was assuming that it would throw an exception. So the private advancePosition method could be rewritten as

    var advancePosition = function() {
        position++;
        current = expression.charAt(position);
        return current;
    };
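
The out-of-bounds behavior is easy to check for yourself. This sketch uses a throwaway string and a simple loop of my own, not the parser state object, to walk to the end of the string.

```javascript
var text = "1+2";

var inBounds = text.charAt(1);   // a normal, in-range index
var pastEnd = text.charAt(3);    // one past the last character
var beforeStart = text.charAt(-1);
var bracketPastEnd = text[3];    // bracket indexing behaves differently

console.log(inBounds === "+");             // outputs true
console.log(pastEnd === "");               // outputs true
console.log(beforeStart === "");           // outputs true
console.log(bracketPastEnd === undefined); // outputs true

// Because the empty string is falsy, it doubles as an end-of-input marker.
var position = 0;
var seen = [];
var current = text.charAt(position);
while (current) {
    seen.push(current);
    position++;
    current = text.charAt(position);
}
console.log(seen.join(" ")); // outputs 1 + 2
```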

Now we get to the fun stuff, the actual parsing. In the original C# code, I wrote it all as a static class with static methods, but for now I wrote it in JavaScript as a set of methods. This is certainly not the best way, but I will get to that later.

First, parsing the operators:

var parseAdd = function(state) {
    var current = state.getCurrent();
    if ((current === '+') || (current === '-')) {
        state.advance();
        return makeSuccessfulResult(operator[current]);
    }
    return failedResult;
};

var parseMultiply = function(state) {
    var current = state.getCurrent();
    if ((current === '*') || (current === '/')) {
        state.advance();
        return makeSuccessfulResult(operator[current]);
    }
    return failedResult;
};

Notice something subtle going on. The call to makeSuccessfulResult is being passed a token and not an expression, as the implementation of that method would indicate. Keep that thought at the back of your mind for now.

Parsing a parenthesized expression:

var parseParenthesizedExpression = function(state) {
    if (state.getCurrent() !== '(') { return failedResult; }
    state.advance();

    var result = parseExpression(state);
    if (result.failed) { return result; }

    if (state.getCurrent() !== ')') { return failedResult; }
    state.advance();

    return result;
};

This makes a call to parseExpression that we haven't written yet, to parse the bits in between the parentheses.

Parsing a number:

var parseNumber = function(state) {
    var current = state.getCurrent();
    var value = '';
    
    while (('0' <= current) && (current <= '9')) {
        value += current;
        current = state.advance();
    }
    
    if (current === '.') {
        value += '.';
        current = state.advance();
        while (('0' <= current) && (current <= '9')) {
            value += current;
            current = state.advance();
        }
    }

    var number = +value; // force conversion to number type
    if (isNaN(number) || !value) { return failedResult; }
    return makeSuccessfulResult(makeNumber(number));
};

This involves a couple of tricky bits, so follow along as I describe them. The value variable is going to be a string that we'll grow with the number-like characters we find in the expression. (Number-like in this respect means the digits and the decimal point — sorry, no internationalization yet). So we first gather all the digits we can. If there's then a decimal point, we add that, and then gather all the digits after the decimal point. Fun bit next: we force the string to be converted into a number type. We do this by using JavaScript's interpreter: we start the expression off with a plus sign. JavaScript will decide that the expression is going to be a number since this plus sign could only be a unary plus operator. We then give it the value string. Since the interpreter is in "making a number" mode, it will convert the string to a number which is what we want. (The alternative is to use parseFloat, or to use something like "1 * value".)

The big problem with this comes when we start the method off with the current character not being a digit (say, a letter). In that case value is still the empty string, and converting an empty string to a number produces 0 without error. So we fail the parse if either the conversion failed (the value of number will then be NaN) or if the string is empty (remember an empty string is equivalent to false, so !value would evaluate to true).
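Here's a small sketch of why both halves of that check are needed. The toNumberOrFail helper is hypothetical: it just mirrors the last three lines of parseNumber, with null standing in for failedResult.

```javascript
console.log(+'');  // outputs 0, not NaN
console.log(!'');  // outputs true: the empty string is falsy
console.log(+'.'); // outputs NaN: a lone decimal point is not a number

// Hypothetical helper mirroring the final check in parseNumber.
var toNumberOrFail = function(value) {
    var number = +value;
    if (isNaN(number) || !value) { return null; } // null stands in for failedResult
    return number;
};

console.log(toNumberOrFail(''));    // outputs null (caught by !value)
console.log(toNumberOrFail('.'));   // outputs null (caught by isNaN)
console.log(toNumberOrFail('0'));   // outputs 0
console.log(toNumberOrFail('2.5')); // outputs 2.5
```

Note that the emptiness test is made on the string and not on the number: the string '0' is non-empty and hence truthy, so a perfectly valid zero still parses.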

Parsing a factor is easy: it's either a parenthesized expression or a number.

var parseFactor = function(state) {
    if (state.getCurrent() === '(') {
        return parseParenthesizedExpression(state);
    }
    else {
        return parseNumber(state);
    }
};

Parsing a term and parsing the expression are roughly the same (and I haven't yet extracted out the commonality, like I did with the C# code).

var parseTerm = function(state) {
    var operand = parseFactor(state);
    if (operand.failed) { return operand; }
    var rpn = operand.expression;

    var operator = parseMultiply(state);
    while (!operator.failed) {
        operand = parseFactor(state);
        if (operand.failed) { return operand; }
        rpn = joinRpnParts(rpn, operand.expression, operator.expression); 
        operator = parseMultiply(state);
    }

    return makeSuccessfulResult(rpn);
};

parseExpression = function(state) {
    var operand = parseTerm(state);
    if (operand.failed) { return operand; }
    var rpn = operand.expression;

    var operator = parseAdd(state);
    while (!operator.failed) {
        operand = parseTerm(state);
        if (operand.failed) { return operand; }
        rpn = joinRpnParts(rpn, operand.expression, operator.expression);
        operator = parseAdd(state);
    }

    return makeSuccessfulResult(rpn);
};

Both make use of a routine called joinRpnParts to stitch together the operands and operator postfix style.

var joinRpnParts = function(first, second, operator) {
    var result = makeExpression();

    var addTokens = function(operand) {
        if (operand.isToken) {
            result.add(operand);
        }
        else {
            operand.forEach(function(token) { result.add(token); });
        }
    };

    addTokens(first);
    addTokens(second);
    addTokens(operator);

    return result;
};
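To see the stitching in action without the rest of the parser, here's a self-contained sketch. The makeToken function and the cut-down makeExpression (with a toString for display) are simplified stand-ins I've made up for this example, not the real objects from this series; joinRpnParts itself is as listed above.

```javascript
// Simplified stand-in for a token (an assumption for this example).
var makeToken = function(text) {
    return { isToken: true, text: text };
};

// Cut-down stand-in for the RPN expression object (an assumption).
var makeExpression = function() {
    var expr = [];
    return {
        isToken: false,
        add: function(node) { expr.push(node); },
        forEach: function(action) {
            for (var i = 0; i < expr.length; i++) { action(expr[i]); }
        },
        toString: function() {
            var parts = [];
            for (var i = 0; i < expr.length; i++) { parts.push(expr[i].text); }
            return parts.join(' ');
        }
    };
};

// joinRpnParts, unchanged from the listing above.
var joinRpnParts = function(first, second, operator) {
    var result = makeExpression();

    var addTokens = function(operand) {
        if (operand.isToken) {
            result.add(operand);
        }
        else {
            operand.forEach(function(token) { result.add(token); });
        }
    };

    addTokens(first);
    addTokens(second);
    addTokens(operator);

    return result;
};

// (1+2)*3: first join two tokens, then join that expression with a token.
var sum = joinRpnParts(makeToken('1'), makeToken('2'), makeToken('+'));
console.log(sum.toString());     // outputs 1 2 +

var product = joinRpnParts(sum, makeToken('3'), makeToken('*'));
console.log(product.toString()); // outputs 1 2 + 3 *
```

The second call shows the non-token branch at work: the existing expression is flattened token by token into the new one, so the result is always a single flat postfix sequence.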

makeExpression is a slightly changed version of rpnExpression from last time. I added an isToken field to both the token ancestor and the RPN expression objects so that I could tell them apart. This is where I tried to resolve the token versus expression problem I alluded to before. I thought I was being clever in using the lack of type safety to help me (and feeling all JavaScripty about it), but then I had to hack this "well, is it a token or not" boolean into two different places (which to me says I'm not being JavaScripty enough). We'll sort it out later.

addTokens is just a helper function to add a part onto the RPN expression. Before you get excited by the forEach call there, it's a function I wrote for the RPN expression object:

var makeExpression = function() {
    var expr = [];
    return {
        isToken: false,

        add: function(node) {
            expr.push(node);
        },

        clear: function() {
            expr = [];
        },

        evaluate: function() {
            var stack = [];
            for (var i = 0; i < expr.length; i++) {
                var token = expr[i];
                if (token.isOperator()) {
                    var rhs = stack.pop();
                    var lhs = stack.pop();
                    stack.push(token.evaluate(lhs, rhs));
                }
                else {
                    stack.push(token.value);
                }
            }
            return stack.pop();
        },

        forEach: function(action) {
            for (var i = 0; i < expr.length; i++) {
                action(expr[i]);
            }
        }
    };
};

As you can see, the forEach method takes an action function and calls it for each token in the expression array. The action that joinRpnParts passes in just adds the token to the expression being stitched together.
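As a sketch of another use for forEach, the loop inside evaluate could be driven by it as well. This is only one possible shape, not the version used in this post. The makeNumber and makeOperator functions below are simplified stand-ins I've assumed for the example; they provide just the isOperator(), value and evaluate(lhs, rhs) members that evaluate relies on.

```javascript
// Simplified stand-ins for the token objects (assumptions for this example).
var makeNumber = function(value) {
    return {
        isToken: true,
        value: value,
        isOperator: function() { return false; }
    };
};

var makeOperator = function(evaluate) {
    return {
        isToken: true,
        isOperator: function() { return true; },
        evaluate: evaluate
    };
};

var makeExpression = function() {
    var expr = [];
    var self = {
        isToken: false,

        add: function(node) {
            expr.push(node);
        },

        forEach: function(action) {
            for (var i = 0; i < expr.length; i++) {
                action(expr[i]);
            }
        },

        // Same stack algorithm as before, but iterating via forEach.
        evaluate: function() {
            var stack = [];
            self.forEach(function(token) {
                if (token.isOperator()) {
                    var rhs = stack.pop();
                    var lhs = stack.pop();
                    stack.push(token.evaluate(lhs, rhs));
                }
                else {
                    stack.push(token.value);
                }
            });
            return stack.pop();
        }
    };
    return self;
};

// Build the RPN for 1+2*3 by hand: 1 2 3 * +
var rpn = makeExpression();
rpn.add(makeNumber(1));
rpn.add(makeNumber(2));
rpn.add(makeNumber(3));
rpn.add(makeOperator(function(lhs, rhs) { return lhs * rhs; }));
rpn.add(makeOperator(function(lhs, rhs) { return lhs + rhs; }));
console.log(rpn.evaluate()); // outputs 7
```

I've captured the object in a local variable called self rather than relying on this, so that evaluate still works if the method is ever passed around detached from its object.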

Finally we need a function to tie it all together:

var evaluateExpression = function(expression) {
    var state = makeParserState(expression);
    var result = parseExpression(state);
    if (result.failed) { return NaN; }
    return result.expression.evaluate();
};

console.log(evaluateExpression("1+2")); // outputs 3
console.log(evaluateExpression("1+2*3")); // outputs 7
console.log(evaluateExpression("((1+2)*(3-4))+((5*6)-(7/8))")); // outputs 26.125

Now that I've shown you all the code, I can reveal that I'm just not happy about it. Yes, it's a pretty close translation of the C# code to JavaScript (and I did copy/paste code from the C# implementation to help move things along — just a case of removing all the type identifiers, mostly), but it's just not good JavaScript to me.

Let me enumerate the problems as I see them:

  • Global properties and methods everywhere. This should make you worried; it certainly does me.
  • It's too wordy. JavaScript is shipped as source and interpreted at run-time, so shorter identifiers mean less to download and parse. Shorter identifiers also make the code less legible to us humans, but even so it should be possible to reduce it all a bit.
  • There's way too much code duplication. I seem to have some parse methods in pairs; it should be possible to extract out a common method for each pair. The evaluate method for the RPN expression should probably use the new forEach method, and so on.
  • I have a code smell between RPN tokens and expressions: I'm having to work out which is which in a higher method; surely that should be pushed down into the objects themselves.
  • The parser state object seems way overkill (it's a remnant from a back-tracking parser implementation I once did).
  • Ditto the result objects. Plus the failed result object doesn't tell us where the problem occurred.
  • Whitespace removal? Surely since it's easier for us to read "3 + 4" than "3+4", the evaluator should discard white space when it needs to.
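To give a flavor of the code duplication point, here's a rough sketch of how one parse pair (parseTerm and parseExpression) might collapse into a single helper. This is an assumption about where the refactoring could go, not the code from the next episode. The name makeBinaryParser is hypothetical, and everything around it (makeState, parseDigit, parseOneOf, and using plain arrays for the RPN) is a deliberately tiny stand-in so that the example runs on its own.

```javascript
var failedResult = { failed: true };
var succeed = function(expression) {
    return { failed: false, expression: expression };
};

// Tiny stand-in for the parser state (an assumption for this example).
var makeState = function(text) {
    var pos = 0;
    return {
        getCurrent: function() { return text.charAt(pos); },
        advance: function() { pos++; return text.charAt(pos); }
    };
};

// Stand-in operand parser: a single digit, returned as a one-item RPN array.
var parseDigit = function(state) {
    var c = state.getCurrent();
    if (c === '' || c < '0' || c > '9') { return failedResult; }
    state.advance();
    return succeed([c]);
};

// Stand-in operator parser factory: accepts any one character in operators.
var parseOneOf = function(operators) {
    return function(state) {
        var c = state.getCurrent();
        if (c === '' || operators.indexOf(c) < 0) { return failedResult; }
        state.advance();
        return succeed([c]);
    };
};

// With arrays as the RPN, joining postfix style is just concatenation.
var joinRpn = function(first, second, operator) {
    return first.concat(second, operator);
};

// The common shape: operand, then zero or more (operator operand) pairs.
var makeBinaryParser = function(parseOperand, parseOperator) {
    return function(state) {
        var operand = parseOperand(state);
        if (operand.failed) { return operand; }
        var rpn = operand.expression;

        var operator = parseOperator(state);
        while (!operator.failed) {
            operand = parseOperand(state);
            if (operand.failed) { return operand; }
            rpn = joinRpn(rpn, operand.expression, operator.expression);
            operator = parseOperator(state);
        }

        return succeed(rpn);
    };
};

var parseTerm = makeBinaryParser(parseDigit, parseOneOf('*/'));
var parseExpression = makeBinaryParser(parseTerm, parseOneOf('+-'));

var result = parseExpression(makeState('1+2*3'));
console.log(result.expression.join(' ')); // outputs 1 2 3 * +
```

The two parsers now differ only in the arguments they're built from, which is exactly the commonality that the copy/pasted versions hide.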

So, next time, some major refactoring. We're going to be JavaScripty if it kills us.

Now playing:
Thievery Corporation - Le Monde
(from The Mirror Conspiracy)



About Me

I'm Julian M Bucknall, the M because it's my middle initial and because I and the other Julian Bucknall (the movie guy) would like to differentiate ourselves.

I'm a programmer by trade, an actor by ambition, and an algorithms guy by osmosis. I write articles for PCPlus in my spare time, not that there's much of that.

Apart from that, an ex-pat Brit, atheist, microbrew enthusiast, Pet Shop Boys fanboy, slide rule and HP calculator collector, amateur photographer, Altoids muncher.

DevExpress

I'm Chief Technology Officer at Developer Express, a software company that writes some great controls and tools for .NET and Delphi. I'm responsible for the technology oversight and vision of the company.
