Posts filed under the 'PCPlus' category

PCPlus 283: Learning the ropes

For July 2009, I managed to snuff out something about ropes, a kind of heavier programming type than string. Apart from the pun/joke, it was an interesting article to research. I have no recollection any more about how I came across the rope type; presumably it was through a late night surfing session, fueled with some microbrew.

Ropes are like strings in that they store text, but unlike strings in that the bytes making up the text are no longer contiguous in memory. A rope will split up the text into nodes and build up a tree with them. The original text can be recovered by reading the nodes in an in-order traversal.

Why would anyone do such a thing? Well it’s great for building up a string through lots of insertions/deletions — in effect it’s the same performance argument as inserting into/deleting from an array versus a linked list — and it’s good for really long strings. The final benefit is all about heap memory usage: long strings (or any large allocations) in run-times like .NET tend to stick around after disposal since the garbage collector is tuned more to small allocations rather than large ones.

Of course, these benefits come at a cost: for smaller pieces of text, the effort in manipulating them (reading the nth character, copying them, etc) tends to have high overhead, both in terms of memory and in terms of performance. So, not for every situation, by all means.

I couldn’t find a reference implementation of ropes in C# or .NET, but there is one for Java as part of this discussion here. Personally speaking, I’d have to say that the StringBuilder class in .NET would suffice in many scenarios and the effort of creating an implementation of a rope class probably not worth the effort.

This article first appeared in issue 283, July 2009.

You can download the PDF here.

(I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so.)

Now playing:
Ferry, Bryan - Tokyo Joe
(from In Your Mind)


Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 282: Understanding ternary trees

June 2009’s article was a reversion to what might be called straight computer science after a few months of layman’s topics (indexing the internet, spellchecking, etc) and covered ternary trees. Quick overview: ternary trees are a speedy, space-efficient data structure for storing large numbers of key-value pairs that in certain situations are better than hash tables.

PCPlus logoFor some reason, ternary trees really gelled with me way back when I first found out about them. They were invented by Jon Bentley and Robert Sedgewick and first described in Dr Dobbs in April 1998 (article). I actually wrote about them in June 1998 in The Delphi Magazine (a pretty quick turn-around, considering it had code as well — although buggy as I discovered later) and presented a session on them at DCON 1999 (with corrected code). So it was good to get back to them some 11 years later for another audience.

The article does a pretty good job discussing the precursors to the invention and shows why ternary trees are so important for key-value lists.

For some reason, although there were a couple of nice images of trees that enhanced the article, the editor or page designer put in a picture of a Sierpinski triangle (or gasket, as it’s sometimes known) which, unfortunately has nothing to do with ternary trees. Well, apart from the association with “three” I suppose.

This article first appeared in issue 282, June 2009.

You can download the PDF here.

(I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so.)

Album cover for PromiseNow playing:
Sade - War of the Hearts
(from Promise)


Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 281: Indexing the Internet

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. What I’ll do is publish the article from a year ago or so here when I purchase the current issue.

PCPlus logoI was in England when the May issue came out, so I’m able to post this a little earlier than usual (my Barnes & Noble generally gets an issue 5-6 weeks after it appears in newsagents in England).

This particular piece was a pure layman’s article about how to index text and in particular how big search engines index web pages. I covered the usual suspects: inverted indexes and PageRank, with asides on stemming and SEO (search engine optimization).

As it happens, in doing the research for this article, I read Sergey Brin & Larry Page’s seminal paper The Anatomy of a Large-Scale Hypertextual Web Search Engine for the first time. This was the paper that essentially launched Google and that changed the landscape of search engines. The techniques discussed in this paper have obviously improved in the 12 years since then (I dare say that Google no longer just uses PageRank but instead use a panoply of different indexing mechanisms to improve results), but it is still an excellent exposition of what happens in a large-scale search engine.

And... 12 years ago? How the internet has changed since Brin and Page presented their paper at the Seventh International World-Wide Web Conference in 1998.

This article first appeared in issue 281, May 2009.

You can download the PDF here.

Album cover for HeligolandNow playing:
Massive Attack - Babel
(from Heligoland)


Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

Ray tracing image from June 2010’s PCPlus

I’ve just sent off June 2010’s article for PCPlus to my editor, just a smidgeon late. A couple of days is all. It’s on ray tracing, something I’ve wanted to discuss and play around with for a while. I downloaded POV-Ray, an open-source ray tracing renderer for Windows, OSX, and Linux to use as a test-bench, and spent some fun hours with it.

PCPlus logoFor the article I had to create an original image. Well, not ‘had to’ exactly, but I thought it only right that I show something that didn’t come from wikipedia or some other ray tracing enthusiast’s site. I certainly didn’t want to show the standard reflective ball hovering over a checkerboard image, although I admit snagging the sphere code from Christoph Hormann’s site. I decided to go for an image showing a 6×10 pentomino solution, since the previous article was about pentominoes and how to solve geometric puzzles with them.

Here’s the final image, after I’d spent entirely too much time this morning messing around with various options instead of completing the ruddy article.

Raytraced pentomino solution

(Click to make larger.) In essence I wanted to show off most of the topics I discussed in the article in one image. The pentominoes are translucent, so the shadow is colored. There are two light sources, a main white one and a slightly reddish-tinged dim one. The spheres reflect each other, the solution, and the shadows.

If you’ve downloaded POV-Ray and want to generate this image yourself, here’s the code. If you want to read the article, buy PCPlus’ June issue when it hits the newsstands, or wait until June 2011 when I’ll publish it here on this blog.

Album cover for Shepherd MoonsNow playing:
Enya - Marble Halls
(from Shepherd Moons)



Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 280: Writing a spellchecker

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. What I’ll do is publish the article from a year ago or so here when I purchase the current issue.

PCPlus logoSince I picked up April 2010’s issue when I was in England just over a week ago, I’m publishing this article from April 2009 slightly earlier than I usually do (which means May 2009’s article will be posted in June or something).

The topic is pretty interesting: how does a spell-checker give you alternatives to a badly-spelled word? The first step — finding out if a word is misspelled — is simple enough as a basic algorithm: just look it up in a big ol’ list o’ words. The next step can seem daunting though: how can you generate a valid list of possible corrections? The article talks about a few possibilities: the Levenshtein distance, the Damerau algorithm, and the Soundex method.

The Levenshtein distance is a bit impractical for a large list of words, although there are certain techniques you can use to speed things up. It essentially calculates the edit distance (the number of character insertions and deletions to go from one word to another) of the misspelled word and every other word in the list. The words with the smallest edit distance are your correction candidates. As I said, possibly too slow for a large dictionary, although it’s a great algorithm for creating diff tools.

The Damerau algorithm essentially calculates all the words that can be formed form the misspelled word by a single mistyped letter and checks those in the dictionary. Those that are present are valid correction candidates. Although single mistakes are common, two or more mistyped letters in a single word is fairly rare, so the algorithm does pretty well (as the article shows).

The Soundex algorithm is a “phonetic” algorithm: it finds candidate words that “sound like” the misspelled word.

Of course the article doesn’t go into deeper, harder algorithms for space reasons. For example, Bloom filters are great for compressing the dictionary at the expense of a few false positives and can be used to replace the article’s hash table for the word list. Different dictionaries could be employed that just store root words and then describe possible suffixes, prefixes, and circumfixes that can go with each word (for example, Hunspell). Better phonetic algorithms now exist than Soundex (for example, Metaphone). Nevertheless, I think the article hangs together quite well and I’m pretty pleased with it.

This article first appeared in issue 280, April 2009.

You can download the PDF here.

Album cover for Plunkett & Macleane [Original Score]Now playing:
Armstrong, Craig - Revelations
(from Plunkett & Macleane [Original Score])


Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 279: JPEG compression

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. What I’ll do is publish the article from a year ago or so here when I purchase the current issue.

PCPlus logoOne of the topics I wanted to write up back when I had a two-page article was how JPEG compression worked, but I didn’t think I could cover it adequately in such a small space. So for March 2009 I tried with my new three-page allowance, but found that it was equally as difficult. Trouble is, there’s so much to talk about: colour spaces, DCTs, downsampling, Huffman encoding, and so on, so forth. So in the end, it turned into more of a layman’s discussion than any kind of deeper/broader article that laid down the foundations of why JPEG compression works, and why, sometimes, it doesn’t very well.

I also tried to show with an image, what the conversion of an RGB image to the YCbCr colour space would look like, and completely ignored the fact that, although our screens use the RGB colour space, printers use the CYMK colour space. I was expecting it all to get translated from RGB to CYMK properly and the inks to cooperate as they were laid onto paper, etc. Even looking at the PDF of the article, that “decomposition” image looks weird. This is what it should look like (click on it for the full size image):

RGB to YCbCr conversion

I actually created each supplementary image in code by using the RGB-YCbCr conversion equations on the original image and then stitched them together.

    private void button1_Click(object sender, EventArgs e) {
      Bitmap input = new Bitmap(@"D:\Users\Julian M Bucknall\Pictures\Group.jpg");

      Bitmap lumaImage = new Bitmap(input.Width, input.Height, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
      Bitmap crImage = new Bitmap(input.Width, input.Height, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
      Bitmap cbImage = new Bitmap(input.Width, input.Height, System.Drawing.Imaging.PixelFormat.Format24bppRgb);

      for (int c = 0; c < input.Width; c++) {
        for (int r = 0; r < input.Height; r++) {
          Color color = input.GetPixel(c, r);
          int Y  = (int) Math.Round(( 0.2990 * color.R) + (0.5870 * color.G) + (0.1140 * color.B));
          int Cb = (int) Math.Round((-0.1687 * color.R) - (0.3313 * color.G) + (0.5000 * color.B));
          int Cr = (int) Math.Round(( 0.5000 * color.R) - (0.4187 * color.G) - (0.0813 * color.B));
          color = Color.FromArgb(Y, Y, Y);
          lumaImage.SetPixel(c, r, color);
          Cr *= 2;
          if (Cr >= 0)
            color = Color.FromArgb(Cr, 0, 0);
          else
            color = Color.FromArgb(0, -Cr, -Cr);
          crImage.SetPixel(c, r, color);
          Cb *= 2;
          if (Cb >= 0)
            color = Color.FromArgb(0, 0, Cb);
          else
            color = Color.FromArgb(-Cb, -Cb, 0);
          cbImage.SetPixel(c, r, color);
        }
      }
      lumaImage.Save(@"D:\Users\Julian M Bucknall\Pictures\GroupLuma.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
      crImage.Save(@"D:\Users\Julian M Bucknall\Pictures\GroupCr.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
      cbImage.Save(@"D:\Users\Julian M Bucknall\Pictures\GroupCb.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
    }

Well, OK, there is a little bit of futzing around in there so that you can more easily visualize the Cr and Cb images. I can’t remember now why I chose the code I did; only that there was quite a bit of experimentation behind it so that the results would look good when printed, albeit inaccurate.

Although I’m not 100% happy with how the article turned out when printed, and possibly three pages is still too few to do the subject justice, I did enjoy the research and the playing around that went into it. It is a fascinating topic; one that has more than its fair share of voodoo (where do those constants in the RGB-YCbCr conversion equations come from again?).

This article first appeared in issue 279, March 2009.

You can download the PDF here.

(Quick aside: PCPlus used to put part of their archive as PDFs on the DVD in the back of the magazine. They’ve now moved to a CD instead of a DVD, presumably to save on costs, and the archive is no longer on there. I hear they’re going to publish it online instead, sometime in the near future.)

Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 278: Rainbow tables

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. What I’ll do is publish the article from a year ago or so here when I purchase the current issue.

PCPlus logoFebruary 2009’s article was a "’commission’ in the sense that Martin Cooper, the Editor of PCPlus, wrote to me asking what I knew about rainbow tables and wouldn’t it be a good idea if I wrote an article on them. I don’t know about you but, but when the Head Honcho says wouldn’t it be a good idea, you take it as do it now. So I did.

Actually I didn’t know much about them to begin with and the research proved interesting and pleasurable. In essence, rainbow tables are a technique using large pre-computed tables that help you crack hashed passwords. The way it works is to use a class of functions called reduce functions that calculate a contender password from the hash. These reduce functions are used alongside the hash functions, applied in a chain: hash followed by reduce followed by hash. You end up with a candidate password and a final hash, but that chain covers all the intermediary passwords that were also hashed. Given enough reduce functions (thousands of them) and enough time, you’d create a large table of initial passwords and final hashes.

To crack a password, you get its hashed value and check that hash to be in your table. It it is, reproduce the chain until you get to the point where you can read off the password. If not, reduce the given hash with that final reduce function, hash the results, and check that new hash to be in the table. Continue this cycle until you find a match of the computed hash and an entry in the table (and therefore the password from regenerating the chain), or run out of reduce functions. For a more complete description, read the article.

This article first appeared in issue 278, February 2009.

You can download the PDF here.

(Quick aside: PCPlus used to put part of their archive as PDFs on the DVD in the back of the magazine. They’ve now moved to a CD instead of a DVD, presumably to save on costs, and the archive is no longer on there. I hear they’re going to publish it online instead, sometime in the near future.)

Album cover for HeathenNow playing:
Bowie, David - I Took a Trip on a Gemini Spaceship
(from Heathen)


Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 277: Dictionaries and hash tables

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. What I’ll do is publish the article from a year ago or so here when I purchase the current issue.

PCPlus logoJanuary 2009 was an important change for me. It seems that the Editor was pleased enough with my pieces (and I presume so were the readers) that my commission each month expanded to three pages instead of the prior two (or, if you’re counting, 2000 words instead of 1300). Not quite a 50% increase in pay to go along with the 50% increase in surface area — ha! — but to be quite honest I didn’t particularly care, and still don’t: I really enjoy writing for them and the money they pay is pretty good anyway. The change in word count meant that I could start to do my topics in more depth. Before I would sometimes be struggling to contain the topic in the space, but now the extra room allowed me to cover more detail.

The first article in this new expanded section had to be a good one. I decided to cover one of my favourite data structures: the hash table or dictionary.

Even with the extra space, all I could cover was the basic hash table together with linear probing as a collision resolution mechanism (and example of open addressing) and the problems of clustering. Hash tables with open addressing is still one of my favourite ways of implementing a dictionary, and so writing the article was fairly quick.

I liked doing the figures too, although Figure 2 is a bit bizarre without some explanation (time is meant to be read from top to bottom, first we insert a record whose hash resolves to index 2, then one at 17, then one at 11, etc; the more we add records, the more collisions there are and they tend to cluster). The figure cries out for animation, as I discovered recently when I slipped in a couple of images of hash tables to my CTO video on seeing things in black and white. There I wanted to show how disruptive it can be when a hash table grows, and it too needed animation, especially in a video like that. I’ve since found out that Illustrator can produce Flash animations from a series of layers, etc, but I haven’t had a chance to play around with that as yet.

Double-take: a hash table “disruptive”? Indeed, yes, under certain situations. Much is made of the fact that insertion and search in a hash table is O(1), that is, it’s constant time whether there are 10 items in the table or 10,000. Not strictly true, as it happens, it’s more of an amortized O(1) over many insertions or searches. The reason is that, during an insertion, there may be a point when the load factor is too high (as explained in the article, for open addressing that’s assumed to be about 2/3 full), and the hash table has to be grown. This requires allocating a new array (traditionally it’s set to twice the size of the original), and then rehashing and inserting all of the current items into the new array. This is an O(n) operation and it happens every n items, so, amortized, it smears out to a constant addition factor over all n items. Whoopee for the big-Oh notation, but in practice, if you have a hash table containing a huge number of items, the time taken for this growth may be noticeable by the user or by a real-time process that’s not expecting it.

This article first appeared in issue 277, January 2009.

You can download the PDF here.

(Quick aside: PCPlus used to put part of their archive as PDFs on the DVD in the back of the magazine. They’ve now moved to a CD instead of a DVD, presumably to save on costs, and the archive is no longer on there. I hear they’re going to publish it online instead, sometime in the near future.)

Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 276: Space, the final frontier

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there's an Xmas issue as well — so it's a bit more than monthly). The column is called Theory Workshop and appears in the back of every issue. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. After all, the PDFs do appear on each issue's DVD after a few months. When I buy the current issue, I'll publish the article from the issue a year ago. I popped over to B&N this lunchtime and bought the Christmas issue, so here's Christmas 2008's article.

PCPlus logoThis article was about caching data to make algorithms faster. I used a couple of examples to prove my point: Fibonacci numbers using the recursive algorithm and primality testing.

However — I don't know why — it just didn't come out the way I wanted it to. Rereading it now, 15 months after writing it, it feels very under-researched, boring. Possibly I tossed it off in an afternoon under deadline. I also note that the images were added by my editor, they weren't mine (besides which, I only had one). A very disappointing article, I'm afraid, certainly not one of my best. Meh.

This article first appeared in issue 276, Christmas 2008.

You can download the PDF here (but, really, I wouldn't bother).

Album cover for Confessions On A Dance FloorNow playing:
Madonna - Get Together
(from Confessions On A Dance Floor)

Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

PCPlus 275: Ant Colony Optimization

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there's an Xmas issue as well — so it's a bit more than monthly). The column is called Theory Workshop and appears in the back of every issue. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. After all, the PDFs do appear on each issue's DVD after a few months. When I buy the current issue, I'll publish the article from the issue a year ago. I just now popped over to B&N and bought December's issue, so here's December 2008's article.

PCPlus logoThis particular article is about a fascinating optimization technique that, for some reason, I find easier to understand and implement than, say, genetic algorithms or simulated annealing.It also gave the PCPlus designer an opportunity to have a whopping big picture of an ant on the heading graphic.

Ant Colony Optimization (ACO) is a technique to solve NP-hard problems like the Travelling Salesman Problem (TSP). In essence, you use a model of an ant to wander randomly over the problem space. The ant will find a particular path to a solution and in doing so will leave a digital pheromone along his path. The longer the walk the smaller the pheromone density deposited, the shorter the distance the stronger the pheromone density. Launch a few hundred more ants over the space, and they will tend to follow higher concentrations of pheromone, but still investigate random walks. Eventually, you'll have a pheromone path to a solution that is likely to be fairly optimal. There are a few knobs to tweak in the algorithm, such as how quickly the digital pheromone evaporates.

Because of the "path" aspect, ACOs are great for solving TSP-type problems, and to illustrate it I showed a map of England with 5 major cities on it and invited people to solve the TSP by hand. To emphasize the paths between the cities as being distinct paths I had to smudge Birmingham's position a little because it's directly on the way between London and Manchester, and put it somewhere on the Welsh Borders. Sorry, Ludlow, for dumping Birmingham on you; I should have chosen another city like Bristol instead of Manchester.

This article first appeared in issue 275, December 2008.

You can download the PDF here.

Album cover for ZoolookNow playing:
Jarre, Jean Michel - Ethnicolor
(from Zoolook)

Share it: Digg It!  StumbleUpon  Reddit  Del.icio.us  NewsVine  Furl  BlinkList  Ma.gnolia  Technorati

About Me

I'm Julian M Bucknall, the M because it's my middle initial and because I and the other Julian Bucknall (the movie guy) would like to differentiate ourselves.

I'm a programmer by trade, an actor by ambition, and an algorithms guy by osmosis. I write articles for PCPlus in my spare time, not that there's much of that.

Julian M Bucknall Apart from that, an ex-pat Brit, atheist, microbrew enthusiast, Pet Shop Boys fanboy, slide rule and HP calculator collector, amateur photographer, Altoids muncher.

DevExpress

I'm Chief Technology Officer at Developer Express, a software company that writes some great controls and tools for .NET and Delphi. I'm responsible for the technology oversight and vision of the company.

The OUT Campaign

The OUT Campaign

Validation

Valid XHTML 1.0 Transitional     Valid CSS!

Bottom swirl

Archives

September 2010 (1)
SMTWTFS
« Aug  
1234
567891011
12131415161718
19202122232425
2627282930

Like this Archive Calendar widget? Download it here.

Search

Google ads

My Tweets

Bottom swirl