HTML5 and validation issues with this blog : Algorithms for the masses

OK, I was nuts. If it ain’t broke, don’t fix it, right? But what the heck, it’s part of the technology stack I’m supposed to know and use and promote, and furthermore I have a text editor and know how to use it.

The issue is I’ve switched this blog to HTML5 markup rather than XHTML, which is what it was before. No sweat, you snort, the doctype is way simpler:

<!DOCTYPE html>

and you can get rid of that other XML namespace gubbins on the html tag.

Nice and easy. Morceau de gâteau. Yeah, that’s what I thought. Except when I then ran the W3 validator on it I got a rather bizarre error on the html tag: “application/xhtml+xml is not an appropriate Content-Type for a document whose root element is not in a namespace”. Whuuuuut?

Some searching later (and boy, was I getting ratty about it), I managed to piece together what was happening. So in the interests of getting this out there in some searchable fashion, here goes.

When you visit a website in a browser, you send along a request to the site and part of that request is the Accept header. Here’s what Firefox sends for the Accept header for the main page for my blog, as an example:

text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

What this says (roughly) is that Firefox will accept an HTML document, an XHTML document, or a plain XML document. All of the other browsers send something similar. IIS7 (the web server my hosting service uses) notes this and replies with a response that has the following Content-Type header:

text/html; charset=utf-8

Magic. “Hey, Firefox, got your request, and here’s an HTML document for you.” That’s supposed to be what happens. All is good.
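
To make the "roughly" a bit more concrete, here is a minimal Python sketch that splits an Accept header into media types and their q (preference) values. The function name parse_accept is made up for this illustration, and the parsing is deliberately simplified: it ignores any parameter other than q.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, most preferred first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue  # skip empty segments such as a trailing comma
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


firefox_accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_type, q in parse_accept(firefox_accept):
    print(f"{media_type:<24} q={q}")
```

Note that text/html and application/xhtml+xml both carry the default q of 1.0, so as far as this header is concerned a server is free to answer with either one.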

Unfortunately for me, the W3 validator was sending a similar Accept header as part of the request, but IIS7 was deciding – all on its lonesome – that it was going to send back this Content-Type header instead:

application/xhtml+xml; charset=utf-8

At which point, the W3 validator was going to start complaining that the document it got wasn’t valid XHTML, and in particular there was no XML namespace declared in the html tag. Without that, it gave up trying to decipher the document (that is, the web page) because the markup failed the first validation question of all.
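
The complaint can be imitated in a few lines of Python. This is only a rough sketch of the idea, not how the W3 validator is actually implemented: the function name check_served_type and the two sample pages are invented here, and it assumes the markup is well-formed enough for an XML parser, which real HTML5 pages often are not.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_served_type(content_type, markup):
    """Return a complaint if an XHTML media type is paired with an un-namespaced root."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/xhtml+xml":
        return None  # served as HTML: no XML namespace is needed
    root = ET.fromstring(markup)
    # ElementTree writes a namespaced tag as "{namespace-uri}localname"
    if root.tag.startswith("{"):
        return None
    return f"{media_type} is not appropriate: root element <{root.tag}> is not in a namespace"


html5_page = "<html><head><title>Test</title></head><body><p>Hi</p></body></html>"
xhtml_page = html5_page.replace("<html>", f'<html xmlns="{XHTML_NS}">')

print(check_served_type("text/html; charset=utf-8", html5_page))
print(check_served_type("application/xhtml+xml; charset=utf-8", html5_page))
print(check_served_type("application/xhtml+xml; charset=utf-8", xhtml_page))
```

Only the middle case produces a complaint: the same namespace-free markup is fine when served as text/html, and the XHTML media type is fine once the root element declares a namespace.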

So why was IIS7 deciding that it should declare the document as containing XHTML or XML? Who knows. There’s an interesting blog post from some chap at Opera that seems to suggest that, absent any other clues, IIS7 does some user agent sniffing and makes some kind of decision based on that. Maybe it was doing the same with the W3 validator. That left me with a problem though. How could I fix this, given that my access to IIS7 at my hosting company is pretty thin?

Luckily, I use GraffitiCMS to serve this blog. GraffitiCMS is open source, so I can change it if needed, which is what I did. I opened up TemplatedThemePage.cs (all public-facing pages are of this type), and added this line of code in the OnLoad method in order to force the response’s Content-Type to HTML:

protected override void OnLoad(EventArgs e)
{
    base.OnLoad(e);
    // Force the Content-Type to HTML rather than leaving the choice to IIS7
    Response.ContentType = "text/html";
    Initialize();
    // ...
}

No more was I going to allow IIS7 to decide for me. A quick recompile, and then I uploaded the DLLs to my site, and the W3 validator was satisfied that the content was HTML.

And then proceeded to detail a bazillion errors.
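
A quick way to double-check a Content-Type fix like the one above from the outside is to request the page with a browser-like Accept header and look at what comes back. Here is a small Python sketch using only the standard library. The function names are made up for this example, and the URL in the final comment is a placeholder, not the blog's address.

```python
import urllib.request

BROWSER_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


def build_request(url, accept=BROWSER_ACCEPT):
    """Build a GET request that carries a browser-like Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


def served_media_type(url):
    """Fetch the URL and return the media type from its Content-Type header."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        # response.headers is an email.message.Message subclass
        return response.headers.get_content_type()


# Example call (placeholder URL, substitute the page you want to check):
# print(served_media_type("https://example.com/"))
```

If the server really does sniff the user agent, this may not reproduce what the validator sees, since urllib sends its own User-Agent header. Treat the result as a hint rather than proof.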

There are two big ones that I want to talk about right now. The first one is the Grooveshark widgets I’ve been using recently. In essence, they’re copy-and-paste object elements from Grooveshark (now long dead and gone), and they break W3’s HTML5 validation big time. I really can’t be bothered to work out the changes I would need to make every time I use them, so as soon as I have time to update each post, they’re gone.

The second one is a small bug in Windows Live Writer (and I’ve just reported it). For images, WLW adds a border attribute to the img element (specifically border="0"), and HTML5 has made that attribute obsolete, tolerating that particular value only with a warning and suggesting that you use CSS instead. Hence every image on my web pages produces a warning. The problem is there’s no way to alter the default HTML markup that WLW produces for an image, and so I will have to go into the Source View to edit the markup for every blog post to remove the warnings. Oh joy. Looking forward to that when I have time, although it can be done with a simple search/replace.

There are some other small bugs (can’t have the clear attribute on a br tag, etc), but they’re pretty easy to fix (an easy search/replace again).
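
For what it's worth, the two search/replace jobs could be sketched in Python like this. It assumes the tag carrying the clear attribute is br, and that no attribute value in these tags contains a literal > character. The function name clean_markup is invented for this example. Regular expressions are a blunt tool for HTML, so it's a sketch to try on a copy of the posts first.

```python
import re

# border="0", border='0' or border=0 inside an <img ...> tag
IMG_BORDER = re.compile(
    r"""(<img\b[^>]*?)\s+border\s*=\s*(?:"0"|'0'|0)(?=[\s/>])""",
    re.IGNORECASE,
)
# any clear=... attribute inside a <br ...> tag
BR_CLEAR = re.compile(
    r"""(<br\b[^>]*?)\s+clear\s*=\s*(?:"[^"]*"|'[^']*'|[^\s/>]+)""",
    re.IGNORECASE,
)


def clean_markup(html):
    """Drop border="0" from img tags and the clear attribute from br tags."""
    html = IMG_BORDER.sub(r"\1", html)  # keep everything before the attribute
    return BR_CLEAR.sub(r"\1", html)


sample = '<p><img src="a.png" border="0" alt="x" /><br clear="all" /></p>'
print(clean_markup(sample))
```

Both patterns are anchored to their own tag, so a border="0" on a table, say, is left alone.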

After that I can start experimenting with the site’s templates to use header and footer elements, and so on; the whole panoply of HTML5. Stay tuned for more news on that.

Now playing:
10cc - Une Nuit a Paris: One Night in Paris, Pt. 1/The Same Night in Paris, Pt. 2
(from The Original Soundtrack)

Thu 19-Jan-2012 7:34 AM Blog / tags: html5 http accept content-type

by Julian M Bucknall