The HTML5 syntax options problem
Usually blog posts with the words “problem”, “considered harmful” and similar are just crying foul, but I would like to bring up something I actually believe is, or will become, a real problem: HTML5 syntax options.
Available syntax options
In strict XHTML, we needed lowercase tag names, quoted attributes, and a value for every attribute (e.g. selected="selected"). We also needed to close all empty elements, e.g. meta, input, img etc., like this: <input type="text" />.
But in HTML5, all of these requirements are gone:
- Uppercase tag names are allowed
- Quotes are optional for attributes
- Attribute values are optional
- Closing empty elements is optional
All of these examples are valid HTML5:
<DIV>Hello Friday</DIV>
<p id=abstract>Welcome to this blog post!</p>
<input type=text>
<input type="text">
<input type="text" />
Also, some people are under the impression that the closing of elements etc. (basically, XML formatting rules) only applies if you serve a web page as application/xhtml+xml or application/xml. This is not true! XML-style syntax is just as allowed in regular text/html, and some people prefer that syntax.
The choice of the syntax
I believe the WHATWG made what was probably the only viable choice, and decided to allow all the general syntax variations. When it comes to XML syntax or not, if they had chosen just one path, it would have severely hindered adoption amongst those who didn't agree with it. When it comes to allowing uppercase tag names, unquoted attributes etc., I believe it's there to accommodate code generated by CMSs and other tools.
And, at the end of the day, web browsers will generate the same DOM no matter what syntax one prefers.
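The claim that the syntax variants end up equivalent can be approximated with a few lines of Python. This is only a sketch: it uses the standard library's html.parser, which is a tokenizer and not a browser's HTML5 tree builder, and the names StartTagCollector and start_tags are made up for this illustration.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Collect (tag, attributes) pairs, ignoring how each tag was written."""

    def __init__(self):
        super().__init__()
        self.collected = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over lowercased names and unquoted values.
        self.collected.append((tag, tuple(attrs)))

    def handle_startendtag(self, tag, attrs):
        # Treat <input ... /> exactly like <input ...>.
        self.handle_starttag(tag, attrs)


def start_tags(markup):
    """Return the normalized start tags found in a piece of markup."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.collected


if __name__ == "__main__":
    for variant in ('<input type=text>', '<input type="text">', '<input type="text" />'):
        print(variant, "->", start_tags(variant))
```

All three input variants from the examples above should print the same normalized start tag, which is the point being made about the DOM.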
The problem
As you can see with all these options, though, it is very hard to ensure that a consistent syntax is used for all HTML code, and we miss out on rigor and code quality. Basically, it is very hard to make something fail validation under these rules; we end up with an “anything goes” scenario. And after all these years of fighting for stricter syntax, better code practices etc., this is naturally not a good development.
How do we get stricter syntax?
There are a few options for addressing this problem:
- Change the spec
- We could change the specification to say that XML syntax is only allowed for application/xhtml+xml, and strict-style HTML syntax otherwise. However, as mentioned above, I believe in people choosing the syntax type they prefer.
- Evangelism
- We could all try to evangelize a better syntax, just as we have done all these years. However, I believe this will never be enough: people will evangelize for different syntaxes, and whatever web developers choose in the end will validate (even a mix of syntaxes will).
- Offering tools with different validation options
- Tools, and the validator itself, could offer options for a stricter syntax. That way we help developers through code auditing rather than forbidding things through a specification.
I believe the last option here is the only realistic one we could choose. Also, let me point out that I do not equate a strict validation option with XML syntax. For some it is the same thing, but for others it's not. Stricter for me personally means good HTML 4 style: lowercase tag names, quotes and values mandatory for attributes, and closing of all non-empty elements (i.e. all elements that have an end tag). Maybe there should be more than one option in the validator.
But is this enough? Can we continue to fight for sensible, best-practice syntax if tools and validator options are all we have? Or have we just (re)opened Pandora's Box for the web?
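To make the idea of a stricter validation option concrete, here is a rough Python sketch of such a check. It is an assumption-laden illustration, not an existing validator: StrictStyleLinter and lint are hypothetical names, it relies on the standard library's html.parser, and it covers only three of the rules listed above (lowercase tag names, quoted attribute values, mandatory attribute values), not missing end tags.

```python
import re
from html.parser import HTMLParser

# Quoted attribute values, so they can be blanked out before looking for bare ones.
QUOTED_VALUE = re.compile(r'"[^"]*"|\'[^\']*\'')
# An equals sign followed by something other than an opening double quote.
BARE_VALUE = re.compile(r'=\s*(?!")\S')
TAG_NAME = re.compile(r"<([^\s/>]+)")


class StrictStyleLinter(HTMLParser):
    """Warn about markup that is valid HTML5 but not 'strict style'."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # the tag exactly as written
        line, _ = self.getpos()
        name = TAG_NAME.match(raw).group(1)
        if name != name.lower():
            self.warnings.append((line, f"uppercase tag name <{name}>"))
        for attr, value in attrs:
            if value is None:
                self.warnings.append((line, f"attribute '{attr}' has no value"))
        # Replace every quoted value with "" and see if any value remains.
        if BARE_VALUE.search(QUOTED_VALUE.sub('""', raw)):
            self.warnings.append((line, "unquoted attribute value"))


def lint(markup):
    """Return a list of (line, message) warnings for the given markup."""
    linter = StrictStyleLinter()
    linter.feed(markup)
    linter.close()
    return linter.warnings


if __name__ == "__main__":
    for line, message in lint('<DIV>Hello Friday</DIV>\n<p id=abstract>Welcome</p>'):
        print(f"line {line}: {message}")
```

Self-closing tags such as <input type="text" /> pass through the same check, since html.parser routes them to handle_starttag by default. Which rules are on or off would be exactly the kind of option a validator could expose.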
I think you hit the nail on the head, Robert: tools to help check for best-practice code are the only way to enforce best-practice HTML.
It's a bit like coding standards, where best practice is to have a standard and stick to it for consistency. For those people who care about how they write code they'll use tools to help them write good HTML5. For those that don't there's not much change from now (I doubt such people use strict XHTML, for example).
This kind of task may be for coding standards tools (e.g. JSLint, PEAR's CodeSniffer) as opposed to a validation tool. Someone's just got to write one :)
I can totally understand the rants about HTML5 design choices. I liked and used the <code>X</code> in <code>XHTML</code> and made it extensible using namespaces. All the years of educating people to use strict XHTML, this feels like one step back.
What's a validator good for when everything is possible? Specs should enforce best practices. People have to write more code guidelines and hopefully we'll get a validator that allows a "picky" mode :)
But at the end of the day, it's just the simple markup language to make websites and there are bigger problems to solve in today's projects.
I agree to some point. I prefer the non-self-closing, lowercase, double-quoted style, but apparently there are a lot of people who like self-closing tags (to me it's just extra bytes that clients are forced to download). The only advantage of anything goes is that I can use HTML5 with Django today. They use self-closing for some stuff and I can use my own style.
For once I have to disagree.
Strict syntax rules are very important. Many of the problems with front end coding today are a direct result of HTML never having had strict rules previously.
XML style syntax should be the way forward and it should be enforced. Tools and validators can be ignored all too easily and then we end up where we are which is not a good thing.
Fewer syntax options are simpler to learn, easier to validate, easier to parse and render and will make for more robust code which is more easily understood and tested.
This has been discussed on the mailing lists.
A few solutions have been proposed. Here is mine. There are more relevant links in the comments on that page.
Like many others, I also have mixed feelings about HTML5 syntactic options. Working with valid XHTML makes for very easy re-use with in-browser WYSIWYG editing, RSS feeds, selecting partial content, etc. We have grown to appreciate the stricter syntax and have built a multitude of tools to make everyday development life easier.
We will have to invent a whole new toolset if we can expect users and tools to produce some very "interesting" markup.
Anyone remember Frontpage?
While some tools already exist (Hpricot, Nokogiri, etc.) for extracting well-formed XML from HTML, most of these tools exist in non-browser environments, and would make for very inelegant re-use. (Not to mention that surprising users by modifying their markup is just not sane.)
The overall success of the web has often been attributed to user agents being so forgiving, and that is all fine and dandy. But we can't expect every developer on the planet to have the same resources available as Microsoft, Mozilla Foundation, Apple, Google, Opera, etc.
I feel somewhat ambivalent about leaving it up to the audience to define what "quality HTML" is, and to build tools that try to derive some semantic meaning from syntactic soup. On one hand, it allows for great freedom, but on the other hand, it's going to create a lot of needless discussions with developers and content creators that don't care as much about markup as we do.
"It validates! What more do you want?"
So, +1 for strict syntax … I'll certainly keep preaching the merits of having a strict, well-defined, unambiguous syntax.
So, the question arises: Who should define what "stricter syntax" means? As long as there are no standards, a tool maker can't just pick a set of rules and go with it.
Perhaps there should be a "valid" HTML5, and a "recommended" HTML5 syntax? One that has a little stricter syntax, and that the HTML5 authors recommend that new documents be written in. I'm fine with someone else choosing that stricter syntax for me…
Thanks for your comments!
simon,
Glad that you agree!
Harald,
I agree to a certain extent that specs should be about good code, but they should not be too narrow either; my hope lies with some validation option.
Andreas,
That's just fine, but what you get then is a mix of self-closing and non-self-closing in the same document (I presume?) which breaks code consistency.
James,
<blockquote cite="">
Fewer syntax options are simpler to learn, easier to validate, easier to parse and render and will make for more robust code which is more easily understood and tested.
I think that is a great quote! However, I do not agree about enforcing XML syntax, since I believe proper, more strict, HTML 4 and alike is just as good. But don't you think this could be put as a validation option, maybe even default, instead of limiting the specification?
Lars,
Interesting post and good comments – thanks!
Morgan,
It's a very good point: what about aggregation, integration etc.? Sure, if you own the code you can fix it, but for other web sites, "sloppy" code might make it that much harder.
Emil,
Off the top of my head, I guess stricter would mean lowercase tag names, quoted attributes and perhaps no attribute minimization. As I heard on the WHATWG IRC today, though, it seems that the HTML5 Super Friends are about to offer guidelines to the WHATWG for just that.
Everyone benefits from a solid set of high level rules! The browser developer wouldn't have to deal with tag soups, the front end developer, parsers, validators and markup generators (cms's) would all have a solid set of rules to conform to.
Who benefits from a sloppy syntax? I don't get it.
The 'stricter' rules you talk about are just the simple set of well-formedness of XML. It makes for easier-to-follow markup and happy campers!
regards/jens
I basically gave up on this kind of discussion and just go by what I feel is logical, consistent and not needlessly verbose. I don't care anymore if others agree or don't, but it seems you and I agree to some point:
– use lowercase tagnames
– use quotes for attributes
– use closing tags for all non-empty elements
In addition, or in contrast I:
– don't use values for 'boolean' attributes; selected="selected" makes me feel I'm doing double the work for the same result
– don't use X(HT)ML style self-closing tags; the concept of empty elements is special for (X)HTML so you must know anyway which elements are concerned (else you might be tempted to write ). There's in my opinion no need to emphasize that in HTML
– do (sometimes – so much for consistency :P) omit quotes for numeric attribute values (width, height, etcetera) – but normally most of those go into CSS of course
So in conclusion I disagree with those that think that XML-syntax should be enforced; I think it is just too verbose in some cases without really being more 'strict' in the sense that you cannot define a consistent rule to handle such cases.
I have been tempted sometimes to also omit (closing) tags that are optional in HTML but so far never actually did that. I won't however disagree with such practice if it was done by an HTML minifier before serving up content.
Not so Robert.
But with HTML5's departure from SGML (no more null end tags) and the ignoring of "/" in XML-style self-closing tags (parsers have been doing this anyway), it's becoming a reality. You still can't use self-closing tags in text/html on elements that actually require closing tags, though, e.g.
<code><script type="application/javascript" src="example.js" /></code>
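A check for exactly this mistake is easy to sketch in Python. The code below is an illustration under assumptions: it uses the standard library's html.parser, the function name find_bad_self_closing is invented here, and the set of void elements is typed in by hand from the HTML standard, so it should be verified against the current spec before being relied on.

```python
from html.parser import HTMLParser

# Elements that never have an end tag (assumed list; check it against the spec).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class SelfClosingLinter(HTMLParser):
    """Record tags written as <tag ... /> that are not void elements."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_startendtag(self, tag, attrs):
        # In text/html the slash is ignored, so <span /> only OPENS a span.
        if tag not in VOID_ELEMENTS:
            self.offenders.append(tag)


def find_bad_self_closing(markup):
    """Return the names of non-void elements written with XML self-closing syntax."""
    linter = SelfClosingLinter()
    linter.feed(markup)
    linter.close()
    return linter.offenders


if __name__ == "__main__":
    sample = '<input type="text" /><script type="application/javascript" src="example.js" />'
    print(find_bad_self_closing(sample))
```

The input element passes because it is void; the script element is reported because a browser parsing text/html would treat everything after it as script content until a real closing tag appears.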
I don't see a problem with a flexible yet clearly defined syntax. The rules you've pointed out are a million miles away from 'tag soup' which is a term that should only accurately be used to describe browsers attempting to make sense of a malformed DOM tree.
Still – it's only realistic to say that User Agents should still attempt to render every document as best they can. Draconian error handling would kill the web.
Yesterday I dug a little deeper into HTML 5. At first I was confused because I thought the authors of HTML 5 would have learned from HTML 4 and XHTML. But HTML is still inconsistent and I'm deeply disappointed. How is it possible that the people working on HTML 5 do such a bad job? So the game goes on.
On the other side one will be able to distinguish good and bad work.
Jens,
From my perspective, I would also find it easier with a more strict syntax. However, I believe the spec writers have so many things to take into consideration that they need to allow more options – and that perhaps we can complement that with a more strict validation tool.
Tino,
I pretty much completely agree with what syntax to use, and with the rest of the things you say. I also like your openness about consistency! :)
Aldrik,
Absolutely. To correct myself: I mean the XML syntax that is allowed, and that people have gotten used to from XHTML, e.g. closing of empty elements.
Andy,
Yes, I agree about rendering as well as they can. Draconian error handling is never a viable option, at least not in my book.
Bernd,
Well, you have to take into consideration what legacy they have to think about. A lot of the work behind HTML5 is what we can use today, and not only what would be perfect in the future.
I agree. And you didn't mention the worst thing about the HTML5 syntax.
(Implicitly closing elements by opening another element.)
For example, the following code fragment is valid HTML.
<p>Here I open a p-tag
<p>Here we open another one, which also closes the first
We need a lint tool for [X]HTML[5], which ensures writing clean, stricter and more browser-compliant code.
A lint tool is not only needed to make HTML5 stricter, it can also be used to avoid XHTML syntax that is not cross-browser compliant, like <span />, <script src="behavior.js" /> etc.
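As a rough idea of what such a lint check for implicitly closed paragraphs could look like, here is a small Python sketch. It is deliberately simplistic and the names are hypothetical: it uses the standard library's html.parser, only notices a p start tag that arrives while an earlier p has not seen its end tag, and ignores the other elements (div, ul and so on) that also close an open paragraph.

```python
from html.parser import HTMLParser


class ImplicitParagraphLinter(HTMLParser):
    """Record lines where a <p> start tag implicitly closes an earlier <p>."""

    def __init__(self):
        super().__init__()
        self.paragraph_open = False
        self.implicit_closes = []

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        if self.paragraph_open:
            line, _ = self.getpos()
            self.implicit_closes.append(line)
        self.paragraph_open = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.paragraph_open = False


def implicit_paragraph_closes(markup):
    """Return the line numbers of <p> tags that close a still-open paragraph."""
    linter = ImplicitParagraphLinter()
    linter.feed(markup)
    linter.close()
    return linter.implicit_closes


if __name__ == "__main__":
    fragment = "<p>Here I open a p-tag\n<p>Here we open another one"
    print(implicit_paragraph_closes(fragment))
```

For the two-line fragment from the comment above, the second p tag is the one that gets reported.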
alex,
Absolutely, good point!
Some people prefer verbose markup. Others prefer clean, lean markup, and making pages as small and as fast as possible. HTML5 allows both syntax styles – and that’s a feature, not a problem.
I'd really like to see a strict syntax enforced. Perhaps it's my OCD but I hate it when something validates that isn't exactly the same syntax as another site. I'm glad to not see loose/transitional/strict anymore but if it only validated with strict, it just makes everything mo bettah! :)
But HTML syntax hasn’t changed in 10 years! (maybe longer, haven’t checked that far)
HTML5 hasn’t created any new syntax. It merely documented the syntax you’ve been using all along!
Yes, even documents you thought were XHTML working in IE were in fact using syntax that is now called HTML5.
It doesn’t matter what you think you’re writing. In the end what matters is how the browser interprets it, and that hasn’t changed.
Your only option is evangelism. Even changes in the spec will amount to evangelism, because the spec is unable to change reality (at least not until the last copies of today’s browsers and pages die out).
There is no 'syntax problem'. All that HTML5 did was properly record the actual allowed syntax on the web that pages depend on. Now browsers can be confident that they're doing 'the right thing', and not have to spend inordinate amounts of money on QAing and testing with other browsers to see if there are any spots where they're doing something 'wrong' and other browsers handle it better.
It was literally impossible to do anything else and have a document that actually described how to parse documents on the public web. Parsers aren't just for future documents; they have to deal with pages that were written way back in the days of HTML 1 or 2 (and the HTML5 parsing algorithm does!).
This has no effect whatsoever on your own ability to use and enforce a particular subset of the syntax that you prefer using. Yes, <input type=text> is allowed. That doesn't mean you have to use it, nor does it mean that it's somehow wrong or impossible for a validator to flag it. It's valid "vanilla HTML5", but it might be invalid "your preferred syntax". Just get a validator to consider it worthwhile to implement something that checks for your preferred syntax, or write your own validator.
Let me state this again, so it's clear: HTML5's standardization of a particular syntax *has no effect* on your own ability to restrict yourself to a subset that you like better. If you think that all this craziness is bad for code quality, just don't use it. Use your preferred XML or near-XML syntax and be happy. You are not somehow required to avoid />'ing your void elements, or to keep your attributes unquoted.
This is not a problem.
Chuck,
Absolutely, just strict rendering helps a little at least. :)
mattur, pL, Tab,
I'm all for people choosing for themselves. But when it comes to working in organizations, re-use of code, code consistency, aggregation etc., stricter guidelines would definitely be welcome, and the reaction from the people I have met has been worry about this.
pL,
<blockquote cite="http://robertnyman.com/2009/11/27/the-html5-syntax-options-problem/#comment-613311">
It doesn’t matter what you think you’re writing. In the end what matters is how browser interprets it, and that hasn’t changed.
While I get what you are after, and agree in essence, never underexpect code guidelines and consistency, especially in larger projects.
Tab,
You are absolutely right about most of what you write. But there definitely is a problem, and that is how HTML5 is perceived by a large number of web developers. If they find the syntax options to be sloppy, incorrect or whatever, they will shy away from using it and just stick with regular HTML or XHTML.
My point here is that, sure, anyone can use whichever syntax approach they want, but it is very important to, at the very least, offer validation options or similar so people can get the strictness that made them love strict HTML or XHTML.
yeah, super strict W3C.org / WHATWG validator for HTML5 is the only option to hope for good code in the wild..
Constantine,
Yes, at least I believe it's one of the things we need to be able to ensure code consistency.
As a Web designer who has been using XHTML forever (well, as long as he can remember), I think HTML 5 has many advantages, but this lack of clean and strict syntax is just crazy. Yes, it's my OCD talking, I like things all neat and tidy. But also with new developers, if they aren't consistent in their code all you have is one giant jumble of crappy code.
Syntax should stay strict. IMHO
HTML5 rox (canvas, audio, video, etc.). XHTML (served as text/html – in fact: as a tag-soup) with Flash and other s#its like that, are for 'webmasters' buying and using windows with internet explorer. If someone uses HTML5 as HTML (and not as an idiotic X-HTML-S#IT) there is no problem using it as beautiful, clear HTML without stupid slashes etc. at the 'end' of tags. If you use HTML5 as application/xhtml+xml you only need extra namespaces etc. and all those slashes in tags. The problem of windows webmasters is: they do not understand that if their x-html page works in IE, this page is not xml, not html and not xhtml! At all. It is just a soup of tags. Nothing valid.
HTML 5 is just freedom: clear code (you write it with rules but in the style you like), smaller HTML code sizes, code that is simpler to understand (for devices/applications and humans). And the best: a great semantic and structural conception for code (tags). Also good for understanding how xhtml was/is not properly understood. HTML 5 is GREATLY COMPATIBLE with html 3.2, 4, 4.01, and all those ie-working-X-html-blah-blah…
Seth Goldstein,
Personally, I would like that too. But, I do understand the reasoning behind allowing other options as well.
I hate windows webmasters,
Well, even if you use the regular HTML syntax you are talking about, consistency and code formatting are equally important.
Tomek,
HTML5 is absolutely great, in many aspects. However, I believe developers need some way to ensure code consistency.
I think HTML5 is the first realistic spec for HTML ever. Ever since the beginning, HTML specs were ideal and had to be ignored in practice. No browser is ever going to implement strict HTML syntax, because if they did they'd be less compatible. So why bother speccing it if no one will use it? HTML5 may just be the first version of the HTML spec ever actually followed by the browsers. And finally having compatible, standards compliant browsers will revolutionize the web.
Moving from xhtml to html5 is easy now. I wrote a simple html5 converter :)
Aside from your comments there is another problem. While xhtml syntax is backward compatible, html5 is not. It is not compatible with xhtml. Take this simple example.
"If you want to support us please put this code on your page:
[A HREF="link.html"]Support my friend[/A] – this will not work with xhtml.
Gaurav,
Yes, it probably is the most realistic specification so far.
mynthon,
Thanks for the tip.
And about being backwards compatible: first, it can be completely compatible if you write it in a strict XML form. But even if you don't, it will work in any XHTML page, as long as it's not served as <code>application/xhtml+xml</code>.