The HTML5 syntax options problem

Usually blog posts with the words “problem”, “considered harmful” and similar in the title are just crying foul, but I would like to bring up something I actually believe is, or will become, a real problem: HTML5 syntax options.

Available syntax options

In strict XHTML, we needed lowercase tag names, quoted attribute values, and a value for every attribute (e.g. selected="selected"). We also needed to close all empty elements (meta, input, img etc.), like this: <input type="text" />.

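But in HTML5, all of these requirements are gone:

  • Uppercase tag names are allowed
  • Quotes are optional for attribute values (as long as the value contains no whitespace or characters such as quotes, = or >)
  • Attribute values are optional
  • Closing empty elements is optional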

All of these examples are valid HTML5:

<DIV>Hello Friday</DIV>
<p id=abstract>Welcome to this blog post!</p>

<input type=text>
<input type="text">
<input type="text" />

Also, some people are under the impression that XML-style formatting rules, such as closing empty elements, can only be used if you serve a web page as application/xhtml+xml or application/xml. This is not true! XML-style syntax is just as allowed in regular text/html, and some people prefer that syntax.

The choice of the syntax

I believe the WHATWG made what was probably the only viable choice when it decided to allow all the general syntax variations. When it comes to XML syntax or not, choosing just one path would have severely hindered adoption amongst those who disagreed with it. When it comes to allowing uppercase tag names, unquoted attribute values etc., I believe it’s there to accommodate code generated by CMSs and other tools.

And, at the end of the day, web browsers will generate the same DOM no matter what syntax one prefers.
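To make that a bit more concrete, here is a minimal sketch in Python. It uses the standard library's html.parser module as a stand-in for a browser's parser; it is not a full HTML5 parser and builds no DOM, so treat it as an approximation. The names EventCollector, normalized_events and VOID_ELEMENTS are made up for this illustration, and the set of empty elements is deliberately incomplete.

```python
from html.parser import HTMLParser

# A few empty (void) elements; an assumption for this sketch, not the full list.
VOID_ELEMENTS = {"input", "img", "meta", "br", "hr", "link"}


class EventCollector(HTMLParser):
    """Records a syntax-independent list of parse events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases names and strips attribute quotes.
        self.events.append(("start", tag, sorted(attrs)))

    def handle_endtag(self, tag):
        # "<input />" triggers an end-tag event and "<input>" does not,
        # so end events for empty elements are ignored.
        if tag not in VOID_ELEMENTS:
            self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data))


def normalized_events(markup):
    """Return the list of events recorded while parsing the markup."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


variants = ['<input type=text>', '<input type="text">', '<input type="text" />']
results = [normalized_events(v) for v in variants]
print(all(r == results[0] for r in results))
```

All three input variants from the examples above produce the same recorded events, and the same goes for <DIV>Hello Friday</DIV> compared with its lowercase twin.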

The problem

As you can see, though, with all these options it is very hard to ensure that a consistent syntax is used for all HTML code, and we miss out on rigor and code quality. Basically, it is very hard to make something fail validation under these rules; we end up with an “anything goes” scenario. And after all these years of fighting for stricter syntax, better code practices etc., this is naturally not a good development.

How do we get stricter syntax?

There are a few options to try to address this problem:

Change the spec
We could change the specification to say that XML syntax is only allowed for application/xhtml+xml, and that strict-style HTML syntax is required otherwise. However, as mentioned above, I believe in people choosing the syntax type they prefer.
Evangelism
We could all try to evangelize a better syntax, just as we have done all these years. However, I believe this will never be enough: people will evangelize for different syntaxes, and whatever web developers choose in the end will validate (even a mix of syntaxes will).
Offering tools with different validation options
Tools, and the validator itself, could offer options for a stricter syntax. That way, we would help developers by auditing their code rather than forbidding things through a specification.

I believe the last option here is the only realistic one we could choose. Also, let me point out that I do not equate a strict validation option with XML syntax. For some it is the same thing, but for others it’s not. Stricter, for me personally, means good HTML 4-style: lowercase tag names, mandatory quotes and values for attributes, and closing of all non-empty elements (i.e. all elements that have an end tag). Maybe there should be more than one option in the validator.
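As a rough idea of what such a strictness option could look like, here is a small sketch in Python built on the standard library's html.parser. It is not how any existing validator works; StrictSyntaxChecker and check_strict are hypothetical names, and the sketch only covers three rules (lowercase tag names, a value for every attribute, quoted attribute values) and only inspects start tags.

```python
import re
from html.parser import HTMLParser

# Matches an attribute whose value is quoted, e.g. type="text" or id='abstract'.
QUOTED_ATTRIBUTE = re.compile(r"""[^\s"'=<>/]+\s*=\s*(?:"[^"]*"|'[^']*')""")


class StrictSyntaxChecker(HTMLParser):
    """Collects (line, message) pairs for start tags that break the rules."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # the start tag exactly as written
        line, _ = self.getpos()

        # Rule 1: tag names must be lowercase.
        raw_name = re.match(r"<([^\s/>]+)", raw).group(1)
        if raw_name != raw_name.lower():
            self.problems.append((line, f"uppercase tag name <{raw_name}>"))

        # Rule 2: every attribute must have a value.
        for name, value in attrs:
            if value is None:
                self.problems.append((line, f"attribute '{name}' has no value"))

        # Rule 3: attribute values must be quoted. Once all quoted attributes
        # are stripped out, any "=" left over belongs to an unquoted value.
        if "=" in QUOTED_ATTRIBUTE.sub("", raw):
            self.problems.append((line, f"unquoted attribute value in {raw}"))


def check_strict(markup):
    """Return a list of (line, message) problems found in the markup."""
    checker = StrictSyntaxChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


for line, message in check_strict('<DIV>Hello Friday</DIV>\n<p id=abstract>Hi</p>'):
    print(f"line {line}: {message}")
```

Each rule is a separate check, so turning them into individual switches (or adding an XML-style option that also demands closed empty elements) would be a natural next step for a tool with more than one strictness level.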

But is this enough? Can we continue to fight for sensible, best-practice syntax with tools and validator options alone? Or have we just (re)opened Pandora’s Box for the web?

Related reading

An Introduction to HTML5

Posted in Developing, HTML5/HTML/XHTML, Technology
