XHTML and error handling
What I want to touch on with this post is how errors are handled when XHTML is served the way it should be. Let’s, for the sake of argument, say that we want to write and deliver XHTML (not wanting to turn this into a discussion about whether we should write HTML or XHTML).
First, some general background information about how to send documents to the requesting web browser. It’s all about the media type, described in XHTML Media Types:
- HTML
  - Should be sent with the text/html MIME type.
- XHTML 1.0
  - All flavors of XHTML 1.0, strict, transitional and frameset, should be sent with the application/xhtml+xml MIME type, but may be sent as text/html when it conforms to Appendix C of the XHTML specification.
- XHTML 1.1
  - Should be sent with the application/xhtml+xml MIME type; should not be sent with the text/html MIME type.
So, what’s the difference? It’s that web pages sent as text/html are interpreted as HTML, while those sent as application/xhtml+xml are received as a form of XML. However, this does not apply to IE, because it doesn’t even understand the application/xhtml+xml MIME type to begin with, but instead tries to download it as a file. So, no application/xhtml+xml for IE.
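One common way to live with this is content negotiation on the request's Accept header. The following is only a minimal sketch, assuming that a browser which cannot handle application/xhtml+xml does not list it explicitly in its Accept header; the function name choose_media_type is made up for illustration, and falling back to text/html is only appropriate for XHTML 1.0 that follows Appendix C.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick application/xhtml+xml only if the client lists it explicitly.

    Wildcards such as */* are deliberately ignored (an assumption of this
    sketch), so clients that only send */* get text/html.
    """
    for item in accept_header.split(","):
        media_range, _, params = item.strip().partition(";")
        if media_range.strip().lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not accepted"
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"
```

The returned value would then be used for the Content-Type response header.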
Aside from IE’s lack of support for it, and the considerations Mark Pilgrim describes in his The Road to XHTML 2.0: MIME Types article, it means that when a web page is sent as application/xhtml+xml while containing a well-formedness error, the page won’t render at all.
The only thing displayed will be an error message when such an error occurs. This is usually referred to as draconian error handling, and its history is told in The history of draconian error handling in XML.
My thoughts about this started partly by seeing many web developers writing XHTML 1.1 web pages and then sending them as text/html, and they were only using it because it was the latest thing, not for any features that XHTML 1.1 offers (this also goes for some CMS companies that have invalid XHTML 1.1 sent as text/html as default in their page templates for customers to take after). Sigh…
It is also partly inspired by an e-mail that I got a couple of months ago, when Anne was kind enough to bring an error on my web site to my attention, with the hilarious subject line:
dude, someone fucked up your XHTML
What had happened was that Faruk Ates had entered a comment to one of my posts where his XHTML had been messed up (probably because of some misinterpretation by my WordPress system), hence ending up breaking the well-formedness of my web site so it didn’t render at all.
Because of that, and when using it for major public web sites, I really wonder if that’s the optimal way to handle an error. Such a small thing as an unencoded ampersand (example: & instead of &amp;) in a link’s href attribute will result in the page not being well-formed, thus not rendered. Given the low quality of the CMSs out there and the terrible output from many WYSIWYG editors, the “risk” (read: chance) of the code being valid and well-formed is smaller than of the code being incorrect. Many, many web sites out there don’t deliver well-formed code.
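To make the ampersand case concrete, here is a small sketch using Python's standard-library XML parser as a stand-in for a browser's XML parser. The helper name is_well_formed and the sample link are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False on any parse error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# A raw ampersand in the href is enough to make the whole fragment unparseable.
broken = '<p><a href="page?a=1&b=2">link</a></p>'
# The same link with the ampersand encoded parses without complaint.
fixed = '<p><a href="page?a=1&amp;b=2">link</a></p>'
```

Here is_well_formed(broken) is False while is_well_formed(fixed) is True, which is exactly the difference between a rendered page and an error message when the page is sent as application/xhtml+xml.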
Personally, I agree with what Ben de Groot writes in his Markup in the Real World post. I prefer the advantages of XHTML when it comes to its syntax and what will be correct within it. However, Tommy once said to me that if you can’t guarantee valid XHTML you shouldn’t use it. Generally, I see his point and think he’s right, but to strike the note Ben does, I can guarantee my part of it but there will always be factors like third party content providers, such as ad providers, sub-par tools for the web site’s administrator and so on. And for the reasons Ben mentions, I’d still go for XHTML.
So, conclusively, I have to ask: do you think XHTML sent as text/html is ok, when it follows Appendix C of the XHTML specification? Do you agree with me that having a web site break and show nothing but an error if something’s not well-formed isn’t good business practice?
I disagree. You want XHTML because it is strict, but you don't want it treated as XHTML because it is strict. Make up your mind: do you want the strictness or not? If you do, then use content negotiation. If you don't, use HTML instead.
That doesn't mean table layouts, font elements, invalid code, etc. HTML 4.01 is semantically identical to XHTML 1.0. You can still write high quality, valid code that separates content from presentation if you use HTML.
I *do* think that XHTML as text/html is okay. I just think that it's pointless for the reasons most people use it, and that goes for this situation as well.
> Do you agree with me that having a web site break and show nothing but an error if something’s not well-formed isn’t good business practice?
It isn't good business practice in the same way letting customers know that you haven't cleaned your kitchen in a month isn't good business practice for restaurants – the fact that the customers find out isn't the real problem.
If you've let data into your system that breaks things, then that's a major failing. That it is only immediately apparent to visitors if you use a particular media type is only a symptom of the larger problem.
All input going into your system needs to be validated (in the larger sense; not in the SGML/XML sense). Failure to do so is a big weakness in your processes. If you aren't ensuring correct input there, how do I know you are ensuring correct input elsewhere, e.g. how do I know you aren't vulnerable to SQL injection attacks that can compromise my credit card data or whatever other sensitive information I might be submitting to your websites?
XHTML as application/xhtml+xml keeps you honest. If you take away the strictness, it fails to do that. So why bother with XHTML at all? Just use HTML.
If you are considering XHTML as application/xhtml+xml, but you are worried because you think "what if there are bugs that screw things up" or "what if I upload something that breaks things", then you are already doing things wrong. Debug your code. Put a process in place that makes sure you can't upload broken things. Because *those* are the real problems, application/xhtml+xml is only the thing that will expose them. Don't let it distract you from the real issue.
Jim,
The main benefit of XHTML is that it promotes best practices a million times better than HTML does. It is almost the embodiment of "best practices on the web" — as for most people, the MIME type issue is a non-issue and all that matters to them is writing semantic, valid code. Sure, that's technically possible with HTML too, but the psychological factor is totally absent in 99% of the cases with HTML.
Ben's post on it describes that very well. Hopefully, my upcoming article on this issue will do the same, but even more thoroughly.
I think the idea with XHTML sent as application/xhtml+xml is great, but I dislike the fact that one single error can cause a web page not to be readable at all. :/ It's ok when you can master all the input data on your site, which you often can't.
I will as soon as possible try to take that step and start serving XHTML web pages as XML data, but I'm afraid many developers will encounter problems with sending web pages as XML data (especially those who aren't as familiar with XHTML and XML as others are) and maybe don't even use XHTML "the right way" because of its disabilities (as some see it).
However, Jim has a point here.
> All input going into your system needs to be validated (in the larger sense; not in the SGML/XML sense). Failure to do so is a big weakness in your processes.
But not everyone has the knowledge to implement this functionality and what happens then when a user forgets an ending tag when writing a comment in for instance WordPress? 😉
> The main benefit of XHTML is that it promotes best practices a million times better than HTML does.
How? XHTML and HTML are practically the same language.
> all that matters to them is writing semantic, valid code. Sure, that’s technically possible with HTML too
"Technically possible"? It's just as easy to write semantic, valid code in HTML as it is in XHTML. They have the same element types. They have the same attributes. As far as semantics goes, they are *identical*.
> the psychological factor is totally absent in 99% of the cases with HTML.
Which psychological factor? The one that arises from people repeatedly telling them that XHTML is semantic and HTML is not?
> I will as soon as possible try to make my step and start serve XHTML web pages as XML data
A good first step is to have alternative URIs you can access your website from that are served as application/xhtml+xml. Then you can run the two in tandem, with the application/xhtml+xml being private, until you are confident that you've ironed out the bugs. There's no need to switch suddenly.
> But not everyone has the knowledge to implement this functionality and what happens then when a user forgets an ending tag when writing a comment in for instance wordpress?
In these cases, it's a toolset problem. If WordPress lets people enter malformed comments and dumps them into the page content unfixed, then it's absolutely a bug in WordPress, just the same as Frontpage producing rubbish HTML is a bug in Frontpage.
If you are hand-coding and you don't have the knowledge to use XHTML properly, then stick to HTML. Otherwise you are just running with scissors.
Robert,
I absolutely think that it is OK to serve XHTML as text/html. After all, the W3C guidelines are the "law" here, and according to them it is OK, right!?
Faruk pinned down another top reason: the psychological factor!
If we think that most web developers are doing things the wrong way, why should we point fingers and say "you are still wrong, lamer!" as soon as they try to use valid XHTML code? That's not very pedagogical, is it!?
I think the "migration" to write proper (X)HTML code is like Maslow's Hierarchy of Needs:
When you have food and warmth, you want a social life and someone to love…if you get my point…
But, when it comes to Jim's comment about the need for all input to be validated I think that he's right! After all, that is an issue that will affect not only HTML presentation, but also RSS-output.
Since all feeds have to validate as XML it's likely that you already have the same problems in this publishing channel.
HTML is based on SGML (as non-strict as you can get)
XHTML is based on XML (very, very strict)
It's the strictness of the syntax that promotes best practices tons better, not the actual tags/elements.
But HTML doesn't enforce you in any way to apply best practices to your code, whereas XHTML does.
The one that Ben discussed in his post that Robert linked to.
Yet another totally elitist attitude.
Nobody on the planet ever learned how to do XHTML right in one go. Neither did you. We all need to take things one step at a time, in this case that means learning XHTML one step at a time. You can't go expecting people to switch from old school, nested table-ridden, font tag-polluted HTML to clean, semantic, well-formed XHTML in one go. But that's exactly what you're saying, and that's very elitist of you.
I agree completely that people should make their tools ensure valid code output, but as the W3C Specification doesn't even require application/xhtml+xml for XHTML documents, there's no need whatsoever for anyone to make the entire jump in a single attempt. People will move to semantic XHTML through trial and error, learning things along the way. Keeping them at HTML will keep them away from promoting best practices effectively, and it won't help them feel like they're moving forward much. THAT'S the psychological factor in a nutshell, btw.
Thanks for your comments.
Inevitably, this discussion tends to lean toward whether to use HTML or XHTML, although my main focus was whether showing only an error message and nothing else is good practice when something isn't valid.
Jim,
I understand where you're coming from here. And in a perfect world, that would be the only factor. But to be clear, my scenario is often that I write and deliver valid code. Enter .NET-based CMS systems, third party content providers and other web developers/administrators that don't have the knowledge (or even interest) to do that.
When I leave a project, I do my best to make sure the code is valid; using text/html as an excuse for delivering invalid XHTML is not acceptable. But after I've left a project, I cannot guarantee that all these other factors will live up to it. Basically, I set the bar and do my best to influence all these other parts that a solution consists of.
Simply speaking, like the title of Ben's post: Markup in the real world.
I definitely agree that the tools used need to be shaped up, but that's only something that I can try to affect by pointing this out to the providers of these; I can't rewrite them myself. If I get to choose all the tools and parts in a project, naturally I wouldn't choose those that cause problems, but it is rarely up to me. A common project is made up of many people, tools, decision makers and so on, and what I can do is to deliver my part as well as possible.
My main point is: if something invalid is added to a project that once was valid, be it through a tool, a less knowing administrator etc, I don't think the way errors are handled is best practice. I'd like the option to deliver the content to the visitor anyway.
You know my stand on the XHTML 1.0 Appendix C debate. I think there's nothing wrong with it.
I've said it a couple of times and it was (still is) one of my primary reasons for (always) seriously reconsidering my point of view on the aforementioned debate. It's draconian. The error handling is bad. Period. I cannot get it into my head why anyone would want to enforce such a strict error handling. Error handling that does not let the user accomplish his/her goals. Error handling that does not give access to the requested information. Error handling being terrible on the usability front.
Why did the internet grow so much? It's open, accessible and allows everyone and their dog to start a website. Whether those websites are using valid markup or whatever, honestly, do you think anyone except the professionals cares? Do you think the budget projects actually have access to the resources to scan for and prevent any ill-formed markup from appearing on the web? We must have graceful error handling if we want to see the web continue to grow as much as it has done before.
If we want the current error handling to succeed, I recommend we should stop writing markup by hand.
Jeroen,
Thanks for your input.
I think you make valid points, and I agree that we need a more graceful error handling. Not to promote writing sloppy code, but if things go wrong the user shouldn't have to see a major error message across the screen, but instead be handled in a more user-friendly way.
XML syntax is strict for a reason: once parsed, it's very easy and fast to navigate the structure of the document and extract what you want from it. Be it a browser, a bot, a transformation, or whatever.
The problem, historically, is that browser makers have allowed sloppy code which encourages two things: laypeople to build websites and, unfortunately, lots more sloppy code. 99% of the Web is built from crap markup.
If you could wave a magic wand (an HTMLTidy bot perhaps?) and fix all that code, imagine how much better, faster and more powerful browsers would become. Why? Because, and this is only a wild guess, most of the programming resources thrown at building a browser are thrown at dealing with invalid markup. Heuristics, what to do when encountering this situation, that situation…
Display error messages when the code is invalid? You damn straight! Imagine a C compiler that said, oh well, I'm really not sure what the programmer meant when he fucked up that chunk of code so I'll just take a guess…
Douglas,
Thanks for throwing in your two cents.
Generally, we're on the same page. Of course XML-based languages should be well-formed, and web browsers and the web would've been such a better place if well-formed and valid code had been a must.
But unfortunately we aren't there today, and many, many tools etc don't deliver well-formed code.
Generalizing as hell now, but the internet is about spreading information. Do we want a visitor to miss out on information just because a flaw might have sneaked in somewhere (for all I know, it could've been the human factor and not the tools that caused it)?
I agree with delivering XHTML as text/html at this time under certain conditions. One being that I'm not sure IE is the only browser to choke on XML… there must be more. Also, as was mentioned, it's a gradual migration and we're all at different points of the XHTML / XML learning curve.
I'm definitely in favour of the idea we should have error handling in XHTML but isn't that already a run race – is it changeable at this point in time? But yes I think it would be a good thing – being a programmer the idea of graceful error handling appeals a lot.
I'm just wondering if that is what XHTML was for though. I mean, and I may be entirely wrong, error handling in the programming sense means some programming structure saying if this happens then do this or else do that. Meaning XHTML would have to become JavaScriptish or something. Does that make sense? Like I said I may be entirely wrong here but as far as I can conceive of its implementation there needs to be that mechanism.
Unless you mean simply that browsers should just show the page if it's broke even if delivered as XML… mmm that one's hard. I'd poke my foot in the water gently and suggest XML is the way it is because it is what it is… ummm and run. Cheers.
I think sending XHTML as text/html is just fine, and still have yet to hear a good argument against it. So many zealots say you should send your docs as HTML if they are not application/xhtml+xml, but there is no good reason why you shouldn't send them as XHTML (text/html).
As for XHTML error handling, I think it is hurting the adoption of XHTML and will continue to hurt it as RSS and different forms of data sharing become more and more popular. If your website includes a feed from del.icio.us or flickr or any other web service, you will have to worry about their feeds being valid, and if they aren't valid, you'll have to write a custom handler for each specific feed, which many users won't be able to do.
So then many users may have to make a choice: serve invalid content (as text/html) or don't have cool content provided by 3rd parties.
1. Yes, it's ok to send it as text/html when it conforms to appendix C. Why? Because that's what the standard says and if we don't go by it…then it's just not a standard anymore.
2. Yes and no. It's an XML syntax and therefore MUST go by the XML rules. It's not some separate language that people can just invent their own error concepts for. If a business needs or wants that strictness, then it's good for their business and therefore good business practice. If they don't or can't handle it, then it's good for their business to use HTML.
I think it's time people start asking a different question: how come nobody ever has HUGE issues with error handling in MySQL, PHP, JavaScript, Perl, C++, etc, etc? They're just as draconian, and some of them are used SUCCESSFULLY by "non-programmers".
> My main point is: if something invalid is added to a project that once was valid, be it through a tool, a less knowing administrator etc, I don’t think the way errors are handled is best practice. I’d like the option to deliver the content to the visitor anyway.
There's your answer. You require HTML, it makes good business practice for your particular need.
I was recently pondering this issue as well.
I think this is fine as long as it's a temporary fix. For instance, this is an idea I'm thinking of implementing using PHP & content negotiation…
If a well-formedness error is encountered the page will be served as text/html so if any users come to the page, they will not get an ugly error. When this error is detected, an email is sent out to the webmaster notifying them of the bad markup so it can be fixed as soon as possible.
Any type of error is not good for business, whether it's a 404 or a database error. The best thing we can do is display a user-friendly error message and get the problem fixed as soon as possible.
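The fallback idea described in this comment could be sketched roughly as follows. This is not the commenter's PHP code, just an illustration in Python: pick_response_type and the notify callback are hypothetical names, the client is assumed to accept application/xhtml+xml already, and sending the actual email is left to whatever notify does.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def pick_response_type(body: str, notify: Callable[[str], None]) -> str:
    """Serve as XML only when the page parses; otherwise fall back and report."""
    try:
        ET.fromstring(body)
    except ET.ParseError as err:
        # Hypothetical hook: in practice this might email the webmaster.
        notify(f"Well-formedness error, served as text/html instead: {err}")
        return "text/html"
    return "application/xhtml+xml"
```

One caveat with this sketch: the standard-library parser does not read the XHTML DTD, so named entities such as &nbsp; would need extra handling before a check like this could be trusted on real pages.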
nortypig, Geoff, Devon, John,
Thank you for sharing your opinion.
nortypig,
I don’t know what would be best. Probably have some error handling, but I don’t know where nor how it should be implemented. If not that, I guess just serving it anyway with its errors.
Well-formedness errors don’t automatically mean that the web site doesn’t work.
Take, for instance, a visitor trying to find some vital information on a medical web site. For him/her an unencoded ampersand isn’t really the issue, they just want the information.
Geoff,
Basically, yes. There’s no way you will ever have 100% control over third party content providers.
Devon,
I think it’s a huge difference. In those scenarios where the errors you mention occur, you (should) have proper error handling to show it gracefully to the end user. With application/xhtml+xml we don’t get that option.
I don’t want to repeat everything said before here, but for the reasons mentioned above and in Ben’s post, I generally think XHTML is the best way to go. But also, it isn’t always my decision; it might be a business decision that I have to handle the best way I can.
John,
It’s an interesting concept you describe, and to me it sounds like the ideal way to handle it with the tools and options we have available right now!
If you cannot guarantee valid HTML, you shouldn't use it, either! […]
Jens,
That's a bit of simplifying, isn't it?
If you don't want to use XHTML, I respect you for that.
The reasons for me wanting to use it (when it's my decision) have been stated above.
But my main point here is the error handling: I don't think it's professional not to be able to offer the user a friendly error message.
First of all I think that XHTML can be delivered as text/html because the W3C allows it. But …
… this clearly indicates a widely ignored problem: WCAG are for Web Developers and ATAG for developers of Content Management Systems and other authoring tools. They need to take care of valid output, error handling and stuff like that. Not the Web Developers. It's not that we have only one standard here, it is three of them and they are equally important.
Ansgar,
Absolutely.
But so far, as far as I know, ATAG haven't really had any success.
The content delivered today from many (read: most) CMSs is really below all standards.
> I think it’s a huge difference. In those scenarios where the errors you mention occur, you (should) have proper error handling to show it gracefully to the end user. With application/xhtml+xml we don’t get that option.
Why do you think there shouldn't be proper error handling in XHTML? If there are errors in it, then an XML parser cannot parse it. That's critical. It's no different than a PHP (or JavaScript) parser that conks out when you mistype some code or leave out a semicolon or quotation mark. Both PHP and XHTML wouldn't show anything but an error message to the end user. I just don't grasp why this is ok for a script but not ok for a markup language. I'd like to know more people's thoughts on this subject.
But that also brings me back to wondering why anyone would want to use XHTML just for the sake of using it. 'cause obviously, if it's not the right language for them, then why are they using it?
I don't know. I'm an XML guy, so it all makes good sense to me. I feel the draconian error thing helps me out and makes my work more efficient.
Devon,
First, I like strict parsing that tells me when something goes wrong. And I'm not promoting that we should deliver invalid XHTML.
What I'm addressing is that, in the future (since it's impossible to have 100% control over the content added by third party content providers, the customer's own web developers and so on), if something goes wrong the current error handling isn't suitable for its task.
For instance, when you get an error in JavaScript you can use try…catch to avoid throwing the error in the user's face.
If something goes wrong in your PHP code, you can catch that and send a styled text page to the user explaining the error.
With XHTML sent as application/xhtml+xml you can't do that; the only thing the user will see is a big XML error.
To me, that's not user friendly nor good for your business.
Robert, a very interesting discussion. Here are my thoughts.
You shouldn't use XHTML sent as application/xhtml+xml if you cannot guarantee that the content is valid. Instead use HTML or XHTML sent as text/html.
If the XHTML (sent as application/xhtml+xml) is dynamically generated you might have to validate the XHTML before you send it to the receiving browser. If the XHTML is invalid it has to be corrected before it is sent.
The big question is how to correct XHTML that has been generated by software other than your own. Even though the external software guarantees to generate valid XHTML, bugs will see to it that invalid XHTML is also generated. This is a big problem.
bodaniel,
Thank you!
Yes, this is a major question. Sure, you can add tools to find the errors, but correcting them automatically is a whole other question.
Let's add my voice to the discussion, as I was instrumental in setting it off.
Personally, I would say XML error handling actually would be better. If you write/generate faulty code, that should be detected as early in the process as possible. If all user agents would throw error messages with the slightest mistake in a document, everybody would try to prevent errors, and the world would be a better place.
Thing is, we started off on the wrong foot. Early HTML practice allowed for error handling in user agents in a way that would show as much of the document as possible. This promoted laziness in markup coders, resulting in a lot of tag-soupy business. In my opinion, that is bad practice.
But we have to deal with this in the real world. Most of our tools are written with HTML and its 'lax' error handling in mind. That is why I called HTML a lazy standard. But to convert fully to strict XML is often not practical, and from a business point of view is plain madness unless you use XML tools all the way (and if you use third-party content like ads, that is often out of your hand).
Ben,
Thanks for participating in the discussion!
I agree wholeheartedly that the web would've been a better place with strict error handling, if every piece in the chain had been forced to be that from day one.
But unfortunately, as you point out, that's not the case, so we have to deal with the current situation.
However, it makes you question: How do we get all tools, ad providers etc to produce well-formed code? What's their incentive?
Geez, isn’t that like saying since I can’t guarantee I won’t get rear-ended while driving, I’m not going to drive anymore.
Bugs are a fact. They happen. That’s why Java, C++, PHP and virtually all languages have some built-in functionality to catch errors.
Exactly. XML can’t do that. Very well put. It’s like perhaps a new error code is needed. Error 40# Well-formedness error
Ben makes some great points as well. The internet has a long way to go. There’s a lot of bad code floating around. Hopefully what we’re seeing now is the start of a huge movement and adoption can only increase with time.
John,
Thanks for your comment.
And yes, XML, in my opinion, needs some kind of fallback error handling.
John, you can validate your XHTML using PHP, Java (try JAXB)… already. When using JAXB an exception is thrown if the XML document is invalid. Or do you mean that you shouldn't have to catch the "XHTML-errors" on the server-side? What do you expect that the browser should do when you send invalid XHTML? You could maybe send one extra XML document to the browser telling what the browser should do (show error message, redirect the user) if the next document (XHTML) includes errors. But what should happen if the first XML document is invalid?
I just don't get it. First people advocate switching to XHTML because it's "strict" and HTML is not. Then they complain about not being allowed to write sloppy markup. I don't get it.
And what's with the silly comparisons with programming languages and exception handling? Exception handling in, say, C++ handles runtime exceptions. It doesn't catch compile-time errors. But an XHTML document is static. It is not executed and modified in real-time, so there is no need for runtime exception handling.
One alleged advantage of using XHTML is that it can be parsed with a fast, lightweight XML parser rather than a slow and bloated SGML parser. Right. Now you want to add error recovery to XML … so we'll get slow and bloated XML parsers.
XHTML should be generated by tools that guarantee well-formedness. That's very easy: create a DOM tree and output the string representation of it. Skilled developers with a Type-A personality may enjoy hand-coding XHTML as an alternative to sky-diving, but it's not for our regular John or Jane Doe.
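The "create a DOM tree and output the string representation" approach might look something like the sketch below, here using Python's standard-library ElementTree rather than any particular CMS or templating tool. The function name render_comment and the markup structure are invented for illustration.

```python
import xml.etree.ElementTree as ET


def render_comment(author: str, text: str) -> str:
    """Build the markup as a tree and let the serializer handle escaping."""
    div = ET.Element("div", {"class": "comment"})
    ET.SubElement(div, "cite").text = author
    # User input is stored as text, never pasted in as markup,
    # so stray ampersands or unclosed tags cannot break the output.
    ET.SubElement(div, "p").text = text
    return ET.tostring(div, encoding="unicode")
```

Because the comment text is escaped on output, a user who types an unencoded ampersand or forgets an end tag gets their input shown literally instead of breaking the page. The trade-off is that users cannot enter markup of their own this way.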
If you need a markup language that allows for sloppy coding, you'll be happy to hear that it already exists. It's even very well supported by all major and minor browsers. It's called HTML.
Tommy,
Thanks for your comment.
That's not what this post/discussion is about, at least not to me. I do promote valid code and I do not want to, in any way, promote sloppy markup.
But what this discussion is really about is that it's not just about you as a web developer or your team.
It's about external factors, such as tools, third party content providers and maybe some administrator at the customer's doing some maintenance work later on, that might (note: might, not should) end up in code that isn't well-formed.
And if that happens, I personally think that just showing an XML error message isn't good for your business nor the end user experience.
However, I definitely agree that there are ways to validate the document on the server-side before it's sent to the web browser. And in that case, it's probably better to load the document into an XML document and validate it, than to have built-in support for this in the XML parsers.
But how many out there actually do this kind of well-formedness check?
Tommy, I agree.
The problem with "create a DOM tree and output the string representation of it" is that you'd rather use a technique like JSP, dot-ASP (or whatever MS call it today) and separate the presentation layer from the business logic. The XHTML is otherwise mixed up in a lot of Java or C++.
Does dot-ASP guarantee that the resulting XHTML is valid (exception thrown if something is invalid), or do you have to create your DOM object separately (using maybe C++) and add content to it? I don't think JSP guarantees valid XHTML output.
bodaniel,
ASP.NET definitely doesn't guarantee (or deliver) valid <acronym title="eXtensible HyperText Markup Language">XHTML</acronym> out of the box. Please read more about that in How to generate valid XHTML with .NET.
But you can't just take third-party content and include it in your page without validating or processing it?! I guess I still don't get it.
Third-party content should be received as XML, transformed (e.g. through XSLT) and imported into your DOM tree. If you are going to use content that is out of your control as-is, you need to use an OBJECT or IFRAME element and simply load the external content.
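A rough sketch of that idea in Python, assuming the third-party content arrives as a string of XML: parse it first, and only import it into your own tree if the parse succeeds. The helper name `import_fragment` and the sample snippets are hypothetical, and the XSLT transformation step is left out because the standard library has no XSLT support. For content you don't trust, a hardened parser would be advisable as well.

```python
import xml.etree.ElementTree as ET


def import_fragment(container, third_party_xml):
    """Append third-party markup to container only if it is well-formed.

    Returns True when the fragment was imported, False when it was rejected.
    """
    try:
        fragment = ET.fromstring(third_party_xml)
    except ET.ParseError:
        # Not well-formed: leave our own tree untouched.
        return False
    container.append(fragment)
    return True


body = ET.Element("body")
# Well-formed content is imported into the tree.
imported = import_fragment(body, "<div><p>Sunny &amp; warm</p></div>")
# An unencoded ampersand and a missing end tag are rejected.
rejected = import_fragment(body, "<div><p>Sunny & warm</div>")
print(imported, rejected, len(body))
```

The point is that bad content from a provider gets stopped at the border instead of breaking the whole page.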
bodaniel, not at all, I'm suggesting there should be both. I think a server like Apache should return a server error similar to a 404, which allows one to create a custom error page. Server-side checking should also be available, which bodaniel says there is.
The only point I was trying to make is, I agree with Robert, an XML error message is bad for business. Lastly, errors can and may happen, and developers should know how to properly plan for them.
John, I don't think I'm convinced about "there should be both".
I think that developers should know how to generate valid XHTML. The developer should see to it that no invalid XHTML is sent to the receiving browser.
Static XHTML should be generated using an editor that guarantees valid XHTML. If you don't use that kind of editor you should validate the edited XHTML before it is made public. No validation should be performed by the web server during requests.
When you dynamically generate XHTML you should also use software that guarantees valid XHTML. The problem is that I don't know if web servers running common server-side scripts like PHP, JSP, ASP etc. are able to guarantee valid XHTML yet. It would be nice if an invalid-xhtml-exception was thrown if my JSP generates invalid XHTML. Then I would know that no invalid XHTML is sent to the browser. This is maybe the functionality that you are missing as well?
Tommy,
I agree with you here, that is the way to go. But the problem is that many (actually most of the ones I’ve met) System Developers don’t care about delivering valid and well-formed XHTML. And if you’re employed at a company, working in-house, you have the possibility to affect the decisions and influence the other web developers to a much greater extent than I can, with me working as a consultant with web developers from the customer and from other consultant companies.
What this means is that my main goal is that the code being delivered is well-formed when I leave the project, but I cannot guarantee that any of the other web developers or one of the customer’s employees won’t mess it up after I’m gone.
Yes, that’s a good idea. But sometimes you want this content to have a dynamic size depending on its content, and then you can try to either set its size with percentage values or use scripting to resize it after it has loaded. Neither sounds optimal to me (as a side note, I do use the object tag for the Google ads on this web site, so I promote it when it’s an option).
Conclusively, I don’t think it’s good business sense to stubbornly hold on to application/xhtml+xml that will result in a non-functional web site if a, to me, minor well-formedness error such as an unencoded ampersand sneaks its way in.
But I agree that if you’re doing your job correctly, you should validate content on the web server to make sure the code is well-formed before you send it to the visitor. Then you have the option to send them a friendly error message/page instead of the big XML error message.
bodaniel,
Couldn’t agree more. Regarding editors, when it comes to the WYSIWYG part, I discussed that in WYSIWYG Hell.
I don't like XHTML at all.
Every well-informed web-developer I know has realized that XHTML has become a childish markup language.
Seriously, how many of you aren't using XHTML – just because of the `cool` little X in front, or maybe because it's required to end elements without end-tags?
If you're not in the section above, you probably like the `strictness` which XHTML gives you.
Well then, why not just use a stricter <abbr>DTD</abbr> of HTML 4.01?
I, for one, don't understand why someone would go through the hassle of content negotiation just to get an error handler which destroys the site if it contains errors.
I used XHTML before. But at the same time I like to give my commentators full HTML possibilities.
And when someone with too little HTML experience came in and wrote a comment, the whole site crashed.
I would much rather have a non-valid HTML document than an XHTML document that won't display because it contains errors.
So, my five cents: if you don't have any need for XHTML in the form of MathML etc., don't use it 😉
Henrik,
That was at least five cents (as opposed to the normal two). 🙂
I think (read: hope) my reasons for using <acronym title="eXtensible HyperText Markup Language">XHTML</acronym> have been stated above and in Ben's post.
But sure, of course you're right in one sense: there are always people who jump on the bandwagon without knowing (or even caring) about the consequences.
Personally, I think everyone should go with what feels best for them, be it <acronym title="eXtensible HyperText Markup Language">XHTML</acronym> or <acronym title="HyperText Markup Language">HTML</acronym>. But I do recommend using a strict doctype no matter what your choice is.
Markdown and Textile can help. They output nothing but valid XHTML. More CMSs should use them.
Gabriel,
Thanks for the suggestions! 🙂
Faruk,
> It’s the strictness of the syntax that promotes best practices tons better, not the actual tags/elements.
> But HTML doesn’t enforce you in any way to apply best practices to your code, whereas XHTML does.
Which best practices are enforced/promoted by XHTML served as text/html?
> Nobody on the planet ever learned how to do XHTML right in one go. Neither did you. We all need to take things one step at a time, in this case that means learning XHTML one step at a time. You can’t go expecting people to switch from old school, nested table-ridden, font tag-polluted HTML to clean, semantic, well-formed XHTML in one go. But that’s exactly what you’re saying, and that’s very elitist of you.
Don't put words in my mouth.
If you have old-school, nested table-ridden, font tag-polluted HTML, then there are many intermediate steps you can take without switching to XHTML that will improve the quality of your code.
You don't have to switch to XHTML. For the people who can't do XHTML right, there are no benefits. If you switch to XHTML for the sake of "clean code" or "best practices", and you start writing invalid XHTML, then you've taken a step backwards, not forwards. Pointing this out isn't elitist, it's using common sense.
XHTML isn't a club all the smart people join, it's a tool. What I am saying is that you should use the right tool for the job. For people who cannot do XHTML correctly, XHTML is the wrong tool for the job and HTML 4.01 is the right tool for the job. How on earth is that elitist?
> Keeping them at HTML will keep them away from promoting best practices effectively
How? Does HTML prevent them from writing valid code? No. Does HTML prevent them from using CSS for layout? No. Does HTML prevent them from writing accessible pages? No. HTML is not incompatible with and doesn't discourage best practices.
Robert,
> But to be clear, my scenario is often that I write and deliver valid code. Enter .NET-based CMS systems, third party content providers and other web developers/administrators that don’t have the knowledge (or even interest) to do that.
If you have to deal with other people's bad code, then XHTML is simply the wrong tool for the job.
Jeroen,
> I cannot get it into my head why anyone would want to enforce such a strict error handling.
Because web authors broke Postel's Law. Because developers of authoring tools broke Postel's Law. Postel's Law only holds up when both parties respect it.
Robert,
> I think it’s a huge difference. In those scenarios where the errors you mention occur, you (should) have proper error handling to show it gracefully to the end user. With application/xhtml+xml we don’t get that option.
We don't get that option with JPEGs, GIFs, PNGs, MPEGs, or pretty much any other media type that is directly presented to the end-user without opportunity to catch errors. Nobody's asked for a JPEG parser that can automatically recover from corrupt images.
Tommy Olsson wrote:
Not quite. A document can be altered many times after loading by JavaScript. A single error there may render a valid document invalid. Also the content of a frame might be refreshed, bringing new and possibly invalid code into the page. XHTML documents are, like HTML ones, dynamic.
This post interests me because I have just made a round of XML tests (see XML Browser Differences). I’m delighted to say that an error in an XHTML page only gives a blank yellow screen in one browser – Firefox. In Opera the document is parsed and displayed! (The error message appears underneath.)
What’s more, IE6 also does the same, that is for XML documents. I was very surprised by this. Good news for the user, unless they use Firefox!
In my opinion, the browser should attempt to recover the document as far as possible, so at least something is displayed. After all, a single fault is not likely to affect the rest of the content enough to warrant not showing any of it at all. (Or perhaps I’m wrong in certain cases.) XHTML should therefore be no different than HTML. Browsers are now very good at fixing bad markup. Of course then we’re left with the argument about encouraging such markup by the browser fixing errors.
By the way, you might want to fix those typo and case errors in the main post… 🙂
Chris,
If you use JavaScript to update the XHTML, you have to see to it that the JavaScript doesn't make the XHTML invalid.
I thought XHTML was the next step towards fast rendering browsers, that shouldn't have to guess what your messy HTML actually tries to say? If the browser has to fix every error in the XHTML document you would also need recommendations telling how errors in XHTML documents should be handled. Try to think of all possible errors that can occur and what the error handling code would look like. What should the browser do if it is within an ordered list and finds the end of a heading?
Jim,
Yes and no. For me, to write valid code and then present the flaws to the people responsible also helps me in pushing them toward demanding more from their tools and the content providers.
I'm not sure an error parser would be the best thing, I don't know which approach would be the best. Hence the discussion. 🙂
But, as Chris mentions about Opera (I haven't tested this myself), to show the error below while still displaying the web page sounds like a much better approach to me.
Chris,
Typos fixed, thank you. However, I'm not sure about what case errors you refer to?
bodaniel,
For a parser to go through valid <acronym title="eXtensible HyperText Markup Language">XHTML</acronym>, it should, theoretically, be faster than looking for optional tags.
But then we come back to those cases where the code isn't valid, and how the web browsers should handle that…
Robert,
I think the browser should display an error message if the XHTML isn't valid. The cryptic part of the error message would probably be "Unexpected element X found …" or some other common validation error message. It would be nice if the main part of the error message was easily understood by all users. The user should also be told what he/she should do to get back to where he/she came from.
I don't think the browser should have to guess what the content of the invalid XHTML actually should have been.
One thing I really would like is better XHTML validation support in server-side scripting languages (e.g. JSP). The XHTML produced by the scripting language should automatically be validated while it is being generated, and if the validation process finds an error the user should be redirected to my JSP error handling page where I, as a programmer, get the chance to find out about the error and can choose what to display for the user. If I am unable to output valid XHTML even on the error handling page, I suck. It might sometimes be a good idea to output HTML instead of XHTML on the error handling page.
bodaniel,
Yes, built-in validation in the server-side languages would be a nice thing. Of course you can build it yourself, load the output into some <acronym title="eXtensible markup Language">XML</acronym> object etc, but I for one would like it as an easier feature to use.
I think a good approach the browser developers could use is to show an error message but give options on what the user wants to do next. For example, it could say whatever error, and provide buttons for "View Page Anyway", "Go back" and anything else which would be worth doing.
This way, it would allow the user to get to the information if it is important, but also make it clear something is not as it should be, thereby not promoting/allowing bad markup (ie. like HTML did).
Checking X(HT)ML for well-formedness on the server side is possible using PHP. I am currently writing it into my upcoming site. The xml check class at this site is useful, especially combined with the output control system in PHP, by using the callback function. It would allow you to check the markup before sending it out, and if there is a problem, you can send out a different friendly error page to the user (without well-formedness errors if this page is in XHTML too….!!).
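The same check-before-sending idea can be sketched in Python as well. This is not the PHP class mentioned above, just an illustration using the standard library's `xml.etree.ElementTree`; the names `prepare_response` and `ERROR_PAGE` are made up. One caveat: this parser does not read the XHTML DTD, so named entities such as `&nbsp;` would be reported as errors unless you use numeric references instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical fallback page, deliberately plain HTML served as text/html
# so that it cannot trigger a well-formedness error itself.
ERROR_PAGE = (
    "<html><head><title>Sorry</title></head>"
    "<body><p>Something went wrong. Please try again later.</p></body></html>"
)


def prepare_response(markup):
    """Return (content_type, body) for generated markup.

    Well-formed markup is passed through as application/xhtml+xml;
    anything else is replaced by the friendly error page.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return "text/html", ERROR_PAGE
    return "application/xhtml+xml", markup


good = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>a &amp; b</p></body></html>'
bad = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>a & b</p></body></html>'
print(prepare_response(good)[0])
print(prepare_response(bad)[0])
```

In a real setup you would probably also log the parse error somewhere, so the developer finds out about it before the visitors do.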
Splash!,
That was an interesting approach! Personally, I'm not sure whether users should make, or are capable of making, that decision. Many users just get scared away by messages about errors that require them to make a call about what to do next.
Thanks for the tip about the <acronym title="eXtensible markup Language">XML</acronym> check class!
Your post got me interested enough to look around some of the more popular *standardista* sites, and most I know of are using XHTML Transitional doctypes, and none I visited (except Tommy's dead parrot) are serving as application/xhtml+xml (according to Firefox Mac)?
A little disappointing?
Oops, sorry, meant to post in most recent HTML/XHTML article!
Steve,
Well, it's OK; I rather think this was the place to post such a comment.
Regarding serving XHTML 1.0 as <code>text/html</code> or <code>application/xhtml+xml</code>, I can live with that. But I have to say it's disappointing to see so many using the Transitional doctype.
Question:
What happens if, say, you create a document and make a mistake in it, for example you forget to put the '/' before the closing > so you would do:
like you would have done in HTML, but the page is an XHTML page? What would happen in relation to viewing the page?
Laurence,
If the HTML document were sent with the <code>application/xhtml+xml</code> MIME type, it would not render at all.
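You can see the same thing outside the browser with any XML parser. A small sketch in Python, assuming the forgotten slash is on a `br` element (that specific tag is my assumption, picked only as an illustration); `is_well_formed` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style empty element: the XML parser waits for a </br> that never comes.
html_style = "<p>first line<br>second line</p>"
# XHTML-style empty element with the trailing slash.
xhtml_style = "<p>first line<br />second line</p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

A browser treating the page as XML hits the same kind of parse error, which is why it shows the error message instead of the page.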