Monday, August 23, 2004

XHTML is not XML!!

WARNING: This blog entry was imported from my old blog on (which used different blogging software), so formatting and links may not be correct.

I was always under the impression that XHTML documents are XML documents - with HTML semantics for the tags. You know - must close all tags, tags must be nested, etc. But I recently discovered that that's not the case!

I had this weird bug in Creator: if you did "Preview In Browser" for the following document (html, body, etc. omitted), the browser showed the text inside the text area rather than AFTER it!

<textarea/>Hello World

In XML you can use a minimized form for empty elements, such that <foo></foo> can be written as <foo/> instead. But as it turns out, that doesn't always work in XHTML that browsers have to display. Some tags should always be minimized (such as br, which is always empty), and other tags should never be minimized, such as p, div, textarea, and friends. This is all spelled out in Appendix C of the XHTML 1.0 spec (the HTML compatibility guidelines).
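To make the XML side of this concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (not what Creator uses). The wrapper body element and the helper name textarea_parts are just assumptions for illustration. To a real XML parser, both spellings give an empty textarea with the text following it as a sibling:

```python
import xml.etree.ElementTree as ET


def textarea_parts(fragment):
    """Parse a fragment as XML; return (text inside textarea, text after it).

    The fragment is wrapped in a hypothetical <body> root so that it forms
    a well-formed XML document.
    """
    root = ET.fromstring("<body>" + fragment + "</body>")
    textarea = root.find("textarea")
    # ElementTree keeps text that follows an element in its .tail attribute.
    return (textarea.text or "", textarea.tail or "")


minimized = textarea_parts("<textarea/>Hello World")
expanded = textarea_parts("<textarea></textarea>Hello World")

# An XML parser sees no difference between the two spellings.
print(minimized == expanded)
print(minimized)
```

An HTML parser, by contrast, does not honor the trailing slash on textarea, which is how the text ends up inside it.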

In Creator we were using standard Xerces to parse and serialize the markup, but because of the above "feature" I can't do that anymore, since Xerces does not decide whether to serialize a tag as minimized or not minimized based on the tag name. This was fixed in patch 1.
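A tag-name-aware serializer can be sketched roughly like this. This is not the Creator fix and not a Xerces API, just a minimal Python illustration of the idea; the function name serialize is hypothetical, the EMPTY_ELEMENTS set is my reading of the elements declared EMPTY in the XHTML 1.0 Strict DTD, and namespaces, comments and the like are ignored:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Assumption: elements declared EMPTY in the XHTML 1.0 Strict DTD.
EMPTY_ELEMENTS = {
    "area", "base", "br", "col", "hr",
    "img", "input", "link", "meta", "param",
}


def serialize(elem):
    """Serialize an element, picking the empty-tag form by tag name."""
    attrs = "".join(
        " {}={}".format(name, quoteattr(value))
        for name, value in elem.attrib.items()
    )
    if elem.tag in EMPTY_ELEMENTS and len(elem) == 0 and not elem.text:
        # Always-empty elements: minimized, with a space before the slash.
        out = "<{}{} />".format(elem.tag, attrs)
    else:
        # Everything else gets an explicit end tag, even when it is empty.
        body = escape(elem.text or "")
        body += "".join(serialize(child) for child in elem)
        out = "<{0}{1}>{2}</{0}>".format(elem.tag, attrs, body)
    # Text that follows the element belongs to its parent's content.
    return out + escape(elem.tail or "")


doc = ET.fromstring("<div><textarea/>Hello World<br></br></div>")
result = serialize(doc)
print(result)
```

The point is only that the choice between <foo/> and <foo></foo> is made per tag name, which a generic XML serializer has no reason to do.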

Of course, I'm still puzzled as to why Mozilla and other browsers choose to treat the fragment above such that the text shows up inside the textarea... perhaps this is some sort of quirks-mode handling which makes old HTML documents with errors show up correctly?

1 comment:

  1. The doctype by itself won't kick Mozilla into XML mode. For backwards compatibility, it uses the HTML parser if you serve pages with a text/html Content-Type; serve it as application/xhtml+xml and you'll get the strict, correct XML behavior.