Friday, August 29, 2008

Pretty-print serialized DOM

Another great Mozilla feature: the ability to pretty-format a serialized DOM tree. The following code serializes an entire web page and pretty-formats the markup:

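// Serialize the whole page, starting at the root (<html>) element
var serializer = new XMLSerializer( );
var str = serializer.serializeToString( document.documentElement );
// XML( ) parses the string into an E4X XML object (Mozilla's ECMAScript
// for XML extension); toXMLString( ) writes it back out, indented
var pretty = XML( str ).toXMLString( );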
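E4X's XML( ) was a Mozilla-only extension and has since been removed from Firefox, so here is a minimal, engine-neutral sketch of the same idea: a string-based indenter for serialized markup. The name prettyPrintMarkup is hypothetical (it is not a browser API). It assumes well-formed input with no comments, CDATA sections, processing instructions, or ">" inside attribute values, and it trims whitespace around text, so it is not safe for <pre> content.

```javascript
// Hypothetical helper: indent serialized markup one level per nesting depth.
// Assumes well-formed input without comments, CDATA, PIs, or ">" in attributes.
function prettyPrintMarkup(markup, indentUnit = "  ") {
  // Split into tags and text runs; the capturing group keeps the tags.
  const tokens = markup
    .split(/(<[^>]+>)/)
    .map(t => t.trim())
    .filter(t => t !== "");
  const lines = [];
  let depth = 0;
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    const pad = indentUnit.repeat(depth);
    const isTag = token.startsWith("<");
    const isClose = token.startsWith("</");
    const isSelfClose = token.endsWith("/>");
    if (isClose) {
      // Closing tag: dedent first, then emit.
      depth = Math.max(depth - 1, 0);
      lines.push(indentUnit.repeat(depth) + token);
    } else if (isTag && !isSelfClose) {
      const text = tokens[i + 1];
      const close = tokens[i + 2];
      if (text !== undefined && !text.startsWith("<") &&
          close !== undefined && close.startsWith("</")) {
        // <tag>text</tag> with nothing else inside: keep it on one line.
        lines.push(pad + token + text + close);
        i += 2;
      } else {
        // Opening tag with element children: emit, then indent them.
        lines.push(pad + token);
        depth++;
      }
    } else {
      // Self-closing tag or text node: emit at the current depth.
      lines.push(pad + token);
    }
  }
  return lines.join("\n");
}
```

For example, prettyPrintMarkup("<html><body><p>Hi</p><br/></body></html>") puts each element on its own line, with <p>Hi</p> kept intact.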

As mentioned in my earlier post about XMLSerializer, the XML you get isn't perfect: element names come out ALL CAPS for some weird reason. You also get a bunch of automatic entity substitutions. Most of them you probably want, but others will simply break things if you later try to deserialize the text back into a DOM. (Forget about easy round-tripping.) But overall, it's a really useful trick.
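The caps most likely come from the HTML DOM itself, which reports element names (tagName) in uppercase for elements in an HTML, as opposed to XHTML, document. If you'd rather have lowercase names, one workaround is a post-processing pass over the serialized string. This is only a sketch: lowercaseTagNames is a hypothetical helper, and it should be used on HTML markup only, because it would also flatten case-sensitive names such as SVG's linearGradient.

```javascript
// Hypothetical helper: lowercase element names in serialized markup,
// e.g. <BODY> -> <body>. Attribute names/values and text are untouched.
// Only matches "<" or "</" followed by a name, so "<!" and "<?" are skipped.
function lowercaseTagNames(markup) {
  return markup.replace(/<(\/?)([A-Za-z][\w:.-]*)/g,
    (match, slash, name) => "<" + slash + name.toLowerCase());
}
```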

I was hoping this trick would also, as a free bonus, pretty-format any embedded scripts inside CDATA sections, but of course no such luck. In fact, because of automatic entity substitution, <![CDATA[ gets converted to &lt;![CDATA[, which is hilarious in a sad kind of way.
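Presumably this happens because an HTML parser treats <![CDATA[ inside a script as ordinary text, so the serializer escapes it like any other text. If you need the CDATA sections back, a rough repair pass can un-escape each &lt;![CDATA[ ... ]]&gt; span and its contents. restoreCData and unescapeXml are hypothetical helpers, and the sketch assumes every escaped pair in the input really was a CDATA section in the original page.

```javascript
// Hypothetical helper: decode the three entities the serializer substitutes.
// &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
function unescapeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Hypothetical helper: turn each escaped "&lt;![CDATA[ ... ]]&gt;" span back
// into a real CDATA section, un-escaping the text between the delimiters.
function restoreCData(markup) {
  return markup.replace(/&lt;!\[CDATA\[([\s\S]*?)\]\]&gt;/g,
    (match, body) => "<![CDATA[" + unescapeXml(body) + "]]>");
}
```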

Serializing DOM nodes to XML in Firefox

I keep having to re-teach myself this, so I might as well post it where I can always find it!

Let 'd' be any DOM node. To serialize the node (and its descendants) via JavaScript:

var serializer = new XMLSerializer();
var str = serializer.serializeToString( d );
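Conceptually, serializing a node and its descendants is a recursive walk: emit the start tag, serialize each child, emit the end tag, escaping text and attribute values along the way. The sketch below uses a hypothetical plain-object tree as a stand-in for DOM nodes; it is not the real DOM API and not how Mozilla implements XMLSerializer. As a design choice, it lowercases element names.

```javascript
// Hypothetical plain-object stand-in for DOM nodes (not the real DOM API):
//   { type: "element", name, attrs: { ... }, children: [ ... ] }
//   { type: "text", value }

// Escape the characters that are unsafe in XML text content.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values are written in double quotes, so escape those too.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Serialize a node and, recursively, all of its descendants.
function serializeNode(node) {
  if (node.type === "text") {
    return escapeText(node.value);
  }
  const name = node.name.toLowerCase();
  const attrs = Object.entries(node.attrs || {})
    .map(([k, v]) => ` ${k}="${escapeAttr(v)}"`)
    .join("");
  const children = node.children || [];
  if (children.length === 0) {
    return `<${name}${attrs}/>`; // empty element
  }
  return `<${name}${attrs}>` + children.map(serializeNode).join("") + `</${name}>`;
}
```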

Having a top-level XMLSerializer object in Mozilla is so nice. So, so nice.

But sadly, the output from the serializeToString( ) method is not the kind of XML I'd like to see. Element names come out ALL CAPS whether or not that's what you want (and it's never what I want). It also converts the less-than and greater-than signs inside scripts to their entity equivalents (&lt; and &gt;), even if you enclose your scripts in CDATA sections. To me, entity substitution inside a CDATA section makes no sense whatsoever.
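For comparison, here is a sketch of what CDATA-aware output could look like: the script text is written verbatim, with no entity substitution at all. The only sequence XML forbids inside a CDATA section is ]]>, and the usual trick is to split it across two adjacent sections. toCDataSection is a hypothetical helper, not part of XMLSerializer.

```javascript
// Hypothetical helper: wrap arbitrary text in a CDATA section, verbatim.
// Any "]]>" in the text is split so that "]]" ends one section and ">"
// begins the next; a parser concatenates the pieces back into "]]>".
function toCDataSection(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}
```

So toCDataSection("if (a < b && c) {}") keeps the < and && exactly as written.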

Still, it's handy to be able to serialize a DOM tree. Even if the tags come out ALL CAPS.