2006-03-05

E4X and the DOM

I haven't covered E4X (short for ECMAScript for XML; ECMA-357 specification) much here yet, but I have been experimenting with it for a while now in Greasemonkey scripts, and got to a point where I feel I have some findings to contribute.

I'm not going to go into detail about the splendour of the E4X design, nor explain basic concepts; Jon Udell gave a short introduction in September 2004, and I'll be referring to a few other good articles about it later in this post too. In short, though, it makes XML nodes and trees first-class objects (just like numbers, strings or RegExps), using XML as the literal syntax, and adds terse, readable and expressive syntax for slicing and dicing these objects, sharing many traits with XPath.

On a ranty side note, it's what the DOM APIs should have been in the first place, had they not been plagued by Javaisms such as naming the most basic and frequently used method document.getElementById, a whopping 23 characters long. People who write I18N and L10N for Internationalization and Localization should not use the DOM. (Or they should set themselves up with aliases such as d21d(), d27e() and so on.) John Schneider makes a less emotional comparison between XSLT, DOM and E4X in Native XML Scripting with E4X, in the proceedings of the XML 2005 conference, and sums up his conclusions in corporate speak at the end, too. Again, in short: E4X is about productivity and readability.

As those of you who have been following the E4X field might know, there has been some support for it in Firefox for quite a while now. Kurt Cagle described some basics in June 2005 and returned to the subject in a later presentation, Advanced Javascript (subtitled "E4X in Firefox 1.5"), from which I'd like to quote the killer misfeature of the current state of affairs:

Object created is NOT a DOM Node, but an E4X node.

Which means that while E4X nodes are first-class objects, you can't pass them to the DOM APIs; no node.appendChild( <img src={url}/> ) yet. (Had it worked, that code would have been a drop-in replacement for var img = document.createElement('img'); img.src = url; node.appendChild( img ); -- expressive indeed!) So while you can do lots of really nifty XML operations without messing with XPath through clunky DOM APIs, injecting the results anywhere means falling back to the old and ugly node.innerHTML = e4x.toXMLString(), injection by string representation. Eww.

Or maybe not.

I sent out a plea for help to the Greasemonkey list, and some time later encountered a resourceful post by Mor Roses, where he tossed up an importNode method that translates E4X nodes to DOM nodes for a specific document object. Here is my take on it:
function importNode( e4x, doc )
{
  var me = importNode, xhtml, domTree, importMe;
  // Cache the constant and the parser on the function object itself.
  me.Const = me.Const || { mimeType: 'text/xml' };
  me.Static = me.Static || {};
  me.Static.parser = me.Static.parser || new DOMParser();
  // Wrap the E4X node in an XHTML-namespaced root element.
  xhtml = <testing xmlns="http://www.w3.org/1999/xhtml" />;
  xhtml.test = e4x;
  // Serialize to a string and reparse it as a DOM tree.
  domTree = me.Static.parser.parseFromString( xhtml.toXMLString(),
                                              me.Const.mimeType );
  // Skip any non-element nodes (such as whitespace text) before the payload.
  importMe = domTree.documentElement.firstChild;
  while( importMe && importMe.nodeType != 1 )
    importMe = importMe.nextSibling;
  if( !doc ) doc = document;
  return importMe ? doc.importNode( importMe, true ) : null;
}
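As a rough sketch of the same round trip without the E4X literal (so it can be tried outside Firefox too), the function below takes the markup as a string, such as the output of e4x.toXMLString(). The name importXmlString and its doc and parser parameters are assumptions of mine for illustration, not part of the original recipe:

```javascript
// Sketch only: importXmlString is a hypothetical name. It mirrors the
// parse-then-import steps of importNode, but takes the XML as a string.
// doc is the target document; parser is any object with a DOMParser-style
// parseFromString( string, mimeType ) method.
function importXmlString( xml, doc, parser )
{
  // Wrap the markup in an XHTML-namespaced root, as importNode does.
  var wrapped = '<testing xmlns="http://www.w3.org/1999/xhtml">' +
                xml + '</testing>';
  var domTree = parser.parseFromString( wrapped, 'text/xml' );
  // Skip non-element nodes until the first element child is found.
  var importMe = domTree.documentElement.firstChild;
  while( importMe && importMe.nodeType != 1 )
    importMe = importMe.nextSibling;
  // Deep-copy the element into the target document, or give up with null.
  return importMe ? doc.importNode( importMe, true ) : null;
}
```

In a page, a call might look like importXmlString( '<img src="pic.png"/>', document, new DOMParser() ), where pic.png is just a placeholder.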
To make importNode more pragmatically useful, I tossed up two helper methods, appendTo and setContent. Both take an E4X structure and a target node as parameters and inject your XML at the end of the node. The latter, in addition, starts by removing any prior contents of the node:
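// Append the imported E4X structure as the last child of node.
function appendTo( e4x, node, doc )
{
  return node.appendChild( importNode( e4x, doc || node.ownerDocument ) );
}

// Replace all prior contents of node with the imported E4X structure.
function setContent( e4x, node )
{
  while( node.firstChild )
    node.removeChild( node.firstChild );
  appendTo( e4x, node );
}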
So it's not node.appendChild( <img src={url}/> ), but appendTo( <img src={url}/>, node ). (Prototype fans may of course opt to add these methods to Node.prototype instead, laughing potential naming collisions with external libraries in the face, that aspect being an inherent feature or plague of the language design.)

For a real-world code example, I'm making extensive use of this in my recent Mark my links tool (version 1.7 source code).

Greasemonkey script writers out there might want to know that it is not a perfect translation, though useful for most purposes -- the tagName property of the resulting nodes is not upper case, the way it for some reason is in HTML documents, so if you want to play under the radar of target page code looking for, e.g., IMG elements, you might need to perform some additional trickery. I just kludged a case of that using unsafeWindow.Image.prototype.__defineGetter__( 'tagName', function(){ return 'IMG' } ) -- I'm sure there are nicer ways too.
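For checks in your own script (as opposed to page code you cannot change), one way to sidestep the case difference is to compare tag names without regard to case. A minimal sketch; isTag is a hypothetical helper name of mine, not part of any API:

```javascript
// Hypothetical helper: true if node has a tagName matching name,
// ignoring case, so 'img' matches both 'IMG' and 'img'.
function isTag( node, name )
{
  return !!node && typeof node.tagName == 'string' &&
         node.tagName.toLowerCase() == name.toLowerCase();
}
```

This does nothing for page code doing its own upper-case comparisons, of course; that still takes a kludge like the one above.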

For some reason, though, I don't see any parse errors reported for scripts whose E4X literals contain malformed XML; I just get plain non-functional scripts, which seriously hurts debugging. I have yet to find out whether it's due to some flaw in Mozilla core, Greasemonkey or my local Firefox installation. Somehow I suspect the last one most; let's hope I'm right about that.