2007-02-04

(DOM)Node.textContent and Node.innerText

You can safely skip the next two paragraphs if you're in a hurry; it's just warm air.

I'm the first to admit I thrive in Mozilla/Firefox land, but would defect in an instant if I could bring Firebug, AdBlock, the Filterset.G updater and the Greasemonkey (well, its user interface and privileged API methods; the rest of the user script architecture in Opera beats any other web browser, hands down) with me, immigrating into Opera land. (Then I would spend the next few years peeking at the neighbour's green grass, wishing I could bring with me the best features from that environment to the next. But software unfortunately doesn't really work that way.)

Why? Social and emotional reasons mostly. Having closer and better connections with the Opera dev team (that's got to be a first), and seeing how fervently they profile, chip off overhead and beat their code into a Japanese razor sharp blade, with a staggering eye towards standards compliance and lightning speed. See? It's all a lot of emotional goo about an inflated feeling based on knowing the people who work there and more than occasionally hearing about how they spend their time.

Amusingly, I wrote up this post in a somewhat too tired state to take note of what the W3C DOM 3 specification of Node.textContent actually said, believing Firefox had it wrong and the current Opera did it right. It's the other way around, fortunately.

Here is how Node.textContent works in Firefox: you have a node, read its textContent property, and out you get all its text contents, stripped of all tags and comments. A <tt>Some <!-- commented-out! --> text</tt> tag would thus give you the string Some  text.
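To make that rule concrete, here is a minimal sketch that models the getter on plain objects rather than real DOM nodes, so it runs outside a browser too. The function name textContentOf and the object shape are my own assumptions for illustration; only the nodeType numbers are the ones the DOM defines.

```javascript
// Node type numbers as defined by the DOM.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const PROCESSING_INSTRUCTION_NODE = 7;
const COMMENT_NODE = 8;

// Hypothetical helper (not a browser API): models what reading
// textContent yields for a node built from plain objects.
function textContentOf(node) {
  // Text and comment nodes report their own character data.
  if (node.nodeType !== ELEMENT_NODE) return node.data;
  // An element concatenates its children's text, skipping
  // comment and processing instruction children.
  return node.childNodes
    .filter(child => child.nodeType !== COMMENT_NODE &&
                     child.nodeType !== PROCESSING_INSTRUCTION_NODE)
    .map(child => textContentOf(child))
    .join("");
}

// The <tt> example from above, as a plain object tree.
const tt = {
  nodeType: ELEMENT_NODE,
  tagName: "tt",
  childNodes: [
    { nodeType: TEXT_NODE, data: "Some " },
    { nodeType: COMMENT_NODE, data: " commented-out! " },
    { nodeType: TEXT_NODE, data: " text" },
  ],
};

const result = textContentOf(tt); // "Some  text" (two spaces)
```

Note the two space characters left behind where the comment used to sit.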

The Firefox behaviour is very useful. I love it. I originally believed that it was not how the W3C defined textContent to behave, and that the standards compliant result would be Some commented-out! text. As admitted above, that was my mistake: the specification leaves comments out, just as Firefox does.

IE6 and IE7 do not implement Node.textContent at all, but have their own Node.innerText, which predates the DOM, and behaves the same way (barring whitespace differences -- those two space characters in the middle actually end up a single one).

Opera implements both, though neither is presently quite on target. :-) As MSDN does not really define very well what innerText does, though, Opera actually implements innerText the way the Firefox textContent works.

Firefox only implements Node.textContent, gets it right, and ended up implementing a useful behaviour indeed. Should Firefox ever decide to change this (I will not urge them to hurry; indeed I don't think an issue has even been filed yet -- and the present behaviour has been with us for as long as I have known it, anyway), I really hope they would consider delegating the present behaviour to innerText, instead, as does Opera.

Safari implements neither (innerText returns the empty string, though!) but amusingly has an innerHTML property which behaves the way Firefox's textContent does. (Yay Safari! ;-)
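Given a spread like that, a script that wants the text of a node has to feel its way forward. A minimal sketch, assuming nothing more than the two property names discussed here; getText is a name I made up, and the whitespace may still differ between the two properties, as noted above.

```javascript
// Hypothetical helper: read a node's text through whichever
// property this browser provides, preferring textContent.
function getText(node) {
  if (typeof node.textContent === "string") return node.textContent;
  if (typeof node.innerText === "string") return node.innerText;
  return ""; // neither property is available
}
```

The typeof checks, rather than a plain truthiness test, keep a legitimately empty textContent from falling through to innerText.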

All the above concerns the behaviour of the getters of the mentioned properties, which is the bit I have most interest in myself, for the most part. It's a great way to scrape data free of markup from pages, client side and at minimum effort, for instance in user scripts. I do this a lot.
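For the curious, this is roughly the shape such scraping takes. It is only a sketch: scrapeText is a hypothetical helper name, and the whitespace tidying is my own habit rather than anything the properties do for you.

```javascript
// Hypothetical helper: pull markup-free text out of a list of nodes,
// collapsing runs of whitespace and trimming both ends.
function scrapeText(nodes) {
  return Array.from(nodes, node =>
    (node.textContent || "").replace(/\s+/g, " ").trim());
}
```

In a user script you would hand it a node list, for instance scrapeText(document.getElementsByTagName("td")), and get back an array of plain strings, one per cell.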

Fortunately, for myself, I still mostly write user scripts for my personal needs, so it does not really matter that there is still a ravaging non-consensus war about the BOM (the browser object model) going on out there, even with the W3C trying to make them all agree about something. Some days, like when the W3C event bubble / trickle model was designed, for better, some days, like when they got textContent wrong, for worse.

Are there any ambitious people out there who have set up automated BOM test suites running around the ins and outs of the BOM of its visitors, collecting the results and presenting a continuously updated index of their findings? I would love to chip in some money to a good project like that. And if there aren't, here is an excellent opportunity for web fame and recognition for someone. I wouldn't mind mentoring it.