Sunday, September 27, 2009

The role-based favicon, and why Novell patented it


How many of these favicons can you identify? (From left to right: Gmail, Google Calendar, FilesAnywhere, Twitter, Y-Combinator, Reddit, Yahoo!, Picasa, Blogger)

Last week (excuse me a second while I tighten the straps on my tomato-proof jumpsuit) I was granted a patent (U.S. Patent No. 7,594,193, "Visual indication of user role in an address bar") on something that I whimsically call the rolicon.

In plain English, a rolicon is a context-sensitive favicon (favorites icon) indicating your current security role in a web app, in the context of the URL you're currently visiting. It is meant to display in the address bar of the browser. Its appearance would be application-specific and would vary, as I say, according to your security-role status. In other words, if you logged into the site in question using an admin password, a certain type of icon would appear, whereas if you logged in with an OpenID URI, a different icon would appear in the address bar; and if you logged in anonymously, yet a different icon would be used, etc.
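
To make this concrete, here's a purely hypothetical sketch (mine, not anything taken from the patent text) of how a web app might swap its favicon on the client side according to the user's current role. The role names and icon paths are invented for illustration; a real implementation would more likely serve a role-appropriate icon from the server.

// Hypothetical example only: role names and icon paths are made up.
var roleIcons = {
  admin: "/icons/rolicon-admin.ico",
  openid: "/icons/rolicon-openid.ico",
  anonymous: "/icons/rolicon-anon.ico"
};

function setRolicon(role) {
  var head = document.getElementsByTagName("head")[0];
  // Remove any existing favicon link elements...
  var links = head.getElementsByTagName("link");
  for (var i = links.length - 1; i >= 0; i--) {
    if (/icon/i.test(links[i].rel)) head.removeChild(links[i]);
  }
  // ...and insert one pointing at the role-appropriate icon.
  var link = document.createElement("link");
  link.rel = "shortcut icon";
  link.href = roleIcons[role] || roleIcons.anonymous;
  head.appendChild(link);
}

// e.g., after a successful admin login:
setRolicon("admin");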

Now the inside story on how and why I decided to apply for this patent.

First understand that the intellectual property rights aren't mine. If you look at the patent you'll see that the Assignee is listed as Novell, Inc. That's because I did the work as a Novell employee.

Okay, but why do this patent? The answer is simpler than you think (and will brand me as a whore in some people's eyes). I did it for the money. Novell has a liberal bonus program for employees who contribute patent ideas. We're not talking a few hundred bucks. We're talking contribute ten patents, put a child through one year of college.

I have two kids, by the way. One is in college, using my patent bonuses to buy pepperoni pizzas as we speak.

Now to the question of Why this particular patent.

Novell has two primary businesses: operating systems and identity management. On the OS side, Novell owns SUSE Linux, one of the top three Linux distributions in the world in terms of adoption at the enterprise level. This puts Novell in competition with Microsoft. That competition is taken very seriously at Novell (and at Microsoft, by the way). Perhaps it should be called coopetition at this point. You may recall that in 2006, Novell and Microsoft entered into an agreement (a highly lucrative one for Novell: $240 million) involving improvement of the interoperability of SUSE Linux with Microsoft Windows, cross-promotion of both products, and mutual indemnification of each company and its customers on the use of patented intellectual property.

Novell continues to take an aggressive stance on IP, however, and would just as soon keep ownership of desktop, browser, and OS innovations out of the hands of Redmond.

As it happens, I was on Novell's Inventions Committee, and I can tell you that a lot of attention was given, when I was there, to innovations involving desktop UIs as well as UI ideas that might pertain to security, access control, roles, trust, or other identity-management sorts of things.

One day, I was researching recent Microsoft patent applications and I noticed that Microsoft had applied for a patent on the little padlock icon that appears in IE's address bar when you visit a site using SSL. You've seen it:



I was outraged. How dare they patent such a simple thing?

I did more research and realized that favicons and browser adornments of various kinds figured into a number of patents. It wasn't just Microsoft.

Coming up with the idea of a role-based favicon (and a flyout icon menu so you can select a different role if you don't want to use your current one) was pretty easy, and I was surprised no one had yet patented it. (Most good ideas -- have you noticed? -- are already patented.) It seemed obvious to me that Microsoft would eventually patent the rolicon idea if we (Novell) didn't. So I applied for the patent. The paperwork went to the U.S. Patent and Trademark Office on February 6, 2007. The patent was granted September 22, 2009.

Would I ever have patented something like this on my own, had I not worked for Novell? No. Do I think it's a good patent for Novell to have? Yes. Am I sorry I got paid a nice bonus for coming up with what many people, I'm sure, would call a fairly lame piece of technology? Crap no.

Do I think patents of this kind (or any kind) are good or right, in general? Hey. Today may be Sunday, but I'm no theologian. I don't take sides in the patent jihad. The patent system is what it is. Let it be.


Saturday, September 26, 2009

A fix for the dreaded iTunes -9812 error

Just found a fix for a problem that's been driving me nuts for a month.

A few weeks ago, sometime in early September, I tried to go to the iTunes store and found myself locked out of my iTunes account. Anything I tried to do that involved a transaction of any kind resulted in an alert dialog that said: "iTunes could not connect to the iTunes Store. An unknown error occurred (-9812)." That's it. No diagnostic information, no tips, no links, no help whatsoever. A less useful dialog box, I cannot begin to imagine.

Apple's site was of no use whatsoever. The troubleshooting advice I found there was incredibly lame. And of course, the -9812 error code means exactly what the dialog says it means: Unknown Error.

I figured there must be someone, on one of the forums, who would have found the answer to this problem. I did a lot of searching (and wading through a lot of lame "did you try this? did you try that?" non-answers), to no avail.

Finally, on September 18, Mike P. Ryan posted what (for me) turned out to be the solution, on discussions.apple.com.

The problem? A corrupt or missing trusted root certificate for iTunes. How or why this got messed up on my Vista machine, I don't know, but the same thing has clearly happened to boatloads of people, judging from the uproar on the discussion boards. The cure is to download Microsoft's latest trusted root certificate update from here: http://www.microsoft.com/downloads/details.aspx?FamilyID=f814ec0e-ee7e-435e-99f8-20b44d4531b0&displaylang=en. Follow the wizard instructions carefully, because you need to download two executables, not just one.

Note that the fix works for Win XP as well as Vista.

Mike, if you're reading this, thanks!

Saturday, September 19, 2009

Jet-powered Beetle


Once more, it's Saturday morning and I find myself catching up on really important reading, stuff I've been caching all week in hopes of getting back to it Real Soon Now. At the top of the list? This excellent post by Ron Patrick describing his jet-powered VW Beetle, which was featured on the David Letterman Show (above) on September 9, 2007. Patrick's web page has lots of photos and goes into detail about the design, motivation, and installation of a General Electric T58 turboshaft engine (meant for Navy helicopters) in a street-legal Beetle. The engine in question develops 1350 horsepower in its original helicopter application, but note that that's in a turboshaft configuration. Patrick is using it as a free turbine (jet thrust only, no mechanical drive). If he were to couple it mechanically to the drive train of the Beetle, the torque would probably pretzel the car's chassis faster than you could say Holy Halon, Batman, where's the fire extinguisher?

Is this a great country, or what?

Wednesday, September 16, 2009

The newline legacy

In a recent post, I talked about a legacy technology from the 1800s that's an integral part of hundreds of millions of computers today: the QWERTY keyboard layout. QWERTY was designed as a usability antipattern, and its widespread use probably costs the U.S. economy a billion dollars a week in lost productivity. That's my SWAG estimate, anyway.

But that's a hardware problem. ;^)

As a programmer, I think the legacy annoyance I most love to hate is the newline.

The fact that the computing world never settled on an industry-standard definition of what a newline is strikes me as a bit disconcerting, given how ubiquitous newlines are. But it's way too late to change things. There's too much legacy code out there, on OSes that aren't going to change how they treat newlines. The only OS that ever changed its treatment of newlines, as far as I know, is MacOS, which up to System 9 considered a newline to be ASCII 13 (0x0D), also known as a carriage return (CR). It's now the linefeed (ASCII 10, 0x0A), of course, as it is in most UNIX-based systems.

It always bothered me that DOS and Windows adhered to the double-character newline idiom: 0x0D0A (CR+LF). To me it always seemed that one character or token (not a doublet) should be all that's needed to signify end-of-line, and since UNIX and Linux use LF, it makes sense (to me) to just go with that. But no. Gates and company went with CR+LF.

Turns out it's not Gates's fault, of course. The use of CR+LF as a newline stems from the early use of Teletype machines as terminals. With TTY devices, achieving a "new line" on a printout required two different operations: one signal to move the print head back to the start position, and another signal to cause the tractor-feed wheel to step to the next position in its rotation, bringing the paper up a line. Thus CR, then LF.

The fact that we're still emulating that set of signals in modern software is kind of funny. But that's how legacy stuff tends to be. Funny in a sad sort of way.

In any event, here's how the different operating systems expect to see newlines represented:

CR+LF (0x0D0A):
DOS, OS/2, Microsoft Windows, CP/M, MP/M, most early non-Unix, non-IBM OSes

LF (0x0A):
Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, others

CR (0x0D):
Commodore machines, Apple II family, Mac OS up to version 9 and OS-9

NEL (0x15):
EBCDIC systems—mainly IBM mainframe systems, including z/OS (OS/390) and i5/OS (OS/400)
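
As a quick illustration (my own sketch, nothing authoritative), here's how you might sniff which of the above conventions a chunk of text uses, and normalize everything to LF:

// Rough sketch: classify and normalize line endings in a string.
function sniffNewlines(text) {
  if (/\r\n/.test(text)) return "CR+LF";   // DOS/Windows style
  if (/\r/.test(text))   return "CR";      // old Mac OS, Commodore, Apple II
  if (/\n/.test(text))   return "LF";      // Unix/Linux, Mac OS X
  return "none";
}

function normalizeToLF(text) {
  // Order matters: collapse CR+LF pairs first, then any stray CRs.
  return text.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}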

The closest thing there is to a codification of newline standards is the Unicode interpretation of newlines. Of course, it's a very liberal interpretation, to enable reversible transcoding of legacy files across OSes. The Unicode standard defines the following characters that conforming applications should recognize as line terminators:

LF: Line Feed, U+000A
CR: Carriage Return, U+000D
CR+LF: CR followed by LF, U+000D followed by U+000A
NEL: Next Line, U+0085
FF: Form Feed, U+000C
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
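
And here's a small sketch of a line splitter that honors the whole Unicode list above (assuming a reasonably modern ECMAScript engine; CR+LF is matched first so it counts as a single break, not two):

// Split text into lines on any Unicode-recognized line terminator.
var LINE_TERMINATORS = /\r\n|[\n\r\u0085\u000C\u2028\u2029]/;

function splitLines(text) {
  return text.split(LINE_TERMINATORS);
}

splitLines("one\r\ntwo\u2028three");  // ["one", "two", "three"]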

For more info on newlines and edge cases involving newlines, the best article I could find on the web is this one by Xavier Noria. (It's quite a good writeup.)

There's also an interesting discussion of newlines in the ECMA 262 [PDF] specification. See especially the discussion on page 22 of the difference in how Java and JavaScript treat Unicode escape sequences in comments. (For true geeks only.)
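
The gist of that difference, as I understand it: Java decodes \uXXXX escapes before tokenizing, even inside comments, whereas JavaScript leaves them alone. So something like the following is harmless in JavaScript, while the analogous Java source would have its comment terminated mid-line by the escaped newline:

// To a JavaScript parser, the escape below is just six literal characters
// inside a comment, so nothing after it runs: \u000A alert("never executed");
// A Java compiler, by contrast, decodes \u000A to a real line terminator
// before lexing, which would end the comment and turn the rest into code.
alert("this is the only statement in this snippet");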

Many happy returns!

The single biggest usability quagmire in computing

It never ceases to amaze me that we're still, in some ways, hamstrung by the mechanical legacy of the Industrial Revolution in our day-to-day computing. The best example is probably the QWERTY keyboard, which impairs usability for millions of computer users daily.

As you probably know, the QWERTY layout, conceived by James Densmore and patented by Christopher Sholes in 1878, was specifically designed to make it difficult for people to type fast on early typewriters. In other words, it was purposely designed and implemented as a usability antipattern. Fast typing caused jamming of mechanical typewriter keys, which were (in Densmore's time) returned to their original "rest" position by weights, not springs. We continue to live with the QWERTY legacy layout today even though it is well accepted that other keyboard layouts (for English, at any rate) are much more usable.

The best-known alternative layout for Latin-based alphabets is the Dvorak keyboard, which dates to the 1930s. The U.S. Navy did a study in World War Two that found that typing speed was 74 percent faster for Dvorak than for QWERTY, and accuracy better by 68 percent. Other studies (both by private industry and government) have tended to confirm this general result, although there's a considerable cult movement (given impetus by a 1990 article in the Journal of Law and Economics) claiming Dvorak usability to be nothing more than urban legend. (See further discussion here.)

The studies of Dvorak typing accuracy have produced some interesting results. It's instructive to compare the most-mistyped English words for QWERTY users versus Dvorak users.

The mere fact that your fingers travel dramatically less interkey distance when using Dvorak layout means less wrist, finger, and arm movement; thus Dvorak presents the potential for reduced risk of muscle fatigue and injury. This alone would seem to argue for more widespread adoption.

Interestingly, variants of Dvorak are available for Swedish (Svorak), Greek, and other languages. Also, there's a single-handed-typing version of Dvorak, to help with accessibility.

So, but. Let's assume for sake of argument that Dvorak is demonstrably better in some way (speed, accuracy, accessibility, risk of wrist injury) than QWERTY. Why are we still using QWERTY?

It seems an influential 1956 General Services Administration study by Earle Strong, involving ten experienced government typists, concluded that Dvorak retraining of QWERTY typists was cost-ineffective. This study apparently was instrumental in sinking Dvorak's prospects, not so much because people put stock in its results as because of the government's role as a market-mover. The practical fact of the matter is that the U.S. Government is one of the largest keyboard purchasers in the world, and if a large customer convinces manufacturers to settle on a particular device design, it becomes a de facto standard for the rest of the industry, whether that design is good or not. (Today that sort of reasoning is less compelling than in the 1960s, but it's still a factor in market dynamics.)

It turns out to be fairly easy to configure a Vista or Windows XP machine such that you can toggle between QWERTY and Dvorak with Alt-Shift, the way some people do with English and Russian layouts. Basically, to enable Dvorak, you just go to Control Panel, open the Regional and Language Options app, choose the Keyboards and Languages tab, then click the Change Keyboards button, and in the Text Services dialog, click the Add button. When you finally get to the Add Input Language dialog (see below), you can go to your language and locale, flip open the picker, and see if Dvorak is one of the listed options. In U.S. English, it is. (Click the screen shot to enlarge it.)



If you have tried the Dvorak layout yourself, I'd be interested in hearing about your experiences, so please leave a comment.

In the meantime, I hope to give the Dvorak layout a try myself this weekend, to see how it feels. In all honesty, I doubt I'll stay with it long enough to get back up to my QWERTY typing speed. But then again, if it improves my accuracy, I'll have to consider staying with it a while, becuase frankly my accuracy these dsya sucks.

Sunday, September 13, 2009

Garbage collection 2.0 vs. Web 3.0


I continue to think about garbage collection a lot, not only as a career move but in the context of browser performance, enterprise-app scaleup, realtime computing, and virtual-machine design. Certainly we're all affected by it in terms of browser behavior. Memory leakage has been an ongoing concern in Firefox, for example, and the Mozilla team has done a lot of great work to stem the leakage. Much of that work centers, of course, on improving garbage collection.

One thing that makes browser memory-leak troubleshooting such a thorny issue is that different browser subsystem modules have their own particular issues. So for example, the JavaScript engine will have its own issues, the windowing system will have its issues, and so on. What makes the situation even trickier is that third-party extensions interact in various ways with the browser and each other. And then there are the monster plug-ins for Acrobat Reader, Flash, Shockwave, Java, Quicktime, and so on, many of which simply leak memory and blow up on their own, without added help from Firefox. ;)

A lot's been written about GC in Java. And Java 6 is supposed to be much less leakage-prone than Java 5. But Flash is a bit of a mystery.

The memory manager was apparently rewritten for Flash Player 8, and enhanced again for 9. (I don't know what they did for 10.) At a high level, the Flash Player's GC is very Java-like: a nondeterministic mark-and-sweep system. What the exact algorithms are, though, I don't know. How they differ for different kinds of Flex, Flash, AIR, and/or Shockwave runtime environments, on different operating systems, I don't know.

I do know a couple of quirky things. One is that in an AIR application, the System.gc() method is only enabled in content running in the AIR Debug Launcher (ADL) or in content in the application security sandbox.

Also, as with Java, a lot of people wrongly believe that calling System.gc() is an infallible way to force garbage collection to happen.

In AIR, System.gc() only does a mark or a sweep on any given object, but not both in the same call. You might think that this means that if you simply call System.gc() twice in a row, it'll force a collection by causing both a mark and a sweep. Right? Not so fast. There are two different kinds of pointers in the VM at runtime: those in the bytecode and those in the bowels of the VM. Which kind did you create? You'll only sweep the bytecode ones.

How the details of memory management differ in AIR, Flash, and Flex is a bit of a mystery (to me). But they do differ. The Flex framework apparently makes different assumptions about garbage lifecycles vis-à-vis a pure Flash app. The use-cases for Flex versus Flash are, of course, quite different and have no doubt influenced the GC approach. Flash comes from a tradition of short-lived sprite-based apps that the user looks at briefly, then dismisses. Obviously you can use a very tactical approach to GC in that specific case. But if you've got an app (Flex based) that is long-running and not constantly slamming animation frames to the video buffer, you need a more strategic approach to GC. (When I say "you," I'm talking about the folks who are tasked with designing the Adobe VM's memory management logic, not the application developer.) A Flex-based DAM or CMS client made for enterprise customers won't necessarily benefit from a memory management system designed for sprite animations in a sidebar ad.

By now every developer who cares about memleaks in AS3 knows not to use anonymous functions inside event handlers. Callbacks should have a name (so they can be GC'd) and weak references should be used. However, weak references won't totally save the day here. In AS3, asynchronous objects register themselves with the Flash player when they run. If one of those objects (Timer, Loader, File, DB transaction) continues to be referenced by the player, it's essentially unreachable to you.
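
The same hygiene translates to browser JavaScript, which is what I'll sketch here (browser JS rather than AS3, since the principle is the same): give the callback a name and keep the reference around so you can detach it later. AS3's useWeakReference flag on addEventListener has no direct counterpart in this sketch; the point is simply that an anonymous, unreferenced closure is something you can never explicitly unhook.

// Named handler: we keep a reference, so we can remove the listener when done.
var img = new Image();
function onImgLoad(evt) {
  // ...use the loaded image...
  img.removeEventListener("load", onImgLoad, false);
}
img.addEventListener("load", onImgLoad, false);
img.src = "/images/banner.png";  // hypothetical URL

// Anti-pattern: an anonymous listener you can never detach by reference.
// img.addEventListener("load", function (evt) { /* ... */ }, false);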

There's also the issue of Object Memory versus Rendering Memory. The bulk of all memory used by the Flash player goes toward rendering. And that's the part you have the least control over. A stage in Flash can grow to 100 MB fairly easily, but if you try to destroy it you might only reclaim 40 MB. I have no idea how much of this can be attributed to AS3-C++ object entanglement versus deep-VM mayhem (or some other gnarly issue).

Overall, I think GC is (regardless of technology) something that benefits from openness and community involvement. In other words, this is an area where "proprietary" serves no one. The code needs to be open-source and the community needs to be involved in figuring out solutions to deep memory management issues. Apps can't simply be allowed to detonate unpredictably, for no apparent reason (or no easily-troubleshot reason), in a Web 3.0 world, or in enterprise.

Bottom line? Solving memory-management problems (at the framework and VM level) is critical to the future success of something like AIR or Flex. It's much too important to be left to an Adobe.

Friday, September 11, 2009

Twitter's new terms of service: Give us all rights to your words

This is the strangest bit of lexical legerdemain I've seen in a while.

According to the Sept 10 post on the Twitter Blog (which tries to explain Twitter's new Terms of Service in plain English):
Twitter is allowed to "use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute" your tweets because that's what we do. However, they are your tweets and they belong to you.
Pray tell me, in what possible sense does something belong to me if I've given every worthwhile right of usage over to somebody else?

Thursday, September 10, 2009

My First Content Management Application

Anybody who knows me knows I'm a sucker for a good meme. I love the very word meme, because it contains "me" in it, twice. ;)

Now it seems a new meme is making the rounds: Pie started a snowball rolling with his blog post My First Content Management Application, which begat similar posts by Jon Marks, Johnny Gee, Lee Dallas, and Cheryl McKinnon, all telling how they got started in the content-management biz. I can't help but chime in at this point, meme-ho that I am.

Back in the paleolithic predawn of the Internet, before there was a Web, there was FidoNet. Instead of Web sites, there were electronic bulletin board systems, and in 1988, I was the sysop of a BBS powered by the freeware Opus system. Opus essentially put me in the content management business, although no one called it that at the time, of course.

Opus was extremely popular not only because it supported the bandwidth-efficient ZModem protocol but because it was a highly configurable system, thanks to a one-off scripting language that let you exercise godlike control over every imaginable system behavior. The Opus scripting language was my first introduction to any kind of programming.

In those days, bandwidth was dear (modems ran at 300 and 1200 baud) and you took great pains to compress files before sending them over the wire. The most popular compression software at the time was SEA's ARC suite, the code for which went open-source in 1986. ARC seemed adequately fast (it would process files at around 6Kbytes per sec on a reasonably fast PC, which is to say one with an 8MHz processor) until a guy named Phil Katz came along with an ARC-compatible program that ran six to eight times faster. In a matter of a year or so, almost every BBS switched from supporting ARC to supporting PKARC.

SEA sued Phil Katz for copyright violation (Katz had violated the terms of the open-source license) and a major legal fracas ensued. BBS operators, unsure of their legal exposure, didn't know whether to stay with PKARC or go back to the much slower ARC (and risk losing visitors). Being young and foolishly optimistic, I decided to write my own compression and archiving software for the use of my BBS customers. I decided it would be a good thing, too, if it was faster than PKARC. Of course, I would have to learn C first.

Thus began an adventure that led toward the path that finds me where I am today, free-ranging in the CMS jungle as an analyst. I'll save the details of that adventure for another time. Suffice it to say, I did learn C, I did write a compression program, and it was faster (though less efficient) than Phil Katz's routines, and in fact I won a bakeoff that led to my code being licensed by Traveling Software for use in their then-popular Laplink connectivity product. Katz lost the bakeoff. (He also lost the lawsuit with SEA.) But he eventually did all right for himself. Perhaps you've heard of Pkzip?

So to answer a question no one asked (except Pie), my first "content" application was in fact a compression and archiving program that I wrote in 1988 to support users of an Opus BBS. That's what started me down the path of learning C, then Java, JavaScript, HTML, XML, and all manner of W3Cruft leading to the purple haze I walk around in today.

Friday, September 04, 2009

Is Yak Shaving Driving You Nuts?

As a professional jargon-whore I'm always fascinated to come across neologisms that have either just recently entered the geek lexicon or have been there for a while, but I just didn't notice. Such is the case with yak shaving, a term I frankly hadn't come across until yesterday.

One of the better definitions I've seen so far for yak shaving is the following:
1 March 2008, Zed Shaw, "You Used Ruby to Write WHAT?!", CIO. Yak shaving is a programmer's slang term for the distance between a task's start and completion and the tangential tasks between you and the solution. If you ever wanted to mail a letter, but couldn't find a stamp, and had to drive your car to get the stamp, but also needed to refill the tank with gas, which then let you get to the post office where you could buy a stamp to mail your letter—then you've done some yak shaving.
Shaw's explanation of finding a stamp to mail a letter is a little quaint (who mails letters any more?) and begs for more pertinent examples. I think most of us could easily come up with quite a few. Right away I'm thinking YSEE as a synonym for J2EE. Some others:
  • Creating a Mozilla extension
  • Hello World in OSGi
  • Building a Flex app
  • Doing a clean install of OpenCms on a virgin machine (i.e., a machine that doesn't already have JDK, Tomcat, MySQL) -- not difficult, just a lot of Yak shaving
  • Getting almost any kind of enterprise software configured and running
  • Installing a nontrivial application on Linux (and having to resolve dependencies)
Again, the essential idea here isn't that what you're doing is difficult, just that it involves a lot of onerous tedium along the way to some worthwhile goal.

What would you add to the above list?

Wednesday, September 02, 2009

Augmented Reality, Nokia Style



In case you thought the "virtual reality goggles" idea was strictly 1990s sci-fi, guess what? People are still working on it, and Nokia is leading the charge to the commercial finish line. In Nokia's case, the primary goal is not to develop goggles, although clearly they've put some thought into it. The primary goal is to bring augmented reality technology to the cell phone.

"Augmented reality" can be thought of as a highly annotated model of real-world milieux, containing rich-media annotations, text annotations, and other kinds of embedded goodness. This sort of thing is seen by Nokia and others as a major value-add for cell phone customers. But there are other commercial possibilities as well. To grok the fullness, take a look at the following slideshow.

If you're lucky enough to be in or near Palo Alto next Wednesday (September 9), SDForum will be hosting a talk, “Augmenting Both Reality and Revenue: Connecting Mobile, Sensors, Location and Layers”, by Clark Dodsworth of Osage Associates and Maribeth Back of FX Palo Alto Labs. (Registration begins at 6:30 and is $15 for non-SDForum members. Details here.) This talk will give a non-Nokia view of the subject that should be quite interesting. Unfortunately, the real me won't be able to attend. And the virtual me isn't ready.

Tuesday, September 01, 2009

Google as Skinner Box

Many people have commented on the effectiveness of Google's lean-and-simple approach to usability. I personally don't think a huge amount of thought went into designing their interface. I think it's pretty much the minimum you have to do: Give people a text box to type in, and a Go button. Back comes a page of links.

But what's interesting is, I do think the Google "fast and lean" design motif (whether it was consciously designed or not) has had a profound influence on people's usability expectations. It sets the bar in a number of ways (see below) and anyone who designs interfaces should take heed, because people are now literally conditioned to expect certain things from a UI.

When I say conditioned, I mean it in the true behaviorist sense. I think the argument can (should) be made that Google's landing page represents a kind of virtual Skinner box. And yes, we are the rats.

The similarities to a Skinner box experiment are striking. The mechanism is quick and easy to operate. The feedback is immediate. You are either rewarded or not. Iterate.

I make a trip to the box about 15 times a day, and hit the lever an average of three times per visit. I am well conditioned. Are you?

I submit that the many people in enterprise who use Google intensively are very thoroughly conditioned to expect certain things from a UI, as a result of operant conditioning.
  • The feedback cycle should be short. You should be able to do a task quickly and get immediate feedback on whether it succeeded. Actual success is less important than being told quickly whether you succeeded or not.
  • It should be quick and easy to repeat an operation.
  • Controls should be very few in number and located high on the screen (right in your face).
  • Hitting Enter should be sufficient to get a pleasure reward.
  • Everything is self-documenting.
  • The UI is flat: No drill points.
  • Everything is a link. (Except the main action controls: text field and button.)
Note that of the two required controls (text field, button), one is actually superfluous since you can just as easily hit Enter as click the button. In fact, it's easier to hit Enter.

The Google operant conditioning cycle is the new unit of interaction (not so new, now, of course). It's the behavioral pattern your users have the most familiarity with, and it's burned into their nervous systems by now. Ignore this fact at your own peril, if you're a UI designer.

Counting the number of DOM nodes in a Web page

Adriaan Bloem and I were chatting yesterday about hard-coded limits and why they're still so annoyingly prevalent in enterprise software. One thing led to another and before long we were talking about the number of DOM elements in a Web page, and I started wondering: How many DOM elements does a Web page typically contain, these days?

I decided to hack together a piece of JavaScript that would count DOM elements. But I decided doing a complete DOM tree traversal (recursive or not) would be too slow and klutzy a thing to do in JavaScript. I just wanted a quick estimate of the number of nodes. I wanted something that would execute in, say, 100 milliseconds. Give or take a blink.

So here's the bitch-ugly one-liner I came up with. It relies on E4X (XML extensions for ECMAScript, ECMA-357), hence requires an ECMA-357-compliant browser, which Firefox is. You can paste the following line of code into the Firefox address bar and hit Enter (or make it into a bookmarklet).

javascript:alert(XML((new XMLSerializer()).
serializeToString(document.documentElement)
)..*.length());


Okay, that's the ugly-contest winner. Let's parse it into something prettier.

var topNode = document.documentElement;
var serializer = new XMLSerializer();
var markup = serializer.serializeToString(topNode);
var theXML = new XML(markup);
var allElements = theXML..*;
var howMany = allElements.length();
alert( howMany );

The code serializes the DOM starting at the document element (usually the HTML node) of the current page, then feeds the resulting string into the XML() constructor of E4X. We can use dot-dot-asterisk syntax to fetch a list of all descendant elements. The length() method -- and yes, in E4X it is a method, not a property -- tells us how many elements are in the list.

I know, I know, the E4X node tree is not a DOM and the two don't map one-to-one. But still, this gives a pretty good first approximation of the number of elements in a web page, and it runs lightning-fast since the real work (the "hard stuff") happens in compiled C++.

The code shown here obtains only XML elements, not attributes. To get attributes, substitute "..@*" for "..*" in the 5th line.

Again, the Document Object Model has a lot more different node types than just elements and attributes. This is not a total node-count (although I do wish someone would post JavaScript code for that).
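
For what it's worth, here's a quick sketch of the "real" traversal I was too lazy to write last night: it walks the live DOM recursively and counts every node it reaches (elements, text nodes, comments, and so on), plus attribute nodes along the way. Slower than the E4X trick, but it's an honest node count.

function countAllNodes(node) {
  var count = 1;  // count this node itself
  // Count attribute nodes, if any (elements only).
  if (node.attributes) count += node.attributes.length;
  // Recurse into child nodes: elements, text, comments, etc.
  for (var i = 0; i < node.childNodes.length; i++) {
    count += countAllNodes(node.childNodes[i]);
  }
  return count;
}

alert(countAllNodes(document));  // counts every node in the current page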

Last night when I ran the code against the Google home page (English/U.S.), I got an element count of 145 and an attribute count of 166. When I ran the code on the Google News page, I got 5777 elements and 5004 attributes. (Please post a comment below if you find web pages with huge numbers of nodes. Give stats and URLs, please!)

That's all the time I had last night for playing around with this stuff; just time to write a half dozen lousy lines of JavaScript. Maybe someone can post some variations on this theme using XPath? Maybe you know a slick way to do this with jQuery? Leave a comment or a link. I'd love to see what you come up with.
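
To get the ball rolling on the XPath idea, here's one way to do it in Firefox with document.evaluate (a sketch only; I haven't timed it against the E4X approach):

// Let the XPath engine do the counting in compiled code.
var elementCount = document.evaluate(
    "count(//*)", document, null, XPathResult.NUMBER_TYPE, null).numberValue;
var attributeCount = document.evaluate(
    "count(//@*)", document, null, XPathResult.NUMBER_TYPE, null).numberValue;
alert(elementCount + " elements, " + attributeCount + " attributes");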