Wednesday, October 22, 2008

Flash-drive RAID

I stumbled upon the floppy-drive RAID story (see previous blog) as part of a Google search to see if any such thing as a memory stick (Flash-drive) RAID array is available for Vista. No such luck, of course. But there are quite a few blogs and articles on the Web by Linux users who have successfully created ad-hoc Flash RAIDs from commodity USB hubs and memory sticks. (I recommend this June 2008 article from the Linux Gazette and this even more entertaining, not to mention better-illustrated, piece by Daddy Kewl. Definitely do not fail to read the latter!) Linux supports this kind of madness natively.

MacOS is even better for this. Evidently you can plug two sticks into a PowerBook's USB ports and configure them as a RAID array with native MacOS dialogs. (Details here.) How I envy Mac users!

Tuesday, October 21, 2008

Floppy-disk RAID array




This has got to be the funniest thing I've seen all year. And trust me, this has been a funny year.

Daniel Blade Olson, a man after my own heart (even if that phrase doesn't translate well into foreign languages...), has rigged a bunch of floppy drives to form a RAID array. His disturbing writeup is here.

Saturday, October 18, 2008

Fast pixel-averaging

I don't know why it took me so long to realize that there's an easy, fast way to obtain the average of two RGB pixel values. (An RGB pixel is commonly represented as a 32-bit integer. Let's assume the top 8 bits aren't used.)

To ensure proper averaging of red, green, and blue components of two pixels requires parsing those 8-bit values out of each pixel and adding them together, then dividing by two, and crafting a new pixel out of the new red, green, and blue values. Or at least that's the naive way of doing things. In code (I'll show it in JavaScript, but it looks much the same in C or Java):


// The horribly inefficient naive way:

function average( a,b ) {

var REDMASK = 0x00ff0000;
var GREENMASK = 0x0000ff00;
var BLUEMASK = 0x000000ff;
var aRed = a & REDMASK;
var aGreen = a & GREENMASK;
var aBlue = a & BLUEMASK;
var bRed = b & REDMASK;
var bGreen = b & GREENMASK;
var bBlue = b & BLUEMASK;

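// Re-mask red and green after the shift, so that the low bit
// of an odd sum can't leak into the channel below it.
var aveRed = ( (aRed + bRed) >> 1 ) & REDMASK;
var aveGreen = ( (aGreen + bGreen) >> 1 ) & GREENMASK;
var aveBlue = (aBlue + bBlue) >> 1;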

return aveRed | aveGreen | aveBlue;
}

That's a lot of code to average two 32-bit values, but remember that red, green, and blue values (8 bits each) have to live in their own swim lanes. You can't allow overflow.

Here's the much cleaner, less obvious, hugely faster way:


// the fast way:

var MASK7BITS = 0x00fefeff; // clears the low bit of the red and green channels

function ave( a,b ) {

a &= MASK7BITS;
b &= MASK7BITS;
return (a+b)>>1;
}

The key intuition here is that you want to clear the bottom bit of the red and green channels in order to make room for overflow from the green and blue "adds."
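To see what the mask buys you (and what it costs), here is a small sketch that checks the fast version against a plodding per-channel reference. The names fastAverage and channelAverage are mine, made up for illustration. Because the low bit of red and green is thrown away before the add, the fast result can come out one lower than the exact per-channel average in those two channels (it happens when both inputs are odd in that channel); blue is exact.

```javascript
// Hypothetical helper names, for illustration only.
var MASK = 0x00fefeff; // low bit of red and green cleared

// Mask-based average: one add and one shift for all three channels.
function fastAverage(a, b) {
  return ((a & MASK) + (b & MASK)) >> 1;
}

// Reference: average each 8-bit channel on its own, then repack.
function channelAverage(a, b) {
  var out = 0;
  for (var shift = 0; shift <= 16; shift += 8) {
    var ca = (a >> shift) & 0xff;
    var cb = (b >> shift) & 0xff;
    out |= ((ca + cb) >> 1) << shift;
  }
  return out;
}

// Even-valued channels: both approaches agree.
var evenFast = fastAverage(0x00204060, 0x00406080);     // 0x305070
var evenExact = channelAverage(0x00204060, 0x00406080); // 0x305070

// Red is odd in both pixels (0xff and 0x01): the fast way is one lower in red.
var oddFast = fastAverage(0x00ff00ff, 0x000100ff);      // 0x7f00ff
var oddExact = channelAverage(0x00ff00ff, 0x000100ff);  // 0x8000ff
```

For image blending, an off-by-one in the least significant bit of two channels is invisible, which is why the trade seems worth it.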

Of course, in the real world, you would inline this code rather than use it as a function. (In a loop that's processing 800 x 600 pixels you surely don't want to call a function hundreds of thousands of times.)

Similar mask-based techniques can be used for adding and subtracting pixel values. Overflow is handled differently, though (left as an exercise for the reader).

Friday, October 17, 2008

Loading an iframe programmatically

This is a nasty hack. It's so useful, though. So useful.

Suppose you want to insert a new page (a new window object and DOM document) into your existing page. Not a new XML fragment or subtree on your current page; I'm talking about a whole new page within a page. An iframe, in other words.

The usual drill is to create an <iframe> node using document.createElement( ) and attach it to the current page somewhere. But suppose you want to populate the iframe programmatically. The usual technique is to start building DOM nodes off the iframe's contentDocument node using DOM methods. Okay, that's fine, but it's a lot of drudgery. (I'm sweating already.) At some point you're probably going to start assigning string values to body.innerHTML (or whatever). But then you're into markup-stringification hell. (Is there a JavaScript programmer among us who hasn't frittered away major portions of his or her waking life escaping quotation marks and dealing with line-continuation-after-line-continuation in order to stringify some hellish construction, whether it's a piece of markup or an argument to RegExp( ) or whatever?)

Well. All of that is best left to Internet Explorer programmers. If you're a Mozilla user, you can use E4X as your "get out of stringification-jail FREE" card, and you can use a data URL to load your iframe without passing through DOM hell.

Suppose you want your iframe to contain a small form. First, declare it as an XML literal (which you can do as follows, using E4X):

myPage = <html>
<body>
<form action="">
... a bunch of markup here
</form>
</body>
</html>;

Now create an iframe to hold it:

   iframe = top.document.createElement( "iframe" );

Now (the fun part...) you just need to populate the iframe, which you can do in one of two ways. You can attach the iframe node to the top.document, then assign myPage.toXMLString() to iframe.contentDocument.body.innerHTML, or (much more fun) you can convert myPage to a data URL and then set the iframe's src attribute to that URL:


// convert XML object to data URL
function xmlToDataURL( theXML ) {

var preamble = "data:text/html;charset=utf-8,";
var octetString = encodeURIComponent( theXML.toXMLString( ) ); // UTF-8 percent-encoding, to match the charset above
return preamble + octetString;
}

dataURL = xmlToDataURL( myPage );

iframe.setAttribute( "src", dataURL ); // load frame

// attach the iframe to your current page
top.document.body.insertBefore( iframe ,
top.document.body.firstChild );

A shameless hack, as I say. It works fine in Firefox, though, even with very large data URLs. I don't recall the exact size limit on data URLs in Mozilla, but I seem to remember that it's megabytes. MSIE, of course, has some wimpy limit like 4096 characters (maybe it's changed in IE8?).
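The data-URL half of this hack doesn't actually depend on E4X. As a rough sketch, assuming you already have the markup as an ordinary string, something like the following builds the same kind of URL. The function name htmlToDataURL is my own invention, not part of any browser API.

```javascript
// Hypothetical helper: turn a string of markup into a data URL.
function htmlToDataURL(html) {
  var preamble = "data:text/html;charset=utf-8,";
  // encodeURIComponent percent-encodes the string as UTF-8,
  // which matches the charset declared in the preamble.
  return preamble + encodeURIComponent(html);
}

var sampleURL = htmlToDataURL("<p>hi</p>");
// sampleURL is "data:text/html;charset=utf-8,%3Cp%3Ehi%3C%2Fp%3E"
```

In a browser you would then hand the result to iframe.setAttribute( "src", ... ) exactly as above.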

In my opinion, all browsers SHOULD support unlimited-length data URLs, just like they SHOULD support E4X and MUST support JavaScript. Notwithstanding any of this, Microsoft MAY go to hell.

Saturday, October 11, 2008

Russians use graphics card to break WiFi encryption

The same Russians who got in a lot of trouble a few years ago for selling a small program that removes password protection from locked PDF files (I'm talking about the guys at Elcomsoft) are at it again. It seems this time they've used an NVidia graphics card GPU to crack WiFi WPA2 encryption.

They used the graphics card, of course, for sheer number-crunching horsepower. The GeForce 8800 GTX delivers something like 300 gigaflops of crunch, which I find astonishing (yet believable). Until now, I had thought that the most powerful chipset in common household use was the Cell 8-core unit used in the Sony Playstation 3 (which weighs in at 50 to 100 gigaflops). Only 6 of the PS/3's processing units are available to programmers, though, and the Cell architecture is meant for floating-point operations, so for all I know the GeForce 8800 (or its relatives) might be the way to go if you need blazing-fast integer math.

Even so, it would be interesting to know what you could do with, say, an 8-box cluster of overclocked PS/3s. Simulate protein-ribosome interactions on an atom-by-atom basis, perhaps?

Decimal to Hex in JavaScript

There's an easy way to get from decimal to hexadecimal in JavaScript:

  function toHex( n ) { return n.toString( 16 ); }


The string you get back may not look the way you want, though. For example, toHex(256) gives "100", when you're probably wanting "0x0100" or "0x00000100". What you need is front-padding. Just the right amount of front-padding.

// add just the right number of 'ch' characters
// to the front of string to give a new string of
// the desired final length 'dfl'

function frontPad( string, ch, dfl ) {
var array = new Array( ++dfl - string.length );
return array.join( ch ) + string;
}


Of course, you should ensure that 'dfl' is not smaller than string.length, to prevent a RangeError when allocating the array.

If you're wondering why "++dfl" instead of plain "dfl", stop now to meditate. Or run the code until enlightenment occurs.
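For the impatient, here is a sketch of a guarded version. The name frontPadSafe is made up for this example. It returns the string untouched when it is already long enough, and it spells out the "plus one" instead of hiding it in a pre-increment.

```javascript
// Hypothetical guarded variant of frontPad.
function frontPadSafe(string, ch, dfl) {
  var missing = dfl - string.length;
  if (missing <= 0) {
    return string; // already long enough; nothing to add
  }
  // join() puts the separator BETWEEN elements, so an array of
  // (missing + 1) empty slots yields exactly 'missing' copies of ch.
  return new Array(missing + 1).join(ch) + string;
}

var padded = frontPadSafe("100", "0", 8);          // "00000100"
var tooLong = frontPadSafe("123456789", "0", 8);   // "123456789"
```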

At this point you can do:

  function toHex( n ) {
    return "0x" + frontPad( n.toString( 16 ), "0", 8 );
  }

toHex( 256 ) // gives "0x00000100"


If you later need to use this value as a number, no problem. You can apply any numeric operation except addition to it with perfect safety. Addition will be treated as string concatenation whenever any operand is a string (that's the standard JS interpreter behavior), so if you need to do "0x00000100" + 4, you have to cast the hex-string to a number.

  n = toHex( 256 );  // "0x00000100"
typeof n // "string"
isNaN( n ) // false
x = n * n; // 65536
x = n + 256 // "0x00000100256"
x = Number( n ) + 256 // 512

Wednesday, October 08, 2008

$20 touchscreen, anyone?

Touchless is one of those ideas that's so obvious, yet so cool, that after you hear it, you wonder why someone (such as yourself) didn't think of it ages ago. Aim a webcam at your screen; have software that follows your fingers around; move things around in screen space in response to your finger movements. Voila! Instant touch-screen on the cheap.

Mike Wasserman came up with Touchless as a college project while attending Columbia University. He's now with Microsoft. The source code is free.

Awesome.

Saturday, October 04, 2008

Accidental assignment

People sometimes look at my JavaScript and wonder why there is so much "backwards" notation:

   if ( null == arguments[ 0 ] )
return "Nothing to do";

if ( 0 == array.length )
break;

And so on, instead of putting the null or the zero on the right side of the '==' the way everyone else does.

The answer is, I'm a very fast typist and it's not uncommon for me to type "s" when I meant to type "ss," or "4" when I meant to type "44," or "=" when I meant to type "==".

In JavaScript, if I write the if-clause in the normal (not backwards) way, and I mistakenly type "=" for "==", like so...

   if ( array.length = 0 )
break;

... then of course I'm going to destroy the contents of the array (because in JavaScript, you can wipe out an array by setting its length to zero) and my application is going to behave strangely or throw an exception somewhere down the line.

This general type of programmer error is what I call "accidental assignment." Note that I refer to it as a programmer error. It is not a syntactical error. The interpreter will be only too happy to assign a value to a variable inside an if-clause, if you tell it to. And it may be quite some time before you are able to locate the "bug" in your program, because at runtime the interpreter will dutifully execute your code without putting messages in the console. If an exception is eventually thrown, it could be in an operation that's a thousand lines of code away from your typing blunder.

So the answer is quite simple. If you write the if-clause "backwards," with zero on the left, an accidental assignment will be caught right away by the interpreter, and the resulting console message will tell you the exact line number of the offending code, because you can't assign a value to zero (or to null, or to any other baked-in constant).
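You can watch the difference happen with a small sketch like this one. The helper failsFast is hypothetical; it compiles a snippet with new Function and reports whether the engine rejected it. Current engines reject the backwards typo with a SyntaxError before any code runs; I'm assuming some older ones may raise a ReferenceError at run time instead, so the sketch accepts either.

```javascript
// Hypothetical helper: does this snippet blow up right away?
function failsFast(source) {
  try {
    // Compile the snippet as a function body and run it on a test array.
    new Function("array", source)([1, 2, 3]);
    return false; // compiled and ran without complaint
  } catch (e) {
    return e instanceof SyntaxError || e instanceof ReferenceError;
  }
}

// Typo with the constant on the left: rejected immediately.
var backwards = failsFast("if ( 0 = array.length ) { return; }");

// Same typo written the usual way: silently empties the array.
var forwards = failsFast("if ( array.length = 0 ) { return; }");
```

Here backwards comes out true and forwards comes out false, which is the whole argument in two lines.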

In an expression like "null == x" we say that null is not Lvaluable. The terms "l-value" and "r-value" originally meant left-hand value and right-hand value. But when Kernighan and Ritchie created C, the meaning changed, to become more precise. Today an Lvalue is understood to be a locatable value, something that has an address in memory. A compiler will allocate an address for each named variable at compile-time. The value stored in this address (its r-value) is generally not known until runtime. It's impossible, in any case, to refer to an r-value by its address if it hasn't been assigned to an l-value, hence the compiler won't even try to do so and you'll get an error if you try to compile "null = x".

On the other hand, "x = null" is perfectly legal, and in K&R days a C-compiler would obediently compile such a statement whether it was in an if-clause or not. This actually resulted in some horrendously costly errors in the real world, and as a result, today most modern compilers will complain about a bare assignment inside an if-clause. (Actually I can think of an exception. But let's save that for another time.) If you really mean to do an assignment inside an if, you wrap it in an extra set of parentheses to show that you mean it.

Not so with JavaScript, a language that (like K&R C) assumes that the programmer knows what he or she is doing. People unwittingly create accidental assignments inside if-clauses all the time. It's not a syntactical error, so the interpreter doesn't complain. Meanwhile you've got a very difficult situation to debug, and the language itself gets blamed. (A poor craftsman always blames his tools.)

As a defensive programming technique, I always put the non-Lvaluable operand on the left side of an equality operator, and that way if I make a typing mistake, the interpreter slaps me in the face at the earliest opportunity rather than spitting in my general direction some time later. It's a defensive programming tactic that has served me well. I'm surprised more people don't do it.

Thursday, October 02, 2008

Wednesday, October 01, 2008

Serialize any POJO to XML

Ever since Java 1.4.2 came out, I've been a big fan of java.beans.XMLEncoder, which lets you serialize runtime objects (including the values of instance variables, etc.) as XML, using just a few lines of code:


XMLEncoder e = new XMLEncoder(
new BufferedOutputStream(
new FileOutputStream("Test.xml")));
e.writeObject(new JButton("Hello, world"));
e.close();

This is an extraordinarily useful capability. You can create an elaborate Swing dialog (for example) containing dozens of nested widgets, then serialize the whole thing as a single XML file, capturing its state, using XMLEncoder (then deserialize it later, in another time and place, perhaps).

A favorite trick of mine is to serialize an application's key objects ahead of time, then JAR them up and instantiate them at runtime using XMLDecoder. With a Swing dialog, this eliminates a ton of repetitive container.add( someWidget) code, and similar Swing incantations (you know what I'm talking about). So it cleans up your code incredibly. It also makes Swing dialogs (and other objects) declarative in nature; they become static XML that you can edit separately from code, using XML tools. At runtime, of course, you can use DOM and other XML-manipulation technologies to tweak serialized objects before instantiating them. (Let your imagination run.)

As an aside: I am constantly shocked at how many of my Java-programming friends have never heard of this class.

If there's a down side to XMLEncoder, it's that it will only serialize Java beans, or so the documentation says, but actually the documentation is not quite right. (More on that in a moment.) With Swing objects, for example, XMLEncoder will serialize widgets but not any event handlers you've set on them. At runtime, you end up deserializing the Swing object, only to have to hand-decorate it with event handlers before it's usable in your application.

There's a solution for this, and again it's something relatively few Java programmers seem to know anything about. In a nutshell, the answer is to create your own custom persistence delegates. XMLEncoder will call the appropriate persistence delegate when it encounters an object in the object graph that has a corresponding custom delegate.

This is (need I say?) exceptionally handy, because it provides a transparent, interception-based approach to controlling XMLEncoder's behavior, at a very fine level of control. If you have a Swing dialog that contains 8 different widget classes (some of them possibly containing multiple nested objects), many of which need special treatment at deserialization time, you can configure an XMLEncoder instance to serialize the whole dialog in just the fashion you need.

The nuts and bolts of this are explained in detail in this excellent article by Philip Milne. The article shows how to use custom persistence delegates to make XMLEncoder serialize almost any Java object, not just beans. Suffice it to say, you should read that article if you're as excited about XMLEncoder as I am.

Monday, September 29, 2008

A number that's not equal to itself

All this time, I've been thinking NaN is not a number. What an idiot I've been.

In JavaScript:

   typeof NaN == 'number'   // true

And yet of course, NaN == NaN is false.

There you go. Amaze your friends.
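The self-inequality is actually useful. Here is a sketch of a NaN test built on it; the name isReallyNaN is made up for the example. Unlike the global isNaN( ), it does no type coercion, so a non-numeric string doesn't count as NaN.

```javascript
// Hypothetical helper: NaN is the only value not equal to itself.
function isReallyNaN(x) {
  return x !== x;
}

var a = isReallyNaN(NaN);     // true
var b = isReallyNaN("abc");   // false: it's a string, not NaN
var c = isNaN("abc");         // true: global isNaN coerces to a number first
var d = isReallyNaN(0 / 0);   // true
```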

Wednesday, September 24, 2008

Great hack: PNG-compressed text



I only recently stumbled across what's got to be the most outlandish scripting hack I've seen in a long time. Jacob Seidelin tells of how he managed to stuff text into a PNG image, then get it back out with the <canvas> getImageData( ) method. What's neat about it? Mainly the free compression you get with the PNG format. For example, when Jacob put the 124 KB Prototype library into PNG format, it shrank to 30 KB. Of course, it makes for an awful-looking image (see above), which one might think of as a degenerate case of steganography, i.e. embedded data in an image, minus the image.

The trick doesn't work for all browsers, since you need canvas for it to work. And it's kind of pointless given that you can use gzip instead. But it's kind of neat in that it opens the door to browser steganography, embedding of private metadata, and potentially lots of other cool things.
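I haven't reproduced Jacob's actual encoder here. What follows is only my guess at the general shape of the packing step, without any canvas or PNG involved: three characters go into the red, green, and blue bytes of each pixel, in the same RGBA byte order that getImageData( ) hands back. The function names are hypothetical, and the sketch assumes every character code fits in one byte and that the text contains no NUL characters (zero is used as padding).

```javascript
// Hypothetical sketch: pack text into RGBA bytes, three characters per pixel.
function textToPixels(text) {
  var pixels = [];
  for (var i = 0; i < text.length; i += 3) {
    for (var j = 0; j < 3; j++) {
      var k = i + j;
      // Assumes single-byte character codes; pad the last pixel with zeros.
      pixels.push(k < text.length ? text.charCodeAt(k) & 0xff : 0);
    }
    pixels.push(255); // fully opaque alpha
  }
  return pixels;
}

// Reverse the packing: skip alpha bytes and zero padding.
function pixelsToText(pixels) {
  var chars = [];
  for (var i = 0; i < pixels.length; i++) {
    if (i % 4 === 3) continue;      // alpha byte
    if (pixels[i] === 0) continue;  // padding
    chars.push(String.fromCharCode(pixels[i]));
  }
  return chars.join("");
}

var packed = textToPixels("hello");     // 2 pixels, 8 bytes
var roundTrip = pixelsToText(packed);   // "hello"
```

Alpha is pinned at 255 because a canvas may alter color values stored under partial transparency, which would corrupt the text.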

Tuesday, September 23, 2008

JavaScript beautifiers suck

I keep looking for an online code beautifier that will convert my distinctly simian-looking Greasemonkey scripts to properly indented, formatted source code. My current favorite code editor (Notepad) doesn't provide proper code formatting. I know what you're thinking: Why aren't you using a proper IDE in the first place? Then you wouldn't have this problem! Well, first of all, I am thinking of upgrading to Wordpad. But it doesn't do formatting either. Second of all, I haven't found a JavaScript IDE worthy of the name, which is why I use Notepad. More on that in a minute.

I spent an hour the other day looking for an online beautifier that would do a makeover on my ugly JavaScript. What I found is that most people point either to this one or this one. (I tried others as well.) They either don't keep my existing newlines, or don't indent "if" blocks properly (or at all), and/or just plain don't indent consistently. Quite unacceptable.

Finally I gave up on the online schlockware and went straight to Flexbuilder (which has been sitting unused on my desktop), and I thought "Surely this will do the trick."

Imagine the look of abject horror on my face when I found that the ActionScript editor could not do the equivalent of Control-Shift-F (for Java in Eclipse). In fact, the formatter built into Flexbuilder's ActionScript editor won't even do auto-indenting: You have to manually grab blocks of code and do the old shift-right/shift-left indent/outdent thing by hand, over and over and over again, throughout your code, until the little beads of blood begin to form on your forehead.

I'm left, alas, with half-solutions. But unfortunately, two or three or ten half-solutions don't add up to a solution. (How fortunate we would all be if it did.)

Monday, September 22, 2008

Firebug on Vista giving problems

Is it just me or does anyone else find Firebug+FF3 on Vista to be flaky? It loses my console code if I switch tabs (not windows, just going to another tab and coming back). Sometimes the FB console stops working or won't execute "console.log( )". And it seems as though weird bugs show up in the Firefox console that don't show up in the Firebug log pane, and vice versa.

Also, I don't appreciate having to manually turn on the console for every web domain I go to. What a PITA. I wonder if that behavior can be disabled somehow? Right now, I'm feeling disabled.

Thursday, September 18, 2008

JavaScript runs at C++ speed, if you let it

The common perception (ignorance of the crowd) is that JavaScript is slow. What I'm constantly finding, however, is that people will hand-craft a JavaScript loop to do, say, string parsing, when they could and should be using the language's built-in String methods (which always run fast).

Example: You need a "trim" function to remove leading and trailing whitespace from user-entered text in a form. If you go out on the web and look at what people are doing in their scripts, you see a lot of things like:

function trim10 (str) {
var whitespace = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';
for (var i = 0; i < str.length; i++) {
if (whitespace.indexOf(str.charAt(i)) === -1) {
str = str.substring(i);
break;
}
}
for (i = str.length - 1; i >= 0; i--) {
if (whitespace.indexOf(str.charAt(i)) === -1) {
str = str.substring(0, i + 1);
break;
}
}
return whitespace.indexOf(str.charAt(0)) === -1 ? str : '';
}


I took this code verbatim from a web page in which the author of it claims (ironically) that it's an incredibly fast routine!

Compare with:

function trim(a) {
return a.replace(/^ +/,"").replace(/ +$/,"");
}

In testing, I found the shorter routine faster by 50% on very small strings with very few leading or trailing spaces, and faster by 300% or more on strings of length ~150 with ten to twenty leading or trailing spaces.

The better performance of the shorter function has nothing to do with it being shorter, of course. It has everything to do with the fact that the built-in JavaScript "replace( )" method (on the String pseudoclass) is implemented in C++ and runs at compiled-C speed.

This is an important point. Interpreters are written in C++ (Spidermonkey) or Java (Rhino). The built-in functions of the ECMAScript language are implemented in C++ in your browser. Harness that power! Use the built-in functions of the language. Never hand-parse strings with "indexOf" inside for-loops (etc.) when you can use native methods that run at compiled speed. Why walk if you can ride the bullet train?
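One caveat on the short routine: it strips only space characters, while the long one also handles tabs, newlines, and friends. A sketch of a variant that keeps the built-in approach but uses the regex \s class follows. The name trimWhitespace is mine, and I haven't timed this version, so the numbers above don't automatically carry over.

```javascript
// Hypothetical variant: same two-replace idea, but \s matches
// tabs, newlines, and other whitespace as well as plain spaces.
function trimWhitespace(str) {
  return str.replace(/^\s+/, "").replace(/\s+$/, "");
}

var cleaned = trimWhitespace(" \t\n hello world \r\n"); // "hello world"
var untouched = trimWhitespace("no padding");            // "no padding"
```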

The implications here for client/server web-app design are quite far-reaching. If you are using server-side JavaScript, and your server runtimes are Java-based, it means your server-side scripts are running (asymptotically, at least) at Java speed. Well-written client-side JavaScript runs (asymptotically) at C++ speed. Therefore, any script logic you can move to the client should be moved there. It's madness to waste precious server cycles.

Madness, I say.

Wednesday, September 17, 2008

Getting Greasemonkey to work in Firefox3 on Vista

Wasn't happening for me until I started with a fresh (empty) FF3 user profile. Vista seems to be the problem in all of this. GM on FF3 on WinXP works fine, but with Vista, GM doesn't install properly unless you zero out your FF3 profile first. At least, that's the state of things today as I write this (17 Sept 2008). Hopefully it will get fixed soon. Until then ...

The procedure is:

1. In FF3, go to Organize Bookmarks and export your bookmarks as HTML so you don't foolishly lose them.

2. In the Vista "Start" panel, choose Run...

3. Launch Firefox with a command line of "firefox -profilemanager".

4. When the profile manager dialog appears, create a new profile.

5. When FF launches, install Greasemonkey.

6. Import your bookmarks.

7. Exit Firefox. Return to step 3. When profile manager dialog appears, delete your old profile. (Or else leave it and have to contend with logging in to one or the other profile whenever FF launches.)

Whew + sheesh.

Wednesday, September 10, 2008

How to use JS 1.7 in Greasemonkey?

Problem: I need to be able to use the 'yield' keyword in a Greasemonkey script. This is a JavaScript 1.7 language feature available in Firefox 2 and later. You must explicitly "turn on" support for this feature, however, by specifying

<script type="application/javascript;version=1.7"/>

in the HTML page.

That's not what I need. I need to turn it on in Greasemonkey's execution context.

Others have run into this problem. It appears, however, that the Greasemonkey guys won't do anything about it.

I was hoping there'd be some clever back-door way to do this, but that seems unlikely. There appears, alas, to be no workaround, short of the usual (for Greasemonkey) expedient of vulturing the unsafeWindow, which is of course repulsive and unacceptable.

If anyone knows of a non-ugly solution to this problem (the problem of how to use 'yield' in Greasemonkey scripts), please advise.

Tuesday, September 09, 2008

Selection object in Firefox

I've learned some interesting things about the way selections work in Mozilla.

Every window has a singleton selection object, even when the user has selected no items on the rendered page. Therefore, window.getSelection( ) always succeeds.

If you simply want user-selected text as a string, getSelection( ).toString( ) will work. But if you really intend to walk the selected DOM nodes, or process the selection in any non-trivial way, you will need to access its Range objects with

window.getSelection( ).getRangeAt( i );

There is a "rangeCount" property on the Selection object, so that you can know how many Ranges were selected by the user. In Firefox 2.0 and prior, the rangeCount was never more than one. But in Firefox 3, the user can do multi-selection of page contents. (Try it: Hold the Control key down as you swipe across various pieces of a page.) That means the range count can be more than one.

If you need to process a Range's contents, be sure to use the cloneContents( ) method, not the extractContents( ) method. The latter will actually remove nodes from the DOM tree, affecting the rendered page's appearance. (That is to say, content suddenly disappears!)
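Putting those pieces together, a loop over the selection might look like the sketch below. The function name cloneSelectedFragments is hypothetical; it takes the selection as a parameter so it can be handed window.getSelection( ) in a page, and it relies only on rangeCount, getRangeAt( ), and cloneContents( ).

```javascript
// Hypothetical helper: copy the contents of every Range in a selection
// without disturbing the page.
function cloneSelectedFragments(selection) {
  var fragments = [];
  for (var i = 0; i < selection.rangeCount; i++) {
    var range = selection.getRangeAt(i);
    // cloneContents() copies the nodes; extractContents() would remove them.
    fragments.push(range.cloneContents());
  }
  return fragments;
}
```

In a page, you would call it as cloneSelectedFragments( window.getSelection( ) ) and get back one document fragment per selected range.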

This is all spelled out at the Moz Developer Center page on Ranges.

Friday, September 05, 2008

XPath Query in Sling

I've been playing with Sling lately, and I was pleasantly surprised to find that Sling comes with a JSON query servlet that exposes SQL and XPath query capability through a RESTful HTTP GET syntax. (Thanks to Moritz Havelock for pointing this out.)

But I quickly ran into a small problem. (And just as quickly, the solution.) Allow me to explain.

The problem: I want to search for nodes in the repository that have a (multivalued) "pets" attribute containing the value "dog." Note that the "pets" attribute might have multiple values. I want to filter against just one. Therefore I can't do an equality test. I must use the XPath contains() function.

My test query was:

http://localhost:7402/content.query.json?
queryType=xpath&statement=//*[contains(@pets,'dog')]


This produced an InvalidQueryException, with a message of "Unsupported function: contains (500)".

I was a bit surprised that the servlet seemed to know nothing about any contains() function. A true "WTF moment."

Taking my hint from the stack trace, I quickly ran a Google Code Search on org.apache.jackrabbit.core.query.xpath, and immediately found the answer in XPathQueryBuilder.java: It turns out you have to use the function's qualified name, jcr:contains(). Like so:

http://localhost:7402/content.query.json?
queryType=xpath&statement=//*[jcr:contains(@pets,'dog')]


I'm so much of an XPath newb that I don't even know if I should have been surprised by this, but it did stymie me briefly. Anyway, it works now and I'm thrilled to be able to do XPath queries right from the GET-go.
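If you build these query URLs from script, the brackets, colon, and at-sign in the XPath statement are worth percent-encoding. Here is a rough sketch; the function name slingXPathQueryURL is my own, and I'm assuming the servlet decodes percent-escapes in the usual way. The queryType and statement parameter names are the ones from the URLs above.

```javascript
// Hypothetical helper: build a query URL for the JSON query servlet.
function slingXPathQueryURL(base, statement) {
  return base + "?queryType=xpath&statement=" + encodeURIComponent(statement);
}

var queryURL = slingXPathQueryURL(
  "http://localhost:7402/content.query.json",
  "//*[jcr:contains(@pets,'dog')]"
);
// queryURL ends with: statement=%2F%2F*%5Bjcr%3Acontains(%40pets%2C'dog')%5D
```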

Tuesday, September 02, 2008

Google Chrome: nice console, ugly browser


I downloaded Chrome today and immediately started using the JavaScript console. It's pretty nice, but if you're already accustomed to Firebug in Firefox, it's no substitute. Also, what good is Chrome if you can't use Greasemonkey scripts with it?

The JS engine is Google's own V8 rather than Spidermonkey, which means no E4X. And so help me, I haven't figured out how to enter a newline in the console without triggering an eval( ). In other words, I can only enter one line of code at a time, and then I have to execute it. As soon as I hit Enter, CR, Control-Enter, etc., the code on the current line executes. Oh well...

As a browser, this thing is not terribly impressive, from what I can tell.

In any case, Chrome itself strikes me as too fugly to deal with. I'm not sure which I'd rather do: spend a work-day using Chrome as my main browser, or jam prickly-pears into both my eyes at once.

I think I'll stay with Firefox until Chrome gets out of beta. Which (if it's like Gmail) it never will.

Friday, August 29, 2008

Pretty-print serialized DOM

Another great Mozilla feature: pretty-format a serialized DOM tree. The following code will serialize an entire web page and pretty-format the markup:

var serializer = new XMLSerializer( );
var str = serializer.serializeToString( document.documentElement );
var pretty = XML( str ).toXMLString( );


As mentioned in my earlier post about XMLSerializer, the XML you get isn't perfect: element names come out ALL CAPS for some weird reason. And you get a bunch of automatic entity substitutions, most of which you probably want, others of which will simply break things if you try to deserialize the text back into a DOM later. (Forget about easy roundtripping.) But overall, it's a really useful trick.

I was hoping maybe this trick would also (as a free bonus) pretty-format any embedded scripts inside CDATA sections, but of course no such luck. In fact, due to automatic entity substitution, <![CDATA[ gets converted to &lt;![CDATA[, which is hilarious in a sad kind of way.

Serializing DOM nodes to XML in Firefox

I keep having to re-teach myself this, so I might as well post it where I can always find it!

Let 'd' be any arbitrary DOM node. To serialize the node (and its descendants) via JavaScript:

var serializer = new XMLSerializer();
var str = serializer.serializeToString( d );


Having a top-level XMLSerializer object in Mozilla is so nice. So, so nice.

But sadly, the output from the serializeToString( ) method is not the kind of XML I'd like to see. Element names come out ALL CAPS whether or not that's what you want (and it's never what I want). It also converts the greater-than and less-than symbols inside scripts to their entity equivalents, even if you enclose your scripts in CDATA sections. To me, entity substitution inside a CDATA section makes no sense whatsoever.

Still, it's handy to be able to serialize a DOM tree. Even if the tags come out ALL CAPS.

Tuesday, February 12, 2008

Stackless Stack

I made mention of the Stackless Stack in my CMS Watch blog the other day. I need to write a followup blog on it, explaining the OSGi connection.

Who knows if someday I won't also need to write a blog about the Javaless JVM?

Friday, December 21, 2007

Schema-Typed Languages

BEA Systems may have invented something quite novel and useful. In a recent patent application, BEA's John Schneider proposes using XML schema definitions as data types in, say, ECMAScript. The main intuition is that you would use an import statement to make the interpreter aware of a particular schema definition. From that point on, you could instantiate whole objects based on that schema def, or (if it's a simple data type) declare variables to be of type "MyElement.xsd" and manipulate them directly. Type-checking is delegated to the schema validator; and suddenly you have a scripting language that acts like a strongly typed language and groks XML to boot.

At first blush, it sounds and feels a lot like a new twist on object relational mapping, but it's actually a bit more than that. This goes to the heart of language design and behavior.

Neat. I wonder what BEA plans to do with it next?

Friday, December 14, 2007

IE8 and XHTML

I ran across an interesting post by Mary Jo Foley talking about Internet Explorer 8. It mentions that IE8 probably still won't support XHTML properly.

This is all so very wrong.

Wednesday, December 12, 2007

Google Charts

Projects under the heading "Google Apps" don't tend to excite me very much these days, but this one is too good to go unmentioned.

Google Charts is a simple REST-style API for creating graphs and charts on the fly, such as this one:



Details here:
http://code.google.com/apis/chart/#encoding_data
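To give a flavor of what "REST-style" means here, this is a minimal sketch of building a chart URL by hand. The function buildChartUrl and its options object are my own invention; the query parameters (cht for chart type, chs for size, chd for data in text encoding, chl for labels) and the host name are as I understand the API at the time of writing, so treat them as assumptions and check the docs linked above.

```javascript
// Sketch only: assemble a Google Charts request URL from a few options.
// buildChartUrl is a hypothetical helper, not part of any Google library.
function buildChartUrl(options) {
  var params = [
    "cht=" + options.type,                              // chart type, e.g. "p3" for a 3D pie
    "chs=" + options.width + "x" + options.height,      // size in pixels
    "chd=t:" + options.data.join(","),                  // data, text encoding
    "chl=" + options.labels.map(encodeURIComponent).join("|")  // labels
  ];
  return "http://chart.apis.google.com/chart?" + params.join("&");
}

var url = buildChartUrl({
  type: "p3",
  width: 250,
  height: 100,
  data: [60, 40],
  labels: ["Hello", "World"]
});
```

Drop the resulting URL into an img tag and the chart comes back as an image. No client library required.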

Monday, July 23, 2007

Menus as Non-Modal Dialogs

I was thinking the other day about how best to keep the details of application logic hidden from Swing widgets (in the spirit of Martin Fowler's Presentation Model), the main intuition being that a user app can/should (arguably) be modeled as a set of nonvisual capabilities to which utterly dumb GUI widgets can later be mapped. Achieving this in a clean way is incredibly difficult. (Or at least for me it is.)

I had an epiphany of sorts. When you design a standalone user app (a menu-driven desktop app), what's the first piece of UI you design? The menu system. And what is a menu? In Swing (Java), it's a series of nested buttons. (JMenu and JMenuItem inherit from javax.swing.AbstractButton.)

The menubar never goes away. Some apps let you hide it, in which case it's merely made invisible (it doesn't actually get released from memory). There's a name, of course, for collections of buttons that never go away: a non-modal dialog. My epiphany was/is that a menu system is a collection of non-modal dialogs. (And I hate non-modal dialogs, both as a user and as a programmer.)

In the typical menu-driven app, menus are non-modal dialogs in which each button "knows too much" about deep application internals. The ever-changing state of the entire app is controlled through this collage of interdependent buttons. Managing the underlying ill-formed dependency graph is difficult, which is why menu apps are a pain in the ass to write.

Friday, March 09, 2007

Fractal-Dimensional Transforms

I was on the back porch thinking about image transforms the other morning, and it occurred to me that we just assume that many types of data are either one-dimensional, two-dimensional, or three-dimensional, etc. (with nothing in between), despite the fact that fractals are everywhere in nature. And we apply transformations and convolutions (2-dimensional DCT, in the case of JPEG) to the data without regard for the data's true dimensionality.

So I'm left wondering: how do you do, say, a 2.2D DCT or DFT? What if I want to convolve the fractal residue of a time series?

Wednesday, January 24, 2007

Privacy Leakage Patent

Identity data-mining disturbs me. What disturbs me even more is that you can patent a technique for, say, guessing someone's age based on their purchasing habits (which is what Amazon has succeeded in doing).

Evil, evil, evil.

Thursday, January 11, 2007

jrunscript

It turns out JDK 6 comes with a JavaScript console facility so that you can play with Rhino interactively from a command line. Look for a file called jrunscript.exe in your JDK's /bin directory.

A pretty good article on Java/JavaScript integration in Java 6 can be found on the Sun Developer Network site right here.

Wednesday, January 03, 2007

OpenOffice.org Dev Hurdles

Over the holidays I decided to wade into the murky waters of OpenOffice development. I was quickly up to my neck in mud.

It turns out I'm not the only one. Key OOo insiders are acutely aware that the barriers to participation in OOo development are way too high (keeping community participation in OOo development way too low).

It's not just that finding all the code is hard or that the C++ codebase is around 7 million lines of code. It's that a full compile-and-build of OOo takes 15 hours on a typical desktop PC. If you can get it to build at all.

Some of the entry-barrier issues are more fully discussed in Jens Heiner Rechtien's 31 Dec 2006 blog.

Friday, December 22, 2006

JRuby for OpenOffice Development

Juergen Schmidt (who gave a talk at last week's Javapolis conference on why Java programmers should get more involved with OpenOffice) blogged yesterday about the prospect of using JRuby for OOo development:

I also met two Sun colleagues Thomas Enebo and Charles Oliver Nutter, two of the JRuby core developers, and brainstormed a little bit with them about the support of JRuby in OpenOffice.org. JRuby comes directly with Java in the future and the integration work into NetBeans is ongoing. So it would be great to have a good support for JRuby from UNO as well. JRuby as one of the main scripting languages for OpenOffice.org with a smart integration in NetBeans is a really cool idea and I hope that we can deliver something in this direction. We will see what's possible and when!

Some interesting podcasts from Javapolis (including quite a few on agile development) are here.

Wednesday, November 08, 2006

Project Tamarin

By now everyone has heard the news that Adobe will donate code for its ActionScript VM to the Mozilla Foundation for use in Firefox. For a quick snapshot of what's going on, see:

A lot of the blog commentary around this has centered on Flash. IMHO this has little to do with Flash. It has everything to do with ECMA4/JS2 (see my blog entry previous to this one) and the future of AJAX. It will also keep Adobe honest in terms of making sure ActionScript doesn't continue on the path of becoming its own bastard variant of JavaScript (a la JScript), which is to say a not-quite-compliant dialect of ECMA-262.

The ability to run JIT-compiled JavaScript on a VM is killer, because it knocks down all complaints of JS being slow. And it also opens the door to ultra-fast JS on the server (and pure-JS doublesided AJAX).

The VM architecture looks like this:




But again, it's not really about .swf, it's about compiling JS2 into bytecode, which is an incredibly important advancement.

Brendan Eich held an IRC chat yesterday in which he and Kevin Lynch of Adobe fielded questions about Tamarin. A few interesting factoids came to light:

  • Acrobat's JS engine will move from Spidermonkey to Tamarin.
  • The expansion factor for jitting bytecode to x86 is roughly from 5X for strongly typed, early-bindable code, to 20X for loosely typed, unbindable code. Thus, you pay a price in memory hunger for the ability to JIT-compile JS, but JS2's new typing system mitigates it somewhat.
  • The Tamarin codebase comprises 135,000 lines of C++ (smaller than I would have thought). This is sure to grow but Brendan Eich indicated very strongly that Firefox needs to shrink, not grow, hence there will be pressure to keep Tamarin as lean and efficient as possible.
  • Tamarin is not 64-bit-ready. But if the project gets the kind of (huge) traction that it appears it will get in the community, the "64-bit Flash" question may finally get solved. And maybe ES4/JS2 will get a "long" data type in addition to int/uint/double. ;^)

Thursday, November 02, 2006

New ECMA Draft

ECMA's 262 revision-4 working group just published a draft spec of what will hopefully become (by next summer) JavaScript 2.0. This is the first major upgrade to the JavaScript language in almost a decade. Guaranteed to take Ajax to the next level.

Tuesday, October 24, 2006

Fuzzing

I learned about fuzzing today. Think of it as fault discovery by random input. The underlying assumption: If unexpected input makes an app produce unexpected behavior, you're hosed. Hackers rely on fault-injection to find vulnerabilities. QA can use it to find bugs.

There's a list of open-source fuzzers here.
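The idea is simple enough to sketch in a few lines. This is a toy of my own, not one of the fuzzers on that list: it throws random strings at a function and records every input that makes the function throw. The names makeRandom, randomString, and fuzz are hypothetical, and the seeded generator is there only so that a failing run can be reproduced.

```javascript
// Small seeded pseudo-random generator (a linear congruential generator),
// so that a fuzzing run can be replayed with the same seed.
function makeRandom(seed) {
  var state = seed >>> 0;
  return function () {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;   // value in [0, 1)
  };
}

// Build a random string of up to maxLength characters (char codes 0-255).
function randomString(random, maxLength) {
  var length = Math.floor(random() * (maxLength + 1));
  var chars = [];
  for (var i = 0; i < length; i++) {
    chars.push(String.fromCharCode(Math.floor(random() * 256)));
  }
  return chars.join("");
}

// Call target with random input; collect every input that makes it throw.
function fuzz(target, iterations, seed) {
  var random = makeRandom(seed);
  var failures = [];
  for (var i = 0; i < iterations; i++) {
    var input = randomString(random, 64);
    try {
      target(input);
    } catch (error) {
      failures.push({ input: input, error: error });
    }
  }
  return failures;
}
```

Point it at your own parser, e.g. fuzz( myParser, 10000, 42 ), and look at what lands in the failures list. Real fuzzers are far smarter about generating almost-valid input, but the principle is the same.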

Friday, October 06, 2006

Adobe Ditches SVG Viewer

Friend and colleague Pascal Barbier pointed out to me the other day that Adobe will soon stop supporting/developing its free SVG Viewer plug-in for web browsers. As of January 2007, Adobe will simply abandon the SVG Viewer.

Although this move is certainly consistent with Adobe's longterm Flash strategy, I don't think it's motivated by anything Flashy. (Call me naïve.) Adobe already supports SVG in most of its products and will soon leverage SVG in Acrobat via PxDF. Support for SVG goes on. Just not in the browser.

The move mostly affects Internet Explorer users, since SVG support is native in Firefox. But let's face it, how many IE users even have the Adobe plug-in? How many IE users have ever tried to view an SVG page? (How many can even spell SVG?)

I don't blame Adobe (or any company) for abandoning a development-intensive non-product that requires huge gobs of time and money to support. But that raises the question: Why doesn't Adobe donate its Viewer code to the open-source community? This is a great opportunity, after all, for Adobe to win badly needed points in the F/OSS world. From a P.R. standpoint, it's Something Very Good.

Surely they'll figure it out.

Wednesday, September 27, 2006

Adobe PxDF

Word is slowly leaking out about Adobe's planned XML grammar for PDF (code name Mars, so think SVG-in-a-space-suit).

The new XML-based PDF format ("PxDF") is basically SVG with some extensions to allow for various kinds of embedded resources and references thereto. Recall that PDF can contain form widgets, annotations, JavaScript, and other flotsam. You can specify some of these items as reusable resources, refer to them using XLink, ball everything up into a zip archive, and expect Acrobat 8.x to deal with it (possibly as early as November).

Tuesday, September 26, 2006

Concurrent JavaScript

There is no such thing, I just made that phrase up. But it seems inevitable. The really ancient concept of futures (from concurrent programming languages) has interesting pointcuts in AJAX development, so I'm forced to give renewed attention to things like Narrative JavaScript, jwacs, and Chris Double's admirable forays into JavaScript future and promise support. All really awesome stuff. It's always interesting to see the JS community outrunning Eich and ECMA on occasion.

If you're still scratching your head, I recommend spending some time with Alice.

Monday, September 18, 2006

How to Make SVG Slower

It's called Dojo2D.

Try this test page. On my machine, Firefox 1.5.0.7 will load that page in 8 seconds, which is about 7.99 billion clock cycles too many, for my taste. But Internet Explorer locks up for a full 30 seconds (consistently, every time) when trying to load the page.

I'm happy to see IE users brutally punished in this fashion, of course. But honestly, this has to be some kind of sick, sick joke, right?

Wednesday, September 13, 2006

Dynamic Languages on the JVM

In certain parts of the world it is said that there are three things that can never be known to any man: The hour of one's death, the true name of Allah, and the current status of JSR-292.

Nevertheless, it seems clear that the fruits of JSR-292 will be folded into Dolphin (Java 7, to be released in 2008).

Let's see, that's (how many is it?) thirteen years that it took Sun to realize some people may actually want to do serious programming in something other than you-know-what.

Tuesday, September 05, 2006

JavaScript 1.7 is in Firefox Beta

True to Brendan Eich's earlier timeline predictions, Firefox 2b2 now implements JavaScript 1.7, with such new features as:

These are currently available only in the browser, alas, and not in Mozilla's Rhino engine. Rhino has a history of following the ECMA standards quite closely, and the JS1.7 features are not part of any standard (yet).

Of course, none of this will be in Internet Explorer any time soon. Once again, the Firefox folks have made a bold move into unexplored territory, leaving the safe, comfortable, Web 1.0 weenie-world of Ballmer & Co. ever further behind.

Thursday, August 31, 2006

Free Security Book

I owe this one to my colleague Stephen Holmes in Dublin, who today pointed me at the freely downloadable version of Ross Anderson's superb Security Engineering. This is without a doubt one of the finest free online books (of any kind) that I've ever seen, beyond being a celebrated classic in security circles for several years now. The author is a Professor of Security Engineering at the University of Cambridge's Computer Laboratory. Even so, he writes entertainingly. ;^)

The chapters are individually downloadable, or you can shag the whole book. For a quick look, I recommend Chapter 11 (which had me utterly spellbound).

Tuesday, August 29, 2006

Dojo 2D

In the ever-widening quest for richer web widgets, the Dojo guys, it turns out, are considering implementing their own 2D graphics API. It would actually be a bunch of wrappers around SVG, VML, and Canvas methods, of course. The primary target is SVG.

Implementing this for even a small subset of SVG will be arduous. (The Flash ninjas must be laughing themselves sick right about now.) I'm tempted to dismiss Dojo 2D as a quixotic quest. But I also know AJAX developers are clamoring for just this sort of thing, and I'm sure Dojo 2D will be a scandalous success.

Performance is apt to be underwhelming (SVG is already sluggish enough without wrapper layers), but that's never stopped a market disruptor before, and anyway, Dr. Moore can't be far behind with the cure.

Monday, August 28, 2006

Lightweight 3D in Java


I finally found the ultimate no-frills super-lightweight 3D library written in Java: Peter Walser's idx3d framework. (Freeware, of course.)

After playing with idx3d for a month, I'm still astonished at how much functionality Peter crammed into just 29 (count 'em) .java files. The code is streamlined and easy to follow (a rarity in 3D engines). No frills, no baroque overfactoring, no "let's be fully general so as to handle the occasional weird-ass edge-case even if it means slowing everything else down."

I've found the idx3d code to be extremely stable, reasonably fast (again, a rarity in Java 3D engines), and after 30 hours of flogging it mercilessly in Eclipse (on Novell SUSE Linux Enterprise Desktop), I have yet to see an OutOfMemoryError.

The most wonderful thing about Peter Walser's code is that it was written in Y2K (back when Java was lean and mean) and has very few JRE dependencies: you'll see an occasional java.util class, but for the most part, Walser's code files contain no imports. Which is astonishing.

If you're interested in 3D programming, check this thing out.

Wednesday, July 26, 2006

The Zen of Hashing

Hashing and hash algorithms are a pet interest of mine. Understanding hashing at a low level takes a fair amount of meditation. Most programmers are too busy for that. Thus hashing is not well understood outside of, say, cryptography circles.

As it turns out, the guy who did the amusing boredom-graph cartoon (see yesterday's blog) also has written one of the best overviews of hashing I've seen in a long time. Be sure to see his excellent Hash Functions and Block Ciphers page as well.

Study the material on Bob's site. Save yourself years of meditation.

Tuesday, July 25, 2006

A Timeless Graph




Bob Jensen created this wonderful graph, which confirms what I've long thought: boredom tends to be continuous over its range.

Monday, July 17, 2006

Making XML Smaller

In all the hand-wringing discussions about XML's verbosity that I've read over the years, I have yet to hear anyone suggest simply truncating all closing tags to </>. In other words, if you've got

<data>
  <item>something</item>
</data>

why not just shorten it to


<data>
  <item>something</>
</>


Verbose closing tags are a pure waste of space (albeit required by XML spec). Abbreviated closing tags don't make the file any less parsable. When the parser encounters </> it knows that the closure is at the nesting level of the previous opening tag. If not, the XML was not well-formed to begin with.
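To show that the parser really does have everything it needs, here is a rough sketch that expands abbreviated closing tags back into standard XML using nothing but a stack of open element names. expandClosingTags is a hypothetical function of my own, and it cuts corners: it ignores comments, CDATA sections, and processing instructions, and it assumes no attribute value contains a ">" character.

```javascript
// Sketch: rewrite every "</>" as the full closing tag for the most
// recently opened element. Throws if the nesting doesn't work out.
function expandClosingTags(xml) {
  var stack = [];   // names of currently open elements
  return xml.replace(/<\/>|<\/([^\s>]+)\s*>|<([A-Za-z_][^\s\/>]*)[^>]*>/g,
    function (tag, closeName, openName) {
      if (tag === "</>") {
        if (stack.length === 0) throw new Error("Unmatched </>");
        return "</" + stack.pop() + ">";
      }
      if (closeName !== undefined) {
        // An ordinary verbose closing tag: check it against the stack.
        if (stack.pop() !== closeName) {
          throw new Error("Mismatched </" + closeName + ">");
        }
        return tag;
      }
      // An opening tag. Self-closing tags ("<br/>") open nothing.
      if (tag.charAt(tag.length - 2) !== "/") stack.push(openName);
      return tag;
    });
}

var expanded = expandClosingTags("<data>\n  <item>something</>\n</>");
```

Here expanded comes back as the original verbose document, closing tags and all.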

Verbose closing tags are just that. Unneeded verbosity.

Wednesday, June 28, 2006

When Identity Theft is not Theft

Two years from now, it will not be necessary to steal anyone's identity. Web surfers will have given away more personal info to the world than even the greediest thief would ever want to rip off by illegal means.

I'm not so much talking about static identity info, like your Social Security number (which will be worthless anyway in a year or two). I'm talking about the really interesting dirt. Your shopping habits, reading habits, movie-watching habits, hobbies, favorite travel destinations, where you went to school, who you've worked for and how long you stayed at each job, and (let's not mince words) sexual preferences, who your friends are, the names and ages of your children. Most of this info can be scraped, right now today, from blog bios, online resumes, mySpace profiles, tag-sharing sites, social networking sites (like linkedin.com), and photo-sharing sites. Your info is out there. You put it there yourself.

And the bad part is, there's no taking it back. Google archives old pages. So does the Wayback Machine.

You're leaking personal info to the world every time you use an online service of any kind. Particularly the spate of Web 2.0 applications offering free online word processing, spreadsheets, chats, etc. Those are hosted apps. Most of the hosts are trustworthy (arguably), but the hosts tend to archive chatlogs and other interaction records, which means the storage media on which that material is archived can be stolen or lost just like the Veterans Administration guy's laptop.

Or it can be inadvertently indexed by Google and exposed to searchers (as has happened with supposedly private test scores).

The outflux of identity info onto the Web is massive, and it's accelerating daily, driven largely by the explosion in popularity of "Web 2.0" apps.

All of which is great news to the National Security Agency, who by some accounts are sifting through your data right now.

Tuesday, June 20, 2006

RoR Gaining on Atkins Diet

It's official: Ruby on Rails is about to edge past the Atkins diet for popularity.

RoR has also overtaken West Nile virus.

It has to be true. I saw it on Google Trends.

Thursday, June 08, 2006

Spring Framework Backlash

It's refreshing (and healthy, I think) to see open, honest debate erupt over the usefulness of IoC frameworks, in particular the certifiably trendy Spring framework. I refer to Bob Lee's gratifyingly blunt I Don't Get Spring.

Surprisingly, most of the comments at the end of Lee's blog are dispassionate, logical, and in full agreement with Lee's premise, which (to oversimplify) is that Spring is cryptic, over-architected, and malodorous at a code level (among other felonies), begging the question of why anyone would use it.

I can understand why Lee would feel that way. He's right on most counts. Spring is indeed byzantine and heavy (as most things surrounding J2EE are), and buries too many dependencies in XML. But that doesn't mean Spring doesn't have its legitimate uses.

Monday, June 05, 2006

JVM as Web-Service Endpoint

Imagine if you could ping a running JVM over HTTP to obtain realtime diagnostic info. That seems to be what Sun has in mind with U.S. Patent 7,039,691, "Java Virtual Machine Configurable to Perform as a Web Server," granted to Sun Microsystems last month.

Abstract: A virtual machine, such as a Java(tm) virtual machine, is configured to operate as a web server so that users, using a browser, can make general-purpose inquiries into the state of the virtual machine or, in some cases, mutate the state of the VM. A "browsable" VM contains a network traffic worker, such as an HTTP thread, a services library, and a VM operations thread, which is an existing component in most virtual machines. The network traffic worker and the VM operations thread communicate through a request data structure. The VM operations thread generates a reply to the request upon receiving a request data structure from the traffic worker. Such a reply can be in the form of an HTTP response containing HTML or XML pages. These pages are transmitted back to the browser/user by the network traffic worker.

Thursday, June 01, 2006

Metacompilers and Checkers

Imagine if your favorite compiler were extensible in such a way that you could add your own custom static checks, to find bugs of a special kind that you need to be able to find but that your compiler is too stupid to know about out-of-the-box. That's the intuition behind metacompiler (MC) technology. You write a checker, which is a snap-in that knows how to check for whatever kind of syntactic or other blunder you care about, and add it to the compiler. Then the compiler knows how to emit new warnings or error messages.

A checker can be as simple or as sophisticated as you want it to be. Maybe you want to be sure that every call to foo( ) is eventually followed by a corresponding call to bar( ). Or you may have application-specific security concerns (in the context of export laws, perhaps). Or you may have company policy around certain syntactical idiosyncrasies that would only be of specific concern to your department or your company.
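To make the foo( )/bar( ) case concrete, here is a toy version of such a checker. It is only a sketch under heavy assumptions: checkPaired is a name I made up, and it inspects a flat list of call names, whereas a real metacompiler checker would work on the compiler's own representation of the code and follow every execution path.

```javascript
// Toy checker: given the sequence of calls along one execution path,
// return the positions of every "open" call that never gets a matching
// "close" call later on.
function checkPaired(calls, open, close) {
  var pending = [];   // positions of open calls still awaiting a close
  for (var i = 0; i < calls.length; i++) {
    if (calls[i] === open) {
      pending.push(i);
    } else if (calls[i] === close && pending.length > 0) {
      pending.pop();  // the most recent open call is now satisfied
    }
  }
  return pending;
}

// The second foo (position 2) is never followed by bar.
var unmatched = checkPaired(["foo", "bar", "foo", "baz"], "foo", "bar");
```

A compiler with this check plugged in would turn each leftover position into a warning at the corresponding source line.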

Interestingly, the Stanford MC guys did a pass against the Linux kernel using their own custom checkers plugged into their own MC-aware gcc and found almost 600 potentially serious bugs, most of which have not been looked into yet (if you believe Coverity's latest findings).

Wednesday, May 31, 2006

Tuesday, May 23, 2006

Continuations Thought Harmful

In late March, I blogged a couple times about continuations. Suddenly, Sun's Tim Bray and Gilad Bracha have broached the subject, stimulating much heated discussion in the blogosphere. Much heat, little useful work at the crankshaft.

Of all the recent posts on this surprisingly controversial subject, I find Curtis Poe's the most clueful.

Friday, May 19, 2006

Putting a Face on AJAX

This online facial-compositing app is the weirdest thing ever. It lets you merge facial features (from actual photos) together to create your own police composite sketches, kind of.

I spent 30 minutes fooling with it. Everything came out looking like Pia Zadora.

Wednesday, May 10, 2006

AJAX as a Man-in-the-Middle Architecture

A friend at work showed me Gabbly, which is an AJAX IM-chat pushlet that gives the appearance of putting a chat window over the top of any web page you choose (kind of like gmail-chat).

Odd thing is, it even worked for us when we set the URL to a secure wiki page inside the company firewall.

We promptly exited our Gabbly session and began chatting about it on Groupwise Messenger (our company standard). The whole experience was freaky and left us with serious security worries. Especially when Firefox crashed on me within minutes of leaving the Gabbly-iframed page.

According to a discussion at Ajaxian, Gabbly is indeed vulnerable to cross-site scripting attacks. But I'm equally worried about things like Gabbly JS code being able to walk up to the _top frame and read a supposedly secure container page (not to mention issues around Gabbly.com slurping our plaintext conversation in real time). Likewise, there's nothing stopping the Gabbly server from stomping on any Javascript code that's already in-scope in your page.

The thought of people using a 3rd-party-hosted chat app like this at work scares the hell out of me.

But that's the trouble with things like shorttext.com, ajaxwrite.com, and other free-neato-trendy AJAX "services": They require you to rely on the trustworthiness of the host. I put it too delicately. These are man-in-the-middle applications.

User beware.

Thursday, May 04, 2006

Stallman on MSWord Attachments

Recently a friend reminded me of this discussion (old but still relevant) by Richard Stallman of Word attachments and why they're basically the work of Satan.

Friday, April 28, 2006

YAMSWK (Yet Another M$Word-Killer)

My nomination in the category of "best AJAX-based Word workalike" for this week is Zoho Write, one of a suite of impressive Zoho apps. It took a while (30sec) for Firefox to pull down all 51 external .js scripts, but when the app opened, it was a thing of beauty. Imagine my abject stupefaction upon using the Import button to suck in a complex (many tables, many fonts) .sxw file, and seeing it open without errors, looking just the way it should! Yes, Zoho Write handles OpenOffice files. Just as Nature intended.

Unlike a lot of Web2.0 apps, Zoho is not the product of a teenager locked in a closet. Behind the Z-suite is a ten-year-old company, AdventNet, with offices around the world.

This is starting to get exciting.

Thursday, April 27, 2006

10^9 Laughs

Assignment: Write a 3-dozen-line XML file that will lock up any modern browser.

Answer: See The Billion Laughs attack.
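The arithmetic behind the name is worth a quick look. As the attack is usually described (I'm assuming that shape here), a tiny entity holding the text "lol" is referenced ten times by a second entity, which is referenced ten times by a third, and so on for nine levels. A sketch of the blow-up, with expansionCount being a hypothetical helper of my own:

```javascript
// How many copies of the innermost text you get after "levels" rounds
// of nesting, when each level references the one below "fanout" times.
function expansionCount(levels, fanout) {
  var count = 1;
  for (var i = 0; i < levels; i++) {
    count *= fanout;
  }
  return count;
}

var copies = expansionCount(9, 10);   // nine levels, ten references each
var bytes = copies * "lol".length;    // size of the fully expanded text
```

That's a billion laughs and roughly three gigabytes of text, all from a file you could print on one page.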

Tuesday, April 25, 2006

Big Blue: Leaders in Teleportation?

No one will ever accuse Big Blue of clairvoyance. But they just may have a handle on teleportation.

Just for fun, go to IBM's site and do a search on "teleportation."

You'll get 19 hits.

IBM Game Research

The IBM Systems Journal is one of those rare publications that you wish would come out more frequently (just the opposite of drain-clogs like eWeek, which I wish would come out half as often). The journal's content is uniformly excellent, and the subject matter frequently delights. Such is the case with Volume 45, Number 1, 2006, devoted entirely to (of all things) Online Game Technology.

Wednesday, April 19, 2006

Stacklessness

I blogged a while ago about continuations, which may play a role in making AJAX scale well. Today I learned that continuations have been implemented (on an experimental basis) in Mono's virtual machine.

I'm not a Python person so I didn't realize (until after Googling around a bit) that the so-called microthreads of Stackless Python are a way of achieving the same thing.

The key intuition behind stacklessness is that you move everything that would normally be kept on "the stack" out to a data structure on the heap. Therefore one thread can jump between potentially tens of thousands of execution frames.
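The same trick works in miniature inside an ordinary program. The sketch below is my own illustration (nothing to do with how Stackless Python or Mono actually implement it): it sums a deeply nested array without recursion, keeping its "execution frames" in a plain array on the heap instead of on the call stack. The name sumNested is hypothetical.

```javascript
// Sum all numbers in an arbitrarily nested array without recursion.
// Each frame records which array we are in and how far we have got,
// which is the same bookkeeping the call stack would do for us.
function sumNested(root) {
  var frames = [{ node: root, index: 0 }];
  var total = 0;
  while (frames.length > 0) {
    var frame = frames[frames.length - 1];
    if (frame.index >= frame.node.length) {
      frames.pop();                 // this "call" has returned
      continue;
    }
    var item = frame.node[frame.index++];
    if (Array.isArray(item)) {
      frames.push({ node: item, index: 0 });   // a new "call"
    } else {
      total += item;
    }
  }
  return total;
}

var total = sumNested([1, [2, [3, 4]], 5]);
```

Because the frames live on the heap, nesting depth is limited by available memory rather than by the size of the call stack, and nothing stops you from parking one set of frames and picking up another.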

The ability to run huge numbers of processes concurrently is obviously important in many kinds of applications. If AJAX becomes another driver of this technology, it'll be interesting to see who'll be first to implement a stackless-Java virtual machine.

Thursday, April 13, 2006

XQuery Engines Compared

While digging around for news/views on JRockit, I happened to stumble onto an XQuery-engine comparative evaluation by (of all people) the Washington Publishing Company, a seller of EDI and HIPAA publications. In case you don't have time to wade through the full study (which is a good read, incidentally), the bottom line is, for maximum performance, robustness, and flexibility, you want the Saxon engine running atop the BEA JRockit JVM.

Wednesday, April 05, 2006

How to Comment AJAX Code

Lately I've been perusing some of Oracle's Javascript code from its ADF Faces. I see that it's extraordinarily well commented.

I'm looking at it in OpenOffice, so just for fun, I tell OOo to do a regex-search on

//.*$

and globally replace that with zilch, thereby wiping out all comment lines.

The result? With comments, Oracle's Core.js file is 140 KB. Without comments: 95 KB. Imagine: almost 50K of comments in a 140K file.
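If you'd rather not fire up OpenOffice for this, the same experiment is a one-liner in JavaScript. This is a sketch of my own: stripLineComments and percentSaved are hypothetical helpers. Be aware that the regex is exactly as blunt as the one above; it will also eat anything after a "//" inside a string literal, such as the tail end of a URL.

```javascript
// Remove everything from "//" to the end of each line.
// The "m" flag makes "$" match at every line end, not just the last.
function stripLineComments(source) {
  return source.replace(/\/\/.*$/gm, "");
}

// Percentage of the file that went away, rounded to a whole number.
function percentSaved(beforeKB, afterKB) {
  return Math.round((beforeKB - afterKB) / beforeKB * 100);
}

var stripped = stripLineComments("var a = 1; // one\n// whole line\nvar b = 2;");
var saved = percentSaved(140, 95);
```

By that arithmetic, comments account for roughly a third of the file.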

I don't think I've ever seen such well-commented code in any language, ever.

Kas

Oracle AJAX Best Practices from 2002

AJAX-the-acronym has been around only since 2005, but (as many observers have pointed out) the underlying techniques have been around much longer.

It turns out Oracle has been publishing its own best-practices advice on "Partial Page Rendering" since 2002.

For the very latest Oracle thoughts on AJAX, I suggest reading the comments in their ADF Javascript code.

Friday, March 31, 2006

"The Java Problem" (Sun Memo)

Java is slow, piggish, and breaks otherwise-stable software with every new release. We all know that. Or at least Sun Microsystems does: Read this incredible internal Sun memo (dating to 2003, but still very much applicable).

"Within Sun, Java is not viewed as a satisfactory language for the construction of commercial applications."

"That our Java implementation is perceived as inappropriate for many uses is supported by internal documents and policies."


A great, great memo, filled with priceless insights.

One of my personal favorites:

"Our experience in filing bugs against Java has been to see them rapidly closed as 'will not fix'. 22% of accepted non-duplicate bugs against base Java are closed in this way as opposed to 7% for C++."

A particularly noteworthy aside concerns one engineer's desperate quest to circumvent resource exhaustion on Solaris Servers by implementing a particular daemon using J2ME code (yes, that's right: J2ME).

You have to read this memo for yourself. If you're like me, you won't know whether to laugh, cry, or go to work for the circus.

Thursday, March 30, 2006

Continuations (Continued)

Several weeks ago, I was reading the doc for Rhino 1.6.2 and came across a mention of support for a new Continuation object. I didn't think much of it. After letting it drop, I returned to it later, looking for examples on the Web of real-world uses of Rhino Continuations. I quickly found a poster child in Apache Cocoon. And another one in Jetty 6.

Then I realized the Web was RIFE with examples of people trying to bring continuations support to various web frameworks. In fact, continuation servers are sprouting all over the place, with funny names like Seaside, Wee, Lakeshore, Continuity, Borges. Written in a variety of languages.


Continuation Servers

Server       Language
Borges       Ruby
Continuity   Perl
Lakeshore    Java
Seaside      Smalltalk
Wee          Ruby


So why the fuss over continuations? The short answer is that it offers an elegant way to keep track of session state in a multi-user client-server app. You end up writing code that looks compact, linear, and obvious, rather than the typical MVC pasta-pile.

But the benefits go far beyond elegant state management. There are payoffs in scalability and efficient use of resources as well.

If you want to grok the basic paradigm shift (and you have time to read only one article), invest a few minutes reading this brilliant minitorial. You just might have a Mega-Aha Moment.

Wednesday, March 29, 2006

goto Returns

AJAX and Ruby are driving a lot of changes in how people do web programming. Witness the resurrection of the hoary concept of continuations (otherwise known as goto in a tuxedo).

The basic notion of a continuation is that it lets you exit from a scope (using neither a return statement nor a "throw" nor a continue nor a break), go do something else, then reenter the original scope as if nothing happened. In fact, if you serialize the continuation, you can come back months later, and continue in a new thread.

It may help to think of a continuation as a snapshot of the current call stack and program counter. The main intuition is that if you can save off enough information about the current execution context, you can reenter that context at your leisure, kind of like hitting Play again after Pausing a video to go make popcorn.
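For a taste of the pause-and-resume idea, here is a sketch using a generator function. A generator is only a restricted cousin of a true continuation (it can be resumed just once from each pause, and it captures a single function's frame rather than the whole call stack), and the function* syntax assumes an engine newer than anything shipping as I write this. The signupFlow example is entirely hypothetical.

```javascript
// Each "yield" exits the function, hands a question to the caller,
// and later resumes right where it left off with the caller's answer.
function* signupFlow() {
  var name = yield "What is your name?";
  var email = yield "Hi " + name + ", what is your email?";
  return name + " <" + email + ">";
}

var session = signupFlow();
var question1 = session.next().value;        // runs up to the first yield
var question2 = session.next("Ada").value;   // resumes with the answer
var result = session.next("ada@example.com");
```

Notice that the flow reads top to bottom like a console program, even though control leaves and reenters the function between every question. That linearity is what the continuation-based web frameworks are after.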

The concept of continuations has been around a long time. In fact, the formalisms around continuations were invented in order to talk meaningfully about the goto statement. But the goto entered lexical leper status after Dijkstra famously savaged it. By 1980, no self-respecting programmer (outside of the Scheme community -- a leper colony in its own right) would speak the word aloud, much less use it in a program.

And yet, goto is a reserved word in Java.

The reason continuations are important to Web 2.0 is that they hold the key to making AJAX scalable. Continuations enable a threadless polling architecture that would be hard to achieve (cleanly) any other way.

I'll have more to say on continuations. In the meantime, if you want to wrap your head around it further, I strongly recommend reading about Cocoon's use of continuations.

Tuesday, March 28, 2006

Google brings <canvas> to IE

Upon joining the Canvas Developer's Group a few minutes ago, I caught wind of the (astonishing) fact that Google has hacked a <canvas> compatibility layer for IE users, essentially finishing work that was begun a few months ago by Emil A. Eklund. The hack, ironically, relies on VML (the IE-only graphics API that went largely ignored). A similar VML-based effort to bring SVG support (sans Adobe) to IE is being pursued by Mark Finkle.

The full Google <canvas> compatibility script is at http://www.abrahamjoffe.com.au/ben/canvascape/canvas.js.

Test it out on Canvascape.

Friday, March 24, 2006

AjaxWrite

AjaxWrite might just be the best reason yet to remove Word and Internet Exploder from your hard drive.

Thursday, March 23, 2006

PayPal Computing

Just as Sun Microsystems announces supercomputing on demand, available to anyone and everyone for a paltry $1 per cpu-hour (PayPal gladly accepted), Amazon comes along and offers near-free data storage via its Simple Storage Service.

I suppose it's just a matter of time before Google steps in and moots both offerings.

But let's pull back for a minute and look at this carefully. Suppose you were to substitute the name "Microsoft" for Sun/Amazon/Google. Would you trust your online computing and storage needs to Microsoft, at any price?

Then why would you put that kind of trust in Sun, Amazon, or Google?

Monday, March 20, 2006

WS-Meltdown

Some revealingly candid dialog has been going on over at Loud Thinking regarding the slow, relentless heat death of WS-*. (Random quote: Getting your head around all the WS-* stuff is like trying to eat an elephant.)

Someone asked David Heinemeier Hansson whether he thought SOAP had legitimate uses or was, to the contrary, simply evil. DHH tactfully replied that SOAP mostly seems unnecessary. "So SOAP feels more like the doorknob to the gates of hell," he concluded. "In itself, a doorknob is hardly evil. But once you turn..."

Write Once, Curse Everywhere

Like many of my colleagues, I have a torrid love-hate relationship with the Java language.

But let us not forget, it's more than just a language. And therein lies the hitch.

Steve Yegge (now at Google) put together a refreshingly non-religious series of posts about various programming languages that, in one incarnation or another, can run atop the JVM. As an exercise, he wrote a simple game program, then ported it to various languages, then wrote about the experience.

Steve's appraisal of Java resonates with me. "Java has lots of wonderful features," he observes, "but Java isn't one of them. Java's appeal as a platform for doing real work rests precisely on its strengths as a platform, not as a language."

Hypothesis: Sun's greatest contribution to the history of computing is the 'VM', not the 'J', in JVM. The 'J' part is, like a 10-year-old Ford Taurus, beyond economical repair. Yet the world continues to use it, due to virtual-machine lock-in. The barriers to exit are just too high.

Friday, March 10, 2006

MOA (Mashup Oriented Architecture)

MOA continues to move forward rapidly.

Some important Web 2.0 architectural memes are starting to come together in the form of things like Feedflare API, Ning Atom API, and shortText.com. (If the latter would expose a REST API, it could become the Clipboard of the Web instead of merely the Notepad of the Web.) What's interesting about the Feedflare API is that it involves late evaluation of embedded XPath, giving mashers a nice combination of declarative and imperative styles to draw upon. They also silently cast your RSS to Atom at parse-time.

As more and more powerful mash APIs come on the scene, and as people normalize on Atom as a datagram format, JSON as an object-passing/serialization format, things like shortText.com for clipboard storage, etc., Web 2.0 reaches an architectural maturity level where the likelihood increases that someone will create the "killer app" that finally tips the tipping point away from IE (for good), towards Firefox, Opera, and the Web 2.0 compatibles . . . thereby locking Microsoft out of a Web 2.0 future (if it isn't locked out already). The coming "killer app" will no doubt leverage one or another IE-incompatible technology such as <canvas> and/or E4X and/or Greasemonkey and/or SVG and/or some other cutting edge technology that's fully available in Firefox/Opera but not in IE.

Mark it as a future I-told-you-so.

Wednesday, March 08, 2006

Adobe's Linux Problem

The astonishing finding (widely reported) that Adobe Photoshop is on the list of top-ten most wanted Linux applications tells me a couple of things.

It pretty clearly says that GIMP needs to suck a lot less.

Secondly, it tells me Adobe Systems doesn't really care about the Linux market (much less the community).

Adobe's Pam Deziel admits that the shrinkwrap giant has known for some time (well before the Novell survey) about the pent-up demand for a Linux version of Photoshop, based on its own research.

So in other words, Adobe has known for some time that it could make money tomorrow by offering a commercial version of Photoshop on Linux. It chooses to leave this money on the table (shareholders be damned). Not a big enough market, says Adobe.

How, then, does Adobe explain the fact that it currently offers a Solaris version of FrameMaker 7.2? FrameMaker has nowhere near as many users as Photoshop, and Solaris is nowhere near as popular as Linux.

Adobe's story is nowhere near making sense.

Sunday, February 26, 2006

Backwalking the Breadcrumbs

If you're the nosy type like me, you've probably been guilty (on more than a few occasions) of navigating a site by popping successive pieces off the tail end of the URL. In other words, if you've found yourself at http://www.somedomain.com/c/b/a/great.txt, you may have been curious about what else is at http://www.somedomain.com/c/b/a, so you hand-excise "great.txt" off the URL in the browser address line and hit Go. After that, you're curious about http://www.somedomain.com/c/b so you hand-remove the /a, etc. Repeat until carpal-tunnel syndrome.

A linkbar button with some Javascript behind it is a lot easier than clicking into the URL, highlighting text, deleting it, hitting Go or Enter, and so on, over and over again. Here's the Javascript that will do this (prefaced by "javascript:" so that it'll run in the address field of the browser):

javascript:ar=location.href.split('/');
if(ar.pop()=='')ar.pop();
u=ar.join('/');
location.href=u;

Remember that for this to work as a bookmarklet, it all has to be on one line. I've broken the code apart here for illustration purposes.

All we do is make an array out of the individual location elements of the current URL by breaking it at forward slashes, then pop the tail element off, re-join() the array with '/' delimiters, and make the browser go to the newly formed URL.

Works like a charm.

I keep this script in a link button (called "Peelback") on Firefox's linkbar. It's handy as heck when you've landed on an interesting web page and you want to further navigate a given URL via the ancestor axis.
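If you'd rather poke at the logic outside the address bar, here is the same idea as a plain function. The name peelBack is my own invention for this sketch; the body follows the bookmarklet's steps but returns the new URL instead of navigating to it.

```javascript
// Hypothetical helper: same logic as the bookmarklet, minus the navigation.
function peelBack(url) {
  // Break the URL apart at forward slashes.
  var parts = url.split('/');
  // A trailing slash leaves an empty last element; if so, pop once more
  // so that we still remove a real path segment.
  if (parts.pop() === '') parts.pop();
  // Glue what is left back together.
  return parts.join('/');
}

var fromFile = peelBack('http://www.somedomain.com/c/b/a/great.txt');
var fromDir = peelBack('http://www.somedomain.com/c/b/a/');
```

Like the bookmarklet, this makes no attempt to stop at the site root, so peeling back from the bare domain will chew into the host name.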

Friday, February 24, 2006

XSS: Digg This

According to a recent Digg post, BestBuy's website (allegedly) contains a cross-site-scripting (XSS) vulnerability.

Which is doubly ironic when you consider that until recently, Digg itself was reportedly an XSS risk.

Note: Every verb on this page should be considered to be prepended by "allegedly" unless otherwise indicated.

Tuesday, February 21, 2006

Unipage

Someone on Slashdot wrongly called Unipage a possible replacement for PDF. It has no relationship to PDF whatsoever. It's also not a file format.

So what is it, then? It's simply a way to serialize a web page and its contents, making the page 100% self-contained (with no reliance on outbound links). Images are stored inside the page in data:url format (see RFC 2397), which of course makes the whole scheme incompatible with IE. Then again, the Adobe SVG plugin for IE does support data:url, so it may be possible for some clever soul to write a script that uses embedded SVG islands to work around this IE limitation in a semi-transparent manner. So to speak.
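To make the data:url idea concrete, here is a rough sketch of how raw bytes get packed into such a URL. This is not Unipage's code. The function name toDataUrl is made up for illustration, and I'm assuming a global btoa() is available (as it is in Firefox and in recent Node versions).

```javascript
// Hypothetical helper: build an RFC 2397 data: URL from an array of byte values.
// Assumes a global btoa() function exists in the host environment.
function toDataUrl(mimeType, bytes) {
  var binary = '';
  // btoa() wants a string with one character per byte.
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  // Form: data:<mediatype>;base64,<data>
  return 'data:' + mimeType + ';base64,' + btoa(binary);
}

var tiny = toDataUrl('text/plain', [72, 105]); // the two bytes of "Hi"
```

The resulting string can go anywhere a URL can, such as the src attribute of an img element, which is what lets a page carry its own images around with it.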

Thursday, February 16, 2006

The Firefox SVG Code-Bloat Crisis

Fellow Novell-er and longtime Mozilla contributor Robert O'Callahan penned a blistering (yet obviously well-intentioned) philippic the other day on code-bloat in the Firefox SVG engine. For a minute there, I didn't think anyone else still cared about code size or memory usage. Happily, that turns out not to be the case. O'Callahan worries about code size at the bit level.

But code size is not the only issue. O'Callahan dives quite deeply into the architectural waters and comes up with refreshingly brash statements like "XPCOM is a disease ... people acquire it by being exposed to infected code." He bristles at the notion that a single SVG <rect> element requires 1.2 Kbytes of pointer storage and carries around empty transformLists. (One wonders what he would say about Java, wherein a mere JPanel has 330 methods.)

The real problem, of course, is the SVG spec, which defies any attempt at elegant implementation.

Bring on sXBL with a <canvas> binding.

Wednesday, February 15, 2006

Javascript Source-Code Viewer for Firefox

JSViews is a Firefox plugin that does a code-dump (into one or more new browser windows) of remote ".js" code referenced by any web page. I don't know yet what the limitations are, but it's definitely a worthwhile plugin. I recommend you install it right now, restart Firefox, go to http://www.cnn.com, right-click on the page, choose View External JS, and watch about a dozen Javascript source-view windows open (some containing ad-tracker code).

Monday, February 06, 2006

Novell Open-Sources FLAIM Database

Novell has decided to donate its high-performance FLAIM Database Engine (and the XML-savvy version, XFLAIM) to open source. FLAIM is not new and is not meant to compete with MySQL. It's a platform-neutral, massively scalable, very-high-performance transactional database, written in C++ as a persistence back-end for Novell eDirectory. (It is also used in GroupWise.) The XFLAIM version uses XPath for a query language.

The fact that there are Novell customers with well over 100 million objects in their FLAIM-backed eDirectory trees should tell you how scalable and robust the FLAIM technology is. (It should be robust, after nearly 20 years of development!)

If you want to get a feel for what XFLAIM is all about (given that I can't do it justice here), go to the documentation.

Is this Novell's first step toward open-sourcing eDirectory? Quite honestly, I don't know. (If I did know, I couldn't write about it here.)

Thursday, February 02, 2006

AJAX Toolkit Framework (ATF) Project

This announcement describes the new proposed AJAX toolkit for Eclipse 3.2, which is actually "tooling to enable tooling" in the sense that it's a pluggable AJAX IDE framework meant to wrap any of Dojo, Zimbra (Apache Kabuki), or OpenRico. And possibly others, later on.

It's obvious that something like this is sorely needed and will be widely used (and abused). I give the ATF proposal ten thumbs up.

While I expect ATF to sail through Creation Review (and thus become an active Eclipse.org project) with nary a hiccup, I can't say the same thing about the newly proposed Kabuki Project for incubating Zimbra at Apache.

The Kabuki Proposal has not been 100% warmly received over at ASF. The debate has included criticism of Zimbra's code (for being too Java-like, poorly namespaced, bloated, not working correctly out-of-the-box) as well as criticism of Zimbra's lack of community presence, inability to find non-salaried committers, etc. One gets the impression that Zimbra expected that Apache.org would jump on any well-established AJAX framework donation. Evidently not.

What's ironic is that the IBM guys took the ATF Proposal to Apache first before putting it in front of the Eclipse.org WTP group. (I can't recall IBM ever taking an Eclipse technology to Apache for incubation. Can you?) Gory details here.




Wednesday, February 01, 2006

AJ4X

My new invented term for AJAX-using-E4X. Quick, somebody trademark it! (Not.)

Javascript, XML, and Element Names

So as E4X finds increasing use in the AJAX world, a potential stumbling block comes into focus around hyphens and other non-word characters in element names.


The issue is this. E4X has a dot-syntax for XML objects that allows expressions like root.x.y to obtain the y element under an x element under the root. But when an element name contains a hyphen, this syntax breaks. Consider:


var fragment =
<content>
<field>
<display-label>Please approve.</display-label>
</field>
</content>;

// Now try to access the display-label value
var value = fragment.field.display-label; // ReferenceError!


The interpreter treats a hyphen as a minus-sign, of course, and since label hasn't been declared, it's undefined and unusable. If a variable named "label" does happen to exist in the current scope (e.g. you used var label in place of var value above), you won't get any error at all, since subtraction on two defined entities is always a legal production in Javascript.
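You can see both behaviors without any E4X at all. The sketch below uses an ordinary object literal as a stand-in for the XML fragment (so it runs in any Javascript engine); the names record and undeclaredName are made up for the demonstration.

```javascript
// Plain-object stand-in for the XML fragment (an assumption for illustration;
// real E4X objects behave differently in other respects).
var record = { field: { display: 10 } };

// Case 1: a variable named "label" happens to be in scope.
// The expression parses as (record.field.display) - label. No error at all.
var label = 4;
var silent = record.field.display-label;

// Case 2: the name after the hyphen has never been declared.
// Evaluating it throws a ReferenceError.
var threw = false;
try {
  record.field.display-undeclaredName;
} catch (e) {
  threw = (e instanceof ReferenceError);
}
```

Case 1 is the nasty one: silent quietly ends up holding the result of a subtraction, and nothing tells you that you never touched a display-label element.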


The Workaround

Fortunately, E4X supplies an alternative syntax we can use.

// Instead of this:
var value = fragment.field.display-label; // ReferenceError!

// Do this:
var value = fragment.field["display-label"]; // "Please approve."

There's one more syntax breakage to deal with, and that involves the descendant-retrieval syntax. E.g., root..y returns a list of all y descendants under root, regardless of what level in the tree each one is at. This syntax obviously breaks down if y is something like data-item.

The workaround is to use the E4X descendants() method.



// Instead of this:
var allLabels = fragment..display-label; // error!

// Do this:
var allLabels = fragment.descendants("display-label"); // list of nodes

Similar breakages and workarounds exist for E4X attribute syntax, the details of which are left as an exercise for the reader.