Thursday, December 11, 2008

LinkedIn Flex group hits membership limit

From Adobe Technical Evangelist Ben Forta comes word that the LinkedIn Flex Developers Group is now full to capacity and can accept no more members.

Apparently, LinkedIn groups can have a maximum of 3000 members, and that's how many the Flex-dev group now has. (I wonder who the genius was that hard-coded that limit?) Forta says he is sitting on 50-some-odd requests to join, and can't approve them. Moreover, he pinged LinkedIn Customer Services to ask if there was a way to raise that limit. He was told he can't approve any more requests. "If the limit gets raised," Forta says, "I'll let you know (and will approve those in the queue)."

Tuesday, December 09, 2008

SlingPostServlet demystified

One of the neatest things about Apache Sling (the JCR-based application framework) is its easy-to-use REST API, which allows you to do CRUD operations against a Java Content Repository using ordinary HTML web forms (or should I say, ordinary HTTP POST and GET), among many other interesting capabilities. The magic happens by way of a class called SlingPostServlet. Understanding that class is key, if you want to leverage the power of Sling without writing actual Java code.

Turns out there's an exceptionally thorough (and readable) discussion of the many capabilities of the SlingPostServlet in the Apache Sling incubation area. You can think of it as the fully exploded version of Lars Trieloff's Cheat Sheet for Sling (an excellent resource). It's the next best thing to reading the source code.
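To give a flavor of what "CRUD via ordinary HTTP POST" means in practice, here's a small sketch of the kind of form-encoded body a plain HTML form would submit to a Sling URL. The path, property names, and port are invented examples on my part, not taken from the Sling docs; the :redirect parameter is one of the SlingPostServlet's processing directives (colon-prefixed parameters are instructions, not node properties).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SlingPostDemo {
    // Build the form-encoded body an ordinary HTML form would POST
    // to a node path such as http://localhost:8080/content/hello.
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("jcr:title", "Hello Sling");          // becomes a node property
        params.put(":redirect", "/content/hello.html");  // servlet directive
        System.out.println(formBody(params));
    }
}
```

POSTing that body to the node's URL creates or updates the node; no Java code on your side is required, which is exactly the appeal.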

Monday, December 08, 2008

MS Office apps as services

According to Information Week, Tibco and OpenSpan have "teamed up to make parts of Microsoft's Office applications available as services for inclusion in an enterprise service-oriented architecture."

OpenSpan has (in the words of a rather breathless reporter) "demonstrated that it's possible to generate mashups of Microsoft Office applications without changing the underlying application code."

On a superficial level, this is the kind of thing that people do routinely with OpenOffice running in server mode. (Various content management systems use OO to do document transformations on the server. The OO developer documentation shows how to set this up.)

I gather the OpenSpan stuff has tooling for making it easy to create Office-service mashups. It's a good idea and I wish such tooling existed for OpenOffice. It'll be interesting to see if Tibco and OpenSpan score any OEM deals with CMS vendors.

Friday, December 05, 2008

How OSGi changed one person's life

Peter Kriens has written a really nice article for ACM Queue called How OSGi Changed My Life. Go here to read it online. It's a high-level overview for people who are still trying to grok the whole OSGi phenomenon.

OSGi is a game-changing technology, IMHO, because it brings familiar SOA precepts to ordinary POJO programming. (How's that for acronym abuse?) POJOs end up having fewer unwanted intimacies, and if you run them inside Spring inside OSGi, the POJOs don't have to know so much about the runtime framework, either. Compositionality is greatly facilitated in OSGi; the level of abstraction is high; the benefits are numerous and far-reaching. I see OSGi as revitalizing Java programming for enterprise.

Good tooling for OSGi is still scarce. (Doing "Hello World!" is much harder than it should be.) I suspect that will change very soon, though. Meanwhile, OSGi is quite pervasive already (it's in quite a few products, though seldom advertised), and I look for 2009 to be the year when OSGi finally goes double-platinum.

Tuesday, December 02, 2008

Exception-throwing antipatterns

Tim McCune has written an interesting article called Exception-Handling Antipatterns (the site that published it is a fine place to find articles of this kind, BTW). The comments at the end of the article are every bit as stimulating as the article itself.

McCune lists a number of patterns that (I find) are very widely used (nearly universal, in fact) in Java programming, such as log-and-rethrow, catch-and-ignore, and catch-and-return-null; all considered evil by McCune. My comment is: If those are antipatterns, the mere fact that such idioms are so ubiquitous in real-world Java code says more about the language than it does about programmers.
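For anyone who hasn't read the article, here's my own sketch of two of those idioms next to the alternative that's usually recommended (wrap once, add context, preserve the cause). The class and method names are invented for illustration; this is not McCune's code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExceptionIdioms {
    private static final Logger LOG = Logger.getLogger(ExceptionIdioms.class.getName());

    // Antipattern: catch-and-return-null. The caller must now null-check,
    // and the original cause of failure is thrown away.
    static Integer parseOrNull(String s) {
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Antipattern: log-and-rethrow. Every layer that does this logs the
    // same stack trace again, cluttering the logs.
    static int parseLogAndRethrow(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            LOG.log(Level.SEVERE, "bad input: " + s, e);
            throw e; // the caller will very likely log it a second time
        }
    }

    // Usually preferred: wrap once with context, preserving the cause.
    static int parse(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + s, e);
        }
    }
}
```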

I've always had a love-hate relationship with the exception mechanism. On the whole, I think it is overused and overrated, at least in the Java world, where people seem to get a little nutty about inventing (and subclassing) custom exceptions and ways to handle them, when they should probably spend that energy writing better code to begin with.

Monday, December 01, 2008

Get paid for being job-interviewed

At last, an online service that deters head hunters from pestering me.

The unusual promise made by NotchUp is that potential employers who want to contact you directly (avoiding the expensive services of a professional placement agency) will actually pay you to agree to an interview. All you have to do is sign up with NotchUp, and wait for the phone to ring. (And wait. And wait.)

How much will you get paid? The NotchUp folks have put a fee calculator on their site. It shows that an IT professional with 10 years of experience can expect to receive $380 per interview.

NotchUp membership is free to interviewees, but you have to go through an application process and be accepted. Which already sounds fishy to me.

Wednesday, November 26, 2008

What Sun should really do

I've worked for companies that are in Sun's situation (most recently Novell), and I have a few observations based on my years of watching hugely talented groups of people produce astoundingly good Java technology, only to see the Greater Organization fail to find a way to monetize it.

Like Sun, Novell is a venerable tech company with an interesting past. It finds itself today in a situation (like Sun) where profitability is consistently miserable, but the balance sheet is good. The parallels between Sun and Novell are far-reaching. Both were started more than two decades ago (Novell in 1979, Sun in 1982) as hardware companies. Both soon found themselves in the operating-system business. Novell owned DR-DOS, which led to Novell DOS, which in turn became the boot loader for NetWare. Along the way, Novell acquired UNIX from AT&T.

NetWare was an extraordinarily successful OS that Novell foolishly stopped supporting shortly after acquiring SUSE Linux in late 2003. I say foolishly because NetWare was a cash cow that required very little code maintenance to keep going, whereas SUSE Linux sucked Novell's coffers dry as the entire company pivoted in the direction of things that are extremely hard to make money on (viz., open-source software, something Novell had little experience with).

Distraction destroys profitability (someone please make that a bumper sticker...), and this is something that has cost Novell and Sun a great deal of money over the years. Technology is exciting, and promising technologies have a way of siphoning attention away from more prosaic sorts of things, like finding and solving customer pain (i.e., making money).

Technologists have a way of convincing themselves that some technologies are more worthy than others. And quite often, what happens is that people at the top who should know better become seduced into allocating large amounts of lucre to the promotion of money-losers (on the theory that eventually they are bound to become money-makers) while cash-cow products get money taken away (on the theory that "this product is a winner, it's throwing off cash like crazy; we don't need to promote it").

I've started and run two successful businesses (including one that I launched in 1979, which is still in operation under a different owner, today) and I've seen friends and relatives start and run businesses, so I know first-hand what it takes to keep a business right-side-up; and I know some sure-fire techniques for making a successful business cartwheel out of control into a ditch, trailing black smoke.

One of the main things I've learned is that you never promote a loser; you always put your money behind proven winners. Never take marketing or development funding away from a winner to promote something that is either a proven loser or not yet proven to be a winner. (That's too long for a bumper sticker. It should be on billboards.)

Back to the software biz for a minute. Novell and Sun are both "operating system companies" to some degree. This already carries with it the stench of death. Being an OS company was a great thing back in the Carter years; it was lucrative in those days. It's not a great thing today. It siphons off money that is better deployed elsewhere. Microsoft (with its "Live" series of SaaSified product offerings) has recently gotten the message that the Web is the new OS, and the desktop is irrelevant as a metaphor. This is a huge paradigm shift for Microsoft. But they finally get it: They get that the future is in things like collaboration (human connectivity) and dynamic assembly of reusable content. They are starting to understand that infrastructure is not something customers want to have to know about; that everything that can be virtualized should be virtualized. Customers instinctively know this, even if they can't articulate it.

So then, what's a Sun to do?

First, stop worrying so much about the future and figure out what's making money now, so you can try to massively scale whatever that happens to be. Remember: Invest in winners, not losers. Find out what's working. Crank the volume full max on it.

The next thing to do is obvious: Kill your losers. Utterly walk away from them, now, today, this minute. Redeploy the resources to your winners (or else sell them off).

The very next thing to do is apply the foregoing principles to your people. Find the winners (the true contributors, the people who are making successful things successful) and reward them. Not just with money, but with whatever else they want: promotion, recognition, travel, alone time, or whatever. People are different. Most techies are not motivated by money.

Likewise, identify and weed out the mediocre, the tired, the overly comfortable, the complainers and morale-killers; find the toxic individuals (they're everywhere) and remove them somehow. Just getting rid of the toxic people will cause those who are left to be more productive.

Next, pivot the organization in the direction of innovation. This is exceptionally difficult to do. I was involved with the "Fostering Innovation" Community of Practice at Novell during a time when Novell was desperately trying to become more of an innovation-centric culture. One of the kneejerk things Novell did was increase the bonus paid to employees who contributed patentable inventions. Novell eventually was paying $4500 plus hundreds of shares of restricted stock for each invention accepted by the Inventions Committee (of which I was a member). What happened was that we got goofy submissions from all over the company, while certain senior engineers who knew how to game the system succeeded in making a nice side-living on patent bonuses.

Innovation is fostered when you simply set innovative people free. Innovators will innovate for their own reasons; money has nothing to do with it. All you need to do is clear the path for these individuals. At Novell, as at Sun, there's a special honor reserved for senior people who have a track record of accomplishment. We called them Distinguished Engineers. These people were like tenured professors. They came to work in pajamas (not really) and did whatever they wanted, basically, with no fear of ever being fired.

That's a stupid system. Tenured professorships lead to sloth. Not every Distinguished Engineer is a burned-out has-been on the dole, but some are, and it sets a bad example.

Younger engineers (and others) who are proving their potential as innovators need to be recognized while they're at their peak. (The 45-year-olds with a track record of innovation in the 1990s need to be considered for early retirement. Recognizing someone a decade too late serves no purpose.) What I advocate is a system of "innovation sabbaticals," awarded to budding innovators (of any age) who are doing Great Things and are likely to do more if set free.

Finally, going forward, hire good people. This is any company's best and only salvation. It's the foundation for all success. When you have difficult problems to solve (as any troubled business does), hire very, very smart people who have no prior experience with the problems in question. That's how you get fresh answers that bear tasty fruit.

This blog is already too long, so I'll stop. In a nutshell, what Sun needs to do is focus light on itself and conduct a pre-mortem. The first order of business is to find out which pieces of the business are profitable, and scale those. Then find out which pieces of the business are sucking cash, and amputate those. If that's two-thirds of Sun, so be it. It means Sun needs to be a third as big as it is now. It'll shrink down to that size eventually, so why spend time and money getting there the slow way? Go there now. Shareholders will applaud.

And by the way, be clear on one thing: This is all about earnings-per-share. There is no other goal, no other agenda. Sun is a business. It's not a charity organization or a full-employment program for has-beens. Earnings per share comes first. Everything else follows.

Find and reward (not necessarily with cash!) your best people. Get rid of the losers who are bringing morale and productivity down for everyone else.

Set innovators free. They will innovate for their own reasons. Just let them.

And get out of the operating system business. The Web is the OS, for cryin'-out-loud. Even Microsoft has figured that one out.

Tuesday, November 25, 2008

Google wants to hire 665 people

The WebGuild story about Google laying off ten thousand workers is (sadly) mostly made-up nonsense.

Probably the most outlandish statement in the article is "Since August, hundreds of employees have been laid off and there are reports that about 500 of them were recruiters."

Five hundred recruiters?? ROTFLMAO.

The only scintilla of truth in the entire article, as far as I can determine, is the bit about Google having approximately ten thousand contract workers, which Sergey Brin confirmed in an October 16 story in The Mercury News. The notion that Google will be letting all of them go is nonsense, however. Brin (in the same Mercury News story) did say Google "has a plan to significantly reduce that number through vendor management, converting some contractors to regular employees, and other approaches." That's all he said: "significantly reduce."

It's quite easy to verify that Google is, in fact, still hiring at a brisk pace. Go here to browse the 665 open positions.

Monday, November 24, 2008

Flex meets Inversion-of-Control

I didn't realize until just now that there is a Spring-like inversion-of-control framework for Flex, called Prana (available under a BSD license).

Seeing something like Prana makes me wonder how many other staples of the Java world will be emulated by Flex folk.

Very interesting indeed.

Saturday, November 22, 2008

Death of an Eclipse project

A November 12, 2008 slide deck explains why the Eclipse Application Lifecycle Framework (ALF) project will be terminated: basically, lack of interest. Except there was, in fact, interest from a corporation: enterprise-mashup player Serena Software, which contributed significantly to the code base.

The whole story is a little weird. The ALF project morphed into an SOA framework of sorts shortly after its inception in 2005. Serena (an application lifecycle management software firm, originally, but also known for the now-moribund Serena Collage Content Management System) got involved early on. Eventually, ALF was adopted as the underlying SOA and authentication framework by Serena Business Mashups in Dec 2007.

And now the "project leadership" has decided that the Eclipse ALF project should be shut down, with the code being donated to the Higgins project. The Project Leader for ALF is (was) Brian Carroll, a Serena Fellow.

Higgins, it turns out, is actually not ALF-related except in the most tangential sense. I was working in the Identity Services division at Novell in 2006 when Higgins was created. I knew about it through Duane Buss and Daniel Sanders (both of whom are still principals on the project). Daniel and I worked together on the Novell Inventions Committee.

Higgins is (according to the project FAQ) "an open source Internet identity framework designed to integrate identity, profile, and social relationship information across multiple sites, applications, and devices. Higgins is not a protocol, it is software infrastructure to support a consistent user experience that works with all popular digital identity protocols, including WS-Trust, OpenID, SAML, XDI, LDAP, and so on."

It's really largely about identity cards or "information cards" (InfoCards, I-Cards).

In case you're wondering about the name: Higgins is the name of a long-tailed Tasmanian jumping mouse.

So, ah . . . ALF isn't the only SOA-related Eclipse project being taken down now. For info on the others, see this story in the Register.

Thursday, November 20, 2008

How to set your head on fire

The folks at Jabra (the Danish headset manufacturer) are having a product recall. It seems Jabra's lithium batteries can overheat and catch fire. According to the company's announcement:
Dear Jabra GN9120 Customer
In cooperation with the Danish Safety Technology Authority (Sikkerhedsstyrelsen) and the U.S. Consumer Product Safety Commission, and other regulatory agencies GN Netcom is voluntarily recalling Lithium-ion batteries from ATL (ATL P/N 603028) used in GN9120 wireless headsets and sold from January 2005 through September 2008. These lithium-ion polymer batteries can overheat due to an internal short circuit in the batteries, which can pose a fire hazard. The battery has only been used in the GN9120 wireless headset. If you are using any other headset solution from GN Netcom you are not affected by this statement.
Not to worry, though. The "extra-crispy" look is in.

Why are CSS editors so fugly?

The other day, I happened upon a long list of CSS editors, arranged chronologically (newest tools first). I haven't tried any of them except Stylizer, which (beware) lays down the .NET 2.0 framework as part of its install process. (Allow 10 minutes.) Stylizer has a really beautiful UI but is far from being the point-and-click WYSIWYG stylesheet designer I've been looking for. (It's really just an editor; you do a lot of typing.) Although I must say, Stylizer beats the living crap out of most other free (or crippled-down eval-version) CSS editors I've seen, which all tend to look like this unfortunate travesty.

Does anybody else see the irony in the fact that most CSS editors are unbelievably fugly? I mean, if anything cries out for a decent visual design tool with eye-pleasing widgets, it would have to be a CSS editor. But most CSS tools (at the freeware level, anyway) look like they were designed by Eclipse programmers on bad acid.

I guess this is the ultimate example of programmers not knowing how to design user interfaces, and design experts not knowing how to program. Maybe it's no wonder 99% of CSS editors look like Notepad in a 3-piece suit.

Wednesday, November 19, 2008

Using Yahoo Pipes in anger

I finally had an opportunity to use Yahoo Pipes to do something useful.

The quest: Create a super-feed that aggregates a bunch of different Google developer blogs (12 in all), including AJAX Search API, Gears, Gadgets, OpenSocial, Open Source, Mashup Editor, Web Toolkit, App Engine, Google Code, iGoogle, Desktop, and Data API blogs. And: Show the most recent 8 entries for each of the 12 blogs.

Also: Make a searchable version of same, so that you can do a search for (let's say) "Atom" across all 96 latest blog entries in the 12 categories.

I was inspired to create this Pipes-app (plumbingware?) when I saw the recent press release concerning ArnoldIT's Google monitoring service. The ArnoldIT aggregator is dubbed "Overflight" (for reasons known only to the CIA, perhaps).

I was disappointed to find that Overflight is not available as an RSS feed. It also is not searchable. Hence, I went ahead and mashed together my own version of Overflight using Pipes.

As it turns out, I was able to create the Pipe app in a matter of 90 minutes or so (around half an hour longer than I'd budgeted). I didn't have time to aggregate all 74 Google blogs, so I focused just on twelve developer blogs. The resulting app is at Google Developer Blogs Super-Feed, which you can subscribe to here. The keyword-search version is here. (It supports single words or exact phrases.)
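Conceptually, what the pipe does (aggregate each feed's newest entries, then filter on a keyword for the searchable version) boils down to something like the following. This is a rough sketch in Java rather than Pipes modules; the Entry class and field names are my own invention.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SuperFeed {
    static class Entry {
        final String title;
        final String date; // sortable ISO date string, e.g. "2008-11-19"
        Entry(String title, String date) { this.title = title; this.date = date; }
    }

    // For each source feed, keep only the N newest entries, then pour
    // everything into one output feed (the "super-feed").
    static List<Entry> aggregate(List<List<Entry>> feeds, int perFeed) {
        List<Entry> out = new ArrayList<Entry>();
        for (List<Entry> feed : feeds) {
            List<Entry> copy = new ArrayList<Entry>(feed);
            Collections.sort(copy, new Comparator<Entry>() {
                public int compare(Entry a, Entry b) {
                    return b.date.compareTo(a.date); // newest first
                }
            });
            out.addAll(copy.subList(0, Math.min(perFeed, copy.size())));
        }
        return out;
    }

    // The keyword-search version is just a filter over the aggregate.
    static List<Entry> search(List<Entry> entries, String phrase) {
        List<Entry> hits = new ArrayList<Entry>();
        for (Entry e : entries)
            if (e.title.toLowerCase().contains(phrase.toLowerCase()))
                hits.add(e);
        return hits;
    }
}
```

With 12 feeds and perFeed set to 8, you get (up to) the 96 latest entries described above.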

I confess I was skeptical, at first, as to whether the performance of a Pipes app that draws together 96 content items from 12 feeds could possibly be acceptable. It turns out to be amazingly fast. Even the queryable version is fast. I have yet to run a keyword or key-phrase search that takes more than 4 seconds to bring back results.

If you haven't tried Pipes yet, you should definitely spend a few minutes exploring it. It's a bit klutzy and constraining (in my experience), and it's sure to frustrate many a grizzled Java or C++ developer. But as a visual Web-app designer, it's an interesting approach. Here's hoping Yahoo takes it a bit further.

Tuesday, November 18, 2008

Pixel Bender plug-in for Photoshop is out now

According to John Nack, the Pixel Bender Gallery plug-in for Photoshop CS4 is now available for download from Adobe Labs. Nack explains that the plug-in "runs filters really, really fast on your graphics card," and notes that the filters people write for Flash will also work in Photoshop (or so he says). A nice added bonus is that the same filters will work in After Effects CS4.

Can't wait to try it.

Moore's Law v2.0

It's no secret that conventional chip designs are about to hit the wall with respect to scaling. Moore's Law 1.0 is in danger of being repealed.

Not to worry, though. Years of research into so-called 3D chip architecture is finally beginning to bear fruit, and it looks like cubes will start replacing chips in at least some devices soon. (HP is making steady progress in this area, along with IBM and others.) Moore v2.0 is well on the way to reality.

If you want to learn more about this technology, check out the latest issue of the IBM Journal of Research and Development, which is devoted to 3D integrated circuit technology. A particularly good overview article is here.

Monday, November 17, 2008

Java HotSpot VM options explained

Have you ever wanted an exhaustive list of all those inscrutable command-line options you can use when you want to force the JVM to do something a certain way while you're either troubleshooting an extremely bizarre bug or trying to figure out why performance sucks even more than usual?

Try going to:

I don't know if this is an exhaustive list, but it certainly looks like it is. Just reading the various descriptions is quite educational. If you're interested in tuning the JVM for max performance, this is a must-read.
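While you're experimenting with those flags, it helps to confirm which options your JVM actually received; the standard management API will echo back the command-line arguments it was launched with:

```java
import java.lang.management.ManagementFactory;

public class JvmArgs {
    public static void main(String[] args) {
        // Prints the -X/-XX style options this JVM was started with,
        // one per line (empty if none were passed).
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println(arg);
        }
    }
}
```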

Saturday, November 15, 2008

The fugliest code I've ever written

The other day, I started to wonder: What's the single fugliest piece of code I've ever written?

It's really hard to answer that, because I've been writing code of one flavor or another for roughly twenty years, and in that time I've committed every code atrocity known to man or beast. As I like to tell people, I'm an expert on spotting bad code, because I've written so much of it myself.

What I've finally decided is that the following line of C code is probably the single most ghastly line of code I've ever perpetrated:
(*((*(srcPixMap))->pmTable))->ctSeed = (*((*((*GetGDevice())->gdPMap))->pmTable))->ctSeed;
Explanation: Long ago, I used to do graphics programming on the Mac. I don't know how the Mac does things today, but ten years ago the Color Manager rebuilt a table whenever Color QuickDraw, the Color Picker Manager, or the Palette Manager requested colors from a graphics device whose color lookup table had changed. To determine whether the CLUT had in fact changed, the Color Manager compared the ctSeed field of the current GDevice color table against the ctSeed field of that graphics device's inverse table. If the ctSeed values didn't match, the Color Manager invalidated the inverse table and rebuilt it. For fast redraws, you want to avoid that. You could avoid it by forcing the two ctSeed field values to be equal, which is what the line above does: three levels of handle indirection on each side of the assignment, just to copy one seed value.

This is one of many fast-blit tips I explained in an article I wrote years ago for MacTech magazine. Fortunately, I write mostly Java and JavaScript now, and I no longer have to deal with pointer indirection, and today my aspirin drawer is only half-full -- or is it half-empty?

Friday, November 14, 2008

Google Chatterbot

Google's Douwe Osinga has come up with a freaky little online app that turns the almighty Google search engine into an oracle (not to be confused with Oracle). All you do is enter a word or two into the text box, and wait. The app will do a Google search on your words, find the next "suggested" word and print it, then it will remove the first word of your search string, add the found word, and repeat.
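The loop itself is easy to sketch. In the sketch below the Google query is stubbed out behind a Suggester interface (a stand-in of my own; the real app queries Google), since the interesting part is the sliding word window:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class Chatterbot {
    // Stand-in for "search Google for this phrase and return the word
    // that most often follows it" (null when nothing comes back).
    interface Suggester {
        String next(String phrase);
    }

    // Search the current phrase, append the suggested next word,
    // drop the first word of the window, repeat.
    static String babble(String seed, Suggester suggest, int maxWords) {
        Deque<String> window =
            new ArrayDeque<String>(Arrays.asList(seed.toLowerCase().split("\\s+")));
        StringBuilder out = new StringBuilder(join(window));
        for (int i = 0; i < maxWords; i++) {
            String next = suggest.next(join(window));
            if (next == null) break; // e.g. "top secret" brings back nothing
            out.append(' ').append(next);
            window.removeFirst();
            window.addLast(next);
        }
        return out.toString();
    }

    private static String join(Iterable<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(w);
        }
        return sb.toString();
    }
}
```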

Quite often, the app generates a disarmingly logical response. For example, this morning when I entered "JavaFX will," the reply came back: "javafx will be open sourced monday."

Occasionally you learn something you didn't know. "Richard Stallman" brings back a response of "richard stallman founder of the free african american press."

Interestingly, "top secret" brings back nothing.

Sometimes the app produces garbage. Or is it poetry? When I entered "Sun will," I expected something like "lay off thousands." Instead I got: "sun will shine lyrics by stone roses that are not newly completed and that are in earth orbit but more likely at the top of the mark is to show that the flds not only persevered they fought back they didnt".

e e cummings lives!

Thursday, November 13, 2008

Free downloadable tech books

As you'd expect, the list includes a lot of stale and/or not-very-valuable titles, but there's also a lot of genuinely worthwhile stuff there. Judging from their "most popular" list, the site is a big hit with C++ programmers. But there are also 23 free Java books, and lots of timeless reference material for programmers of all stripes.

Wednesday, November 12, 2008

For lack of a nail (Java version)

// For the lack of a nail,
throw new HorseshoeNailNotFoundException("no nails!");

// For the lack of a horseshoe,
// ...

// For the lack of a horse,
dispatch(
    new BroadcastMessage(StableFactory.getNullHorseInstance()));

// For the lack of a rider,
dispatch(
    new MessageMedium(MessageType.VERBAL),
    new MessageTransport(MessageTransportType.MOUNTED_RIDER),
    new MessageSessionDestination(BattleManager.getRoutingInfo(
        BattleLocation.FRONT)));

// For the lack of a message,
// ...

// For the lack of a battle,
try {
    synchronized (BattleInformationRouterLock.getLockInstance()) {
        // ...
    }
} catch (InterruptedException ix) {
    if (BattleSessionManager.getBattleStatus(
            new TweedleBeetlePuddlePaddleBattle().populate(
                RegionManager.getArmpitProvince(Armpit.LEFTMOST))) ==
        BattleStatus.LOST) {
        if (LOGGER.isLoggable(Level.TOTALLY_SCREWED)) {
            LOGGER.log(Level.TOTALLY_SCREWED, "the kingdom was lost");
        }
    }
}

// For the lack of a war,
return new Kingdom();

Adapted from Steve Yegge's Blog Rant of March 30, 2006. Apologies to Ben Franklin (who in turn adapted the original proverb from George Herbert's Jacula Prudentum).

Tuesday, November 11, 2008

Finalization is evil

After listening to the excellent presentation by Hans Boehm on "Finalization, Threads, and the Java Technology Based Memory Model," I have come to the conclusion that finalization is one of Java's worst features, if not the worst.

Be clear, I am not talking about the final keyword (which is actually a great feature of the language). Rather, I am talking about the notion of finalizers, or special "cleanup" methods that the JVM will call before an object is finally reclaimed by the garbage collector. The idea is that if you have an object that's holding onto some system resource (such as a file descriptor), you can free that resource in the finalize() method right before your no-longer-used object gets garbage collected.

The only problem is: not only is there no guarantee as to how quickly, or in what order, your finalizers will be called; there's also no guarantee that they will be called at all.

Sun's Tony Printezis gives a good explanation of finalization in an article on the Sun Developer Network site. It's a brilliant article, but I found myself quite nauseated by the time I got to the end of it. Finalization is just so wrong. So wrong.

"The JVM does not guarantee the order in which it will call the finalizers of the objects in the finalization queue," Printezis points out. "And finalizers from all classes -- application, libraries, and so on -- are treated equally. So an object that is holding on to a lot of memory or a scarce native resource can get stuck in the finalization queue behind objects whose finalizers are making slow progress."

Oh great, that's just what I need. Finalizers blocking on other finalizers while my heap fragments.

It turns out that at instantiation time, an object that contains a finalizer is marked as such and treated differently by the JVM. The extra overhead incurs a performance hit. If your application creates many short-lived objects with finalizers, the hit can be quite substantial. Hans Boehm (see link further above) did some testing and found a 7X slowdown of a test app when objects had finalizers, compared to no finalizers. (With a really fast JVM, namely JRockit, the slowdown was eleven-fold.)

The funny thing is, in all the articles and book chapters I've read about finalization, I have never, not even once, seen a good real-world example of a situation requiring the use of a finalizer. Supposedly, you use a finalizer when you're holding onto a system resource and need to free it before your object goes out of scope. But in reality, it's almost always the case that system resources that are considered scarce or precious have a dispose() or close() or other, similar method, for the explicit purpose of freeing the resource. If you use the resource's normal release mechanism, you don't need a finalizer. In fact a finalizer only lets you hold onto a resource longer than you should.

Someone will argue that you don't always know when or if an object is going out of scope; therefore you should put a call to the release method in a finalizer and be assured that the resource will eventually be released. Okay, <sigh/> that's fine and dandy as long as you can count on your finalize() method being called (which you can't) and as long as your machine doesn't starve for file descriptors, sockets, or whatever the precious resource happens to be, before the finalizer is finally called. Remember, the JVM makes no guarantees about any of this. Finalization is non-deterministic.
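To make the contrast concrete, here's a sketch of the explicit-release pattern that makes the finalizer redundant. NativeBuffer is a made-up stand-in for any class holding a scarce resource; the point is that try/finally releases the resource deterministically, while the finalizer (kept here only as a backstop) may run late or never.

```java
public class DisposeDemo {
    static class NativeBuffer {
        private boolean open = true;

        boolean isOpen() { return open; }

        // Deterministic release: the caller decides exactly when.
        void close() { open = false; }

        // Non-deterministic backstop: the JVM may call this late, or never.
        @Override
        protected void finalize() {
            close();
        }
    }

    public static void main(String[] args) {
        NativeBuffer buf = new NativeBuffer();
        try {
            // ... use the buffer ...
        } finally {
            buf.close(); // released right here, guaranteed
        }
        System.out.println("open after close? " + buf.isOpen());
    }
}
```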

I have to say, though, that the contorted, non-real-world examples that are always trotted out to justify the existence of the finalizer mechanism in Java have always struck me as more than a little malodorous. They all have that unmistakable antipattern smell that gets in your clothing and makes you feel like taking a hot shower when you get home.

Maybe we should just confront the possibility (the likelihood) that finalization is evil. After all, even the people who write long articles about it end up urging you not to use it.

That's good enough for me.

Saturday, November 08, 2008

Google to downsize?

The NYC Google office takes up a city block.

Word comes by way of the Silicon Valley Insider that Google will soon be subletting 50,000 square feet of space at its New York City Googleplex. I've toured the place, and let me tell you, it's Big: the 111 8th Ave office occupies an entire city block of space, between 8th and 9th Avenues and 15th and 16th Streets. It's around 300K square feet altogether.

What kind of luck Google will have subletting this space in the current economy, I don't know. It's very inconveniently located, in the meat-packing district, a couple miles south of Grand Central Terminal and just far enough from the Village to be annoying.

From what I saw on my walk-through, I can tell you that Google tends to be rather wasteful of space, by industry standards. The cafeteria is the size of Macy's, there's one conference room for every three employees (well, almost), and very few programmers can actually reach out and touch someone, despite the lack of walls. I'd say there has to be an average of at least 500 sq. ft. per employee by the time you factor in all the conference rooms, hallways, etc.

So there are really only two possibilities. Either Google will try to use its space more efficiently (and not lay anyone off) at its NYC office after subletting one-sixth of its available space, or it will lay off a sixth of its Manhattan workforce (around 120 people). Or some combination of both.

My guess is both.

Friday, November 07, 2008

Slow page loads == job cuts?

Interesting factoid: Every 100ms of latency costs Amazon 1% in profit-per-visit.

The same source claims that Google stickiness drops 20% if page load time increases by 500ms.

Which leads me to wonder how much revenue-loss LinkedIn has suffered over the past five years because of its agonizingly slow page loads, and how many of the 10% of its employees who were just laid off might still have their jobs if the pitiful dunderheads who allowed LinkedIn's site to be so pitifully slow hadn't been such pitiful dunderheads.

Thursday, November 06, 2008

Hardware-assisted garbage collection

I find myself spending more and more time thinking about garbage collection, not just as a career move but as a fundamental problem in computer science. I don't pretend to have any expertise in garbage collection, mind you. But I find it an interesting problem space. Particularly when you start to talk about things like hardware-assisted GC.

Yes, there is such a thing as GC-aware chip architecture, and the guys who know a lot about this are the folks at Azul Systems. A good starting point, if you want to read up on this, is Pauseless Garbage Collection: Improving Application Scalability and Predictability. Good late-night reading for propeller-heads who need something to do while waiting for the propeller to wind down.

Wednesday, November 05, 2008

Paging memory leaks to disk

At last month's OOPSLA 2008, there was an interesting presentation by Michael D. Bond on a technology called Melt, which aims to prevent out-of-memory errors in Java programs that harbor memory leaks (which is to say, 99 percent of large Java programs). The Intel-funded research paper, Tolerating Memory Leaks (by Bond and his thesis advisor, Kathryn S. McKinley, U. Texas at Austin), is well worth reading.

The key intuition is that reachability is an over-approximation of liveness, and thus if you can identify objects that are (by dint of infrequent use) putative orphans, you can move those orphan objects to disk and stop trying to garbage-collect them, thereby freeing up heap space and relieving the collector of unnecessary work. If the running program later tries to access the orphaned object, you bring it back to life. All of this is done at a very low level so that neither the garbage collector nor the running program knows that anything special is going on.

Melt's staleness-tracking logic and read barriers don't actually become activated until the running application approaches memory exhaustion, defined (arbitrarily) as 80-percent heap fullness. Rather than letting the program get really close to memory exhaustion (which causes garbage collection to become so frequent that the program seems to grind to a halt), stale objects are moved to disk so that the running app doesn't slow down.

Purists will complain that sweeping memory leaks under the carpet like this is no substitute for actually fixing the leaks. In very large programs, however, it can be impractical to find and fix all memory leaks. (I question whether it's even provably possible to do so.) And even if you could find and fix all potential leaks in your program, what about the JRE? (Does it never leak?) What about external libraries? Are you going to go on a quest to fix other people's leaks? How will you know when you've found them all?

I believe in fixing memory leaks. But I'm also a pragmatist, and I think if your app is mission-critical, it can't hurt to have a safety net under it; and Melt is that safety net.

Good work, Michael.

Tuesday, November 04, 2008

Garbage-collection bug causes car crash

A few days ago I speculated that you could lose an expensive piece of hardware (such as a $300 million spacecraft) if a non-deterministic garbage-collection event were to happen at the wrong time.

It turns out there has indeed been a GC-related calamity: one in which $2 million was on the line. (To be fair, this particular calamity wasn't actually caused by garbage collection; it was caused by programmer insanity. But it makes for an interesting story nevertheless. Read on.)

The event in question involved a driverless vehicle (shown above) powered by 10K lines of C# code.

There's an in-depth post-mortem discussion online of how a GC-related bug caused a driverless DARPA Grand Challenge vehicle to crash in the middle of a contest, eliminating the Princeton team from competition and dashing their hopes of winning a $2 million cash prize.

The vehicle had been behaving erratically on trial runs. A member of the team recalls: "Sitting in a McDonald's the night before the competition, we still didn't know why the computer kept dying a slow death. Because we didn't know why this problem kept appearing at 40 minutes, we decided to set a timer. After 40 minutes, we would stop the car and reboot the computer to restore the performance."

The team member described the computer-vision logic: "As the car moves, we call an update function on each of the obstacles that we know about, to update their position in relation to the car. Obviously, once we pass an obstacle, we don't need to keep it in memory, so everything 10 feet behind the car got deleted."

"On race day, we set the timer and off she went for a brilliant 9.8 mile drive. Unfortunately, our system was seeing and cataloging every bit of tumbleweed and scrub that it could find along the side of the road. Seeing far more obstacles than we'd ever seen in our controlled tests, the list blew up faster than expected and the computers died only 28 minutes in, ending our run."

The vehicle ran off the road and crashed.

The problem? Heap exhaustion. Objects that should have been garbage-collected weren't. Even though delete was being called on all "rear-view mirror" objects, those objects were still registered as subscribers to a particular kind of event. Hence they were never released, and the garbage collector passed them by.

In Java, you could try the tactic of making rear-view-mirror objects weakly reachable, but eventually you're bound to drive the car onto a shiny, pebble-covered beach or some other kind of terrain that causes new objects to be created faster than they can possibly be garbage-collected, and then you're back to the same problem as before. (There are lots of ways out of this dilemma. Obviously, the students were trying a naive approach for simplicity's sake. Even so, had they not made the mistake of keeping objects bound to event listeners, their naive approach no doubt would have been good enough.)

As I said, this wasn't really a GC-caused accident. It was caused by programmer error. Nevertheless, it's the kind of thing that makes you stop and think.

Monday, November 03, 2008

Why 64-bit Java is slow

In an interesting post at the WebSphere Community Blog, Andrew Spyker explains why, when you switch from 32-bit Java to a 64-bit runtime environment, you typically see speed go down 15 percent and memory consumption go up by around 50 percent. The latter is explained by the fact that addresses are simply bigger in 64-bit-land, and complex data structures use a lot of 64-bit values even if they only need 32-bit values. Performance drops because although address width has gotten bigger, processor memory caches have not grown in terms of overall kilobytes available. Thus, you are bound to see things drop out of L1 and L2 cache more often. Hence cache misses go up and speed goes down.

Why, then, would anyone invest in 64-bit machines if the 64-bit JVM is going to give you an immediate performance hit? The answer is simple. The main reason you go with 64-bit architecture is to address a larger memory space (and flow more bytes through the data bus). In other words, if you're running heap-intensive apps, you have a lot to gain by going 64-bit. If you have an app that needs more than around 1.5 GB of RAM, you have no choice.

Why 1.5 GB? It might actually be less than that. On a 4 GB Windows machine, the OS hogs 2 GB of RAM and will only let applications have 2 GB. The JVM, of course, needs its own RAM. And then there's the heap space within the JVM; that's what your app uses. It turns out that the JVM heap has to be contiguous (for reasons related to garbage collection). The largest piece of contiguous heap you can get, after the JVM loads (and taking into account all the garbage that has to run in the background in order to make Windows work), is between 1.2 GB and 1.8 GB (roughly) depending on the circumstances.

To get more heap than that means either moving to a 64-bit JVM or using Terracotta. The latter (if you haven't heard of it) is a shared-memory JVM clustering technology that essentially gives you unlimited heap space. Or should I say, heap space is limited only by the amount of disk space. Terracotta pages out to disk as necessary. A good explanation of how that works is given here.

But getting back to the 64-bit-memory consumption issue: This issue (of RAM requirements for ordinary Java apps increasing dramatically when you run them on a 64-bit machine) is a huge problem, potentially, for hosting services that run many instances of Java apps for SaaS customers, because it means your scale-out costs rise much faster than they should. But it turns out there are things you can do. IBM, in its JVM, uses a clever pointer-compression scheme to (in essence) make good use of unused high-order bits in a 64-bit machine. The result? Performance is within 5 percent of 32-bit and RAM growth is only 3 percent. Graphs here.

Oracle has a similar trick for BEA's JRockit JVM, and Sun is just now testing a new feature called Compressed oops (ordinary object pointers). The latter is supposedly included in a special JDK 6 "performance release" (survey required). You have to use special command-line options to get the new features to work, however.
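The arithmetic behind all of these compressed-pointer tricks is the same: heap objects are 8-byte aligned, so every object address has three always-zero low bits, and a 32-bit slot can therefore span 32 GB of heap. A sketch of just the arithmetic (real JVMs do this in native code and may also add a heap base offset):

```javascript
// Compress a 64-bit-style address into 32 bits by dropping the
// three always-zero alignment bits. Plain division/multiplication
// is used here because JavaScript's bitwise operators truncate
// their operands to 32 bits.
function compress( address ) { return address / 8; }   // store as a 32-bit "oop"
function decompress( oop )   { return oop * 8; }       // back to a full address
```

A 20 GB address, for instance, still fits comfortably in an unsigned 32-bit value after compression, which is why the RAM penalty nearly disappears.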

Anyway, now you know why 64-bit Java can be slow and piggish. Everything's fatter in 64-bit-land.

For information about large-memory support in Windows, consult this post at sinewalker.

Sunday, November 02, 2008

Java 1.4.2 joins the undead

Java 1.4.2 died last week. According to Sun's "End of Service Life" page, Java 1.4.2 went EOSL last Thursday. The only trouble is, it's still moving.

Java 5 (SE) was released in 2004 and Java 6 has been out since 2006. Java 5 will, in fact, also be at EOSL in less than a year. (You might call it the Java "Dead Man Walking" Edition.) And yet, if you do a Google search on any of ten common Java class names, the very first hit (in every case) is a link to Sun's Javadoc for the 1.4.2 version of the class in question.

A year from now (when Java 5 hits the dirt) I wonder how many of these 10 searches will still take you to 1.4.2 Javadoc? (Remember, Java 5 has been out for more than four years and still doesn't outrank 1.4.2 in Google searches.) I'm guessing half of them. What do you think?

Thursday, October 30, 2008

What's the strangest thing in Java?

There's an interesting discussion going on right now. Someone asked "What’s the strangest thing about the Java platform?"

I can think of a lot of strange things about Java (space precludes a full enumeration here). Offhand, I'd say one of the more disturbing aspects of Java is its ill-behaved (unpredictable) System.gc( ) method.

According to Sun, System.gc( ) is not 100% reliable: "When control returns from the method call, the virtual machine has made its best effort to recycle all discarded objects." Notice the wording ("best effort"). There is absolutely no guarantee that gc() will actually force a garbage collection. This is well known to anybody who has actually tried to use it in anger.

The problem is, in the rare case when you actually do need to use gc(), you really do need it to work (or at least behave in a well-understood, deterministic way). Otherwise you can't make any serious use of it in a mission-critical application. Not to put too fine a point on it, but: If a method is not guaranteed to do what you expect it to do, then it seems to me the method becomes quite dangerous. I don't know about you, but I rely on System calls to work. If you can't rely on a System call, what can you rely on?

Suppose you've written a reentry program for a spacecraft, and you have an absolute need for a particular routine (e.g., to fire retro-rockets) to execute, without interruption, starting at a particular point in time. The spacecraft will be lost and the mission will fail (at a cost to taxpayers of $300 million) if the retro-rockets don't fire on time or don't shut off on time.

Now imagine that just as your program's fireRetroRockets() method is entered, the JVM decides to "stop the world" and do a garbage-collect.

Houston, we have a . . . well, you know.

The point is, if you could call System.gc( ) ahead of time, and count on it doing exactly what you want it to do (collect garbage immediately, so that an uncommanded GC won't happen at the wrong moment), you could save the mission. (Arguably.)

Obviously, this example is somewhat academic. No one in his right mind would actually use Java to program a spacecraft, in real life.

And that, I think, says a great deal about the Java platform.

Wednesday, October 29, 2008

Chaos in query-land

I wrote a micro-rant the other day on the need for an industry-standard syntax for plain-language keyword search. I, for one, am tired of learning a different search syntax for every site I go to. I find myself naively assuming (like an idiot) that every search engine obeys Google syntax. Not true, of course. It's a free-for-all out there. For example, not every search engine "ANDs" keywords together by default. Even at this simple level (a two-keyword search!), users are blindsided by products that behave unpredictably.

At any rate, Lars Trieloff pointed out to me yesterday that Apache Jackrabbit (the Java Content Repository reference implementation, which underpins Apache Sling) implements something called GQL, which is colloquially understood to mean Google Query Language, although in fact it means GQL. It does not implement Google's actual search syntax in comprehensive detail. It merely allows Jackrabbit to support plaintext queries in a Google-like way, so that if you are one of those people (like me) who automatically assumes that any given search widget will honor Google grammar, you won't be disappointed.

It turns out the source code for Jackrabbit's GQL class is remarkably compact, because really it's just a thin linguistic facade over an underlying XPath query facility. The class does nothing more than transliterate your query into XPath. It's pretty neat, though.

I'm all for something like GQL becoming, say, an IETF RFC, so that vendors and web sites can begin implementing (and advertising) support for Google-like syntax. First there will need to be a name change, though. Google already uses "GQL" to describe a SQL-like language used in the Google App Engine. There's also a Graphical Query Language that has nothing to do with either Jackrabbit or Google.

See what I mean? It's chaos out there in query-land.

Tuesday, October 28, 2008

Pixel Bender plug-in for Photoshop

When I first heard about Adobe's Pixel Bender technology, I became very excited. An ActionScript-based pixel shader API? What could be more fun than that? (By now you know what my social life must be like.)

When I saw that PB was a Flash-only technology, my enthusiasm got tamped down a bit. Later, I learned that PB would be supported in After Effects, which had me scratching my chin again. (I've written AE plug-ins before. It's much less punishing than writing Photoshop plug-ins.)

Now it turns out there will be a Pixel Bender plug-in for the next version of Photoshop. According to Adobe's John Nack, "Pixel Bender won't be supported in the box in the next version of Photoshop, but we plan to offer a PB plug-in as a free download when CS4 ships. Therefore it's effectively part of the release."

This is great news for those of us who like to peek and poke pixels but can't be bothered to use the Byzantine, C++-based Photoshop SDK.

In case you're wondering what you can do with Pixel Bender, some nice sample images and scripts can be found here. The image shown above was created with this 60-line script.


Monday, October 27, 2008

Java 7 gets "New" New I/O package

I've always hated Java I/O with all its convoluted, Rube-Goldbergish special classes with special knowledge of special systems, and the legacy readLine( ) type of garbage that brings back so many bad memories of the Carter years.

With JSR 203 (to be implemented in Java SE 7), we get a new set of future legacy methods. This is Sun's third major attempt in 13 years to get I/O right. And from what I've seen, it doesn't look good. (Examples here.) My main question at this point is where they got that much lipstick.

The main innovation is the new Path object, which seems to be a very slightly more abstract version of File. (This is progress?) You would think any new I/O library these days would make heavy use of URIs, URLs, and Schemes (file:, http:, etc.) and lessons learned in the realization of concepts like REST, AJAX, and dependency injection. No such luck. Instead we have exotic new calls like FileSystem.getRootDirectories() and DirectoryEntry.newSeekableByteChannel(). It's like we've learned nothing at all in the last 20 years.

When I want to do I/O, I want to be able to do something like

dataSrc = new DataGetter( );
dataSrc.setPref( DataGetter.EIGHTBITBYTES );
dataSrc.setPref( DataGetter.SLURPALL );
data = dataSrc.getData( uri );

and be done with it. (And by the way, let me pass a string for the URI, if I want to. Don't make me create a special object.)

I don't want to have to know about newlines, buffering, or file-system obscurata, unless those things are terribly important to me, in which case I want to be able to inject dependencies at will. But don't make me instantiate totally different object types for buffered vs. non-buffered streams, and all the rest. Don't give me a million flavors of special objects. Just let me pass hints into the DataGetter, and let the DataGetter magically grok what I'm trying to do (by making educated guesses, if need be). If I want a special kind of buffering, filtering, encoding, error-handling, etc., let me craft the right cruftball of flags and constants, and I'll pass them to the DataGetter. Otherwise, there should be reasonable defaults for every situation.

I would like a file I/O library that is abstract enough to let me read one bit at a time, if I want; or 6 bits at a time; or 1024 bits, etc. To me, bits are bits. I should be able to hand parse them if I want, in the exact quantities that I want. If I'm doing some special type of data compression and I need to write 13 bits to output, then 3 bits, then 12, then 10, and so on, I should be able to do that with ease and elegance. I shouldn't have to stand on my head or instantiate exotic objects for reading, buffering, filtering, or anything else.

I could write a long series of articles on what's wrong with Java I/O. But I don't look forward to revising that article every few years as each "new" I/O package comes out. Like GUI libraries and 2D graphics, this is something Sun's probably never going to get right. It's an area that begs for intervention by fresh talent, young programmers who are self-taught (not infected by orthodoxies acquired in college courses) and have no understanding at all of legacy file systems, kids whose idea of I/O is HTTP GET. Until people with "beginner's mind" get involved, there's no hope of making Java I/O right.

Friday, October 24, 2008

Enterprise Software Feared Overpriced

I'm being sardonic with that headline, obviously, but I have to agree with Tim Bray, who said in passing the other day: "I just don’t believe that Enterprise Software, as currently priced, has much future, in the near term anyhow."

I take this to mean that the days of the seven-figure software deal (involving IBM, Oracle, EMC, Open Text, etc.) may not exactly be over, but certainly those kinds of sales are going to be vanishingly rare, going forward.

I would take Bray's statement a step further, though. He's speaking to the high cost of enterprise software itself (or at least that's how I interpret his statement). Enterprise systems take a lot of manpower to build and maintain. The budget for a new system rollout tends to break out in such a way that the software itself represents only 10 to 50 percent of the overall cost. In other words, software cost is a relatively minor factor.

Therefore I would extend Bray's comment to say that old-school big-budget Enterprise Software projects involving a cast of thousands, 12 months of development and testing, seven-figure software+services deals, etc., are on the way out. In its place? Existing systems! Legacy systems will be maintained, modified, built out as necessary (and only as necessary) using agile methodologies, high-productivity tools and languages (i.e., scripting), RESTful APIs, and things that make economic sense.

There's no room any more for technologies and systems that aren't provably (and majorly) cost-effective. IBM, Oracle, EMC, listen up: Million-dollar white elephants are on the endangered species list.

Wednesday, October 22, 2008

Lucene site powered by Google!

What's wrong with this picture?

Flash-drive RAID

I stumbled upon the floppy-drive RAID story (see previous blog) as part of a Google search to see if any such thing as a memory stick (Flash-drive) RAID array is available for Vista. No such luck, of course. But there are quite a few blogs and articles on the Web by Linux users who have successfully created ad-hoc Flash RAIDs from commodity USB hubs and memory sticks. (I recommend this June 2008 article from the Linux Gazette and this even more entertaining, not to mention better-illustrated, piece by Daddy Kewl. Definitely do not fail to read the latter!) Linux supports this kind of madness natively.

MacOS is even better for this. Evidently you can plug two sticks into a PowerBook's USB ports and configure them as a RAID array with native MacOS dialogs. (Details here.) How I envy Mac users!

Tuesday, October 21, 2008

Floppy-disk RAID array

This has got to be the funniest thing I've seen all year. And trust me, this has been a funny year.

Daniel Blade Olson, a man after my own heart (even if that phrase doesn't translate well into foreign languages...), has rigged a bunch of floppy drives to form a RAID array. His disturbing writeup is here.

Saturday, October 18, 2008

Fast pixel-averaging

I don't know why it took me so long to realize that there's an easy, fast way to obtain the average of two RGB pixel values. (An RGB pixel is commonly represented as a 32-bit integer. Let's assume the top 8 bits aren't used.)

To ensure proper averaging of red, green, and blue components of two pixels requires parsing those 8-bit values out of each pixel and adding them together, then dividing by two, and crafting a new pixel out of the new red, green, and blue values. Or at least that's the naive way of doing things. In code (I'll show it in JavaScript, but it looks much the same in C or Java):

// The horribly inefficient naive way:

function average( a,b ) {

var REDMASK = 0x00ff0000;
var GREENMASK = 0x0000ff00;
var BLUEMASK = 0x000000ff;
var aRed = a & REDMASK;
var aGreen = a & GREENMASK;
var aBlue = a & BLUEMASK;
var bRed = b & REDMASK;
var bGreen = b & GREENMASK;
var bBlue = b & BLUEMASK;

var aveRed = (aRed + bRed) >> 1;
var aveGreen = (aGreen + bGreen) >> 1;
var aveBlue = (aBlue + bBlue) >> 1;

return aveRed | aveGreen | aveBlue;
}

That's a lot of code to average two 32-bit values, but remember that red, green, and blue values (8 bits each) have to live in their own swim lanes. You can't allow overflow.

Here's the much cleaner, less obvious, hugely faster way:

// the fast way:

var MASK7BITS = 0x00fefeff;

function ave( a,b ) {

return ( (a & MASK7BITS) + (b & MASK7BITS) ) >> 1;
}

The key intuition here is that you want to clear the bottom bit of the red and green channels in order to make room for overflow from the green and blue "adds."

Of course, in the real world, you would inline this code rather than use it as a function. (In a loop that's processing 800 x 600 pixels you surely don't want to call a function hundreds of thousands of times.)

Similar mask-based techniques can be used for adding and subtracting pixel values. Overflow is handled differently, though (left as an exercise for the reader).
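For what it's worth, here is one way the exercise might go for a saturating ("clamped") add. Each channel is summed in its own swim lane and clamped at 0xFF before reassembly (my own sketch; a branch-free masked version is also possible):

```javascript
// Per-channel saturating add of two 0x00RRGGBB pixels: sum each
// channel in isolation, clamp it at full intensity, recombine.
function satAdd( a, b ) {
  var r = (a & 0xff0000) + (b & 0xff0000);
  if ( r > 0xff0000 ) r = 0xff0000;      // clamp red
  var g = (a & 0x00ff00) + (b & 0x00ff00);
  if ( g > 0x00ff00 ) g = 0x00ff00;      // clamp green
  var bl = (a & 0x0000ff) + (b & 0x0000ff);
  if ( bl > 0x0000ff ) bl = 0x0000ff;    // clamp blue
  return r | g | bl;
}
```

Because each channel is masked out before the add, a carry in one lane can never contaminate its neighbor.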

Friday, October 17, 2008

Loading an iframe programmatically

This is a nasty hack. It's so useful, though. So useful.

Suppose you want to insert a new page (a new window object and DOM document) into your existing page. Not a new XML fragment or subtree on your current page; I'm talking about a whole new page within a page. An iframe, in other words.

The usual drill is to create an <iframe> node using document.createElement( ) and attach it to the current page somewhere. But suppose you want to populate the iframe programmatically. The usual technique is to start building DOM nodes off the iframe's contentDocument node using DOM methods. Okay, that's fine, but it's a lot of drudgery. (I'm sweating already.) At some point you're probably going to start assigning string values to body.innerHTML (or whatever). But then you're into markup-stringification hell. (Is there a JavaScript programmer among us who hasn't frittered away major portions of his or her waking life escaping quotation marks and dealing with line-continuation-after-line-continuation in order to stringify some hellish construction, whether it's a piece of markup or an argument to RegExp( ) or whatever?)

Well. All of that is best left to Internet Explorer programmers. If you're a Mozilla user, you can use E4X as your "get out of stringification-jail FREE" card, and you can use a data URL to load your iframe without passing through DOM hell.

Suppose you want your iframe to contain a small form. First, declare it as an XML literal (which you can do as follows, using E4X):

myPage = <html>
<form action="">
... a bunch of markup here
</form>
</html>;

Now create an iframe to hold it:

   iframe = top.document.createElement( "iframe" );

Now (the fun part...) you just need to populate the iframe, which you can do in one of two ways. You can attach the iframe node to the top.document and then assign myPage.toXMLString() to the iframe document's body (via innerHTML), or (much more fun) you can convert myPage to a data URL and then set the iframe's src attribute to that URL:

// convert XML object to data URL
function xmlToDataURL( theXML ) {

var preamble = "data:text/html;charset=utf-8,";
var octetString = escape( theXML.toXMLString( ) );
return preamble + octetString;
}

dataURL = xmlToDataURL( myPage );

iframe.setAttribute( "src", dataURL ); // load frame

// attach the iframe to your current page
top.document.body.insertBefore( iframe ,
top.document.body.firstChild );

A shameless hack, as I say. It works fine in Firefox, though, even with very large data URLs. I don't recall the exact size limit on data URLs in Mozilla, but I seem to remember that it's megabytes. MSIE, of course, has some wimpy limit like 4096 characters (maybe it's changed in IE8?).

In my opinion, all browsers SHOULD support unlimited-length data URLs, just like they SHOULD support E4X and MUST support JavaScript. Notwithstanding any of this, Microsoft MAY go to hell.

Saturday, October 11, 2008

Russians use graphics card to break WiFi encryption

The same Russians who got in a lot of trouble a few years ago for selling a small program that removes password protection from locked PDF files (I'm talking about the guys at Elcomsoft) are at it again. It seems this time they've used an NVidia graphics card GPU to crack WiFi WPA2 encryption.

They used the graphics card, of course, for sheer number-crunching horsepower. The GeForce 8800 GTX delivers something like 300 gigaflops of crunch, which I find astonishing (yet believable). Until now, I had thought that the most powerful chipset in common household use was the Cell 8-core unit used in the Sony Playstation 3 (which weighs in at 50 to 100 gigaflops). Only 6 of the PS/3's processing units are available to programmers, though, and the Cell architecture is meant for floating-point operations, so for all I know the GeForce 8800 (or its relatives) might be the way to go if you need blazing-fast integer math.

Even so, it would be interesting to know what you could do with, say, an 8-box cluster of overclocked PS/3s. Simulate protein-ribosome interactions on an atom-by-atom basis, perhaps?

Decimal to Hex in JavaScript

There's an easy way to get from decimal to hexadecimal in JavaScript:

  function toHex( n ) { return n.toString( 16 ); }

The string you get back may not look the way you want, though. For example, toHex(256) gives "100", when you're probably wanting "0x0100" or "0x00000100". What you need is front-padding. Just the right amount of front-padding.

// add just the right number of 'ch' characters
// to the front of string to give a new string of
// the desired final length 'dfl'

function frontPad( string, ch, dfl ) {
var array = new Array( ++dfl - string.length );
return array.join( ch ) + string;
}

Of course, you should ensure that 'dfl' is not smaller than string.length, to prevent a RangeError when allocating the array.

If you're wondering why "++dfl" instead of plain "dfl", stop now to meditate. Or run the code until enlightenment occurs.

At this point you can do:

  function toHex( n ) {
return "0x" + frontPad( n.toString( 16 ), "0", 8 );
}

toHex( 256 ) // gives "0x00000100"

If you later need to use this value as a number, no problem. You can apply any numeric operation except addition to it with perfect safety. Addition will be treated as string concatenation whenever either operand is a string (that's the standard JS interpreter behavior), so if you need to do "0x00000100" + 4, you have to cast the hex-string to a number.

  n = toHex( 256 );  // "0x00000100"
typeof n // "string"
isNaN( n ) // false
x = n * n; // 65536
x = n + 256 // "0x00000100256"
x = Number( n ) + 256 // 512

Wednesday, October 08, 2008

$20 touchscreen, anyone?

Touchless is one of those ideas that's so obvious, yet so cool, that after you hear it, you wonder why someone (such as yourself) didn't think of it ages ago. Aim a webcam at your screen; have software that follows your fingers around; move things around in screen space in response to your finger movements. Voila! Instant touch-screen on the cheap.

Mike Wasserman came up with Touchless as a college project while attending Columbia University. He's now with Microsoft. The source code is free.


Saturday, October 04, 2008

Accidental assignment

People sometimes look at my JavaScript and wonder why there is so much "backwards" notation:

if ( null == arguments[ 0 ] )
    return "Nothing to do";

if ( 0 == array.length )
And so on, instead of putting the null or the zero on the right side of the '==' the way everyone else does.

The answer is, I'm a very fast typist and it's not uncommon for me to type "s" when I meant to type "ss," or "4" when I meant to type "44," or "=" when I meant to type "==".

In JavaScript, if I write the if-clause in the normal (not backwards) way, and I mistakenly type "=" for "==", like so...

   if ( array.length = 0 )

... then of course I'm going to destroy the contents of the array (because in JavaScript, you can wipe out an array by setting its length to zero) and my application is going to behave strangely or throw an exception somewhere down the line.

This general type of programmer error is what I call "accidental assignment." Note that I refer to it as a programmer error. It is not a syntactical error. The interpreter will be only too happy to assign a value to a variable inside an if-clause, if you tell it to. And it may be quite some time before you are able to locate the "bug" in your program, because at runtime the interpreter will dutifully execute your code without putting messages in the console. If an exception is eventually thrown, it could be in an operation that's a thousand lines of code away from your syntactical blunder.

So the answer is quite simple. If you write the if-clause "backwards," with zero on the left, an accidental assignment will be caught right away by the interpreter, and the resulting console message will tell you the exact line number of the offending code, because you can't assign a value to zero (or to null, or to any other baked-in constant).
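Here's the whole story in runnable form (my own sketch; paste it into any JavaScript console):

```javascript
// Accidental assignment: perfectly legal, silently destructive
var array = [ 1, 2, 3 ];
if ( array.length = 0 )         // assigns 0, which is falsy...
    console.log( "never reached" );
console.log( array.length );    // 0 -- the array has been emptied

// The "backwards" form: the same typo fails immediately,
// because a literal is not a valid assignment target
var caught = null;
try {
    eval( "if ( 0 = array.length ) {}" );
} catch ( e ) {
    caught = e;                 // SyntaxError in modern engines
}
console.log( caught !== null ); // true
```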

In an expression like "null == x" we say that null is not Lvaluable. The terms "l-value" and "r-value" originally meant left-hand value and right-hand value. But by the time Kernighan and Ritchie created C, the meaning had become more precise. Today an l-value is understood to be a locatable value, something that has an address in memory. A compiler allocates an address for each named variable at compile time; the value stored at that address (its r-value) is generally not known until runtime. A bare r-value has no address of its own to assign into, which is why a compiler rejects "null = x" outright.

On the other hand, "x = null" is perfectly legal, and in K&R days a C compiler would obediently compile such a statement whether it was in an if-clause or not. This actually resulted in some horrendously costly errors in the real world, and as a result, today virtually every compiler will at least warn about a bare assignment inside an if-clause. (Actually I can think of an exception. But let's save that for another time.) If you really mean to do an assignment inside an if, you silence the warning by wrapping the assignment in an extra set of parentheses.

Not so with JavaScript, a language that (like K&R C) assumes that the programmer knows what he or she is doing. People unwittingly create accidental assignments inside if-clauses all the time. It's not a syntactical error, so the interpreter doesn't complain. Meanwhile you've got a very difficult situation to debug, and the language itself gets blamed. (A poor craftsman always blames his tools.)

As a defensive programming technique, I always put the non-Lvaluable operand on the left side of an equality operator; that way, if I make a typing mistake, the interpreter slaps me in the face at the earliest opportunity rather than spitting in my general direction some time later. This tactic has served me well. I'm surprised more people don't use it.

Thursday, October 02, 2008

Wednesday, October 01, 2008

Serialize any POJO to XML

Ever since Java 1.4.2 came out, I've been a big fan of java.beans.XMLEncoder, which lets you serialize runtime objects (including the values of instance variables, etc.) as XML, using just a few lines of code:

XMLEncoder e = new XMLEncoder(
    new BufferedOutputStream(
        new FileOutputStream("Test.xml")));
e.writeObject(new JButton("Hello, world"));
e.close();  // close() flushes the stream and finishes the XML document

This is an extraordinarily useful capability. You can create an elaborate Swing dialog (for example) containing dozens of nested widgets, then serialize the whole thing as a single XML file, capturing its state, using XMLEncoder (then deserialize it later, in another time and place, perhaps).

A favorite trick of mine is to serialize an application's key objects ahead of time, then JAR them up and instantiate them at runtime using XMLDecoder. With a Swing dialog, this eliminates a ton of repetitive container.add( someWidget) code, and similar Swing incantations (you know what I'm talking about). So it cleans up your code incredibly. It also makes Swing dialogs (and other objects) declarative in nature; they become static XML that you can edit separately from code, using XML tools. At runtime, of course, you can use DOM and other XML-manipulation technologies to tweak serialized objects before instantiating them. (Let your imagination run.)

As an aside: I am constantly shocked at how many of my Java-programming friends have never heard of this class.

If there's a downside to XMLEncoder, it's that it will only serialize Java beans, or so the documentation says. (Actually the documentation is not quite right; more on that in a moment.) With Swing objects, for example, XMLEncoder will serialize widgets but not any event handlers you've set on them. At runtime, you end up deserializing the Swing object, only to have to hand-decorate it with event handlers before it's usable in your application.

There's a solution for this, and again it's something relatively few Java programmers seem to know anything about. In a nutshell, the answer is to create your own custom persistence delegates. XMLEncoder will call the appropriate persistence delegate when it encounters an object in the XML graph that has a corresponding custom delegate.

This is (need I say?) exceptionally handy, because it provides a transparent, interception-based approach to controlling XMLEncoder's behavior, at a very fine level of control. If you have a Swing dialog that contains 8 different widget classes (some of them possibly containing multiple nested objects), many of which need special treatment at deserialization time, you can configure an XMLEncoder instance to serialize the whole dialog in just the fashion you need.

The nuts and bolts of this are explained in detail in this excellent article by Philip Milne. The article shows how to use custom persistence delegates to make XMLEncoder serialize almost any Java object, not just beans. Suffice it to say, you should read that article if you're as excited about XMLEncoder as I am.

Monday, September 29, 2008

A number that's not equal to itself

All this time, I've been thinking NaN is not a number. What an idiot I've been.

In JavaScript:

   typeof NaN == 'number'   // true

And yet of course, NaN == NaN is false.

There you go. Amaze your friends.

Wednesday, September 24, 2008

Great hack: PNG-compressed text

I only recently stumbled across what's got to be the most outlandish scripting hack I've seen in a long time. Jacob Seidelin tells how he managed to stuff text into a PNG image, then get it back out with the <canvas> getImageData( ) method. What's neat about it? Mainly the free compression you get with the PNG format. For example, when Jacob put the 124KB Prototype library into PNG format, it shrank to 30KB. Of course, it makes for an awful-looking image, which one might think of as a degenerate case of steganography, i.e. embedded data in an image, minus the image.

The trick doesn't work for all browsers, since you need canvas for it to work. And it's kind of pointless given that you can use gzip instead. But it's kind of neat in that it opens the door to browser steganography, embedding of private metadata, and potentially lots of other cool things.
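To make the idea concrete, here's a conceptual sketch (my own, not Jacob's code): pack one character code per color channel in an RGBA-style byte array, then read it back the way you would from canvas getImageData().data. In a real browser you'd draw the PNG onto a canvas first; here a plain array stands in for the pixel buffer.

```javascript
// Conceptual sketch only -- not Jacob's actual code. One character
// code goes into each of the R, G, B channels of each "pixel".
function encode( text ) {
  var data = [];
  for ( var i = 0; i < text.length; i += 3 ) {
    data.push( text.charCodeAt( i )     || 0,
               text.charCodeAt( i + 1 ) || 0,
               text.charCodeAt( i + 2 ) || 0,
               255 ); // alpha channel, always opaque
  }
  return data;
}

function decode( data ) {
  var out = "";
  for ( var i = 0; i < data.length; i += 4 ) {  // step over RGBA pixels
    for ( var j = 0; j < 3; j++ ) {             // skip the alpha byte
      if ( data[ i + j ] ) out += String.fromCharCode( data[ i + j ] );
    }
  }
  return out;
}

console.log( decode( encode( "var x = 42;" ) ) ); // "var x = 42;"
```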

Tuesday, September 23, 2008

JavaScript beautifiers suck

I keep looking for an online code beautifier that will convert my distinctly simian-looking Greasemonkey scripts to properly indented, formatted source code. My current favorite code editor (Notepad) doesn't provide proper code formatting. I know what you're thinking: Why aren't you using a proper IDE in the first place? Then you wouldn't have this problem! Well, first of all, I am thinking of upgrading to Wordpad. But it doesn't do formatting either. Second of all, I haven't found a JavaScript IDE worthy of the name, which is why I use Notepad. More on that in a minute.

I spent an hour the other day looking for an online beautifier that would do a makeover on my ugly JavaScript. What I found is that most people point either to this one or this one. (I tried others as well.) They either don't keep my existing newlines, or don't indent "if" blocks properly (or at all), and/or just plain don't indent consistently. Quite unacceptable.

Finally I gave up on the online schlockware and went straight to Flex Builder (which has been sitting unused on my desktop), and I thought "Surely this will do the trick."

Imagine the look of abject horror on my face when I found that the ActionScript editor could not do the equivalent of Control-Shift-F (for Java in Eclipse). In fact, the formatter built into Flex Builder's ActionScript editor won't even do auto-indenting: you have to manually grab blocks of code and do the old shift-right/shift-left indent/outdent thing by hand, over and over and over again, throughout your code, until the little beads of blood begin to form on your forehead.

I'm left, alas, with half-solutions. But unfortunately, two or three or ten half-solutions don't add up to a solution. (How fortunate we would all be if they did.)

Monday, September 22, 2008

Firebug on Vista giving problems

Is it just me or does anyone else find Firebug+FF3 on Vista to be flaky? It loses my console code if I switch tabs (not windows, just going to another tab and coming back). Sometimes the FB console stops working or won't execute "console.log( )". And it seems as though weird bugs show up in the Firefox console that don't show up in the Firebug log pane, and vice versa.

Also, I don't appreciate having to manually turn on the console for every web domain I go to. What a PITA. I wonder if that behavior can be disabled somehow? Right now, I'm feeling disabled.

Thursday, September 18, 2008

JavaScript runs at C++ speed, if you let it

The common perception (ignorance of the crowd) is that JavaScript is slow. What I'm constantly finding, however, is that people will hand-craft a JavaScript loop to do, say, string parsing, when they could and should be using the language's built-in String methods (which always run fast).

Example: You need a "trim" function to remove leading and trailing whitespaces from user-entered text in a form. If you go out on the web and look at what people are doing in their scripts, you see a lot of things like:

function trim10 (str) {
    var whitespace = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';
    for (var i = 0; i < str.length; i++) {
        if (whitespace.indexOf(str.charAt(i)) === -1) {
            str = str.substring(i);
            break;
        }
    }
    for (i = str.length - 1; i >= 0; i--) {
        if (whitespace.indexOf(str.charAt(i)) === -1) {
            str = str.substring(0, i + 1);
            break;
        }
    }
    return whitespace.indexOf(str.charAt(0)) === -1 ? str : '';
}

I took this code verbatim from a web page in which the author of it claims (ironically) that it's an incredibly fast routine!

Compare with:

function trim(a) {
    return a.replace(/^\s+/, "").replace(/\s+$/, "");
}

In testing, I found the shorter routine faster by 50% on very small strings with very few leading or trailing spaces, and faster by 300% or more on strings of length ~150 with ten to twenty leading or trailing spaces.
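Your numbers will differ by engine and machine. A crude harness like this one (a sketch of mine using Date.now(), not my exact methodology) is enough to rerun the comparison with any trim implementation you like:

```javascript
// crude micro-benchmark harness: returns elapsed milliseconds
function time( fn, iters ) {
    var t0 = Date.now();
    for ( var i = 0; i < iters; i++ ) fn();
    return Date.now() - t0;
}

var s = "     some user-entered text     ";
var ms = time( function () {
    s.replace( /^\s+/, "" ).replace( /\s+$/, "" );
}, 100000 );
console.log( "regex trim: " + ms + "ms" );
```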

The better performance of the shorter function has nothing to do with it being shorter, of course. It has everything to do with the fact that the built-in JavaScript "replace( )" method (on the String pseudoclass) is implemented in C++ and runs at compiled-C speed.

This is an important point. JavaScript interpreters are themselves written in C++ (SpiderMonkey) or Java (Rhino). The built-in functions of the ECMAScript language are implemented in C++ in your browser. Harness that power! Use the built-in methods of the language. Never hand-parse strings with "indexOf" inside for-loops (etc.) when you can use native methods that run at compiled speed. Why walk if you can ride the bullet train?
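To illustrate the principle with something other than trim, compare a hand-rolled character count against a one-liner built on the native split() method (my own example):

```javascript
// hand-parsing: an interpreted loop over every character
function countLoop( s, ch ) {
    var n = 0;
    for ( var i = 0; i < s.length; i++ )
        if ( s.charAt( i ) === ch ) n++;
    return n;
}

// built-in: split() does the scanning in native code
function countSplit( s, ch ) {
    return s.split( ch ).length - 1;
}

console.log( countLoop( "a,b,c,d", "," ) );  // 3
console.log( countSplit( "a,b,c,d", "," ) ); // 3
```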

The implications here for client/server web-app design are quite far-reaching. If you are using server-side JavaScript, and your server runtimes are Java-based, it means your server-side scripts are running (asymptotically, at least) at Java speed. Well-written client-side JavaScript runs (asymptotically) at C++ speed. Therefore, any script logic you can move to the client should be moved there. It's madness to waste precious server cycles.

Madness, I say.

Wednesday, September 17, 2008

Getting Greasemonkey to work in Firefox3 on Vista

Wasn't happening for me until I started with a fresh (empty) FF3 user profile. Vista seems to be the problem in all of this. GM on FF3 on WinXP works fine, but with Vista, GM doesn't install properly unless you zero out your FF3 profile first. At least, that's the state of things today as I write this (17 Sept 2008). Hopefully it will get fixed soon. Until then ...

The procedure is:

1. In FF3, go to Organize Bookmarks and export your bookmarks as HTML so you don't foolishly lose them.

2. In the Vista "Start" panel, choose Run...

3. Launch Firefox with a command line of "firefox -profilemanager".

4. When the profile manager dialog appears, create a new profile.

5. When FF launches, install Greasemonkey.

6. Import your bookmarks.

7. Exit Firefox. Return to step 3. When profile manager dialog appears, delete your old profile. (Or else leave it and have to contend with logging in to one or the other profile whenever FF launches.)

Whew + sheesh.