Wednesday, November 26, 2008

What Sun should really do

I've worked for companies that are in Sun's situation (most recently Novell), and I have a few observations based on my years of watching hugely talented groups of people produce astoundingly good Java technology, only to see the Greater Organization fail to find a way to monetize it.

Like Sun, Novell is a venerable tech company with an interesting past. It finds itself today in a situation (like Sun) where profitability is consistently miserable, but the balance sheet is good. The parallels between Sun and Novell are far-reaching. Both were started more than two decades ago (Novell in 1979, Sun in 1982) as hardware companies. Both soon found themselves in the operating-system business. Novell owned DR-DOS, which led to Novell DOS, which in turn became the boot loader for NetWare. Along the way, Novell acquired UNIX from AT&T.

NetWare was an extraordinarily successful OS that Novell foolishly stopped supporting shortly after acquiring SUSE Linux in late 2003. I say foolishly because NetWare was a cash cow that required very little code maintenance to keep going, whereas SUSE Linux sucked Novell's coffers dry as the entire company pivoted in the direction of things that are extremely hard to make money on (viz., open-source software, something Novell had little experience with).

Distraction destroys profitability (someone please make that a bumper sticker...), and this is something that has cost Novell and Sun a great deal of money over the years. Technology is exciting, and promising technologies have a way of siphoning attention away from more prosaic sorts of things, like finding and solving customer pain (i.e., making money).

Technologists have a way of convincing themselves that some technologies are more worthy than others. And quite often, what happens is that people at the top who should know better become seduced into allocating large amounts of lucre to the promotion of money-losers (on the theory that eventually they are bound to become money-makers) while cash-cow products get money taken away (on the theory that "this product is a winner, it's throwing off cash like crazy; we don't need to promote it").

I've started and run two successful businesses (including one that I launched in 1979 and which is still in operation today, under a different owner) and I've seen friends and relatives start and run businesses, so I know first-hand what it takes to keep a business right-side-up; and I know some sure-fire techniques for making a successful business cartwheel out of control into a ditch, trailing black smoke.

One of the main things I've learned is that you never promote a loser; you always put your money behind proven winners. Never take marketing or development funding away from a winner to promote something that is either a proven loser or not yet proven to be a winner. (That's too long for a bumper sticker. It should be on billboards.)

Back to the software biz for a minute. Novell and Sun are both "operating system companies" to some degree. This already carries with it the stench of death. Being an OS company was a great thing back in the Carter years; it was lucrative in those days. It's not a great thing today. It siphons off money that is better deployed elsewhere. Microsoft (with its "Live" series of SaaSified product offerings) has recently gotten the message that the Web is the new OS, and the desktop is irrelevant as a metaphor. This is a huge paradigm shift for Microsoft. But they finally get it: They get that the future is in things like collaboration (human connectivity) and dynamic assembly of reusable content. They are starting to understand that infrastructure is not something customers want to have to know about; that everything that can be virtualized should be virtualized. Customers instinctively know this, even if they can't articulate it.

So then, what's a Sun to do?

First, stop worrying so much about the future and figure out what's making money now, so you can try to massively scale whatever that happens to be. Remember: Invest in winners, not losers. Find out what's working. Crank the volume on it to the max.

The next thing to do is obvious: Kill your losers. Utterly walk away from them, now, today, this minute. Redeploy the resources to your winners (or else sell them off).

The very next thing to do is apply the foregoing principles to your people. Find the winners (the true contributors, the people who are making successful things successful) and reward them. Not just with money, but with whatever else they want: promotion, recognition, travel, alone time, or whatever. People are different. Most techies are not motivated by money.

Likewise, identify and weed out the mediocre, the tired, the overly comfortable, the complainers and morale-killers; find the toxic individuals (they're everywhere) and remove them somehow. Just getting rid of the toxic people will cause those who are left to be more productive.

Next, pivot the organization in the direction of innovation. This is exceptionally difficult to do. I was involved with the "Fostering Innovation" Community of Practice at Novell during a time when Novell was desperately trying to become more of an innovation-centric culture. One of the kneejerk things Novell did was increase the bonus paid to employees who contributed patentable inventions. Novell was eventually paying $4500 plus hundreds of shares of restricted stock for each invention accepted by the Inventions Committee (of which I was a member). What happened was that we got goofy submissions from all over the company, while certain senior engineers who knew how to game the system succeeded in making a nice side-living on patent bonuses.

Innovation is fostered when you simply set innovative people free. Innovators will innovate for their own reasons; money has nothing to do with it. All you need to do is clear the path for these individuals. At Novell, as at Sun, there's a special honor reserved for senior people who have a track record of accomplishment. We called them Distinguished Engineers. These people were like tenured professors. They came to work in pajamas (not really) and did whatever they wanted, basically, with no fear of ever being fired.

That's a stupid system. Tenured professorships lead to sloth. Not every Distinguished Engineer is a burned-out has-been on the dole, but some are, and it sets a bad example.

Younger engineers (and others) who are proving their potential as innovators need to be recognized while they're at their peak. (The 45-year-olds with a track record of innovation in the 1990s need to be considered for early retirement. Recognizing someone a decade too late serves no purpose.) What I advocate is a system of "innovation sabbaticals," awarded to budding innovators (of any age) who are doing Great Things and are likely to do more if set free.

Finally, going forward, hire good people. This is any company's best and only salvation. It's the foundation for all success. When you have difficult problems to solve (as any troubled business does), hire very, very smart people who have no prior experience with the problems in question. That's how you get fresh answers that bear tasty fruit.

This blog is already too long, so I'll stop. In a nutshell, what Sun needs to do is focus light on itself and conduct a pre-mortem. The first order of business is to find out which pieces of the business are profitable, and scale those. Then find out which pieces of the business are sucking cash, and amputate those. If that's two-thirds of Sun, so be it. It means Sun needs to be a third as big as it is now. It'll shrink down to that size eventually, so why spend time and money getting there the slow way? Go there now. Shareholders will applaud.

And by the way, be clear on one thing: This is all about earnings-per-share. There is no other goal, no other agenda. Sun is a business. It's not a charity organization or a full-employment program for has-beens. Earnings per share comes first. Everything else follows.

Find and reward (not necessarily with cash!) your best people. Get rid of the losers who are bringing morale and productivity down for everyone else.

Set innovators free. They will innovate for their own reasons. Just let them.

And get out of the operating system business. The Web is the OS, for cryin'-out-loud. Even Microsoft has figured that one out.

Tuesday, November 25, 2008

Google wants to hire 665 people

The WebGuild story about Google laying off ten thousand workers is (sadly) mostly made-up nonsense.

Probably the most outlandish statement in the article is "Since August, hundreds of employees have been laid off and there are reports that about 500 of them were recruiters."

Five hundred recruiters?? ROTFLMAO.

The only scintilla of truth in the entire article, as far as I can determine, is the bit about Google having approximately ten thousand contract workers, which Sergey Brin confirmed in an October 16 story in The Mercury News. The notion that Google will be letting all of them go is nonsense, however. Brin (in the same Mercury News story) did say Google "has a plan to significantly reduce that number through vendor management, converting some contractors to regular employees, and other approaches." That's all he said: "significantly reduce."

It's quite easy to verify that Google is, in fact, still hiring at a brisk pace. Go here to browse the 665 open positions.

Monday, November 24, 2008

Flex meets Inversion-of-Control

I didn't realize until just now that there is a Spring-like inversion-of-control framework for Flex, called Prana (available under a BSD license).

Seeing something like Prana makes me wonder how many other staples of the Java world will be emulated by Flex folk.

Very interesting indeed.

Saturday, November 22, 2008

Death of an Eclipse project

A November 12, 2008 slide deck explains why the Eclipse Application Lifecycle Framework (ALF) project will be terminated, basically for lack of interest. Except that there was, in fact, interest from one corporation: enterprise-mashup player Serena Software, which contributed significantly to the code base.

The whole story is a little weird. The ALF project morphed into an SOA framework of sorts shortly after its inception in 2005. Serena (an application lifecycle management software firm, originally, but also known for the now-moribund Serena Collage Content Management System) got involved early on. Eventually, ALF was adopted as the underlying SOA and authentication framework by Serena Business Mashups in Dec 2007.

And now the "project leadership" has decided that the Eclipse ALF project should be shut down, with the code being donated to the Higgins project. The Project Leader for ALF is (was) Brian Carroll, a Serena Fellow.

Higgins, it turns out, is actually not ALF-related except in the most tangential sense. I was working in the Identity Services division at Novell in 2006 when Higgins was created. I knew about it through Duane Buss and Daniel Sanders (both of whom are still principals on the project). Daniel and I worked together on the Novell Inventions Committee.

Higgins is (according to the project FAQ) "an open source Internet identity framework designed to integrate identity, profile, and social relationship information across multiple sites, applications, and devices. Higgins is not a protocol, it is software infrastructure to support a consistent user experience that works with all popular digital identity protocols, including WS-Trust, OpenID, SAML, XDI, LDAP, and so on."

It's really largely about identity cards or "information cards" (InfoCards, I-Cards).

In case you're wondering about the name: Higgins is the name of a long-tailed Tasmanian jumping mouse.

So, ah . . . ALF isn't the only SOA-related Eclipse project being taken down now. For info on the others, see this story in the Register.

Thursday, November 20, 2008

How to set your head on fire



The folks at Jabra (the Danish headset manufacturer) are having a product recall. It seems Jabra's lithium batteries can overheat and catch fire. According to the company's announcement:
Dear Jabra GN9120 Customer
In cooperation with the Danish Safety Technology Authority (Sikkerhedsstyrelsen) and the U.S. Consumer Product Safety Commission, and other regulatory agencies GN Netcom is voluntarily recalling Lithium-ion batteries from ATL (ATL P/N 603028) used in GN9120 wireless headsets and sold from January 2005 through September 2008. These lithium-ion polymer batteries can overheat due to an internal short circuit in the batteries, which can pose a fire hazard. The battery has only been used in the GN9120 wireless headset. If you are using any other headset solution from GN Netcom you are not affected by this statement.
Not to worry, though. The "extra-crispy" look is in.

Why are CSS editors so fugly?



The other day, I happened upon a long list of CSS editors, arranged chronologically (newest tools first). I haven't tried any of them except Stylizer, which (beware) lays down the .NET 2.0 framework as part of its install process. (Allow 10 minutes.) Stylizer has a really beautiful UI but is far from being the point-and-click WYSIWYG stylesheet designer I've been looking for. (It's really just an editor; you do a lot of typing.) Although I must say, Stylizer beats the living crap out of most other free (or crippled-down eval-version) CSS editors I've seen, which all tend to look like this unfortunate travesty.

Does anybody else see the irony in the fact that most CSS editors are unbelievably fugly? I mean, if anything cries out for a decent visual design tool with eye-pleasing widgets, it would have to be a CSS editor. But most CSS tools (at the freeware level, anyway) look like they were designed by Eclipse programmers on bad acid.

I guess this is the ultimate example of programmers not knowing how to design user interfaces, and design experts not knowing how to program. Maybe it's no wonder 99% of CSS editors look like Notepad in a 3-piece suit.

Wednesday, November 19, 2008

Using Yahoo Pipes in anger

I finally had an opportunity to use Yahoo Pipes to do something useful.

The quest: Create a super-feed that aggregates a bunch of different Google developer blogs (12 in all), including AJAX Search API, Gears, Gadgets, OpenSocial, Open Source, Mashup Editor, Web Toolkit, App Engine, Google Code, iGoogle, Desktop, and Data API blogs. And: Show the most recent 8 entries for each of the 12 blogs.

Also: Make a searchable version of same, so that you can do a search for (let's say) "Atom" across all 96 latest blog entries in the 12 categories.

I was inspired to create this Pipes-app (plumbingware?) when I saw the recent press release concerning ArnoldIT's Google monitoring service. The ArnoldIT aggregator is dubbed "Overflight" (for reasons known only to the CIA, perhaps).

I was disappointed to find that Overflight is not available as an RSS feed. It also is not searchable. Hence, I went ahead and mashed together my own version of Overflight using Pipes.

As it turns out, I was able to create the Pipe app in a matter of 90 minutes or so (around half an hour longer than I'd budgeted). I didn't have time to aggregate all 74 Google blogs, so I focused just on twelve developer blogs. The resulting app is at Google Developer Blogs Super-Feed, which you can subscribe to here. The keyword-search version is here. (It supports single words or exact phrases.)

I confess I was skeptical, at first, as to whether the performance of a Pipes app that draws together 96 content items from 12 feeds could possibly be acceptable. It turns out to be amazingly fast. Even the queryable version is fast. I have yet to run a keyword or key-phrase search that takes more than 4 seconds to bring back results.

If you haven't tried Pipes yet, you should definitely spend a few minutes exploring it. It's a bit klutzy and constraining (in my experience), and it's sure to frustrate many a grizzled Java or C++ developer. But as a visual Web-app design tool, it's an interesting approach. Here's hoping Yahoo takes it a bit further.

Tuesday, November 18, 2008

Pixel Bender plug-in for Photoshop is out now

According to John Nack, the Pixel Bender Gallery plug-in for Photoshop CS4 is now available for download from Adobe Labs. Nack explains that the plug-in "runs filters really, really fast on your graphics card," and notes that the filters people write for Flash will also work in Photoshop (or so he says). A nice added bonus is that the same filters will work in After Effects CS4.

Can't wait to try it.

Moore's Law v2.0

It's no secret that conventional chip designs are about to hit the wall with respect to scaling. Moore's Law 1.0 is in danger of being repealed.

Not to worry, though. Years of research into so-called 3D chip architecture is finally beginning to bear fruit, and it looks like cubes will start replacing chips in at least some devices soon. (HP is making steady progress in this area, along with IBM and others.) Moore v2.0 is well on the way to reality.

If you want to learn more about this technology, check out the latest issue of the IBM Journal of Research and Development, which is devoted to 3D integrated circuit technology. A particularly good overview article is here.

Monday, November 17, 2008

Java HotSpot VM options explained

Have you ever wanted an exhaustive list of all those inscrutable command-line options you can use when you want to force the JVM to do something a certain way while you're either troubleshooting an extremely bizarre bug or trying to figure out why performance sucks even more than usual?

Try going to:

http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp


I don't know if this is an exhaustive list, but it certainly looks like it is. Just reading the various descriptions is quite educational. If you're interested in tuning the JVM for max performance, this is a must-read.
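For instance, here's the sort of invocation I mean (these are all standard HotSpot options; MyApp is obviously a placeholder):

java -Xms512m -Xmx1024m -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError MyApp

That one pins the initial and maximum heap sizes, logs the gory details of every collection, and dumps the heap if the app dies of memory starvation. Handy when you're trying to figure out where all your RAM went.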

Saturday, November 15, 2008

The fugliest code I've ever written

The other day, I started to wonder: What's the single fugliest piece of code I've ever written?

It's really hard to answer that, because I've been writing code of one flavor or another for roughly twenty years, and in that time I've committed every code atrocity known to man or beast. As I like to tell people, I'm an expert on spotting bad code, because I've written so much of it myself.

What I've finally decided is that the following line of C code is probably the single most ghastly line of code I've ever perpetrated:
(*((*(srcPixMap))->pmTable))->ctSeed =
    (*((*((*aGDevice)->gdPMap))->pmTable))->ctSeed;
Explanation: Long ago, I used to do graphics programming on the Mac. I don't know how the Mac does things today, but ten years ago the Color Manager rebuilt a table whenever Color QuickDraw, the Color Picker Manager, or the Palette Manager requested colors from a graphics device whose color lookup table had changed. To determine whether the CLUT had in fact changed, the Color Manager compared the ctSeed field of the current GDevice color table against the ctSeed field of that graphics device's inverse table. If the ctSeed values didn't match, the Color Manager invalidated the inverse table and rebuilt it. For fast redraws, you want to avoid that. You could avoid it by forcing the ctSeed field values to be equal.

This is one of many fast-blit tips I explained in an article I wrote years ago for MacTech magazine. Fortunately, I write mostly Java and JavaScript now, and I no longer have to deal with pointer indirection, and today my aspirin drawer is only half-full -- or is it half-empty?

Friday, November 14, 2008

Google Chatterbot

Google's Douwe Osinga has come up with a freaky little online app that turns the almighty Google hash engine into an oracle (not to be confused with Oracle). All you do is enter a word or two into the text box, and wait. The app will do a Google search on your words, find the next "suggested" word and print that, then it will remove the first word of your search string, add the found word, and repeat.
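The algorithm is a simple sliding window. Here's a rough Java sketch of the loop; fetchSuggestedNextWord() is my invention, standing in for whatever Google-querying Osinga's app actually does behind the scenes:

// Sketch of the chatterbot's sliding-window loop.
import java.util.LinkedList;

public class Chatterbot {
    public static void babble(String seed, int maxWords) {
        LinkedList<String> window = new LinkedList<String>();
        for (String w : seed.split("\\s+")) window.add(w);
        for (int i = 0; i < maxWords; i++) {
            String next = fetchSuggestedNextWord(join(window));
            if (next == null) break;   // some phrases bring back nothing
            System.out.print(next + " ");
            window.removeFirst();      // drop the oldest word...
            window.addLast(next);      // ...and slide the window forward
        }
    }

    private static String join(LinkedList<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) sb.append(w).append(' ');
        return sb.toString().trim();
    }

    // Placeholder: the real app searches Google for the phrase and
    // returns the word that most often follows it in the results.
    private static String fetchSuggestedNextWord(String phrase) {
        return null;
    }
}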

Quite often, the app generates a disarmingly logical response. For example, this morning when I entered "JavaFX will," the reply came back: "javafx will be open sourced monday."

Occasionally you learn something you didn't know. "Richard Stallman" brings back a response of "richard stallman founder of the free african american press."

Interestingly, "top secret" brings back nothing.

Sometimes the app produces garbage. Or is it poetry? When I entered "Sun will," I expected something like "lay off thousands." Instead I got: "sun will shine lyrics by stone roses that are not newly completed and that are in earth orbit but more likely at the top of the mark is to show that the flds not only persevered they fought back they didnt".

e e cummings lives!

Thursday, November 13, 2008

Free downloadable tech books

http://www.freetechbooks.com/

As you'd expect, the list includes a lot of stale and/or not-very-valuable titles, but there's also a lot of genuinely worthwhile stuff there. Judging from their "most popular" list, the site is a big hit with C++ programmers. But there are also 23 free Java books, and lots of timeless reference material for programmers of all stripes.

Wednesday, November 12, 2008

For lack of a nail (Java version)

// For the lack of a nail,
throw new HorseshoeNailNotFoundException("no nails!");

// For the lack of a horseshoe,
EquestrianDoctor.getLocalInstance().getHorseDispatcher()
    .dispatch();

// For the lack of a horse,
RidersGuild.getRiderNotificationSubscriberList().getBroadcaster()
    .run(new BroadcastMessage(StableFactory.getNullHorseInstance()));

// For the lack of a rider,
MessageDeliverySubsystem.getLogger().logDeliveryFailure(
    MessageFactory.getAbstractMessageInstance(
        new MessageMedium(MessageType.VERBAL),
        new MessageTransport(MessageTransportType.MOUNTED_RIDER),
        new MessageSessionDestination(BattleManager.getRoutingInfo(
            BattleLocation.NEAREST))),
    MessageFailureReasonCode.UNKNOWN_RIDER_FAILURE);

// For the lack of a message,
((BattleNotificationSender)
    BattleResourceMediator.getMediatorInstance().getResource(
        BattleParticipant.PROXY_PARTICIPANT,
        BattleResource.BATTLE_NOTIFICATION_SENDER)).sendNotification(
    ((BattleNotificationBuilder)
        (BattleResourceMediator.getMediatorInstance().getResource(
            BattleOrganizer.getBattleParticipant(Battle.Participant.GOOD_GUYS),
            BattleResource.BATTLE_NOTIFICATION_BUILDER))).buildNotification(
        BattleOrganizer.getBattleState(BattleResult.BATTLE_LOST),
        BattleManager.getChainOfCommand().getCommandChainNotifier()));

// For the lack of a battle,
try {
    synchronized (BattleInformationRouterLock.getLockInstance()) {
        BattleInformationRouterLock.getLockInstance().wait();
    }
} catch (InterruptedException ix) {
    if (BattleSessionManager.getBattleStatus(
            BattleResource.getLocalizedBattleResource(Locale.getDefault()),
            BattleContext.createContext(
                Kingdom.getMasterBattleCoordinatorInstance(
                    new TweedleBeetlePuddlePaddleBattle()).populate(
                        RegionManager.getArmpitProvince(Armpit.LEFTMOST)))) ==
            BattleStatus.LOST) {
        if (LOGGER.isLoggable(Level.TOTALLY_SCREWED)) {
            LOGGER.logScrewage(BattleLogger.createBattleLogMessage(
                BattleStatusFormatter.format(BattleStatus.LOST_WAR,
                    Locale.getDefault())));
        }
    }
}

// For the lack of a war,
return new Kingdom();


Adapted from Steve Yegge's Blog Rant of March 30, 2006. Apologies to Ben Franklin (who in turn adapted the original proverb from George Herbert's Jacula Prudentum).

Tuesday, November 11, 2008

Finalization is evil

After listening to the excellent presentation by Hans Boehm on "Finalization, Threads, and the Java Technology Based Memory Model," I have come to the conclusion that finalization is one of Java's worst features, if not the worst.

Be clear, I am not talking about the final keyword (which is actually a great feature of the language). Rather, I am talking about the notion of finalizers, or special "cleanup" methods that the JVM will call before an object is finally reclaimed by the garbage collector. The idea is that if you have an object that's holding onto some system resource (such as a file descriptor), you can free that resource in the finalize() method right before your no-longer-used object gets garbage collected.

The only problem is, not only is there no guarantee as to how quickly, or in what order, your finalizers will be called; there's also no guarantee that they will be called at all.

Sun's Tony Printezis gives a good explanation of finalization in an article on the Sun Developer Network site. It's a brilliant article, but I found myself quite nauseated by the time I got to the end of it. Finalization is just so wrong. So wrong.

"The JVM does not guarantee the order in which it will call the finalizers of the objects in the finalization queue," Printezis points out. "And finalizers from all classes -- application, libraries, and so on -- are treated equally. So an object that is holding on to a lot of memory or a scarce native resource can get stuck in the finalization queue behind objects whose finalizers are making slow progress."

Oh great, that's just what I need. Finalizers blocking on other finalizers while my heap fragments.

It turns out that at instantiation time, an object that contains a finalizer is marked as such and treated differently by the JVM. The extra bookkeeping incurs a performance hit. If your application creates many short-lived objects with finalizers, the hit can be quite substantial. Hans Boehm (see link further above) did some testing and found a 7X slowdown of a test app when objects had finalizers, compared to no finalizers. (With a really fast JVM, namely JRockit, the slowdown was eleven-fold.)

The funny thing is, in all the articles and book chapters I've read about finalization, I have never, not even once, seen a good real-world example of a situation requiring the use of a finalizer. Supposedly, you use a finalizer when you're holding onto a system resource and need to free it before your object goes out of scope. But in reality, it's almost always the case that system resources that are considered scarce or precious have a dispose() or close() or other, similar method, for the explicit purpose of freeing the resource. If you use the resource's normal release mechanism, you don't need a finalizer. In fact a finalizer only lets you hold onto a resource longer than you should.
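To make that concrete, here's the deterministic idiom, using java.io.FileInputStream (whose close() method is exactly the kind of explicit release mechanism I'm talking about). Plain old try/finally; no finalizer anywhere in sight:

import java.io.FileInputStream;
import java.io.IOException;

public class Deterministic {
    static int firstByte(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            return in.read();  // use the scarce resource (a file descriptor)...
        } finally {
            in.close();        // ...and release it deterministically, right now,
                               // not whenever the GC gets around to it
        }
    }
}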

Someone will argue that you don't always know when or if an object is going out of scope; therefore you should put a call to the release method in a finalizer and be assured that the resource will eventually be released. Okay, <sigh/> that's fine and dandy as long as you can count on your finalize() method being called (which you can't) and as long as your machine doesn't starve for file descriptors, sockets, or whatever the precious resource happens to be, before the finalizer is finally called. Remember, the JVM makes no guarantees about any of this. Finalization is non-deterministic.

I have to say, though, that the contorted, non-real-world examples that are always trotted out to justify the existence of the finalizer mechanism in Java have always struck me as more than a little malodorous. They all have that unmistakable antipattern smell that gets in your clothing and makes you feel like taking a hot shower when you get home.

Maybe we should just confront the possibility (the likelihood) that finalization is evil. After all, even the people who write long articles about it end up urging you not to use it.

That's good enough for me.

Saturday, November 08, 2008

Google to downsize?


The NYC Google office takes up a city block.

Word comes by way of the Silicon Valley Insider that Google will soon be subletting 50,000 square feet of space at its New York City Googleplex. I've toured the place, and let me tell you, it's Big: the 111 8th Ave office occupies an entire city block of space, between 8th and 9th Avenues and 15th and 16th Streets. It's around 300K square feet altogether.

What kind of luck Google will have subletting this space in the current economy, I don't know. It's very inconveniently located, in the meat-packing district, a couple miles south of Grand Central Terminal and just far enough from the Village to be annoying.

From what I saw on my walk-through, I can tell you that Google tends to be rather wasteful of space, by industry standards. The cafeteria is the size of Macy's and there's one conference room for every three employees (well, almost), and very few programmers can actually reach out and touch someone despite the lack of walls. I'd say there has to be an average of at least 500 sq. ft. per employee by the time you factor in all the conference rooms, hallways, etc.

So there are only two possibilities. Either Google will try to use its space more efficiently (and not lay anyone off) at its NYC office after subletting one-sixth of its available space, or it will lay off a sixth of its Manhattan workforce (around 120 people). Or some combination of both.

My guess is both.

Friday, November 07, 2008

Slow page loads == job cuts?

Interesting factoid: Every 100ms of latency costs Amazon 1% in profit-per-visit.

The same source claims that Google stickiness drops 20% if page load time increases by 500ms.

Which leads me to wonder how much revenue-loss LinkedIn has suffered over the past five years because of its agonizingly slow page loads, and how many of the 10% of its employees who were just laid off might still have their jobs if the pitiful dunderheads who allowed LinkedIn's site to be so pitifully slow hadn't been such pitiful dunderheads.

Thursday, November 06, 2008

Hardware-assisted garbage collection

I find myself spending more and more time thinking about garbage collection, not just as a career move but as a fundamental problem in computer science. I don't pretend to have any expertise in garbage collection, mind you. But I find it an interesting problem space. Particularly when you start to talk about things like hardware-assisted GC.

Yes, there is such a thing as GC-aware chip architecture, and the guys who know a lot about this are the folks at Azul Systems. A good starting point, if you want to read up on this, is Pauseless Garbage Collection: Improving Application Scalability and Predictability. Good late-night reading for propeller-heads who need something to do while waiting for the propeller to wind down.

Wednesday, November 05, 2008

Paging memory leaks to disk

At last month's OOPSLA 2008, there was an interesting presentation by Michael D. Bond on a technology called Melt, which aims to prevent out-of-memory errors in Java programs that harbor memory leaks (which is to say, 99 percent of large Java programs). The Intel-funded research paper, Tolerating Memory Leaks (by Bond and his thesis advisor, Kathryn S. McKinley, U. Texas at Austin), is well worth reading.

The key intuition is that reachability is an over-approximation of liveness, and thus if you can identify objects that are (by dint of infrequent use) putative orphans, you can move those orphan objects to disk and stop trying to garbage-collect them, thereby freeing up heap space and relieving the collector of unnecessary work. If the running program later tries to access the orphaned object, you bring it back to life. All of this is done at a very low level so that neither the garbage collector nor the running program knows that anything special is going on.
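If the distinction between reachability and liveness seems abstract, here's a contrived Java illustration. Every block in the list below is reachable, so the collector must keep all of them; but after process() returns, the program never touches a block again, so the objects are dead in every sense that matters:

import java.util.ArrayList;
import java.util.List;

public class ReachableButDead {
    private static final List<byte[]> cache = new ArrayList<byte[]>();

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            byte[] block = new byte[1024 * 1024];  // one megabyte
            process(block);
            cache.add(block);  // reachable forever, used never:
                               // a leak the GC cannot help you with
        }
    }

    private static void process(byte[] block) { block[0] = 1; }
}

Run it with a small -Xmx and watch it die. Melt's bet is that objects like these give themselves away by their staleness and can be quietly paged out to disk.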

Melt's staleness-tracking logic and read-blockers don't actually become activated until the running application is approaching memory exhaustion, defined (arbitrarily) as 80-percent heap fullness. Rather than letting the program get really close to memory exhaustion (which causes garbage collection to become so frequent that the program seems to grind to a halt), stale objects are moved to disk so that the running app doesn't slow down.

Purists will complain that sweeping memory leaks under the carpet like this is no substitute for actually fixing the leaks. In very large programs, however, it can be impractical to find and fix all memory leaks. (I question whether it's even provably possible to do so.) And even if you could find and fix all potential leaks in your program, what about the JRE? (Does it never leak?) What about external libraries? Are you going to go on a quest to fix other people's leaks? How will you know when you've found them all?

I believe in fixing memory leaks. But I'm also a pragmatist, and I think if your app is mission-critical, it can't hurt to have a safety net under it; and Melt is that safety net.

Good work, Michael.

Tuesday, November 04, 2008

Garbage-collection bug causes car crash



A few days ago I speculated that you could lose an expensive piece of hardware (such as a $300 million spacecraft) if a non-deterministic garbage-collection event were to happen at the wrong time.

It turns out there has indeed been a GC-related calamity: one in which $2 million was on the line. (To be fair, this particular calamity wasn't actually caused by garbage collection; it was caused by programmer insanity. But it makes for an interesting story nevertheless. Read on.)

The event in question involved a driverless vehicle (shown above) powered by 10K lines of C# code.

At codeproject.com, you'll find the in-depth post-mortem discussion of how a GC-related bug caused a driverless DARPA Grand Challenge vehicle to crash in the middle of a contest, eliminating the Princeton team from competition and dashing their hopes of winning a $2 million cash prize.

The vehicle had been behaving erratically on trial runs. A member of the team recalls: "Sitting in a McDonald's the night before the competition, we still didn't know why the computer kept dying a slow death. Because we didn't know why this problem kept appearing at 40 minutes, we decided to set a timer. After 40 minutes, we would stop the car and reboot the computer to restore the performance."

The team member described the computer-vision logic: "As the car moves, we call an update function on each of the obstacles that we know about, to update their position in relation to the car. Obviously, once we pass an obstacle, we don't need keep it in memory, so everything 10 feet behind the car got deleted."

"On race day, we set the timer and off she went for a brilliant 9.8 mile drive. Unfortunately, our system was seeing and cataloging every bit of tumbleweed and scrub that it could find along the side of the road. Seeing far more obstacles than we'd ever seen in our controlled tests, the list blew up faster than expected and the computers died only 28 minutes in, ending our run."

The vehicle ran off the road and crashed.

The problem? Heap exhaustion. Objects that should have been garbage-collected weren't. Even though delete was being called on all "rear-view mirror" objects, those objects were still registered as subscribers to a particular kind of event. Hence they were never released, and the garbage collector passed them by.

In Java, you could try the tactic of making rear-view-mirror objects weakly reachable, but eventually you're bound to drive the car onto a shiny, pebble-covered beach or some other kind of terrain that causes new objects to be created faster than they can possibly be garbage-collected, and then you're back to the same problem as before. (There are lots of ways out of this dilemma. Obviously, the students were trying a naive approach for simplicity's sake. Even so, had they not made the mistake of keeping objects bound to event listeners, their naive approach no doubt would have been good enough.)
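For the record, here's the shape of the bug (and the fix) in Java terms. All the names below are invented for illustration; the point is the unsubscribe call:

import java.util.ArrayList;
import java.util.List;

interface SensorListener { void onSensorUpdate(); }

class SensorEventSource {
    private final List<SensorListener> listeners = new ArrayList<SensorListener>();
    void subscribe(SensorListener l)   { listeners.add(l); }
    void unsubscribe(SensorListener l) { listeners.remove(l); }
}

class Obstacle implements SensorListener {
    public void onSensorUpdate() { /* update position relative to the car */ }
}

class ObstacleTracker {
    private final SensorEventSource sensor = new SensorEventSource();
    private final List<Obstacle> obstacles = new ArrayList<Obstacle>();

    void add(Obstacle o) {
        obstacles.add(o);
        sensor.subscribe(o);
    }

    // The Princeton bug, in Java terms: removing the obstacle from the
    // list is not enough. The event source still holds a strong reference,
    // so the "deleted" object can never be collected.
    void removeBehindCar(Obstacle o) {
        obstacles.remove(o);
        sensor.unsubscribe(o);  // the line that was effectively missing
    }
}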

As I said, this wasn't really a GC-caused accident. It was caused by programmer error. Nevertheless, it's the kind of thing that makes you stop and think.

Monday, November 03, 2008

Why 64-bit Java is slow

In an interesting post at the WebSphere Community Blog, Andrew Spyker explains why it is that when you switch from 32-bit Java to a 64-bit runtime environment, you typically see speed go down 15 percent and memory consumption go up by around 50 percent. The latter is explained by the fact that addresses are simply bigger in 64-bit-land, and complex data structures use a lot of 64-bit values even if they only need 32-bit values. The reason performance drops is that although addresses have gotten wider, processor memory caches have not gotten bigger in terms of overall kilobytes available. Thus, you are bound to see things drop out of the L1 and L2 caches more often. Hence cache misses go up and speed goes down.

Why, then, would anyone invest in 64-bit machines if the 64-bit JVM is going to give you an immediate performance hit? The answer is simple. The main reason you go with 64-bit architecture is to address a larger memory space (and flow more bytes through the data bus). In other words, if you're running heap-intensive apps, you have a lot to gain by going 64-bit. If you have an app that needs more than around 1.5 GB of RAM, you have no choice.

Why 1.5GB? It might actually be less than that. On a 32-bit Windows machine, the 4GB virtual address space is split down the middle: the OS reserves 2GB for itself and gives each application 2GB. The JVM, of course, needs its own RAM. And then there's the heap space within the JVM; that's what your app uses. It turns out that the JVM heap has to be contiguous (for reasons related to garbage collection). The largest piece of contiguous heap you can get, after the JVM loads (and taking into account all the garbage that has to run in the background in order to make Windows work), is between 1.2GB and 1.8GB (roughly), depending on the circumstances.

To get more heap than that means either moving to a 64-bit JVM or using Terracotta. The latter (if you haven't heard of it) is a shared-memory JVM clustering technology that essentially gives you unlimited heap space. Or should I say, heap space is limited only by the amount of disk space. Terracotta pages out to disk as necessary. A good explanation of how that works is given here.

But getting back to the 64-bit-memory consumption issue: This issue (of RAM requirements for ordinary Java apps increasing dramatically when you run them on a 64-bit machine) is a huge problem, potentially, for hosting services that run many instances of Java apps for SaaS customers, because it means your scale-out costs rise much faster than they should. But it turns out there are things you can do. IBM, in its JVM, uses a clever pointer-compression scheme to (in essence) make good use of unused high-order bits in a 64-bit machine. The result? Performance is within 5 percent of 32-bit and RAM growth is only 3 percent. Graphs here.

Oracle has a similar trick for BEA's JRockit JVM, and Sun is just now testing a new feature called Compressed oops (ordinary object pointers). The latter is supposedly included in a special JDK 6 "performance release" (survey required). You have to use special command-line options to get the new features to work, however.
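For example, with Sun's performance release you'd launch with something like this (-XX:+UseCompressedOops is the real flag; MyBigHeapApp is a placeholder):

java -XX:+UseCompressedOops -Xmx4g MyBigHeapApp

With compressed oops enabled, the JVM stores 32-bit offsets instead of full 64-bit pointers for ordinary object references, which is where the cache-friendliness (and most of the RAM savings) comes from.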

Anyway, now you know why 64-bit Java can be slow and piggish. Everything's fatter in 64-bit-land.

For information about large-memory support in Windows, see this article at support.microsoft.com. Also consult this post at sinewalker.

Sunday, November 02, 2008

Java 1.4.2 joins the undead

Java 1.4.2 died last week. According to Sun's "End of Service Life" page, Java 1.4.2 went EOSL last Thursday. The only trouble is, it's still moving.

Java 5 (SE) was released in 2004 and Java 6 has been out since 2006. Java 5 will, in fact, also be at EOSL in less than a year. (You might call it the Java "Dead Man Walking" Edition.) And yet, if you do a Google search on any of the following, guess what you get?

java.lang.Object
java.lang.Class
java.lang.Exception
java.lang.Throwable
java.lang.Runtime
java.awt.Image
java.io.File
java.net.URL
JComponent
JFrame

If you do a Google search on any one of these, the very first hit (in every case) is a link to Sun's Javadoc for the 1.4.2 version of the object in question.

A year from now (when Java 5 hits the dirt) I wonder how many of these 10 searches will still take you to 1.4.2 Javadoc? (Remember, Java 5 has been out for more than four years and still doesn't outrank 1.4.2 in Google searches.) I'm guessing half of them. What do you think?