In an interesting post at the WebSphere Community Blog, Andrew Spyker explains why it is that when you switch from 32-bit Java to a 64-bit runtime environment, you typically see speed go down 15 percent and memory consumption go up by around 50 percent. The latter is explained by the fact that addresses are simply bigger in 64-bit-land, and complex data structures use a lot of 64-bit values even if they only need 32-bit values. The reason performance drops is because although address width has gotten bigger, processor memory caches have not got bigger in terms of overall Kbytes available. Thus, you are bound to see things drop out of L1 and L2 cache more often. Hence cache misses go up and speed goes down.
Why, then, would anyone invest in 64-bit machines if the 64-bit JVM is going to give you an immediate performance hit? The answer is simple. The main reason you go with 64-bit architecture is to address a larger memory space (and flow more bytes through the data bus). In other word, if you're running heap-intensive apps, you have a lot to gain by going 64-bit. If you have an app that needs more than around 1.5 GB of RAM, you have no choice.
Why 1.5GB? It might actually be less than that. On a 4GB Win machine, the OS hogs 2GB of RAM and will only let applications have 2GB. The JVM, of course, needs its own RAM. And then there's the heap space within the JVM; that's what your app uses. It turns out that the JVM heap has to be contiguous (for reasons related to garbage collection). The largest piece of contiguous heap you can get, after the JVM loads (and taking into account all the garbage that has to run in the background in order to make Windows work), is between 1.2GB and 1.8 GB (roughly) depending on the circumstances.
To get more heap than that means either moving to a 64-bit JVM or using Terracotta. The latter (if you haven't heard of it) is a shared-memory JVM clustering technology that essentially gives you unlimited heap space. Or should I say, heap space is limited only by the amount of disk space. Terracotta pages out to disk as necessary. A good explanation of how that works is given here.
But getting back to the 64-bit-memory consumption issue: This issue (of RAM requirements for ordinary Java apps increasing dramatically when you run them on a 64-bit machine) is a huge problem, potentially, for hosting services that run many instances of Java apps for SaaS customers, because it means your scale-out costs rise much faster than they should. But it turns out there are things you can do. IBM, in its JVM, uses a clever pointer-compression scheme to (in essence) make good use of unused high-order bits in a 64-bit machine. The result? Performance is within 5 percent of 32-bit and RAM growth is only 3 percent. Graphs here.
Oracle has a similar trick for BEA's JRockit JVM, and Sun is just now testing a new feature called Compressed oops (ordinary object pointers). The latter is supposedly included in a special JDK 6 "performance release" (survey required). You have to use special command-line options to get the new features to work, however.
Anyway, now you know why 64-bit Java can be slow and piggish. Everything's fatter in 64-bit-land.
For information about large-memory support in Windows, see this article at support.microsoft.com. Also consult this post at sinewalker.