Thursday, February 05, 2009

OpenCalais-OpenOffice mashup

OpenCalais is one of the most innovative and potentially disruptive online services to hit the Web in recent memory. To understand its importance, you have to be a bit of a geek, preferably a text-analytics or computational-linguistics geek, maybe an information-access or "search" geek, or a reasonably technical content-technology freak who understands the potential uses of metadata. It's not easy to sum up OpenCalais in a few words. Suffice it to say, though, if you haven't heard of OpenCalais before, you should visit http://www.opencalais.com. It's an interesting undertaking, to be sure.

One of the services OpenCalais exposes is automatic extraction of entity metadata from text. If you call the OpenCalais service using the proper arguments, you can essentially pass it any kind of text content you want (an article, a blog post, a Wikipedia entry, an Obama speech, whatever) and the service will hand you back an itemized list of the entities it detected in the text. "Entities" means things like names of persons, cities, states or provinces, countries, prices, e-mail addresses, industry terms -- almost anything that would qualify as a "proper noun" or a term with special significance (not just keywords).

The OpenCalais service brings back more than a list of terms. It also reports the number of occurrences of the terms and a relevancy score for each term. The latter is a measure of the relative semantic importance of the term in question to the text in question. This score can help in determining cut-offs for automatic tagging, ordering of metadata in tag clouds, and other purposes. It's a way to get at the "aboutness" of a document.
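To make the cut-off idea concrete, here's a sketch in plain JavaScript. The entity objects are hypothetical (hand-built to mirror the sample output shown later in this post); the 0.3 threshold is arbitrary and would need tuning per corpus:

```javascript
// Entities as they might look after parsing a "text/simple" response
// into plain objects. (Hand-built here; field names mirror the sample
// output later in this post.)
var entities = [
  { type: "Person",   name: "Bernard Madoff", count: 38, relevance: 0.771 },
  { type: "Currency", name: "USD",            count: 11, relevance: 0.686 },
  { type: "Person",   name: "Sharon Cohen",   count: 1,  relevance: 0.031 }
];

// Keep only terms above a relevance cut-off, most relevant first --
// the kind of filtering you'd do before auto-tagging a document or
// building a tag cloud.
function topTerms(entities, cutoff) {
  return entities
    .filter(function (e) { return e.relevance >= cutoff; })
    .sort(function (a, b) { return b.relevance - a.relevance; })
    .map(function (e) { return e.name; });
}

var tags = topTerms(entities, 0.3); // ["Bernard Madoff", "USD"]
```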

OpenCalais does many, many things beyond entity extraction. But you should already be able to imagine the many downstream disruptions that could occur, for example, in enterprise search if Text-Analytics-as-a-Service (or heaven forbid, Machine-Learning-as-a-Service) were to catch on bigtime.

The OpenCalais API is still growing and evolving (it's at an early stage), but it's already amazingly powerful, yet easy to use. Writing a semantic AJAX app is a piece of cake.

My first experiment with OpenCalais involved OpenOffice. I use OpenOffice intensively (as a direct replacement for the Microsoft Office line of shovelware), and although OpenOffice (like Office) has more than its fair share of annoyances, it also has some features that are just plain crazy-useful, such as support for macros written in any of four languages (Python, OpenOffice Basic, BeanShell, and JavaScript). The JavaScript binding is particularly useful, since it's implemented on top of Java (via the Rhino engine) and allows you to tap the full power of the JRE. But I'm getting ahead of myself.

What I decided to try to do was create an OpenOffice macro that would let me push a button and have instant entity-extraction. Here's the use case: I've just finished writing a long business document using OpenOffice, and now I want to develop entity metadata for the document so that it's easier to feed into my company's Lucene-based search system and shows up properly categorized on the company intranet. To make it happen, I want (as a user) to be able to highlight (select) any portion of the document's text, or all of it, then click a button and have OpenOffice make a silent AJAX call to the OpenCalais service. Two seconds later, the metadata I want appears, as if by magic, at the bottom of the last page of the document, as XML. (Ideally, after reviewing the XML, I would be able to click an "Accept" button and have the XML vanish into the guts of the .odt file.)

I wrote a 160-line script that does this. The source code is posted at http://sites.google.com/site/snippetry/Home/opencalais-macro-for-openoffice. Please note that the code won't work for you until you get your own OpenCalais license key and plug it into the script at line No. 145. For space reasons, I'm not going to explain how to install an OpenOffice macro (or create a toolbar button for it after it's installed). That's all standard OpenOffice stuff.

The key to understanding the OpenCalais macro is that all we're doing is performing an HTTP POST programmatically using Java called from JavaScript. Remember that the JavaScript engine in OpenOffice is actually the same Rhino-based engine that's part of the JRE. This means you can instantiate a Java object using syntax like:

var url = new java.net.URL( "http://www.whatever.url" );

Opening a connection and POSTing data to a remote site over the wire is straightforward, using standard Java conventions. The only tricky part is crafting the parameters expected by OpenCalais. It's all well-documented on the OpenCalais site, fortunately. Lines 105-120 of the source code show how to query OpenCalais for entity data. You have to send a slightly ungainly chunk of XML in your POST. No big deal.
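For flavor, here's a sketch (plain JavaScript) of assembling that POST body. The field names (licenseID, content, paramsXML) and the processing-directive attributes follow my reading of the OpenCalais docs, but treat them as illustrative and check the docs (and the macro source linked above) for the authoritative version:

```javascript
// Sketch: build the application/x-www-form-urlencoded body for an
// OpenCalais Enlighten call. Field names and XML directives are from
// my reading of the docs -- verify against the current documentation.
function buildCalaisPost(licenseKey, text) {
  var paramsXML =
    '<c:params xmlns:c="http://s.opencalais.com/1/pred/" ' +
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">' +
    '<c:processingDirectives c:contentType="text/txt" ' +
    'c:outputFormat="Text/Simple"/>' +
    '<c:userDirectives/>' +
    '</c:params>';

  return "licenseID=" + encodeURIComponent(licenseKey) +
         "&content=" + encodeURIComponent(text) +
         "&paramsXML=" + encodeURIComponent(paramsXML);
}
```

In the macro, the resulting string gets written to the output stream of the opened java.net.URLConnection, exactly as you would in straight Java.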

For testing purposes, I ran my script against text that I cut and pasted into OpenOffice from an Associated Press news story about Bernard Madoff's customer list (the investment advisor who showed his clients how to make a small fortune out of a large one). OpenCalais generated the following metadata in roughly three seconds:

<!-- Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service. -->

<!--City: NEW YORK, Danbury, Brookline, Oceanside, Pembroke Pines, West Linn, Company: Associated Press, CNN, Country: Switzerland, Kenya, Cayman Islands, Currency: USD, Event: Person Communication and Meetings, Facility: Wall Street, World Trade Center, Hall of Fame, IndustryTerm: Internet support group, MedicalCondition: brain injury, Movie: World Trade Center, NaturalFeature: San Francisco Bay, Long Island, Organization: U.S. Bankruptcy Court, Person: Bernard Madoff, Alan English, Patricia Brown, Bob Finkin, Bonnie Sidoff, Evelyn Rosen, Teri Ryan, Lynn Lazarus Serper, Sharon Cohen, Sandy Koufax, Jordan Robertson, Neill Robertson, Samantha Bomkamp, ADAM GELLER, John Malkovich, Bernie Madoff, Allen G. Breed, Larry King, Rita, Mike, Nancy Fineman, Larry Silverstein, ProvinceOrState: Florida, Oregon, New York, Massachusetts, Connecticut, Technology: ADAM, --><OpenCalaisSimple>

<Description>

<calaisRequestID>a1d28b3b-4ef7-4aa6-b293-8df46ea5e988</calaisRequestID>

<id>http://id.opencalais.com/KG8hyw2LGKjgRnJnRN86FQ</id>

<about>http://d.opencalais.com/dochash-1/6010f15f-bb32-3e59-9b55-c8fef29d38ed</about>

</Description>

<CalaisSimpleOutputFormat>

<Person count="38" relevance="0.771">Bernard Madoff</Person>

<Person count="20" relevance="0.606">Alan English</Person>

<Person count="16" relevance="0.370">Patricia Brown</Person>

<Currency count="11" relevance="0.686">USD</Currency>

<Person count="10" relevance="0.524">Bob Finkin</Person>

<Person count="9" relevance="0.586">Bonnie Sidoff</Person>

<Person count="8" relevance="0.574">Evelyn Rosen</Person>

<Person count="6" relevance="0.378">Teri Ryan</Person>

<Person count="5" relevance="0.373">Lynn Lazarus Serper</Person>

<ProvinceOrState count="4" relevance="0.578" normalized="Florida,United States">Florida</ProvinceOrState>

<City count="3" relevance="0.428" normalized="New York,New York,United States">NEW YORK</City>

<Facility count="2" relevance="0.165">Wall Street</Facility>

<NaturalFeature count="2" relevance="0.398">San Francisco Bay</NaturalFeature>

<ProvinceOrState count="2" relevance="0.305" normalized="Oregon,United States">Oregon</ProvinceOrState>

<Event count="2">Person Communication and Meetings</Event>

<City count="1" relevance="0.051" normalized="Danbury,Connecticut,United States">Danbury</City>

<City count="1" relevance="0.078" normalized="Brookline,Massachusetts,United States">Brookline</City>

<City count="1" relevance="0.058" normalized="Oceanside,New York,United States">Oceanside</City>

<City count="1" relevance="0.134" normalized="Pembroke Pines,Florida,United States">Pembroke Pines</City>

<City count="1" relevance="0.283" normalized="West Linn,Oregon,United States">West Linn</City>

<Company count="1" relevance="0.031" normalized="Associated Press">Associated Press</Company>

<Company count="1" relevance="0.286" normalized="Time Warner Inc.">CNN</Company>

<Country count="1" relevance="0.249" normalized="Switzerland">Switzerland</Country>

<Country count="1" relevance="0.249" normalized="Kenya">Kenya</Country>

<Country count="1" relevance="0.249" normalized="Cayman Islands">Cayman Islands</Country>

<Facility count="1" relevance="0.286">World Trade Center</Facility>

<Facility count="1" relevance="0.286">Hall of Fame</Facility>

<IndustryTerm count="1" relevance="0.104">Internet support group</IndustryTerm>

<MedicalCondition count="1" relevance="0.078">brain injury</MedicalCondition>

<Movie count="1" relevance="0.286">World Trade Center</Movie>

<NaturalFeature count="1" relevance="0.141">Long Island</NaturalFeature>

<Organization count="1" relevance="0.289">U.S. Bankruptcy Court</Organization>

<Person count="1" relevance="0.031">Sharon Cohen</Person>

<Person count="1" relevance="0.286">Sandy Koufax</Person>

<Person count="1" relevance="0.031">Jordan Robertson</Person>

<Person count="1" relevance="0.031">Neill Robertson</Person>

<Person count="1" relevance="0.031">Samantha Bomkamp</Person>

<Person count="1" relevance="0.297">ADAM GELLER</Person>

<Person count="1" relevance="0.286">John Malkovich</Person>

<Person count="1" relevance="0.279">Bernie Madoff</Person>

<Person count="1" relevance="0.031">Allen G. Breed</Person>

<Person count="1" relevance="0.286">Larry King</Person>

<Person count="1" relevance="0.141">Rita</Person>

<Person count="1" relevance="0.260">Mike</Person>

<Person count="1" relevance="0.083">Nancy Fineman</Person>

<Person count="1" relevance="0.286">Larry Silverstein</Person>

<ProvinceOrState count="1" relevance="0.058" normalized="New York,United States">New York</ProvinceOrState>

<ProvinceOrState count="1" relevance="0.078" normalized="Massachusetts,United States">Massachusetts</ProvinceOrState>

<ProvinceOrState count="1" relevance="0.051" normalized="Connecticut,United States">Connecticut</ProvinceOrState>

<Technology count="1" relevance="0.297">ADAM</Technology>

<Topics>

<Topic Score="0.403" Taxonomy="Calais">Business_Finance</Topic>

</Topics>

</CalaisSimpleOutputFormat></OpenCalaisSimple>

Think about it: With the push of a button, a person creating text in a word processor can generate semantically rich metadata in real time without leaving the word-processor environment, using no IT resources. And at no cost. (OpenCalais is free to all.) With very little work, the entire process could be made to happen 100% transparently. The author doesn't even have to know anything's happening over the wire. At check-in time, his or her document is already a good semantic citizen, by magic.

This is just one tiny example of what can be done with the technology. Maybe you can come up with others? If so, please keep me in the loop. I'm interested in knowing what you're doing with OpenCalais.

Retweet this.

OpenCalais: Metadata-as-a-Service


By now, you've probably heard of OpenCalais, the free (as in free) online metadata-extraction service. If you haven't heard of it (my God, man, where have you been?) you should drop what you're doing right now (yes, now; I'll wait) and go immediately to the OpenCalais web site and drink from the fire hydrant. This is a game-changer in the making. You owe it to yourself to be on this bus with both feet.

Basically, what we're dealing with here is Metadata-as-a-Service: OpenCalais is a text analytics web service (created by Thomson Reuters) with SOAP and REST APIs. It's "open" not in the sense of source code, but of open access. Anyone can use the service for any reason (commercial or personal). Calais point man Thomas Tague explains the motivations behind OpenCalais in a Rob McNealy podcast interview, remarkable for (among other things) Tague's surprising explanation of why a megalith like Reuters would offer such a powerful, valuable online service for free.

I've been experimenting with the API (more on that tomorrow) and I have to say, I'm impressed. You can query the OpenCalais service, sending it text in any of several formats, and receive metadata back in your choice of RDF, "text/simple", or Microformats (great if you want big lists of rel-tags). The "text/simple" format is great for entity extraction, and I'll give source code for how to do that tomorrow.

Response data comes back in your choice of XML or JSON. So yeah, it means what you think it means: Text analytics is now the province of AJAX. And it's free. Mash away.

Like I say, I've been experimenting with the APIs, and I've come up with a fairly impressive (if I may say so) little demo, written in JavaScript, that will erase any residue of doubt in anyone's mind as to how powerful the Metadata-as-a-Service metaphor is. Return here tomorrow for the demo, with source code.

Meanwhile, Twitter me at @kasthomas if you have questions. And please retweet this.

Wednesday, February 04, 2009

A Semantic Web Crash Course

I finally found a wonderfully terse, easy-to-follow "executive summary" (not exactly short, though) explaining How to publish Linked Data on the Web: In other words, how to make your site Semantic-Web-ready.

If you've struggled with trying to visualize how the various pieces of the Semantic Web fit together (all the RDF-based standards, for example), and you still feel as though you aren't quite grasping the big picture, go read How to publish Linked Data on the Web. It'll bring you up to speed fast. The authors deserve special mention (join me in a polite round of applause, if you will):
Chris Bizer (Web-based Systems Group, Freie Universität Berlin, Germany)
Richard Cyganiak (Web-based Systems Group, Freie Universität Berlin, Germany)
Tom Heath (Knowledge Media Institute, The Open University, Milton Keynes, UK)
I hope vendors in the content-management space will get to work producing tools aimed at helping people implement "highly semantic sites" (tm). Search 2.0 and beyond will rely heavily on linked data, and once the ball starts rolling, the advantages of a linked-data-driven Web (not least, the prospect of thousands of one-off Web APIs being conflated down to scores or hundreds) will become apparent quickly.

Tuesday, February 03, 2009

Why super( ) sucks

One complaint I heard someone make recently, in the context of JavaScript not having a true inheritance model, is that there is no super() in JavaScript. Somebody, in a forum somewhere, actually whined and moaned about not being able to call super(). I believe the whiner was a Java programmer.

There shouldn't be a super() in Java, either, though. That's the real issue.

I'm flabbergasted that anyone thinks super() is a meaningful thing to have to write, in any language. What could be more obscure and arcane than super()? It's totally cryptic. It's shorthand for "go invoke a method of my parent that I happen to have intimate knowledge of. Never mind the side effects, I'm clairvoyant enough to understand all that, even if my parent's concrete implementation changed without my knowing it."

I thought secret knowledge and hidden dependencies were supposed to be evil.
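If that sounds abstract, here's the classic fragile-base-class trap, transposed into JavaScript (a sketch with invented Bag "classes"; it's essentially Josh Bloch's InstrumentedHashSet example from Effective Java):

```javascript
// A parent whose addAll() happens to be implemented in terms of its
// own add(). That implementation detail is exactly the "intimate
// knowledge" a super() call bakes in.
function Bag() { this.size = 0; }
Bag.prototype.add = function (item) { this.size++; };
Bag.prototype.addAll = function (items) {
  for (var i = 0; i < items.length; i++) this.add(items[i]);
};

// A child that counts insertions, delegating addAll() to the parent --
// the JavaScript equivalent of calling super.addAll().
function CountingBag() { Bag.call(this); this.added = 0; }
CountingBag.prototype = Object.create(Bag.prototype);
CountingBag.prototype.add = function (item) {
  this.added++;
  Bag.prototype.add.call(this, item);
};
CountingBag.prototype.addAll = function (items) {
  this.added += items.length;               // count them up front...
  Bag.prototype.addAll.call(this, items);   // ...then let the parent add them
};

var bag = new CountingBag();
bag.addAll(["a", "b", "c"]);
// bag.added is now 6, not 3: the parent's addAll() dispatched back to
// the child's overridden add(), so every item got counted twice.
```

Nothing in CountingBag is wrong in isolation; it breaks (or un-breaks) depending on how the parent happens to be implemented today.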

Monday, February 02, 2009

Twitter traffic still soaring


(Click on the graph for a larger version. Or go to Quantcast for more.)

Thank goodness there's something in this economy that isn't slowing down.

Sunday, February 01, 2009

Inheritance as Antipattern

Allen Holub tells of once attending a Java user group meeting where James Gosling was the featured speaker. According to Holub, during the Q&A session, someone asked Gosling: "If you could do Java over again, what would you change?" Gosling replied: "I'd leave out classes."

Holub recalls: "After the laughter died down, he explained that the real problem wasn't classes per se, but rather implementation inheritance: the extends relationship."

I bring this story up because it seems a lot of people still think inheritance (supposedly the cornerstone of OOP) is good. Those same people want to impose the inheritance model on JavaScript. Which to me would be a terrible thing to do. I wouldn't go so far as to say inheritance is evil, even though many experts have indeed said exactly that. But it is certainly the most misused feature of Java. It ruins most otherwise-good APIs, I've found. (Google's Joshua Bloch has observed the same thing.) In the real world, inheritance tends to be an antipattern.

Inheritance violates encapsulation, undercutting the most basic of OOP principles.

Quite simply: Inheritance requires children to understand their parents (which I can tell you from personal experience is a dangerous assumption).

Subclassing leads to bloat (something Java needs more of...), because children inherit the methods of their entire ancestry chain. Which leads to things like JMenu having 433 methods.

It also locks new classes into preexisting concrete implementations, which introduces brittleness. A change in an ancestral method can break children unexpectedly. This is a well known drawback of inheritance.

Here is a verbatim quote from the Java API documentation for the Properties class:

Because Properties inherits from Hashtable, the put and putAll methods can be applied to a Properties object. Their use is strongly discouraged as they allow the caller to insert entries whose keys or values are not Strings. The setProperty method should be used instead. If the store or save method is called on a “compromised” Properties object that contains a non-String key or value, the call will fail.

This sort of thing has an odor about it. It reeks of poor design.
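For contrast, here's what the composition alternative looks like in JavaScript terms (a sketch, not a rendering of the actual java.util.Properties API):

```javascript
// Composition instead of inheritance: a Properties object *has* a
// table, it isn't one. Non-String entries can never sneak in, because
// put()/putAll() simply aren't part of the public surface.
function Properties() {
  var table = {};                       // private backing store
  this.setProperty = function (key, value) {
    if (typeof key !== "string" || typeof value !== "string")
      throw new TypeError("Properties holds Strings only");
    table[key] = value;
  };
  this.getProperty = function (key) { return table[key]; };
}
```

The class exposes only the operations that make sense for it, and no documentation caveat about "compromised" objects is needed.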

There's plenty more to be said on this subject, but it's been said elsewhere and I won't regurgitate needlessly. And again, I have to stress, I don't consider inheritance evil so much as misused. More on that some other time.

The thing that bothers me is that so many Java programmers who haven't taken the time to grok Brendan Eich's motivations for making JavaScript the way it is (drill into some of the links at this page to get a tiny taste of what I'm talking about) think JavaScript's compositionality-based prototype model is a flaw, or at the very least, an egregious oversight. Hardly. The language was designed that way for a reason.

Gosling, Eich, Bloch, Holub, all know what they're talking about. Inheritance is overrated.

Saturday, January 31, 2009

Script for bypassing Google's "site may harm your computer" page

There was an outbreak of the bogus "visiting this web site may harm your computer" warning-page redirection on Google this morning. Apparently there have been occurrences of this phenomenon before (judging from blogs going back to 2007). You run a search on Google, and all of a sudden every hit has a warning link under it that says "visiting this web site may harm your computer", and if you try to go to the page in question, you get directed to a Google warning page that urges you not to go to the actual page you want.

On Twitter, people began labelling the problem #GOOGLEMAYHARM, which of course is phonetically similar to GOOGLE MAYHEM.

Naturally, I went to work on a Greasemonkey script to fix the situation. And naturally, in the time it took me to write the script, Google fixed the silly redirection thing.

In any event, if you are seeing the "harmful site" warning, here's a Greasemonkey script that should allow you to bypass the Google redirection page:

// ==UserScript==
// @name GoogleHitFixer
// @namespace fixer
// @include http://www.google.com/*
// ==/UserScript==

// Routes around the bogus warning page that says
// "visiting this web site may harm your computer"

// Public domain. Author: Kas Thomas

( function main( ) {

var signature = "interstitial?url";

var address = location.toString( );

if ( address.indexOf( signature ) == -1 )
return;

var newUrl = address.split( "?url=" )[1].split( "&" )[0];

// The url= value is URL-encoded, so decode it before navigating.
location.href = decodeURIComponent( newUrl );

} )( );

Friday, January 30, 2009

"Crux" app wins JCR Cup

Day Software announced the winner of the JCR Cup 08 competition today. College sophomore Russell Toris won top prize (taking home a MacBook Pro) with a little web app called "Crux" (a shameless play on CRX, which is Day's commercial Java Content Repository).

I managed to learn a tiny bit more about Crux. And from what I've seen, it is indeed a clever use of JSR-170 technology.

What it lets you do is copy and paste arbitrary selections from any web page that's open in your browser, and save them straight to a JSR-170 repository (in this case, Day CRX, which is built atop Apache Jackrabbit). When you want to retrieve the selection(s) again, you can browse the repository and open them again in your browser.

Why is this useful? Here's the use case. Suppose you've got a dozen tabs open in Firefox (because you're researching a term paper) and you want to save references to the various content items you've been looking at. The conventional thing to do is bookmark all the open pages. But the problem with bookmarks is that they don't actually encapsulate any content from the pages you were on: They just encapsulate URLs and page titles (which are often meaningless).

With Crux, you highlight and Copy content selections from pages, then push those items into the repository with the click of a button. (Of course, you have to have a repository server running somewhere, reachable via HTTP.) When you want the clipped items again, you visit one URL (the node in the repository where the items are stored), and there are all your snippets, viewable in a single summary page. And they render nicely since Crux saves actual selection-source markup, not just raw text. Any embedded links, images, etc., in the clipped content are still there. Also, each entry in Crux contains a trackback link to the original source page, in case you really do need to go back to the page in question.

If you think about it, saving content clippings is actually a very compelling alternative to bookmarking. A bookmark is just an address. What you care about is the content, not the address. I have hundreds of bookmarks already. I can't keep them straight. They just keep piling up, and I can't remember what most of them are for. (Even the ones I use a lot, I sometimes have trouble finding again.) Crux provides a useful alternative.

How do you find something in the repository after you've pushed hundreds of content items into it with Crux? You use whatever repository search tools you'd normally use. Only this time, you can actually run full text searches on the content items you stored, rather than searching page names in your Bookmarks collection.

Functionality similar to Crux is available via Clipmarks. Also, Microsoft tries to do some of this with its Onfolio and OneNote products (which are, IMHO, painfully klutzy). Crux looks and feels very light and simple. It definitely hits a sweet spot.

Whether Crux's source code will ever see the light of day, I don't know. (Entrants in the JCR Cup competition were not required to make source code public.) Reportedly, the code is all JavaScript and requires Greasemonkey.

In any event, congratulations Russell Toris! And kudos to Day for sponsoring the competition. It's nice to see JCR being used for something practical, lightweight, and simple. Well done.

Google Measurement Labs?

Google has introduced yet another service, called Google Measurement Labs, designed to test your connection speed and provide various types of information about your last-mile chokepoints.

I have read Google's own announcement about this as well as several blogs that try to explain it, and honestly, I still can't fathom the true motivation(s) behind it or why the heck anyone outside of academia (or perhaps the NSA) would even care. Obviously, Google has an interest in last-mile problems (the Internet is its lifeblood), but offering this set of diagnostics to the general public gives the impression that Google is very proudly answering a question nobody asked.

I don't get it.

Wednesday, January 28, 2009

The energy cost of SSL

I just finished reading a paper called The Energy Cost of SSL in Deeply Embedded Systems, by Sun Microsystems researchers Vipul Gupta and Michael Wurm. Fascinating stuff.

It turns out that secure communication over SSL shortens battery life by approximately 15% in very small (mote-like) wireless devices that use SSL. The size of such devices (commonly used as sensors in manufacturing, but soon to be all around us, if you believe the sci-fi hype) makes them extraordinarily sensitive to anything that draws electrical current, including computation. In a mote, it's not uncommon for 5% of the available energy from a pair of alkaline batteries to be consumed by SSL handshakes, 10% by polling, 25% by SSL data transfer, and the remaining 60% by the device itself. Those ratios will be different for non-secure (non-SSL) data transfer. If you do the apples-to-apples energy balance, the SSL mote pays an energy penalty of 15%, overall, for security.

The authors of the paper don't discuss things like efficient versus inefficient implementations (in assembly language) of handshake algorithms (such as Elliptic Curve), but obviously a poor implementation could significantly affect performance. An unfriendly chip architecture could affect things too. The authors do mention that the particular chip they used (TI MSP430) "offers a rotate instruction which speeds up SHA1 and MD5 by almost 40%."

Motes aren't ubiquitous yet, but hopefully by the time they are, they'll be powered by something other than batteries (e.g., ambient light), so that we don't have to worry about SSL causing even more zinc and manganese to enter the environment when worn-out mote batteries find their way into landfills. Imagine that: SSL as an environmental threat . . .

Tuesday, January 27, 2009

Microsoft aims to patent CSS extensions

I came across an interesting patent application from Microsoft (published 15 January 2009) called Extended Cascading Style Sheets in which Microsoft extols the virtues of something called CSSX (which I suppose means CSS Extensions). From the Abstract:
A CSSX (Extended Cascading Style Sheets) file including non-CSS (Cascading Style Sheet) extensions is used to define and reference variables and inheritance sets. A CSSX file compiler determines a value of the defined variable, modifies the CSSX file by replacing all references to the defined variable with the value, and generates the CSS file from the modified CSSX file. The inheritance set is defined in the CSSX file and includes a reference to a previously defined CSS rule set. The CSSX file compiler defines a new CSS rule set as a function of the determined attributes included in the previously defined CSS rule set of the defined inheritance set and generates the CSS file including the newly defined CSS rule set.
From what I can tell, Microsoft is proposing adding #defines (and other preprocessor-style constructs) to Cascading Style Sheets so that a last-minute "compile pass" on the server will generate CSS of the correct flavor for a given page request (correct as to localization, reading direction, accessibility, etc.) -- all done dynamically, just in time. The intent is clearly to eliminate the need for webmasters and others to create and manage multiple hard-coded flavors of the same stylesheet. In fact, CSSX aims to make CSS more compositional all the way around. (The patent talks about introducing new inheritance notions into CSS, for example.)
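The patent describes the mechanism abstractly, so here's a toy compile pass in JavaScript to show the idea. The @define/$name syntax is my own invention for illustration; the application doesn't commit to concrete tokens:

```javascript
// Toy "CSSX compiler": collect @define lines, then substitute every
// $name reference, yielding plain CSS. (Hypothetical syntax -- the
// patent application doesn't specify tokens.)
function compileCssx(source) {
  var vars = {};
  // Collect and strip definitions of the form: @define name value;
  source = source.replace(/@define\s+(\S+)\s+([^;]+);\s*/g,
    function (_, name, value) { vars[name] = value.trim(); return ""; });
  // Replace each $name reference with its defined value.
  return source.replace(/\$(\w+)/g, function (match, name) {
    return vars.hasOwnProperty(name) ? vars[name] : match;
  });
}

var cssx = "@define brand #cc0000;\n" +
           "h1 { color: $brand; }\n" +
           "a:hover { color: $brand; }";
var css = compileCssx(cssx); // plain CSS, #cc0000 substituted twice
```

A server would run something like this per request, picking different @define values for different locales or accessibility settings.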

Of course, there are drawbacks to consider. CSSX is not as easy to read or maintain as CSS (but I suppose if your development tools are good enough, this won't matter so much). CSSX is more verbose than CSS. It's doubtless harder to QA-test. But the main drawback, I think, is that it tends to mix presentation logic with non-presentation logic. That's a dangerous place to go.

Unfortunately, Microsoft wants to patent CSSX when it should actually be working with a standards body on it. Does the world really need another proprietary "standard" from Redmond, at this point? What's the point in extending a standard, then trying to patent it?

That part seems really, really stupid to me.

Sunday, January 25, 2009

Most Google employee options are under water

I didn't listen to the recent Google conference call, but according to someone who did, 85% of Google employee stock options are now under water. That's got to put a damper on "company spirit" for everyday workers. I've been in this situation myself (i.e., working for a company where everybody has options, but the stock is hopelessly far below the strike price). It is a dreadful feeling, especially if the company you work for has a great history, a great culture, and brilliantly engineered products that should be doing much better in the market than they are.

Everyone knows, of course, that there is no guarantee a company's stock price will go up over time, and most employees are mature about this realization. But it still hurts. Under-water options hurt.

Knowing full well that this kind of thing saps employee enthusiasm and causes the wrong kind of water-cooler conversation, Google last week announced a new option-repricing plan for employees. The features of the plan:
  • It is a one-for-one, voluntary exchange.
  • The offer period begins on January 29, 2009 and ends at 6:00 a.m. Pacific Time on March 3, 2009, unless Google is required or opts to extend the offer period.
  • Employees will be able to exchange their under-water options for new options with a strike price equal to the closing price of Google stock on March 2, 2009.
  • The new options will have a new vesting schedule that adds 12 months to the original vesting schedule.
As it turns out, a company I worked for did this same thing. It offered employees the chance to roll over their existing options into new ones based on a new (current) strike price. But the vesting date moved out. If you were almost-vested (perhaps already vested) in your worthless options, you lost your vesting.

The problem with resetting the clock, of course, is that if the stock keeps sinking, you're still screwed. Also, if you have to be an employee in order to see your options continue to vest, who's to say you'll still be working for the company in a year?

Options have expiration dates. The company I worked for set a shorter expiration date for the new options (in this plan) than the original options had. So the time window for you to see a gain was narrowed. I don't know if that's the case with the new Google plan.

Bottom line, options (as an employee incentive) are tricky. In good times, they do work as an incentive. In bad times, they work as a disincentive (from what I've witnessed). Repricing plans don't always work out. (In the case of the company I worked for, it did not work to the employees' benefit.) In fact, repricing plans generally tend to favor the company, in one way or another. I believe that's the case here. Otherwise, I don't think Google would offer the plan at all.

Friday, January 23, 2009

JSON beautifier

The other day, I wanted to take a look at my Firefox bookmarks file. I could have exported my bookmarks to an HTML file using the Organize Bookmarks dialog, but instead I wanted to just use the existing bookmarks file (the private copy Firefox already uses). It turns out Firefox keeps archives of your bookmarks in

C:\Documents and Settings\[USER]\Application Data\Mozilla\Firefox\Profiles\bookmarkbackups

(on Windows)

and they are formatted as JSON! Trouble is, the JSON text has no newlines or tabs or other spacing, so if you open the bookmarks file in Notepad, you'll see One Big Huge Line of unformatted text.

Unformatted JSON is ugly. But fortunately, there's an answer.

Over at http://archive.dojotoolkit.org/nightly/dojotoolkit/dojox/gfx/demos/beautify.html there's an online form that will beautify (pretty-print to your screen) any raw JSON that you paste into the form. It does an exceptionally nice job. Give it a try if you have a need to reformat JSON source.
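If you'd rather not paste your bookmarks into a web form, the same trick is a one-liner wherever JSON.stringify is available natively (Firefox 3.5 and later; older browsers need Crockford's json2.js). The bookmarks object below is a stand-in for the parsed backup file:

```javascript
// Pretty-print JSON yourself: JSON.stringify's third argument is an
// indent width, which turns One Big Huge Line into readable output.
// (The bookmarks object here is a stand-in for the parsed backup file.)
var bookmarks = {
  title: "root",
  children: [ { title: "Mozilla", uri: "http://mozilla.org" } ]
};

var pretty = JSON.stringify(bookmarks, null, 2); // newlines + 2-space indent
```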

Tuesday, January 20, 2009

What politicians and company blogs have in common

Forrester Research, in a report called Time To Rethink Your Corporate Blogging Ideas, has confirmed what some of us have long suspected, which is that company blogs are viewed with distrust by the overwhelming majority of people who read them.

Josh Bernoff's research found that out of 18 different possible sources of information (ranging from personal e-mails to newspapers and TV to wikis and online classifieds), corporate blogs rank at the very bottom of the trust scale (18th place), with only 16% of people who read them saying that they trust them.

By comparison, 15% of Americans say they trust politicians (ref).

I was able to download a free copy of the $279 Forrester report at http://www.forrester.com/imagesV2/uplmisc/Josh_blogging.pdf. Hopefully the link will still work when you go there.

Monday, January 19, 2009

Data retrieval resource list

I came across this web page that contains a large and interesting list of online resources pertaining to information retrieval and search technologies. It includes books, courses, research articles, SEO tips, and much more.

Scroll down to get to the good stuff.

Saturday, January 17, 2009

Dr. Dobb's is (un)dead

This is an incredibly sad day for me.

Dr. Dobb's Journal, one of the great programming resources of the late DOS/early Windows era, has finally died, a victim (ironically) of the Internet's triumph over pulp-and-ink.

The venerable programmer's magazine hasn't exactly gone away entirely: It will (somewhat sadly) continue as "Dr. Dobb's Report — A Special Software Development Monthly Section in InformationWeek Magazine."

But that, too, has the smell of death about it.

To say that I owe a lot to DDJ is an understatement. DDJ was a critical part of my programming education. Allen Holub's early DDJ articles on the newfangled C language taught me a huge amount about programming and profoundly influenced my development as a coder. (Eventually, in 1991, I even wrote an article myself for DDJ.)

It's a sad thing, this disappearance of the printed word, this seemingly unstoppable deprecation of protons and neutrons. Magazines, newspapers, books, music CDs -- all on the endangered species list. Is all of human culture destined to be disseminated by coax cable and microwave radiation?

If you'll excuse me, I have to be alone right now.

Friday, January 16, 2009

The carbon cost of a Google search

Thursday, January 15, 2009

Adware author tells all

There's a truly fascinating interview with Ruby/Lisp/Scheme/C programmer (and onetime adware creator) Matt Knox over at http://philosecurity.org. Anybody who has always wondered how adware works, and why it's so infuriatingly difficult to get rid of, needs to read that interview.

It so happens, I recently spent several hours ridding my son's machine of a particularly nasty adware furball. I was able to eradicate most of it, but there were some peculiar registry entries I couldn't get rid of no matter how I tried. Immutable registry entries.

Now I know why such entries can exist.

Matt Knox explains how, in his days working for Direct Revenue (the firm Eliot Spitzer sued a couple years ago, for -- ahem -- propagating Trojans), he created unwritable registry keys by exploiting a little-known difference between the Win32 API and the NT API. "Windows, ever since XP, is fundamentally built on top of the NT kernel," Matt Knox explains. "NT is fundamentally a Unicode system, so all the strings internally are 16-bit Unicode. The Win32 API is fundamentally ASCII. There are strings that you can express in 16-bit counted Unicode that you can’t express in ASCII." (Um, yeah: A Unicode string can contain 16-bit values in which the top 8 bits are zeros. In C, strings are null-terminated, so a Unicode string containing what appear to be null bytes might appear truncated to a process that was not expecting Unicode. )

Matt continues: "That meant that we could, for instance, write a Registry key that had a null in the middle of it. Since the user interface is based on the Win32 API, people would be able to see the key, but they wouldn’t be able to interact with it because when they asked for the key by name, they would be asking for the null-terminated one."
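You can demonstrate the underlying mismatch in a few lines of Java, since a Java String (like an NT counted Unicode string) carries an explicit length and can legally contain an embedded null. The key name below is hypothetical, purely for illustration:

```java
// Counted strings (NT, Java) vs. null-terminated strings (the C/Win32 world).
// A length-counted consumer sees the whole name; a naive null-terminated
// consumer stops at the first '\0' and sees a truncated name.
public class EmbeddedNull {
    // What a length-counted API sees.
    static int countedLength(String s) { return s.length(); }

    // What a C-style consumer sees: everything up to the first null.
    static String asCString(String s) {
        int nul = s.indexOf('\0');
        return nul < 0 ? s : s.substring(0, nul);
    }

    public static void main(String[] args) {
        String keyName = "Evil\u0000Key";           // hypothetical registry key name
        System.out.println(countedLength(keyName)); // 8 -- the full counted name
        System.out.println(asCString(keyName));     // "Evil" -- the truncated view
    }
}
```

Ask the counted API for the key and you get it; ask by the null-terminated name and you're asking for a different (shorter) key entirely.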

This is just one example (cited by Knox) of the countless Microsoft design weirdnesses that have led to the tragic security mess that is Windows. This sort of thing is why the Spybot database now contains almost half a million entries, and also why Norton security updates (and Windows updates) will soon be eating 99 percent of available CPU cycles on machines connected to the Internet. And if you read between the lines of Matt Knox's interview, you'll understand that the mischief is really only just beginning.

Take my advice. Read the interview. It's an eye-opener.

Wednesday, January 14, 2009

Making server outages scalable

I mentioned not long ago the server outages that ensued when Windows 7 beta downloads became available. This is not supposed to happen in the world of cloud computing, of course. You're not supposed to be able to bring down "the cloud."

But there've been a number of high-profile cloud failures, just in the past 90 days.
Question: Where are you supposed to stand when the sky is falling?

Tuesday, January 13, 2009

Monday, January 12, 2009

The stampede away from Vista accelerates

An editor from a well-known "information technology" publication recently asked me to name some tech trends that I thought would be important this year. I told him the stampede to Windows 7 would break Richter scales around the globe and possibly affect the earth's rotation.

Looks like the madness has already begun.

Friday, January 09, 2009

Most Google products make no money

There's a poignant table at Google Blogoscoped that gives a detailed breakdown of 87 Google "products" and services, with an explanation of how they work and what they cost.

The interesting part is that only about 20 of the 87 have an associated revenue model. True, you only need one good one (one really profitable product). But still, why so much entropy?


Thursday, January 08, 2009

No DAM middle ground

Yesterday, eWeek interviewed me for a story, "Midmarket Digital Asset Management Firms Disappearing." The thrust of it is that there are damn few DAM solutions for small to midsize businesses, specifically businesses that need a solution that's scalable and plays well with typical SMB IT infrastructure, all for under $100K. The situation is somewhat odd, given that there are tons of CMS vendors in the $10K to $50K range (plus open-source offerings). In the DAM world there aren't even any serious open-source contenders.

I think there's an opportunity here (arguably) for someone to build a powerful but affordable DAM application that will run atop (or alongside?) an open-source CMS.

Failing that, I'd be happy if someone would just build an Adobe Bridge-to-Alfresco connector.

Monday, January 05, 2009

When Certs Collide



A couple months ago, I mentioned that some Russians had cracked WiFi WPA2 security using a GeForce 8800 graphics processor. I also speculated on what a determined person might be able to do with the fearsome power of multiple Sony PS3 machines networked together.

Now we know. You can hack MD5 security. It's been done: researchers Jake Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, Alex Sotirov, Marc Stevens, and Benne de Weger used 200 PlayStation 3s to craft a rogue Certification Authority certificate by finding hash collisions in MD5-space. The 40-slide deck describing the work is available here.

According to the researchers, 200 PlayStations is roughly equivalent to 8,000 desktop PCs, and renting comparable processing power to crack a cert based on 128-bit MD5 would cost about $20K of Amazon cloud time.

Crafting a rogue CA cert means (essentially) that the crackers were able to confer Certificate Authority status on themselves. What's hilarious is that the bogus cert contains no revocation URL and thus can't (easily) be revoked! For demo purposes, the hackers back-dated their cert to August 2004. A malicious hacker could create a cert that never expires.
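For context, computing an MD5 digest is a one-liner with the standard library; the attack works because two carefully engineered inputs can be made to produce the same 128-bit value. This sketch shows only the hashing side, not the collision search (the class name is mine):

```java
import java.math.BigInteger;
import java.security.MessageDigest;

// Computing an MD5 digest with java.security.MessageDigest. The rogue-CA
// attack hinges on the fact that distinct inputs can be engineered to
// collide in this 128-bit output space.
public class Md5Demo {
    static String md5Hex(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        return String.format("%032x", new BigInteger(1, digest));
    }
    public static void main(String[] args) throws Exception {
        // The well-known MD5 of empty input:
        System.out.println(md5Hex(new byte[0]));
        // d41d8cd98f00b204e9800998ecf8427e
    }
}
```

Sixteen bytes of output, no matter how large the input: that fixed size is exactly why collisions must exist, and why finding them on demand kills the hash for certificate use.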

After you read the slide deck, you won't know whether to laugh or cry. I did both.

Thursday, December 11, 2008

LinkedIn Flex group hits membership limit

From Adobe Technical Evangelist Ben Forta comes word that the LinkedIn Flex Developers Group is now full to capacity and can accept no more members.

Apparently, LinkedIn groups can have a maximum of 3000 members, and that's how many the Flex-dev group now has. (I wonder who the genius was that hard-coded that limit?) Forta says he is sitting on 50-some-odd requests to join, and can't approve them. Moreover, he pinged LinkedIn Customer Services to ask if there was a way to raise that limit. He was told he can't approve any more requests. "If the limit gets raised," Forta says, "I'll let you know (and will approve those in the queue)."

Tuesday, December 09, 2008

SlingPostServlet demystified

One of the neatest things about Apache Sling (the JCR-based application framework) is its easy-to-use REST API, which allows you to do CRUD operations against a Java Content Repository using ordinary HTML web forms (or should I say, ordinary HTTP POST and GET), among many other interesting capabilities. The magic happens by way of a class called SlingPostServlet. Understanding that class is key, if you want to leverage the power of Sling without writing actual Java code.

Turns out there's an exceptionally thorough (and readable) discussion of the many capabilities of the SlingPostServlet at the Sling incubation area of Apache.org. You can think of it as the fully exploded version of Lars Trieloff's Cheat Sheet for Sling (an excellent resource). It's the next best thing to reading the source code.
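To give a feel for the idea: in Sling's scheme, form-encoded properties POSTed to a repository path become properties of the node at that path. Here's a rough sketch, the body-building part is plain Java, and the host, path, and property names are all made up for illustration:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SlingPost {
    // Build an application/x-www-form-urlencoded body from property pairs.
    static String formBody(Map<String, String> props) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    // POST the properties to a repository path; Sling creates or updates
    // the node at that path. (Host and path below are hypothetical.)
    static int postNode(String url, String body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("title", "Hello Sling");
        props.put("text", "Stored in the JCR via plain HTTP");
        System.out.println(formBody(props));
        // With a Sling instance running locally, you'd do something like:
        // postNode("http://localhost:8080/content/mynode", formBody(props));
    }
}
```

The same request could just as easily come from an ordinary HTML form, which is the whole point: CRUD against the repository without writing a line of Java.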

Monday, December 08, 2008

MS Office apps as services

According to Information Week, Tibco and OpenSpan have "teamed up to make parts of Microsoft's Office applications available as services for inclusion in an enterprise service-oriented architecture."

OpenSpan has (in the words of a rather breathless reporter) "demonstrated that it's possible to generate mashups of Microsoft Office applications without changing the underlying application code."

On a superficial level, this is the kind of thing that people do routinely with OpenOffice running in server mode. (Various content management systems use OO to do document transformations on the server. The OO developer documentation shows how to set this up.)

I gather the OpenSpan stuff has tooling for making it easy to create Office-service mashups. It's a good idea and I wish such tooling existed for OpenOffice. It'll be interesting to see if Tibco and OpenSpan score any OEM deals with CMS vendors.

Friday, December 05, 2008

How OSGi changed one person's life

Peter Kriens has written a really nice article for ACM Queue called How OSGi Changed My Life. Go here to read it online. It's a high-level overview for people who are still trying to grok the whole OSGi phenomenon.

OSGi is a game-changing technology, IMHO, because it brings familiar SOA precepts to ordinary POJO programming. (How's that for acronym abuse?) POJOs end up having fewer unwanted intimacies, and if you run them inside Spring inside OSGi, the POJOs don't have to know so much about the runtime framework, either. Compositionality is greatly facilitated in OSGi; the level of abstraction is high; the benefits are numerous and far-reaching. I see OSGi as revitalizing Java programming for enterprise.

Good tooling for OSGi is still scarce. (Doing "Hello World!" is much harder than it should be.) I suspect that will change very soon, though. Meanwhile, OSGi is quite pervasive already (it's in quite a few products, though seldom advertised), and I look for 2009 to be the year when OSGi finally goes double-platinum.
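To make the "fewer unwanted intimacies" point concrete: this is not OSGi itself (a real bundle needs a framework, a manifest, and an activator), just a toy sketch of the idea OSGi formalizes, a service registry that lets POJOs find each other by contract rather than by concrete class. All names here are mine:

```java
import java.util.HashMap;
import java.util.Map;

// Toy service registry illustrating the shape of OSGi's service model:
// a provider publishes an implementation under an interface, and a
// consumer looks it up by that interface alone, so the two stay decoupled.
public class MiniRegistry {
    interface Greeter { String greet(String name); }

    private static final Map<Class<?>, Object> services = new HashMap<>();

    static <T> void register(Class<T> contract, T impl) {
        services.put(contract, impl);
    }

    static <T> T lookup(Class<T> contract) {
        return contract.cast(services.get(contract));
    }

    public static void main(String[] args) {
        // "Bundle A" publishes a service; only the interface is exposed.
        register(Greeter.class, name -> "Hello, " + name + "!");
        // "Bundle B" consumes it without knowing the implementing class.
        System.out.println(lookup(Greeter.class).greet("OSGi"));
    }
}
```

OSGi adds the parts a HashMap can't give you: versioning, dynamic lifecycle (services come and go at runtime), and classloader isolation between bundles.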

Tuesday, December 02, 2008

Exception-throwing antipatterns

Tim McCune has written an interesting article called Exception-Handling Antipatterns, at http://today.java.net (a fine place to find articles of this kind, BTW). The comments at the end of the article are every bit as stimulating as the article itself.

McCune lists a number of patterns that (I find) are very widely used (nearly universal, in fact) in Java programming, such as log-and-rethrow, catch-and-ignore, and catch-and-return-null; all considered evil by McCune. My comment is: If those are antipatterns, the mere fact that such idioms are so ubiquitous in real-world Java code says more about the language than it does about programmers.

I've always had a love-hate relationship with the exception mechanism. On the whole, I think it is overused and overrated, at least in the Java world, where people seem to get a little nutty about inventing (and subclassing) custom exceptions and ways to handle them, when they should probably spend that energy writing better code to begin with.
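For the record, here are two of the antipatterns in question, side by side with a cleaner alternative (class and method names are mine, for illustration):

```java
// Two common exception antipatterns, and one cleaner alternative.
public class ExceptionStyles {
    // Antipattern: catch-and-return-null. The caller must now null-check,
    // and the original cause of the failure is silently discarded.
    static Integer parseOrNull(String s) {
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null; // cause is lost
        }
    }

    // Antipattern: catch-and-ignore. The failure vanishes entirely.
    static void parseAndIgnore(String s) {
        try {
            Integer.valueOf(s);
        } catch (NumberFormatException e) {
            // nothing -- the proverbial swallowed exception
        }
    }

    // Cleaner: translate once into a meaningful exception, keeping the cause.
    static int parseQuantity(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad quantity: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("oops"));  // null
        System.out.println(parseQuantity("42"));  // 42
    }
}
```

Note that the "clean" version is itself a catch-and-rethrow; the difference is that it rethrows once, at the boundary where context can be added, and it preserves the cause chain.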

Monday, December 01, 2008

Get paid for being job-interviewed

At last, an online service that deters head hunters from pestering me.

The unusual promise made by NotchUp is that potential employers who want to contact you directly (avoiding the expensive services of a professional placement agency) will actually pay you to agree to an interview. All you have to do is sign up with NotchUp, and wait for the phone to ring. (And wait. And wait.)

How much will you get paid? The NotchUp folks have put a fee calculator on their site. It shows that an IT professional with 10 years of experience can expect to receive $380 per interview.

NotchUp membership is free to interviewees, but you have to go through an application process and be accepted. Which already sounds fishy to me.

Wednesday, November 26, 2008

What Sun should really do

I've worked for companies that are in Sun's situation (most recently Novell), and I have a few observations based on my years of watching hugely talented groups of people produce astoundingly good Java technology, only to see the Greater Organization fail to find a way to monetize it.

Like Sun, Novell is a venerable tech company with an interesting past. It finds itself today in a situation (like Sun) where profitability is consistently miserable, but the balance sheet is good. The parallels between Sun and Novell are far-reaching. Both were started more than two decades ago (Novell in 1979, Sun in 1982) as hardware companies. Both soon found themselves in the operating-system business. Novell owned DR-DOS, which led to Novell DOS, which in turn became the boot loader for NetWare. Along the way, Novell acquired UNIX from AT&T.

NetWare was an extraordinarily successful OS that Novell foolishly stopped supporting shortly after acquiring SUSE Linux in late 2003. I say foolishly because NetWare was a cash cow that required very little code maintenance to keep going, whereas SUSE Linux sucked Novell's coffers dry as the entire company pivoted in the direction of things that are extremely hard to make money on (viz., open-source software, something Novell had little experience with).

Distraction destroys profitability (someone please make that a bumper sticker...), and this is something that has cost Novell and Sun a great deal of money over the years. Technology is exciting, and promising technologies have a way of siphoning attention away from more prosaic sorts of things, like finding and solving customer pain (i.e., making money).

Technologists have a way of convincing themselves that some technologies are more worthy than others. And quite often, what happens is that people at the top who should know better become seduced into allocating large amounts of lucre to the promotion of money-losers (on the theory that eventually they are bound to become money-makers) while cash-cow products get money taken away (on the theory that "this product is a winner, it's throwing off cash like crazy; we don't need to promote it").

I've started and run two successful businesses (including one that I launched in 1979, which is still in operation under a different owner, today) and I've seen friends and relatives start and run businesses, so I know first-hand what it takes to keep a business right-side-up; and I know some sure-fire techniques for making a successful business cartwheel out of control into a ditch, trailing black smoke.

One of the main things I've learned is that you never promote a loser; you always put your money behind proven winners. Never take marketing or development funding away from a winner to promote something that is either a proven loser or not yet proven to be a winner. (That's too long for a bumper sticker. It should be on billboards.)

Back to the software biz for a minute. Novell and Sun are both "operating system companies" to some degree. This already carries with it the stench of death. Being an OS company was a great thing back in the Carter years; it was lucrative in those days. It's not a great thing today. It siphons off money that is better deployed elsewhere. Microsoft (with its "Live" series of SaaSified product offerings) has recently gotten the message that the Web is the new OS, and the desktop is irrelevant as a metaphor. This is a huge paradigm shift for Microsoft. But they finally get it: They get that the future is in things like collaboration (human connectivity) and dynamic assembly of reusable content. They are starting to understand that infrastructure is not something customers want to have to know about; that everything that can be virtualized should be virtualized. Customers instinctively know this, even if they can't articulate it.

So then, what's a Sun to do?

First, stop worrying so much about the future and figure out what's making money now, so you can try to massively scale whatever that happens to be. Remember: Invest in winners, not losers. Find out what's working. Crank the volume full max on it.

The next thing to do is obvious: Kill your losers. Utterly walk away from them, now, today, this minute. Redeploy the resources to your winners (or else sell them off).

The very next thing to do is apply the foregoing principles to your people. Find the winners (the true contributors, the people who are making successful things successful) and reward them. Not just with money, but with whatever else they want: promotion, recognition, travel, alone time, or whatever. People are different. Most techies are not motivated by money.

Likewise, identify and weed out the mediocre, the tired, the overly comfortable, the complainers and morale-killers; find the toxic individuals (they're everywhere) and remove them somehow. Just getting rid of the toxic people will cause those who are left to be more productive.

Next, pivot the organization in the direction of innovation. This is exceptionally difficult to do. I was involved with the "Fostering Innovation" Community of Practice at Novell during a time when Novell was desperately trying to become more of an innovation-centric culture. One of the kneejerk things Novell did was increase the bonus paid to employees who contributed patentable inventions. Novell eventually was paying $4500 plus hundreds of shares of restricted stock for each invention accepted by the Inventions Committee (of which I was a member). What happened was that we got goofy submissions from all over the company, while certain senior engineers who knew how to game the system succeeded in making a nice side-living on patent bonuses.

Innovation is fostered when you simply set innovative people free. Innovators will innovate for their own reasons; money has nothing to do with it. All you need to do is clear the path for these individuals. At Novell, as at Sun, there's a special honor reserved for senior people who have a track record of accomplishment. We called them Distinguished Engineers. These people were like tenured professors. They came to work in pajamas (not really) and did whatever they wanted, basically, with no fear of ever being fired.

That's a stupid system. Tenured professorships lead to sloth. Not every Distinguished Engineer is a burned-out has-been on the dole, but some are, and it sets a bad example.

Younger engineers (and others) who are proving their potential as innovators need to be recognized while they're at their peak. (The 45-year-olds with a track record of innovation in the 1990s need to be considered for early retirement. Recognizing someone a decade too late serves no purpose.) What I advocate is a system of "innovation sabbaticals," awarded to budding innovators (of any age) who are doing Great Things and are likely to do more if set free.

Finally, going forward, hire good people. This is any company's best and only salvation. It's the foundation for all success. When you have difficult problems to solve (as any troubled business does), hire very, very smart people who have no prior experience with the problems in question. That's how you get fresh answers that bear tasty fruit.

This blog is already too long, so I'll stop. In a nutshell, what Sun needs to do is focus light on itself and conduct a pre-mortem. The first order of business is to find out which pieces of the business are profitable, and scale those. Then find out which pieces of the business are sucking cash, and amputate those. If that's two-thirds of Sun, so be it. It means Sun needs to be a third as big as it is now. It'll shrink down to that size eventually, so why spend time and money getting there the slow way? Go there now. Shareholders will applaud.

And by the way, be clear on one thing: This is all about earnings-per-share. There is no other goal, no other agenda. Sun is a business. It's not a charity organization or a full-employment program for has-beens. Earnings per share comes first. Everything else follows.

Find and reward (not necessarily with cash!) your best people. Get rid of the losers who are bringing morale and productivity down for everyone else.

Set innovators free. They will innovate for their own reasons. Just let them.

And get out of the operating system business. The Web is the OS, for cryin'-out-loud. Even Microsoft has figured that one out.

Tuesday, November 25, 2008

Google wants to hire 665 people

The WebGuild story about Google laying off ten thousand workers is (sadly) mostly made-up nonsense.

Probably the most outlandish statement in the article is "Since August, hundreds of employees have been laid off and there are reports that about 500 of them were recruiters."

Five hundred recruiters?? ROTFLMAO.

The only scintilla of truth in the entire article, as far as I can determine, is the bit about Google having approximately ten thousand contract workers, which Sergey Brin confirmed in an October 16 story in The Mercury News. The notion that Google will be letting all of them go is nonsense, however. Brin (in the same Mercury News story) did say Google "has a plan to significantly reduce that number through vendor management, converting some contractors to regular employees, and other approaches." That's all he said: "significantly reduce."

It's quite easy to verify that Google is, in fact, still hiring at a brisk pace. Go here to browse the 665 open positions.

Monday, November 24, 2008

Flex meets Inversion-of-Control

I didn't realize until just now that there is a Spring-like inversion-of-control framework for Flex, called Prana (available under a BSD license).

Seeing something like Prana makes me wonder how many other staples of the Java world will be emulated by Flex folk.

Very interesting indeed.

Saturday, November 22, 2008

Death of an Eclipse project

A November 12, 2008 slide deck explains why the Eclipse Application Lifecycle Framework (ALF) project will be terminated due, basically, to lack of interest. Except that there was, in fact, corporate interest: enterprise-mashup player Serena Software contributed significantly to the code base.

The whole story is a little weird. The ALF project morphed into an SOA framework of sorts shortly after its inception in 2005. Serena (an application lifecycle management software firm, originally, but also known for the now-moribund Serena Collage Content Management System) got involved early on. Eventually, ALF was adopted as the underlying SOA and authentication framework by Serena Business Mashups in Dec 2007.

And now the "project leadership" has decided that the Eclipse ALF project should be shut down, with the code being donated to the Higgins project. The Project Leader for ALF is (was) Brian Carroll, a Serena Fellow.

Higgins, it turns out, is actually not ALF-related except in the most tangential sense. I was working in the Identity Services division at Novell in 2006 when Higgins was created. I knew about it through Duane Buss and Daniel Sanders (both of whom are still principals on the project). Daniel and I worked together on the Novell Inventions Committee.

Higgins is (according to the project FAQ) "an open source Internet identity framework designed to integrate identity, profile, and social relationship information across multiple sites, applications, and devices. Higgins is not a protocol, it is software infrastructure to support a consistent user experience that works with all popular digital identity protocols, including WS-Trust, OpenID, SAML, XDI, LDAP, and so on."

It's really largely about identity cards or "information cards" (InfoCards, I-Cards).

In case you're wondering about the name: Higgins is the name of a long-tailed Tasmanian jumping mouse.

So, ah . . . ALF isn't the only SOA-related Eclipse project being taken down now. For info on the others, see this story in the Register.

Thursday, November 20, 2008

How to set your head on fire



The folks at Jabra (the Danish headset manufacturer) are having a product recall. It seems Jabra's lithium batteries can overheat and catch fire. According to the company's announcement:
Dear Jabra GN9120 Customer
In cooperation with the Danish Safety Technology Authority (Sikkerhedsstyrelsen) and the U.S. Consumer Product Safety Commission, and other regulatory agencies GN Netcom is voluntarily recalling Lithium-ion batteries from ATL (ATL P/N 603028) used in GN9120 wireless headsets and sold from January 2005 through September 2008. These lithium-ion polymer batteries can overheat due to an internal short circuit in the batteries, which can pose a fire hazard. The battery has only been used in the GN9120 wireless headset. If you are using any other headset solution from GN Netcom you are not affected by this statement.
Not to worry, though. The "extra-crispy" look is in.

Why are CSS editors so fugly?



The other day, I happened upon a long list of CSS editors, arranged chronologically (newest tools first). I haven't tried any of them except Stylizer, which (beware) lays down the .NET 2.0 framework as part of its install process. (Allow 10 minutes.) Stylizer has a really beautiful UI but is far from being the point-and-click WYSIWYG stylesheet designer I've been looking for. (It's really just an editor; you do a lot of typing.) Although I must say, Stylizer beats the living crap out of most other free (or crippled-down eval-version) CSS editors I've seen, which all tend to look like this unfortunate travesty.

Does anybody else see the irony in the fact that most CSS editors are unbelievably fugly? I mean, if anything cries out for a decent visual design tool with eye-pleasing widgets, it would have to be a CSS editor. But most CSS tools (at the freeware level, anyway) look like they were designed by Eclipse programmers on bad acid.

I guess this is the ultimate example of programmers not knowing how to design user interfaces, and design experts not knowing how to program. Maybe it's no wonder 99% of CSS editors look like Notepad in a 3-piece suit.

Wednesday, November 19, 2008

Using Yahoo Pipes in anger

I finally had an opportunity to use Yahoo Pipes to do something useful.

The quest: Create a super-feed that aggregates a bunch of different Google developer blogs (12 in all), including AJAX Search API, Gears, Gadgets, OpenSocial, Open Source, Mashup Editor, Web Toolkit, App Engine, Google Code, iGoogle, Desktop, and Data API blogs. And: Show the most recent 8 entries for each of the 12 blogs.

Also: Make a searchable version of same, so that you can do a search for (let's say) "Atom" across all 96 latest blog entries in the 12 categories.

I was inspired to create this Pipes-app (plumbingware?) when I saw the recent press release concerning ArnoldIT's Google monitoring service. The ArnoldIT aggregator is dubbed "Overflight" (for reasons known only to the CIA, perhaps).

I was disappointed to find that Overflight is not available as an RSS feed. It also is not searchable. Hence, I went ahead and mashed together my own version of Overflight using Pipes.

As it turns out, I was able to create the Pipe app in a matter of 90 minutes or so (around half an hour longer than I'd budgeted). I didn't have time to aggregate all 74 Google blogs, so I focused just on twelve developer blogs. The resulting app is at Google Developer Blogs Super-Feed, which you can subscribe to here. The keyword-search version is here. (It supports single words or exact phrases.)

I confess I was skeptical, at first, as to whether the performance of a Pipes app that draws together 96 content items from 12 feeds could possibly be acceptable. It turns out to be amazingly fast. Even the queryable version is fast. I have yet to run a keyword or key-phrase search that takes more than 4 seconds to bring back results.
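Restated as plain code, the pipe's logic is simple: take the newest N entries from each feed, concatenate, and optionally keyword-filter. The sketch below stubs out the feed data (the real version fetches and parses 12 RSS feeds; all names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The super-feed logic: newest `perFeed` entries from each feed, merged,
// with an optional case-insensitive keyword filter over the titles.
public class SuperFeed {
    static List<String> aggregate(Map<String, List<String>> feeds,
                                  int perFeed, String keyword) {
        List<String> out = new ArrayList<>();
        for (List<String> entries : feeds.values()) {
            // entries are assumed newest-first, as in an RSS feed
            for (String title : entries.subList(0, Math.min(perFeed, entries.size()))) {
                if (keyword == null
                        || title.toLowerCase().contains(keyword.toLowerCase())) {
                    out.add(title);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> feeds = Map.of(
            "Gears", List.of("Gears 0.5 ships", "Atom support lands"),
            "GWT",   List.of("GWT 1.5 released"));
        System.out.println(aggregate(feeds, 8, "atom")); // [Atom support lands]
    }
}
```

With 12 feeds and perFeed = 8 you get the 96-entry super-feed; pass a keyword and you get the searchable version.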

If you haven't tried Pipes yet, you should definitely spend a few minutes exploring it. It's a bit klutzy and constraining (in my experience), and it's sure to frustrate many a grizzled Java or C++ developer. But as a visual Web-app design tool, it's an interesting approach. Here's hoping Yahoo takes it a bit further.

Tuesday, November 18, 2008

Pixel Bender plug-in for Photoshop is out now

According to John Nack, the Pixel Bender Gallery plug-in for Photoshop CS4 is now available for download from Adobe Labs. Nack explains that the plug-in "runs filters really, really fast on your graphics card," and notes that the filters people write for Flash will also work in Photoshop (or so he says). A nice added bonus is that the same filters will work in After Effects CS4.

Can't wait to try it.

Moore's Law v2.0

It's no secret that conventional chip designs are about to hit the wall with respect to scaling. Moore's Law 1.0 is in danger of being repealed.

Not to worry, though. Years of research into so-called 3D chip architecture is finally beginning to bear fruit, and it looks like cubes will start replacing chips in at least some devices soon. (HP is making steady progress in this area, along with IBM and others.) Moore v2.0 is well on the way to reality.

If you want to learn more about this technology, check out the latest issue of the IBM Journal of Research and Development, which is devoted to 3D integrated circuit technology. A particularly good overview article is here.

Monday, November 17, 2008

Java HotSpot VM options explained

Have you ever wanted an exhaustive list of all those inscrutable command-line options you can use when you want to force the JVM to do something a certain way while you're either troubleshooting an extremely bizarre bug or trying to figure out why performance sucks even more than usual?

Try going to:

http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp


I don't know if this is an exhaustive list, but it certainly looks like it is. Just reading the various descriptions is quite educational. If you're interested in tuning the JVM for max performance, this is a must-read.
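Related tip: you can see which flags a running JVM was actually started with, from inside the process, via the platform MBeans (class name is mine):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Reads back the -X / -XX flags the current JVM was launched with,
// handy when you're not sure which tuning options actually took effect.
public class JvmFlags {
    public static List<String> inputArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }
    public static void main(String[] args) {
        // e.g. launch with: java -XX:+PrintGCDetails -Xmx256m JvmFlags
        for (String arg : inputArgs()) System.out.println(arg);
    }
}
```

Run it with a few of the options from Sun's list and you'll see them echoed back, a quick sanity check before a long tuning session.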

Saturday, November 15, 2008

The fugliest code I've ever written

The other day, I started to wonder: What's the single fugliest piece of code I've ever written?

It's really hard to answer that, because I've been writing code of one flavor or another for roughly twenty years, and in that time I've committed every code atrocity known to man or beast. As I like to tell people, I'm an expert on spotting bad code, because I've written so much of it myself.

What I've finally decided is that the following line of C code is probably the single most ghastly line of code I've ever perpetrated:

(*((*(srcPixMap))->pmTable))->ctSeed =
    (*((*((*aGDevice)->gdPMap))->pmTable))->ctSeed;
Explanation: Long ago, I used to do graphics programming on the Mac. I don't know how the Mac does things today, but ten years ago the Color Manager rebuilt a table whenever Color QuickDraw, the Color Picker Manager, or the Palette Manager requested colors from a graphics device whose color lookup table had changed. To determine whether the CLUT had in fact changed, the Color Manager compared the ctSeed field of the current GDevice color table against the ctSeed field of that graphics device's inverse table. If the ctSeed values didn't match, the Color Manager invalidated the inverse table and rebuilt it. For fast redraws, you want to avoid that. You could avoid it by forcing the ctSeed field values to be equal.

This is one of many fast-blit tips I explained in an article I wrote years ago for MacTech magazine. Fortunately, I write mostly Java and JavaScript now, and I no longer have to deal with pointer indirection, and today my aspirin drawer is only half-full -- or is it half-empty?

Friday, November 14, 2008

Google Chatterbot

Google's Douwe Osinga has come up with a freaky little online app that turns the almighty Google hash engine into an oracle (not to be confused with Oracle). All you do is enter a word or two into the text box, and wait. The app will do a Google search on your words, find the next "suggested" word and print that, then it will remove the first word of your search string, add the found word, and repeat.
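The loop Osinga describes amounts to sliding a window over suggested words. A rough sketch of just that loop, with a stubbed-out lookup standing in for the real Google query (the interface and names here are mine, purely illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Chatterbot {
    // Stand-in for the real Google lookup: given the current phrase,
    // return the most likely next word, or null if nothing is found.
    interface NextWordLookup {
        String suggest(String phrase);
    }

    // Slide a window over the text: look up the next word for the current
    // window, append it, drop the window's first word, and repeat.
    static String babble(String seed, NextWordLookup lookup, int maxWords) {
        List<String> window = new ArrayList<>(Arrays.asList(seed.split("\\s+")));
        StringBuilder out = new StringBuilder(seed);
        for (int i = 0; i < maxWords; i++) {
            String next = lookup.suggest(String.join(" ", window));
            if (next == null) break;
            out.append(' ').append(next);
            window.remove(0);
            window.add(next);
        }
        return out.toString();
    }
}
```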

Quite often, the app generates a disarmingly logical response. For example, this morning when I entered "JavaFX will," the reply came back: "javafx will be open sourced monday."

Occasionally you learn something you didn't know. "Richard Stallman" brings back a response of "richard stallman founder of the free african american press."

Interestingly, "top secret" brings back nothing.

Sometimes the app produces garbage. Or is it poetry? When I entered "Sun will," I expected something like "lay off thousands." Instead I got: "sun will shine lyrics by stone roses that are not newly completed and that are in earth orbit but more likely at the top of the mark is to show that the flds not only persevered they fought back they didnt".

e e cummings lives!

Thursday, November 13, 2008

Free downloadable tech books

http://www.freetechbooks.com/

As you'd expect, the list includes a lot of stale and/or not-very-valuable titles, but there's also a lot of genuinely worthwhile stuff there. Judging from their "most popular" list, the site is a big hit with C++ programmers. But there are also 23 free Java books, and lots of timeless reference material for programmers of all stripes.

Wednesday, November 12, 2008

For lack of a nail (Java version)

// For the lack of a nail,
throw new HorseshoeNailNotFoundException("no nails!");

// For the lack of a horseshoe,
EquestrianDoctor.getLocalInstance().getHorseDispatcher().dispatch();

// For the lack of a horse,
RidersGuild.getRiderNotificationSubscriberList().getBroadcaster().run(
    new BroadcastMessage(StableFactory.getNullHorseInstance()));

// For the lack of a rider,
MessageDeliverySubsystem.getLogger().logDeliveryFailure(
    MessageFactory.getAbstractMessageInstance(
        new MessageMedium(MessageType.VERBAL),
        new MessageTransport(MessageTransportType.MOUNTED_RIDER),
        new MessageSessionDestination(BattleManager.getRoutingInfo(
            BattleLocation.NEAREST))),
    MessageFailureReasonCode.UNKNOWN_RIDER_FAILURE);

// For the lack of a message,
((BattleNotificationSender)
    BattleResourceMediator.getMediatorInstance().getResource(
        BattleParticipant.PROXY_PARTICIPANT,
        BattleResource.BATTLE_NOTIFICATION_SENDER)).sendNotification(
    ((BattleNotificationBuilder)
        BattleResourceMediator.getMediatorInstance().getResource(
            BattleOrganizer.getBattleParticipant(Battle.Participant.GOOD_GUYS),
            BattleResource.BATTLE_NOTIFICATION_BUILDER)).buildNotification(
        BattleOrganizer.getBattleState(BattleResult.BATTLE_LOST),
        BattleManager.getChainOfCommand().getCommandChainNotifier()));

// For the lack of a battle,
try {
    synchronized (BattleInformationRouterLock.getLockInstance()) {
        BattleInformationRouterLock.getLockInstance().wait();
    }
} catch (InterruptedException ix) {
    if (BattleSessionManager.getBattleStatus(
            BattleResource.getLocalizedBattleResource(Locale.getDefault()),
            BattleContext.createContext(
                Kingdom.getMasterBattleCoordinatorInstance(
                    new TweedleBeetlePuddlePaddleBattle()).populate(
                        RegionManager.getArmpitProvince(Armpit.LEFTMOST)))) ==
        BattleStatus.LOST) {
        if (LOGGER.isLoggable(Level.TOTALLY_SCREWED)) {
            LOGGER.logScrewage(BattleLogger.createBattleLogMessage(
                BattleStatusFormatter.format(BattleStatus.LOST_WAR,
                    Locale.getDefault())));
        }
    }
}

// For the lack of a war,
return new Kingdom();


Adapted from Steve Yegge's Blog Rant of March 30, 2006. Apologies to Ben Franklin (who in turn adapted the original proverb from George Herbert's Jacula Prudentum).

Tuesday, November 11, 2008

Finalization is evil

After listening to the excellent presentation by Hans Boehm on "Finalization, Threads, and the Java Technology Based Memory Model," I have come to the conclusion that finalization is one of Java's worst features, if not the worst.

To be clear, I am not talking about the final keyword (which is actually a great feature of the language). Rather, I am talking about the notion of finalizers, or special "cleanup" methods that the JVM will call before an object is finally reclaimed by the garbage collector. The idea is that if you have an object that's holding onto some system resource (such as a file descriptor), you can free that resource in the finalize() method right before your no-longer-used object gets garbage collected.

The only problem is, there's no guarantee as to how quickly, or in what order, your finalizers will be called -- and no guarantee that they'll be called at all.

Sun's Tony Printezis gives a good explanation of finalization in an article on the Sun Developer Network site. It's a brilliant article, but I found myself quite nauseated by the time I got to the end of it. Finalization is just so wrong. So wrong.

"The JVM does not guarantee the order in which it will call the finalizers of the objects in the finalization queue," Printezis points out. "And finalizers from all classes -- application, libraries, and so on -- are treated equally. So an object that is holding on to a lot of memory or a scarce native resource can get stuck in the finalization queue behind objects whose finalizers are making slow progress."

Oh great, that's just what I need. Finalizers blocking on other finalizers while my heap fragments.

It turns out that at instantiation time, an object that has a finalizer is marked as such and treated differently by the JVM. That extra bookkeeping incurs a performance hit. If your application creates many short-lived objects with finalizers, the hit can be quite substantial. Hans Boehm (see link further above) did some testing and found a 7X slowdown of a test app when objects had finalizers, compared to no finalizers. (With a really fast JVM, namely JRockit, the slowdown was eleven-fold.)

The funny thing is, in all the articles and book chapters I've read about finalization, I have never, not even once, seen a good real-world example of a situation requiring the use of a finalizer. Supposedly, you use a finalizer when you're holding onto a system resource and need to free it before your object goes out of scope. But in reality, it's almost always the case that system resources that are considered scarce or precious have a dispose() or close() or other, similar method, for the explicit purpose of freeing the resource. If you use the resource's normal release mechanism, you don't need a finalizer. In fact a finalizer only lets you hold onto a resource longer than you should.
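Here's a minimal sketch of that explicit-release pattern. The Handle class is a made-up stand-in for a scarce resource (a file descriptor, a socket); the point is that try/finally releases it deterministically, the moment you're done, with no finalizer in sight:

```java
public class ResourceDemo {
    // A stand-in for something scarce: a file handle, a socket, etc.
    static class Handle {
        boolean open = true;
        int read() { return 42; }
        void close() { open = false; }  // explicit, deterministic release
    }

    // The resource is freed in finally, whether or not read() throws --
    // no waiting on the whims of the garbage collector.
    static int useResource(Handle h) {
        try {
            return h.read();
        } finally {
            h.close();
        }
    }
}
```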

Someone will argue that you don't always know when or if an object is going out of scope; therefore you should put a call to the release method in a finalizer and be assured that the resource will eventually be released. Okay, <sigh/> that's fine and dandy as long as you can count on your finalize() method being called (which you can't) and as long as your machine doesn't starve for file descriptors, sockets, or whatever the precious resource happens to be, before the finalizer is finally called. Remember, the JVM makes no guarantees about any of this. Finalization is non-deterministic.

I have to say, though, that the contorted, non-real-world examples that are always trotted out to justify the existence of the finalizer mechanism in Java have always struck me as more than a little malodorous. They all have that unmistakable antipattern smell that gets in your clothing and makes you feel like taking a hot shower when you get home.

Maybe we should just confront the possibility (the likelihood) that finalization is evil. After all, even the people who write long articles about it end up urging you not to use it.

That's good enough for me.

Saturday, November 08, 2008

Google to downsize?


The NYC Google office takes up a city block.

Word comes by way of the Silicon Valley Insider that Google will soon be subletting 50,000 square feet of space at its New York City Googleplex. I've toured the place, and let me tell you, it's Big: the 111 8th Ave office occupies an entire city block of space, between 8th and 9th Avenues and 15th and 16th Streets. It's around 300K square feet altogether.

What kind of luck Google will have subletting this space in the current economy, I don't know. It's very inconveniently located, in the meat-packing district, a couple miles south of Grand Central Terminal and just far enough from the Village to be annoying.

From what I saw on my walk-through, I can tell you that Google tends to be rather wasteful of space, by industry standards. The cafeteria is the size of Macy's and there's one conference room for every three employees (well, almost), and very few programmers can actually reach out and touch someone despite the lack of walls. I'd say there has to be an average of at least 500 sq. ft. per employee by the time you factor in all the conference rooms, hallways, etc.

So there are really only two possibilities. Either Google will try to use its space more efficiently (and not lay anyone off) at its NYC office after subletting one-sixth of its available space; or it will lay off a sixth of its Manhattan workforce (around 120 people). Or some combination of both.

My guess is both.

Friday, November 07, 2008

Slow page loads == job cuts?

Interesting factoid: Every 100ms of latency costs Amazon 1% in profit-per-visit.

The same source claims that Google stickiness drops 20% if page load time increases by 500ms.

Which leads me to wonder how much revenue-loss LinkedIn has suffered over the past five years because of its agonizingly slow page loads, and how many of the 10% of its employees who were just laid off might still have their jobs if the pitiful dunderheads who allowed LinkedIn's site to be so pitifully slow hadn't been such pitiful dunderheads.

Thursday, November 06, 2008

Hardware-assisted garbage collection

I find myself spending more and more time thinking about garbage collection, not just as a career move but as a fundamental problem in computer science. I don't pretend to have any expertise in garbage collection, mind you. But I find it an interesting problem space. Particularly when you start to talk about things like hardware-assisted GC.

Yes, there is such a thing as GC-aware chip architecture, and the guys who know a lot about this are the folks at Azul Systems. A good starting point, if you want to read up on this, is Pauseless Garbage Collection: Improving Application Scalability and Predictability. Good late-night reading for propeller-heads who need something to do while waiting for the propeller to wind down.

Wednesday, November 05, 2008

Paging memory leaks to disk

At last month's OOPSLA 2008, there was an interesting presentation by Michael D. Bond on a technology called Melt, which aims to prevent out-of-memory errors in Java programs that harbor memory leaks (which is to say, 99 percent of large Java programs). The Intel-funded research paper, Tolerating Memory Leaks (by Bond and his thesis advisor, Kathryn S. McKinley, U. Texas at Austin), is well worth reading.

The key intuition is that reachability is an over-approximation of liveness, and thus if you can identify objects that are (by dint of infrequent use) putative orphans, you can move those orphan objects to disk and stop trying to garbage-collect them, thereby freeing up heap space and relieving the collector of unnecessary work. If the running program later tries to access the orphaned object, you bring it back to life. All of this is done at a very low level so that neither the garbage collector nor the running program knows that anything special is going on.
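A toy sketch of the staleness idea -- nothing like Melt's actual implementation, which operates at a low level beneath the collector, but it shows the bookkeeping: record when each object was last touched, and treat anything unused past some window as a candidate for paging out. All names here are mine, for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

public class StalenessTracker<K> {
    private final Map<K, Long> lastAccess = new HashMap<>();

    // Record that the object identified by key was just used.
    void touch(K key, long nowMillis) {
        lastAccess.put(key, nowMillis);
    }

    // An object is "stale" -- a candidate for eviction to disk -- if it
    // hasn't been touched within the given window.
    boolean isStale(K key, long nowMillis, long windowMillis) {
        Long last = lastAccess.get(key);
        return last == null || nowMillis - last > windowMillis;
    }
}
```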

Melt's staleness-tracking logic and read-blockers don't actually become activated until the running application is approaching memory exhaustion, defined (arbitrarily) as 80-percent heap fullness. Rather than letting the program get really close to memory exhaustion (which causes garbage collection to become so frequent that the program seems to grind to a halt), stale objects are moved to disk so that the running app doesn't slow down.

Purists will complain that sweeping memory leaks under the carpet like this is no substitute for actually fixing the leaks. In very large programs, however, it can be impractical to find and fix all memory leaks. (I question whether it's even provably possible to do so.) And even if you could find and fix all potential leaks in your program, what about the JRE? (Does it never leak?) What about external libraries? Are you going to go on a quest to fix other people's leaks? How will you know when you've found them all?

I believe in fixing memory leaks. But I'm also a pragmatist, and I think if your app is mission-critical, it can't hurt to have a safety net under it; and Melt is that safety net.

Good work, Michael.

Tuesday, November 04, 2008

Garbage-collection bug causes car crash



A few days ago I speculated that you could lose an expensive piece of hardware (such as a $300 million spacecraft) if a non-deterministic garbage-collection event were to happen at the wrong time.

It turns out there has indeed been a GC-related calamity: one in which $2 million was on the line. (To be fair, this particular calamity wasn't actually caused by garbage collection; it was caused by programmer insanity. But it makes for an interesting story nevertheless. Read on.)

The event in question involved a driverless vehicle (shown above) powered by 10K lines of C# code.

At codeproject.com, you'll find the in-depth post-mortem discussion of how a GC-related bug caused a driverless DARPA Grand Challenge vehicle to crash in the middle of a contest, eliminating the Princeton team from competition and dashing their hopes of winning a $2 million cash prize.

The vehicle had been behaving erratically on trial runs. A member of the team recalls: "Sitting in a McDonald's the night before the competition, we still didn't know why the computer kept dying a slow death. Because we didn't know why this problem kept appearing at 40 minutes, we decided to set a timer. After 40 minutes, we would stop the car and reboot the computer to restore the performance."

The team member described the computer-vision logic: "As the car moves, we call an update function on each of the obstacles that we know about, to update their position in relation to the car. Obviously, once we pass an obstacle, we don't need keep it in memory, so everything 10 feet behind the car got deleted."

"On race day, we set the timer and off she went for a brilliant 9.8 mile drive. Unfortunately, our system was seeing and cataloging every bit of tumbleweed and scrub that it could find along the side of the road. Seeing far more obstacles than we'd ever seen in our controlled tests, the list blew up faster than expected and the computers died only 28 minutes in, ending our run."

The vehicle ran off the road and crashed.

The problem? Heap exhaustion. Objects that should have been garbage-collected weren't. Even though delete was being called on all "rear-view mirror" objects, those objects were still registered as subscribers to a particular kind of event. Hence they were never released, and the garbage collector passed them by.

In Java, you could try the tactic of making rear-view-mirror objects weakly reachable, but eventually you're bound to drive the car onto a shiny, pebble-covered beach or some other kind of terrain that causes new objects to be created faster than they can possibly be garbage-collected, and then you're back to the same problem as before. (There are lots of ways out of this dilemma. Obviously, the students were trying a naive approach for simplicity's sake. Even so, had they not made the mistake of keeping objects bound to event listeners, their naive approach no doubt would have been good enough.)
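A minimal sketch of what "weakly reachable" buys you in a case like this. If the registry holds its subscribers through WeakReference, registration alone won't pin an obstacle in memory once the rest of the program has dropped it (the class and method names are illustrative, not from the Princeton code):

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ObstacleRegistry {
    // WeakReference means the registry doesn't keep subscribers alive;
    // once nothing else refers to an obstacle, the collector can take it.
    private final List<WeakReference<Object>> listeners = new ArrayList<>();

    void subscribe(Object listener) {
        listeners.add(new WeakReference<>(listener));
    }

    // Count live subscribers, pruning entries the collector has cleared.
    int liveCount() {
        int count = 0;
        for (Iterator<WeakReference<Object>> it = listeners.iterator(); it.hasNext();) {
            if (it.next().get() != null) count++;
            else it.remove();
        }
        return count;
    }
}
```

The safer fix, of course, is the one the students missed: explicitly unsubscribe an object when it leaves the rear-view mirror.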

As I said, this wasn't really a GC-caused accident. It was caused by programmer error. Nevertheless, it's the kind of thing that makes you stop and think.

Monday, November 03, 2008

Why 64-bit Java is slow

In an interesting post at the WebSphere Community Blog, Andrew Spyker explains why it is that when you switch from 32-bit Java to a 64-bit runtime environment, you typically see speed go down 15 percent and memory consumption go up by around 50 percent. The latter is explained by the fact that addresses are simply bigger in 64-bit-land, and complex data structures use a lot of 64-bit values even if they only need 32-bit values. Performance drops because although addresses have gotten wider, processor memory caches have not grown correspondingly in overall KBytes: the same cache now holds fewer pointers, so things drop out of L1 and L2 more often. Cache misses go up and speed goes down.

Why, then, would anyone invest in 64-bit machines if the 64-bit JVM is going to give you an immediate performance hit? The answer is simple. The main reason you go with a 64-bit architecture is to address a larger memory space (and flow more bytes through the data bus). In other words, if you're running heap-intensive apps, you have a lot to gain by going 64-bit. If you have an app that needs more than around 1.5 GB of RAM, you have no choice.

Why 1.5GB? It might actually be less than that. On a 32-bit Windows machine, the OS reserves 2GB of the 4GB address space for itself and leaves applications at most 2GB. The JVM, of course, needs its own RAM. And then there's the heap space within the JVM; that's what your app uses. It turns out that the JVM heap has to be contiguous (for reasons related to garbage collection). The largest piece of contiguous heap you can get, after the JVM loads (and taking into account all the garbage that has to run in the background in order to make Windows work), is between 1.2GB and 1.8GB (roughly), depending on the circumstances.
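You can check what your own JVM actually got by asking the Runtime. A quick sketch (the number reflects -Xmx if you set it, or the platform default otherwise):

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMB() + " MB");
    }
}
```

On a 32-bit Windows JVM, try launching with -Xmx1800m and watch it refuse to start when a contiguous block that size can't be found.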

To get more heap than that means either moving to a 64-bit JVM or using Terracotta. The latter (if you haven't heard of it) is a shared-memory JVM clustering technology that essentially gives you unlimited heap space. Or should I say, heap space is limited only by the amount of disk space. Terracotta pages out to disk as necessary. A good explanation of how that works is given here.

But getting back to the 64-bit-memory consumption issue: This issue (of RAM requirements for ordinary Java apps increasing dramatically when you run them on a 64-bit machine) is a huge problem, potentially, for hosting services that run many instances of Java apps for SaaS customers, because it means your scale-out costs rise much faster than they should. But it turns out there are things you can do. IBM, in its JVM, uses a clever pointer-compression scheme to (in essence) make good use of unused high-order bits in a 64-bit machine. The result? Performance is within 5 percent of 32-bit and RAM growth is only 3 percent. Graphs here.

Oracle has a similar trick for BEA's JRockit JVM, and Sun is just now testing a new feature called Compressed oops (ordinary object pointers). The latter is supposedly included in a special JDK 6 "performance release" (survey required). You have to use special command-line options to get the new features to work, however.

Anyway, now you know why 64-bit Java can be slow and piggish. Everything's fatter in 64-bit-land.

For information about large-memory support in Windows, see this article at support.microsoft.com. Also consult this post at sinewalker.

Sunday, November 02, 2008

Java 1.4.2 joins the undead

Java 1.4.2 died last week. According to Sun's "End of Service Life" page, Java 1.4.2 went EOSL last Thursday. The only trouble is, it's still moving.

Java 5 (SE) was released in 2004 and Java 6 has been out since 2006. Java 5 will, in fact, also be at EOSL in less than a year. (You might call it the Java "Dead Man Walking" Edition.) And yet, if you do a Google search on any of the following, guess what you get?

java.lang.Object
java.lang.Class
java.lang.Exception
java.lang.Throwable
java.lang.Runtime
java.awt.Image
java.io.File
java.net.URL
JComponent
JFrame

If you do a Google search on any one of these, the very first hit (in every case) is a link to Sun's Javadoc for the 1.4.2 version of the object in question.

A year from now (when Java 5 hits the dirt) I wonder how many of these 10 searches will still take you to 1.4.2 Javadoc? (Remember, Java 5 has been out for more than four years and still doesn't outrank 1.4.2 in Google searches.) I'm guessing half of them. What do you think?