Saturday, January 31, 2009

Script for bypassing Google's "site may harm your computer" page

There was an outbreak of the bogus "visiting this web site may harm your computer" warning-page redirection on Google this morning. Apparently there have been occurrences of this phenomenon before (judging from blogs going back to 2007). You run a search on Google, and all of a sudden every hit has a warning link under it that says "visiting this web site may harm your computer", and if you try to go to the page in question, you get directed to a Google warning page that urges you not to go to the actual page you want.

On Twitter, people began labelling the problem #GOOGLEMAYHARM, which of course is phonetically similar to GOOGLE MAYHEM.

Naturally, I went to work on a Greasemonkey script to fix the situation. And naturally, in the time it took me to write the script, Google fixed the silly redirection thing.

In any event, if you are seeing the "harmful site" warning, here's a Greasemonkey script that should allow you to bypass the Google redirection page:

// ==UserScript==
// @name GoogleHitFixer
// @namespace fixer
// @include *
// ==/UserScript==

// Routes around the bogus warning page that says
// "visiting this web site may harm your computer"

// Public domain. Author: Kas Thomas

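( function main( ) {

// The warning page's URL contains this marker.
var signature = "interstitial?url";

var address = location.toString( );

// Do nothing unless we are on the warning page.
if ( address.indexOf( signature ) == -1 )
return;

// Everything after "?url=" is the page we actually wanted.
var newUrl = address.split( "?url=" )[1];

if ( newUrl )
location.href = newUrl;

} )( );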

Friday, January 30, 2009

"Crux" app wins JCR Cup

Day Software announced the winner of the JCR Cup 08 competition today. College sophomore Russell Toris won top prize (taking home a MacBook Pro) with a little web app called "Crux" (a shameless play on CRX, which is Day's commercial Java Content Repository).

I managed to learn a tiny bit more about Crux. And from what I've seen, it is indeed a clever use of JSR-170 technology.

What it lets you do is copy and paste arbitrary selections from any web page that's open in your browser, and save them straight to a JSR-170 repository (in this case, Day CRX, which is built atop Apache Jackrabbit). When you want to retrieve the selection(s) again, you can browse the repository and open them again in your browser.

Why is this useful? Here's the use case. Suppose you've got a dozen tabs open in Firefox (because you're researching a term paper) and you want to save references to the various content items you've been looking at. The conventional thing to do is bookmark all the open pages. But the problem with bookmarks is that they don't actually encapsulate any content from the pages you were on: They just encapsulate URLs and page titles (which are often meaningless).

With Crux, you highlight and Copy content selections from pages, then push those items into the repository with the click of a button. (Of course, you have to have a repository server running somewhere, reachable via HTTP.) When you want the clipped items again, you visit one URL (the node in the repository where the items are stored), and there are all your snippets, viewable in a single summary page. And they render nicely since Crux saves actual selection-source markup, not just raw text. Any embedded links, images, etc., in the clipped content are still there. Also, each entry in Crux contains a trackback link to the original source page, in case you really do need to go back to the page in question.

If you think about it, saving content clippings is actually a very compelling alternative to bookmarking. A bookmark is just an address. What you care about is the content, not the address. I have hundreds of bookmarks already. I can't keep them straight. They just keep piling up, and I can't remember what most of them are for. (Even the ones I use a lot, I sometimes have trouble finding again.) Crux provides a useful alternative.

How do you find something in the repository after you've pushed hundreds of content items into it with Crux? You use whatever repository search tools you'd normally use. Only this time, you can actually run full-text searches on the content items you stored, rather than searching page names in your Bookmarks collection.

Functionality similar to Crux is available via Clipmarks. Also, Microsoft tries to do some of this with its Onfolio and OneNote products (which are, IMHO, painfully klutzy). Crux looks and feels very light and simple. It definitely hits a sweet spot.

Whether Crux's source code will ever see the light of day, I don't know. (Entrants in the JCR Cup competition were not required to make source code public.) Reportedly, the code is all JavaScript and requires Greasemonkey.

In any event, congratulations Russell Toris! And kudos to Day for sponsoring the competition. It's nice to see JCR being used for something practical, lightweight, and simple. Well done.

Google Measurement Labs?

Google has introduced yet another service, called Google Measurement Labs, designed to test your connection speed and provide various types of information about your last-mile chokepoints.

I have read Google's own announcement about this as well as several blogs that try to explain it, and honestly, I still can't fathom the true motivation(s) behind it or why the heck anyone outside of academia (or perhaps the NSA) would even care. Obviously, Google has an interest in last-mile problems (the Internet is its lifeblood), but offering this set of diagnostics to the general public gives the impression that Google is very proudly answering a question nobody asked.

I don't get it.

Wednesday, January 28, 2009

The energy cost of SSL

I just finished reading a paper called The Energy Cost of SSL in Deeply Embedded Systems, by Sun Microsystems researchers Vipul Gupta and Michael Wurm. Fascinating stuff.

It turns out that secure communication over SSL shortens battery life by approximately 15% in very small (mote-like) wireless devices that use SSL. The size of such devices (commonly used as sensors in manufacturing, but soon to be all around us, if you believe the sci-fi hype) makes them extraordinarily sensitive to anything that draws electrical current, including computation. In a mote, it's not uncommon for 5% of the available energy from a pair of alkaline batteries to be consumed by SSL handshakes, 10% by polling, 25% by SSL data transfer, and the remaining 60% by the device itself. Those ratios will be different for non-secure (non-SSL) data transfer. If you do the apples-to-apples energy balance, the SSL mote pays an energy penalty of 15%, overall, for security.

The authors of the paper don't discuss things like efficient versus inefficient implementations (in assembly language) of handshake algorithms (such as Elliptic Curve), but obviously a poor implementation could significantly affect performance. An unfriendly chip architecture could affect things too. The authors do mention that the particular chip they used (TI MSP430) "offers a rotate instruction which speeds up SHA1 and MD5 by almost 40%."

Motes aren't ubiquitous yet, but hopefully by the time they are, they'll be powered by something other than batteries (e.g., ambient light), so that we don't have to worry about SSL causing even more zinc and manganese to enter the environment when worn-out mote batteries find their way into landfills. Imagine that: SSL as an environmental threat . . .

Tuesday, January 27, 2009

Microsoft aims to patent CSS extensions

I came across an interesting patent application from Microsoft (published 15 January 2009) called Extended Cascading Style Sheets in which Microsoft extols the virtues of something called CSSX (which I suppose means CSS Extensions). From the Abstract:
A CSSX (Extended Cascading Style Sheets) file including non-CSS (Cascading Style Sheet) extensions is used to define and reference variables and inheritance sets. A CSSX file compiler determines a value of the defined variable, modifies the CSSX file by replacing all references to the defined variable with the value, and generates the CSS file from the modified CSSX file. The inheritance set is defined in the CSSX file and includes a reference to a previously defined CSS rule set. The CSSX file compiler defines a new CSS rule set as a function of the determined attributes included in the previously defined CSS rule set of the defined inheritance set and generates the CSS file including the newly defined CSS rule set.
From what I can tell, Microsoft is proposing adding #defines (and other precompiler-looking stuff) to Cascading Style Sheets so that a last-minute "compile pass" on the server will generate CSS of the correct flavor for a given page request (correct as to localization, reading direction, accessibility, etc.) -- all done dynamically, just in time. The intent is clearly to eliminate the need for webmasters and others to create and manage multiple hard-coded flavors of the same stylesheet. In fact, CSSX aims to make CSS more compositional all the way around. (The patent talks about introducing new inheritance notions into CSS, for example.)
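To make the "compile pass" idea concrete, here is a minimal sketch of variable substitution in JavaScript. The `@define name: value;` and `$name` syntax is invented purely for illustration (the patent application's actual syntax is not reproduced here), and `compileCssx` is a hypothetical helper, not anything Microsoft ships. It covers only the variable part of the abstract, not inheritance sets.

```javascript
// Hypothetical sketch of a CSSX-style compile pass.
// ASSUMPTION: the "@define name: value;" and "$name" syntax is made up
// for this example; it is not taken from the patent application.
function compileCssx(source) {
  const variables = new Map();

  // Pass 1: collect variable definitions and strip them from the output.
  const withoutDefs = source.replace(
    /^[ \t]*@define\s+([\w-]+)\s*:\s*([^;]+);[ \t]*\r?\n?/gm,
    (match, name, value) => {
      variables.set(name, value.trim());
      return "";
    }
  );

  // Pass 2: replace every $name reference with the value defined for it.
  return withoutDefs.replace(/\$([\w-]+)/g, (match, name) => {
    if (!variables.has(name)) {
      throw new Error("Undefined CSSX variable: " + name);
    }
    return variables.get(name);
  });
}

// Made-up input: one stylesheet source, parameterized by two variables.
const cssx = [
  "@define direction: rtl;",
  "@define accent: #336699;",
  "body { direction: $direction; }",
  "a { color: $accent; }",
].join("\n");

console.log(compileCssx(cssx));
```

In a real deployment, the definitions would presumably be chosen per request (locale, reading direction, and so on) rather than hard-coded at the top of the file, which is where the "just in time" part comes in.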

Of course, there are drawbacks to consider. CSSX is not as easy to read or maintain as CSS (but I suppose if your development tools are good enough, this won't matter so much). CSSX is more verbose than CSS. It's doubtless harder to QA-test. But the main drawback, I think, is that it tends to mix presentation logic with non-presentation logic. That's a dangerous place to go.

Unfortunately, Microsoft wants to patent CSSX when it should actually be working with a standards body on it. Does the world really need another proprietary "standard" from Redmond, at this point? What's the point in extending a standard, then trying to patent it?

That part seems really, really stupid to me.

Sunday, January 25, 2009

Most Google employee options are under water

I didn't listen to the recent Google conference call, but according to someone who did, 85% of Google employee stock options are now under water. That's got to put a damper on "company spirit" for everyday workers. I've been in this situation myself (i.e., working for a company where everybody has options, but the stock price is hopelessly far below the strike price). It is a dreadful feeling, especially if the company you work for has a great history, a great culture, and brilliantly engineered products that should be doing much better in the market than they are.

Everyone knows, of course, that there is no guarantee a company's stock price will go up over time, and most employees are mature about this realization. But it still hurts. Under-water options hurt.

Knowing full well that this kind of thing saps employee enthusiasm and causes the wrong kind of water-cooler conversation, Google last week announced a new option-repricing plan for employees. The features of the plan:
  • It is a one-for-one, voluntary exchange.
  • The offer period begins on January 29, 2009 and ends at 6:00 a.m. Pacific Time on March 3, 2009, unless Google is required or opts to extend the offer period.
  • Employees will be able to exchange their under-water options for new options with a strike price equal to the closing price of Google stock on March 2, 2009.
  • The new options will have a new vesting schedule that adds 12 months to the original vesting schedule.
As it turns out, a company I worked for did this same thing. It offered employees the chance to roll over their existing options into new ones based on a new (current) strike price. But the vesting date moved out. If you were almost-vested (perhaps already vested) in your worthless options, you lost your vesting.

The problem with resetting the clock, of course, is that if the stock keeps sinking, you're still screwed. Also, if you have to be an employee in order to see your options continue to vest, who's to say you'll still be working for the company in a year?

Options have expiration dates. The company I worked for set a shorter expiration date for the new options (in this plan) than the original options had. So the time window for you to see a gain was narrowed. I don't know if that's the case with the new Google plan.

Bottom line, options (as an employee incentive) are tricky. In good times, they do work as an incentive. In bad times, they work as a disincentive (from what I've witnessed). Repricing plans don't always work out. (In the case of the company I worked for, it did not work to the employees' benefit.) In fact, repricing plans generally tend to favor the company, in one way or another. I believe that's the case here. Otherwise, I don't think Google would offer the plan at all.

Friday, January 23, 2009

JSON beautifier

The other day, I wanted to take a look at my Firefox bookmarks file. I could have exported my bookmarks to an HTML file using the Organize Bookmarks dialog, but instead I wanted to just use the existing bookmarks file (the private copy Firefox already uses). It turns out Firefox keeps archives of your bookmarks in

C:\Documents and Settings\[USER]\Application Data\Mozilla\Firefox\Profiles\bookmarkbackups

(on Windows)

and they are formatted as JSON! Trouble is, the JSON text has no newlines or tabs or other spacing, so if you open the bookmarks file in Notepad, you'll see One Big Huge Line of unformatted text.

Unformatted JSON is ugly. But fortunately, there's an answer.

Over at there's an online form that will beautify (pretty-print to your screen) any raw JSON that you paste into the form. It does an exceptionally nice job. Give it a try if you have a need to reformat JSON source.
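If you'd rather not paste your bookmarks into a web form, the same reformatting can be sketched in a few lines of JavaScript, assuming an environment with the standard `JSON` object. The `beautifyJson` name and the sample data are made up for illustration; the field names are not claimed to match Firefox's actual backup format.

```javascript
// Pretty-print a compact JSON string.
// JSON.parse validates the text (it throws on malformed input);
// the third argument to JSON.stringify sets the indent width.
function beautifyJson(rawText, indent = 2) {
  return JSON.stringify(JSON.parse(rawText), null, indent);
}

// ASSUMPTION: invented sample data, loosely resembling a bookmarks file.
const raw = '{"title":"Bookmarks","children":[{"title":"Example","uri":"http://example.com/"}]}';

console.log(beautifyJson(raw));
```

Passing `"\t"` as the second argument to `beautifyJson` indents with tabs instead of spaces.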

Tuesday, January 20, 2009

What politicians and company blogs have in common

Forrester Research, in a report called Time To Rethink Your Corporate Blogging Ideas, has confirmed what some of us have long suspected, which is that company blogs are viewed with distrust by the overwhelming majority of people who read them.

Josh Bernoff's research found that out of 18 different possible sources of information (ranging from personal e-mails to newspapers and TV to wikis and online classifieds), corporate blogs rank at the very bottom of the trust scale (18th place), with only 16% of people who read them saying that they trust them.

By comparison, 15% of Americans say they trust politicians (ref).

I was able to download a free copy of the $279 Forrester report at Hopefully the link will still work when you go there.

Monday, January 19, 2009

Data retrieval resource list

I came across this web page that contains a large and interesting list of online resources pertaining to information retrieval and search technologies. It includes books, courses, research articles, SEO tips, and much more.

Scroll down to get to the good stuff.

Saturday, January 17, 2009

Dr. Dobbs is (un)dead

This is an incredibly sad day for me.

Dr. Dobbs Journal, one of the great programming resources of the late DOS/early Windows era, has finally died, a victim (ironically) of the Internet's triumph over pulp-and-ink.

The venerable programmer's magazine hasn't exactly gone away entirely: It will (somewhat sadly) continue as "Dr. Dobbs Report — A Special Software Development Monthly Section in InformationWeek Magazine."

But that, too, has the smell of death about it.

To say that I owe a lot to DDJ is an understatement. DDJ was a critical part of my programming education. Allen Holub's early DDJ articles on the newfangled C language taught me a huge amount about programming and profoundly influenced my development as a coder. (Eventually, in 1991, I even wrote an article myself for DDJ.)

It's a sad thing, this disappearance of the printed word, this seemingly unstoppable deprecation of protons and neutrons. Magazines, newspapers, books, music CDs -- all on the endangered species list. Is all of human culture destined to be disseminated by coax cable and microwave radiation?

If you'll excuse me, I have to be alone right now.

Friday, January 16, 2009

The carbon cost of a Google search

Thursday, January 15, 2009

Adware author tells all

There's a truly fascinating interview with Ruby/Lisp/Scheme/C programmer (and onetime adware creator) Matt Knox over at Anybody who has always wondered how adware works, and why it's so infuriatingly difficult to get rid of, needs to read that interview.

It so happens, I recently spent several hours ridding my son's machine of a particularly nasty adware furball. I was able to eradicate most of it, but there were some peculiar registry entries I couldn't get rid of no matter how I tried. Immutable registry entries.

Now I know why such entries can exist.

Matt Knox explains how, in his days working for Direct Revenue (the firm Eliot Spitzer sued a couple years ago, for -- ahem -- propagating Trojans), he created unwritable registry keys by exploiting a little-known difference between the Win32 API and the NT API. "Windows, ever since XP, is fundamentally built on top of the NT kernel," Matt Knox explains. "NT is fundamentally a Unicode system, so all the strings internally are 16-bit Unicode. The Win32 API is fundamentally ASCII. There are strings that you can express in 16-bit counted Unicode that you can’t express in ASCII." (Um, yeah: A Unicode string can contain 16-bit values in which the top 8 bits are zeros. In C, strings are null-terminated, so a Unicode string containing what appear to be null bytes might appear truncated to a process that was not expecting Unicode.)

Matt continues: "That meant that we could, for instance, write a Registry key that had a null in the middle of it. Since the user interface is based on the Win32 API, people would be able to see the key, but they wouldn’t be able to interact with it because when they asked for the key by name, they would be asking for the null-terminated one."
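The mismatch Knox describes can be modeled in a few lines of JavaScript. This is an illustration only: it calls no Windows API, and the function names and the key name are hypothetical. One reader trusts a stored length (a "counted" string); the other stops at the first null character.

```javascript
// Illustration only: models two ways of reading a key name.
// ASSUMPTION: nothing here touches the real registry or any Windows API.

// A "counted" reader trusts the stored length and sees every character.
function readCounted(name) {
  return name;
}

// A "null-terminated" reader stops at the first NUL character (code 0).
function readNullTerminated(name) {
  const end = name.indexOf("\0");
  return end === -1 ? name : name.slice(0, end);
}

// Hypothetical key name with a NUL in the middle.
const keyName = "Run\0Hidden";

console.log(readCounted(keyName).length);              // 10
console.log(readNullTerminated(keyName));              // "Run"
console.log(readNullTerminated(keyName) === keyName);  // false
```

A tool that asks for the key by its truncated name is asking for a different key than the one actually stored, which is why the entry can be seen but not touched.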

This is just one example (cited by Knox) of the countless Microsoft design weirdnesses that have led to the tragic security mess that is Windows. This sort of thing is why the Spybot database now contains almost half a million entries, and also why Norton security updates (and Windows updates) will soon be eating 99 percent of available CPU cycles from machines connected to the Internet. And if you read between the lines of Matt Knox's interview, you'll understand that the mischief is really only just beginning.

Take my advice. Read the interview. It's an eye-opener.

Wednesday, January 14, 2009

Making server outages scalable

I mentioned not long ago the server outages that ensued when Windows 7 beta downloads became available. This is not supposed to happen in the world of cloud computing, of course. You're not supposed to be able to bring down "the cloud."

But there've been a number of high-profile cloud failures. Just in the past 90 days:
Question: Where are you supposed to stand when the sky is falling?

Tuesday, January 13, 2009

Lose a Facebook friend, eat a Whopper

Actually, you have to drop 10 friends from Facebook, not just one, in order to get a free Whopper from Burger King.


Monday, January 12, 2009

The stampede away from Vista accelerates

An editor from a well-known "information technology" publication recently asked me to name some tech trends that I thought would be important this year. I told him the stampede to Windows 7 would break Richter scales around the globe and possibly affect the earth's rotation.

Looks like the madness has already begun.

Friday, January 09, 2009

Most Google products make no money

There's a poignant table at Google Blogoscoped that gives a detailed breakdown of 87 Google "products" and services, with an explanation of how they work and what they cost.

The interesting part is that only about 20 of the 87 have an associated revenue model. True, you only need one good one (one really profitable product). But still, why so much entropy?

Thursday, January 08, 2009

No DAM middle ground

Yesterday, eWeek interviewed me for a story, "Midmarket Digital Asset Management Firms Disappearing." The thrust of it is that there are damn few DAM solutions for small to midsize businesses, specifically businesses that need a solution that's scalable and plays well with typical SMB IT infrastructure, all for under $100K. The situation is somewhat odd, given that there are tons of CMS vendors in the $10K to $50K range (plus open-source offerings). In the DAM world there aren't even any serious open-source contenders.

I think there's an opportunity here (arguably) for someone to build a powerful but affordable DAM application that will run atop (or alongside?) an open-source CMS.

Failing that, I'd be happy if someone would just build an Adobe Bridge-to-Alfresco connector.

Monday, January 05, 2009

When Certs Collide

A couple months ago, I mentioned that some Russians had cracked WiFi WPA2 security using a GeForce 8800 graphics processor. I also speculated on what a determined person might be able to do with the fearsome power of multiple Sony PS3 machines networked together.

Now we know. You can hack MD5 security. It's been done: Researchers Jake Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, Alex Sotirov, Marc Stevens, and Benne de Weger successfully used 200 PlayStation 3s to craft a rogue Certification Authority certificate, based on finding hash collisions in MD5-space. The 40-slide deck describing the work is available here.

According to the researchers, 200 PlayStations are roughly equivalent to 8000 desktop PCs, and the processing power needed to crack a cert based on 128-bit MD5 would cost $20K in Amazon cloud time.

Crafting a rogue CA cert means (essentially) the crackers were able to confer Cert Authority status on themselves. What's hilarious is that the bogus cert contains no revocation URL and thus can't (easily) be revoked! For demo purposes, the hackers back-dated their cert to August 2004. A malicious hacker could create a cert that never expires.

After you read the slide deck, you won't know whether to laugh or cry. I did both.