Thursday, December 31, 2009

The year in blogs: Yours Truly @ CMS Watch

I wrote a number of blog posts for CMS Watch this year (40 in all, which is far more than I remember writing, actually; but Google doesn't lie). As a free service to those suffering from chronic insomnia, I hereby offer this consolidated listing of my 2009 CMS Watch blog posts, arranged chronologically.
  1. Time to Tame the Apache Menagerie (Dec 8, 2009)
  2. Day sets up shop in Boston (where tech firms go to be ... (Nov 23, 2009)
  3. RFI as rich asset (Nov 17, 2009)
  4. IBM, Lucene, and the future of search (Nov 11, 2009)
  5. Solr heads for an even sunnier future (Oct 28, 2009)
  6. Usability still improving -- improvement still needed (Oct 19, 2009)
  7. Terracotta offers bolt-on distributed caching (Oct 8, 2009)
  8. Where did all the HTML editors go? (Sep 19, 2009)
  9. New Course on Web Development Platforms (Sep 15, 2009)
  10. Thoughts on the Future of Content Management (Aug 31, 2009)
  11. Recommind productizes its categorization engine (Aug 18, 2009)
  12. Thinking beyond the RFP (Aug 3, 2009)
  13. Day reports sunny results for 1H2009 (Jul 30, 2009)
  14. Are we reaching the limits of UI buildout? (Jul 28, 2009)
  15. Interest in Lucene continues to accelerate (Jul 14, 2009)
  16. In defense of silos (Jul 9, 2009)
  17. Clickability shows how not to write a white paper (Jul 1, 2009)
  18. The Coming Acronym Crisis (Jun 25, 2009)
  19. Vignette bets big on beta-SaaS (Jun 18, 2009)
  20. In DAM, Flashy does not always mean Flex (Jun 11, 2009)
  21. At Henry Stewart DAM Symposium: A Grey New World (Jun 2, 2009)
  22. Open Text buys Vignette: Investment or impulse? (May 6, 2009)
  23. Adobe: an elephant in the DAM room? (May 4, 2009)
  24. Open Text goes to the (eye) candy store (Apr 10, 2009)
  25. We live in interesting DAM times (Apr 8, 2009)
  26. Are you investing in technology, or people? (Apr 6, 2009)
  27. It's time for seat-based software licensing to end (Mar 25, 2009)
  28. OASIS blesses UIMA - What does it mean? (Mar 20, 2009)
  29. DAM vendor Ancept under new ownership (Mar 6, 2009)
  30. A reality checklist for vendors (Feb 26, 2009)
  31. Day releases not-so-sunny financial results (Feb 25, 2009)
  32. Software vendors need to understand how the web really works (Feb 17, 2009)
  33. Thoughts on Google Monoculture and the Cloud (Jan 31, 2009)
  34. IBM, Microsoft, and the patent mess - how to protect yourself (Jan 31, 2009)
  35. Day tries pay-as-you-go licensing (Jan 28, 2009)
  36. Startup offers commercial support for Lucene (Jan 26, 2009)
  37. What next for Interwoven? (Jan 23, 2009)
  38. Alfresco unveils a major upgrade (Jan 21, 2009)
  39. Vignette Village 2009 cancelled (Jan 14, 2009)
  40. Green IT versus blue sky (Jan 12, 2009)

Monday, December 14, 2009

Remembering the VIC-20

Does anyone else remember the VIC-20? Am I dreaming? Did this $299 consumer appliance (the first "personal computer" to ship a million units) really transform people's lives? Or just mine?

I wonder if William Shatner remembers being in this ad?

Commodore's computer-in-a-keyboard, you may recall, came with a grand total of 5 KB of RAM (enough to run the operating system and leave 3583 bytes for you, the discerning consumer, to play with). Fear not, though: RAM was expandable up to 40 KB with an add-on memory cartridge.

Does anyone else recall logging onto CompuServe with a 300-baud modem, using the VIC-20 wired to a TV as a monitor? Or am I showing my age?

On second thought, don't answer that.

Wednesday, December 09, 2009

NoSQL Required Reading

I've been trying to follow the fast-moving world of NoSQL lately, and -- like a visit to the carnival funhouse -- it has left me with double vision, a queasy stomach, and a staggering gait. (And it's not even Saturday morning...) Yet I find myself coming back for more.

If you're new to NoSQL, you'll want to do a bit of background reading. I'll keep this quick and limit my recommendations to just the essentials:

1. The Amazon Dynamo paper is classic. Almost everyone in the NoSQL world has read this paper.

2. Google's Bigtable paper. Again, very widely read.

3. Werner Vogels's "Eventually Consistent" (originally published in ACM Queue) is absolutely the one article you should read if you're not clear on the rationale behind "eventual consistency."

4. Brewer's CAP Theorem (a foundational bit of scalability theory) is well-explained here. Also see Brewer's original slides from his famous July 2000 PODC keynote.

5. The slideshows from the June 11, 2009 NoSQL meetup in SFO bring to mind adjectives like classic, influential, seminal, pivotal, memorable. Ignore these decks at your peril.

6. SQL Databases Don't Scale is short, basic, and to-the-point. Essential background info if you're not already a battle-scarred DBA with scalability wounds.

7. For a tabular overview of major distributed databases and how they compare with each other, see NoSQL Ecosystem by Jonathan Ellis. A similar effort is the Quick Reference to Alternative data storages page. Ellis's post is noteworthy for its clueful, concise, helpful narrative (in addition to the tables). The Quick Reference page is mainly tables -- but the tables are more complete than Ellis's.

Other Essential Resources -- This site bills itself as "Your Ultimate Guide to the Non-Relational Universe!", and also self-assuredly calls itself "the biggest nosql link archiv in the web." It's worth knowing about, certainly.

IMHO, all fully conformant NoSQL geeks MUST follow @nosqlupdate on Twitter.

Conformant geeks SHOULD follow @al3xandru (creator of the excellent MyNoSQL blog and NoSQL Week in Review). NoSQL Week in Review is new. I'm hoping it will be updated regularly. It's excellent.

You MAY want to read recent blog posts by Ricky Ho that aptly summarize key aspects of distributed data-store technology. Two noteworthy examples: Query Processing for NoSQL Databases, and his widely read NoSQL Design Patterns post.

That SHOULD be enough to get you started. ;)

Hadoop and Solr popularity continue to scale well

Blue lines: Hadoop
Red lines: Solr

A quick check of Google Trends shows that Apache Solr (the search server based on Lucene) and Hadoop (the open-source implementation of MapReduce) are popular query terms -- and becoming more popular by the day. (For links to the news stories labelled with flags 'A', 'B', 'C', etc., go to this Google Trends page.)

Likewise, job trend data leaves little doubt that Hadoop and Solr skills are increasingly in demand.

Bottom line? If you're a developer, enriching your Hadoop and/or Lucene+Solr skills can only be considered a good investment.

Wednesday, December 02, 2009

Will HTML5 be SQL-free?

The Los Angeles Times story about Google deprecating Gears in favor of "HTML5" got quite a bit of attention in the Twitterverse yesterday, and has the blogosphere abuzz now, as well.

There are several interesting aspects to the story. One is that the name Microsoft doesn't come up at all. Instead, Apple figures rather prominently in the Times story. In fact, the Times's depiction of Google, Apple, and W3C deciding the fate of the post-2.0 Web evokes images of the Big Three debating Europe's postwar reorganization at the Yalta Conference. One gets the (fanciful) impression that Microsoft's future is, to some extent, being decided without anyone from Redmond being present. Of course, that's not quite true. ;)

Another interesting aspect of the Times story is that it talks about HTML5 wrapping the various technologies that will (ostensibly, soon) make Gears superfluous, when technically speaking, many of the functionalities being attributed to HTML5 in the Times story are, in fact, not part of the HTML5 specification at all. They are part of various other WebApps Working Group specs.

Be that as it may, the decision facing the browser-makers at this point is what kind of offline storage to use for browser-mediated web apps. Specifically, will the underlying store support SQL, or not?

This is (trust me) a Huge Hairy Issue -- HHI(tm) -- and don't let the Times or anybody else tell you otherwise: It's far from being settled yet.

HTML5 talks about SQL quite openly. And it appears Opera, Safari, and (soon) Chrome are implementing WebDB, which is a SQL database in the spirit of the (emerging) Web SQL Database spec. But that's not to say WebDB is a traditional SQL database. It implements SQLite, which is another beast entirely.

Know well, though, not everyone wants SQLite -- or SQL, for that matter. In fact, Microsoft's Adrian Bateman has stated that Redmond probably will not go that route. In a WebApp WG teleconference, Bateman said:
Microsoft's position is that WebSimpleDB is what we'd like to see
... we don't think we'll reasonably be able to ship an interoperable version of WebDB
... trying to arrive at an interoperable version of SQL will be too hard
WebSimpleDB, also known as the Nikunj proposal (in deference to the author, Nikunj R. Mehta, of Oracle Corporation), proposes a key-value store of the NoSQL variety. And interestingly enough, this approach is getting serious consideration not only from Microsoft but from Mozilla as well. (In the aforementioned teleconference, Mozilla's Jonas Sicking said: "We’ve talked to a lot of developers, the feedback we got is that we really don’t want SQL...")
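To make the SQL-versus-key-value distinction concrete, here is a minimal Python sketch. It is not the Web SQL Database or WebSimpleDB API (those are browser-side specs). SQLite stands in for the SQL approach via Python's standard sqlite3 module, and `KeyValueStore` is a hypothetical, dictionary-backed class invented purely for illustration, as are the table name and sample record.

```python
import sqlite3

# Approach 1: a SQL store (SQLite, in memory). The application talks to the
# store through SQL strings, so every implementation must agree on the dialect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (id, body) VALUES (?, ?)", ("n1", "buy milk"))
sql_row = conn.execute("SELECT body FROM notes WHERE id = ?", ("n1",)).fetchone()
sql_body = sql_row[0]
conn.close()


# Approach 2: a key-value store. There is no query language at all, only
# put/get/delete by key. KeyValueStore is a hypothetical class for this sketch.
class KeyValueStore:
    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._items.pop(key, None)


store = KeyValueStore()
store.put("n1", {"body": "buy milk"})
kv_body = store.get("n1")["body"]

print(sql_body, kv_body)
```

The second approach has a much smaller surface to standardize, which is roughly the interoperability argument quoted above. The trade-off is that filtering, joining, and sorting become the application's job.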

It's too early to know how it will all play out. About the only thing that's certain at this point is that Google has (thankfully) decided it's more important to stay on board with mainstream industry standards than to keep pushing proprietary approaches to web-app infrastructure, even if those standards are (in some cases) still quite fluid and ill-formed. One hopes Microsoft will learn this lesson too. Otherwise? Yalta will decide.

Where Google's power goes

Ever wonder where all the electrical power ends up being used in a Google data center? This is the approximate breakout according to a recent book by Google engineers Luiz André Barroso and Urs Hölzle.

Tuesday, December 01, 2009

Unexpected relationship between hard-drive life and temperature

Today, I was reading Failure Trends in a Large Disk Drive Population [PDF], a February 2007 paper by Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso of Google, containing lots of great data on hard-drive failures and the difficulty of predicting same. The above graph depicts one of the more interesting findings, which is that the effect of operating temperature on disk reliability appears to vary with disk age, such that younger drives tend to be more susceptible to low-temperature failures, whereas older drives tend to be more susceptible to failure at elevated temperatures. Results are grouped by age of disk at failure, then broken out into subgroups (histogram bars) based on their operating temperature. So for example, among disks that failed at 3 years of age, the Annualized Failure Rate (AFR) was about 15% for those disks that had had operating temps of 45 deg. C or more, versus a fail rate of 5% for those that had seen temps of less than 30 deg. C.

Many people have assumed that high temps are bad for disks. And indeed maybe they are bad for 3-year-old disks, but disks that fail at younger ages tend to be much more traumatized by cold than by heat. Pinheiro et al. give additional data for this, and it's pretty convincing. For example, if you have a look at Fig. 4 of the paper, you'll see a bathtub curve, showing that extremes of temperature are deleterious to disk life expectancy. Ironically, the bathtub curve reaches its lowest point at around 38 deg. C -- very close to human body temperature.
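For anyone who wants to run the same kind of breakdown on their own drive logs, here is a rough Python sketch of the grouping described above. It assumes the common definition of AFR as failures per drive-year of operation, expressed as a percentage. The records are invented for illustration (they are not data from the paper) and were chosen only so the output echoes the roughly 15% versus 5% contrast mentioned earlier. The function name and the temperature-band labels are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only (NOT data from the paper):
# (age bucket in years, temperature band, drive-years observed, failures)
records = [
    (3, "<30C", 600.0, 30),
    (3, "<30C", 400.0, 20),
    (3, ">=45C", 200.0, 30),
]


def afr_by_group(rows):
    """Return {(age, temp_band): AFR in percent}.

    Assumes AFR = failures / drive-years * 100, summed within each group.
    """
    drive_years = defaultdict(float)
    failures = defaultdict(int)
    for age, band, years, failed in rows:
        drive_years[(age, band)] += years
        failures[(age, band)] += failed
    return {key: 100.0 * failures[key] / drive_years[key] for key in drive_years}


afr = afr_by_group(records)
for (age, band), pct in sorted(afr.items()):
    print(f"age {age} yr, {band}: AFR {pct:.1f}%")
```

Normalizing by drive-years rather than by drive count matters because the temperature bands are unlikely to contain equal numbers of disks.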