Wednesday, July 26, 2006
Hashing and hash algorithms are a pet interest of mine. Understanding hashing at a low level takes a fair amount of meditation, and most programmers are too busy for that. Thus hashing is not well understood outside of, say, cryptography circles.
As it turns out, the guy who did the amusing boredom-graph cartoon (see yesterday's post) has also written one of the best overviews of hashing I've seen in a long time. Be sure to see his excellent Hash Functions and Block Ciphers page as well.
Study the material on Bob's site. Save yourself years of meditation.
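For a taste of what "low level" means here, below is a Python transcription of Bob Jenkins's one-at-a-time hash as it is commonly published. Treat it as a sketch: the function name is mine, and because the C original works on 32-bit unsigned integers, every addition is masked with 0xFFFFFFFF to emulate that wraparound. Each input byte is mixed in with a shift-add and a shift-xor. A final avalanche step then spreads the last few bytes across all 32 bits.

```python
def one_at_a_time(key: bytes) -> int:
    """32-bit one-at-a-time hash (Python transcription of the C version)."""
    MASK = 0xFFFFFFFF  # emulate uint32_t overflow
    h = 0
    for byte in key:
        h = (h + byte) & MASK
        h = (h + (h << 10)) & MASK  # shift-add mixes the byte upward
        h ^= h >> 6                 # shift-xor folds high bits back down
    # Final avalanche so the last bytes affect every output bit.
    h = (h + (h << 3)) & MASK
    h ^= h >> 11
    h = (h + (h << 15)) & MASK
    return h


print(hex(one_at_a_time(b"a")))  # 0xca2e9442
```

The loop body is cheap, but no single step is doing the real work. It's the alternation of add (carries propagate left) and xor-shift (bits flow right) that lets one byte of input disturb the whole word.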
Tuesday, July 25, 2006
A Timeless Graph

Bob Jenkins created this wonderful graph, which confirms what I've long thought: boredom tends to be continuous over its range.
Monday, July 17, 2006
Making XML Smaller
In all the hand-wringing discussions about XML's verbosity that I've read over the years, I have yet to hear anyone suggest simply truncating all closing tags to </>. In other words, if you've got
<data>
<item>something</item>
</data>
why not just shorten it to
<data>
<item>something</>
</>
Verbose closing tags are a pure waste of space (albeit required by the XML spec). Abbreviated closing tags don't make the file any less parsable. When the parser encounters </>, it knows the tag closes the innermost element that is still open; in well-formed XML, that is the only element a closing tag is allowed to close anyway. If there is no open element left to close, the XML wasn't well-formed to begin with.
Verbose closing tags are just that: unneeded verbosity.
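To show that the element name in a closing tag really is redundant, here is a minimal Python sketch. It rewrites every </> back into the full closing tag, using nothing but a stack of open element names. Assumptions: `expand_short_closers` is a hypothetical helper, not part of any XML library. It is also a toy tokenizer, not a parser: it ignores CDATA and does not cope with '>' inside attribute values or comments.

```python
import re

# One tag at a time: optional '/', the element name (empty for '</>'),
# then anything up to '>'. Deliberately simplistic.
TAG = re.compile(r"<(/?)([^\s/>]*)([^>]*)>")


def expand_short_closers(text: str) -> str:
    """Rewrite every '</>' as the full closing tag it stands for."""
    open_elems = []  # names of elements opened but not yet closed

    def fix(m: re.Match) -> str:
        slash, name, rest = m.groups()
        if name.startswith(("?", "!")):
            return m.group(0)  # XML declaration, comment, DOCTYPE: leave alone
        if not slash:
            if not rest.rstrip().endswith("/"):  # '<x/>' closes itself
                open_elems.append(name)
            return m.group(0)
        if not open_elems:
            raise ValueError("closing tag with nothing open")
        innermost = open_elems.pop()  # a closer can only close this one
        if name and name != innermost:
            raise ValueError(f"</{name}> cannot close <{innermost}>")
        return f"</{innermost}>"

    result = TAG.sub(fix, text)
    if open_elems:
        raise ValueError(f"unclosed elements: {open_elems}")
    return result


print(expand_short_closers("<data>\n<item>something</>\n</>"))
```

The stack is the whole trick. Any parser already keeps one in order to check well-formedness, so spelling out the name on the closing tag only gives it something to compare against.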
Wednesday, June 28, 2006
When Identity Theft is not Theft
Two years from now, it will not be necessary to steal anyone's identity. Web surfers will have given away more personal info to the world than even the greediest thief would ever want to rip off by illegal means.
I'm not so much talking about static identity info, like your Social Security number (which will be worthless anyway in a year or two). I'm talking about the really interesting dirt: your shopping habits, reading habits, movie-watching habits, hobbies, favorite travel destinations, where you went to school, who you've worked for and how long you stayed at each job, and (let's not mince words) sexual preferences, who your friends are, the names and ages of your children. Most of this info can be scraped, right now, today, from blog bios, online resumes, MySpace profiles, tag-sharing sites, social networking sites (like linkedin.com), and photo-sharing sites. Your info is out there. You put it there yourself.
And the bad part is, there's no taking it back. Google archives old pages. So does the Wayback Machine.
You're leaking personal info to the world every time you use an online service of any kind, particularly the spate of Web 2.0 applications offering free online word processing, spreadsheets, chat, etc. Those are hosted apps. Most of the hosts are (arguably) trustworthy, but they tend to archive chat logs and other interaction records, which means the storage media holding that material can be stolen or lost, just like the Veterans Affairs guy's laptop.
Or it can be inadvertently indexed by Google and exposed to searchers (as has happened with supposedly private test scores).
The outflux of identity info onto the Web is massive, and it's accelerating daily, driven largely by the explosion in popularity of "Web 2.0" apps.
All of which is great news for the National Security Agency, which by some accounts is sifting through your data right now.
Tuesday, June 20, 2006
RoR Gaining on Atkins Diet
It's official: Ruby on Rails is about to edge past the Atkins diet for popularity.
RoR has also overtaken West Nile virus.
It has to be true. I saw it on Google Trends.
Thursday, June 08, 2006
Spring Framework Backlash
It's refreshing (and healthy, I think) to see open, honest debate erupt over the usefulness of IoC frameworks, in particular the certifiably trendy Spring framework. I refer to Bob Lee's gratifyingly blunt I Don't Get Spring.
Surprisingly, most of the comments at the end of Lee's blog are dispassionate, logical, and in full agreement with Lee's premise. That premise (to oversimplify) is that Spring is cryptic, over-architected, and malodorous at a code level (among other felonies), which raises the question of why anyone would use it.
I can understand why Lee would feel that way. He's right on most counts. Spring is indeed byzantine and heavy (as most things surrounding J2EE are), and buries too many dependencies in XML. But that doesn't mean Spring doesn't have its legitimate uses.
Monday, June 05, 2006
JVM as Web-Service Endpoint
Imagine if you could ping a running JVM over HTTP to obtain realtime diagnostic info. That seems to be what Sun has in mind with U.S. Patent 7,039,691, "Java Virtual Machine Configurable to Perform as a Web Server," granted to Sun Microsystems last month.
Abstract: A virtual machine, such as a Java(tm) virtual machine, is configured to operate as a web server so that users, using a browser, can make general-purpose inquiries into the state of the virtual machine or, in some cases, mutate the state of the VM. A "browsable" VM contains a network traffic worker, such as an HTTP thread, a services library, and a VM operations thread, which is an existing component in most virtual machines. The network traffic worker and the VM operations thread communicate through a request data structure. The VM operations thread generates a reply to the request upon receiving a request data structure from the traffic worker. Such a reply can be in the form of an HTTP response containing HTML or XML pages. These pages are transmitted back to the browser/user by the network traffic worker.
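The heart of the abstract is a division of labor. A network worker thread packages each inquiry into a request data structure. The VM operations thread, which alone touches VM state, fills in the reply, and the worker ships that reply back to the browser. The Python sketch below illustrates only that handoff. It is not Sun's implementation and it does no actual HTTP. `Request`, `operations_thread`, `traffic_worker`, and `query` are hypothetical names, and `threading.active_count()` merely stands in for "the state of the VM."

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request data structure shared by the two threads."""
    path: str
    reply: queue.Queue = field(default_factory=lambda: queue.Queue(maxsize=1))


def operations_thread(inbox: queue.Queue) -> None:
    """Stand-in for the VM operations thread: the only code that inspects state."""
    while True:
        req = inbox.get()
        if req is None:  # shutdown signal
            break
        if req.path == "/threads":
            body = f"<threads>{threading.active_count()}</threads>"
        else:
            body = "<error>unknown query</error>"
        req.reply.put(body)  # hand the XML reply back to the worker


def traffic_worker(inbox: queue.Queue, path: str) -> str:
    """Stand-in for the HTTP thread: enqueue a request, wait for its reply."""
    req = Request(path)
    inbox.put(req)
    return req.reply.get(timeout=5)


def query(path: str) -> str:
    """Run one request through the two-thread pipeline, then shut it down."""
    inbox = queue.Queue()
    ops = threading.Thread(target=operations_thread, args=(inbox,), daemon=True)
    ops.start()
    try:
        return traffic_worker(inbox, path)
    finally:
        inbox.put(None)  # tell the operations thread to exit
        ops.join()


print(query("/threads"))  # e.g. <threads>2</threads>
```

The worker never reads state directly. That is presumably the point of routing everything through an existing operations thread: inquiries, and even mutations, happen at a point where the VM is prepared for them.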
Thursday, June 01, 2006
Metacompilers and Checkers
Imagine if your favorite compiler were extensible in such a way that you could add your own custom static checks, to find bugs of a special kind that you need to be able to find but that your compiler is too stupid to know about out-of-the-box. That's the intuition behind metacompiler (MC) technology. You write a checker, which is a snap-in that knows how to check for whatever kind of syntactic or other blunder you care about, and add it to the compiler. Then the compiler knows how to emit new warnings or error messages.
A checker can be as simple or as sophisticated as you want it to be. Maybe you want to be sure that every call to foo( ) is eventually followed by a corresponding call to bar( ). Or you may have application-specific security concerns (in the context of export laws, perhaps). Or you may have company policy around certain syntactical idiosyncrasies that would only be of specific concern to your department or your company.
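As a rough feel for the foo( )/bar( ) rule, here is a deliberately naive checker written in Python. It is a hypothetical stand-in, not the Stanford MC interface. A real metacompiler checker plugs into the compiler, works on its parsed representation, and follows control-flow paths. This one only scans source text top to bottom. It pairs each bar( ) with the most recent unmatched foo( ) and reports whatever is left over.

```python
import re

# Naive: matches calls textually, so it is fooled by comments, strings,
# and function definitions named foo or bar.
CALL = re.compile(r"\b(foo|bar)\s*\(")


def check_foo_bar(source: str) -> list[int]:
    """Return line numbers of foo() calls never followed by a later bar()."""
    pending = []  # line numbers of foo() calls still waiting for a bar()
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in CALL.finditer(line):
            if m.group(1) == "foo":
                pending.append(lineno)
            elif pending:
                pending.pop()  # pair with the most recent unmatched foo()
    return pending


code = "foo();\nbar();\nfoo();\n"
for n in check_foo_bar(code):
    print(f"warning: line {n}: foo() without a following bar()")
```

The gap between this toy and the real thing is the whole value of MC. A checker that can see every path through a function catches the foo( ) that is skipped on one error branch, and a text scan never will.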
Interestingly, the Stanford MC guys did a pass against the Linux kernel using their own custom checkers plugged into their own MC-aware gcc and found almost 600 potentially serious bugs, most of which have not been looked into yet (if you believe Coverity's latest findings).
Wednesday, May 31, 2006
Brendan Eich JS-Futures Update
If you're a serious user of Javascript, you must stop reading now and immediately go to:
http://developer.mozilla.org/presentations/xtech2006/javascript/
Tuesday, May 23, 2006
Continuations Thought Harmful
In late March, I blogged a couple of times about continuations. Suddenly, Sun's Tim Bray and Gilad Bracha have broached the subject, stimulating much heated discussion in the blogosphere. Much heat, little useful work at the crankshaft.
Of all the recent posts on this surprisingly controversial subject, I find Curtis Poe's the most clueful.
Friday, May 19, 2006
Putting a Face on AJAX
This online facial-compositing app is the weirdest thing ever. It lets you merge facial features (from actual photos) together to create your own police composite sketches, kind of.
I spent 30 minutes fooling with it. Everything came out looking like Pia Zadora.
Wednesday, May 10, 2006
AJAX as a Man-in-the-Middle Architecture
A friend at work showed me Gabbly, an AJAX IM-chat pushlet that gives the appearance of putting a chat window over the top of any web page you choose (kind of like Gmail chat).
Odd thing is, it even worked for us when we set the URL to a secure wiki page inside the company firewall.
We promptly exited our Gabbly session and began chatting about it on GroupWise Messenger (our company standard). The whole experience was freaky and left us with serious security worries, especially when Firefox crashed on me within minutes of leaving the Gabbly-iframed page.
According to a discussion at Ajaxian, Gabbly is indeed vulnerable to cross-site scripting attacks. But I'm equally worried about things like Gabbly JS code being able to walk up to the _top frame and read a supposedly secure container page (not to mention issues around Gabbly.com slurping our plaintext conversation in real time). Likewise, there's nothing stopping the Gabbly server from stomping on any Javascript code that's already in-scope in your page.
The thought of people using a 3rd-party-hosted chat app like this at work scares the hell out of me.
But that's the trouble with things like shorttext.com, ajaxwrite.com, and other free-neato-trendy AJAX "services": They require you to rely on the trustworthiness of the host. I put it too delicately. These are man-in-the-middle applications.
User beware.
Thursday, May 04, 2006
Stallman on MSWord Attachments
Recently a friend reminded me of this discussion (old but still relevant) by Richard Stallman of Word attachments and why they're basically the work of Satan.
Friday, April 28, 2006
YAMSWK (Yet Another M$Word-Killer)
My nomination in the category of "best AJAX-based Word workalike" for this week is Zoho Write, one of a suite of impressive Zoho apps. It took a while (30 seconds) for Firefox to pull down all 51 external .js scripts, but when the app opened, it was a thing of beauty. Imagine my abject stupefaction upon using the Import button to suck in a complex (many tables, many fonts) .sxw file, and seeing it open without errors, looking just the way it should! Yes, Zoho Write handles OpenOffice files. Just as Nature intended.
Unlike a lot of Web 2.0 apps, Zoho is not the product of a teenager locked in a closet. Behind the Z-suite is a ten-year-old company, AdventNet, with offices around the world.
This is starting to get exciting.
Thursday, April 27, 2006
10^9 Laughs
Assignment: Write a 3-dozen-line XML file that will lock up any modern browser.
Answer: See The Billion Laughs attack.
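The arithmetic behind the name is easy to check without endangering a browser. The Python sketch below builds the classic nine-level document, in which each entity refers ten times to the one before it. It then counts how long &lol9; would be when fully expanded. It never expands anything: memoized recursion over the entity definitions does the counting. Both function names are mine, and the regexes understand only this simple entity syntax.

```python
import re

ENTITY = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)">')
REF = re.compile(r"&(\w+);")


def billion_laughs_dtd(levels: int = 9, fanout: int = 10) -> str:
    """Build the classic nested-entity document (do NOT feed this to a parser)."""
    lines = ["<!DOCTYPE lolz [", '  <!ENTITY lol "lol">']
    for i in range(1, levels + 1):
        prev = "lol" if i == 1 else f"lol{i - 1}"
        lines.append(f'  <!ENTITY lol{i} "' + f"&{prev};" * fanout + '">')
    lines.append("]>")
    lines.append(f"<lolz>&lol{levels};</lolz>")
    return "\n".join(lines)


def expanded_length(dtd: str, name: str) -> int:
    """Characters produced by fully expanding entity `name`, computed, not expanded."""
    defs = dict(ENTITY.findall(dtd))
    memo = {}

    def length(ent: str) -> int:
        if ent not in memo:
            body = defs[ent]
            literal = len(REF.sub("", body))  # text outside entity references
            memo[ent] = literal + sum(length(r) for r in REF.findall(body))
        return memo[ent]

    return length(name)


doc = billion_laughs_dtd()
print(len(doc), expanded_length(doc, "lol9"))  # 771 3000000000
```

Well under a kilobyte of markup balloons into 10^9 copies of "lol", or three billion characters. A parser that naively expands entities has no chance.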
Tuesday, April 25, 2006
Big Blue: Leaders in Teleportation?
No one will ever accuse Big Blue of clairvoyance. But they just may have a handle on teleportation.
Just for fun, go to IBM's site and do a search on "teleportation."
You'll get 19 hits.
IBM Game Research
The IBM Systems Journal is one of those rare publications that you wish would come out more frequently (just the opposite of drain-clogs like eWeek, which I wish would come out half as often). The journal's content is uniformly excellent, and the subject matter frequently delights. Such is the case with Volume 45, Number 1, 2006, devoted entirely to (of all things) Online Game Technology.
Wednesday, April 19, 2006
Stacklessness
I blogged a while ago about continuations, which may play a role in making AJAX scale well. Today I learned that continuations have been implemented (on an experimental basis) in Mono's virtual machine.
I'm not a Python person so I didn't realize (until after Googling around a bit) that the so-called microthreads of Stackless Python are a way of achieving the same thing.
The key intuition behind stacklessness is that you move everything that would normally be kept on "the stack" out to a data structure on the heap. Therefore one thread can jump between potentially tens of thousands of execution frames.
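Plain Python generators are not Stackless tasklets, but they make a handy small-scale illustration of the idea. CPython keeps a suspended generator's frame (its locals and its resume point) in a heap object rather than on the C stack. As a result, one OS thread can hold thousands of them and switch between them at will. The sketch below is hypothetical (`counter` and `run` are my names) and uses a simple round-robin scheduler.

```python
from collections import deque


def counter(name, n, log):
    """A 'microthread': each yield suspends it; its frame waits on the heap."""
    for i in range(n):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run(tasks):
    """Round-robin scheduler: one OS thread cycling through many frames."""
    ready = deque(tasks)
    steps = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the frame until its next yield
            steps += 1
            ready.append(task)  # still alive: back of the line
        except StopIteration:
            pass                # finished: drop it
    return steps


log = []
print(run([counter(f"t{k}", 3, log) for k in range(10_000)]))  # 30000
```

Ten thousand concurrent "processes" here cost ten thousand small heap objects, not ten thousand OS threads with their own stacks. That is exactly the economy stacklessness is after.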
The ability to run huge numbers of processes concurrently is obviously important in many kinds of applications. If AJAX becomes another driver of this technology, it'll be interesting to see who'll be first to implement a stackless-Java virtual machine.
Thursday, April 13, 2006
XQuery Engines Compared
While digging around for news/views on JRockit, I happened to stumble onto an XQuery-engine comparative evaluation by (of all people) the Washington Publishing Company, a seller of EDI and HIPAA publications. In case you don't have time to wade through the full study (which is a good read, incidentally), the bottom line is, for maximum performance, robustness, and flexibility, you want the Saxon engine running atop the BEA JRockit JVM.
Wednesday, April 05, 2006
How to Comment AJAX Code
Lately I've been perusing some of Oracle's Javascript code from its ADF Faces. I see that it's extraordinarily well commented.
I'm looking at it in OpenOffice, so just for fun, I tell OOo to do a regex-search on
//.*$
and globally replace that with zilch, thereby wiping out all comment lines.
The result? With comments, Oracle's Core.js file is 140 KB. Without comments: 95 KB. Imagine: some 45 KB of comments in a 140 KB file, nearly a third of the whole thing.
I don't think I've ever seen such well-commented code in any language, ever.
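The same measurement is easy to script. The Python sketch below applies the //.*$ pattern in multiline mode and reports what fraction of the text disappears. The function names are mine. Treat the number as approximate, for two reasons: the pattern also chops off anything after a // inside a string or URL, and it leaves /* ... */ block comments alone.

```python
import re

# '//' through end of line; MULTILINE makes '$' match at every line end.
LINE_COMMENT = re.compile(r"//.*$", re.MULTILINE)


def strip_line_comments(js: str) -> str:
    """Remove every //-to-end-of-line run (including false hits in strings)."""
    return LINE_COMMENT.sub("", js)


def comment_share(js: str) -> float:
    """Fraction of the text removed by the //-comment regex."""
    return 1 - len(strip_line_comments(js)) / len(js)


sample = 'var x = 1; // set x\n// whole-line comment\nvar url = "http://example.com";\n'
print(repr(strip_line_comments(sample)))
# 'var x = 1; \n\nvar url = "http:\n'  <- note the mangled URL
```

For Core.js the same ratio works out to (140 - 95) / 140, or about 0.32.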
Kas