Wednesday, March 18, 2009

Making one line of code do the work of 47

Every year or two, I go back and re-read certain papers, presentations, and book chapters that have inspired me or been the source of priceless "Aha moments." One paper that I periodically revisit is John K. Ousterhout's Scripting: Higher Level Programming for the 21st Century, written when Ousterhout was affiliated with Interwoven. (He's currently a research professor at Stanford.)

If you're not familiar with John K. Ousterhout, it might be because you don't use Tcl (Tool Command Language, the scripting language created by Ousterhout). Sun Microsystems hired Ousterhout in 1994 specifically to accelerate the development of Tcl. It turns out that Sun's CTO at the time (the person who hired Ousterhout) was Eric Schmidt. (Yes, that Eric Schmidt.) The whole story of the creation of Tcl (an interesting tale in its own right) is told by Ousterhout here.

In any case, if you haven't yet encountered Ousterhout's excellent "Scripting" paper (originally published in IEEE Computer), I recommend that you check it out. Don't be misled by the 1998 date. It's still a very relevant paper.

Rather than argue for or against scripting languages, Ousterhout lays out the philosophy (and relative benefits) of various kinds of languages and explains, often with recourse to real-world data, why certain languages are advantageous in certain situations and others are not.

Ousterhout talks about the productivity-multiplier effect of high-level languages:
On average, each line of code in a system programming language translates to about five machine instructions, compared to one instruction per line in assembly language (in an informal analysis of eight C files written by five different people, I found that the ratio ranged from about 3 to 7 instructions per line[7]; in a study of numerous languages Capers Jones found that for a given task, assembly languages require about 3-6 times as many lines of code as system programming languages[3]). Programmers can write roughly the same number of lines of code per year regardless of language[1], so system programming languages allow applications to be written much more quickly than assembly language.
The same effect applies when going from a high-level compiled language to a scripting language, except that the multiplier effect is even greater. As Ousterhout says: "A typical statement in a scripting language executes hundreds or thousands of machine instructions." The net result is summarized in the following graph.

[Graph from the paper: instructions per statement plotted against degree of typing, with assembly, system programming languages, and scripting languages forming distinct groups.]
This graph produced a kind of "Aha!" moment for me, because I realized (in a way I somehow hadn't before) that scripting languages are fundamentally about code reuse; that if I can reduce the number of lines of code I write, I can (almost by definition) reduce the number of bugs I write; and that if an operation can be accomplished by calling a built-in library method (such as the String "replace" method in JavaScript), scripted code can run at close to compiled-language speed, because a scripting language's built-in methods are themselves implemented in C/C++ (SpiderMonkey) or Java (Rhino).
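
To make that last point concrete, here is a minimal JavaScript sketch (the function names are made up for illustration) contrasting a hand-rolled replacement loop with the built-in String "replace" method. Both produce the same result, but in the one-liner the scanning and copying happen inside the engine's native code rather than in interpreted script:

// Hand-rolled replacement: every comparison and concatenation below
// is an interpreted JavaScript statement.
function replaceDashesByHand(s) {
    var out = "";
    for (var i = 0; i < s.length; i++) {
        out += (s.charAt(i) === "-") ? "_" : s.charAt(i);
    }
    return out;
}

// Built-in replacement: one line of script; the work is done by the
// engine's native String/regex implementation.
function replaceDashesNatively(s) {
    return s.replace(/-/g, "_");
}

replaceDashesByHand("2009-03-18");    // "2009_03_18"
replaceDashesNatively("2009-03-18");  // "2009_03_18"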

If you haven't read Ousterhout's paper before, I don't want to spoil the suspense here. Suffice it to say, the paper gives a balanced account of the strengths and weaknesses of languages of all kinds; there's no language bigotry, no theological diatribes. It's a concise and eloquent treatment of a tricky topic, and considering the year in which it was written, it's quite a prescient piece in many ways. On top of everything else, it's just plain entertaining to read -- a rarity these days, online or off.

12 comments:

  1. very nice thought ... and actually I found it very freaking useful..

  2. Whenever I give a talk on Javascript best practices, I always have to stress that -- unlike other languages / environments -- Javascript often runs faster when you write LESS of it. Less to download, less to parse, less to interpret. There are exceptions, of course.

    But the real speed kicker happens when you leverage the "native" speed of the compiled resources (as you state).

    Other native-speed freebies include:

    Regexes

    Array.join() rather than string concatenation (6x-8x speed boost in IE6, 4x speed boost in IE7, not much diff in Safari or Chrome, and up to 20% increase in FF3.1).

    Native Javascript 1.6 Array iterators such as forEach, some, and every (these are abstracted and mimicked for IE in all major toolkits)

    In a browser environment, using CSS to position widgets and content also provides a speed boost -- as much as 50x (or more in really complex situations)!

    Btw, the switch statement in Javascript is not optimizable like it is in compiled languages. The general consensus amongst Javascript ninjas is to use if-else since it creates more highly-structured code.
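
    A quick sketch of the Array.join() and native-iterator points above (the function and data are made up for illustration; the speed figures are the ones quoted above):

    // Repeated += concatenation creates many intermediate strings
    // (especially costly in IE6/IE7). Pushing pieces into an array and
    // calling join() once lets the engine's native string code do the
    // assembly.
    function buildListHtml(items) {
        var parts = [];
        items.forEach(function (item) {   // native JavaScript 1.6 iterator
            parts.push("<li>" + item + "</li>");
        });
        return "<ul>" + parts.join("") + "</ul>";
    }

    buildListHtml(["a", "b", "c"]);  // "<ul><li>a</li><li>b</li><li>c</li></ul>"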

  3. I find it interesting that as C, C++, and Java each grow more strongly typed, their instructions-per-statement count rises, and I'm led to wonder whether the grouping is really driven by how the languages are typed, or whether it has more to do with the fact that they are so closely related, both in syntax and in history.

    I would be very curious to see how Python and Haskell rate on this graph -- Python because it is dynamic and often described as 'executable pseudocode', and Haskell because it is even more strongly typed than Java, to such an extent that function signatures can be inferred.

  4. "Instructions per statement" is a deceptive metric. Take the simple statement

    a=b+c;

    where a, b, and c are all 32 bit integers. A language like C might be able to compile that into 1-3 instructions depending on the instruction set and where the variables are stored.

    In classic Tcl, strings are the only data type, so the system needs to convert the strings to numbers, add the numbers, then convert the result back to a string. That saves the programmer from having to convert strings explicitly on input and output, but it's just busywork when you're in the middle of a big calculation...

    Modern static languages are greatly reducing the verbosity tax: yes, Java is stagnating, but Microsoft is doing everything it can to reduce the verbosity of C# -- I've seen C# code that's about as concise as Ruby.

    If you don't like Microsoft, there's Scala for the JVM and OCaml with an entirely open-source implementation: both are languages with functional programming support, powerful type systems, type inference, and a focus on concise syntax.

    Statically typed languages support IDEs that compensate for the verbosity. These days I wouldn't dream of programming Java in emacs when Eclipse has good tools to refactor and navigate in your source code. A 50% increase in verbosity is nothing if you get automation that makes it 50% easier to handle the code.

    Of course, technologies from static languages are diffusing back into dynamic languages, which is exciting. Dynamic language runtimes can use type inference to elide runtime type checks, and someday I think we'll see intelligent IDEs that can really understand code in low-dynamic languages such as Javascript. High-dynamic languages such as PHP and Python (those that support constructs like __get and $_GLOBALS) might be a little harder.
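
    A tiny JavaScript sketch of what those runtime checks pay for (the function name is made up): the same '+' statement does different work depending on the types that arrive at run time, which is exactly the dispatch a type-inferring runtime can skip when it can prove the operands are numbers.

    // The engine must check the operand types of '+' on every execution:
    // number + number adds, anything involving a string concatenates.
    function plus(b, c) {
        return b + c;
    }

    plus(2, 3);    // 5     (numeric addition)
    plus("2", 3);  // "23"  (string concatenation)

    // If the runtime can infer that b and c are always numbers at a given
    // call site, it can drop the check and emit a plain machine add.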

  5. @unscriptable

    Array.join is so much faster in IE because the string library in IE is notoriously slow. Without getting too wonky in a comment, it's largely because of the way the IE JScript interpreter uses memory when concatenating strings.

  6. I don't think writing in a higher level language reduces the number of bugs you write, especially not for the reason you give: fewer lines written.

    As the quoted paper says, "Programmers can write roughly the same number of lines of code per year regardless of language." Since bugs are concerned with the logic of what you write directly (the lines), rather than the underlying assembly, you'll write roughly the same number of bugs per year regardless of language. They'll be different species of bugs, but they'll be there: "Silly me, I meant to write a web server in that one line of code, not a word processor."

    But not to quibble. Absent other needs and requirements, the benefit of higher leverage in higher level languages is attractive. Thanks for the pointer.

  7. @devonianfarm: I'm not sure which version "classic" is, but Tcl has not done that for many years, if ever.

    When it sees you using a variable as a list, it stores it internally as a list. When it sees you using a variable as an integer, it stores it as an integer. And so on.

    If you mix-and-match (say) string and list operations on the same variable, then it's true your performance will suffer (and Tcl-heads have some funny word I can never remember for when it has to convert back-and-forth all the time). This is really no different from other languages -- only less explicit.

    cheers!

  8. @devonianfarm: "Dynamic language runtimes can use type inference to elide runtime type checks, and, someday, I think we'll see intelligent..."

    This is what I think is so cool about Common Lisp: you get generality and then, if you want it, speed. I can do addition with generalized numbers (rationals, complex numbers, fixed- or arbitrary-precision integers, or floats), or if I want speed, I tell the compiler that yes, these really are just supposed to be 32-bit ints, and then the code is compiled down without all the extra infrastructure. The speed of the compiled code is close to C.

  9. @ken: It was true for classic Tcl up to Tcl 8.0. All newer versions have efficient internal representations and do things much more cleverly. The funny word is 'shimmering'; it describes the situation where the internal datatype changes, which should only be noticeable via performance (with rare exceptions for some C extensions that do nasty things with pointers).

  10. @colin: Python will often perform worse than Tcl when you count lines for meaningful problems. Python usually wins on typical benchmark-style problems because of its more concise syntax; Tcl wins because it makes it much easier to write a DSL for your problem, which is often a good approach for real-world problems.

  11. "I've seen C# code that's about as concise as Ruby."

    Please show me the C# code that is as concise as Ruby.

    It makes me wonder what kind of thing you use with Ruby, because I can guarantee that you are simply wrong.

  12. shevegen, how about filtering a list for elements less than 3?

    Ruby:
    list.select { |n| n < 3 }
    C#:
    list.Where(n => n < 3)

