Monday, March 23, 2009

How to write fast code

There was a time, early in my programming career, when I needed to rewrite a particular program (a very small one) to make it run faster. I was quite new to programming and thought that the way to get something to run faster was to rewrite it in assembly. In those days, you could unroll a loop in assembly and pretty much count on getting a worthwhile speedup, if it was a tight loop to begin with.

Fortunately, I had a fabulous mentor in those days, a coder with wisdom and experience far beyond his years. The person in question was a first-class code ninja and a master circuit designer, a genius of Woz-like proportions. Silicon obeyed him the way marble obeyed Michelangelo.

When it came to code, John could do astounding things. He could optimize (and did optimize) virtually any algorithm for any situation, and do it in so little code that you'd sit there studying the printout, wondering where the heck the algorithm went! I remember John had this peculiar way of making loops vanish, for example. They'd turn into table-lookups or recursion or self-modifying code, or some combination of the three.
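As a hypothetical illustration of the table-lookup trick (not John's actual code): counting set bits with a loop touches every bit one at a time, while a precomputed table of per-byte counts replaces the inner loop with a single index operation.

```python
# Hypothetical sketch: replace a per-bit loop with a table lookup.
# Build the table once: BIT_COUNTS[b] = number of set bits in byte b.
BIT_COUNTS = [bin(i).count("1") for i in range(256)]

def popcount_loop(n):
    """The obvious way: one loop iteration per bit."""
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count

def popcount_table(n):
    """The table way: one lookup per byte instead of one step per bit."""
    count = 0
    while n:
        count += BIT_COUNTS[n & 0xFF]
        n >>= 8
    return count
```

The loop hasn't disappeared entirely, but the machine does an eighth of the work per pass -- which is the point.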

One day my mentor asked me what I was working on and I told him. I mentioned that I was frantically searching for a way to speed up my little program. I described a few of the things I'd tried so far. He listened intently.

When I was done talking, John gave me some of the most profound advice any programming expert has ever given me. (It was profound for me, at the time. Maybe it'll be stupid-sounding to you.)

"The CPU," he said, "runs at a certain speed. It can execute a fixed number of instructions per second, and no more. There is a finite limit to how many instructions per second it can execute. Right?"

"Right," I said.

"So there is no way, really, to make code go faster, because there is no way to make instructions execute faster. There is only such a thing as making the machine do less."

He paused for emphasis.

"To go fast," he said slowly, "do less."

To go fast, do less. Do less; go fast. Yes, of course. It makes perfect sense. There's no other way to make a program run faster except to make it do less. (Here, when I say "program," I'm not talking about complex, orchestrated web apps or anything with fancy dependencies, just standalone executables in which there's a "main loop.")

Key takeaway: Don't think in terms of making a slow piece of code run faster. Instead, think in terms of making it do less.

In many cases, doing less means using a different algorithm. Then again, it may be as simple as inserting a few if-elses to check for a few trivial (but frequently encountered) "special cases" and return early, before entering a fully-generalized loop.
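A minimal sketch of the special-case idea, using a hypothetical function: handle the trivial-but-frequent inputs with cheap checks and return early, so the general loop only runs when it has to.

```python
def power(x, n):
    """Compute x**n for a non-negative integer n."""
    # Trivial but frequently encountered special cases: no loop at all.
    if n == 0:
        return 1
    if n == 1:
        return x
    if x in (0, 1):
        return x
    # Fully generalized loop runs only for the remaining cases.
    result = 1
    for _ in range(n):
        result *= x
    return result
```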

It may mean canonicalizing your data in some way before passing it to the main routine, so that the main routine doesn't have to include code that checks for corner cases.
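For instance (a hypothetical sketch): lower-casing and trimming strings once, up front, means the main loop never has to consider case or whitespace variants.

```python
def canonicalize(records):
    """Normalize once, before the main routine ever sees the data."""
    return [r.strip().lower() for r in records]

def count_matches(records, target):
    """Main routine: plain equality, no corner-case checks in the loop."""
    return sum(1 for r in records if r == target)

raw = ["  Apple", "apple ", "BANANA", "apple"]
clean = canonicalize(raw)
```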

The tricks are endless, but they end up with the CPU doing less, not more; and that's the key.

The "go fast do less" mantra has been a valuable one for me, paying off in many ways, in many situations, over the years. It has helped me understand performance issues in a different kind of way. I'm grateful to have been exposed to that concept early in my career. So I provide it here for you to use (or not use) as you see fit.

Maybe you received a similarly influential piece of advice early in your career? If so, please leave a comment. I'd love to hear about it.


  1. While I agree on a lot of your points, and algorithmic optimizations (i.e., making the CPU do less) are arguably the most important optimizations you can make to a program, you're missing a whole family of closer-to-the-metal tricks that can be done.

    The CPU by no means executes a fixed number of instructions per second. The things that most affect the CPU's throughput (in instructions per second) are cache misses and pipeline stalls (especially mis-predicted branches).

    Just re-organizing your data so that it is accessed coherently may gain you immense speed-ups. A great example is quicksort vs. radix exchange sort. Quicksort is often faster even though it is an O(n log n) algorithm while radix exchange sort executes in O(n) -- and this holds even with a relatively large n. The reason is that quicksort partitions its data into chunks that fit into the cache and is therefore a whole lot more coherent in its memory accesses.
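    A rough sketch of the access-order point (the effect is far more dramatic in a language like C, where interpreter overhead doesn't swamp the cache; sizes here are arbitrary):

```python
N = 500
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    # Visits each row's elements in storage order: good locality.
    total = 0
    for row in m:
        for v in row:
            total += v
    return total

def sum_col_major(m):
    # Jumps to a different row on every access: poor locality.
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total
```

    Both produce the same answer; only the order of memory accesses differs.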

  2. very very true.
    I have a Nokia tablet and have been working towards creating the UI I wanted it to have when I got the device.

    The system by default runs a variant of Linux called Maemo and has the full X11 window stack running on a little CPU (up to 400 MHz) and a slow GPU, and people said it simply was not possible to get flashy effects out of it :)

    I have had to start literally from scratch and have made sure that I obtain the desired effect whilst staying within the extremely limited timing overhead available to me.

    the results have left people breathless :)

    moral of the story: your friend is absolutely right. Remove cruft and build on small, concrete, working fragments, and your system should optimize itself :)

  3. I've always found deep insight in the mantra:
    "The fastest code is the code that never executes." I guess that's essentially the same thing.

  4. Anonymous, 9:04 AM

    Sorry, just browsed on in from somewhere.

    Anyways, yes, petrikero, very true. I do agree, so don't think my comment is meant to say you are wrong.

    I was just thinking, though: couldn't that also be looked at as another case of making the CPU do less? It's just fewer cache misses and pipeline stalls instead of fewer instructions, but nevertheless it's less of something.

    I also wonder if multiple cores change this idea. Will writing parallel code mean that in the future the opposite is true -- to make your program go faster, make it do more? That is, make it use more cores.

  5. To Petrikero's point: "Fixed number of instructions per second" was perhaps a poor wording. What's important (what I should have stressed) is that there is a maximum number of instructions per sec that can't be exceeded under any circumstances. I'm well aware of the importance of instruction interleaving, pipeline stalls, etc. (which I discussed, in fact, in a MacTech article in 1999; see Vol. 15 No. 6), but I guess the point I'd make is, a bubble sort that executes with no pipeline stalls is still a bubble sort, and the best-interleaved instructions in the world won't make a discrete Fourier transform go any faster.

  6. I totally agree. I have been coding for some time now, in different environments, on loads of different projects, with various coders and old code.
    And the lessons I learned, that I live by, are simply: the less code the better, the less complex the better.

  7. Anonymous, 10:32 AM

    Very nice post. I think this advice is as true for complex web applications as it is for standalone programs.

    In many web applications, there is an immense amount of data being passed around. Dealing effectively with that data can speed up the application significantly.

    I have very often seen huge trees of data being built just to be able to use a few nodes from them. These trees are sent all around the code, where they are parsed to extract a little data out of them.

  8. @Matthew above:

    Regarding parallel programming, the same principle holds true - less work generally means quicker turnaround.

    There are somewhat non-intuitive subtleties that need to be understood, though, such as the cache-miss issue mentioned above.

    A similar maxim for parallel computing would probably be "To work quickly, talk less", because shared-state can nullify much of the power that a parallel environment can provide.

  9. Anonymous, 10:50 AM

    From the very first days I started to code, I wanted to write the fastest code. I didn't have much of an idea what would really happen, but I always challenged myself to write the shortest possible code. I still have that need in me, though I still don't know whether it makes a big difference. But I enjoy what I do and believe that I at least save one byte by writing short code.

    1. This comment has been removed by the author.

    2. In other words, I take out the things that are not necessary, make the code serve that purpose only, and then compress it to remove the spaces and other stuff.

  10. This comment has been removed by a blog administrator.

  11. @Matthew:

    The other way of thinking about multi-core processing is that by balancing the load between cores, you're making each core do the minimum possible. Hence, make each core do less.

    I agree with Fansipans that 'talking less' would also be very important.

  12. Excellent post, I also see this advice as a supporting argument for XP's YAGNI principle.

  13. dear friends, this is the first thing they teach in Algorithms 101

  14. @ Liquid... wow. That's a really cool UI!

  15. This comment has been removed by the author.

  16. The greatest optimization is simply to return nil in all cases. Bypass all of your jumbled spaghetti code, all of your object-oriented layers, and just cut to the chase and return nil. You know you want to.

  17. Some good folks I once worked with introduced me to the magic 'optimize' button available on every keyboard - the delete key.

  18. Anonymous, 4:28 PM

    Rubber duck has nothing to say.

  19. Anonymous, 4:38 PM

    Early on in my days at Nortel, a few former NeXT developers dropped by and one of them said: "Most tough problems you come across in programming can be solved by adding an extra layer of abstraction."

    Similar to yours this seems somewhat obvious but also like yours is something that has resonated with me ever since.

  20. Do the slow stuff outside of loops. Especially with database apps: if you have some processing based on data, load what you can into arrays first, then go into the loop; once you exit the loop you can re-access the database.
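    A sketch of that pattern, with a hypothetical `fetch_all_prices` standing in for one bulk database query:

```python
def fetch_all_prices():
    # Hypothetical stand-in for a single bulk database query.
    return {"apple": 3, "pear": 2, "plum": 5}

def total_cost(order):
    # Slow stuff (the query) happens once, outside the loop.
    prices = fetch_all_prices()
    total = 0
    for item, qty in order:
        # Inside the loop: cheap dict lookups only.
        total += prices[item] * qty
    return total
```

    The alternative -- one query per loop iteration -- makes the machine (and the database) do the same slow work over and over.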

  21. This post has really made me think about optimizations and reflect a bit on my practices. There are a lot of ways you can optimize beyond just having fewer instructions, particularly around memory access, but nonetheless this was quite insightful.

  22. well, have you considered taking a course on computer algorithms at a university?

    this is a course dedicated to the idea of optimizing code, and its fundamental idea is (as the author put it): "To go fast, do less"

    Unfortunately, a lot of programmers nowadays, despite being equipped with a wealth of programming-language knowledge, are still lacking this key piece of information.

    don't do O(n!) when you can do it in O(n) :)
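    A tiny concrete instance of choosing the cheaper complexity class: summing 1..n with a loop is O(n), while Gauss's closed-form formula does it in O(1).

```python
def sum_loop(n):
    # O(n): the machine does n additions.
    return sum(range(1, n + 1))

def sum_formula(n):
    # O(1): one multiply, one add, one divide -- same answer.
    return n * (n + 1) // 2
```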

  23. this is a good start.

  24. life is better, do less.

  25. Here's one, and as far as I'm concerned -- I made it up myself.
    "" Use your reasoning and problem-solving skills. ""

    And part of that is "Know your stuff." If you don't know how the hardware you build on works, you have no hope of optimizing whatsoever.

    Thanks for the article.

  26. To Matthew & Kas: I think we all agree that "making the CPU do less" is wonderful advice in general. I suppose my actual beef is that I don't think the complicated field of software optimization can be meaningfully summarized in a single phrase. Also, the more generally you interpret the phrase, the closer it comes to being as helpful as "just make it go faster".

    While witty, and correct (the way I interpret it), I just wanted to point out that it misses a whole lot of the things involved in optimizing software. There are even cases where you specifically add more code and more executed instructions to make your program go faster overall.

    An example of this would be a bunch of operations (in multiple passes) being performed on data stored in a linked list. As we all know, linked lists can be horribly incoherent in their memory accesses, so it may pay to convert the data into an array and do the processing on that. Instead of suffering the cache misses on every pass, we bite the bullet once (with the conversion) and then do multiple passes over extremely coherently stored data.
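    A sketch of that conversion (hypothetical `Node` class; in a lower-level language the array would be truly contiguous in memory):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def to_array(head):
    # One-time conversion: bite the cache-miss bullet once.
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def process(head):
    data = to_array(head)
    # Multiple passes over the compact copy, not the scattered nodes.
    total = sum(data)
    peak = max(data)
    return total, peak
```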

    Anyway, I think we're all right here, just looking at the thing from a somewhat different angle. The article talks about optimization philosophy, whereas I'm more interested in practical advice. Oh, and I did like the article, too! :)

  27. very great mantra! hehehe ...

  28. you can also make code run fast by making the CPU work all the time :)

  29. Nikolai, 5:27 PM

    Making the CPU do less does not always mean writing less code. Very little code can make the CPU do a lot.

  30. I like the way this is a short, to-the-point paradigm.

    Also, your mentor reminds me of a friend of mine in school. He used to program recursive calls to the function itself (and other tricks like that), making loops vanish. His stuff in ZX Spectrum BASIC was faster than my feeble attempts in Z80 assembly. (We were 11 years old at the time -- he set the standard for what I still think of as a "real" programmer or developer, and still makes me think I'd never be anything more than a hack, at best. He later went on to a PhD in quantum physics, so maybe I'm being unfair to myself by comparing.)

    But, why wouldn't this apply to "complex, orchestrated web apps or anything with fancy dependencies"? The main difference is that you then have various ways of throwing more hardware at it, which might actually be more cost-effective. But it still feels like the kind of cheating that will eventually get you into trouble. The main problem with many complex, orchestrated web apps is that there is too much code...

  31. Although it may seem evident once you read it, the advice makes a lot of sense and it's really interesting to think that way.

    Thanks for the article ;)

  32. Anonymous, 3:27 AM

    switch ($a)
    case $b
    case $c

    Is this an example of making the CPU do less?

  33. Anonymous, 8:02 PM

    What I found is that the key to optimization is finding where your code spends its time.
    There may be many, many inefficient places in your program/library/class/code/whatever, and 99% of them don't matter at all, because they are rarely (some of them probably never) called, the loops are short, they are overshadowed by I/O, etc.
    But then you may have one innocent-looking line that happens to waste tons of time.
    Optimizing that one line might be trivial; finding the actual line that needs optimization is the real sleuth work.
    Just my 2c...
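    In Python, that sleuth work is what the standard-library profiler is for; a minimal sketch (the `slow_line` function is a made-up example of an innocent-looking time waster):

```python
import cProfile
import io
import pstats

def slow_line(n):
    # Innocent-looking waster: string concatenation in a loop.
    s = ""
    for i in range(n):
        s += str(i)
    return s

def run_profiled():
    profiler = cProfile.Profile()
    profiler.enable()
    slow_line(10000)
    profiler.disable()
    # Sort by cumulative time to see where the program actually spends it.
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return stream.getvalue()
```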

  34. The author of Mailinator outlines how he is able to run just a single machine to handle the huge number of emails coming in daily, while similar services are forced to throw ever more machines at the problem.

    It's quite an interesting read. I hope you'll enjoy it.

  35. Anonymous, 2:11 AM

    nice post mate......


  36. This comment has been removed by the author.

  37. Anonymous, 3:17 AM

    Part 1: (Meta-data of thinking while programming)

    Completely true on the mantra. It makes you think more before doing.

    Part 2: (While programming)
    Lots of people write good and bad code, and the same person can write good or bad. Today the world has moved to functional programming. This paradigm should become part of our thinking process now that we have moved to multi-core processors.

    Do less, do right, and do immutable.


Add a comment. Registration required because trolls.