Monday, March 23, 2009

How to write fast code

There was a time, early in my programming career, when I needed to rewrite a particular program (a very small one) to make it run faster. I was quite new to programming and thought that the way to get something to run faster was to rewrite it in assembly. In those days, you could unroll a loop in assembly and pretty much count on getting a worthwhile speedup, if it was a tight loop to begin with.

Fortunately, I had a fabulous mentor in those days, a coder with wisdom and experience far beyond his years. The person in question was a first-class code ninja and a master circuit designer, a genius of Woz-like proportions. Silicon obeyed him the way marble obeyed Michelangelo.

When it came to code, John could do astounding things. He could optimize (and did optimize) virtually any algorithm for any situation, and do it in so little code that you'd sit there studying the printout, wondering where the heck the algorithm went! I remember John had this peculiar way of making loops vanish, for example. They'd turn into table-lookups or recursion or self-modifying code, or some combination of the three.

One day my mentor asked me what I was working on and I told him. I mentioned that I was frantically searching for a way to speed up my little program. I described a few of the things I'd tried so far. He listened intently.

When I was done talking, John gave me some of the most profound advice any programming expert has ever given me. (It was profound for me, at the time. Maybe it'll be stupid-sounding to you.)

"The CPU," he said, "runs at a certain speed. It can execute a fixed number of instructions per second, and no more. There is a finite limit to how many instructions per second it can execute. Right?"

"Right," I said.

"So there is no way, really, to make code go faster, because there is no way to make instructions execute faster. There is only such a thing as making the machine do less."

He paused for emphasis.

"To go fast," he said slowly, "do less."

To go fast, do less. Do less; go fast. Yes, of course. It makes perfect sense. There's no other way to make a program run faster except to make it do less. (Here, when I say "program," I'm not talking about complex, orchestrated web apps or anything with fancy dependencies, just standalone executables in which there's a "main loop.")

Key takeaway: Don't think in terms of making a slow piece of code run faster. Instead, think in terms of making it do less.

In many cases, doing less means using a different algorithm. Then again, it may be as simple as inserting a few if-elses to check for a few trivial (but frequently encountered) special cases and return early, before entering a fully generalized loop.

It may mean canonicalizing your data in some way before passing it to the main routine, so that the main routine doesn't have to include code that checks for corner cases.

The tricks are endless, but they end up with the CPU doing less, not more; and that's the key.

The "go fast do less" mantra has been a valuable one for me, paying off in many ways, in many situations, over the years. It has helped me understand performance issues in a different kind of way. I'm grateful to have been exposed to that concept early in my career. So I provide it here for you to use (or not use) as you see fit.

Maybe you received a similarly influential piece of advice early in your career? If so, please leave a comment. I'd love to hear about it.

53 comments:

  1. While I agree on a lot of your points, and algorithmic optimizations (i.e., making the CPU do less) are arguably the most important optimizations you can make to a program, you're missing a whole family of closer-to-the-metal tricks that can be done.

    The CPU by no means executes a fixed number of instructions per second. The things that mostly affect the CPU's performance (in terms of instructions per second) are cache misses and pipeline stalls (especially mispredicted branches).

    Just reorganizing your data so that it gets accessed coherently may gain you immense speed-ups. A great example is quicksort vs. radix exchange sort. Quicksort is often faster even though it is an O(n log n) algorithm while radix exchange sort executes in O(n) -- this happens even with a relatively large n. The reason is that quicksort partitions its data into chunks that fit into the cache and is therefore a whole lot more coherent in its memory accesses.

  2. Very, very true.
    I have a Nokia tablet and have been working towards creating the UI I wanted it to have when I got the device.

    The system by default runs a variant of Linux called Maemo and has the full X11 window stack running on a little CPU (up to 400 MHz) and a slow GPU, and people said it simply was not possible to get flashy effects out of it :)

    I have had to start literally from scratch and have made sure that I obtain the desired effect whilst staying within the extremely limited timing overhead available to me.

    the results have left people breathless :)

    http://www.youtube.com/watch?v=iMXp0Dg_UaY


    Moral of the story: your friend is absolutely right. Remove cruft and build on small, concrete, working fragments, and your system should optimize itself :)

  3. I've always found deep insight in the mantra:
    "The fastest code is the code that never executes." I guess that's essentially the same thing.

  4. Sorry, just browsed on in from somewhere.

    Anyway, yes, petrikero, very true. I do agree, so don't think my comment is to say you are wrong.

    I was just thinking, though: couldn't that also be looked at as another case of making the CPU do less? It's just fewer cache misses and pipeline stalls instead of fewer instructions, but nevertheless it's less of something.

    I also wonder if multiple cores change this idea? Will writing parallel code mean that in the future the opposite is true? To make your program go faster make it do more instead? That being, make it use more cores.

  5. To Petrikero's point: "Fixed number of instructions per second" was perhaps a poor wording. What's important (what I should have stressed) is that there is a maximum number of instructions per sec that can't be exceeded under any circumstances. I'm well aware of the importance of instruction interleaving, pipeline stalls, etc. (which I discussed, in fact, in a MacTech article in 1999; see Vol. 15 No. 6), but I guess the point I'd make is, a bubble sort that executes with no pipeline stalls is still a bubble sort, and the best-interleaved instructions in the world won't make a discrete Fourier transform go any faster.

  6. I totally agree. I have been coding for some time now, in different environments, on loads of different projects with various coders and old code.
    And the lessons I learned, that I live by, are simply: the less code the better, and the less complex the better.

  7. Very nice post. I think this advice holds for complex web applications just as it does for standalone programs.

    In many web applications, there is an immense amount of data being passed around. Dealing effectively with that data can speed up the application significantly.

    I have very often seen huge trees of data being built just to be able to use some nodes from them. These trees are sent all around the code, where they are parsed to extract a little data out of them.

  8. @Matthew above:

    Regarding parallel programming, the same principle holds true - less work generally means quicker turnaround.

    There are somewhat non-intuitive subtleties that need to be understood though, just as the cache-miss issue mentioned above.

    A similar maxim for parallel computing would probably be "To work quickly, talk less", because shared-state can nullify much of the power that a parallel environment can provide.

  9. From the very first days I started to code, I wanted to write the fastest code. I didn't have much of an idea what would really happen, but I always challenged myself to write the shortest possible code. I still have that need in me, though I still don't know whether it makes a big difference. But I enjoy what I do & believe that I at least save one byte by writing short code.

    Replies
    1. This comment has been removed by the author.

    2. In other words I take out the things that are not necessary and make that code for that purpose only and then compress it to remove the spaces and other stuff.

  10. This comment has been removed by a blog administrator.

  11. @Matthew:

    The other way of thinking about multi-core processing is that by balancing the load between cores, you're making each core do the minimum possible. Hence, make each core do less.

    I agree with Fansipans that 'talking less' would also be very important.

  12. Excellent post, I also see this advice as a supporting argument for XP's YAGNI principle.

  13. Dear friends, this is the first thing they teach in Algorithms 101.

  14. @ Liquid... wow. That's a really cool UI!

  15. This comment has been removed by the author.

  16. The greatest optimization is simply to return nil in all cases. Bypass all of your jumbled spaghetti code, all of your object-oriented layers, and just cut to the chase and return nil. You know you want to.

  17. Some good folks I once worked with introduced me to the magic 'optimize' button available on every keyboard - the delete key.

  18. Rubber duck has nothing to say.

  19. Early on in my days at Nortel, a few former NeXT developers dropped by and one of them said: "Most tough problems you come across in programming can be solved by adding an extra layer of abstraction."

    Like yours, this seems somewhat obvious, but also like yours, it's something that has resonated with me ever since.

  20. Do the slow stuff outside of loops. Especially with database apps: if you have some processing based on data, load what you can into arrays first, then go into the loop; once you exit the loop, you can access the database again.

  21. This post has really made me think about optimizations and reflect a bit on my practices. There are a lot of ways you can optimize beyond just having fewer instructions, particularly around memory access, but nonetheless this was quite insightful.

  22. Well, have you considered taking a course on computer algorithms at a university?

    That's a course dedicated to the idea of optimizing code, and the fundamental idea is (as the author put it): "To go fast, do less."

    Unfortunately, a lot of programmers nowadays, despite being equipped with a wealth of programming-language knowledge, are still lacking this key piece of information.

    Don't do O(n!) when you can do it in O(n) :)

  23. This is a good start:

    http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/CourseHome/

  24. life is better, do less.

  25. Here's one, and as far as I'm concerned, I made it up myself:
    "Use your reasoning and problem-solving skills."

    And part of that is "Know your stuff." If you don't know how the hardware you build on works, you have no hope of optimizing whatsoever.

    Thanks for the article.

  26. To Matthew & Kas: I think we all agree that "making the CPU do less" is wonderful advice in general. I suppose my actual beef is that I don't think the complicated field of software optimization can be meaningfully summarized in a single phrase. Also, the more generally you interpret the phrase, the closer it comes to being as helpful as "just make it go faster".

    While it's witty, and correct (the way I interpret it), I just wanted to point out that it misses a whole lot of the things involved in optimizing software. There are even cases where you specifically add more code and more executed instructions to make your program go faster overall.

    An example of this would be a bunch of operations (in multiple passes) being performed on data stored in a linked list. As we all know, linked lists can be horribly incoherent in their memory accesses, so it can pay to convert the data into an array and do the processing there. Instead of suffering the cache misses on every pass, we bite the bullet once (with the conversion) and then do multiple passes over extremely coherently stored data.

    Anyway, I think we're all right here, just looking at the thing from a somewhat different angle. The article talks about optimization philosophy, whereas I'm more interested in practical advice. Oh, and I did like the article, too! :)

  27. very great mantra! hehehe ...

  28. You can also make code run fast by making the CPU work all the time :)

  29. Nikolai, 5:27 PM

    Making the CPU do less does not always mean writing less code. Very little code can make the CPU do a lot.

  30. I like the way this is a short, to-the-point paradigm.

    Also, your mentor reminds me of a friend of mine in school. He used to program recursive calls to the function itself (and other tricks like that), making loops vanish. His stuff in ZX Spectrum Basic was faster than my feeble attempts in Z80 assembly. (We were 11 years old at the time -- he set the standard for what I still think is a "real" programmer or developer, and still makes me think I'd never be anything more than a hack, at best. He later went on to a PhD in quantum physics, so maybe I'm being unfair on myself comparing).

    But, why wouldn't this apply to "complex, orchestrated web apps or anything with fancy dependencies"? The main difference is that you then have various ways of throwing more hardware at it, which might actually be more cost-effective. But it still feels like the kind of cheating that will eventually get you into trouble. The main problem with many complex, orchestrated web apps is that there is too much code...

  31. Although it may seem evident once you read it, the advice makes a lot of sense and it's really interesting to think that way.

    Thanks for the article ;)

  32. Anonymous, 3:27 AM

    if ($a == $b)
    {

    } elseif ($a == $c)
    {

    } else {

    }

    switch ($a)
    {
    case $b:
    break;
    case $c:
    break;
    default:
    break;
    }

    Is this an example of making the CPU do less?

  33. Anonymous, 8:02 PM

    What I found is that the key to optimization is finding where your code spends its time.
    There may be many, many inefficient places in your program/library/class/code/whatever, and 99% of them don't matter at all, because they are rarely (some of them probably never) called, the loops are short, they are overshadowed by I/O, etc.
    But then you may have one innocent-looking line that happens to waste tons of time.
    Optimizing that one line might be trivial; finding the actual line that needs optimization is the real sleuth work.
    Just my 2c...

  34. The author of Mailinator outlines how he is able to run just a single machine to handle the huge number of emails coming in daily, while similar services are forced to throw ever more machines at the problem.

    http://mailinator.blogspot.com/2007/01/architecture-of-mailinator.html

    It's quite an interesting read. I hope you'll enjoy it.

  38. I think many of us who visited this post somewhat agree with the point "To go fast, do less. Do less; go fast" -- but this is the stage of experts, not newbies. Some excellent points by an expert; thanks for them.

  40. This comment has been removed by the author.

  44. Stephen Stapinski

    Really, your blog is very interesting... it contains great and unique information. I enjoyed visiting your blog. It's just amazing... Thanks very much for the share.

  45. It is really nice to visit your article. I could get a lot more knowledge from your article. Thanks for sharing.

