When code is suspiciously fast: adventures in dead code elimination

Part of a recent assignment for one of my classes involved calculating the Fibonacci sequence both recursively and iteratively and measuring the speed of each method. (BONUS: For a fun diversion, here is a paper I wrote about using the Golden Ratio, which is closely related to the Fibonacci sequence, as a base for a number system). In addition, we were supposed to pass the actual calculation as a function pointer argument to a method that measured the execution time.

The task was fairly straightforward, so I fired up Visual Studio 2015 and got to work. I usually target x64 during development (due to some misguided belief that the code will be faster), and when I ran the code in release mode I received the following output as the time needed to calculate the 42nd Fibonacci number:

Recursive: 0.977294758 seconds
Iterative: 0.000000310 seconds

Since calculating $F_{42}$ through naive recursion requires ~866 million function calls, this pretty much jibed with my expectations. I was ready to submit the assignment and close up shop, but I decided it’d be safer to submit the executable as a 32-bit application. I switched over to x86 in Visual Studio and, for good measure, ran the program again.

Recursive: 0.000000000 seconds
Iterative: 0.000000311 seconds

Well then. That was… suspiciously fast. For reference, here is (a stripped down version of) the code I was using.

In debug mode the code took the expected amount of time; only the release build targeting x86 was exhibiting the seemingly blazingly fast performance. What was happening here? Some constexpr magic resulting in the compiler precomputing the answer? An overly aggressive reordering of the now() calls?

To figure out the answer I opened the executable in IDA and started poking around.

Start of main() on x86 generated by Visual Studio 2015

No wonder the code took almost no time: we’re simply measuring the time it takes to execute a single lea instruction! The next section of code appeared to be the fib_iterative function:

Inlined fib_iterative function

It would appear that a function pointer is no barrier to Visual Studio’s inlining; measure_execution_time never explicitly appears as a discrete subroutine. Regardless, the inlined assembly for fib_iterative is about as straightforward as possible. Over on x64 the code appears even simpler (all code was compiled with /O2).

Start of main() on x64 generated by Visual Studio 2015

The function pointer inlining is gone, replaced with the more or less expected code, i.e. load the address of the function into an argument register and then call measure_execution_time.

So what’s the deal here? Where the heck did fib_recursive go on x86? I believe what we’re seeing is an unexpected application of dead code elimination. In Visual Studio’s release mode the assert macro is defined as #define assert(expression) ((void)0), meaning the check that the return value is equal to F_42 turns into nothing!

Since the return value of fib_recursive (now) isn’t used, and the function itself does nothing but trivial math (besides calling itself), the compiler has decided it serves no purpose. What’s interesting is that the compiler did not make the same determination for fib_iterative. Given the choice between the two I would have assumed that fib_iterative, with its constant-sized loop, would be easier to analyze than the recursive structure of fib_recursive. What’s even weirder is that this only happens on x86, not x64.

After modifying the code to display the result of the functions with std::cout the problem went away. The moral of the story is that if you’re doing performance unit testing, make sure your functions touch the outside world in some way; asserts aren’t always enough. Otherwise, the compiler may spontaneously decide to eliminate your code altogether (and it may be platform dependent!), giving the illusion of incredible speed and cleverness on your part 🙂

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

10 thoughts on “When code is suspiciously fast: adventures in dead code elimination”

  1. I’d guess the reason for the recursive one being easier to eliminate is that it doesn’t have any assignments at all.

  2. Something I learned the hard way, too: You always have to make double sure your assertions have no side effects. Your assertion obviously had a side effect. Even if it was hard to spot.

    Good read!

  3. Clean, pretty code, can debug and down to assembly, and your assignment is Fibonacci?? Seriously? Apply for a PhD or something dude. You are wasting your talent 🙂

  4. You don’t even need IDA: just set a breakpoint in Visual Studio inside the Fibonacci function and one at the start of main. You’ll notice that the fib function never gets executed (and Visual Studio has a nice warning on the breakpoint that it cannot associate that line of code with the assembly code).

  5. Since assert() is defined as a macro, and in C/C++ macros are expanded before the compiler gets to see the code, in release mode the outputs of both fib_recursive and fib_iterative were, from the compiler’s point of view, in fact never used.

    So what the compiler saw from what you wrote was indeed dead code. As to why it decided to only dead code eliminate the recursive version, that is likely a question only MS can answer.

    Moral, don’t use the assert() macro as your only use of a result of a computation. That was never the intent of assert(), and given its implementation as a macro you get unexpected side effects when you use it in that manner.

  6. Your comparison is useless. The recursive version of the function uses an exponential algorithm with run time O(2**n) and the iterative version of the function uses a linear algorithm with run time O(n). That explains the difference in your first reported benchmark. The rest of your explanation boils down to “the compiler did what I told it to do”, which was different from what you expected it to do.

    1. The comparison was NOT useless, the whole point of the exercise was to benchmark the difference in performance of the two implementations. In fact, you contradicted your first sentence by explaining the purpose in your second sentence.

  7. Is there a difference between how Visual Studio 2013 and 2015 perform DCE? I have code with a memset(buf, 0, n); at the end of a function, to zeroise a sensitive buffer. It’s optimized out when compiled on VS13 but not when compiled on VS15. This is good in principle, but it would be nice to know when security-sensitive information is or is not optimized out, so as not to be caught unaware in another context.
