{"id":1509,"date":"2016-05-05T14:46:25","date_gmt":"2016-05-05T18:46:25","guid":{"rendered":"http:\/\/jeffq.com\/blog\/?p=1509"},"modified":"2016-05-05T15:01:42","modified_gmt":"2016-05-05T19:01:42","slug":"when-code-is-suspiciously-fast-adventures-in-dead-code-elimination","status":"publish","type":"post","link":"http:\/\/jeffq.com\/blog\/when-code-is-suspiciously-fast-adventures-in-dead-code-elimination\/","title":{"rendered":"When code is suspiciously fast: adventures in dead code elimination"},"content":{"rendered":"<p>Part of a recent assignment for one of my classes involved calculating the Fibonacci sequence both recursively and iteratively\u00a0and measuring the speed of each method. (BONUS: For a fun diversion, <a href=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2015\/04\/golden_ratio_base.pdf\">here is a paper<\/a> I wrote about using the Golden Ratio, which is closely related to the\u00a0Fibonacci sequence, as a base for a number system). In addition, we were supposed to pass the actual calculation as a\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Function_pointer\">function pointer<\/a>\u00a0argument to a method that measured the execution time.<\/p>\n<p>The task was fairly straight forward, so I fired up Visual Studio 2015 and got to work. I usually target x64 during development (due to some misguided belief that the code will be faster), and when I ran the code in release mode I received the following output as the time needed to calculate the 42nd\u00a0Fibonacci number:<\/p>\n<p><span class=\"lang:default highlight:0 decode:true crayon-inline \">Recursive: 0.977294758 seconds<\/span><br \/>\n<span class=\"lang:default highlight:0 decode:true crayon-inline \">Iterative: 0.000000310 seconds<\/span><\/p>\n<p>Since calculating $F_{42}$ through naive recursion requires ~866 million function calls, this pretty much jived with my expectations. I was ready to submit the assignment and close up shop, but I decided it&#8217;d be safer to submit the executable as as 32-bit application. I switched over to x86 in Visual Studio, and for good measure ran the program again.<\/p>\n<p><span class=\"lang:default highlight:0 decode:true crayon-inline \">Recursive: 0.000000000 seconds<\/span><br \/>\n<span class=\"lang:default highlight:0 decode:true crayon-inline \">Iterative: 0.000000311 seconds<\/span><\/p>\n<p><!--more--><\/p>\n<p>Well then. That was&#8230;\u00a0suspiciously fast. For reference, here is (a stripped down version of) the code I was using.<\/p>\n<pre class=\"lang:c++ decode:true\">#include &lt;iostream&gt;\r\n#include &lt;iomanip&gt;\r\n#include &lt;chrono&gt;\r\n#include &lt;cassert&gt;\r\n\r\nconstexpr int MAX_FIB_NUM = 42;\r\nconstexpr int F_42 = 267914296;\r\n\r\nint fib_recursive(int n)\r\n{\r\n\tif (n &lt; 2)\r\n\t\treturn n;\r\n\telse\r\n\t\treturn fib_recursive(n - 1) + fib_recursive(n - 2);\r\n}\r\n\r\nint fib_iterative(int n)\r\n{\r\n\tint f_1 = 1, f_2 = 0;\r\n\r\n\tfor (int i = 1; i &lt; n; ++i)\r\n\t{\r\n\t\tint tmp = f_1;\r\n\t\tf_1 = f_1 + f_2;\r\n\t\tf_2 = tmp;\r\n\t}\r\n\r\n\treturn f_1;\r\n}\r\n\r\ndouble measure_execution_time(int(*func)(int))\r\n{\r\n\tauto start = std::chrono::high_resolution_clock::now();\r\n\tint ret = func(MAX_FIB_NUM);\r\n\tauto end = std::chrono::high_resolution_clock::now();\r\n\r\n\tassert(ret == F_42);\r\n\r\n\treturn std::chrono::duration&lt;double&gt;(end - start).count(); \/\/convert to fractional seconds\r\n}\r\n\r\nint main(int argc, char** argv)\r\n{\r\n\tauto recursive_duration = measure_execution_time(fib_recursive);\r\n\tauto iterative_duration = measure_execution_time(fib_iterative);\r\n\r\n\tstd::cout &lt;&lt; std::setprecision(9) &lt;&lt; std::fixed; \/\/up to nanoseconds\r\n\r\n\tstd::cout &lt;&lt; \"Recursive: \\t\" &lt;&lt; recursive_duration &lt;&lt; \" seconds \" &lt;&lt; std::endl;\r\n\tstd::cout &lt;&lt; \"Iterative: \\t\" &lt;&lt; iterative_duration &lt;&lt; \" seconds \" &lt;&lt; std::endl;\r\n\r\n \treturn 0;\r\n}<\/pre>\n<p>In debug mode the code took the expected amount of time; only the release build targeting x86 was exhibiting the seemingly blazingly\u00a0fast performance. What was happening here? Some <a href=\"http:\/\/stackoverflow.com\/questions\/22645551\/calculate-the-fibonacci-number-recursive-approach-in-compile-time-constexpr\">constexpr magic<\/a> resulting in the compiler precomputing the answer? An overly aggressive\u00a0reordering of the <span class=\"lang:default highlight:0 decode:true crayon-inline \">now()<\/span> calls?<\/p>\n<p>To figure out the answer I opened the executable in IDA and started poking around.<\/p>\n<figure id=\"attachment_1529\" aria-describedby=\"caption-attachment-1529\" style=\"width: 550px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2016\/05\/snip_recursive_removed.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1529\" src=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2016\/05\/snip_recursive_removed.png\" alt=\"Start of main() on x86 generated by Visual Studio 2015\" width=\"550\" height=\"379\" \/><\/a><figcaption id=\"caption-attachment-1529\" class=\"wp-caption-text\">Start of main() on x86 generated by Visual Studio 2015<\/figcaption><\/figure>\n<p>No wonder the code took almost no time &#8212; we&#8217;re simply measuring the time it takes to measure the <a href=\"https:\/\/www.aldeid.com\/wiki\/X86-assembly\/Instructions\/lea\">lea<\/a> instruction! The next section of code appeared to be the <span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_iterative<\/span>\u00a0 function:<\/p>\n<figure id=\"attachment_1530\" aria-describedby=\"caption-attachment-1530\" style=\"width: 626px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2016\/05\/snip_iterative_inlined.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1530\" src=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2016\/05\/snip_iterative_inlined.png\" alt=\"Inlined fib_iterative function\" width=\"626\" height=\"313\" \/><\/a><figcaption id=\"caption-attachment-1530\" class=\"wp-caption-text\">Inlined fib_iterative function<\/figcaption><\/figure>\n<p>It would appear that a function pointer is no barrier to Visual Studio&#8217;s inlining; <span class=\"lang:default highlight:0 decode:true crayon-inline \">measure_execution_time<\/span>\u00a0 never explicitly appears as a discrete\u00a0subroutine. Regardless, the inlined assembly <span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_iterative<\/span>\u00a0 is about has straightforward as possible. Over on x64 the code appears even simpler (all code was compiled with \/O2).<\/p>\n<figure id=\"attachment_1532\" aria-describedby=\"caption-attachment-1532\" style=\"width: 602px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2016\/05\/snip_x64.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1532\" src=\"http:\/\/jeffq.com\/blog\/wp-content\/uploads\/2016\/05\/snip_x64.png\" alt=\"Start of main() on x64 generated by Visual Studio 2015\" width=\"602\" height=\"243\" \/><\/a><figcaption id=\"caption-attachment-1532\" class=\"wp-caption-text\">Start of main() on x64 generated by Visual Studio 2015<\/figcaption><\/figure>\n<p>The function pointer inlining is gone, replaced with the more or less expected code, i.e. load the address of the function into an argument register and then call <span class=\"lang:default highlight:0 decode:true crayon-inline \">measure_execution_time<\/span>\u00a0.<\/p>\n<p>So what&#8217;s the deal here? Where the heck did <span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_recursive<\/span> go on x86? I believe what we&#8217;re seeing is an unexpected application of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dead_code_elimination\">dead code elimination<\/a>. On Visual Studio the <span class=\"lang:default highlight:0 decode:true crayon-inline \">assert<\/span>\u00a0 macro is <span class=\"lang:c++ decode:true crayon-inline \">#define assert(expression) ((void)0)<\/span>\u00a0 in release mode, meaning the check that the return is equal to <span class=\"lang:default highlight:0 decode:true crayon-inline \">F_42<\/span>\u00a0 turns into nothing!<\/p>\n<p>Since the return\u00a0of <span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_recursive<\/span>\u00a0(now) isn&#8217;t used, and the function\u00a0itself simply does trivial math (besides calling itself), the compiler has decided it serves no purpose.\u00a0<span style=\"line-height: 1.75;\">What&#8217;s interesting is that the compiler did <\/span><strong style=\"line-height: 1.75;\">not<\/strong><span style=\"line-height: 1.75;\"> make the same determination for <\/span><span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_iterative<\/span><span style=\"line-height: 1.75;\">\u00a0. Given the choice between the two I would have assumed that <\/span><span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_iterative<\/span><span style=\"line-height: 1.75;\">\u00a0, with its constant sized loop, would be easier to analyze than the recursive structure of <\/span><span class=\"lang:default highlight:0 decode:true crayon-inline \">fib_recursive<\/span><span style=\"line-height: 1.75;\">\u00a0. What&#8217;s even weirder\u00a0is that this only happens on x86, not x64.<\/span><\/p>\n<p>After modifying the code to display the\u00a0the result of the functions with <span class=\"lang:default highlight:0 decode:true crayon-inline\">std::cout<\/span> the problem went away. The moral of the story is that if you&#8217;re doing some performance unit testing make sure that your functions touch the outside world in some way; asserts aren&#8217;t always enough. Otherwise, the compiler may spontaneously decide to eliminate your code all together (and it may be platform dependent!), giving the illusion of incredible speed and cleverness on your part \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part of a recent assignment for one of my classes involved calculating the Fibonacci sequence both recursively and iteratively\u00a0and measuring the speed of each method. (BONUS: For a fun diversion, here is a paper I wrote about using the Golden Ratio, which is closely related to the\u00a0Fibonacci sequence, as a base for a number system). [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/posts\/1509"}],"collection":[{"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/comments?post=1509"}],"version-history":[{"count":35,"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/posts\/1509\/revisions"}],"predecessor-version":[{"id":1547,"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/posts\/1509\/revisions\/1547"}],"wp:attachment":[{"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/media?parent=1509"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/categories?post=1509"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/jeffq.com\/blog\/wp-json\/wp\/v2\/tags?post=1509"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}