> before getting into the diff of the binary given we can use objdump (I
> understand it only now sorry) I suggest we compare function sizes with
> objdump -t.

I'm not sure I follow, so let me be more explicit:

The attached archive contains the thread.o files from both runs, renamed
to 'thread-foptimize-sibling-calls.o' and
'thread-fno-optimize-sibling-calls.o', respectively. These are mostly
there in case someone wants to generate their own dumps.

Next, there are two text files, one per object file, generated with
'objdump -d -S' as Eli suggested: 'foptimize-sibling-calls.txt' and
'fno-optimize-sibling-calls.txt'. There's also a diff of the two in
'diff.txt', generated with 'diff -ubBw'.

I noticed that quite a lot of the differences in that diff simply come
down to addresses, so I edited both objdumps by hand, replacing the
addresses with ****. Those files are 'fno-optimize-sibling-calls-addr.txt'
and 'foptimize-sibling-calls-addr.txt'. This greatly reduced their diff,
which is provided in 'diff-addr.txt'.

Now, as for objdump -t: I've attached the dumps and their diff to this
mail.

> Also I'm assuming we are 100% sure the culprit is thread.o, given the
> bug looks not very reproducible I'd repeat the test a couple of times to
> be super sure we have identified the culprit.

I did run it several times. I found it by doing a binary search over the
.c files in the src folder, i.e. I compiled half the .c files with the
optimization and half of them without it, then repeated the split on the
half that contained the failure. I can't recall a single run where the
build succeeded when thread.c was compiled with
-foptimize-sibling-calls. Conversely, the build has so far never failed
when thread.c was compiled with -fno-optimize-sibling-calls.
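
In case anyone wants to repeat the binary search over the .c files, the
procedure is roughly the following. This is only a minimal sketch in
Python, not the script I actually ran: 'find_culprit' and the
'build_fails' callback are hypothetical names, the file names in the
demo are placeholders, and it assumes that exactly one file is
responsible and that the failure reproduces on every run.

```python
from typing import Callable, Sequence, Set


def find_culprit(files: Sequence[str],
                 build_fails: Callable[[Set[str]], bool]) -> str:
    """Bisect `files` for the single file whose optimized build fails.

    `build_fails(optimized)` is a hypothetical callback. It is expected
    to rebuild with -foptimize-sibling-calls for the files in `optimized`
    and -fno-optimize-sibling-calls for everything else, and to return
    True if that build fails.
    """
    if not files:
        raise ValueError("need at least one candidate file")
    candidates = list(files)
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        if build_fails(set(half)):
            # The failure followed the optimized half, so keep it.
            candidates = half
        else:
            # Otherwise the culprit must be in the other half.
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for a real build: it "fails" whenever thread.c is optimized.
    names = ["a.c", "b.c", "c.c", "thread.c", "e.c"]  # placeholder names
    print(find_culprit(names, lambda optimized: "thread.c" in optimized))
```

With N candidate files this needs about log2(N) builds. If the failure
were only intermittent, each 'build_fails' call would have to repeat the
build several times before reporting success.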