> before getting into the diff of the binary given we can use objdump (I
> understand it only now sorry) I suggest we compare function sizes with
> objdump -t.

I'm not sure I follow, so let me be more explicit: The attached archive
contains the thread.o files from both runs, renamed to
'thread-foptimize-sibling-calls.o' and
'thread-fno-optimize-sibling-calls.o', respectively. This is mostly in
case someone wants to generate their own dumps.

Next, there are two text files, one per object file, generated using
'objdump -d -S' as Eli suggested: 'foptimize-sibling-calls.txt' and
'fno-optimize-sibling-calls.txt'. There's also a diff of the two in
'diff.txt', generated with 'diff -ubBw'.

I noticed that quite a lot of the differences in the diff simply come
down to addresses, so I edited both dumps by hand, replacing the
addresses with ****. Those files are
'fno-optimize-sibling-calls-addr.txt' and
'foptimize-sibling-calls-addr.txt'. This greatly reduced their diff,
which is provided in 'diff-addr.txt'.
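
For anyone who wants to regenerate the masked dumps without editing by
hand, here is a minimal sketch in Python. It is not how the attached
files were produced (those were edited manually). It assumes the usual
GNU 'objdump -d' layout (a hex offset followed by a colon at the start
of each instruction line, and hex targets in front of '<symbol+0x..>'),
and the function name mask_addresses is made up for illustration. With
-S, an interleaved source line that happens to start with a hex-looking
label such as 'bad:' would be masked too.

```python
import re

# Symbol header, e.g. "0000000000000010 <foo>:" (assumed objdump layout).
_HEADER = re.compile(r"^[0-9a-f]{8,16}(?= <)")
# Leading instruction offset, e.g. "   4:<TAB>e8 00 ...".
_OFFSET = re.compile(r"^(\s*)[0-9a-f]+(:)")
# Branch or call target, e.g. "call   9 <foo+0x9>".
_TARGET = re.compile(r"\b[0-9a-f]+(?= <[^>]+>)")


def mask_addresses(line: str) -> str:
    """Replace address-like fields in one disassembly line with ****."""
    line = _HEADER.sub("****", line)
    line = _OFFSET.sub(r"\1****\2", line)
    line = _TARGET.sub("****", line)
    return line


def mask_dump(text: str) -> str:
    """Apply mask_addresses to every line of a whole dump."""
    return "\n".join(mask_addresses(l) for l in text.splitlines())
```

The offsets inside '<symbol+0x..>' are left alone on purpose, since a
change there usually reflects a real change in the function body.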

Now, as for objdump -t: I've attached the dumps and their diff to this mail.
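
To compare function sizes from the two symbol tables, something along
these lines might help. It is only a sketch and assumes ELF-style
'objdump -t' output, where each line has an address, a 7-character flag
field, a section, a size and a name; COFF/PE object files print a
different layout that this will not parse. The names function_sizes and
size_changes are hypothetical.

```python
import re

# Assumed ELF-style line: address, 7 flag characters, section, size, name.
_SYM = re.compile(r"^([0-9a-f]+) (.{7}) (\S+)\s+([0-9a-f]+)\s+(.+)$")


def function_sizes(dump: str) -> dict:
    """Map function name -> size in bytes for symbols flagged 'F'."""
    sizes = {}
    for line in dump.splitlines():
        m = _SYM.match(line)
        if m and "F" in m.group(2):
            # The name field may carry a prefix such as ".hidden";
            # keep only the last word.
            sizes[m.group(5).split()[-1]] = int(m.group(4), 16)
    return sizes


def size_changes(old: dict, new: dict) -> dict:
    """Return name -> (old_size, new_size) for every size that differs.

    A function present in only one table shows up with None on the
    other side.
    """
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n))
            for n in names if old.get(n) != new.get(n)}
```

Feeding it the text of both 'objdump -t' dumps gives a short list of
the functions whose size differs between the two builds.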

> Also I'm assuming we are 100% sure the culprint is thread.o, given the
> bug looks not very reproducible I'd repeat the test a couple of times to
> be super sure we have identified the culprint.

I did run it several times. I found it by doing a binary search over the
.c files in the src folder (i.e. I compiled half the .c files with the
optimization and half of them without it, then repeated with the
succeeding half). I can't recall a single run where the build succeeded
when thread.c was compiled with -foptimize-sibling-calls. Conversely,
the build has so far never failed when thread.c was compiled with
-fno-optimize-sibling-calls.
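
The binary search described above can be sketched as follows. This is
an illustration of the idea, not the procedure as it was actually
scripted (it was done by hand). It assumes a non-empty file list with
exactly one culprit file and a reproducible failure, and the callback
fails() is hypothetical: it stands for "build with the optimization
enabled only for these files and report whether the build broke".

```python
from typing import Callable, List, Sequence


def find_culprit(files: Sequence[str],
                 fails: Callable[[List[str]], bool]) -> str:
    """Bisect a non-empty file list down to the one file that breaks
    the build.

    fails(subset) must return True when building with the optimization
    enabled only for the files in subset reproduces the failure.
    """
    candidates = list(files)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if fails(half):
            # The culprit is in the half that was optimized.
            candidates = half
        else:
            # Otherwise it must be in the remaining files.
            candidates = candidates[len(half):]
    return candidates[0]
```

Since the failure is not fully deterministic, fails() would in practice
have to repeat the build a few times before answering "no failure".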