> understand it only now sorry) I suggest we compare function sizes with
> objdump -t.

I'm not sure I follow, so let me be more explicit: the attached archive
contains the thread.o files from both runs, renamed to
'thread-foptimize-sibling-calls.o' and
'thread-fno-optimize-sibling-calls.o', respectively. This is mostly in
case someone wants to generate their own dumps.

Next, there are two text files, one for each object file, generated
using 'objdump -d -S' as Eli suggested: 'foptimize-sibling-calls.txt'
and 'fno-optimize-sibling-calls.txt'. There's also a diff of the two in
'diff.txt', generated with 'diff -ubBw'.

I noticed that quite a lot of the differences in the diff simply come
down to addresses, so I edited both objdumps by hand, replacing the
addresses with ****. Those files are
'fno-optimize-sibling-calls-addr.txt' and
'foptimize-sibling-calls-addr.txt'. This greatly reduced their diff,
which is provided in 'diff-addr.txt'.
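
In case someone wants to redo the address masking without editing by
hand, here is a rough Python sketch. It is not what I used for the
attached files (those were edited manually), and the regexes are my own
guess at the usual 'objdump -d -S' line layout, so they may also hit
source lines that happen to look like addresses. The function name
mask_addresses and the choice to also mask the '+0x..' offsets inside
'<symbol+0x..>' are assumptions made for illustration.

```python
import re

# Leading instruction offset, e.g. "  4a:\t48 89 e5 ..." (assumed layout).
_LINE_ADDR = re.compile(r"^(\s*)[0-9a-f]+(:)")
# Hex value directly in front of a "<symbol>" reference; this covers both
# symbol headers ("0000000000000040 <foo>:") and branch/call targets.
_TARGET = re.compile(r"\b[0-9a-f]+(?= <[^>]+>)")
# Offset inside a symbol reference, e.g. the "+0xf" in "<foo+0xf>".
_SYM_OFFSET = re.compile(r"\+0x[0-9a-f]+(?=>)")


def mask_addresses(line):
    """Replace address-like fields in one line of objdump output with ****."""
    line = _LINE_ADDR.sub(r"\1****\2", line)
    line = _TARGET.sub("****", line)
    line = _SYM_OFFSET.sub("+****", line)
    return line


def mask_file(src_path, dst_path):
    """Write a masked copy of an objdump text file (paths are up to the caller)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(mask_addresses(line))
```

Running mask_file over both dumps and then diffing the results should
give something comparable to 'diff-addr.txt', though probably not
byte-for-byte identical to my hand-edited versions.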

Now, as for objdump -t: I've attached the dumps and diff to this mail.

> Also I'm assuming we are 100% sure the culprit is thread.o, given the
> bug looks not very reproducible I'd repeat the test a couple of times to
> be super sure we have identified the culprit.

I did run it several times. I found it by doing a binary search over the
.c files in the src folder (i.e. I compiled half the .c files with the
optimization and half of them without it, then repeated with the
succeeding half). I can't recall a single run where the build succeeded
when thread.c was compiled with -foptimize-sibling-calls. Conversely,
the build has so far never failed when thread.c was compiled with
-fno-optimize-sibling-calls.
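
For reference, the binary search can be sketched roughly as below. I
did it by hand, so this is only an illustration: find_culprit and
build_fails are hypothetical names, and the sketch assumes exactly one
file triggers the failure. build_fails is expected to take the set of
files to compile with -foptimize-sibling-calls (everything else gets
-fno-optimize-sibling-calls) and report whether the build broke. Since
the bug is not perfectly reproducible, a real build_fails would
probably want to repeat the build a few times before answering.

```python
def find_culprit(files, build_fails):
    """Bisect `files` down to the one whose optimized build fails.

    `build_fails(optimized)` is a caller-supplied (hypothetical) callback
    that builds with the optimization enabled only for the files in
    `optimized` and returns True if the build fails.
    Assumes a single culprit and a non-empty file list.
    """
    candidates = list(files)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if build_fails(set(first_half)):
            # The failure follows the optimized half, so the culprit is in it.
            candidates = first_half
        else:
            # Otherwise it must be among the files built without it.
            candidates = candidates[mid:]
    return candidates[0]
```

With N source files this needs about log2(N) builds, which is why it
was feasible to do manually.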