On 07/16/2012 08:04 PM, Eli Zaretskii wrote:
> we need a facility for doing this
> only with a few functions that affect performance.

It turns out that when compiling with -O0, always_inline functions are often slower than macros, as the inlined code also contains unnecessary instructions to copy arguments and results. So instead of using always_inline, it's better to do this performance-critical inlining by hand.

I did that (patch relative to trunk bzr 109195 attached), inlining enough so that CPU performance improved by 8.7% compared to the current trunk, when compiled with gcc -O0. (This is the same benchmark as before, on x86-64 with GCC 4.7.1.) The performance win is because I inlined a bit more cleverly than the current code does.

Like the earlier version, this patch should improve performance slightly in the default-optimization case too, since this patch is identical to the earlier one when default optimization is used.

In short, it should take only a relatively small amount of hand-inlining to address the -O0 performance issue.
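
For readers who want to see the -O0 difference concretely, here is a rough sketch of the two forms being compared. It is not taken from the patch: the tagged-word type, TAG_BITS, and all function and macro names are hypothetical stand-ins, and the always_inline attribute syntax assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged word: the low TAG_BITS bits hold a type tag and
   the remaining bits hold the payload.  This is an illustration only,
   not the representation used in the actual source tree.  */
typedef unsigned long word;
enum { TAG_BITS = 3 };

/* Function form.  With always_inline, GCC inlines this even at -O0,
   but the inlined body can still copy the argument into a local slot
   and copy the result back out.  */
static inline __attribute__ ((always_inline)) unsigned long
payload_function (word w)
{
  return w >> TAG_BITS;
}

/* Hand-inlined form.  The macro expands to the bare expression, so
   there is no parameter or return value to copy at -O0.  */
#define PAYLOAD_MACRO(w) ((w) >> TAG_BITS)

/* Sum the payloads of V[0..N-1] using the function form.  */
unsigned long
sum_with_function (const word *v, size_t n)
{
  assert (v != NULL || n == 0);
  unsigned long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += payload_function (v[i]);
  return sum;
}

/* Same loop using the macro form.  */
unsigned long
sum_with_macro (const word *v, size_t n)
{
  assert (v != NULL || n == 0);
  unsigned long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += PAYLOAD_MACRO (v[i]);
  return sum;
}
```

Both loops compute the same result; the difference shows up only in the generated code. Compiling this file with gcc -O0 -S and comparing the two loop bodies is one way to check whether the extra copies appear with a given compiler version.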