Alan Mackenzie writes:

> On my Ryzen, I'm seeing a 50% penalty. :-( (Admittedly that's
> comparing the year old branch to current master. I suppose I should
> build the correct comparable revision and try again.) This suggests
> that the branch prediction logic isn't present (or isn't active) on the
> Ryzen.

This is very strange. You certainly have to compare branches from the
same epoch. I'm pretty sure that in the last year Paul pushed changes
to the inline policy with some measurable effect on performance.

>> Interestingly with the __builtin_expect trick applied exec time gets
>> back to 50.65s.
>
> How do you do this? I couldn't make much sense of the documentation of
> __builtin_expect. :-(

I attach the very simple patch I tried.

Basically the compiler has a heuristic branch predictor (in GCC,
predict.c) that is used to order the final basic block output. The
desired outcome is to lay out the most likely execution path
sequentially, which on modern CPUs maximizes front-end bandwidth.
"__builtin_expect" is just a strong hint to this predictor.

>> We could probably find a benchmark that better highlights the difference
>> (this is potentially dominated by cache misses while pointer chasing the
>> list) but is it worth?
>
> Could I ask you to do the following timing.
>
> Evaluate the following (e.g. in *scratch*):
>
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (defmacro time-it (&rest forms)
>   "Time the running of a sequence of forms using `float-time'.
> Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
>   `(let ((start (float-time)))
>     ,@forms
>     (- (float-time) start)))
>
> (defun time-scroll (&optional arg)
>   (interactive "P")
>   (message "%s"
>            (time-it
>             (condition-case nil
>                 (while t
>                   (if arg (scroll-down) (scroll-up))
>                   (sit-for 0))
>               (error nil)))))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
> , visit .../emacs/src/xdisp.c, and do M-: (time-scroll). This scrolls
> through the buffer and prints a timing in the minibuffer. (N.B. to run
> this again, type something at BOB and undo it, thus marking the
> fontification as stale.)
>
> I'm seeing 19.4s vs. 22.2s, which is around 15% difference. :-(

I get 19.30 sec against 16.65, which is about a 15% difference here too.

This is extremely interesting and would be worth profiling. I bet on
the GC for this! (Note I'm notoriously wrong when speculating on
benchmarks :)

Regards

  Andrea

-- 
akrl@sdf.org
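
For reference, here is a minimal sketch of the usual __builtin_expect idiom described above. It is not the attached patch: the LIKELY/UNLIKELY macro names, the cell struct and the list_length function are all made up for illustration, and the fallback branch assumes a compiler without the builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convenience macros; the names are an assumption, not
   taken from the Emacs sources.  !! normalises any truthy value to 1
   so it can be compared against the expected constant.  */
#if defined (__GNUC__) || defined (__clang__)
# define LIKELY(x)   __builtin_expect (!!(x), 1)
# define UNLIKELY(x) __builtin_expect (!!(x), 0)
#else
# define LIKELY(x)   (x)
# define UNLIKELY(x) (x)
#endif

/* Illustrative singly linked list cell.  */
struct cell
{
  int value;
  struct cell *next;
};

/* Return the length of LIST, or -1 if it has more than LIMIT cells.
   The overflow check is marked as unlikely, hinting the compiler to
   move the early return out of line and keep the loop body on the
   sequential (fall-through) path.  */
long
list_length (const struct cell *list, long limit)
{
  long n = 0;
  assert (limit >= 0);
  for (; list != NULL; list = list->next)
    {
      if (UNLIKELY (n == limit))
        return -1;
      n++;
    }
  return n;
}
```

The hint only changes code layout and static prediction, never the result: list_length returns the same value with or without the macros. Whether it helps at all depends on the workload, so it is worth measuring before and after.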