I improved the benchmarking code by reducing the number of cube computations. Interestingly, this gives a huge performance improvement for guile-2.9.1, while the other Scheme interpreters I've tested improve less or even get slower.

This could indicate that

1. the guile-2.9.1 optimizer is able to convert the extra code I introduced to avoid unnecessary cube computations (let forms and extra arguments) into something with low overhead
2. guile-2.9.1 arithmetic is relatively expensive, so skipping cube computations pays off more there than elsewhere
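
To illustrate the kind of change meant here, below is a minimal sketch in Python. It is not the attached benchmark: the function names and the pair-counting task are hypothetical, chosen only to show a cube being computed once, bound to a name (the role a let plays in Scheme) and passed along as an extra argument instead of being recomputed in the inner loop.

```python
def cube(x):
    return x * x * x


def count_pairs_naive(limit, target):
    """Count pairs a <= b with a^3 + b^3 == target, recomputing cube(a) each time."""
    count = 0
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            if cube(a) + cube(b) == target:  # cube(a) recomputed on every iteration
                count += 1
    return count


def count_pairs_hoisted(limit, target):
    """Same result, but cube(a) is computed once per outer iteration."""

    def inner(a_cubed, start):  # extra argument carries the precomputed cube
        count = 0
        for b in range(start, limit + 1):
            if a_cubed + cube(b) == target:
                count += 1
        return count

    total = 0
    for a in range(1, limit + 1):
        a_cubed = cube(a)  # plays the role of a Scheme let binding
        total += inner(a_cubed, a)
    return total
```

The second version does fewer multiplications but adds a binding and an extra parameter. Whether that is a net win depends on how cheaply the implementation handles the added bookkeeping compared to the arithmetic it saves, which is what the two points above are about.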

Anyhow, guile-1.8 now went up to 7.53 s and guile-2.9.1 went down to 0.53 s, while Python went down to 3.60 s.

Attaching version 2 of the benchmarks.

(BTW, on my machine the previous version is better at provoking a segfault.)