Hi all, and thanks for the great work being put into emacs! Now that wonderful lexical scope has been added, it is not very difficult adding tail-call elimination to byte-compiled code. So I've tried to do, just that. The implementation has been made with two new bytecode opcodes, byte-tail-call and byte-self-tail-call. byte-tail-call will allow tail-call elimination when calling any lexically-scoped byte-compiled function from any byte-compiled function. ------ byte-self-tail-call will allow tail-call elimination to itself, so e.g.: ; -*- lexical-binding: t (require 'benchmark) (defun f (x accum) (if (> x 0) (f (1- x) (+ x accum)) accum)) (defun g (x accum) (while (> x 0) (setq accum (+ x accum) x (1- x))) accum) (mapc #'byte-compile (list #'f #'g)) (benchmark-run-compiled 10 (f 1000000 0)) (benchmark-run-compiled 10 (g 1000000 0)) will on my setup even make f some 8% faster than g! ------- byte-tail-call allows mutually tail-recursive functions like e.g: (defun e (n) (if (= n 0) t (o (1- n)))) (defun o (n) (if (= n 0) nil (e (1- n)))) (mapc #'byte-compile (list #'e #'o)) (o 10000000) -> nil (e 10000000) -> t but is a bit slower than byte-self-tail-call. ------ self tail recursive functions has the following little problem: (f 1000000 0) (let ((y (symbol-function 'f))) (fset 'f (lambda (_a _b) -1)) (funcall y 1000000 1)) Where the interpreted behaviour give -1, but the byte-compiled wil be 500000500001. I don't think that is ultimately very serious though. ---- Another problem is that there is a little bit more difference in characteristic between interpreted and byte-compiled code, as interpreted code will quickly read max-eval-depth. I don't see any easy way out of that now tho, and tail-recursion is a very desirable thing for me and likely many others. The patch as is now, will also remove BYTE_CODE_SAFE and BYTE_CODE_METER. They made it more difficult for me to understand what actually went on in bytecode.c. If someone needs them I will gladly add them back. I did try to put up a heap-allocated byte-code stack so to optimize non-tail-recursive inter-byte-code calls avoiding copying the arguments on the byte-stack. This unfortunately gave a small but significant performance reduction, maybe due to the C-stack consistently being in the processor's cache. Also I'm not very fond of byte-compile--tail-position variable, and would rather add it as an argument to the 'byte-compile handlers. I have a patch that does just that along with disbanding byte-compile--for-effect and adding that as an argument along. The only problem is that some backward-compatibility may be broken, but I don't know how much external code is really adding their own 'byte-compile handlers. Would there be any problems with such a patch? In addition this little hunk has been included in the patch, which solves a problem where #'byte-compile would compile lexically-bound functions as though they were dynamically bound. @@ -2503,15 +2515,16 @@ (defun byte-compile (form) (when (symbolp form) (unless (memq (car-safe fun) '(closure lambda)) (error "Don't know how to compile %S" fun)) - (setq fun (byte-compile--reify-function fun)) - (setq lexical-binding (eq (car fun) 'closure))) + (setq lexical-binding (eq (car fun) 'closure)) + (setq fun (byte-compile--reify-function fun))) (unless (eq (car-safe fun) 'lambda) (error "Don't know how to compile %S" fun)) ;; Expand macros. (setq fun (byte-compile-preprocess fun)) Kind Regards Troels Nielsen