unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] Tail-call elimination in byte-compiled code.
@ 2012-09-20  8:15 Troels Nielsen
  2012-09-20 16:31 ` Stefan Monnier
  0 siblings, 1 reply; 2+ messages in thread
From: Troels Nielsen @ 2012-09-20  8:15 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3665 bytes --]

Hi all, and thanks for the great work being put into emacs!

Now that wonderful lexical scope has been added, it is not very
difficult adding tail-call elimination to byte-compiled code.  So I've
tried to do, just that.

The implementation has been made with two new bytecode opcodes,
byte-tail-call and byte-self-tail-call.
byte-tail-call will allow tail-call elimination when calling
any lexically-scoped byte-compiled function from any
byte-compiled function.

------
byte-self-tail-call will allow tail-call elimination to itself, so e.g.:

; -*- lexical-binding: t
(require 'benchmark)

(defun f (x accum)
       (if (> x 0) (f (1- x) (+ x accum)) accum))

(defun g (x accum)
       (while  (> x 0) (setq accum (+ x accum)
                            x (1- x)))
       accum)

(mapc #'byte-compile (list #'f #'g))

(benchmark-run-compiled 10 (f 1000000 0))
(benchmark-run-compiled 10 (g 1000000 0))

will on my setup even make f some 8% faster than g!

-------
byte-tail-call allows mutually tail-recursive functions like e.g:

(defun e (n) (if (= n 0) t (o (1- n))))
(defun o (n) (if (= n 0) nil (e (1- n))))

(mapc #'byte-compile (list #'e #'o))

(o 10000000) -> nil
(e 10000000) -> t

but is a bit slower than byte-self-tail-call.

------

self tail recursive functions has the following little problem:

(f 1000000 0)

(let ((y (symbol-function 'f)))
     (fset 'f (lambda (_a _b) -1))
     (funcall y 1000000 1))

Where the interpreted behaviour give -1, but the byte-compiled wil be
500000500001.

I don't think that is ultimately very serious though.
----

Another problem is that there is a little bit more difference
in characteristic between interpreted and byte-compiled code,
as interpreted code will quickly read max-eval-depth. I don't
see any easy way out of that now tho, and tail-recursion is a very
desirable thing for me and likely many others.

The patch as is now, will also remove BYTE_CODE_SAFE and
BYTE_CODE_METER. They made it more difficult for me to understand what
actually went on in bytecode.c. If someone needs them I will gladly
add them back.

I did try to put up a heap-allocated byte-code stack so to optimize
non-tail-recursive inter-byte-code calls avoiding copying the
arguments on the byte-stack. This unfortunately gave a small but
significant performance reduction, maybe due to the C-stack
consistently being in the processor's cache.

Also I'm not very fond of byte-compile--tail-position variable, and
would rather
add it as an argument to the 'byte-compile handlers. I have a patch
that does just that
along with disbanding byte-compile--for-effect and adding that as an
argument along.
The only problem is that some backward-compatibility may be broken,
but I don't know
how much external code is really adding their own 'byte-compile
handlers. Would there be
any problems with such a patch?

In addition this little hunk has been included in the patch, which
solves a problem where #'byte-compile would compile lexically-bound
functions as though they were dynamically bound.


@@ -2503,15 +2515,16 @@ (defun byte-compile (form)
         (when (symbolp form)
           (unless (memq (car-safe fun) '(closure lambda))
             (error "Don't know how to compile %S" fun))
-          (setq fun (byte-compile--reify-function fun))
-          (setq lexical-binding (eq (car fun) 'closure)))
+          (setq lexical-binding (eq (car fun) 'closure))
+          (setq fun (byte-compile--reify-function fun)))
         (unless (eq (car-safe fun) 'lambda)
           (error "Don't know how to compile %S" fun))
         ;; Expand macros.
         (setq fun (byte-compile-preprocess fun))

Kind Regards
Troels Nielsen

[-- Attachment #2: tail-call-elimination.diff --]
[-- Type: application/octet-stream, Size: 42682 bytes --]

=== modified file 'lisp/emacs-lisp/byte-opt.el'
--- lisp/emacs-lisp/byte-opt.el	2012-07-26 01:27:33 +0000
+++ lisp/emacs-lisp/byte-opt.el	2012-09-18 18:44:24 +0000
@@ -1316,8 +1316,10 @@
 	 (+ (aref bytes bytedecomp-ptr)
 	    (progn (setq bytedecomp-ptr (1+ bytedecomp-ptr))
 		   (lsh (aref bytes bytedecomp-ptr) 8))))
-	((and (>= bytedecomp-op byte-listN)
-	      (<= bytedecomp-op byte-discardN))
+	((or (and (>= bytedecomp-op byte-listN)
+                  (<= bytedecomp-op byte-discardN))
+             (= bytedecomp-op byte-tail-call)
+             (= bytedecomp-op byte-self-tail-call))
 	 (setq bytedecomp-ptr (1+ bytedecomp-ptr)) ;Offset in next byte.
 	 (aref bytes bytedecomp-ptr))))
 

=== modified file 'lisp/emacs-lisp/bytecomp.el'
--- lisp/emacs-lisp/bytecomp.el	2012-09-13 02:41:46 +0000
+++ lisp/emacs-lisp/bytecomp.el	2012-09-19 08:52:16 +0000
@@ -678,6 +678,8 @@
 ;;    discard (following one byte & 0x7F) stack entries _underneath_ TOS
 ;;    (that is, if the operand = 0x83,  ... X Y Z T  =>  ... T)
 (byte-defop 182 nil byte-discardN)
+(byte-defop 183 0 byte-tail-call)
+(byte-defop 184 0 byte-self-tail-call)
 ;; `byte-discardN-preserve-tos' is a pseudo-op that gets turned into
 ;; `byte-discardN' with the high bit in the operand set (by
 ;; `byte-compile-lapcode').
@@ -796,8 +798,10 @@
                ;; offset is too large for the normal version.
                (byte-compile-push-bytecode-const2 byte-stack-set2 off
                                                   bytes pc))
-              ((and (>= opcode byte-listN)
-                    (< opcode byte-discardN))
+              ((or (and (>= opcode byte-listN)
+                        (< opcode byte-discardN))
+                   (or (= opcode byte-tail-call) 
+                       (= opcode byte-self-tail-call)))
                ;; These insns all put their operand into one extra byte.
                (byte-compile-push-bytecodes opcode off bytes pc))
               ((= opcode byte-discardN)
@@ -923,6 +927,10 @@
 (defvar byte-compile-current-group nil)
 (defvar byte-compile-current-buffer nil)
 
+;;; These are used for making self-tail optimization.  
+(defvar byte-compile-current-lambda-arglist nil)
+(defvar byte-compile-current-lambda-name nil)
+
 ;; Log something that isn't a warning.
 (defmacro byte-compile-log (format-string &rest args)
   `(and
@@ -2033,6 +2041,8 @@
       nil)))
 
 (defvar byte-compile--for-effect)
+(defvar byte-compile--tail-position nil)
+
 
 (defun byte-compile-output-docform (preface name info form specindex quoted)
   "Print a form with a doc string.  INFO is (prefix doc-index postfix).
@@ -2145,7 +2155,7 @@
   (if byte-compile-output
       (let ((form (byte-compile-out-toplevel t 'file)))
 	(cond ((eq (car-safe form) 'progn)
-	       (mapc 'byte-compile-output-file-form (cdr form)))
+	       (mapc #'byte-compile-output-file-form (cdr form)))
 	      (form
 	       (byte-compile-output-file-form form)))
 	(setq byte-compile-constants nil
@@ -2279,7 +2289,7 @@
 (put 'prog1 'byte-hunk-handler 'byte-compile-file-form-progn)
 (put 'prog2 'byte-hunk-handler 'byte-compile-file-form-progn)
 (defun byte-compile-file-form-progn (form)
-  (mapc 'byte-compile-file-form (cdr form))
+  (mapc #'byte-compile-file-form (cdr form))
   ;; Return nil so the forms are not output twice.
   nil)
 
@@ -2288,7 +2298,7 @@
 (defun byte-compile-file-form-with-no-warnings (form)
   ;; cf byte-compile-file-form-progn.
   (let (byte-compile-warnings)
-    (mapc 'byte-compile-file-form (cdr form))
+    (mapc #'byte-compile-file-form (cdr form))
     nil))
 
 ;; This handler is not necessary, but it makes the output from dont-compile
@@ -2360,8 +2370,7 @@
            ;; Shadow existing definition.
            (set this-kind
                 (cons (cons name nil)
-                      (symbol-value this-kind))))
-          )
+                      (symbol-value this-kind)))))
 
     (when (and (listp body)
                (stringp (car body))
@@ -2384,7 +2393,7 @@
           ;; Tell the caller that we didn't compile it yet.
           nil)
 
-      (let* ((code (byte-compile-lambda (cons arglist body) t)))
+      (let ((code (byte-compile-lambda (cons arglist body) t nil name)))
         (if this-one
             ;; A definition in b-c-initial-m-e should always take precedence
             ;; during compilation, so don't let it be redefined.  (Bug#8647)
@@ -2462,6 +2471,9 @@
   (pcase-let* (((or (and `(lambda ,args . ,body) (let env nil))
                     `(closure ,env ,args . ,body)) fun)
                (renv ()))
+    ;; Remove docstring if it exists
+    (when (and (cdr body) (stringp body)) 
+      (setq body (cdr body)))
     ;; Turn the function's closed vars (if any) into local let bindings.
     (dolist (binding env)
       (cond
@@ -2503,15 +2515,16 @@
         (when (symbolp form)
           (unless (memq (car-safe fun) '(closure lambda))
             (error "Don't know how to compile %S" fun))
-          (setq fun (byte-compile--reify-function fun))
-          (setq lexical-binding (eq (car fun) 'closure)))
+          (setq lexical-binding (eq (car fun) 'closure))
+          (setq fun (byte-compile--reify-function fun)))
         (unless (eq (car-safe fun) 'lambda)
           (error "Don't know how to compile %S" fun))
         ;; Expand macros.
         (setq fun (byte-compile-preprocess fun))
         ;; Get rid of the `function' quote added by the `lambda' macro.
         (if (eq (car-safe fun) 'function) (setq fun (cadr fun)))
-        (setq fun (byte-compile-lambda fun))
+        (setq fun (byte-compile-lambda fun nil nil 
+                                       (and (symbolp form) form)))
         (if macro (push 'macro fun))
         (if (symbolp form)
             (fset form fun)
@@ -2587,7 +2600,7 @@
               (lsh rest 7)))))
 
 
-(defun byte-compile-lambda (fun &optional add-lambda reserved-csts)
+(defun byte-compile-lambda (fun &optional add-lambda reserved-csts name)
   "Byte-compile a lambda-expression and return a valid function.
 The value is usually a compiled function but may be the original
 lambda-expression.
@@ -2613,7 +2626,9 @@
                     ;; unless it is the last element of the body.
                     (if (cdr body)
                         (setq body (cdr body))))))
-	 (int (assq 'interactive body)))
+	 (int (assq 'interactive body))
+         (byte-compile-current-lambda-name name)
+         (byte-compile-current-lambda-arglist arglist))
     ;; Process the interactive spec.
     (when int
       (byte-compile-set-symbol-position 'interactive)
@@ -2653,7 +2668,7 @@
                                    ;; closed by now).
                                    (and lexical-binding
                                         (byte-compile-make-lambda-lexenv fun))
-                                   reserved-csts)))
+                                   reserved-csts t)))
       ;; Build the actual byte-coded function.
       (cl-assert (eq 'byte-code (car-safe compiled)))
       (apply #'make-byte-code
@@ -2713,7 +2728,7 @@
 ;; Given an expression FORM, compile it and return an equivalent byte-code
 ;; expression (a call to the function byte-code).
 (defun byte-compile-top-level (form &optional for-effect output-type
-                                    lexenv reserved-csts)
+                                    lexenv reserved-csts tail-position)
   ;; OUTPUT-TYPE advises about how form is expected to be used:
   ;;	'eval or nil	-> a single form,
   ;;	'progn or t	-> a list of forms,
@@ -2742,7 +2757,7 @@
       (when (> byte-compile-depth 0)
         (byte-compile-out-tag (byte-compile-make-tag))))
     ;; Now compile FORM
-    (byte-compile-form form byte-compile--for-effect)
+    (byte-compile-form form byte-compile--for-effect tail-position)
     (byte-compile-out-toplevel byte-compile--for-effect output-type)))
 
 (defun byte-compile-out-toplevel (&optional for-effect output-type)
@@ -2867,8 +2882,9 @@
 ;; correctly.  (Use byte-compile-form-do-effect to reset the
 ;; byte-compile--for-effect flag too.)
 ;;
-(defun byte-compile-form (form &optional for-effect)
-  (let ((byte-compile--for-effect for-effect))
+(defun byte-compile-form (form &optional for-effect tail-position)
+  (let ((byte-compile--for-effect for-effect)
+        (byte-compile--tail-position tail-position))
     (cond
      ((not (consp form))
       (cond ((or (not (symbolp form)) (macroexp--const-symbol-p form))
@@ -2912,7 +2928,7 @@
            ;; if the form comes out the same way it went in, that's
            ;; because it was malformed, and we couldn't unfold it.
            (not (eq form (setq form (byte-compile-unfold-lambda form)))))
-      (byte-compile-form form byte-compile--for-effect)
+      (byte-compile-form form byte-compile--for-effect tail-position)
       (setq byte-compile--for-effect nil))
      ((byte-compile-normal-call form)))
     (if byte-compile--for-effect
@@ -2935,16 +2951,51 @@
     (byte-compile-set-symbol-position 'mapcar)
     (byte-compile-warn
      "`mapcar' called for effect; use `mapc' or `dolist' instead"))
-  (byte-compile-push-constant (car form))
-  (mapc 'byte-compile-form (cdr form))	; wasteful, but faster.
-  (byte-compile-out 'byte-call (length (cdr form))))
+  (cond
+   ((and (eq (car form) byte-compile-current-lambda-name)
+         lexical-binding byte-compile--tail-position
+         (let ((sig (byte-compile-arglist-signature 
+                     byte-compile-current-lambda-arglist))
+               (nargs (length (cdr form))))
+           (and (>= (car sig) nargs)
+                (or (not (cdr sig)) 
+                    (<= nargs (cdr sig))))))
+    (setq form (cdr form))
+    (let* ((rest (memq '&rest byte-compile-current-lambda-arglist))
+           (nnormal-args (- (length byte-compile-current-lambda-arglist)
+                            (if rest 2 0)
+                            (if (memq '&optional 
+                                      byte-compile-current-lambda-arglist)
+                                1 0)))
+           (nargs 0))
+      (while (> nnormal-args nargs)
+        (cl-incf nargs)
+        (if form 
+            (byte-compile-form (pop form))
+          (byte-compile-constant nil)))
+      (when rest (byte-compile-list (cons 'list form)))
+      (byte-compile-out 'byte-self-tail-call 
+                        (+ nnormal-args (if rest 1 0)))
+      ;; FIXME: 
+      ;; Pad to make stack match expectations.
+      (byte-compile-constant nil)))
+   (t
+    (let ((fun (car form)))
+      (byte-compile-push-constant fun)
+      (mapc #'byte-compile-form (cdr form))	; wasteful, but faster.
+      (byte-compile-out 
+       ;; If fun is builtin now, it probably will be later too, so don't 
+       ;; tail-eliminate.
+       (if (and (not (subrp fun)) byte-compile--tail-position) 
+           'byte-tail-call 'byte-call) 
+       (length (cdr form)))))))
 
 
 ;; Splice the given lap code into the current instruction stream.
 ;; If it has any labels in it, you're responsible for making sure there
 ;; are no collisions, and that byte-compile-tag-number is reasonable
 ;; after this is spliced in.  The provided list is destroyed.
-(defun byte-compile-inline-lapcode (lap end-depth)
+(defun byte-compile-inline-lapcode (lap end-depth tail-position)
   ;; "Replay" the operations: we used to just do
   ;; (setq byte-compile-output (nconc (nreverse lap) byte-compile-output))
   ;; but that fails to update byte-compile-depth, so we had to assume
@@ -2957,6 +3008,11 @@
       (cond
        ((eq (car op) 'TAG) (byte-compile-out-tag op))
        ((memq (car op) byte-goto-ops) (byte-compile-goto (car op) (cdr op)))
+       ((eq (car op) 'byte-self-tail-call)
+        (error "Can not inline self-recursive function"))
+       ((and (eq (car op) 'byte-tail-call)
+             (not tail-position)) 
+        (byte-compile-out 'byte-call (cdr op)))
        ((eq (car op) 'byte-return)
         (byte-compile-discard (- byte-compile-depth end-depth) t)
         (byte-compile-goto 'byte-goto endtag))
@@ -2974,7 +3030,7 @@
          (alen (length (cdr form)))
          (dynbinds ()))
     (fetch-bytecode fun)
-    (mapc 'byte-compile-form (cdr form))
+    (mapc #'byte-compile-form (cdr form))
     (unless fmax2
       ;; Old-style byte-code.
       (cl-assert (listp fargs))
@@ -3007,7 +3063,8 @@
     (mapc #'byte-compile-dynamic-variable-bind dynbinds)
     (byte-compile-inline-lapcode
      (byte-decompile-bytecode-1 (aref fun 1) (aref fun 2) t)
-     (1+ start-depth))
+     (1+ start-depth)
+     byte-compile--tail-position)
     ;; Unbind dynamic variables.
     (when dynbinds
       (byte-compile-out 'byte-unbind (length dynbinds)))
@@ -3443,18 +3500,18 @@
     (cond ((= count 0)
 	   (byte-compile-constant nil))
 	  ((< count 5)
-	   (mapc 'byte-compile-form (cdr form))
+	   (mapc #'byte-compile-form (cdr form))
 	   (byte-compile-out
 	    (aref [byte-list1 byte-list2 byte-list3 byte-list4] (1- count)) 0))
 	  ((< count 256)
-	   (mapc 'byte-compile-form (cdr form))
+	   (mapc #'byte-compile-form (cdr form))
 	   (byte-compile-out 'byte-listN count))
 	  (t (byte-compile-normal-call form)))))
 
 (defun byte-compile-concat (form)
   (let ((count (length (cdr form))))
     (cond ((and (< 1 count) (< count 5))
-	   (mapc 'byte-compile-form (cdr form))
+	   (mapc #'byte-compile-form (cdr form))
 	   (byte-compile-out
 	    (aref [byte-concat2 byte-concat3 byte-concat4] (- count 2))
 	    0))
@@ -3462,7 +3519,7 @@
 	  ((= count 0)
 	   (byte-compile-form ""))
 	  ((< count 256)
-	   (mapc 'byte-compile-form (cdr form))
+	   (mapc #'byte-compile-form (cdr form))
 	   (byte-compile-out 'byte-concatN count))
 	  ((byte-compile-normal-call form)))))
 
@@ -3548,7 +3605,7 @@
   (cond ((null (cdr form))
 	 (byte-compile-constant nil))
 	((<= (length form) 256)
-	 (mapc 'byte-compile-form (cdr form))
+	 (mapc #'byte-compile-form (cdr form))
 	 (if (cdr (cdr form))
 	     (byte-compile-out 'byte-insertN (length (cdr form)))
 	   (byte-compile-out 'byte-insert 0)))
@@ -3614,18 +3671,18 @@
 \f
 ;;; control structures
 
-(defun byte-compile-body (body &optional for-effect)
+(defun byte-compile-body (body &optional for-effect tail-position)
   (while (cdr body)
     (byte-compile-form (car body) t)
     (setq body (cdr body)))
-  (byte-compile-form (car body) for-effect))
+  (byte-compile-form (car body) for-effect tail-position))
 
-(defsubst byte-compile-body-do-effect (body)
-  (byte-compile-body body byte-compile--for-effect)
+(defsubst byte-compile-body-do-effect (body &optional tail-position)
+  (byte-compile-body body byte-compile--for-effect tail-position)
   (setq byte-compile--for-effect nil))
 
-(defsubst byte-compile-form-do-effect (form)
-  (byte-compile-form form byte-compile--for-effect)
+(defsubst byte-compile-form-do-effect (form &optional tail-position)
+  (byte-compile-form form byte-compile--for-effect tail-position)
   (setq byte-compile--for-effect nil))
 
 (byte-defop-compiler-1 inline byte-compile-progn)
@@ -3642,16 +3699,20 @@
 (byte-defop-compiler-1 let* byte-compile-let)
 
 (defun byte-compile-progn (form)
-  (byte-compile-body-do-effect (cdr form)))
+  (byte-compile-body-do-effect (cdr form) byte-compile--tail-position))
 
 (defun byte-compile-prog1 (form)
-  (byte-compile-form-do-effect (car (cdr form)))
-  (byte-compile-body (cdr (cdr form)) t))
+  (byte-compile-form-do-effect 
+   (cadr form) 
+   (and byte-compile--tail-position (not (cddr form))))
+  (byte-compile-body (cddr form) t))
 
 (defun byte-compile-prog2 (form)
   (byte-compile-form (nth 1 form) t)
-  (byte-compile-form-do-effect (nth 2 form))
-  (byte-compile-body (cdr (cdr (cdr form))) t))
+  (byte-compile-form-do-effect 
+   (nth 2 form)
+   (and byte-compile--tail-position (not (nthcdr 3 form))))
+  (byte-compile-body (nthcdr 3 form) t))
 
 (defmacro byte-compile-goto-if (cond discard tag)
   `(byte-compile-goto
@@ -3732,16 +3793,21 @@
 	(progn
 	  (byte-compile-goto-if nil byte-compile--for-effect donetag)
 	  (byte-compile-maybe-guarded clause
-	    (byte-compile-form (nth 2 form) byte-compile--for-effect))
+	    (byte-compile-form (nth 2 form) 
+                               byte-compile--for-effect
+                               byte-compile--tail-position))
 	  (byte-compile-out-tag donetag))
       (let ((elsetag (byte-compile-make-tag)))
 	(byte-compile-goto 'byte-goto-if-nil elsetag)
 	(byte-compile-maybe-guarded clause
-	  (byte-compile-form (nth 2 form) byte-compile--for-effect))
+	  (byte-compile-form (nth 2 form) byte-compile--for-effect
+                             byte-compile--tail-position))
 	(byte-compile-goto 'byte-goto donetag)
 	(byte-compile-out-tag elsetag)
 	(byte-compile-maybe-guarded (list 'not clause)
-	  (byte-compile-body (cdr (cdr (cdr form))) byte-compile--for-effect))
+	  (byte-compile-body (cdr (cdr (cdr form))) 
+                             byte-compile--for-effect
+                             byte-compile--tail-position))
 	(byte-compile-out-tag donetag))))
   (setq byte-compile--for-effect nil))
 
@@ -3764,7 +3830,9 @@
 	       (setq nexttag (byte-compile-make-tag))
 	       (byte-compile-goto 'byte-goto-if-nil nexttag)
 	       (byte-compile-maybe-guarded (car clause)
-		 (byte-compile-body (cdr clause) byte-compile--for-effect))
+		 (byte-compile-body (cdr clause) 
+                                    byte-compile--for-effect
+                                    byte-compile--tail-position))
 	       (byte-compile-goto 'byte-goto donetag)
 	       (byte-compile-out-tag nexttag)))))
     ;; Last clause
@@ -3774,7 +3842,7 @@
 		  (byte-compile-goto-if nil byte-compile--for-effect donetag)
 		  (setq clause (cdr clause))))
       (byte-compile-maybe-guarded guard
-	(byte-compile-body-do-effect clause)))
+	(byte-compile-body-do-effect clause byte-compile--tail-position)))
     (byte-compile-out-tag donetag)))
 
 (defun byte-compile-and (form)
@@ -3793,7 +3861,7 @@
 	(byte-compile-goto-if nil byte-compile--for-effect failtag)
 	(byte-compile-maybe-guarded (car rest)
 	  (byte-compile-and-recursion (cdr rest) failtag)))
-    (byte-compile-form-do-effect (car rest))
+    (byte-compile-form-do-effect (car rest) byte-compile--tail-position)
     (byte-compile-out-tag failtag)))
 
 (defun byte-compile-or (form)
@@ -3812,7 +3880,7 @@
 	(byte-compile-goto-if t byte-compile--for-effect wintag)
 	(byte-compile-maybe-guarded (list 'not (car rest))
 	  (byte-compile-or-recursion (cdr rest) wintag)))
-    (byte-compile-form-do-effect (car rest))
+    (byte-compile-form-do-effect (car rest) byte-compile--tail-position)
     (byte-compile-out-tag wintag)))
 
 (defun byte-compile-while (form)
@@ -3827,8 +3895,10 @@
     (setq byte-compile--for-effect nil)))
 
 (defun byte-compile-funcall (form)
-  (mapc 'byte-compile-form (cdr form))
-  (byte-compile-out 'byte-call (length (cdr (cdr form)))))
+  (mapc #'byte-compile-form (cdr form))
+  (byte-compile-out 
+   (if byte-compile--tail-position 'byte-tail-call 'byte-call) 
+   (length (cdr (cdr form)))))
 
 \f
 ;; let binding
@@ -3932,7 +4002,7 @@
                  (pop init-lexenv)))))
       ;; Emit the body.
       (let ((init-stack-depth byte-compile-depth))
-        (byte-compile-body-do-effect (cdr (cdr form)))
+        (byte-compile-body-do-effect (cddr form) byte-compile--tail-position)
         ;; Unbind the variables.
         (if lexical-binding
             ;; Unbind both lexical and dynamic variables.
@@ -3998,7 +4068,9 @@
      (byte-compile-push-constant
       (byte-compile-top-level-body handlers t))))
   (byte-compile-out 'byte-unwind-protect 0)
-  (byte-compile-form-do-effect (car (cdr form)))
+  (byte-compile-form-do-effect 
+   (car (cdr form)) 
+   byte-compile--tail-position)
   (byte-compile-out 'byte-unbind 1))
 
 (defun byte-compile-track-mouse (form)
@@ -4065,17 +4137,17 @@
       (byte-compile-warn
        "Use `with-current-buffer' rather than save-excursion+set-buffer"))
   (byte-compile-out 'byte-save-excursion 0)
-  (byte-compile-body-do-effect (cdr form))
+  (byte-compile-body-do-effect (cdr form) byte-compile--tail-position)
   (byte-compile-out 'byte-unbind 1))
 
 (defun byte-compile-save-restriction (form)
   (byte-compile-out 'byte-save-restriction 0)
-  (byte-compile-body-do-effect (cdr form))
+  (byte-compile-body-do-effect (cdr form) byte-compile--tail-position)
   (byte-compile-out 'byte-unbind 1))
 
 (defun byte-compile-save-current-buffer (form)
   (byte-compile-out 'byte-save-current-buffer 0)
-  (byte-compile-body-do-effect (cdr form))
+  (byte-compile-body-do-effect (cdr form) byte-compile--tail-position)
   (byte-compile-out 'byte-unbind 1))
 \f
 ;;; top-level forms elsewhere
@@ -4278,7 +4350,8 @@
 (defun byte-compile-stack-adjustment (op operand)
   "Return the amount by which an operation adjusts the stack.
 OP and OPERAND are as passed to `byte-compile-out'."
-  (if (memq op '(byte-call byte-discardN byte-discardN-preserve-tos))
+  (if (memq op '(byte-call byte-self-tail-call byte-tail-call
+                           byte-discardN byte-discardN-preserve-tos))
       ;; For calls, OPERAND is the number of args, so we pop OPERAND + 1
       ;; elements, and the push the result, for a total of -OPERAND.
       ;; For discardN*, of course, we just pop OPERAND elements.
@@ -4599,36 +4672,7 @@
 (provide 'bytecomp)
 
 \f
-;;; report metering (see the hacks in bytecode.c)
 
-(defvar byte-code-meter)
-(defun byte-compile-report-ops ()
-  (or (boundp 'byte-metering-on)
-      (error "You must build Emacs with -DBYTE_CODE_METER to use this"))
-  (with-output-to-temp-buffer "*Meter*"
-    (set-buffer "*Meter*")
-    (let ((i 0) n op off)
-      (while (< i 256)
-	(setq n (aref (aref byte-code-meter 0) i)
-	      off nil)
-	(if t				;(not (zerop n))
-	    (progn
-	      (setq op i)
-	      (setq off nil)
-	      (cond ((< op byte-nth)
-		     (setq off (logand op 7))
-		     (setq op (logand op 248)))
-		    ((>= op byte-constant)
-		     (setq off (- op byte-constant)
-			   op byte-constant)))
-	      (setq op (aref byte-code-vector op))
-	      (insert (format "%-4d" i))
-	      (insert (symbol-name op))
-	      (if off (insert " [" (int-to-string off) "]"))
-	      (indent-to 40)
-	      (insert (int-to-string n) "\n")))
-	(setq i (1+ i))))))
-\f
 ;; To avoid "lisp nesting exceeds max-lisp-eval-depth" when bytecomp compiles
 ;; itself, compile some of its most used recursive functions (at load time).
 ;;

=== modified file 'src/bytecode.c'
--- src/bytecode.c	2012-09-16 21:43:55 +0000
+++ src/bytecode.c	2012-09-19 18:21:48 +0000
@@ -33,6 +33,7 @@
  */
 
 #include <config.h>
+#include <stdio.h>
 
 #include "lisp.h"
 #include "character.h"
@@ -58,35 +59,10 @@
    indirect threaded, using GCC's computed goto extension.  This code,
    as currently implemented, is incompatible with BYTE_CODE_SAFE and
    BYTE_CODE_METER.  */
-#if defined (__GNUC__) && !defined (BYTE_CODE_SAFE) && !defined (BYTE_CODE_METER)
+#if defined (__GNUC__)
 #define BYTE_CODE_THREADED
 #endif
 
-\f
-#ifdef BYTE_CODE_METER
-
-Lisp_Object Qbyte_code_meter;
-#define METER_2(code1, code2) AREF (AREF (Vbyte_code_meter, code1), code2)
-#define METER_1(code) METER_2 (0, code)
-
-#define METER_CODE(last_code, this_code)				\
-{									\
-  if (byte_metering_on)							\
-    {									\
-      if (XFASTINT (METER_1 (this_code)) < MOST_POSITIVE_FIXNUM)	\
-        XSETFASTINT (METER_1 (this_code),				\
-		     XFASTINT (METER_1 (this_code)) + 1);		\
-      if (last_code							\
-	  && (XFASTINT (METER_2 (last_code, this_code))			\
-	      < MOST_POSITIVE_FIXNUM))					\
-        XSETFASTINT (METER_2 (last_code, this_code),			\
-		     XFASTINT (METER_2 (last_code, this_code)) + 1);	\
-    }									\
-}
-
-#endif /* BYTE_CODE_METER */
-\f
-
 Lisp_Object Qbytecode;
 
 /*  Byte codes: */
@@ -275,7 +251,8 @@
 DEFINE (Bstack_set,  0262)						\
 DEFINE (Bstack_set2, 0263)						\
 DEFINE (BdiscardN,   0266)						\
-									\
+DEFINE (Btailcall, 0267)                                                \
+DEFINE (Bselftailcall, 0270)                                            \
 DEFINE (Bconstant, 0300)
 
 enum byte_code_op
@@ -284,14 +261,8 @@
     BYTE_CODES
 #undef DEFINE
 
-#ifdef BYTE_CODE_SAFE
-    Bscan_buffer = 0153, /* No longer generated as of v18.  */
-    Bset_mark = 0163 /* this loser is no longer generated as of v18 */
-#endif
 };
 
-/* Whether to maintain a `top' and `bottom' field in the stack frame.  */
-#define BYTE_MAINTAIN_TOP (BYTE_CODE_SAFE || BYTE_MARK_STACK)
 \f
 /* Structure describing a value stack used during byte-code execution
    in Fbyte_code.  */
@@ -304,7 +275,7 @@
 
   /* Top and bottom of stack.  The bottom points to an area of memory
      allocated with alloca in Fbyte_code.  */
-#if BYTE_MAINTAIN_TOP
+#if BYTE_MARK_STACK
   Lisp_Object *top, *bottom;
 #endif
 
@@ -360,8 +331,7 @@
 }
 #endif
 
-/* Unmark objects in the stacks on byte_stack_list.  Relocate program
-   counters.  Called when GC has completed.  */
+/* Called when GC has completed.  Relocate program counters.  */
 
 void
 unmark_byte_stack (void)
@@ -382,19 +352,14 @@
 \f
 /* Fetch the next byte from the bytecode stream */
 
-#define FETCH *stack.pc++
+#define FETCH (*stack.pc++)
 
 /* Fetch two bytes from the bytecode stream and make a 16-bit number
    out of them */
 
 #define FETCH2 (op = FETCH, op + (FETCH << 8))
 
-/* Push x onto the execution stack.  This used to be #define PUSH(x)
-   (*++stackp = (x)) This oddity is necessary because Alliant can't be
-   bothered to compile the preincrement operator properly, as of 4/91.
-   -JimB */
-
-#define PUSH(x) (top++, *top = (x))
+#define PUSH(x) (*++top = (x))
 
 /* Pop a value off the execution stack.  */
 
@@ -412,12 +377,12 @@
 /* Actions that must be performed before and after calling a function
    that might GC.  */
 
-#if !BYTE_MAINTAIN_TOP
+#if BYTE_MARK_STACK
+#define BEFORE_POTENTIAL_GC()	stack.top = top
+#define AFTER_POTENTIAL_GC()	stack.top = NULL
+#else
 #define BEFORE_POTENTIAL_GC()	((void)0)
 #define AFTER_POTENTIAL_GC()	((void)0)
-#else
-#define BEFORE_POTENTIAL_GC()	stack.top = top
-#define AFTER_POTENTIAL_GC()	stack.top = NULL
 #endif
 
 /* Garbage collect if we have consed enough since the last time.
@@ -430,19 +395,6 @@
    AFTER_POTENTIAL_GC ();	\
  } while (0)
 
-/* Check for jumping out of range.  */
-
-#ifdef BYTE_CODE_SAFE
-
-#define CHECK_RANGE(ARG) \
-  if (ARG >= bytestr_length) emacs_abort ()
-
-#else /* not BYTE_CODE_SAFE */
-
-#define CHECK_RANGE(ARG)
-
-#endif /* not BYTE_CODE_SAFE */
-
 /* A version of the QUIT macro which makes sure that the stack top is
    set before signaling `quit'.  */
 
@@ -474,148 +426,66 @@
   return exec_byte_code (bytestr, vector, maxdepth, Qnil, 0, NULL);
 }
 
-/* Execute the byte-code in BYTESTR.  VECTOR is the constant vector, and
-   MAXDEPTH is the maximum stack depth used (if MAXDEPTH is incorrect,
-   emacs may crash!).  If ARGS_TEMPLATE is non-nil, it should be a lisp
-   argument list (including &rest, &optional, etc.), and ARGS, of size
-   NARGS, should be a vector of the actual arguments.  The arguments in
-   ARGS are pushed on the stack according to ARGS_TEMPLATE before
-   executing BYTESTR.  */
-
-Lisp_Object
-exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
-		Lisp_Object args_template, ptrdiff_t nargs, Lisp_Object *args)
+/* Interpret a numerical args template TEMPLATE, used for lexically-scoped 
+   byte-compiled functions.  Put the data in MIN_ARGS, MAX_ARGS and REST. 
+   Also validates that nargs is in the range.  */
+static void
+resolve_args_template (ptrdiff_t template, ptrdiff_t *min_args, 
+                       ptrdiff_t *max_args, bool *rest)
+{
+  *max_args = template >> 8;
+  *min_args = template & 0x7F;
+  *rest = (template & 0x80) != 0;
+}
+
+/* Make sure that nargs is the right range.  */
+static void
+validate_nargs (ptrdiff_t nargs, ptrdiff_t min_args, 
+                ptrdiff_t max_args, bool rest)
+{
+  if (nargs < min_args)
+    Fsignal (Qwrong_number_of_arguments,
+             Fcons (Fcons (make_number (min_args),
+                           rest ? Qand_rest : make_number (max_args)),
+                    Fcons (make_number (nargs), Qnil)));
+  
+  if (nargs > max_args && !rest)
+    Fsignal (Qwrong_number_of_arguments,
+             Fcons (Fcons (make_number (min_args),
+                           make_number (max_args)),
+                    Fcons (make_number (nargs), Qnil)));
+}
+
+
+/* Execute the bytecode in BYTESTR, using constants in VECTOR, 
+   and with stack-top at TOP, bottom at BOTTOM and depth in 
+   MAXDEPTH.  */
+static Lisp_Object
+run_byte_code (Lisp_Object bytestr, Lisp_Object vector, 
+               Lisp_Object *top, Lisp_Object *bottom,
+               ptrdiff_t maxdepth)
+
 {
   ptrdiff_t count = SPECPDL_INDEX ();
-#ifdef BYTE_CODE_METER
-  int this_op = 0;
-  int prev_op;
-#endif
   int op;
-  /* Lisp_Object v1, v2; */
   Lisp_Object *vectorp;
-#ifdef BYTE_CODE_SAFE
-  ptrdiff_t const_length;
-  Lisp_Object *stacke;
-  ptrdiff_t bytestr_length;
-#endif
   struct byte_stack stack;
-  Lisp_Object *top;
   Lisp_Object result;
 
-#if 0 /* CHECK_FRAME_FONT */
- {
-   struct frame *f = SELECTED_FRAME ();
-   if (FRAME_X_P (f)
-       && FRAME_FONT (f)->direction != 0
-       && FRAME_FONT (f)->direction != 1)
-     emacs_abort ();
- }
-#endif
-
-  CHECK_STRING (bytestr);
-  CHECK_VECTOR (vector);
-  CHECK_NATNUM (maxdepth);
-
-#ifdef BYTE_CODE_SAFE
-  const_length = ASIZE (vector);
-#endif
-
-  if (STRING_MULTIBYTE (bytestr))
-    /* BYTESTR must have been produced by Emacs 20.2 or the earlier
-       because they produced a raw 8-bit string for byte-code and now
-       such a byte-code string is loaded as multibyte while raw 8-bit
-       characters converted to multibyte form.  Thus, now we must
-       convert them back to the originally intended unibyte form.  */
-    bytestr = Fstring_as_unibyte (bytestr);
-
-#ifdef BYTE_CODE_SAFE
-  bytestr_length = SBYTES (bytestr);
-#endif
   vectorp = XVECTOR (vector)->contents;
 
   stack.byte_string = bytestr;
   stack.pc = stack.byte_string_start = SDATA (bytestr);
   stack.constants = vector;
-  if (MAX_ALLOCA / word_size <= XFASTINT (maxdepth))
-    memory_full (SIZE_MAX);
-  top = alloca ((XFASTINT (maxdepth) + 1) * sizeof *top);
 #if BYTE_MAINTAIN_TOP
-  stack.bottom = top + 1;
+  stack.bottom = bottom;
   stack.top = NULL;
 #endif
   stack.next = byte_stack_list;
   byte_stack_list = &stack;
 
-#ifdef BYTE_CODE_SAFE
-  stacke = stack.bottom - 1 + XFASTINT (maxdepth);
-#endif
-
-  if (INTEGERP (args_template))
-    {
-      ptrdiff_t at = XINT (args_template);
-      bool rest = (at & 128) != 0;
-      int mandatory = at & 127;
-      ptrdiff_t nonrest = at >> 8;
-      eassert (mandatory <= nonrest);
-      if (nargs <= nonrest)
-	{
-	  ptrdiff_t i;
-	  for (i = 0 ; i < nargs; i++, args++)
-	    PUSH (*args);
-	  if (nargs < mandatory)
-	    /* Too few arguments.  */
-	    Fsignal (Qwrong_number_of_arguments,
-		     Fcons (Fcons (make_number (mandatory),
-				   rest ? Qand_rest : make_number (nonrest)),
-			    Fcons (make_number (nargs), Qnil)));
-	  else
-	    {
-	      for (; i < nonrest; i++)
-		PUSH (Qnil);
-	      if (rest)
-		PUSH (Qnil);
-	    }
-	}
-      else if (rest)
-	{
-	  ptrdiff_t i;
-	  for (i = 0 ; i < nonrest; i++, args++)
-	    PUSH (*args);
-	  PUSH (Flist (nargs - nonrest, args));
-	}
-      else
-	/* Too many arguments.  */
-	Fsignal (Qwrong_number_of_arguments,
-		 Fcons (Fcons (make_number (mandatory),
-			       make_number (nonrest)),
-			Fcons (make_number (nargs), Qnil)));
-    }
-  else if (! NILP (args_template))
-    /* We should push some arguments on the stack.  */
-    {
-      error ("Unknown args template!");
-    }
-
   while (1)
     {
-#ifdef BYTE_CODE_SAFE
-      if (top > stacke)
-	emacs_abort ();
-      else if (top < stack.bottom - 1)
-	emacs_abort ();
-#endif
-
-#ifdef BYTE_CODE_METER
-      prev_op = this_op;
-      this_op = op = FETCH;
-      METER_CODE (prev_op, op);
-#else
-#ifndef BYTE_CODE_THREADED
-      op = FETCH;
-#endif
-#endif
-
       /* The interpreter can be compiled one of two ways: as an
 	 ordinary switch-based interpreter, or as a threaded
 	 interpreter.  The threaded interpreter relies on GCC's
@@ -642,16 +512,6 @@
 #define CASE_DEFAULT
       /* This introduces an instruction that is known to call abort.  */
 #define CASE_ABORT CASE (Bstack_ref): CASE (default)
-#else
-      /* See above for the meaning of the various defines.  */
-#define CASE(OP) case OP
-#define NEXT break
-#define FIRST switch (op)
-#define CASE_DEFAULT case 255: default:
-#define CASE_ABORT case 0
-#endif
-
-#ifdef BYTE_CODE_THREADED
 
       /* A convenience define that saves us a lot of typing and makes
 	 the table clearer.  */
@@ -677,8 +537,18 @@
 # pragma GCC diagnostic pop
 #endif
 
-#endif
-
+#else /* !BYTE_CODE_THREADED */
+
+      /* See above for the meaning of the various defines.  */
+#define CASE(OP) case OP
+#define NEXT break
+#define FIRST switch (op)
+#define CASE_DEFAULT case 255: default:
+#define CASE_ABORT case 0
+
+      op = FETCH;
+
+#endif /* !BYTE_CODE_THREADED */
 
       FIRST
 	{
@@ -734,7 +604,6 @@
 	    if (NILP (v1))
 	      {
 		BYTE_CODE_QUIT;
-		CHECK_RANGE (op);
 		stack.pc = stack.byte_string_start + op;
 	      }
 	    NEXT;
@@ -881,21 +750,6 @@
 	  {
 	    BEFORE_POTENTIAL_GC ();
 	    DISCARD (op);
-#ifdef BYTE_CODE_METER
-	    if (byte_metering_on && SYMBOLP (TOP))
-	      {
-		Lisp_Object v1, v2;
-
-		v1 = TOP;
-		v2 = Fget (v1, Qbyte_code_meter);
-		if (INTEGERP (v2)
-		    && XINT (v2) < MOST_POSITIVE_FIXNUM)
-		  {
-		    XSETINT (v2, XINT (v2) + 1);
-		    Fput (v1, Qbyte_code_meter, v2);
-		  }
-	      }
-#endif
 	    TOP = Ffuncall (op + 1, &TOP);
 	    AFTER_POTENTIAL_GC ();
 	    NEXT;
@@ -934,7 +788,6 @@
 	  MAYBE_GC ();
 	  BYTE_CODE_QUIT;
 	  op = FETCH2;    /* pc = FETCH2 loses since FETCH2 contains pc++ */
-	  CHECK_RANGE (op);
 	  stack.pc = stack.byte_string_start + op;
 	  NEXT;
 
@@ -947,7 +800,6 @@
 	    if (!NILP (v1))
 	      {
 		BYTE_CODE_QUIT;
-		CHECK_RANGE (op);
 		stack.pc = stack.byte_string_start + op;
 	      }
 	    NEXT;
@@ -959,7 +811,6 @@
 	  if (NILP (TOP))
 	    {
 	      BYTE_CODE_QUIT;
-	      CHECK_RANGE (op);
 	      stack.pc = stack.byte_string_start + op;
 	    }
 	  else DISCARD (1);
@@ -971,7 +822,6 @@
 	  if (!NILP (TOP))
 	    {
 	      BYTE_CODE_QUIT;
-	      CHECK_RANGE (op);
 	      stack.pc = stack.byte_string_start + op;
 	    }
 	  else DISCARD (1);
@@ -1855,23 +1705,6 @@
 	  TOP = INTEGERP (TOP) ? Qt : Qnil;
 	  NEXT;
 
-#ifdef BYTE_CODE_SAFE
-	  /* These are intentionally written using 'case' syntax,
-	     because they are incompatible with the threaded
-	     interpreter.  */
-
-	case Bset_mark:
-	  BEFORE_POTENTIAL_GC ();
-	  error ("set-mark is an obsolete bytecode");
-	  AFTER_POTENTIAL_GC ();
-	  break;
-	case Bscan_buffer:
-	  BEFORE_POTENTIAL_GC ();
-	  error ("scan-buffer is an obsolete bytecode");
-	  AFTER_POTENTIAL_GC ();
-	  break;
-#endif
-
 	CASE_ABORT:
 	  /* Actually this is Bstack_ref with offset 0, but we use Bdup
 	     for that instead.  */
@@ -1924,21 +1757,96 @@
 	  DISCARD (op);
 	  NEXT;
 
+        CASE (Btailcall):
+          {
+            Lisp_Object fun, templ;
+            op = FETCH;
+
+            fun = *(top - op);
+            if (SYMBOLP (fun) && ! EQ (fun, Qunbound)
+                && (fun = XSYMBOL (fun)->function, SYMBOLP (fun)))
+              fun = indirect_function (fun);
+            
+            /* We can only do tail-calls to byte-compiled functions,
+               that are lexically-scoped and uses equal or less byte-stack 
+               than us.  */
+            if (COMPILEDP (fun) && 
+                INTEGERP ((templ = AREF (fun, COMPILED_ARGLIST))) &&
+                maxdepth >= XINT (AREF (fun, COMPILED_STACK_DEPTH)))
+              {
+                Lisp_Object bytestring;
+                ptrdiff_t arg_templ = XINT (templ);
+                ptrdiff_t max_args, min_args;
+                bool rest;
+                
+                BYTE_CODE_QUIT;
+                MAYBE_GC ();
+
+                if (CONSP (AREF (fun, COMPILED_BYTECODE)))
+                  Ffetch_bytecode (fun);
+                
+                /* Validate arguments and set up stack the new
+                   function will expect.  */
+                resolve_args_template (arg_templ, &min_args, 
+                                       &max_args, &rest);
+                validate_nargs (op, min_args, max_args, rest);
+
+                for (;op < max_args;op++)
+                  PUSH (Qnil);
+                
+                if (rest) 
+                  {
+                    ptrdiff_t extra = op - max_args;
+                    if (extra > 0)
+                      {
+                        Lisp_Object *ptr = top - extra + 1;
+                        *ptr = Flist (extra, ptr);
+                        DISCARD (extra - 1);
+                      }
+                    else
+                      PUSH (Qnil);
+                    op = max_args + 1;
+                  }
+                
+                memmove (bottom, top - op + 1, op * sizeof (*top));
+                top = bottom + op - 1;
+                
+                /* The stack is good to go, now for the rest.  */
+                stack.byte_string = AREF (fun, COMPILED_BYTECODE);
+                /* Backward compatibility with emacs pre-20.2
+                   inclusively.  (See below in exec_byte_code).  */
+                if (STRING_MULTIBYTE (stack.byte_string))
+                  stack.byte_string = Fstring_as_unibyte (bytestr);
+                stack.byte_string_start = SDATA (stack.byte_string);
+                stack.pc = stack.byte_string_start;
+                stack.constants = AREF (fun, COMPILED_CONSTANTS);
+                vectorp = XVECTOR (stack.constants)->contents;
+
+                /* And yoohoo!  */
+                NEXT;
+              }
+            goto docall;
+          }
+        
+        CASE (Bselftailcall):
+          op = FETCH;
+          
+          BYTE_CODE_QUIT;
+          MAYBE_GC ();
+          
+          /* Byte compiler should have set all arguments up right. Its
+             just for us to remove the cruft on the stack.  */
+          memmove (bottom, top - op + 1, sizeof (*top) * op);
+          top = bottom + op - 1;
+          
+          stack.pc = stack.byte_string_start;
+          
+          /* And we are back!  */
+          NEXT;
+
 	CASE_DEFAULT
 	CASE (Bconstant):
-#ifdef BYTE_CODE_SAFE
-	  if (op < Bconstant)
-	    {
-	      emacs_abort ();
-	    }
-	  if ((op -= Bconstant) >= const_length)
-	    {
-	      emacs_abort ();
-	    }
-	  PUSH (vectorp[op]);
-#else
 	  PUSH (vectorp[op - Bconstant]);
-#endif
 	  NEXT;
 	}
     }
@@ -1947,15 +1855,65 @@
 
   byte_stack_list = byte_stack_list->next;
 
-  /* Binds and unbinds are supposed to be compiled balanced.  */
-  if (SPECPDL_INDEX () != count)
-#ifdef BYTE_CODE_SAFE
-    error ("binding stack not balanced (serious byte compiler bug)");
-#else
-    emacs_abort ();
-#endif
-
-  return result;
+  /* When tail calls are being made, not all dynamical bindings have
+     necessarily been unbound, so clean them up here.  */
+  return unbind_to (count, result);
+}
+
+/* Execute the byte-code in BYTESTR.  VECTOR is the constant vector,
+   and MAXDEPTH is the maximum stack depth used (if MAXDEPTH is
+   incorrect, emacs may crash!).  If ARGS_TEMPLATE is non-nil, it
+   should be a natural number encoding the argument list according to
+   resolve_args_template above and ARGS, of size NARGS,
+   should be a vector of the actual arguments.  The arguments in ARGS
+   are pushed onto the stack according to ARGS_TEMPLATE before executing
+   BYTESTR.  */
+Lisp_Object 
+exec_byte_code (Lisp_Object bytestr, Lisp_Object vector,
+                Lisp_Object maxdepth, Lisp_Object args_template,
+                ptrdiff_t nargs, Lisp_Object *args)
+{
+  Lisp_Object *top, *bottom;
+
+  CHECK_STRING (bytestr);
+  CHECK_VECTOR (vector);
+  CHECK_NUMBER (maxdepth);
+
+  if (STRING_MULTIBYTE (bytestr))
+    /* BYTESTR must have been produced by Emacs 20.2 or the earlier
+       because they produced a raw 8-bit string for byte-code and now
+       such a byte-code string is loaded as multibyte while raw 8-bit
+       characters converted to multibyte form.  Thus, now we must
+       convert them back to the originally intended unibyte form.  */
+    bytestr = Fstring_as_unibyte (bytestr);
+  
+  if (MAX_ALLOCA / word_size <= XFASTINT (maxdepth))
+    memory_full (SIZE_MAX);
+  top = alloca ((XFASTINT (maxdepth) + 1) * sizeof (*top));
+  bottom = top + 1;
+  
+  if (INTEGERP (args_template))
+    {
+      ptrdiff_t min_args, max_args, nonrest, i;
+      bool rest;
+
+      resolve_args_template (XINT (args_template), &min_args, 
+                             &max_args, &rest);
+      validate_nargs (nargs, min_args, max_args, rest);
+      
+      nonrest = min (nargs, max_args);
+      for (i = 0; i < nonrest; i++)
+        PUSH (args[i]);
+      for (;i < max_args; i++)
+        PUSH (Qnil);
+      if (rest)
+        PUSH (nargs > max_args ? 
+              Flist (nargs - max_args, args + max_args) : Qnil);
+    }
+  else if (!NILP (args_template))
+    error ("Unknown args template!");
+  
+  return run_byte_code (bytestr, vector, top, bottom, XFASTINT (maxdepth));
 }
 
 void
@@ -1964,31 +1922,4 @@
   DEFSYM (Qbytecode, "byte-code");
 
   defsubr (&Sbyte_code);
-
-#ifdef BYTE_CODE_METER
-
-  DEFVAR_LISP ("byte-code-meter", Vbyte_code_meter,
-	       doc: /* A vector of vectors which holds a histogram of byte-code usage.
-\(aref (aref byte-code-meter 0) CODE) indicates how many times the byte
-opcode CODE has been executed.
-\(aref (aref byte-code-meter CODE1) CODE2), where CODE1 is not 0,
-indicates how many times the byte opcodes CODE1 and CODE2 have been
-executed in succession.  */);
-
-  DEFVAR_BOOL ("byte-metering-on", byte_metering_on,
-	       doc: /* If non-nil, keep profiling information on byte code usage.
-The variable byte-code-meter indicates how often each byte opcode is used.
-If a symbol has a property named `byte-code-meter' whose value is an
-integer, it is incremented each time that symbol's function is called.  */);
-
-  byte_metering_on = 0;
-  Vbyte_code_meter = Fmake_vector (make_number (256), make_number (0));
-  DEFSYM (Qbyte_code_meter, "byte-code-meter");
-  {
-    int i = 256;
-    while (i--)
-      ASET (Vbyte_code_meter, i,
-           Fmake_vector (make_number (256), make_number (0)));
-  }
-#endif
 }


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] Tail-call elimination in byte-compiled code.
  2012-09-20  8:15 [PATCH] Tail-call elimination in byte-compiled code Troels Nielsen
@ 2012-09-20 16:31 ` Stefan Monnier
  0 siblings, 0 replies; 2+ messages in thread
From: Stefan Monnier @ 2012-09-20 16:31 UTC (permalink / raw)
  To: Troels Nielsen; +Cc: benahssm, emacs-devel

> Hi all, and thanks for the great work being put into Emacs!

You're welcome.

> Now that wonderful lexical scope has been added, it is not very
> difficult adding tail-call elimination to byte-compiled code.  So I've
> tried to do, just that.

Great!

> ; -*- lexical-binding: t
> (require 'benchmark)
> (defun f (x accum)
>        (if (> x 0) (f (1- x) (+ x accum)) accum))
> (defun g (x accum)
>        (while  (> x 0) (setq accum (+ x accum)
>                             x (1- x)))
>        accum)
> (mapc #'byte-compile (list #'f #'g))
> (benchmark-run-compiled 10 (f 1000000 0))
> (benchmark-run-compiled 10 (g 1000000 0))
> will on my setup even make f some 8% faster than g!

I'd be interested to see the comparison of all 4 numbers:
f-old, g-old, f-new, g-new.

Yes, I know that f-old will fail for lack of stack, so we'd need
a different benchmark, but I think it's important to have such
a measure.  Also comparing g-new to g-old is so as to make sure the
patch doesn't end up introducing an overall slowdown.

> Self tail recursive functions has the following little problem:
> (f 1000000 0)
> (let ((y (symbol-function 'f)))
>      (fset 'f (lambda (_a _b) -1))
>      (funcall y 1000000 1))
> Where the interpreted behaviour give -1, but the byte-compiled wil be
> 500000500001.
> I don't think that is ultimately very serious though.

Its only significant impact is for defadvice.  It's hard to tell whether
that would/will be a real problem.

Of course, there are further consequences:
trace-function/debug-on-entry/elp won't see the recursive calls
any more.  In some cases this will actually be beneficial.

Another issue is the backtrace printed by `debug', which won't show those
recursive self-tail-calls any more.  Again, it's likely to be a virtue
in many cases.

OTOH, while non-self tail-calls won't suffer from the change w.r.t
defadvice (and elp/trace/d-o-e), they will also fail to show up in
debuggers's backtraces and I think this is much more problematic.

Maybe we can fix that be changing byte-tail-call to adjust
backtrace_list (with alloca).  That would make those "tail-calls" eat up
some C stack space, which is bad in the even/odd mutual-recursion case,
but I such cases should be a lot less frequent than the "plain
tail-call" case where recursion may not even be present.
We could leave the backtrace_list untouched in the case where the called
function has no name (is not a symbol by a bytecode object), so that
cases such as letrec/cl-labels mutual recursion won't eat up stack space.

> Another problem is that there is a little bit more difference
> in characteristic between interpreted and byte-compiled code,
> as interpreted code will quickly reach max-eval-depth.  I don't
> see any easy way out of that now tho, and tail-recursion is a very
> desirable thing for me and likely many others.

Yes, that's a concern.  It will prevent use of self-recursive
definitions in some places where they'd be used before they are
compiled.  Some of those problems can be solved by adding more
dependencies in lisp/Makefile, but others might be harder (for the files
we need before we byte-compile the byte-compiler).

OTOH we're slowly moving towards a mode where all the code is always
byte-compiled.  Example of recent changes in that direction are the
eager macro-expansion, and the byte-compilation of defsubsts before
inlining them (in those cases where we wouldn't know how to inline
their source code).

Further thoughts:
- I'd like self-tail calls to use something closer to `goto'.
  I some cases, they can really just use `goto 0' (in those cases where
  the memcpy is a nop).  Your self-tail-call is really a "discard-under
  + goto-0", so maybe all we need is to generalize it to go to other
  labels than 0.
  Of course, that would slow down self-tail calls and the only benefit
  would be when inlining such functions (and currently such recursive
  functions are never inlined).  So maybe it's not worth the trouble.
- What happens with the "unbind" bytecodes that would be run after the
  `call' when that call is turned into a `(self-)tail-call', e.g. when
  the self-tail-call is inside an unwind-protect or a let-bind of
  a dynamically-scoped var?
  I'm not sure the current handling is correct, yet I'm not sure it's
  incorrect either.
- Should we be able to do tail-calls without a new `tail-call'
  instruction by just checking whether the next instruction is `return'?
  
> The patch as is now, will also remove BYTE_CODE_SAFE and
> BYTE_CODE_METER. They made it more difficult for me to understand what
> actually went on in bytecode.c. If someone needs them I will gladly
> add them back.

I think your code and theirs doesn't interact, so you can put it back.

> I did try to put up a heap-allocated byte-code stack so to optimize
> non-tail-recursive inter-byte-code calls avoiding copying the
> arguments on the byte-stack. This unfortunately gave a small but
> significant performance reduction, maybe due to the C-stack
> consistently being in the processor's cache.

That's *very* interesting.
Have you tried allocating it with alloca instead?

> Also I'm not very fond of byte-compile--tail-position variable, and
> would rather add it as an argument to the 'byte-compile handlers.
> I have a patch that does just that along with disbanding
> byte-compile--for-effect and adding that as an argument along.

byte-compile--for-effect is pretty nasty, yes.  I'm actually surprised
we don't seem to have too many bugs because of it.

> The only problem is that some backward-compatibility may be broken,
> but I don't know how much external code is really adding their own
> 'byte-compile handlers.  Would there be any problems with such a patch?

I think breaking this compatibility would indeed be problematic.
As much as I dislike it, several external packages hook into the
byte-compiler (often for bad reasons, but that's another discussion).

> @@ -2503,15 +2515,16 @@ (defun byte-compile (form)
>          (when (symbolp form)
>            (unless (memq (car-safe fun) '(closure lambda))
>              (error "Don't know how to compile %S" fun))
> -          (setq fun (byte-compile--reify-function fun))
> -          (setq lexical-binding (eq (car fun) 'closure)))
> +          (setq lexical-binding (eq (car fun) 'closure))
> +          (setq fun (byte-compile--reify-function fun)))
>          (unless (eq (car-safe fun) 'lambda)
>            (error "Don't know how to compile %S" fun))
>          ;; Expand macros.
>          (setq fun (byte-compile-preprocess fun))

Yes, good catch, thank you.
Some more comments and nitpicks below.


        Stefan



>  (defvar byte-compile--for-effect)
> +(defvar byte-compile--tail-position nil)
> +

Please avoid adding whitespace ;-)

> -	       (mapc 'byte-compile-output-file-form (cdr form)))
> +	       (mapc #'byte-compile-output-file-form (cdr form)))

While I'm perfectly OK with such a change (as evidenced by some of my
recent commits), try and avoid including them in a patch that's already
large and meant only for review.

> -                      (symbol-value this-kind))))
> -          )
> +                      (symbol-value this-kind)))))
[...]
> -      (let* ((code (byte-compile-lambda (cons arglist body) t)))
> +      (let ((code (byte-compile-lambda (cons arglist body) t nil name)))

Same here.

> @@ -2462,6 +2471,9 @@
>    (pcase-let* (((or (and `(lambda ,args . ,body) (let env nil))
>                      `(closure ,env ,args . ,body)) fun)
>                 (renv ()))
> +    ;; Remove docstring if it exists
> +    (when (and (cdr body) (stringp body)) 
> +      (setq body (cdr body)))
>      ;; Turn the function's closed vars (if any) into local let bindings.
>      (dolist (binding env)
>        (cond

The comment lacks punctuation.  The change is going in the right
direction, but the docstring shouldn't be just removed.
This said, this change is also irrelevant for tail-calls, AFAICT.

> @@ -2653,7 +2668,7 @@
>                                     ;; closed by now).
>                                     (and lexical-binding
>                                          (byte-compile-make-lambda-lexenv fun))
> -                                   reserved-csts)))
> +                                   reserved-csts t)))

You could use "'tail" instead of "t" to make its meaning self-evident.

> +  (cond
> +   ((and (eq (car form) byte-compile-current-lambda-name)
> +         lexical-binding byte-compile--tail-position
> +         (let ((sig (byte-compile-arglist-signature 
> +                     byte-compile-current-lambda-arglist))
> +               (nargs (length (cdr form))))
> +           (and (>= (car sig) nargs)
> +                (or (not (cdr sig)) 
> +                    (<= nargs (cdr sig))))))
> +    (setq form (cdr form))
> +    (let* ((rest (memq '&rest byte-compile-current-lambda-arglist))
> +           (nnormal-args (- (length byte-compile-current-lambda-arglist)
> +                            (if rest 2 0)
> +                            (if (memq '&optional 
> +                                      byte-compile-current-lambda-arglist)
> +                                1 0)))
> +           (nargs 0))

Isn't `rest' the same as (null (cdr sig))?

This nnormal-args computation is a bit ugly here.  We should at least move
it into its own function (right next to byte-compile-arglist-signature),
or maybe extend byte-compile-arglist-signature to provide the needed value.

> +      ;; FIXME: 
> +      ;; Pad to make stack match expectations.
> +      (byte-compile-constant nil)))

We should probably teach the stack-depth logic that byte-self-tail-call
is a bit like a return.

> +       ;; If fun is builtin now, it probably will be later too, so don't 
> +       ;; tail-eliminate.
> +       (if (and (not (subrp fun)) byte-compile--tail-position)
> +           'byte-tail-call 'byte-call)
> +       (length (cdr form)))))))

Is byte-tail-call slower than byte-call when invoked on a subroutine?
 
> +       ((eq (car op) 'byte-self-tail-call)
> +        (error "Can not inline self-recursive function"))

If we replace byte-self-tail-call with a byte-goto, this problem
will disappear.

>  (defun byte-compile-stack-adjustment (op operand)
>    "Return the amount by which an operation adjusts the stack.
>  OP and OPERAND are as passed to `byte-compile-out'."
> -  (if (memq op '(byte-call byte-discardN byte-discardN-preserve-tos))
> +  (if (memq op '(byte-call byte-self-tail-call byte-tail-call
> +                           byte-discardN byte-discardN-preserve-tos))

Here we have a problem since these new byte-codes do not adjust the
stack depth in the same way as byte-call.

> -/* Push x onto the execution stack.  This used to be #define PUSH(x)
> -   (*++stackp = (x)) This oddity is necessary because Alliant can't be
> -   bothered to compile the preincrement operator properly, as of 4/91.
> -   -JimB */
> -
> -#define PUSH(x) (top++, *top = (x))
> +#define PUSH(x) (*++top = (x))

Are you sure 21 years is old enough?

> +/* Interpret a numerical args template TEMPLATE, used for lexically-scoped 
> +   byte-compiled functions.  Put the data in MIN_ARGS, MAX_ARGS and REST. 
> +   Also validates that nargs is in the range.  */
> +static void
> +resolve_args_template (ptrdiff_t template, ptrdiff_t *min_args, 
> +                       ptrdiff_t *max_args, bool *rest)
> +{
> +  *max_args = template >> 8;
> +  *min_args = template & 0x7F;
> +  *rest = (template & 0x80) != 0;
> +}

The last sentence in the comment seems out-of-date/place (it applies to the
next function rather than to this one).

> +        CASE (Btailcall):

This case is large enough to merit moving to its own function.

> +                /* The stack is good to go, now for the rest.  */
> +                stack.byte_string = AREF (fun, COMPILED_BYTECODE);
> +                /* Backward compatibility with emacs pre-20.2
> +                   inclusively.  (See below in exec_byte_code).  */
> +                if (STRING_MULTIBYTE (stack.byte_string))
> +                  stack.byte_string = Fstring_as_unibyte (bytestr);

This is a call from a tail-recursive function (i.e. Emacs>24.1) to
a lexically-scoped function (i.e. Emacs≥24.1), so Emacs<21 issues can
be safely ignored.

> +          /* Byte compiler should have set all arguments up right. Its
> +             just for us to remove the cruft on the stack.  */

Please use 2 spaces after a full stop.



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2012-09-20 16:31 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-20  8:15 [PATCH] Tail-call elimination in byte-compiled code Troels Nielsen
2012-09-20 16:31 ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).