From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] Tail-call elimination in byte-compiled code. Date: Thu, 20 Sep 2012 12:31:22 -0400 Message-ID: References: NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Trace: ger.gmane.org 1348158701 5175 80.91.229.3 (20 Sep 2012 16:31:41 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Thu, 20 Sep 2012 16:31:41 +0000 (UTC) Cc: benahssm@iro.umontreal.ca, emacs-devel@gnu.org To: Troels Nielsen Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Sep 20 18:31:43 2012 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1TEjfA-0004P3-Eb for ged-emacs-devel@m.gmane.org; Thu, 20 Sep 2012 18:31:40 +0200 Original-Received: from localhost ([::1]:59503 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TEjf5-0005ZH-MI for ged-emacs-devel@m.gmane.org; Thu, 20 Sep 2012 12:31:35 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:54498) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TEjex-0005XP-Lz for emacs-devel@gnu.org; Thu, 20 Sep 2012 12:31:34 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TEjev-0006hi-1H for emacs-devel@gnu.org; Thu, 20 Sep 2012 12:31:27 -0400 Original-Received: from ironport2-out.teksavvy.com ([206.248.154.182]:3021) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TEjeu-0006ev-OF for emacs-devel@gnu.org; Thu, 20 Sep 2012 12:31:24 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ai8FAG6Zu09MCocG/2dsb2JhbABEhS2rG4NJgQiCFQEBBAEjMyMFCwsODAIYDgICFBgNJIgcBacOknuBJo4KgRQDkUCRc4FYgwWBPBo X-IronPort-AV: E=Sophos;i="4.75,637,1330923600"; d="scan'208";a="199552640" Original-Received: from 76-10-135-6.dsl.teksavvy.com (HELO pastel.home) ([76.10.135.6]) by ironport2-out.teksavvy.com with ESMTP/TLS/ADH-AES256-SHA; 20 Sep 2012 12:31:23 -0400 Original-Received: by pastel.home (Postfix, from userid 20848) id A7184591B8; Thu, 20 Sep 2012 12:31:22 -0400 (EDT) In-Reply-To: (Troels Nielsen's message of "Thu, 20 Sep 2012 10:15:52 +0200") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2.50 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 206.248.154.182 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:153403 Archived-At: > Hi all, and thanks for the great work being put into Emacs! You're welcome. > Now that wonderful lexical scope has been added, it is not very > difficult adding tail-call elimination to byte-compiled code. So I've > tried to do, just that. Great! > ; -*- lexical-binding: t > (require 'benchmark) > (defun f (x accum) > (if (> x 0) (f (1- x) (+ x accum)) accum)) > (defun g (x accum) > (while (> x 0) (setq accum (+ x accum) > x (1- x))) > accum) > (mapc #'byte-compile (list #'f #'g)) > (benchmark-run-compiled 10 (f 1000000 0)) > (benchmark-run-compiled 10 (g 1000000 0)) > will on my setup even make f some 8% faster than g! I'd be interested to see the comparison of all 4 numbers: f-old, g-old, f-new, g-new. Yes, I know that f-old will fail for lack of stack, so we'd need a different benchmark, but I think it's important to have such a measure. Also comparing g-new to g-old is so as to make sure the patch doesn't end up introducing an overall slowdown. > Self tail recursive functions has the following little problem: > (f 1000000 0) > (let ((y (symbol-function 'f))) > (fset 'f (lambda (_a _b) -1)) > (funcall y 1000000 1)) > Where the interpreted behaviour give -1, but the byte-compiled wil be > 500000500001. > I don't think that is ultimately very serious though. Its only significant impact is for defadvice. It's hard to tell whether that would/will be a real problem. Of course, there are further consequences: trace-function/debug-on-entry/elp won't see the recursive calls any more. In some cases this will actually be beneficial. Another issue is the backtrace printed by `debug', which won't show those recursive self-tail-calls any more. Again, it's likely to be a virtue in many cases. OTOH, while non-self tail-calls won't suffer from the change w.r.t defadvice (and elp/trace/d-o-e), they will also fail to show up in debuggers's backtraces and I think this is much more problematic. Maybe we can fix that be changing byte-tail-call to adjust backtrace_list (with alloca). That would make those "tail-calls" eat up some C stack space, which is bad in the even/odd mutual-recursion case, but I such cases should be a lot less frequent than the "plain tail-call" case where recursion may not even be present. We could leave the backtrace_list untouched in the case where the called function has no name (is not a symbol by a bytecode object), so that cases such as letrec/cl-labels mutual recursion won't eat up stack space. > Another problem is that there is a little bit more difference > in characteristic between interpreted and byte-compiled code, > as interpreted code will quickly reach max-eval-depth. I don't > see any easy way out of that now tho, and tail-recursion is a very > desirable thing for me and likely many others. Yes, that's a concern. It will prevent use of self-recursive definitions in some places where they'd be used before they are compiled. Some of those problems can be solved by adding more dependencies in lisp/Makefile, but others might be harder (for the files we need before we byte-compile the byte-compiler). OTOH we're slowly moving towards a mode where all the code is always byte-compiled. Example of recent changes in that direction are the eager macro-expansion, and the byte-compilation of defsubsts before inlining them (in those cases where we wouldn't know how to inline their source code). Further thoughts: - I'd like self-tail calls to use something closer to `goto'. I some cases, they can really just use `goto 0' (in those cases where the memcpy is a nop). Your self-tail-call is really a "discard-under + goto-0", so maybe all we need is to generalize it to go to other labels than 0. Of course, that would slow down self-tail calls and the only benefit would be when inlining such functions (and currently such recursive functions are never inlined). So maybe it's not worth the trouble. - What happens with the "unbind" bytecodes that would be run after the `call' when that call is turned into a `(self-)tail-call', e.g. when the self-tail-call is inside an unwind-protect or a let-bind of a dynamically-scoped var? I'm not sure the current handling is correct, yet I'm not sure it's incorrect either. - Should we be able to do tail-calls without a new `tail-call' instruction by just checking whether the next instruction is `return'? =20=20 > The patch as is now, will also remove BYTE_CODE_SAFE and > BYTE_CODE_METER. They made it more difficult for me to understand what > actually went on in bytecode.c. If someone needs them I will gladly > add them back. I think your code and theirs doesn't interact, so you can put it back. > I did try to put up a heap-allocated byte-code stack so to optimize > non-tail-recursive inter-byte-code calls avoiding copying the > arguments on the byte-stack. This unfortunately gave a small but > significant performance reduction, maybe due to the C-stack > consistently being in the processor's cache. That's *very* interesting. Have you tried allocating it with alloca instead? > Also I'm not very fond of byte-compile--tail-position variable, and > would rather add it as an argument to the 'byte-compile handlers. > I have a patch that does just that along with disbanding > byte-compile--for-effect and adding that as an argument along. byte-compile--for-effect is pretty nasty, yes. I'm actually surprised we don't seem to have too many bugs because of it. > The only problem is that some backward-compatibility may be broken, > but I don't know how much external code is really adding their own > 'byte-compile handlers. Would there be any problems with such a patch? I think breaking this compatibility would indeed be problematic. As much as I dislike it, several external packages hook into the byte-compiler (often for bad reasons, but that's another discussion). > @@ -2503,15 +2515,16 @@ (defun byte-compile (form) > (when (symbolp form) > (unless (memq (car-safe fun) '(closure lambda)) > (error "Don't know how to compile %S" fun)) > - (setq fun (byte-compile--reify-function fun)) > - (setq lexical-binding (eq (car fun) 'closure))) > + (setq lexical-binding (eq (car fun) 'closure)) > + (setq fun (byte-compile--reify-function fun))) > (unless (eq (car-safe fun) 'lambda) > (error "Don't know how to compile %S" fun)) > ;; Expand macros. > (setq fun (byte-compile-preprocess fun)) Yes, good catch, thank you. Some more comments and nitpicks below. Stefan > (defvar byte-compile--for-effect) > +(defvar byte-compile--tail-position nil) > + Please avoid adding whitespace ;-) > - (mapc 'byte-compile-output-file-form (cdr form))) > + (mapc #'byte-compile-output-file-form (cdr form))) While I'm perfectly OK with such a change (as evidenced by some of my recent commits), try and avoid including them in a patch that's already large and meant only for review. > - (symbol-value this-kind)))) > - ) > + (symbol-value this-kind))))) [...] > - (let* ((code (byte-compile-lambda (cons arglist body) t))) > + (let ((code (byte-compile-lambda (cons arglist body) t nil name))) Same here. > @@ -2462,6 +2471,9 @@ > (pcase-let* (((or (and `(lambda ,args . ,body) (let env nil)) > `(closure ,env ,args . ,body)) fun) > (renv ())) > + ;; Remove docstring if it exists > + (when (and (cdr body) (stringp body))=20 > + (setq body (cdr body))) > ;; Turn the function's closed vars (if any) into local let bindings. > (dolist (binding env) > (cond The comment lacks punctuation. The change is going in the right direction, but the docstring shouldn't be just removed. This said, this change is also irrelevant for tail-calls, AFAICT. > @@ -2653,7 +2668,7 @@ > ;; closed by now). > (and lexical-binding > (byte-compile-make-lambda-lexenv= fun)) > - reserved-csts))) > + reserved-csts t))) You could use "'tail" instead of "t" to make its meaning self-evident. > + (cond > + ((and (eq (car form) byte-compile-current-lambda-name) > + lexical-binding byte-compile--tail-position > + (let ((sig (byte-compile-arglist-signature=20 > + byte-compile-current-lambda-arglist)) > + (nargs (length (cdr form)))) > + (and (>=3D (car sig) nargs) > + (or (not (cdr sig))=20 > + (<=3D nargs (cdr sig)))))) > + (setq form (cdr form)) > + (let* ((rest (memq '&rest byte-compile-current-lambda-arglist)) > + (nnormal-args (- (length byte-compile-current-lambda-arglist) > + (if rest 2 0) > + (if (memq '&optional=20 > + byte-compile-current-lambda-arglis= t) > + 1 0))) > + (nargs 0)) Isn't `rest' the same as (null (cdr sig))? This nnormal-args computation is a bit ugly here. We should at least move it into its own function (right next to byte-compile-arglist-signature), or maybe extend byte-compile-arglist-signature to provide the needed value. > + ;; FIXME:=20 > + ;; Pad to make stack match expectations. > + (byte-compile-constant nil))) We should probably teach the stack-depth logic that byte-self-tail-call is a bit like a return. > + ;; If fun is builtin now, it probably will be later too, so don't= =20 > + ;; tail-eliminate. > + (if (and (not (subrp fun)) byte-compile--tail-position) > + 'byte-tail-call 'byte-call) > + (length (cdr form))))))) Is byte-tail-call slower than byte-call when invoked on a subroutine? =20 > + ((eq (car op) 'byte-self-tail-call) > + (error "Can not inline self-recursive function")) If we replace byte-self-tail-call with a byte-goto, this problem will disappear. > (defun byte-compile-stack-adjustment (op operand) > "Return the amount by which an operation adjusts the stack. > OP and OPERAND are as passed to `byte-compile-out'." > - (if (memq op '(byte-call byte-discardN byte-discardN-preserve-tos)) > + (if (memq op '(byte-call byte-self-tail-call byte-tail-call > + byte-discardN byte-discardN-preserve-tos)) Here we have a problem since these new byte-codes do not adjust the stack depth in the same way as byte-call. > -/* Push x onto the execution stack. This used to be #define PUSH(x) > - (*++stackp =3D (x)) This oddity is necessary because Alliant can't be > - bothered to compile the preincrement operator properly, as of 4/91. > - -JimB */ > - > -#define PUSH(x) (top++, *top =3D (x)) > +#define PUSH(x) (*++top =3D (x)) Are you sure 21 years is old enough? > +/* Interpret a numerical args template TEMPLATE, used for lexically-scop= ed=20 > + byte-compiled functions. Put the data in MIN_ARGS, MAX_ARGS and REST= .=20 > + Also validates that nargs is in the range. */ > +static void > +resolve_args_template (ptrdiff_t template, ptrdiff_t *min_args,=20 > + ptrdiff_t *max_args, bool *rest) > +{ > + *max_args =3D template >> 8; > + *min_args =3D template & 0x7F; > + *rest =3D (template & 0x80) !=3D 0; > +} The last sentence in the comment seems out-of-date/place (it applies to the next function rather than to this one). > + CASE (Btailcall): This case is large enough to merit moving to its own function. > + /* The stack is good to go, now for the rest. */ > + stack.byte_string =3D AREF (fun, COMPILED_BYTECODE); > + /* Backward compatibility with emacs pre-20.2 > + inclusively. (See below in exec_byte_code). */ > + if (STRING_MULTIBYTE (stack.byte_string)) > + stack.byte_string =3D Fstring_as_unibyte (bytestr); This is a call from a tail-recursive function (i.e. Emacs>24.1) to a lexically-scoped function (i.e. Emacs=E2=89=A524.1), so Emacs<21 issues c= an be safely ignored. > + /* Byte compiler should have set all arguments up right. Its > + just for us to remove the cruft on the stack. */ Please use 2 spaces after a full stop.