From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nala Ginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: thoughts on targetting the web
Date: Thu, 7 Oct 2021 09:37:29 +0800
References: <87h7htfvy3.fsf@pobox.com>
To: Andy Wingo
Cc: guile-devel
I'm looking forward to it, and I'm pretty sure this can bring new
possibilities to Artanis!

Best regards.

On Sun, Jun 20, 2021, 04:21 Andy Wingo <wingo@pobox.com> wrote:
> Hi :)
>
> A brain-dump tonight.  I was thinking about what it would mean for Guile
> to target the web.  The goal would be for Guile to be a useful language
> for programming client-side interaction on web pages.  Now, there are a
> number of takes for Schemes of various stripes to target JS.  I haven't
> made a full survey, but my starting point is that in a Guile context we
> basically want the Guile Scheme language, which is to say:
>
>  1. Tail calls.  The solution must include proper tail calls.
>
>  2. Multiple values.  The solution should have the same semantics that
>     Guile has for multiple-value returns, including implicit
>     truncation.
>
>  3. Varargs.  The solution should allow for rest arguments, keyword
>     arguments, and the like.
>
>  4. Delimited continuations.  The solution should allow non-local exit
>     from any function, should allow for capturing stack slices to
>     partial continuations, and should allow those continuations to be
>     instantiated multiple times if desired.  We should be able to
>     target fibers to the web.  Otherwise, why bother?
>
>  5. Garbage collection.  *We should re-use the host GC.*  Although it
>     would be possible to manage a heap in linear memory, that has
>     retention problems due to cycles between the Guile heap and the JS
>     heap.
>
>  6. Numbers.  We should support Scheme's unbounded integers.  This
>     requires bignums in the limit case, but thankfully JS has bignums
>     now.  We will still be able to unbox in many cases, though.
>
>  7. Target existing web.
We shouldn't depend on some future WebAssembly
>     or JS proposal -- the solution should work in the here and now and
>     only get better as features are added to the web platform.
>
> From a UX perspective, I would expect we would generally want
> whole-program compilation with aggressive DCE / tree-shaking, producing
> a single JS or WebAssembly artifact at build time.  But that's a later
> issue.
>
> I have thought about this off and on over the years, but in the end was
> flummoxed about how to meet all the requirements.  Recently, however, I
> think I have come up with a solution for most of them:
>
>  (1) In JS, tail calls are part of ES2015, but not implemented in all
>      browsers.  In WebAssembly, they are a future plan, but hard for
>      various reasons.  So in practice the solution for today's web is
>      to use a universal trampoline and make all calls tail calls --
>      i.e. call all functions via:
>
>        while true:
>          call(pop())
>
>      Whenever we can target proper tail calls, this will only get
>      faster.
>
>  (2) Neither JS nor WebAssembly has the truncation semantics we want.
>      Therefore we will return values via an explicit array, and the
>      continuation will be responsible for consuming those values and
>      binding any needed variables.
>
>  (3) Although JS does have varargs support, WebAssembly does not.  But
>      we can actually use the same solution here as we use for (1) and
>      (2) -- pass arguments in an explicit array plus an nvalues count,
>      and rely on the function to parse them appropriately.  In this way
>      we can get Guile keyword arguments.  This also makes the type of
>      Scheme functions uniform, which is important in WebAssembly
>      contexts.
>
>  (4) This is the interesting bit!  As hinted in (1), we will transform
>      the program such that all calls are tail calls.  This is a form of
>      minimal CPS conversion -- minimal in the sense that functions are
>      broken up into the minimum number of pieces needed to ensure the
>      all-tail-calls property.
Non-tail calls transform to tail calls by
>      saving their continuation's state on a stack, which amounts to
>      stack allocation for these continuations.  The continuation of
>      the call pops the saved state from the stack.  Because the stack
>      is explicit, we can manipulate it as data: slice it to capture a
>      delimited continuation, drop values to make a non-local exit,
>      splat a saved slice back on to compose a captured delimited
>      continuation with the current continuation, and so on.  Therefore
>      we have the necessary primitives to implement delimited
>      continuations as a library.
>
>  (5) Scheme needs a representation that can store any value.  In JS
>      this is easy because JS is untyped.  For WebAssembly, I think I
>      would lean on externref for this purpose, which effectively boxes
>      all values.  There are no fixnums in the current WebAssembly spec,
>      so this is suboptimal, and we have to call out to JS to evaluate
>      type predicates and access fields.  But support for structured
>      GC-managed types is coming to WebAssembly, which will speed up
>      object access.
>
>  (6) The easy solution here is to make JS numbers, which are doubles at
>      heart, represent flonums, and to use JS bignums for Scheme
>      integers.  Fine.
>
>  (7) This principle has been taken into account in (1)-(6).
>
> Now, a note on the transformation described in (4), which I call
> "tailification".
>
> The first step of tailification computes the set of "tails" in a
> function.  The function entry starts a tail, as does each return point
> from non-tail calls.  Join points between different tails also start
> tails.
>
> In the residual program, there are four ways that a continuation exits:
>
>   - Tail calls in the source program are tail calls in the residual
>     program; no change.
>
>   - For non-tail calls in the source program, the caller saves the
>     state of the continuation (the live variables flowing into the
>     continuation) on an explicit stack, along with the label of the
>     continuation.
The return continuation will be converted into an
>     arity-checking function entry, to handle multiple-value returns;
>     when it is invoked, it will pop its incoming live variables from
>     the continuation stack.
>
>   - Terms that continue to a join continuation are converted to label
>     calls in tail position, passing the state of the continuation as
>     arguments.
>
>   - Returning values from a continuation pops the return label from
>     the stack and does an indirect tail label call on that label, with
>     the given return values.
>
> I expect that a tailified program will probably be slower than a
> non-tailified program.  However, a tailified program has a few
> interesting properties: the stack is packed and contains only live
> data; the stack can be traversed in a portable way, allowing for the
> implementation of prompts on systems that don't support them natively;
> and, as all calls are tail calls, the whole system can be implemented
> naturally with a driver trampoline on targets that don't support tail
> calls (e.g. JavaScript and WebAssembly).
>
> This algorithm is implemented in the wip-tailify branch in Guile's git.
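As a rough illustration of the design sketched in (1), (2), and (4) -- a universal trampoline driving tailified code that saves and restores continuation state on an explicit stack -- here is a minimal JavaScript sketch. All names here (save, restore, tailCall, run, fact) are hypothetical illustrations, not the actual wip-tailify output or runtime:

```javascript
// Explicit continuation stack: slots hold saved live variables and
// return labels, mirroring the save[...]/restore[...] primcalls above.
const stack = [];
let next = null;                       // thunk for the next tail call

const save = (...slots) => stack.push(...slots);
const restore = (n) => stack.splice(stack.length - n);  // pop n slots
const restore1 = () => stack.pop();                     // pop one slot

// All calls become tail calls: just record the next step for the driver.
const tailCall = (f, ...args) => { next = () => f(...args); };

// Returning pops the return label off the stack and tail-calls it with
// an explicit array of values (the multiple-value convention of (2)).
const returnValues = (vals) => tailCall(restore1(), vals);

// A tailified factorial: the non-tail call fact(n - 1) is split at its
// return point into a separate continuation, factK.
function fact(n) {
  if (n === 0) return returnValues([1]);
  save(n, factK);                      // live variable + return label
  tailCall(fact, n - 1);
}
function factK(vals) {
  const [n] = restore(1);              // pop incoming live variable
  returnValues([n * vals[0]]);
}

// The universal driver: "while true: call(pop())".
function run(entry, ...args) {
  let result;
  save((vals) => { result = vals[0]; });  // bottom-of-stack halt label
  tailCall(entry, ...args);
  while (next) { const step = next; next = null; step(); }
  return result;
}

// run(fact, 5) → 120, with bounded host stack depth regardless of n.
```

Because the driver unwinds the JS stack between steps, recursion depth is limited only by the explicit continuation stack, and delimited-continuation operations would become slices of `stack`, as the post describes.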
>
> I have this little driver program:
>
>     (use-modules (system base compile)
>                  (system base language)
>                  (language cps tailify)
>                  (language cps verify)
>                  (language cps renumber)
>                  (language cps dce)
>                  (language cps simplify)
>                  (language cps dump))
>
>     (define (tailify* form)
>       (let ((from 'scheme)
>             (make-lower (language-lowerer (lookup-language 'cps)))
>             (optimization-level 2)
>             (warning-level 2)
>             (opts '()))
>         (define cps
>           ((make-lower optimization-level opts)
>            (compile form #:from 'scheme #:to 'cps #:opts opts
>                     #:optimization-level optimization-level
>                     #:warning-level warning-level
>                     #:env (current-module))
>            (current-module)))
>         (format #t "Original:\n")
>         (dump cps)
>         (format #t "\n\nTailified:\n")
>         (let* ((cps (tailify cps))
>                (cps (verify cps))
>                (cps (eliminate-dead-code cps))
>                (cps (simplify cps))
>                (cps (renumber cps)))
>           (dump cps))))
>
> If I run this on the following lambda:
>
>   (lambda (x)
>     (pk x
>         (if x
>             (pk 12)
>             (pk 42))))
>
> Then first it prints the dump of the optimized CPS:
>
>   Original:
>   L0:
>     v0 := self
>     L1(...)
>   L1:
>     receive()
>     v1 := current-module()                     ; module
>     cache-set![0](v1)
>     v2 := const-fun L7                         ; _
>     return v2
>
>   L7:
>     v3 := self
>     L8(...)
>   L8:
>     v4 := receive(x)                           ; x
>     v5 := cache-ref[(0 . pk)]()                ; cached
>     heap-object?(v5) ? L16() : L11()
>   L16():
>     L17(v5)
>   L11():
>     v6 := cache-ref[0]()                       ; mod
>     v7 := const pk                             ; name
>     v8 := lookup-bound(v6, v7)                 ; var
>     cache-set![(0 . pk)](v8)
>     L17(v8)
>   L17(v9):                                     ; box
>     v10 := scm-ref/immediate[(box . 1)](v9)    ; arg
>     false?(v4) ? L22() : L19()
>   L22():
>     v12 := const 42                            ; arg
>     call v10(v12)
>     L25(receive(arg, rest...))
>   L19():
>     v11 := const 12                            ; arg
>     call v10(v11)
>     L25(receive(arg, rest...))
>   L25(v13, v14):                               ; tmp, tmp
>     tail call v10(v4, v13)
>
> Then it prints the tailified version:
>
>   L0:
>     v0 := self
>     L1(...)
>   L1:
>     receive()
>     v1 := current-module()                     ; module
>     cache-set![0](v1)
>     v2 := const-fun L8                         ; _
>     v3 := restore1[ptr]()                      ; ret
>     tail calli v3(v2)
>
>   L8:
>     v4 := self
>     L9(...)
>   L9:
>     v5 := receive(x)                           ; x
>     v6 := cache-ref[(0 . pk)]()                ; cached
>     heap-object?(v6) ? L17() : L12()
>   L17():
>     L18(v6)
>   L12():
>     v7 := cache-ref[0]()                       ; mod
>     v8 := const pk                             ; name
>     v9 := lookup-bound(v7, v8)                 ; var
>     cache-set![(0 . pk)](v9)
>     L18(v9)
>   L18(v10):                                    ; box
>     v11 := scm-ref/immediate[(box . 1)](v10)   ; arg
>     false?(v5) ? L24() : L20()
>   L24():
>     v14 := const 42                            ; arg
>     v15 := code L37                            ; cont
>     save[(scm scm ptr)](v5, v11, v15)
>     tail call v11(v14)
>   L20():
>     v12 := const 12                            ; arg
>     v13 := code L29                            ; cont
>     save[(scm scm ptr)](v5, v11, v13)
>     tail call v11(v12)
>
>   L29:
>     L30(...)
>   L30:
>     v16, v17 := receive(arg, rest...)          ; tmp, tmp
>     v18, v19 := restore[(scm scm)]()           ; restored, restored
>     tail callk L34(_, v18, v19, v16)
>
>   L34:
>     meta: arg-representations: (scm scm scm)
>     L35(...)
>   L35(v20, v21, v22):                          ; _, _, _
>     tail call v21(v20, v22)
>
>   L37:
>     L38(...)
>   L38:
>     v23, v24 := receive(arg, rest...)          ; tmp, tmp
>     v25, v26 := restore[(scm scm)]()           ; restored, restored
>     tail callk L34(_, v25, v26, v23)
>
> I think I would like to implement the "save" and "restore" primcalls in
> stock Guile, to be able to adequately test tailification.  But then I'd
> look at targeting WebAssembly (just to make it interesting; run-time
> performance would no doubt be better if I targeted JS, due to GC
> concerns).
>
> Anyway, thoughts welcome.  Happy hacking :)
>
> Andy