I'm looking forward to it, and I'm pretty sure this can bring new possibilities to Artanis! Best regards. On Sun, Jun 20, 2021, 04:21 Andy Wingo wrote: > Hi :) > > A brain-dump tonight. I was thinking about what it would mean for Guile > to target the web. The goal would be for Guile to be a useful language > for programming client-side interaction on web pages. Now, there are a > number of takes for Schemes of various stripes to target JS. I haven't > made a full survey but my starting point is that in a Guile context, we > basically want the Guile Scheme language, which is to say: > > 1. Tail calls. The solution must include proper tail calls. > > 2. Multiple values. The solution should have the same semantics that > Guile has for multiple-value returns, including implicit > truncation. > > 3. Varargs. The solution should allow for rest arguments, keyword > arguments, and the like. > > 4. Delimited continuations. The solution should allow non-local exit > from any function, and should allow for capturing stack slices to > partial continuations, and should allow those continuations to be > instantiated multiple times if desired. We should be able to target > fibers to the web. Otherwise, why bother? > > 5. Garbage collection. *We should re-use the host GC*. Although it > would be possible to manage a heap in linear memory, that has > retention problems due to cycles between the Guile heap and the JS > heap. > > 6. Numbers. We should support Scheme's unbounded integers. Requires > bignums in the limit case, but thankfully JS has bignums now. We > will still be able to unbox in many cases though. > > 7. Target existing web. We shouldn't depend on some future WebAssembly > or JS proposal -- the solution should work in the here and now and > only get better as features are added to the web platform. > > From a UX perspective, I would expect we would generally want > whole-program compilation with aggressive DCE / tree-shaking, producing > a single JS or WebAssembly artifact at build-time. But that's a later > issue. > > I have thought about this off and on over the years but in the end was > flummoxed about how to meet all requirements. However recently I think > I have come up with a solution for most of these: > > (1) In JS, tail calls are part of ES2015, but not implemented in all > browsers. In WebAssembly, they are a future plan, but hard for > various reasons. So in practice the solution for today's web is to > use a universal trampoline and make all calls tail calls -- > i.e. call all functions via: > > while true: > call(pop()) > > Whenever we can target proper tail calls, this will only get > faster. > > (2) Neither JS nor WebAssembly have the truncation semantics we want. > Therefore we will return values via an explicit array, and then the > continuation will be responsible for consuming those values and > binding any needed variables. > > (3) Although JS does have varargs support, WebAssembly does not. But > we can actually use the same solution here as we use for (1) and > (2) -- pass arguments on an explicit array + nvalues, and relying > on the function to parse them appropriately. In this way we can > get Guile keyword arguments. This also makes the type of Scheme > functions uniform, which is important in WebAssembly contexts. > > (4) This is the interesting bit! As hinted in (1), we will transform > the program such that all calls are tail calls. This is a form of > minimal CPS conversion -- minimal in the sense that functions are > broken up into the minimum number of pieces to ensure the > all-tail-calls property. Non-tail calls transform to tail calls by > saving their continuation's state on a stack, which is the same as > stack allocation for these continuations. The continuation of the > call pops the saved state from the stack. Because the stack is > explicit, we can manipulate it as data: slice it to capture a > delimited continuation, drop values to make a non-local exit, splat > a saved slice back on to compose a captured delimited continuation > with the current continuation, and so on. Therefore we have the > necessary primitives to implement delimited continuations as a > library. > > (5) Scheme needs a representation that can store any value. In JS this > is easy because JS is untyped. For WebAssembly, I think I would > lean on externref for this purpose, which effectively boxes all > values. There are no fixnums in the current WebAssembly spec, so > this is suboptimal, and we have to call out to JS to evaluate type > predicates and access fields. But, support for structured > GC-managed types is coming to WebAssembly, which will speed up > object access. > > (6) The easy solution here is to make JS numbers, which are doubles at > heart, represent flonums, and use JS bignums for Scheme integers. > Fine. > > (7) This principle has been taken into account in (1)-(6). > > Now, a note on the transformation described in (4), which I call > "tailification". > > The first step of tailification computes the set of "tails" in a > function. The function entry starts a tail, as does each return point > from non-tail calls. Join points between different tails also start > tails. > > In the residual program, there are four ways that a continuation exits: > > - Tail calls in the source program are tail calls in the residual > program; no change. > > - For non-tail calls in the source program, the caller saves the state > of the continuation (the live variables flowing into the > continuation) on an explicit stack, and saves the label of the > continuation. The return continuation will be converted into a > arity-checking function entry, to handle multi-value returns; when > it is invoked, it will pop its incoming live variables from the > continuation stack. > > - Terms that continue to a join continuation are converted to label > calls in tail position, passing the state of the continuation as > arguments. > > - Returning values from a continuation pops the return label from the > stack and does an indirect tail label call on that label, with the > given return values. > > I expect that a tailified program will probably be slower than a > non-tailified program. However a tailified program has a few > interesting properties: the stack is packed and only contains live data; > the stack can be traversed in a portable way, allowing for > implementation of prompts on systems that don't support them natively; > and as all calls are tail calls, the whole system can be implemented > naturally with a driver trampoline on targets that don't support tail > calls (e.g. JavaScript and WebAssembly). > > This algorithm is implemented in the wip-tailify branch in Guile's git. > > I have this little driver program: > > (use-modules (system base compile) > (system base language) > (language cps tailify) > (language cps verify) > (language cps renumber) > (language cps dce) > (language cps simplify) > (language cps dump)) > > (define (tailify* form) > (let ((from 'scheme) > (make-lower (language-lowerer (lookup-language 'cps))) > (optimization-level 2) > (warning-level 2) > (opts '())) > (define cps > ((make-lower optimization-level opts) > (compile form #:from 'scheme #:to 'cps #:opts opts > #:optimization-level optimization-level > #:warning-level warning-level > #:env (current-module)) > (current-module))) > (format #t "Original:\n") > (dump cps) > (format #t "\n\nTailified:\n") > (let* ((cps (tailify cps)) > (cps (verify cps)) > (cps (eliminate-dead-code cps)) > (cps (simplify cps)) > (cps (renumber cps))) > (dump cps)))) > > If I run this on the following lambda: > > (lambda (x) > (pk x > (if x > (pk 12) > (pk 42)))) > > Then first it prints the dump of the optimized CPS: > > Original: > L0: > v0 := self > L1(...) > L1: > receive() > v1 := current-module() ; module > cache-set![0](v1) > v2 := const-fun L7 ; _ > return v2 > > L7: > v3 := self > L8(...) > L8: > v4 := receive(x) ; x > v5 := cache-ref[(0 . pk)]() ; cached > heap-object?(v5) ? L16() : L11() > L16(): > L17(v5) > L11(): > v6 := cache-ref[0]() ; mod > v7 := const pk ; name > v8 := lookup-bound(v6, v7) ; var > cache-set![(0 . pk)](v8) > L17(v8) > L17(v9): ; box > v10 := scm-ref/immediate[(box . 1)](v9) ; arg > false?(v4) ? L22() : L19() > L22(): > v12 := const 42 ; arg > call v10(v12) > L25(receive(arg, rest...)) > L19(): > v11 := const 12 ; arg > call v10(v11) > L25(receive(arg, rest...)) > L25(v13, v14): ; tmp, tmp > tail call v10(v4, v13) > > Then it prints the tailified version: > > L0: > v0 := self > L1(...) > L1: > receive() > v1 := current-module() ; module > cache-set![0](v1) > v2 := const-fun L8 ; _ > v3 := restore1[ptr]() ; ret > tail calli v3(v2) > > L8: > v4 := self > L9(...) > L9: > v5 := receive(x) ; x > v6 := cache-ref[(0 . pk)]() ; cached > heap-object?(v6) ? L17() : L12() > L17(): > L18(v6) > L12(): > v7 := cache-ref[0]() ; mod > v8 := const pk ; name > v9 := lookup-bound(v7, v8) ; var > cache-set![(0 . pk)](v9) > L18(v9) > L18(v10): ; box > v11 := scm-ref/immediate[(box . 1)](v10) ; arg > false?(v5) ? L24() : L20() > L24(): > v14 := const 42 ; arg > v15 := code L37 ; cont > save[(scm scm ptr)](v5, v11, v15) > tail call v11(v14) > L20(): > v12 := const 12 ; arg > v13 := code L29 ; cont > save[(scm scm ptr)](v5, v11, v13) > tail call v11(v12) > > L29: > L30(...) > L30: > v16, v17 := receive(arg, rest...) ; tmp, tmp > v18, v19 := restore[(scm scm)]() ; restored, restored > tail callk L34(_, v18, v19, v16) > > L34: > meta: arg-representations: (scm scm scm) > L35(...) > L35(v20, v21, v22): ; _, _, _ > tail call v21(v20, v22) > > L37: > L38(...) > L38: > v23, v24 := receive(arg, rest...) ; tmp, tmp > v25, v26 := restore[(scm scm)]() ; restored, restored > tail callk L34(_, v25, v26, v23) > > I think I would like to implement the "save" and "restore" primcalls on > stock Guile, to be able to adequately test tailification. But then I'd > look at targetting WebAssembly (just to make it interesting; run-time > performance would be faster no doubt if I targetted JS, due to GC > concerns). > > Anyway, thoughts welcome. Happy hacking :) > > Andy > >