I'm looking forward to it, and I'm pretty sure this can bring new
possibilities to Artanis!

Best regards.

On Sun, Jun 20, 2021, 04:21 Andy Wingo <wingo@pobox.com> wrote:

> Hi :)
>
> A brain-dump tonight.  I was thinking about what it would mean for Guile
> to target the web.  The goal would be for Guile to be a useful language
> for programming client-side interaction on web pages.  Now, there are a
> number of takes for Schemes of various stripes to target JS.  I haven't
> made a full survey but my starting point is that in a Guile context, we
> basically want the Guile Scheme language, which is to say:
>
>  1. Tail calls.  The solution must include proper tail calls.
>
>  2. Multiple values.  The solution should have the same semantics that
>     Guile has for multiple-value returns, including implicit
>     truncation.
>
>  3. Varargs.  The solution should allow for rest arguments, keyword
>     arguments, and the like.
>
>  4. Delimited continuations.  The solution should allow non-local exit
>     from any function, and should allow for capturing stack slices to
>     partial continuations, and should allow those continuations to be
>     instantiated multiple times if desired.  We should be able to target
>     fibers to the web.  Otherwise, why bother?
>
>  5. Garbage collection.  *We should re-use the host GC*.  Although it
>     would be possible to manage a heap in linear memory, that has
>     retention problems due to cycles between the Guile heap and the JS
>     heap.
>
>  6. Numbers.  We should support Scheme's unbounded integers.  Requires
>     bignums in the limit case, but thankfully JS has bignums now.  We
>     will still be able to unbox in many cases though.
>
>  7. Target existing web.  We shouldn't depend on some future WebAssembly
>     or JS proposal -- the solution should work in the here and now and
>     only get better as features are added to the web platform.
>
> From a UX perspective, I would expect we would generally want
> whole-program compilation with aggressive DCE / tree-shaking, producing
> a single JS or WebAssembly artifact at build-time.  But that's a later
> issue.
>
> I have thought about this off and on over the years but in the end was
> flummoxed about how to meet all requirements.  However recently I think
> I have come up with a solution for most of these:
>
>  (1) In JS, tail calls are part of ES2015, but not implemented in all
>      browsers.  In WebAssembly, they are a future plan, but hard for
>      various reasons.  So in practice the solution for today's web is to
>      use a universal trampoline and make all calls tail calls --
>      i.e. call all functions via:
>
>        while true:
>          call(pop())
>
>      Whenever we can target proper tail calls, this will only get
>      faster.
>
>  (2) Neither JS nor WebAssembly have the truncation semantics we want.
>      Therefore we will return values via an explicit array, and then the
>      continuation will be responsible for consuming those values and
>      binding any needed variables.
>
>  (3) Although JS does have varargs support, WebAssembly does not.  But
>      we can actually use the same solution here as we use for (1) and
>      (2) -- pass arguments on an explicit array + nvalues, and relying
>      on the function to parse them appropriately.  In this way we can
>      get Guile keyword arguments.  This also makes the type of Scheme
>      functions uniform, which is important in WebAssembly contexts.
>
>  (4) This is the interesting bit!  As hinted in (1), we will transform
>      the program such that all calls are tail calls.  This is a form of
>      minimal CPS conversion -- minimal in the sense that functions are
>      broken up into the minimum number of pieces to ensure the
>      all-tail-calls property.  Non-tail calls transform to tail calls by
>      saving their continuation's state on a stack, which is the same as
>      stack allocation for these continuations.  The continuation of the
>      call pops the saved state from the stack.  Because the stack is
>      explicit, we can manipulate it as data: slice it to capture a
>      delimited continuation, drop values to make a non-local exit, splat
>      a saved slice back on to compose a captured delimited continuation
>      with the current continuation, and so on.  Therefore we have the
>      necessary primitives to implement delimited continuations as a
>      library.
>
>  (5) Scheme needs a representation that can store any value.  In JS this
>      is easy because JS is untyped.  For WebAssembly, I think I would
>      lean on externref for this purpose, which effectively boxes all
>      values.  There are no fixnums in the current WebAssembly spec, so
>      this is suboptimal, and we have to call out to JS to evaluate type
>      predicates and access fields.  But, support for structured
>      GC-managed types is coming to WebAssembly, which will speed up
>      object access.
>
>  (6) The easy solution here is to make JS numbers, which are doubles at
>      heart, represent flonums, and use JS bignums for Scheme integers.
>      Fine.
>
>  (7) This principle has been taken into account in (1)-(6).
>
> Now, a note on the transformation described in (4), which I call
> "tailification".
>
> The first step of tailification computes the set of "tails" in a
> function.  The function entry starts a tail, as does each return point
> from non-tail calls.  Join points between different tails also start
> tails.
>
> In the residual program, there are four ways that a continuation exits:
>
>   - Tail calls in the source program are tail calls in the residual
>     program; no change.
>
>   - For non-tail calls in the source program, the caller saves the state
>     of the continuation (the live variables flowing into the
>     continuation) on an explicit stack, and saves the label of the
>     continuation.  The return continuation will be converted into a
>     arity-checking function entry, to handle multi-value returns; when
>     it is invoked, it will pop its incoming live variables from the
>     continuation stack.
>
>   - Terms that continue to a join continuation are converted to label
>     calls in tail position, passing the state of the continuation as
>     arguments.
>
>   - Returning values from a continuation pops the return label from the
>     stack and does an indirect tail label call on that label, with the
>     given return values.
>
> I expect that a tailified program will probably be slower than a
> non-tailified program.  However a tailified program has a few
> interesting properties: the stack is packed and only contains live data;
> the stack can be traversed in a portable way, allowing for
> implementation of prompts on systems that don't support them natively;
> and as all calls are tail calls, the whole system can be implemented
> naturally with a driver trampoline on targets that don't support tail
> calls (e.g. JavaScript and WebAssembly).
>
> This algorithm is implemented in the wip-tailify branch in Guile's git.
>
> I have this little driver program:
>
>     (use-modules (system base compile)
>                  (system base language)
>                  (language cps tailify)
>                  (language cps verify)
>                  (language cps renumber)
>                  (language cps dce)
>                  (language cps simplify)
>                  (language cps dump))
>
>     (define (tailify* form)
>       (let ((from 'scheme)
>             (make-lower (language-lowerer (lookup-language 'cps)))
>             (optimization-level 2)
>             (warning-level 2)
>             (opts '()))
>         (define cps
>           ((make-lower optimization-level opts)
>            (compile form #:from 'scheme #:to 'cps #:opts opts
>                     #:optimization-level optimization-level
>                     #:warning-level warning-level
>                     #:env (current-module))
>            (current-module)))
>         (format #t "Original:\n")
>         (dump cps)
>         (format #t "\n\nTailified:\n")
>         (let* ((cps (tailify cps))
>                (cps (verify cps))
>                (cps (eliminate-dead-code cps))
>                (cps (simplify cps))
>                (cps (renumber cps)))
>           (dump cps))))
>
> If I run this on the following lambda:
>
>   (lambda (x)
>     (pk x
>         (if x
>             (pk 12)
>             (pk 42))))
>
> Then first it prints the dump of the optimized CPS:
>
>   Original:
>   L0:
>     v0 := self
>     L1(...)
>   L1:
>     receive()
>     v1 := current-module()                      ; module
>     cache-set![0](v1)
>     v2 := const-fun L7                          ; _
>     return v2
>
>   L7:
>     v3 := self
>     L8(...)
>   L8:
>     v4 := receive(x)                            ; x
>     v5 := cache-ref[(0 . pk)]()                 ; cached
>     heap-object?(v5) ? L16() : L11()
>   L16():
>     L17(v5)
>   L11():
>     v6 := cache-ref[0]()                        ; mod
>     v7 := const pk                              ; name
>     v8 := lookup-bound(v6, v7)                  ; var
>     cache-set![(0 . pk)](v8)
>     L17(v8)
>   L17(v9):                                      ; box
>     v10 := scm-ref/immediate[(box . 1)](v9)     ; arg
>     false?(v4) ? L22() : L19()
>   L22():
>     v12 := const 42                             ; arg
>     call v10(v12)
>     L25(receive(arg, rest...))
>   L19():
>     v11 := const 12                             ; arg
>     call v10(v11)
>     L25(receive(arg, rest...))
>   L25(v13, v14):                                ; tmp, tmp
>     tail call v10(v4, v13)
>
> Then it prints the tailified version:
>
>   L0:
>     v0 := self
>     L1(...)
>   L1:
>     receive()
>     v1 := current-module()                      ; module
>     cache-set![0](v1)
>     v2 := const-fun L8                          ; _
>     v3 := restore1[ptr]()                       ; ret
>     tail calli v3(v2)
>
>   L8:
>     v4 := self
>     L9(...)
>   L9:
>     v5 := receive(x)                            ; x
>     v6 := cache-ref[(0 . pk)]()                 ; cached
>     heap-object?(v6) ? L17() : L12()
>   L17():
>     L18(v6)
>   L12():
>     v7 := cache-ref[0]()                        ; mod
>     v8 := const pk                              ; name
>     v9 := lookup-bound(v7, v8)                  ; var
>     cache-set![(0 . pk)](v9)
>     L18(v9)
>   L18(v10):                                     ; box
>     v11 := scm-ref/immediate[(box . 1)](v10)    ; arg
>     false?(v5) ? L24() : L20()
>   L24():
>     v14 := const 42                             ; arg
>     v15 := code L37                             ; cont
>     save[(scm scm ptr)](v5, v11, v15)
>     tail call v11(v14)
>   L20():
>     v12 := const 12                             ; arg
>     v13 := code L29                             ; cont
>     save[(scm scm ptr)](v5, v11, v13)
>     tail call v11(v12)
>
>   L29:
>     L30(...)
>   L30:
>     v16, v17 := receive(arg, rest...)           ; tmp, tmp
>     v18, v19 := restore[(scm scm)]()            ; restored, restored
>     tail callk L34(_, v18, v19, v16)
>
>   L34:
>     meta: arg-representations: (scm scm scm)
>     L35(...)
>   L35(v20, v21, v22):                           ; _, _, _
>     tail call v21(v20, v22)
>
>   L37:
>     L38(...)
>   L38:
>     v23, v24 := receive(arg, rest...)           ; tmp, tmp
>     v25, v26 := restore[(scm scm)]()            ; restored, restored
>     tail callk L34(_, v25, v26, v23)
>
> I think I would like to implement the "save" and "restore" primcalls on
> stock Guile, to be able to adequately test tailification.  But then I'd
> look at targetting WebAssembly (just to make it interesting; run-time
> performance would be faster no doubt if I targetted JS, due to GC
> concerns).
>
> Anyway, thoughts welcome.  Happy hacking :)
>
> Andy
>
>