On Mon, Apr 17, 2023, 11:48 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> I wrote that incorrectly - I meant that primitive operations would add
> a continuation to the future and return a future for their result.
> Basically, a computation would never block, it would just build
> continuation trees (in the form of futures) and return to the
> top-level.  Although that assumes the system would be able to allocate
> those futures without blocking for GC work.

I think this would end up being extremely inefficient, since for every
tiny operation you'd end up creating a future, linked to the
previous computation (basically a sort of fine-grained dataflow graph).
I'm sure in theory it can be made tolerably efficient, but it'd need
a very different implementation strategy than what we have now.
Furthermore I expect it would lead to surprising semantics when used
with side-effecting operations.

Figuring those surprises out might be useful in determining what should be saved.

IOW you can probably create a usable system that uses this approach, but
with a different language and a different implementation :-)

I'm not advocating for it.  It would be like RABBIT with the continuation construction inverted, but less sophisticated: dynamically constructing the closures of Plotkin's CPS transform without the optimizations of RABBIT or of subsequent compilers.  And that proviso about the GC not blocking seems like a significant challenge.

Instead, I view it as an exercise in trying to understand what a syntax for working with futures-as-a-semantics is supposed to represent - how to condense the shape of the computation into the shape of the expression.
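To make the exercise concrete, here is a minimal sketch of the implicit style, where every primitive chains continuations instead of computing.  All the names here are invented for illustration; none of this is part of futur.el:

```elisp
;; Hypothetical sketch: every operation returns a future and chains a
;; continuation onto its arguments, so evaluation builds a dataflow
;; graph and returns to the top-level instead of computing eagerly.
(require 'cl-lib)

(cl-defstruct ifuture value done continuations)

(defun ifuture-resolve (f val)
  "Deliver VAL to future F and run any queued continuations."
  (setf (ifuture-value f) val
        (ifuture-done f) t)
  (dolist (k (ifuture-continuations f))
    (funcall k val)))

(defun ifuture-then (f k)
  "Run continuation K with F's value once it is available."
  (if (ifuture-done f)
      (funcall k (ifuture-value f))
    (push k (ifuture-continuations f))))

;; Even (+ a b) becomes graph construction: a fresh future plus two
;; continuation closures, before any arithmetic happens.
(defun ifuture-+ (fa fb)
  (let ((out (make-ifuture)))
    (ifuture-then fa
                  (lambda (a)
                    (ifuture-then fb
                                  (lambda (b)
                                    (ifuture-resolve out (+ a b))))))
    out))
```

The allocation cost is plain in this shape: one struct and two closures per addition, which is the inefficiency objection above in miniature.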
 
`futur.el` instead forces the programmer to be somewhat explicit about
the concurrency points, so they have *some* control over efficiency and
interaction with side-effects.

> At some point in this thread you stated you weren't sure what the
> right semantics are in terms of the information to save, etc.  I posed
> this implicit semantics as a way to think about what "the right thing"
> would be.  Would all operations preserve the same (lisp) machine
> state, or would it differ depending on the nature of the operator?  [
> is the kind of question it might be useful to work out in this thought
> experiment ]

I can't imagine it working sanely if the kind of state that's saved
depends on the operation: the saved-state is basically private to the
continuation, so it might make sense to do it differently depending on
the continuation (tho even that would introduce a lot of complexity),
but not depending on the operation.

Am I right in thinking that the only real question is around buffer state and buffer-local variables?  Global variables are something users can easily define locks for, and dynamically bound variables are already thread-local.  Buffer state, though, is particularly subject to I/O events, and buffer-local variables don't really have strong thread semantics: the same variable might be global in one context and buffer-local in another, so the programmer would have to figure out on the fly whether to take a buffer-local lock or a global lock.  Or maybe it would be fair to say that we expect most interesting asynchronous code to involve work on buffers, so that use-case is worth special consideration?
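A concrete illustration of the ambiguity, in plain Emacs Lisp with no threading involved:

```elisp
;; The same symbol can name a global in one buffer and a buffer-local
;; in another, so code cannot know which "lock" would apply without
;; checking at runtime.
(defvar my-count 0)

(with-current-buffer (get-buffer-create "a.txt")
  (setq-local my-count 10))          ; now buffer-local in a.txt

(with-current-buffer (get-buffer-create "b.txt")
  (local-variable-p 'my-count))      ; nil: reads here hit the global

(with-current-buffer "a.txt"
  (local-variable-p 'my-count))      ; t: reads here hit the local copy
```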


The coders will need to know what is saved and what isn't, so the
more complex this rule is, the harder it is to learn to use this
tool correctly.

I feel like I have read you refer to using purely functional data structures for concurrency in Emacs (or elsewhere), but I don't have a concrete reference.  So I don't think my suggestion - that buffers might be extended or replaced with a functional data structure plus merging of asynchronous changes, per https://lists.gnu.org/archive/html/emacs-devel/2023-04/msg00587.html - is novel to you.  For all I know, it reflects something you wrote in the past, as munged through my memory.

In any case, synchronizing through immutable data structures and merging is probably an easier and more performant path to concurrent buffers than going through the existing code trying to impose fine-grained synchronization.  I don't know whether some kind of buffer-rope based on the existing buffer code is feasible, or whether a wholesale reimplementation would be needed, but that kind of buffer would be good not only for asynchronous/concurrent editing with non-preemptive threading, but also for multiple in-process lisp machines with shared memory, or even multi-process "collaborative" editing.

If an Emacs developer just wanted to get something going to see how it might work, though, maybe there's a kluge involving overlays with embedded temporary buffers: the main buffer would be made read-only while it was being accessed asynchronously, and "copy-on-write" would kick in when an error is thrown by an asynchronous function attempting to write to the buffer.  The async function would then merge its changes on yield or return or something - this is one place you could provide explicit control over synchronization.
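A rough sketch of that kluge, with invented names and the merge step hand-waved:

```elisp
(defun my-with-cow-buffer (buf fn)
  "Run FN against BUF, copying on write while BUF is locked.
BUF is made read-only for the duration; if FN tries to modify it,
the `buffer-read-only' error triggers the copy-on-write path, and
the changes land in a shadow buffer to be merged back later."
  (with-current-buffer buf
    (let ((buffer-read-only t))      ; lock for asynchronous access
      (condition-case nil
          (progn (funcall fn) buf)   ; read-only work succeeds in place
        (buffer-read-only
         (let ((shadow (generate-new-buffer " *cow*")))
           (with-current-buffer shadow
             (insert-buffer-substring buf)
             (funcall fn))           ; redo the work in the private copy
           ;; Merging SHADOW back into BUF on yield/return would be the
           ;; explicit synchronization point.
           shadow))))))
```

Re-running FN from scratch in the shadow buffer is obviously wasteful; it's only meant to show where the read-only lock and the copy-on-write error would slot in.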
 
> The way you've defined future-let, the variable being bound is a
> future because you are constructing it as one, but it is still a
> normal variable.
>
> What if, instead, we define a "futur-abstraction" (lambda/futur (v)
> body ...) in which v is treated as a future by default, and a
> future-conditional form (if-available v ready-expr not-ready-expr)
> with the obvious meaning.  If v appears as the argument to a
> lambda/future function object it will be passed as is.  Otherwise, the
> reference to v would be rewritten as (futur-wait v).  Some syntactic
> sugar (futur-escape v) => (if-available v v) could be used to pass the
> future to arbitrary functions.

Seems complex, and I'm not sure it would buy you anything in practice.

It might not buy much - I'm thinking of it as one way to find a middle ground between full-blown CPS conversion and "stackless" coroutines.  Plus, a future is essentially an operator on continuations, so it's not too wacky to define a class of operators on futures.
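A naive sketch of the rewriting, with invented helpers - a real version would need a proper code walker that respects quoting, macros, and shadowing:

```elisp
(defun futur--rewrite (form v)
  "Rewrite references to V in FORM as (futur-wait V).
Leaves `if-available' forms alone so they can see the raw future."
  (cond ((eq form v) `(futur-wait ,v))
        ((not (consp form)) form)
        ((eq (car form) 'if-available) form)
        (t (mapcar (lambda (f) (futur--rewrite f v)) form))))

(defmacro lambda/futur (args &rest body)
  "Like `lambda', but body references to the first argument wait on it."
  `(lambda ,args
     ,@(mapcar (lambda (form) (futur--rewrite form (car args))) body)))

;; (lambda/futur (v) (1+ v)) rewrites its body to (1+ (futur-wait v)),
;; while (lambda/futur (v) (if-available v v nil)) leaves v untouched.
```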

Given the correspondence of futures to variables bound by continuations, maybe Felleisen's representation of "holes" in evaluation contexts would be visually helpful.  Square brackets aren't convenient, but angle or curly brackets could be used to connote when a variable is being referenced as a future rather than as the value returned by the future.  I would consider that a concise form of explicitness.

So maybe "future-bind" could be replaced by "{}<-", and likewise the "<-" in "future-let*".  Somehow I suspect that using ({x} <- ...) to bind "x" would be over the lispy line, but it could be interesting.
Either way, in the body, "{x}" would refer to the future and "x" to the value yielded by the future - or maybe it should be the other way around.  Either way, the point of the syntax is to visually represent the flow of the future to multiple (syntactic) contexts.  I'm not sure how else that inversion of control can be concisely represented when it *doesn't* happen in a linear fashion.
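Purely as an illustration of the intent - this is not valid syntax today and would need reader support, and `fetch-url`/`attach-progress-bar` are made up:

```elisp
;; Hypothetical surface syntax: {x} denotes the future itself,
;; bare x the value it yields (i.e. an implicit (futur-wait {x})).
(futur-let* (({x} <- (fetch-url url)))
  (attach-progress-bar {x})   ; passes the future object along
  (insert x))                 ; waits for the value before inserting
```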
 
> It would be easier if elisp threads were orthogonal to system threads,
> so that any elisp thread could be run on any available system thread.

Currently, only one thread can run ELisp at a time.  Whether that's
implemented using several system threads or not is largely an
internal detail.

> Multiprocessing could be done by creating multiple lisp VMs in a
> process (i.e. lisp VM orthogonal to a physical core),

Yes, "could".

I would go so far as to say it's the safest approach, especially if preserving the semantics of existing programs is a goal.  I don't think trying to make the lisp multiprocessing semantics slavishly replicate the host's multiprocessing semantics is a good idea at all.  At least with portable dumps/loads there is a path to creating independent VMs (where the symbol table is local to the VM, and probably "headless" to start) in the same process space.

It's not too hard to come up with a design that makes sense; indeed, the
problem is to actually do the work of bringing the current code to
that design.

True enough.  There are shorter paths than attempts (whether naive or sophisticated) to impose fine-grained locking on the Emacs run-time in order to replicate the native facilities for parallelism.  I am under the impression that the latter is the prevailing view of what it would mean to bring efficient parallelism to Emacs, which is unfortunate if that impression is correct.

Lynn