The case with A simple loop of 20M operations are now down to 0.3 s that's
almost 20X improvements over
the best delimited continuation example (6s). Cpython takes 0.5s!

On Fri, Feb 11, 2022 at 1:10 PM Stefan Israelsson Tampe <
stefan.itampe@gmail.com> wrote:

> Hmm, I can improve the delimited continuation speed slightly by doing the
> below code
>
>
> (define prompt (list 1))
> (define (f2)
>   (let lp ((i 0))
>     (when (< i 20000000)
>       (begin
>         (abort-to-prompt prompt)
>         (lp (+ i 1)))))
>   #f)
>
>  ; 5.906402s real time, 12.297234s run time.  8.894807s spent in GC.
>
> So we are actually around 12X faster.
>
> (define (test2)
>   (let lp ((k f2))
>     (let ((k (call-with-prompt prompt k (lambda (k) k))))
>       (if k (lp k) #f))))
>
>
> On Fri, Feb 11, 2022 at 1:06 PM Stefan Israelsson Tampe <
> stefan.itampe@gmail.com> wrote:
>
>> I managed to make jitted code work for an example, speeds up the code up
>> 2x. So in 1s ther is 40M ops per s
>> overhead in the generator construct, that's essentially 4x slower the
>> fastest it can do in a very simple loop. And matches
>> pythons generators and are 15x faster than the example code I have above.
>>
>> On Thu, Feb 10, 2022 at 4:19 PM Stefan Israelsson Tampe <
>> stefan.itampe@gmail.com> wrote:
>>
>>> I did some benchmark, consider this code below. Let's turn off the jit.
>>> Then
>>> a 20M loop using normal delimited continuations yields,
>>>
>>> ;; 7.866898s real time, 14.809225s run time.  9.652291s spent in GC
>>>
>>> With a pausing continuation or generator we end up with,
>>> ;; 0.965947s real time, 0.965588s run time.  0.000000s spent in GC.
>>>
>>> python 3's similar generator example is executing at 0.5s for the same
>>> looop.
>>> so using delimited continuations to model pythons generators we have an
>>> overhead of around 15X.
>>>
>>> With jit,
>>> ;; 6.678504s real time, 13.589789s run time.  9.560317s spent in GC.
>>>
>>> So we can't really get any speedup help from guile's jit here. The
>>> paused jit version is not available as I have not figured out how to do
>>> this jet.
>>>
>>> (define prompt (list 1))
>>> (define (f)
>>>   (let lp ((i 0))
>>>     (when (< i 20000000)
>>>       (begin
>>>         (abort-to-prompt prompt)
>>>         (lp (+ i 1))))))
>>>
>>> (define (test2)
>>>   (let lp ((k f))
>>>     (call-with-prompt prompt k lp)))
>>>
>>>
>>>
>>> On Thu, Feb 10, 2022 at 2:07 PM Stefan Israelsson Tampe <
>>> stefan.itampe@gmail.com> wrote:
>>>
>>>> Consider a memory barrier idiom constructed from
>>>> 0, (mk-stack)
>>>> 1. (enter x)
>>>> 2. (pause x)
>>>> 3. (leave x)
>>>>
>>>> The idea is that we create a separate stack object and when entering
>>>> it, we will swap the current stack with the one in the argument saving the
>>>> current stack in x  and be in the 'child' state and move to a paused
>>>> position in case of a pause, when pausing stack x, we will return to where
>>>> after where entered saving the current position in stack and ip, and be in
>>>> state 'pause' and when we leave we will be in the state 'leave and move
>>>> to the old stack, using the current
>>>> ip. At first encounter the function stack frame is copied over hence
>>>> there will be a fork limited to the function only.
>>>>
>>>> This means that we essentially can define a generator as
>>>> (define (g x)
>>>>   (let lp ((n 0))
>>>>     (if (< n 10)
>>>>         (begin
>>>>            (pause x)
>>>>            (lp (+ n 1))))))
>>>>
>>>> And use it as
>>>> (define (test)
>>>>     (let ((x (mk-stack)))
>>>>         (let lp ()
>>>>            (case (enter x)
>>>>                ((pause)
>>>>                    (pk 'pause)
>>>>                    (lp))
>>>>                 ((child)
>>>>                  (g x)
>>>>                  (leave x))))))))
>>>>
>>>> A paused or leaved stack cannot be paused, an entered stack cannot be
>>>> entered and one cannot leave a paused stack, but enter a leaved stack.
>>>>
>>>> Anyhow this idea is modeled like a fork command instead of functional
>>>> and have the benefit over delimited continuations that one does not need to
>>>> copy the whole stack and potentially speed up generator like constructs.
>>>> But not only this, writing efficient prolog code is possible as well. We
>>>> could simplify a lot of the generation of prolog code, speed it up and also
>>>> improve compiler speed of prolog code significantly.
>>>>
>>>> How would we approach the  prolog code. The simplest system is to use
>>>> return the
>>>> alternate pause stack when succeeding things becomes very simple,
>>>>
>>>> x   = stack to pause to in case of failure
>>>> cc = the continuation
>>>>
>>>> (<and> (x cc)  goal1 goal2)
>>>>      :: (cc (goal1 (goal2 x))
>>>>
>>>> (<or >   (x cc)  goal1 goal2)
>>>>     ::  (let ((xx (mkstack)))
>>>>              (case (enter xx)
>>>>                  ((child)
>>>>                   (cc (goal2 xx)))
>>>>
>>>>                 ((pause)
>>>>                  (cc (goal2 x)))))
>>>>
>>>> Very elegant, and we also can use some heuristics to store already made
>>>> stacks when
>>>> leaving a stack and reuse at the next enter which is a common theme in
>>>> prolog,
>>>>
>>>> Anyhow we have an issue, consider the case where everythings
>>>> succeds forever. Then we will blow the stack . There is no concept of tail
>>>> calls here. So what you can do is the following for an <and>,
>>>>
>>>> (let ((xx (mk-stack)))
>>>>     (case (enter xx)
>>>>       ((child)
>>>>        (goal1 x (lambda (xxx) (pause xx xxx)))
>>>>
>>>>       ((pause xxx)
>>>>          (goal2 xxx cc))))
>>>>
>>>> This enable cuts so that a cutted and (and!) in kanren lingo will use
>>>> (goal2 x cc)
>>>>
>>>> And we have tail calls!
>>>>
>>>>
>>>> I have a non jitted version guile working as a proof of concept.
>>>>
>>>> The drawback with this is if a function uses a lot of stack, it will be
>>>> a memory hog.
>>>>
>>>> WDYT?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> .
>>>>
>>>>
>>>>
>>>