From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.devel Subject: Re: Pausable continuations Date: Sun, 13 Feb 2022 10:34:43 +0100 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="0000000000002fc41805d7e3038c" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="25872"; mail-complaints-to="usenet@ciao.gmane.io" To: guile-devel Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Sun Feb 13 10:35:35 2022 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nJBI6-0006YW-GI for guile-devel@m.gmane-mx.org; Sun, 13 Feb 2022 10:35:34 +0100 Original-Received: from localhost ([::1]:39860 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nJBI4-0006Vc-Le for guile-devel@m.gmane-mx.org; Sun, 13 Feb 2022 04:35:32 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:46250) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nJBHX-0006Uy-3J for guile-devel@gnu.org; Sun, 13 Feb 2022 04:34:59 -0500 Original-Received: from [2607:f8b0:4864:20::432] (port=43940 helo=mail-pf1-x432.google.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nJBHU-0008Nx-Ic for guile-devel@gnu.org; Sun, 13 Feb 2022 04:34:58 -0500 Original-Received: by mail-pf1-x432.google.com with SMTP id d187so24072965pfa.10 for ; Sun, 13 Feb 2022 01:34:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=AdLb4Na5366oe5NhbK8QPPK4Vrhrazl7nlGmAoJBDH4=; b=HQ+g+2HnFXU61VX0XGmgHVq6MePigw+/sp3CNfaXwaqj0R1Huh1zHqv8RDIO7BLm1a /4I97GEubw+SHL+8X+8yChxPieG68sh9nGQeiHJczxnJmDBzr9TdFBE4A7qWPCnAcJLz Z0+Ts46ylAYKxYnUNO17KLWEHDC9rGLl0BuWFKvBW1hna9/zhJ8EICw2p/2+Pj86lW3T MBWXaYkmhBdx646txt/zbWlbmbB6aVSDXJ17aQYCcp1Aq94vSkgmvXRPOjHxRBOivyF4 zYYFZaF73YRLqoTytGjr+xzuwg1IapDKlTfm/8sV3gfzar9Az1+BqicEiDQuciTKJdJG fW5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=AdLb4Na5366oe5NhbK8QPPK4Vrhrazl7nlGmAoJBDH4=; b=pLHRVU2Aq3zaxaZoLWl+GYXdYTxdyQOg8nGjncsFG1m3GzIrUilzWIZ5KNLdA0PyNP Lga9mQ2MX9JHQuUau5CcUv8XUmX1NKl41FUqF5rQTn2oNk0kRJP9Oiz57dvwkzaQ8+D5 cgdDwzV0JQUcIADw/rmrl9fu1h4trhaUZGiiQN5a24Y85sHFmd0HsFw1atdmmEo92xkb 6ceQWMglFsfvr0eJGNr7U8ynL64qatC1U4MJQg5fsVq2z1nRFxGveoJ0RFGK2X5feKyj aY6hdqeQOA1rD4W5UASgpRkrB4w1HZ561SNgVDgKAEGW5sk5fmrkTxsk+2aXqc2dC4Yk hL0w== X-Gm-Message-State: AOAM5308Kdi1xel43nNwhEpF8Fw42DwAijgioQg2g78JLinSvOk3DdOm pGtdpBsZCbqeCPhx/SMH/ilYBruRU2oUPIQIspgt3BVl X-Google-Smtp-Source: ABdhPJyITpe2Mmq3eu3PgMMeBsnn+eIW/qvvnGjgPaq7YGwn4Xqmj+xqTEY90s69y89WTp9IcfyH4J7EXnI7v0lWaqY= X-Received: by 2002:a63:4c41:: with SMTP id m1mr7474901pgl.52.1644744894562; Sun, 13 Feb 2022 01:34:54 -0800 (PST) In-Reply-To: X-Host-Lookup-Failed: Reverse DNS lookup failed for 2607:f8b0:4864:20::432 (failed) Received-SPF: pass client-ip=2607:f8b0:4864:20::432; envelope-from=stefan.itampe@gmail.com; helo=mail-pf1-x432.google.com X-Spam_score_int: -12 X-Spam_score: -1.3 X-Spam_bar: - X-Spam_report: (-1.3 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, PDS_HP_HELO_NORDNS=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.io gmane.lisp.guile.devel:21108 Archived-At: --0000000000002fc41805d7e3038c Content-Type: text/plain; charset="UTF-8" The case with A simple loop of 20M operations are now down to 0.3 s that's almost 20X improvements over the best delimited continuation example (6s). Cpython takes 0.5s! On Fri, Feb 11, 2022 at 1:10 PM Stefan Israelsson Tampe < stefan.itampe@gmail.com> wrote: > Hmm, I can improve the delimited continuation speed slightly by doing the > below code > > > (define prompt (list 1)) > (define (f2) > (let lp ((i 0)) > (when (< i 20000000) > (begin > (abort-to-prompt prompt) > (lp (+ i 1))))) > #f) > > ; 5.906402s real time, 12.297234s run time. 8.894807s spent in GC. > > So we are actually around 12X faster. > > (define (test2) > (let lp ((k f2)) > (let ((k (call-with-prompt prompt k (lambda (k) k)))) > (if k (lp k) #f)))) > > > On Fri, Feb 11, 2022 at 1:06 PM Stefan Israelsson Tampe < > stefan.itampe@gmail.com> wrote: > >> I managed to make jitted code work for an example, speeds up the code up >> 2x. So in 1s ther is 40M ops per s >> overhead in the generator construct, that's essentially 4x slower the >> fastest it can do in a very simple loop. And matches >> pythons generators and are 15x faster than the example code I have above. >> >> On Thu, Feb 10, 2022 at 4:19 PM Stefan Israelsson Tampe < >> stefan.itampe@gmail.com> wrote: >> >>> I did some benchmark, consider this code below. Let's turn off the jit. >>> Then >>> a 20M loop using normal delimited continuations yields, >>> >>> ;; 7.866898s real time, 14.809225s run time. 9.652291s spent in GC >>> >>> With a pausing continuation or generator we end up with, >>> ;; 0.965947s real time, 0.965588s run time. 0.000000s spent in GC. >>> >>> python 3's similar generator example is executing at 0.5s for the same >>> looop. >>> so using delimited continuations to model pythons generators we have an >>> overhead of around 15X. >>> >>> With jit, >>> ;; 6.678504s real time, 13.589789s run time. 9.560317s spent in GC. >>> >>> So we can't really get any speedup help from guile's jit here. The >>> paused jit version is not available as I have not figured out how to do >>> this jet. >>> >>> (define prompt (list 1)) >>> (define (f) >>> (let lp ((i 0)) >>> (when (< i 20000000) >>> (begin >>> (abort-to-prompt prompt) >>> (lp (+ i 1)))))) >>> >>> (define (test2) >>> (let lp ((k f)) >>> (call-with-prompt prompt k lp))) >>> >>> >>> >>> On Thu, Feb 10, 2022 at 2:07 PM Stefan Israelsson Tampe < >>> stefan.itampe@gmail.com> wrote: >>> >>>> Consider a memory barrier idiom constructed from >>>> 0, (mk-stack) >>>> 1. (enter x) >>>> 2. (pause x) >>>> 3. (leave x) >>>> >>>> The idea is that we create a separate stack object and when entering >>>> it, we will swap the current stack with the one in the argument saving the >>>> current stack in x and be in the 'child' state and move to a paused >>>> position in case of a pause, when pausing stack x, we will return to where >>>> after where entered saving the current position in stack and ip, and be in >>>> state 'pause' and when we leave we will be in the state 'leave and move >>>> to the old stack, using the current >>>> ip. At first encounter the function stack frame is copied over hence >>>> there will be a fork limited to the function only. >>>> >>>> This means that we essentially can define a generator as >>>> (define (g x) >>>> (let lp ((n 0)) >>>> (if (< n 10) >>>> (begin >>>> (pause x) >>>> (lp (+ n 1)))))) >>>> >>>> And use it as >>>> (define (test) >>>> (let ((x (mk-stack))) >>>> (let lp () >>>> (case (enter x) >>>> ((pause) >>>> (pk 'pause) >>>> (lp)) >>>> ((child) >>>> (g x) >>>> (leave x)))))))) >>>> >>>> A paused or leaved stack cannot be paused, an entered stack cannot be >>>> entered and one cannot leave a paused stack, but enter a leaved stack. >>>> >>>> Anyhow this idea is modeled like a fork command instead of functional >>>> and have the benefit over delimited continuations that one does not need to >>>> copy the whole stack and potentially speed up generator like constructs. >>>> But not only this, writing efficient prolog code is possible as well. We >>>> could simplify a lot of the generation of prolog code, speed it up and also >>>> improve compiler speed of prolog code significantly. >>>> >>>> How would we approach the prolog code. The simplest system is to use >>>> return the >>>> alternate pause stack when succeeding things becomes very simple, >>>> >>>> x = stack to pause to in case of failure >>>> cc = the continuation >>>> >>>> ( (x cc) goal1 goal2) >>>> :: (cc (goal1 (goal2 x)) >>>> >>>> ( (x cc) goal1 goal2) >>>> :: (let ((xx (mkstack))) >>>> (case (enter xx) >>>> ((child) >>>> (cc (goal2 xx))) >>>> >>>> ((pause) >>>> (cc (goal2 x))))) >>>> >>>> Very elegant, and we also can use some heuristics to store already made >>>> stacks when >>>> leaving a stack and reuse at the next enter which is a common theme in >>>> prolog, >>>> >>>> Anyhow we have an issue, consider the case where everythings >>>> succeds forever. Then we will blow the stack . There is no concept of tail >>>> calls here. So what you can do is the following for an , >>>> >>>> (let ((xx (mk-stack))) >>>> (case (enter xx) >>>> ((child) >>>> (goal1 x (lambda (xxx) (pause xx xxx))) >>>> >>>> ((pause xxx) >>>> (goal2 xxx cc)))) >>>> >>>> This enable cuts so that a cutted and (and!) in kanren lingo will use >>>> (goal2 x cc) >>>> >>>> And we have tail calls! >>>> >>>> >>>> I have a non jitted version guile working as a proof of concept. >>>> >>>> The drawback with this is if a function uses a lot of stack, it will be >>>> a memory hog. >>>> >>>> WDYT? >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> . >>>> >>>> >>>> >>> --0000000000002fc41805d7e3038c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
The case with A simple loop of 20M operations are now down= to 0.3 s that's almost 20X improvements=C2=A0over
the best delimit= ed continuation example (6s). Cpython takes 0.5s!

On Fri, Feb 11, 2022= at 1:10 PM Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:
Hmm, I can improve the de= limited continuation speed slightly by doing the below code

<= div>
(define prompt (list 1))
(define (f2)
=C2=A0 (let lp ((i= 0))
=C2=A0 =C2=A0 (when (< i 20000000)
=C2=A0 =C2=A0 =C2=A0 (begi= n
=C2=A0 =C2=A0 =C2=A0 =C2=A0 (abort-to-prompt prompt)
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 (lp (+ i 1)))))
=C2=A0 #f)

=C2=A0; 5.906402s= real time, 12.297234s run time. =C2=A08.894807s spent in GC.=
So we are actually around 12X faster.

<= /span>(define (test2)
=C2=A0 (let lp ((k f2))
=C2=A0 =C2=A0 (let ((k = (call-with-prompt prompt k (lambda (k) k))))
=C2=A0 =C2=A0 =C2=A0 (if k = (lp k) #f))))


On Fri, Feb 11, 2022 at 1:06 PM= Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:
I managed to make ji= tted code work for an example, speeds up the code up 2x. So in 1s ther is 4= 0M ops per s
overhead in the generator construct, that's essentiall= y 4x slower the fastest it can do in a very simple loop. And matches=C2=A0<= /div>
pythons generators and are 15x faster than the example code I hav= e above.

On Thu, Feb 10, 2022 at 4:19 PM Stefan Israelsson Tampe <= ;stefan.itampe= @gmail.com> wrote:
I did some benchmark, consider this code below.= Let's turn off the jit. Then
a 20M loop using normal delimited=C2= =A0continuations yields,

;; 7.866898s real t= ime, 14.809225s run time. =C2=A09.652291s spent in GC

With a pausing continuation or generator we end= up with,
;; 0.965947s real time, 0.965588s run time. = =C2=A00.000000s spent in GC.

python 3's similar generator example is executing at 0.5s for th= e same looop.
so using delimited continuations to model pythons g= enerators we have an overhead of around 15X.

With = jit,
;; 6.678504s real time, 13.589789s run time. =C2=A09.560317s spe= nt in GC.

So we can't= really get any speedup help from guile's jit here. The paused jit vers= ion is not available as I have not figured out how to do this jet.

(define prompt (list 1))
(define (f)
=C2=A0 = (let lp ((i 0))
=C2=A0 =C2=A0 (when (< i 20000000)
=C2=A0 =C2=A0 = =C2=A0 (begin
=C2=A0 =C2=A0 =C2=A0 =C2=A0 (abort-to-prompt prompt)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 (lp (+ i 1))))))
=C2=A0
(define (test2)<= br>=C2=A0 (let lp ((k f))
=C2=A0 =C2=A0 (call-with-prompt prompt k lp)))=



On Thu, Feb 10, 2022 at 2:07 = PM Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:
Consider a memory = barrier=C2=A0idiom constructed from=C2=A0
0, (mk-stack)
1. (ent= er x)
2. (pause x)
3. (leave x)

The idea is that we create a separate stack object and when enterin= g it, we will swap the current stack with the one in the argument saving th= e current stack in x=C2=A0 and be in the 'child' state and move to = a paused position in case of a pause, when pausing stack x, we=C2=A0will re= turn to where after where entered saving the current position in stack and = ip, and be in state 'pause' and when we leave we will be in the sta= te=C2=A0'leave and move to=C2=A0the old stack, using the current
<= div>ip. At first encounter the function=C2=A0stack frame is copied over hen= ce there will be a fork limited to the function=C2=A0only.

This means that we essentially can define a generator as
(define (g x)
=C2=A0 (let lp ((n 0))
=C2=A0 =C2=A0 (if= (< n 10)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 (begin
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(pause x)
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0(lp (+ n 1))))))

And use it as=
(define (test)
=C2=A0 =C2=A0 (let ((x (mk-stack)))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 (let lp ()
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0(case (enter x)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0((pause)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(pk 'pause)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(lp))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ((child)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(g x)<= /div>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(le= ave x))))))))

A paused or leaved stack cannot be p= aused, an entered=C2=A0stack cannot be entered and one cannot leave a pause= d stack, but enter a leaved stack.

Anyhow this ide= a is modeled like a fork command instead=C2=A0of functional and have the be= nefit over delimited continuations that one does not need to copy the whole= stack and potentially speed up generator=C2=A0like constructs. But not onl= y this,=C2=A0writing efficient prolog code is possible as well. We could si= mplify a lot of the generation of prolog code, speed it up and also improve= compiler speed of prolog code significantly.

How = would we approach the=C2=A0 prolog code.=C2=A0The simplest system is to use= return the=C2=A0
alternate pause stack when succeeding things be= comes very simple,

x=C2=A0 =C2=A0=3D stack to paus= e to in case of failure
cc =3D the continuation

(<and> (x cc)=C2=A0 goal1 goal2)=C2=A0=C2=A0
=C2=A0= =C2=A0 =C2=A0:: (cc (goal1 (goal2 x))

(<or >= ;=C2=A0 =C2=A0(x cc)=C2=A0 goal1 goal2)=C2=A0=C2=A0
=C2=A0 =C2=A0= ::=C2=A0 (let ((xx (mkstack)))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0(case (enter xx)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0((child)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (cc (goal2 xx)))

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ((pause)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(cc (go= al2 x)))))

Very elegant, and we also can use some = heuristics to store already made stacks when=C2=A0
leaving a stac= k and reuse at the next enter which is a common theme in prolog,
=
Anyhow we have an issue, consider the case where everythings= succeds=C2=A0forever. Then we will blow the stack . There is no concept=C2= =A0of tail calls here. So what you can do is the following for an <and&g= t;,

(let ((xx (mk-stack)))
=C2=A0 =C2=A0= (case (enter xx)
=C2=A0 =C2=A0 =C2=A0 ((child)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0(goal1 x (lambda (xxx) (pause xx xxx)))
=C2= =A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 ((pause xxx)
<= div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(goal2 xxx cc))))

This enable cuts so that a cutted and (and!) in kanren lingo will use=
(goal2 x cc)

And we have tail calls!


I have a non jitted version guile wor= king as a proof of concept.=C2=A0

The drawback wit= h this is if a function uses a lot of stack, it will be a memory hog.
=

WDYT?











.

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0
--0000000000002fc41805d7e3038c--