From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Mikael Djurfeldt <mikael@djurfeldt.com>
Newsgroups: gmane.lisp.guile.devel
Subject: Re: guile 3 update: instruction explosion for pairs, vectors
Date: Tue, 9 Jan 2018 15:54:33 +0100
Message-ID: <CAA2XvwJA5NUOBDB_9qQ9Xn55NqYSomm72bRpFPVZP-f9fu3U=Q@mail.gmail.com>
References: <87h8rws8c8.fsf@pobox.com>
	<CAA2Xvw+imstb56HqmMSoxEJxBpLFCzRQci8o0gviKEcz4fkdrA@mail.gmail.com>
Reply-To: mikael@djurfeldt.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="f4f5e803beb0c769e70562591792"
X-Trace: blaine.gmane.org 1515509635 28535 195.159.176.226 (9 Jan 2018 14:53:55 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Tue, 9 Jan 2018 14:53:55 +0000 (UTC)
Cc: guile-devel <guile-devel@gnu.org>
To: Andy Wingo <wingo@pobox.com>
Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Tue Jan 09 15:53:51 2018
Return-path: <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>
Envelope-to: guile-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1eYvHY-0005yI-CC
	for guile-devel@m.gmane.org; Tue, 09 Jan 2018 15:53:40 +0100
Original-Received: from localhost ([::1]:56648 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1eYvJQ-0000L0-Ad
	for guile-devel@m.gmane.org; Tue, 09 Jan 2018 09:55:36 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:54005)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mdjurfeldt@gmail.com>) id 1eYvIU-0008CH-BF
	for guile-devel@gnu.org; Tue, 09 Jan 2018 09:54:41 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mdjurfeldt@gmail.com>) id 1eYvIR-0003Vg-95
	for guile-devel@gnu.org; Tue, 09 Jan 2018 09:54:38 -0500
Original-Received: from mail-io0-x234.google.com ([2607:f8b0:4001:c06::234]:41770)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <mdjurfeldt@gmail.com>)
	id 1eYvIR-0003VJ-2G
	for guile-devel@gnu.org; Tue, 09 Jan 2018 09:54:35 -0500
Original-Received: by mail-io0-x234.google.com with SMTP id f6so18817850ioh.8
	for <guile-devel@gnu.org>; Tue, 09 Jan 2018 06:54:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:reply-to:sender:in-reply-to:references:from:date
	:message-id:subject:to:cc;
	bh=iyIaID7eGk66GKBNAiYRJ54e0b0dHS7Edl4QQ8TjJrw=;
	b=fOveDL4WQq6HX4saPXy+ZdOYSNojJ9E5RY/qKf6w18yUsvGra15O9rEG8LQyeAQ/SC
	nwYnR3+ru2vIwhSuwTcV/xC4aTZ79Fk6XyyUBcE2dSdzI5wMCsZsOu57Pj7YLd1fTMQf
	i0hlJyUZotgFpX5po0/WlQL5/mlVYO4kkt7Bp0B0XNsjVN0Y9kzvEb7BFKFV7XPKpEaA
	MyziJZuSY96ARdNTEVP+zLyiVUwO56PxrGRmGFNaD7LEBoCkw2/jSnMFvG6TX7vgxA4h
	kDcyWS5c9H62iq/Ig8TeWI7oxzcWtsQcbF72SXBa8HYbxQNz2GW8IyX473uH1EqVm+WH
	xl8Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:reply-to:sender:in-reply-to
	:references:from:date:message-id:subject:to:cc;
	bh=iyIaID7eGk66GKBNAiYRJ54e0b0dHS7Edl4QQ8TjJrw=;
	b=OLFbxyDaT9c0FhoV1vBwXIfpVHnysoqApURYc8ZGucHWleqQ7EGY4jiQ0roluXdn77
	d+qubkWPpsSjIQHmGw0vd7bIQHnBvpEW5oxjsiCoAsBHO36ZO9kCQFJP3ZCk7NkNRlJZ
	NM5CrtBr/gVi4g+m+NhbziJZKtSTVlMen10QdCoKCQJhpM96yrARjET/OzuhM+0AiPw0
	mkVlm9HE+zUJnBJ+u3sGcIi7zs+J+AW4/kjkfIsL4a+bIdwd382nS6KrQbmjY+ZT1tJX
	KoDNQakNXWzmfEY2Rb0EzNgti1LIyunLu4ukqXy6PRGCzRHEnjE8jO6eqjunpI9NM7SM
	zlZg==
X-Gm-Message-State: AKwxytfVeMoDPYAAFhrMWCMqOkm5DpA8yTki0ngxFmuQHkqISEFv7wq8
	rOm+Ky9IN1osiQu9NitI4XyBjR3msH165qHEq8s=
X-Google-Smtp-Source: ACJfBotsYo0TpHdnTkiUBU+bOj11DWKbu9qaotpATSzDGHgNNrKeXR95Ec6ywMhxxGIfoj5BTf4oy4byG91tGRbWU6A=
X-Received: by 10.107.79.2 with SMTP id d2mr14301523iob.170.1515509674107;
	Tue, 09 Jan 2018 06:54:34 -0800 (PST)
Original-Received: by 10.79.200.67 with HTTP; Tue, 9 Jan 2018 06:54:33 -0800 (PST)
Original-Received: by 10.79.200.67 with HTTP; Tue, 9 Jan 2018 06:54:33 -0800 (PST)
In-Reply-To: <CAA2Xvw+imstb56HqmMSoxEJxBpLFCzRQci8o0gviKEcz4fkdrA@mail.gmail.com>
X-Google-Sender-Auth: wgfoLbZZ100kJh_-69cWlMp-HAg
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 2607:f8b0:4001:c06::234
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Developers list for Guile,
	the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/guile-devel/>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Original-Sender: "guile-devel" <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.lisp.guile.devel:19458
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/19458>

--f4f5e803beb0c769e70562591792
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I'm aware that something similar can be achieved by different approaches.

But, note how these IMs are a nice compromise between two extremes: 1.
doing no inlining at all, leaving all function calls intact or 2. inlining
absolutely everything.

Of course such an IM based scheme could be supplemented by run-time
invocation statistics to make better choices of what to instantiate and
compile---as is being done in several interpreters/compilers.

Den 9 jan 2018 15:30 skrev "Mikael Djurfeldt" <mikael@djurfeldt.com>:

> Hi,
>
> Hi think this is a marvelous development and, for what it's worth, in the
> right direction. Many, many thanks!
>
> Maybe this is all completely obvious to you, but I want to remind, again,
> about the plans and ideas I had for GOOPS before I had to leave it at its
> rather prototypical and unfinished state:
>
> As you recall, generic functions (GFs) then carried a cache (ideas taken
> from PCL) with "instantiated methods" (IM; there is probably a better ter=
m
> and I might have called them "compiled methods" or "cmethods"
> before)---method code where the types of the arguments are known since ea=
ch
> instance come from an actual invocation of the GF. Given a new invocation=
,
> the dispatch mechanism would just use argument types to look up the corre=
ct
> IM.
>
> I then had the idea that since we know the types of the arguments of the
> IM, a lot of the type dispatch could be eliminated within the IM based on
> flow information---very much in line with what you are doing here. If we
> now add one more piece, things get really exciting: Wherever there is a
> call to other GFs within one IM and the types of the arguments can be
> deduced at the point of the call, then the polymorphic type dispatch of t=
he
> GF can be eliminated and we can replace the GF call with a direct call to
> the corresponding IM.
>
> Given now that most of the standard scheme functions can be regarded as
> polymorphic GFs, I then imagined that most of the type dispatch in a
> program could be eliminated. Actual execution would mostly be direct call=
s
> of IMs to IMs, something which the optimizer could continue to work on,
> especially if it all was represented as CPS.
>
> Given your work here, I think that something like this could now rather
> easily be implemented. That is, we re-introduce IMs, make them directly
> callable, and substitute IM calls for GF calls when compiling them. I
> gather that the code of IMs do not necessarily have to be hung onto GFs b=
ut
> could be handled by some separate manager/data structures.
>
> Happy new year!
> Mikael
>
>
> On Mon, Jan 8, 2018 at 4:01 PM, Andy Wingo <wingo@pobox.com> wrote:
>
>> Hey all!
>>
>> This is an update along the road to Guile 3.  Check
>> https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00016.html for
>> the previous entry.
>>
>> Since 25 November there have been around 100 commits or so.  Firstly I
>> merged in patches from stable-2.0, including patches corresponding to
>> the improvements in the Guile 2.2.3 stable series release.
>>
>> Then, I started to look at "instruction explosion" for vector-ref et
>> al.  Basically the idea is to transform the various subcomponents of
>> e.g. vector-ref into their constituent parts.  In the concrete case of
>> vector-ref, we have to check that the vector is a heap object, that its
>> heap object tag is "vector", we have to extract the length from the heap
>> object, then we have to check that the index is an integer between 0 and
>> length-1, and finally we dereference the field in the vector.
>> Instruction explosion turns all of these into different primcalls and
>> branches.
>>
>> One thing that became apparent was that with instruction explosion, we'd
>> have a lot more control flow.  Information that the optimizer would
>> learn in a specific way (e.g. via specialzied type inference / effects
>> analysis handlers for vector-ref) would instead be learned by generic
>> control flow.
>>
>> Concretely --
>>
>>     scheme@(guile-user)> ,x (lambda (v i) (vector-ref v i))
>>     Disassembly of #<procedure 29f5e50 at <unknown port>:1:3 (v i)> at
>> #x29f5b4c:
>>
>>        0    (assert-nargs-ee/locals 3 1)    ;; 4 slots (2 args)   at
>> (unknown file):1:3
>>        1    (immediate-tag=3D? 2 7 0)         ;; heap-object?       at
>> (unknown file):1:17
>>        3    (jne 23)                        ;; -> L3
>>        4    (heap-tag=3D? 2 127 13)           ;; vector?
>>        6    (jne 20)                        ;; -> L3
>>        7    (word-ref/immediate 3 2 0)
>>        8    (ursh/immediate 3 3 8)
>>        9    (immediate-tag=3D? 1 3 2)         ;; fixnum?
>>       11    (jne 13)                        ;; -> L2
>>       12    (untag-fixnum 0 1)
>>       13    (s64-imm<? 0 0)
>>       14    (jl 8)                          ;; -> L1
>>       15    (s64<? 0 3)
>>       16    (jnl 6)                         ;; -> L1
>>       17    (mov 3 0)
>>       18    (uadd/immediate 3 3 1)
>>       19    (scm-ref 2 2 3)
>>       20    (handle-interrupts)
>>       21    (return-values 2)               ;; 1 value
>>     L1:
>>       22    (throw/value+data 1 201)        ;; #(out-of-range
>> "vector-ref" "Argument 2 out of range: ~S")
>>     L2:
>>       24    (throw/value+data 1 225)        ;; #(wrong-type-arg
>> "vector-ref" "Wrong type argument in position 2 (expecting small integer=
):
>> ~S")
>>     L3:
>>       26    (throw/value+data 2 239)        ;; #(wrong-type-arg
>> "vector-ref" "Wrong type argument in position 1 (expecting vector): ~S")
>>
>> So this is a bit horrible and I need to make the disassembler do a
>> better job, but anyway.  Instructions 1 through 6 check that V is a
>> vector; instructions 7 and 8 extract the length; 9 and 11 check that the
>> index is a fixnum, 12 extracts the fixnum value as an untagged 64-bit
>> integer, and 13 through 16 check that the index is in range.
>>
>> L1, L2, and L3 are bailouts.  The idea here is that if this vector-ref
>> is followed by some other operation on the vector, we'll at least get to
>> elide the vector? checks, and maybe we can reuse the length extraction
>> too.  Et cetera.
>>
>> The optimizer makes decisions like when to elide redundant checks based
>> on flow information.  However for historical reasons unfortunately the
>> "throw" terms actually did "continue" from the optimizer's perspective;
>> whereas the information flowing to e.g. L3 shouldn't flow at all to
>> instruction 7, the IR didn't have a way to denote terms that didn't
>> continue at all.
>>
>> To fix this I had to make some changes to the IR.  On the plus side,
>> $throw is its own term kind now that doesn't have a continuation.  Also,
>> $branch is a term instead of an expression shoehorned into $continue;
>> $prompt is a term too.  Maybe one day we'll get a $select term for
>> proper case compilation.
>>
>> Finally, the instruction explosion.  I refactored the Tree-IL->CPS
>> compiler to allow individual primcalls to have custom lowering routines,
>> and tightened up $primcall so that it now always continues to $kargs.
>> Then I added custom lowering routines for vector-ref et al, and cons and
>> other things.  The CPS IR refactors allowed me to remove some useless
>> passes (elide-values, prune-bailouts).  There were some optimizer bugs
>> but generally things were already in a good state; e.g. here's a vector
>> sum routine:
>>
>>   scheme@(guile-user)> (define (v-sum v)
>>                          (let lp ((n 0) (sum 0))
>>                            (if (< n (vector-length v))
>>                                (lp (+ n 1) (+ sum (vector-ref v n)))
>>                                sum)))
>>   scheme@(guile-user)> ,x v-sum
>>   Disassembly of #<procedure v-sum (v)> at #x1fa2750:
>>
>>      0    (assert-nargs-ee/locals 2 3)    ;; 5 slots (1 arg)    at
>> (unknown file):1:0
>>      1    (make-short-immediate 4 2)      ;; 0
>>      2    (immediate-tag=3D? 3 7 0)         ;; heap-object?       at
>> (unknown file):1:51
>>      4    (jne 39)                        ;; -> L5
>>      5    (heap-tag=3D? 3 127 13)           ;; vector?
>>      7    (jne 36)                        ;; -> L5
>>      8    (word-ref/immediate 2 3 0)
>>      9    (ursh/immediate 2 2 8)
>>     10    (imm-s64<? 2 0)                                       at
>> (unknown file):1:46
>>     11    (jnl 29)                        ;; -> L4
>>     12    (load-s64 1 0 0)                                      at
>> (unknown file):1:89
>>     15    (s64<? 1 2)
>>     16    (jnl 22)                        ;; -> L3
>>     17    (mov 4 1)
>>     18    (uadd/immediate 4 4 1)
>>     19    (scm-ref 4 3 4)
>>     20    (add/immediate 4 4 0)                                 at
>> (unknown file):1:82
>>     21    (load-s64 1 0 1)
>>     24    (s64<? 1 2)                                           at
>> (unknown file):1:46
>>     25    (jnl 11)                        ;; -> L2
>>   L1:
>>     26    (handle-interrupts)
>>     27    (mov 0 1)                                             at
>> (unknown file):1:74
>>     28    (uadd/immediate 0 0 1)
>>     29    (uadd/immediate 1 1 1)                                at
>> (unknown file):1:89
>>     30    (scm-ref 1 3 1)
>>     31    (add 4 4 1)                                           at
>> (unknown file):1:82
>>     32    (s64<? 0 2)                                           at
>> (unknown file):1:46
>>     33    (jnl 7)                         ;; -> L4
>>     34    (mov 1 0)                                             at
>> (unknown file):1:70
>>     35    (j -9)                          ;; -> L1
>>   L2:
>>     36    (mov 0 1)                                             at
>> (unknown file):1:46
>>     37    (j 3)                           ;; -> L4
>>   L3:
>>     38    (throw/value+data 4 204)        ;; #(out-of-range "vector-ref"
>> "Argument 2 out of range: ~S") at (unknown file):1:89
>>   L4:
>>     40    (handle-interrupts)
>>     41    (mov 3 4)
>>     42    (return-values 2)               ;; 1 value
>>   L5:
>>     43    (throw/value+data 3 233)        ;; #(wrong-type-arg
>> "vector-length" "Wrong type argument in position 1 (expect=E2=80=A6") at=
 (unknown
>> file):1:51
>>
>> The bit between L1 and L2 is the loop.  Note that the loop does no
>> dynamic checks besides checking that the unboxed index is within bounds
>> -- notably the vector? check, the length computation, and the index
>> untagging were hoisted out.  The "scm-ref" instruction is the raw
>> unchecked memory accessor instruction that will correspond to a load
>> when compiled to machine code.
>>
>> This code runs a little slower than it did a month ago because
>> instruction explosion is generally more instructions than what we had
>> before.  But this is expected.  I expect the corresponding amount of
>> machine code to be lower, when we emit machine code, sometimes
>> significantly so.  Doing things this way takes away the temptation for
>> an eventual JIT to do optimization-like tasks -- we keep it all in
>> generic CPS, and the JIT will be easy to write and generate good code.
>> Until then, hacking on Guile is a bit slow though, in terms of compile
>> time.
>>
>> Next up is instruction explosion for other data structures (boxes,
>> structs, strings, etc); then I have to figure out what to do for
>> arithmetic that I can't specialize (like that "add" at IP 31).  I guess
>> polymorphic inline caches are the real solution there; we'll see.
>>
>> OK that's the update.  Happy new year and happy hacking with Guile :)
>>
>> Cheers,
>>
>> Andy
>>
>>
>

--f4f5e803beb0c769e70562591792
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto">I&#39;m aware that something similar can be achieved by d=
ifferent approaches.<div dir=3D"auto"><br></div><div dir=3D"auto">But, note=
 how these IMs are a nice compromise between two extremes: 1. doing no inli=
ning at all, leaving all function calls intact or 2. inlining absolutely ev=
erything.</div><div dir=3D"auto"><br></div><div dir=3D"auto">Of course such=
 an IM based scheme could be supplemented by run-time invocation statistics=
 to make better choices of what to instantiate and compile---as is being do=
ne in several interpreters/compilers.</div></div><div class=3D"gmail_extra"=
><br><div class=3D"gmail_quote">Den 9 jan 2018 15:30 skrev &quot;Mikael Dju=
rfeldt&quot; &lt;<a href=3D"mailto:mikael@djurfeldt.com">mikael@djurfeldt.c=
om</a>&gt;:<br type=3D"attribution"><blockquote class=3D"gmail_quote" style=
=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=
=3D"ltr"><div><div><div><div><div><div><div><div>Hi,<br><br></div>Hi think =
this is a marvelous development and, for what it&#39;s worth, in the right =
direction. Many, many thanks!<br><br></div>Maybe this is all completely obv=
ious to you, but I want to remind, again, about the plans and ideas I had f=
or GOOPS before I had to leave it at its rather prototypical and unfinished=
 state:<br><br></div>As you recall, generic functions (GFs) then carried a =
cache (ideas taken from PCL) with &quot;instantiated methods&quot; (IM; the=
re is probably a better term and I might have called them &quot;compiled me=
thods&quot; or &quot;cmethods&quot; before)---method code where the types o=
f the arguments are known since each instance come from an actual invocatio=
n of the GF. Given a new invocation, the dispatch mechanism would just use =
argument types to look up the correct IM.<br><br></div>I then had the idea =
that since we know the types of the arguments of the IM, a lot of the type =
dispatch could be eliminated within the IM based on flow information---very=
 much in line with what you are doing here. If we now add one more piece, t=
hings get really exciting: Wherever there is a call to other GFs within one=
 IM and the types of the arguments can be deduced at the point of the call,=
 then the polymorphic type dispatch of the GF can be eliminated and we can =
replace the GF call with a direct call to the corresponding IM.<br><br></di=
v>Given now that most of the standard scheme functions can be regarded as p=
olymorphic GFs, I then imagined that most of the type dispatch in a program=
 could be eliminated. Actual execution would mostly be direct calls of IMs =
to IMs, something which the optimizer could continue to work on, especially=
 if it all was represented as CPS.<br><br></div>Given your work here, I thi=
nk that something like this could now rather easily be implemented. That is=
, we re-introduce IMs, make them directly callable, and substitute IM calls=
 for GF calls when compiling them. I gather that the code of IMs do not nec=
essarily have to be hung onto GFs but could be handled by some separate man=
ager/data structures.<br><br></div>Happy new year!<br></div>Mikael<br><div>=
<div><div><div><div><div><br></div></div></div></div></div></div></div><div=
 class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Mon, Jan 8, 2018 a=
t 4:01 PM, Andy Wingo <span dir=3D"ltr">&lt;<a href=3D"mailto:wingo@pobox.c=
om" target=3D"_blank">wingo@pobox.com</a>&gt;</span> wrote:<br><blockquote =
class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid=
;padding-left:1ex">Hey all!<br>
<br>
This is an update along the road to Guile 3.=C2=A0 Check<br>
<a href=3D"https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00016.=
html" rel=3D"noreferrer" target=3D"_blank">https://lists.gnu.org/archive/<w=
br>html/guile-devel/2017-11/msg00<wbr>016.html</a> for<br>
the previous entry.<br>
<br>
Since 25 November there have been around 100 commits or so.=C2=A0 Firstly I=
<br>
merged in patches from stable-2.0, including patches corresponding to<br>
the improvements in the Guile 2.2.3 stable series release.<br>
<br>
Then, I started to look at &quot;instruction explosion&quot; for vector-ref=
 et<br>
al.=C2=A0 Basically the idea is to transform the various subcomponents of<b=
r>
e.g. vector-ref into their constituent parts.=C2=A0 In the concrete case of=
<br>
vector-ref, we have to check that the vector is a heap object, that its<br>
heap object tag is &quot;vector&quot;, we have to extract the length from t=
he heap<br>
object, then we have to check that the index is an integer between 0 and<br=
>
length-1, and finally we dereference the field in the vector.<br>
Instruction explosion turns all of these into different primcalls and<br>
branches.<br>
<br>
One thing that became apparent was that with instruction explosion, we&#39;=
d<br>
have a lot more control flow.=C2=A0 Information that the optimizer would<br=
>
learn in a specific way (e.g. via specialzied type inference / effects<br>
analysis handlers for vector-ref) would instead be learned by generic<br>
control flow.<br>
<br>
Concretely --<br>
<br>
=C2=A0 =C2=A0 scheme@(guile-user)&gt; ,x (lambda (v i) (vector-ref v i))<br=
>
=C2=A0 =C2=A0 Disassembly of #&lt;procedure 29f5e50 at &lt;unknown port&gt;=
:1:3 (v i)&gt; at #x29f5b4c:<br>
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 (assert-nargs-ee/locals 3 1)=C2=
=A0 =C2=A0 ;; 4 slots (2 args)=C2=A0 =C2=A0at (unknown file):1:3<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A01=C2=A0 =C2=A0 (immediate-tag=3D? 2 7 0)=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0;; heap-object?=C2=A0 =C2=A0 =C2=A0 =C2=A0at (un=
known file):1:17<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A03=C2=A0 =C2=A0 (jne 23)=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L3<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A04=C2=A0 =C2=A0 (heap-tag=3D? 2 127 13)=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; vector?<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A06=C2=A0 =C2=A0 (jne 20)=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L3<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A07=C2=A0 =C2=A0 (word-ref/immediate 3 2 0)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A08=C2=A0 =C2=A0 (ursh/immediate 3 3 8)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A09=C2=A0 =C2=A0 (immediate-tag=3D? 1 3 2)=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0;; fixnum?<br>
=C2=A0 =C2=A0 =C2=A0 11=C2=A0 =C2=A0 (jne 13)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L2<br>
=C2=A0 =C2=A0 =C2=A0 12=C2=A0 =C2=A0 (untag-fixnum 0 1)<br>
=C2=A0 =C2=A0 =C2=A0 13=C2=A0 =C2=A0 (s64-imm&lt;? 0 0)<br>
=C2=A0 =C2=A0 =C2=A0 14=C2=A0 =C2=A0 (jl 8)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L1<br>
=C2=A0 =C2=A0 =C2=A0 15=C2=A0 =C2=A0 (s64&lt;? 0 3)<br>
=C2=A0 =C2=A0 =C2=A0 16=C2=A0 =C2=A0 (jnl 6)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; -&gt; L1<br>
=C2=A0 =C2=A0 =C2=A0 17=C2=A0 =C2=A0 (mov 3 0)<br>
=C2=A0 =C2=A0 =C2=A0 18=C2=A0 =C2=A0 (uadd/immediate 3 3 1)<br>
=C2=A0 =C2=A0 =C2=A0 19=C2=A0 =C2=A0 (scm-ref 2 2 3)<br>
=C2=A0 =C2=A0 =C2=A0 20=C2=A0 =C2=A0 (handle-interrupts)<br>
=C2=A0 =C2=A0 =C2=A0 21=C2=A0 =C2=A0 (return-values 2)=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; 1 value<br>
=C2=A0 =C2=A0 L1:<br>
=C2=A0 =C2=A0 =C2=A0 22=C2=A0 =C2=A0 (throw/value+data 1 201)=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 ;; #(out-of-range &quot;vector-ref&quot; &quot;Argument 2 out=
 of range: ~S&quot;)<br>
=C2=A0 =C2=A0 L2:<br>
=C2=A0 =C2=A0 =C2=A0 24=C2=A0 =C2=A0 (throw/value+data 1 225)=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 ;; #(wrong-type-arg &quot;vector-ref&quot; &quot;Wrong type a=
rgument in position 2 (expecting small integer): ~S&quot;)<br>
=C2=A0 =C2=A0 L3:<br>
=C2=A0 =C2=A0 =C2=A0 26=C2=A0 =C2=A0 (throw/value+data 2 239)=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 ;; #(wrong-type-arg &quot;vector-ref&quot; &quot;Wrong type a=
rgument in position 1 (expecting vector): ~S&quot;)<br>
<br>
So this is a bit horrible and I need to make the disassembler do a<br>
better job, but anyway.=C2=A0 Instructions 1 through 6 check that V is a<br=
>
vector; instructions 7 and 8 extract the length; 9 and 11 check that the<br=
>
index is a fixnum, 12 extracts the fixnum value as an untagged 64-bit<br>
integer, and 13 through 16 check that the index is in range.<br>
<br>
L1, L2, and L3 are bailouts.=C2=A0 The idea here is that if this vector-ref=
<br>
is followed by some other operation on the vector, we&#39;ll at least get t=
o<br>
elide the vector? checks, and maybe we can reuse the length extraction<br>
too.=C2=A0 Et cetera.<br>
<br>
The optimizer makes decisions like when to elide redundant checks based<br>
on flow information.=C2=A0 However for historical reasons unfortunately the=
<br>
&quot;throw&quot; terms actually did &quot;continue&quot; from the optimize=
r&#39;s perspective;<br>
whereas the information flowing to e.g. L3 shouldn&#39;t flow at all to<br>
instruction 7, the IR didn&#39;t have a way to denote terms that didn&#39;t=
<br>
continue at all.<br>
<br>
To fix this I had to make some changes to the IR.=C2=A0 On the plus side,<b=
r>
$throw is its own term kind now that doesn&#39;t have a continuation.=C2=A0=
 Also,<br>
$branch is a term instead of an expression shoehorned into $continue;<br>
$prompt is a term too.=C2=A0 Maybe one day we&#39;ll get a $select term for=
<br>
proper case compilation.<br>
<br>
Finally, the instruction explosion.=C2=A0 I refactored the Tree-IL-&gt;CPS<=
br>
compiler to allow individual primcalls to have custom lowering routines,<br=
>
and tightened up $primcall so that it now always continues to $kargs.<br>
Then I added custom lowering routines for vector-ref et al, and cons and<br=
>
other things.=C2=A0 The CPS IR refactors allowed me to remove some useless<=
br>
passes (elide-values, prune-bailouts).=C2=A0 There were some optimizer bugs=
<br>
but generally things were already in a good state; e.g. here&#39;s a vector=
<br>
sum routine:<br>
<br>
=C2=A0 scheme@(guile-user)&gt; (define (v-sum v)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0(let lp ((n 0) (sum 0))<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0(if (&lt; n (vector-length v))<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(lp (+ n 1) (+ sum (vector-ref v n)))=
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sum)))<br>
=C2=A0 scheme@(guile-user)&gt; ,x v-sum<br>
=C2=A0 Disassembly of #&lt;procedure v-sum (v)&gt; at #x1fa2750:<br>
<br>
=C2=A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 (assert-nargs-ee/locals 2 3)=C2=A0 =C2=
=A0 ;; 5 slots (1 arg)=C2=A0 =C2=A0 at (unknown file):1:0<br>
=C2=A0 =C2=A0 =C2=A01=C2=A0 =C2=A0 (make-short-immediate 4 2)=C2=A0 =C2=A0 =
=C2=A0 ;; 0<br>
=C2=A0 =C2=A0 =C2=A02=C2=A0 =C2=A0 (immediate-tag=3D? 3 7 0)=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0;; heap-object?=C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown f=
ile):1:51<br>
=C2=A0 =C2=A0 =C2=A04=C2=A0 =C2=A0 (jne 39)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L5<br>
=C2=A0 =C2=A0 =C2=A05=C2=A0 =C2=A0 (heap-tag=3D? 3 127 13)=C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0;; vector?<br>
=C2=A0 =C2=A0 =C2=A07=C2=A0 =C2=A0 (jne 36)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L5<br>
=C2=A0 =C2=A0 =C2=A08=C2=A0 =C2=A0 (word-ref/immediate 2 3 0)<br>
=C2=A0 =C2=A0 =C2=A09=C2=A0 =C2=A0 (ursh/immediate 2 2 8)<br>
=C2=A0 =C2=A0 10=C2=A0 =C2=A0 (imm-s64&lt;? 2 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46<br>
=C2=A0 =C2=A0 11=C2=A0 =C2=A0 (jnl 29)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L4<br>
=C2=A0 =C2=A0 12=C2=A0 =C2=A0 (load-s64 1 0 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 at (unknown file):1:89<br>
=C2=A0 =C2=A0 15=C2=A0 =C2=A0 (s64&lt;? 1 2)<br>
=C2=A0 =C2=A0 16=C2=A0 =C2=A0 (jnl 22)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L3<br>
=C2=A0 =C2=A0 17=C2=A0 =C2=A0 (mov 4 1)<br>
=C2=A0 =C2=A0 18=C2=A0 =C2=A0 (uadd/immediate 4 4 1)<br>
=C2=A0 =C2=A0 19=C2=A0 =C2=A0 (scm-ref 4 3 4)<br>
=C2=A0 =C2=A0 20=C2=A0 =C2=A0 (add/immediate 4 4 0)=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0at (unknown file):1:82<br>
=C2=A0 =C2=A0 21=C2=A0 =C2=A0 (load-s64 1 0 1)<br>
=C2=A0 =C2=A0 24=C2=A0 =C2=A0 (s64&lt;? 1 2)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46<br>
=C2=A0 =C2=A0 25=C2=A0 =C2=A0 (jnl 11)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L2<br>
=C2=A0 L1:<br>
=C2=A0 =C2=A0 26=C2=A0 =C2=A0 (handle-interrupts)<br>
=C2=A0 =C2=A0 27=C2=A0 =C2=A0 (mov 0 1)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:74<=
br>
=C2=A0 =C2=A0 28=C2=A0 =C2=A0 (uadd/immediate 0 0 1)<br>
=C2=A0 =C2=A0 29=C2=A0 =C2=A0 (uadd/immediate 1 1 1)=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 at (unknown file):1:89<br>
=C2=A0 =C2=A0 30=C2=A0 =C2=A0 (scm-ref 1 3 1)<br>
=C2=A0 =C2=A0 31=C2=A0 =C2=A0 (add 4 4 1)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:82<br>
=C2=A0 =C2=A0 32=C2=A0 =C2=A0 (s64&lt;? 0 2)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46<br>
=C2=A0 =C2=A0 33=C2=A0 =C2=A0 (jnl 7)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; -&gt; L4<br>
=C2=A0 =C2=A0 34=C2=A0 =C2=A0 (mov 1 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:70<=
br>
=C2=A0 =C2=A0 35=C2=A0 =C2=A0 (j -9)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -&gt; L1<br>
=C2=A0 L2:<br>
=C2=A0 =C2=A0 36=C2=A0 =C2=A0 (mov 0 1)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46<=
br>
=C2=A0 =C2=A0 37=C2=A0 =C2=A0 (j 3)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; -&gt; L4<br>
=C2=A0 L3:<br>
=C2=A0 =C2=A0 38=C2=A0 =C2=A0 (throw/value+data 4 204)=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 ;; #(out-of-range &quot;vector-ref&quot; &quot;Argument 2 out of ran=
ge: ~S&quot;) at (unknown file):1:89<br>
=C2=A0 L4:<br>
=C2=A0 =C2=A0 40=C2=A0 =C2=A0 (handle-interrupts)<br>
=C2=A0 =C2=A0 41=C2=A0 =C2=A0 (mov 3 4)<br>
=C2=A0 =C2=A0 42=C2=A0 =C2=A0 (return-values 2)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0;; 1 value<br>
=C2=A0 L5:<br>
=C2=A0 =C2=A0 43=C2=A0 =C2=A0 (throw/value+data 3 233)=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 ;; #(wrong-type-arg &quot;vector-length&quot; &quot;Wrong type argum=
ent in position 1 (expect=E2=80=A6&quot;) at (unknown file):1:51<br>
<br>
The bit between L1 and L2 is the loop.=C2=A0 Note that the loop does no<br>
dynamic checks besides checking that the unboxed index is within bounds<br>
-- notably the vector? check, the length computation, and the index<br>
untagging were hoisted out.=C2=A0 The &quot;scm-ref&quot; instruction is th=
e raw<br>
unchecked memory accessor instruction that will correspond to a load<br>
when compiled to machine code.<br>
<br>
This code runs a little slower than it did a month ago because<br>
instruction explosion is generally more instructions than what we had<br>
before.=C2=A0 But this is expected.=C2=A0 I expect the corresponding amount=
 of<br>
machine code to be lower, when we emit machine code, sometimes<br>
significantly so.=C2=A0 Doing things this way takes away the temptation for=
<br>
an eventual JIT to do optimization-like tasks -- we keep it all in<br>
generic CPS, and the JIT will be easy to write and generate good code.<br>
Until then, hacking on Guile is a bit slow though, in terms of compile<br>
time.<br>
<br>
Next up is instruction explosion for other data structures (boxes,<br>
structs, strings, etc); then I have to figure out what to do for<br>
arithmetic that I can&#39;t specialize (like that &quot;add&quot; at IP 31)=
.=C2=A0 I guess<br>
polymorphic inline caches are the real solution there; we&#39;ll see.<br>
<br>
OK that&#39;s the update.=C2=A0 Happy new year and happy hacking with Guile=
 :)<br>
<br>
Cheers,<br>
<br>
Andy<br>
<br>
</blockquote></div><br></div>
</blockquote></div></div>

--f4f5e803beb0c769e70562591792--