From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Mikael Djurfeldt Newsgroups: gmane.lisp.guile.devel Subject: Re: guile 3 update: instruction explosion for pairs, vectors Date: Tue, 9 Jan 2018 15:54:33 +0100 Message-ID: References: <87h8rws8c8.fsf@pobox.com> Reply-To: mikael@djurfeldt.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="f4f5e803beb0c769e70562591792" X-Trace: blaine.gmane.org 1515509635 28535 195.159.176.226 (9 Jan 2018 14:53:55 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 9 Jan 2018 14:53:55 +0000 (UTC) Cc: guile-devel To: Andy Wingo Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Tue Jan 09 15:53:51 2018 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eYvHY-0005yI-CC for guile-devel@m.gmane.org; Tue, 09 Jan 2018 15:53:40 +0100 Original-Received: from localhost ([::1]:56648 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eYvJQ-0000L0-Ad for guile-devel@m.gmane.org; Tue, 09 Jan 2018 09:55:36 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:54005) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eYvIU-0008CH-BF for guile-devel@gnu.org; Tue, 09 Jan 2018 09:54:41 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eYvIR-0003Vg-95 for guile-devel@gnu.org; Tue, 09 Jan 2018 09:54:38 -0500 Original-Received: from mail-io0-x234.google.com ([2607:f8b0:4001:c06::234]:41770) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eYvIR-0003VJ-2G for guile-devel@gnu.org; Tue, 09 Jan 2018 09:54:35 -0500 Original-Received: by mail-io0-x234.google.com with SMTP id f6so18817850ioh.8 for ; Tue, 09 Jan 2018 06:54:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:reply-to:sender:in-reply-to:references:from:date :message-id:subject:to:cc; bh=iyIaID7eGk66GKBNAiYRJ54e0b0dHS7Edl4QQ8TjJrw=; b=fOveDL4WQq6HX4saPXy+ZdOYSNojJ9E5RY/qKf6w18yUsvGra15O9rEG8LQyeAQ/SC nwYnR3+ru2vIwhSuwTcV/xC4aTZ79Fk6XyyUBcE2dSdzI5wMCsZsOu57Pj7YLd1fTMQf i0hlJyUZotgFpX5po0/WlQL5/mlVYO4kkt7Bp0B0XNsjVN0Y9kzvEb7BFKFV7XPKpEaA MyziJZuSY96ARdNTEVP+zLyiVUwO56PxrGRmGFNaD7LEBoCkw2/jSnMFvG6TX7vgxA4h kDcyWS5c9H62iq/Ig8TeWI7oxzcWtsQcbF72SXBa8HYbxQNz2GW8IyX473uH1EqVm+WH xl8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:sender:in-reply-to :references:from:date:message-id:subject:to:cc; bh=iyIaID7eGk66GKBNAiYRJ54e0b0dHS7Edl4QQ8TjJrw=; b=OLFbxyDaT9c0FhoV1vBwXIfpVHnysoqApURYc8ZGucHWleqQ7EGY4jiQ0roluXdn77 d+qubkWPpsSjIQHmGw0vd7bIQHnBvpEW5oxjsiCoAsBHO36ZO9kCQFJP3ZCk7NkNRlJZ NM5CrtBr/gVi4g+m+NhbziJZKtSTVlMen10QdCoKCQJhpM96yrARjET/OzuhM+0AiPw0 mkVlm9HE+zUJnBJ+u3sGcIi7zs+J+AW4/kjkfIsL4a+bIdwd382nS6KrQbmjY+ZT1tJX KoDNQakNXWzmfEY2Rb0EzNgti1LIyunLu4ukqXy6PRGCzRHEnjE8jO6eqjunpI9NM7SM zlZg== X-Gm-Message-State: AKwxytfVeMoDPYAAFhrMWCMqOkm5DpA8yTki0ngxFmuQHkqISEFv7wq8 rOm+Ky9IN1osiQu9NitI4XyBjR3msH165qHEq8s= X-Google-Smtp-Source: ACJfBotsYo0TpHdnTkiUBU+bOj11DWKbu9qaotpATSzDGHgNNrKeXR95Ec6ywMhxxGIfoj5BTf4oy4byG91tGRbWU6A= X-Received: by 10.107.79.2 with SMTP id d2mr14301523iob.170.1515509674107; Tue, 09 Jan 2018 06:54:34 -0800 (PST) Original-Received: by 10.79.200.67 with HTTP; Tue, 9 Jan 2018 06:54:33 -0800 (PST) Original-Received: by 10.79.200.67 with HTTP; Tue, 9 Jan 2018 06:54:33 -0800 (PST) In-Reply-To: X-Google-Sender-Auth: wgfoLbZZ100kJh_-69cWlMp-HAg X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4001:c06::234 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.org gmane.lisp.guile.devel:19458 Archived-At: --f4f5e803beb0c769e70562591792 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I'm aware that something similar can be achieved by different approaches. But, note how these IMs are a nice compromise between two extremes: 1. doing no inlining at all, leaving all function calls intact or 2. inlining absolutely everything. Of course such an IM based scheme could be supplemented by run-time invocation statistics to make better choices of what to instantiate and compile---as is being done in several interpreters/compilers. Den 9 jan 2018 15:30 skrev "Mikael Djurfeldt" : > Hi, > > Hi think this is a marvelous development and, for what it's worth, in the > right direction. Many, many thanks! > > Maybe this is all completely obvious to you, but I want to remind, again, > about the plans and ideas I had for GOOPS before I had to leave it at its > rather prototypical and unfinished state: > > As you recall, generic functions (GFs) then carried a cache (ideas taken > from PCL) with "instantiated methods" (IM; there is probably a better ter= m > and I might have called them "compiled methods" or "cmethods" > before)---method code where the types of the arguments are known since ea= ch > instance come from an actual invocation of the GF. Given a new invocation= , > the dispatch mechanism would just use argument types to look up the corre= ct > IM. > > I then had the idea that since we know the types of the arguments of the > IM, a lot of the type dispatch could be eliminated within the IM based on > flow information---very much in line with what you are doing here. If we > now add one more piece, things get really exciting: Wherever there is a > call to other GFs within one IM and the types of the arguments can be > deduced at the point of the call, then the polymorphic type dispatch of t= he > GF can be eliminated and we can replace the GF call with a direct call to > the corresponding IM. > > Given now that most of the standard scheme functions can be regarded as > polymorphic GFs, I then imagined that most of the type dispatch in a > program could be eliminated. Actual execution would mostly be direct call= s > of IMs to IMs, something which the optimizer could continue to work on, > especially if it all was represented as CPS. > > Given your work here, I think that something like this could now rather > easily be implemented. That is, we re-introduce IMs, make them directly > callable, and substitute IM calls for GF calls when compiling them. I > gather that the code of IMs do not necessarily have to be hung onto GFs b= ut > could be handled by some separate manager/data structures. > > Happy new year! > Mikael > > > On Mon, Jan 8, 2018 at 4:01 PM, Andy Wingo wrote: > >> Hey all! >> >> This is an update along the road to Guile 3. Check >> https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00016.html for >> the previous entry. >> >> Since 25 November there have been around 100 commits or so. Firstly I >> merged in patches from stable-2.0, including patches corresponding to >> the improvements in the Guile 2.2.3 stable series release. >> >> Then, I started to look at "instruction explosion" for vector-ref et >> al. Basically the idea is to transform the various subcomponents of >> e.g. vector-ref into their constituent parts. In the concrete case of >> vector-ref, we have to check that the vector is a heap object, that its >> heap object tag is "vector", we have to extract the length from the heap >> object, then we have to check that the index is an integer between 0 and >> length-1, and finally we dereference the field in the vector. >> Instruction explosion turns all of these into different primcalls and >> branches. >> >> One thing that became apparent was that with instruction explosion, we'd >> have a lot more control flow. Information that the optimizer would >> learn in a specific way (e.g. via specialzied type inference / effects >> analysis handlers for vector-ref) would instead be learned by generic >> control flow. >> >> Concretely -- >> >> scheme@(guile-user)> ,x (lambda (v i) (vector-ref v i)) >> Disassembly of #:1:3 (v i)> at >> #x29f5b4c: >> >> 0 (assert-nargs-ee/locals 3 1) ;; 4 slots (2 args) at >> (unknown file):1:3 >> 1 (immediate-tag=3D? 2 7 0) ;; heap-object? at >> (unknown file):1:17 >> 3 (jne 23) ;; -> L3 >> 4 (heap-tag=3D? 2 127 13) ;; vector? >> 6 (jne 20) ;; -> L3 >> 7 (word-ref/immediate 3 2 0) >> 8 (ursh/immediate 3 3 8) >> 9 (immediate-tag=3D? 1 3 2) ;; fixnum? >> 11 (jne 13) ;; -> L2 >> 12 (untag-fixnum 0 1) >> 13 (s64-imm> 14 (jl 8) ;; -> L1 >> 15 (s64> 16 (jnl 6) ;; -> L1 >> 17 (mov 3 0) >> 18 (uadd/immediate 3 3 1) >> 19 (scm-ref 2 2 3) >> 20 (handle-interrupts) >> 21 (return-values 2) ;; 1 value >> L1: >> 22 (throw/value+data 1 201) ;; #(out-of-range >> "vector-ref" "Argument 2 out of range: ~S") >> L2: >> 24 (throw/value+data 1 225) ;; #(wrong-type-arg >> "vector-ref" "Wrong type argument in position 2 (expecting small integer= ): >> ~S") >> L3: >> 26 (throw/value+data 2 239) ;; #(wrong-type-arg >> "vector-ref" "Wrong type argument in position 1 (expecting vector): ~S") >> >> So this is a bit horrible and I need to make the disassembler do a >> better job, but anyway. Instructions 1 through 6 check that V is a >> vector; instructions 7 and 8 extract the length; 9 and 11 check that the >> index is a fixnum, 12 extracts the fixnum value as an untagged 64-bit >> integer, and 13 through 16 check that the index is in range. >> >> L1, L2, and L3 are bailouts. The idea here is that if this vector-ref >> is followed by some other operation on the vector, we'll at least get to >> elide the vector? checks, and maybe we can reuse the length extraction >> too. Et cetera. >> >> The optimizer makes decisions like when to elide redundant checks based >> on flow information. However for historical reasons unfortunately the >> "throw" terms actually did "continue" from the optimizer's perspective; >> whereas the information flowing to e.g. L3 shouldn't flow at all to >> instruction 7, the IR didn't have a way to denote terms that didn't >> continue at all. >> >> To fix this I had to make some changes to the IR. On the plus side, >> $throw is its own term kind now that doesn't have a continuation. Also, >> $branch is a term instead of an expression shoehorned into $continue; >> $prompt is a term too. Maybe one day we'll get a $select term for >> proper case compilation. >> >> Finally, the instruction explosion. I refactored the Tree-IL->CPS >> compiler to allow individual primcalls to have custom lowering routines, >> and tightened up $primcall so that it now always continues to $kargs. >> Then I added custom lowering routines for vector-ref et al, and cons and >> other things. The CPS IR refactors allowed me to remove some useless >> passes (elide-values, prune-bailouts). There were some optimizer bugs >> but generally things were already in a good state; e.g. here's a vector >> sum routine: >> >> scheme@(guile-user)> (define (v-sum v) >> (let lp ((n 0) (sum 0)) >> (if (< n (vector-length v)) >> (lp (+ n 1) (+ sum (vector-ref v n))) >> sum))) >> scheme@(guile-user)> ,x v-sum >> Disassembly of # at #x1fa2750: >> >> 0 (assert-nargs-ee/locals 2 3) ;; 5 slots (1 arg) at >> (unknown file):1:0 >> 1 (make-short-immediate 4 2) ;; 0 >> 2 (immediate-tag=3D? 3 7 0) ;; heap-object? at >> (unknown file):1:51 >> 4 (jne 39) ;; -> L5 >> 5 (heap-tag=3D? 3 127 13) ;; vector? >> 7 (jne 36) ;; -> L5 >> 8 (word-ref/immediate 2 3 0) >> 9 (ursh/immediate 2 2 8) >> 10 (imm-s64> (unknown file):1:46 >> 11 (jnl 29) ;; -> L4 >> 12 (load-s64 1 0 0) at >> (unknown file):1:89 >> 15 (s64> 16 (jnl 22) ;; -> L3 >> 17 (mov 4 1) >> 18 (uadd/immediate 4 4 1) >> 19 (scm-ref 4 3 4) >> 20 (add/immediate 4 4 0) at >> (unknown file):1:82 >> 21 (load-s64 1 0 1) >> 24 (s64> (unknown file):1:46 >> 25 (jnl 11) ;; -> L2 >> L1: >> 26 (handle-interrupts) >> 27 (mov 0 1) at >> (unknown file):1:74 >> 28 (uadd/immediate 0 0 1) >> 29 (uadd/immediate 1 1 1) at >> (unknown file):1:89 >> 30 (scm-ref 1 3 1) >> 31 (add 4 4 1) at >> (unknown file):1:82 >> 32 (s64> (unknown file):1:46 >> 33 (jnl 7) ;; -> L4 >> 34 (mov 1 0) at >> (unknown file):1:70 >> 35 (j -9) ;; -> L1 >> L2: >> 36 (mov 0 1) at >> (unknown file):1:46 >> 37 (j 3) ;; -> L4 >> L3: >> 38 (throw/value+data 4 204) ;; #(out-of-range "vector-ref" >> "Argument 2 out of range: ~S") at (unknown file):1:89 >> L4: >> 40 (handle-interrupts) >> 41 (mov 3 4) >> 42 (return-values 2) ;; 1 value >> L5: >> 43 (throw/value+data 3 233) ;; #(wrong-type-arg >> "vector-length" "Wrong type argument in position 1 (expect=E2=80=A6") at= (unknown >> file):1:51 >> >> The bit between L1 and L2 is the loop. Note that the loop does no >> dynamic checks besides checking that the unboxed index is within bounds >> -- notably the vector? check, the length computation, and the index >> untagging were hoisted out. The "scm-ref" instruction is the raw >> unchecked memory accessor instruction that will correspond to a load >> when compiled to machine code. >> >> This code runs a little slower than it did a month ago because >> instruction explosion is generally more instructions than what we had >> before. But this is expected. I expect the corresponding amount of >> machine code to be lower, when we emit machine code, sometimes >> significantly so. Doing things this way takes away the temptation for >> an eventual JIT to do optimization-like tasks -- we keep it all in >> generic CPS, and the JIT will be easy to write and generate good code. >> Until then, hacking on Guile is a bit slow though, in terms of compile >> time. >> >> Next up is instruction explosion for other data structures (boxes, >> structs, strings, etc); then I have to figure out what to do for >> arithmetic that I can't specialize (like that "add" at IP 31). I guess >> polymorphic inline caches are the real solution there; we'll see. >> >> OK that's the update. Happy new year and happy hacking with Guile :) >> >> Cheers, >> >> Andy >> >> > --f4f5e803beb0c769e70562591792 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I'm aware that something similar can be achieved by d= ifferent approaches.

But, note= how these IMs are a nice compromise between two extremes: 1. doing no inli= ning at all, leaving all function calls intact or 2. inlining absolutely ev= erything.

Of course such= an IM based scheme could be supplemented by run-time invocation statistics= to make better choices of what to instantiate and compile---as is being do= ne in several interpreters/compilers.

Den 9 jan 2018 15:30 skrev "Mikael Dju= rfeldt" <mikael@djurfeldt.c= om>:
Hi,

Hi think = this is a marvelous development and, for what it's worth, in the right = direction. Many, many thanks!

Maybe this is all completely obv= ious to you, but I want to remind, again, about the plans and ideas I had f= or GOOPS before I had to leave it at its rather prototypical and unfinished= state:

As you recall, generic functions (GFs) then carried a = cache (ideas taken from PCL) with "instantiated methods" (IM; the= re is probably a better term and I might have called them "compiled me= thods" or "cmethods" before)---method code where the types o= f the arguments are known since each instance come from an actual invocatio= n of the GF. Given a new invocation, the dispatch mechanism would just use = argument types to look up the correct IM.

I then had the idea = that since we know the types of the arguments of the IM, a lot of the type = dispatch could be eliminated within the IM based on flow information---very= much in line with what you are doing here. If we now add one more piece, t= hings get really exciting: Wherever there is a call to other GFs within one= IM and the types of the arguments can be deduced at the point of the call,= then the polymorphic type dispatch of the GF can be eliminated and we can = replace the GF call with a direct call to the corresponding IM.

Given now that most of the standard scheme functions can be regarded as p= olymorphic GFs, I then imagined that most of the type dispatch in a program= could be eliminated. Actual execution would mostly be direct calls of IMs = to IMs, something which the optimizer could continue to work on, especially= if it all was represented as CPS.

Given your work here, I thi= nk that something like this could now rather easily be implemented. That is= , we re-introduce IMs, make them directly callable, and substitute IM calls= for GF calls when compiling them. I gather that the code of IMs do not nec= essarily have to be hung onto GFs but could be handled by some separate man= ager/data structures.

Happy new year!
Mikael
=


On Mon, Jan 8, 2018 a= t 4:01 PM, Andy Wingo <wingo@pobox.com> wrote:
Hey all!

This is an update along the road to Guile 3.=C2=A0 Check
https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00016.html for
the previous entry.

Since 25 November there have been around 100 commits or so.=C2=A0 Firstly I=
merged in patches from stable-2.0, including patches corresponding to
the improvements in the Guile 2.2.3 stable series release.

Then, I started to look at "instruction explosion" for vector-ref= et
al.=C2=A0 Basically the idea is to transform the various subcomponents of e.g. vector-ref into their constituent parts.=C2=A0 In the concrete case of=
vector-ref, we have to check that the vector is a heap object, that its
heap object tag is "vector", we have to extract the length from t= he heap
object, then we have to check that the index is an integer between 0 and length-1, and finally we dereference the field in the vector.
Instruction explosion turns all of these into different primcalls and
branches.

One thing that became apparent was that with instruction explosion, we'= d
have a lot more control flow.=C2=A0 Information that the optimizer would learn in a specific way (e.g. via specialzied type inference / effects
analysis handlers for vector-ref) would instead be learned by generic
control flow.

Concretely --

=C2=A0 =C2=A0 scheme@(guile-user)> ,x (lambda (v i) (vector-ref v i)) =C2=A0 =C2=A0 Disassembly of #<procedure 29f5e50 at <unknown port>= :1:3 (v i)> at #x29f5b4c:

=C2=A0 =C2=A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 (assert-nargs-ee/locals 3 1)=C2= =A0 =C2=A0 ;; 4 slots (2 args)=C2=A0 =C2=A0at (unknown file):1:3
=C2=A0 =C2=A0 =C2=A0 =C2=A01=C2=A0 =C2=A0 (immediate-tag=3D? 2 7 0)=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0;; heap-object?=C2=A0 =C2=A0 =C2=A0 =C2=A0at (un= known file):1:17
=C2=A0 =C2=A0 =C2=A0 =C2=A03=C2=A0 =C2=A0 (jne 23)=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L3
=C2=A0 =C2=A0 =C2=A0 =C2=A04=C2=A0 =C2=A0 (heap-tag=3D? 2 127 13)=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; vector?
=C2=A0 =C2=A0 =C2=A0 =C2=A06=C2=A0 =C2=A0 (jne 20)=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L3
=C2=A0 =C2=A0 =C2=A0 =C2=A07=C2=A0 =C2=A0 (word-ref/immediate 3 2 0)
=C2=A0 =C2=A0 =C2=A0 =C2=A08=C2=A0 =C2=A0 (ursh/immediate 3 3 8)
=C2=A0 =C2=A0 =C2=A0 =C2=A09=C2=A0 =C2=A0 (immediate-tag=3D? 1 3 2)=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0;; fixnum?
=C2=A0 =C2=A0 =C2=A0 11=C2=A0 =C2=A0 (jne 13)=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L2
=C2=A0 =C2=A0 =C2=A0 12=C2=A0 =C2=A0 (untag-fixnum 0 1)
=C2=A0 =C2=A0 =C2=A0 13=C2=A0 =C2=A0 (s64-imm<? 0 0)
=C2=A0 =C2=A0 =C2=A0 14=C2=A0 =C2=A0 (jl 8)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L1
=C2=A0 =C2=A0 =C2=A0 15=C2=A0 =C2=A0 (s64<? 0 3)
=C2=A0 =C2=A0 =C2=A0 16=C2=A0 =C2=A0 (jnl 6)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; -> L1
=C2=A0 =C2=A0 =C2=A0 17=C2=A0 =C2=A0 (mov 3 0)
=C2=A0 =C2=A0 =C2=A0 18=C2=A0 =C2=A0 (uadd/immediate 3 3 1)
=C2=A0 =C2=A0 =C2=A0 19=C2=A0 =C2=A0 (scm-ref 2 2 3)
=C2=A0 =C2=A0 =C2=A0 20=C2=A0 =C2=A0 (handle-interrupts)
=C2=A0 =C2=A0 =C2=A0 21=C2=A0 =C2=A0 (return-values 2)=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; 1 value
=C2=A0 =C2=A0 L1:
=C2=A0 =C2=A0 =C2=A0 22=C2=A0 =C2=A0 (throw/value+data 1 201)=C2=A0 =C2=A0 = =C2=A0 =C2=A0 ;; #(out-of-range "vector-ref" "Argument 2 out= of range: ~S")
=C2=A0 =C2=A0 L2:
=C2=A0 =C2=A0 =C2=A0 24=C2=A0 =C2=A0 (throw/value+data 1 225)=C2=A0 =C2=A0 = =C2=A0 =C2=A0 ;; #(wrong-type-arg "vector-ref" "Wrong type a= rgument in position 2 (expecting small integer): ~S")
=C2=A0 =C2=A0 L3:
=C2=A0 =C2=A0 =C2=A0 26=C2=A0 =C2=A0 (throw/value+data 2 239)=C2=A0 =C2=A0 = =C2=A0 =C2=A0 ;; #(wrong-type-arg "vector-ref" "Wrong type a= rgument in position 1 (expecting vector): ~S")

So this is a bit horrible and I need to make the disassembler do a
better job, but anyway.=C2=A0 Instructions 1 through 6 check that V is a vector; instructions 7 and 8 extract the length; 9 and 11 check that the index is a fixnum, 12 extracts the fixnum value as an untagged 64-bit
integer, and 13 through 16 check that the index is in range.

L1, L2, and L3 are bailouts.=C2=A0 The idea here is that if this vector-ref=
is followed by some other operation on the vector, we'll at least get t= o
elide the vector? checks, and maybe we can reuse the length extraction
too.=C2=A0 Et cetera.

The optimizer makes decisions like when to elide redundant checks based
on flow information.=C2=A0 However for historical reasons unfortunately the=
"throw" terms actually did "continue" from the optimize= r's perspective;
whereas the information flowing to e.g. L3 shouldn't flow at all to
instruction 7, the IR didn't have a way to denote terms that didn't=
continue at all.

To fix this I had to make some changes to the IR.=C2=A0 On the plus side, $throw is its own term kind now that doesn't have a continuation.=C2=A0= Also,
$branch is a term instead of an expression shoehorned into $continue;
$prompt is a term too.=C2=A0 Maybe one day we'll get a $select term for=
proper case compilation.

Finally, the instruction explosion.=C2=A0 I refactored the Tree-IL->CPS<= br> compiler to allow individual primcalls to have custom lowering routines, and tightened up $primcall so that it now always continues to $kargs.
Then I added custom lowering routines for vector-ref et al, and cons and other things.=C2=A0 The CPS IR refactors allowed me to remove some useless<= br> passes (elide-values, prune-bailouts).=C2=A0 There were some optimizer bugs=
but generally things were already in a good state; e.g. here's a vector=
sum routine:

=C2=A0 scheme@(guile-user)> (define (v-sum v)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0(let lp ((n 0) (sum 0))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0(if (< n (vector-length v))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(lp (+ n 1) (+ sum (vector-ref v n)))=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sum)))
=C2=A0 scheme@(guile-user)> ,x v-sum
=C2=A0 Disassembly of #<procedure v-sum (v)> at #x1fa2750:

=C2=A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 (assert-nargs-ee/locals 2 3)=C2=A0 =C2= =A0 ;; 5 slots (1 arg)=C2=A0 =C2=A0 at (unknown file):1:0
=C2=A0 =C2=A0 =C2=A01=C2=A0 =C2=A0 (make-short-immediate 4 2)=C2=A0 =C2=A0 = =C2=A0 ;; 0
=C2=A0 =C2=A0 =C2=A02=C2=A0 =C2=A0 (immediate-tag=3D? 3 7 0)=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0;; heap-object?=C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown f= ile):1:51
=C2=A0 =C2=A0 =C2=A04=C2=A0 =C2=A0 (jne 39)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L5
=C2=A0 =C2=A0 =C2=A05=C2=A0 =C2=A0 (heap-tag=3D? 3 127 13)=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0;; vector?
=C2=A0 =C2=A0 =C2=A07=C2=A0 =C2=A0 (jne 36)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L5
=C2=A0 =C2=A0 =C2=A08=C2=A0 =C2=A0 (word-ref/immediate 2 3 0)
=C2=A0 =C2=A0 =C2=A09=C2=A0 =C2=A0 (ursh/immediate 2 2 8)
=C2=A0 =C2=A0 10=C2=A0 =C2=A0 (imm-s64<? 2 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46
=C2=A0 =C2=A0 11=C2=A0 =C2=A0 (jnl 29)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L4
=C2=A0 =C2=A0 12=C2=A0 =C2=A0 (load-s64 1 0 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 at (unknown file):1:89
=C2=A0 =C2=A0 15=C2=A0 =C2=A0 (s64<? 1 2)
=C2=A0 =C2=A0 16=C2=A0 =C2=A0 (jnl 22)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L3
=C2=A0 =C2=A0 17=C2=A0 =C2=A0 (mov 4 1)
=C2=A0 =C2=A0 18=C2=A0 =C2=A0 (uadd/immediate 4 4 1)
=C2=A0 =C2=A0 19=C2=A0 =C2=A0 (scm-ref 4 3 4)
=C2=A0 =C2=A0 20=C2=A0 =C2=A0 (add/immediate 4 4 0)=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0at (unknown file):1:82
=C2=A0 =C2=A0 21=C2=A0 =C2=A0 (load-s64 1 0 1)
=C2=A0 =C2=A0 24=C2=A0 =C2=A0 (s64<? 1 2)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46
=C2=A0 =C2=A0 25=C2=A0 =C2=A0 (jnl 11)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L2
=C2=A0 L1:
=C2=A0 =C2=A0 26=C2=A0 =C2=A0 (handle-interrupts)
=C2=A0 =C2=A0 27=C2=A0 =C2=A0 (mov 0 1)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:74<= br> =C2=A0 =C2=A0 28=C2=A0 =C2=A0 (uadd/immediate 0 0 1)
=C2=A0 =C2=A0 29=C2=A0 =C2=A0 (uadd/immediate 1 1 1)=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 at (unknown file):1:89
=C2=A0 =C2=A0 30=C2=A0 =C2=A0 (scm-ref 1 3 1)
=C2=A0 =C2=A0 31=C2=A0 =C2=A0 (add 4 4 1)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:82
=C2=A0 =C2=A0 32=C2=A0 =C2=A0 (s64<? 0 2)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46
=C2=A0 =C2=A0 33=C2=A0 =C2=A0 (jnl 7)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; -> L4
=C2=A0 =C2=A0 34=C2=A0 =C2=A0 (mov 1 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:70<= br> =C2=A0 =C2=A0 35=C2=A0 =C2=A0 (j -9)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ;; -> L1
=C2=A0 L2:
=C2=A0 =C2=A0 36=C2=A0 =C2=A0 (mov 0 1)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at (unknown file):1:46<= br> =C2=A0 =C2=A0 37=C2=A0 =C2=A0 (j 3)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0;; -> L4
=C2=A0 L3:
=C2=A0 =C2=A0 38=C2=A0 =C2=A0 (throw/value+data 4 204)=C2=A0 =C2=A0 =C2=A0 = =C2=A0 ;; #(out-of-range "vector-ref" "Argument 2 out of ran= ge: ~S") at (unknown file):1:89
=C2=A0 L4:
=C2=A0 =C2=A0 40=C2=A0 =C2=A0 (handle-interrupts)
=C2=A0 =C2=A0 41=C2=A0 =C2=A0 (mov 3 4)
=C2=A0 =C2=A0 42=C2=A0 =C2=A0 (return-values 2)=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0;; 1 value
=C2=A0 L5:
=C2=A0 =C2=A0 43=C2=A0 =C2=A0 (throw/value+data 3 233)=C2=A0 =C2=A0 =C2=A0 = =C2=A0 ;; #(wrong-type-arg "vector-length" "Wrong type argum= ent in position 1 (expect=E2=80=A6") at (unknown file):1:51

The bit between L1 and L2 is the loop.=C2=A0 Note that the loop does no
dynamic checks besides checking that the unboxed index is within bounds
-- notably the vector? check, the length computation, and the index
untagging were hoisted out.=C2=A0 The "scm-ref" instruction is th= e raw
unchecked memory accessor instruction that will correspond to a load
when compiled to machine code.

This code runs a little slower than it did a month ago because
instruction explosion is generally more instructions than what we had
before.=C2=A0 But this is expected.=C2=A0 I expect the corresponding amount= of
machine code to be lower, when we emit machine code, sometimes
significantly so.=C2=A0 Doing things this way takes away the temptation for=
an eventual JIT to do optimization-like tasks -- we keep it all in
generic CPS, and the JIT will be easy to write and generate good code.
Until then, hacking on Guile is a bit slow though, in terms of compile
time.

Next up is instruction explosion for other data structures (boxes,
structs, strings, etc); then I have to figure out what to do for
arithmetic that I can't specialize (like that "add" at IP 31)= .=C2=A0 I guess
polymorphic inline caches are the real solution there; we'll see.

OK that's the update.=C2=A0 Happy new year and happy hacking with Guile= :)

Cheers,

Andy


--f4f5e803beb0c769e70562591792--