From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Andy Wingo Newsgroups: gmane.lisp.guile.devel Subject: guile 3 update: more number unboxing improvements Date: Wed, 29 Nov 2017 09:14:49 +0100 Message-ID: <874lpdwlfq.fsf@igalia.com> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable X-Trace: blaine.gmane.org 1511943315 5752 195.159.176.226 (29 Nov 2017 08:15:15 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 29 Nov 2017 08:15:15 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) To: guile-devel@gnu.org Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Wed Nov 29 09:15:09 2017 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eJxWN-0000p4-LD for guile-devel@m.gmane.org; Wed, 29 Nov 2017 09:15:08 +0100 Original-Received: from localhost ([::1]:41738 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eJxWS-0007Aj-Ax for guile-devel@m.gmane.org; Wed, 29 Nov 2017 03:15:12 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:43465) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eJxWJ-00077x-Ch for guile-devel@gnu.org; Wed, 29 Nov 2017 03:15:05 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eJxWF-0000DD-BI for guile-devel@gnu.org; Wed, 29 Nov 2017 03:15:03 -0500 Original-Received: from pb-sasl2.pobox.com ([64.147.108.67]:59099 helo=sasl.smtp.pobox.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eJxWF-0000Cr-6W for guile-devel@gnu.org; Wed, 29 Nov 2017 03:14:59 -0500 Original-Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-sasl2.pobox.com (Postfix) with ESMTP id 5D42AA6AEB for ; Wed, 29 Nov 2017 03:14:57 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to :subject:date:message-id:mime-version:content-type :content-transfer-encoding; s=sasl; bh=sHiCZBxGGfO4eo8DjSzqAGHRC Kg=; b=hmejCLkQ5e2TPEFnfkmkmrNpCJ7u2xANecqaaXgLNqvFYtHvyID1RWkAS TH+TGyk6vprYEs4RzvbNfS6RyGbl5TSC1U5a3baoUhShctsRHYaD4Cayoczqh5um wfqI843jEd2CfsxnRG/RM+4QKMwfx4G1OLsNGJtL+6SXe+L+EQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:subject :date:message-id:mime-version:content-type :content-transfer-encoding; q=dns; s=sasl; b=gmgUFM2R1yWN4vlbKX0 SGDkkKjQC+bjo27ADweBzelk1SPRUX86BWguB0+kBuRMoeNS2aybygf/O6fEbN/I 7RN4goQ3NUFBG1Qnu2J4AB6FonfmI6DhHgIKc/2mzQfZLKdHjh9uraHF4LeXJufN 42Rf2hfoWIW4i4xwlCRleaTI= Original-Received: from pb-sasl2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-sasl2.pobox.com (Postfix) with ESMTP id 4573CA6AE9 for ; Wed, 29 Nov 2017 03:14:57 -0500 (EST) Original-Received: from rusty (unknown [88.160.190.192]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by pb-sasl2.pobox.com (Postfix) with ESMTPSA id 6FAB7A6AE8 for ; Wed, 29 Nov 2017 03:14:56 -0500 (EST) X-Pobox-Relay-ID: 62A3467C-D4DD-11E7-8662-EA54894C8D7C-02397024!pb-sasl2.pobox.com X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 64.147.108.67 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.org gmane.lisp.guile.devel:19390 Archived-At: Hi, In the last update (https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00010.html) I talked a lot about fixnums. This last couple weeks' work is along the same lines. Firstly I added support for specialized instructions that compare (in the < or =3D sense) s64 and u64 values against 8-bit immediates. Then I started to do some "exploding" of the comparison operators: for example, if a number is a 64-bit integer, maybe first see if it's a fixnum and take a fast path there, otherwise call out to a scm->u64 primitive. I was thinking maybe we would be able to inline even scm->u64 entirely, but this turned out to be a bad idea, and indeed even the speculative inlining of the fixnum case of e.g. < wasn't that great. The problem is that if you replace a simple primcall (define x FOO) with a diamond control structure like (define x (if FIXNUM? FOO-1 FOO-2)) early in the compiler, that makes it harder to do code motion like CSE or LICM on X, as X now has two definitions. It could make sense to reify a diamond control flow more towards the compiler back-end, but not in the beginning. Instruction explosion only makes sense if some of the exploded instructions can be eliminated or are themselves subject to CSE. If it's just exploding instructions that can never be eliminated, probably it's better to call out to the runtime (as in the case of scm->u64, unless it can be reduced to untag-fixnum). By this time I had backed myself into a little corner with number specialization, so I had to back out and fix my mess; took a week or so, but then I was back on track, with a new idea.=20=20 The problem I was looking to solve was to hoist a fixnum? check above code like (=3D x (logand x #xff)). This predicate will only succeed if X is a fixnum. I then realized it would be tricky, as it would involve hoisting the assertion made inside logand that X is an exact integer "above" (but to where?), and it could be that the assertion doesn't commute with other effects so it can't be hoisted. So I modified the problem :) Instead I wanted to optimize this kind of code: (define (t x) (unless (exact-integer? x) (error "expected an integer" x)) (unless (=3D x (logand x #xff) (error "out of range" x)) (* x 2)) or (define (t x) (unless (exact-integer? x) (error "expected an integer" x)) (unless (<=3D -10 x 100) (error "out of range" x)) (* x 2)) Here exact-integer? is effectively (or (fixnum? x) (bignum? x)), so we have a more precise type of X flowing into the predicate. However it's not magical; the "=3D" in the first example still sees that X could be a bignum, and likewise for the <=3D operations. What you want to do here is to "devirtualize" part of this function. This is a compiler pass that effectively inlines the concrete implementations of a virtual operation like =3D, logand, or <=3D. But you don't want to devirtualize just one operation; you want to devirtualize a trace of operations. Like this: (define (out-of-range x) (error "out of range" x)) (define (not-int x) (error "expected an integer" x)) (cond ((fixnum? x) (if (<=3D -10 x 100) (* x 2) (out-of-range x))) ((bignum? x) (if (<=3D -10 x 100) (* x 2) (out-of-range x))) (else (not-int x))) This causes code growth initially, but probably causes shrinkage later; concretely this code optimizes to: (define (out-of-range x) (error "out of range" x)) (define (not-int x) (error "expected an integer" x)) (cond ((fixnum? x) (let ((x* (untag-fixnum x))) (if (s64<=3D -10 x*) (if (s64<=3D x* 100) (tag-fixnum (s64+ x* x*)) (out-of-range x)) (out-of-range x)))) ((bignum? x) (out-of-range x) (else (error "expected an integer" x))) which is indeed what we get in code: Disassembly of # at #xb437a0: 0 (assert-nargs-ee/locals 2 0) ;; 2 slots (1 arg) at (unknown= file):10:0 1 (immediate-tag=3D? 0 3 2) ;; fixnum? at (unkno= wn file):11:10 3 (jne 11) ;; -> L1 4 (untag-fixnum 1 0) at (unknown= file):12:10 5 (s64-imm L2 7 (imm-s64 L2 9 (mov 0 1) at (unknown= file):13:2 10 (uadd 1 0 1)=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20 11 (tag-fixnum 0 1)=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 12 (handle-interrupts)=20=20=20=20=20=20=20=20=20=20=20=20=20 13 (return-values 2) ;; 1 value L1: 14 (immediate-tag=3D? 0 7 0) ;; heap-object? at (unkno= wn file):11:10 16 (jne 8) ;; -> L3 17 (heap-tag=3D? 0 4095 279) ;; bignum? 19 (jne 5) ;; -> L3 L2: 20 (throw/value 0 138) ;; #(misc-error #f "out of range = ~S") at (unknown file):12:25 22 (handle-interrupts)=20=20=20=20=20=20=20=20=20=20=20=20=20 23 (return-values 1) ;; 0 values L3: 24 (throw/value 0 150) ;; #(misc-error #f "expected an i= nteger ~S") at (unknown file):11:29 26 (handle-interrupts)=20=20=20=20=20=20=20=20=20=20=20=20=20 27 (return-values 1) ;; 0 values Pretty good code I think, though I should probably improve the disassembler to make more readable disassemblies. I would be interested if anyone knows if other compilers do this. It would appear that GCC and LLVM do not: https://www.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAKxAEZS= BnVAV2OUxAHIBSAJgGY8AO2QAbZlgDU3fgGEGBfEIIA6BDOzcADAEFtOhcWbICkgEZ5gQ5gFsZA= IX37mQvKiGThBTMEzFpAOyOupKhYcxeAA4ExAD6pgBmeAAe1nb8wTph2ZKGxqYWVraSAFSFaQ5O= AQAilbrOru6eyj5%2B5pZpsQCG6Oi0khAubh5ercQAlHU6Q02jvv5RMfHxqLHlxRCLcQSTGU66C= l0EeMjNosKYkhHKABzxngyxSakbMyMt856T9UH6OZLETAEVgeTYqZ5paS8ABsklo42k/FqSLhU2= 4NX2BgIRxOZwuV0a728ny2yyITxSkM2ymi20kyW%2Bel%2BIRyb2axLagNMMmqgXsknBlOKPIGU= GuBFp8QRDMRshksjhCPRCv66ORmX%2BoS5IIBQLRNVCmMOx1OwnOQkubLmbVJBBWsWt/mpEqWpg= ZVQ1OTwCVFYtt0qV/E0Qck0N4Sphir%2Bmq1QJ1tvt6xsA3dexZMdCmFEDEw0YzOW1xCJkrt5Ih= G1TmTVRuxJrxFvZJck5Zs9ttAytHxtjPRnuyhdBzpLCLwgpeNkDwewqLTTNq9QOtdxZvxnY5/hb= 3V6/UGhPZYy%2BHrzYQA9CfJAB1S4AayEqAA7pIekpgHDJPf3GBOKZUAA3PwJKID6SAQCCXLaKj= Hv2cZFo2rptl2TqbuS7abBG/Lwvq856IuOKmkI5qWnujpPtuHbEYhh4/H2YQDg8FLjgMeCBgAYs= 2Qqts%2BO7MZIIDtEUnFkWhWGYmuB7PgALORwz7iSPbMlkBYwR4XEDKpEDqepzHjDpWGcOMpCiF= wACsnCkEIXBaGZqBcHKvCOPZuQsGwlx8PwtBmQQln6QZ14gBJACcKgBSFoVhWFhlcBJZkWZwVmk= DZnBmQwIBaKQXlxfppBwLASBoDYkR4KIfjkJQ%2BWFcVxAoKIXRCMAxlaGlSSiN4xApRAZjeaQF= hCF0xAAJ5cB5pD5TYmDKAA8gRg2ZaQWA2LVwDFV1%2BCAiYeD/ils2YMkmDIMw3hDWZoxGbNMR4= DY3kGecZgpZABmoNEwxbQAtBN/DJc57B0NdJkxV1iXJDc0IvdCUliItkjGSoWgwwMuCECQULuaQ= kiyKgBVFW0bnwp5V2%2Bf5QXhcToWRZw0XmQDXDJal6X42TvD/bNiV45lOmkP%2BbXDP5QA I.e. each individual fixnum op is inlined, but GCC and LLVM don't peel out a trace for the fixnum case. I call this pass "integer devirtualization" but also use the term "peeling a trace". Words! So that's my status. Next up, I need to fix <=3D and >=3D on NaN values, as I didn't get to that from last time; then I will move on to instruction explosion of vector-ref and related instructions. But first, I promised Ludovic I would fix the compiler speed issue in slot allocation, both for 2.2 and 3.0, so I will do that :) Cheers, Andy