* SHA256 performance with Guile 2.2 vs. Guile 3.0 @ 2020-01-03 23:55 Ludovic Courtès 2020-01-04 0:40 ` Ludovic Courtès 0 siblings, 1 reply; 13+ messages in thread From: Ludovic Courtès @ 2020-01-03 23:55 UTC (permalink / raw) To: Guile Devel; +Cc: Andy Wingo [-- Attachment #1: Type: text/plain, Size: 1640 bytes --] Hello Guilers! Göran Weinholt wrote a pure-Scheme (R6RS) implementation of common cryptographic hash functions: https://github.com/weinholt/hashing I thought this would make a useful benchmark (and probably favorable to JIT because it’s one hot loop), so here goes (computing the SHA256 hash of ‘guile-2.2.6.tar.xz’ with the code below, compiled with -O2, Guile 2.2.6 vs. 2.9.8): --8<---------------cut here---------------start------------->8--- ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile guile-hashing -- guile ~/tmp/sha256.scm ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") clock utime stime cutime cstime gctime 135.07 164.13 0.52 0.00 0.00 42.14 ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") clock utime stime cutime cstime gctime 65.17 89.75 0.45 0.00 0.00 35.63 --8<---------------cut here---------------end--------------->8--- Guile 3 is twice as fast! No difference if we force ‘-O3’. Still far from the libgcrypt implementation in C + asm, but hey! --8<---------------cut here---------------start------------->8--- ludo@ribbon ~/src/guix$ time guix hash -f hex "/gnu/store/zfp7d4wr5hbl5lrnzs8c15bsc3ygq74g-guile-2.2.6.tar.xz" b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988 real 0m0.097s user 0m0.093s sys 0m0.024s --8<---------------cut here---------------end--------------->8--- To be continued… Ludo’. [-- Attachment #2: The code --] [-- Type: text/plain, Size: 756 bytes --] (use-modules (hashing sha-2) (rnrs bytevectors) (ice-9 binary-ports) (ice-9 match) (ice-9 time)) (define (port-sha256 port) ;; Return the SHA256 of the data read from PORT. (define bv (make-bytevector 65536)) (define hash (make-sha-256)) (let loop () (match (get-bytevector-n! port bv 0 (bytevector-length bv)) ((? eof-object?) (sha-256-finish! hash) hash) (n (sha-256-update! hash bv 0 n) (loop))))) (define (file-sha256 file) ;; Return the SHA256 of FILE. (call-with-input-file file port-sha256)) (time (pk 'hash (sha-256->string (file-sha256 "/gnu/store/zfp7d4wr5hbl5lrnzs8c15bsc3ygq74g-guile-2.2.6.tar.xz")))) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-03 23:55 SHA256 performance with Guile 2.2 vs. Guile 3.0 Ludovic Courtès @ 2020-01-04 0:40 ` Ludovic Courtès 2020-01-04 10:28 ` Göran Weinholt 2020-01-05 9:48 ` Andy Wingo 0 siblings, 2 replies; 13+ messages in thread From: Ludovic Courtès @ 2020-01-04 0:40 UTC (permalink / raw) To: Guile Devel; +Cc: Andy Wingo [-- Attachment #1: Type: text/plain, Size: 1063 bytes --] Ludovic Courtès <ludo@gnu.org> skribis: > ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm > > ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") > clock utime stime cutime cstime gctime > 65.17 89.75 0.45 0.00 0.00 35.63 The patch below gives us: --8<---------------cut here---------------start------------->8--- ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") clock utime stime cutime cstime gctime 59.31 80.65 0.39 0.00 0.00 30.73 --8<---------------cut here---------------end--------------->8--- It’s a disappointingly small improvement. The reason is that (hashing fixnums) adds another layer of opacity, where it ends up doing essentially: (define fx32xor fxxor) … Thus, no inlining, and no easy trick to solve that. :-/ Anyhow, I think the patch is probably a good idea. WDYT? Thanks, Ludo’. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Type: text/x-patch, Size: 8653 bytes --] diff --git a/module/rnrs/arithmetic/fixnums.scm b/module/rnrs/arithmetic/fixnums.scm index 4ec1cae0c..c30807eb5 100644 --- a/module/rnrs/arithmetic/fixnums.scm +++ b/module/rnrs/arithmetic/fixnums.scm @@ -1,6 +1,6 @@ ;;; fixnums.scm --- The R6RS fixnums arithmetic library -;; Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc. +;; Copyright (C) 2010, 2011, 2013, 2020 Free Software Foundation, Inc. ;; ;; This library is free software; you can redistribute it and/or ;; modify it under the terms of the GNU Lesser General Public @@ -75,25 +75,26 @@ fxrotate-bit-field fxreverse-bit-field) (import (only (guile) ash - cons* - define-inlinable - inexact->exact - logand - logbit? - logcount - logior - lognot - logxor - most-positive-fixnum - most-negative-fixnum - object-address) + cons* + define-inlinable + inexact->exact + logand + logbit? + logcount + logior + lognot + logxor + most-positive-fixnum + most-negative-fixnum + object-address) (ice-9 optargs) (rnrs base (6)) (rnrs control (6)) (rnrs arithmetic bitwise (6)) (rnrs conditions (6)) (rnrs exceptions (6)) - (rnrs lists (6))) + (rnrs lists (6)) + (rnrs syntax-case (6))) (define fixnum-width (let ((w (do ((i 0 (+ 1 i)) @@ -121,70 +122,105 @@ (or (for-all inline-fixnum? args) (raise (make-assertion-violation)))) (define-syntax define-fxop* + (lambda (s) + (syntax-case s () + ((_ name op) + (with-syntax ((proc (datum->syntax + #'name + (string->symbol + (string-append "%" + (symbol->string + (syntax->datum #'name)) + "-proc"))))) + #'(begin + ;; Define a procedure for when the inline case doesn't + ;; apply. + (define proc + (case-lambda + ((x y) + (assert-fixnum x y) + (op x y)) + (args + (assert-fixnums args) + (apply op args)))) + + (define-syntax name + (lambda (s) + (syntax-case s () + ((_ args (... ...)) + #'(begin + (assert-fixnum args (... ...)) + (op args (... ...)))) + (x + (identifier? #'x) + #'proc)))))))))) + + (define-syntax define-alias (syntax-rules () - ((_ name op) - (define name - (case-lambda - ((x y) - (assert-fixnum x y) - (op x y)) - (args - (assert-fixnums args) - (apply op args))))))) + ((_ new old) + (define-syntax new (identifier-syntax old))))) ;; All these predicates don't check their arguments for fixnum-ness, ;; as this doesn't seem to be strictly required by R6RS. - (define fx=? =) - (define fx>? >) - (define fx<? <) - (define fx>=? >=) - (define fx<=? <=) + (define-alias fx=? =) + (define-alias fx>? >) + (define-alias fx<? <) + (define-alias fx>=? >=) + (define-alias fx<=? <=) - (define fxzero? zero?) - (define fxpositive? positive?) - (define fxnegative? negative?) - (define fxodd? odd?) - (define fxeven? even?) + (define-alias fxzero? zero?) + (define-alias fxpositive? positive?) + (define-alias fxnegative? negative?) + (define-alias fxodd? odd?) + (define-alias fxeven? even?) (define-fxop* fxmax max) (define-fxop* fxmin min) - (define (fx+ fx1 fx2) + (define-inlinable (fx+ fx1 fx2) (assert-fixnum fx1 fx2) (let ((r (+ fx1 fx2))) (or (inline-fixnum? r) (raise (make-implementation-restriction-violation))) r)) - (define (fx* fx1 fx2) + (define-inlinable (fx* fx1 fx2) (assert-fixnum fx1 fx2) (let ((r (* fx1 fx2))) (or (inline-fixnum? r) (raise (make-implementation-restriction-violation))) r)) - (define* (fx- fx1 #:optional fx2) - (assert-fixnum fx1) - (if fx2 - (begin - (assert-fixnum fx2) - (let ((r (- fx1 fx2))) - (or (inline-fixnum? r) (raise (make-assertion-violation))) - r)) - (let ((r (- fx1))) - (or (inline-fixnum? r) (raise (make-assertion-violation))) - r))) - - (define (fxdiv fx1 fx2) + (define-syntax fx- + (lambda (s) + (syntax-case s () + ((_ fx) + #'(begin + (assert-fixnum fx) + (let ((r (- fx))) + (unless (inline-fixnum? r) (raise (make-assertion-violation))) + (- fx)))) + ((_ fx1 fx2) + #'(begin + (assert-fixnum fx1) + (assert-fixnum fx2) + (let ((r (- fx1 fx2))) + (unless (inline-fixnum? r) (raise (make-assertion-violation))) + r))) + (x + (identifier? #'x) + #'-)))) + + (define-inlinable (fxdiv fx1 fx2) (assert-fixnum fx1 fx2) (div fx1 fx2)) - (define (fxmod fx1 fx2) + (define-inlinable (fxmod fx1 fx2) (assert-fixnum fx1 fx2) (mod fx1 fx2)) - (define (fxdiv-and-mod fx1 fx2) + (define-inlinable (fxdiv-and-mod fx1 fx2) (assert-fixnum fx1 fx2) (div-and-mod fx1 fx2)) @@ -221,71 +257,71 @@ (s1 (div0 s (expt 2 (fixnum-width))))) (values s0 s1))) - (define (fxnot fx) (assert-fixnum fx) (lognot fx)) + (define-inlinable (fxnot fx) (assert-fixnum fx) (lognot fx)) (define-fxop* fxand logand) (define-fxop* fxior logior) (define-fxop* fxxor logxor) - (define (fxif fx1 fx2 fx3) + (define-inlinable (fxif fx1 fx2 fx3) (assert-fixnum fx1 fx2 fx3) (bitwise-if fx1 fx2 fx3)) - (define (fxbit-count fx) + (define-inlinable (fxbit-count fx) (assert-fixnum fx) (if (negative? fx) (bitwise-not (logcount fx)) (logcount fx))) - (define (fxlength fx) (assert-fixnum fx) (bitwise-length fx)) - (define (fxfirst-bit-set fx) (assert-fixnum fx) (bitwise-first-bit-set fx)) - (define (fxbit-set? fx1 fx2) (assert-fixnum fx1 fx2) (logbit? fx2 fx1)) + (define-inlinable (fxlength fx) (assert-fixnum fx) (bitwise-length fx)) + (define-inlinable (fxfirst-bit-set fx) (assert-fixnum fx) (bitwise-first-bit-set fx)) + (define-inlinable (fxbit-set? fx1 fx2) (assert-fixnum fx1 fx2) (logbit? fx2 fx1)) - (define (fxcopy-bit fx1 fx2 fx3) + (define-inlinable (fxcopy-bit fx1 fx2 fx3) (assert-fixnum fx1 fx2 fx3) (unless (and (<= 0 fx2) (< fx2 (fixnum-width))) (raise (make-assertion-violation))) (bitwise-copy-bit fx1 fx2 fx3)) - (define (fxbit-field fx1 fx2 fx3) + (define-inlinable (fxbit-field fx1 fx2 fx3) (assert-fixnum fx1 fx2 fx3) (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width))) (raise (make-assertion-violation))) (bitwise-bit-field fx1 fx2 fx3)) - (define (fxcopy-bit-field fx1 fx2 fx3 fx4) + (define-inlinable (fxcopy-bit-field fx1 fx2 fx3 fx4) (assert-fixnum fx1 fx2 fx3 fx4) (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width))) (raise (make-assertion-violation))) (bitwise-copy-bit-field fx1 fx2 fx3 fx4)) - (define (fxarithmetic-shift fx1 fx2) + (define-inlinable (fxarithmetic-shift fx1 fx2) (assert-fixnum fx1 fx2) (unless (< (abs fx2) (fixnum-width)) (raise (make-assertion-violation))) (ash fx1 fx2)) - (define (fxarithmetic-shift-left fx1 fx2) + (define-inlinable (fxarithmetic-shift-left fx1 fx2) (assert-fixnum fx1 fx2) (unless (and (<= 0 fx2) (< fx2 (fixnum-width))) (raise (make-assertion-violation))) (ash fx1 fx2)) - (define (fxarithmetic-shift-right fx1 fx2) + (define-inlinable (fxarithmetic-shift-right fx1 fx2) (assert-fixnum fx1 fx2) (unless (and (<= 0 fx2) (< fx2 (fixnum-width))) (raise (make-assertion-violation))) (ash fx1 (- fx2))) - (define (fxrotate-bit-field fx1 fx2 fx3 fx4) + (define-inlinable (fxrotate-bit-field fx1 fx2 fx3 fx4) (assert-fixnum fx1 fx2 fx3 fx4) (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width)) (< fx4 (- fx3 fx2))) (raise (make-assertion-violation))) (bitwise-rotate-bit-field fx1 fx2 fx3 fx4)) - (define (fxreverse-bit-field fx1 fx2 fx3) + (define-inlinable (fxreverse-bit-field fx1 fx2 fx3) (assert-fixnum fx1 fx2 fx3) (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width))) (raise (make-assertion-violation))) (bitwise-reverse-bit-field fx1 fx2 fx3)) -) + ) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-04 0:40 ` Ludovic Courtès @ 2020-01-04 10:28 ` Göran Weinholt 2020-01-04 15:51 ` Nala Ginrut ` (2 more replies) 2020-01-05 9:48 ` Andy Wingo 1 sibling, 3 replies; 13+ messages in thread From: Göran Weinholt @ 2020-01-04 10:28 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Andy Wingo, Guile Devel Ludovic Courtès <ludo@gnu.org> writes: > Ludovic Courtès <ludo@gnu.org> skribis: > >> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm >> >> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") >> clock utime stime cutime cstime gctime >> 65.17 89.75 0.45 0.00 0.00 35.63 > > The patch below gives us: > > ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm > > ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") > clock utime stime cutime cstime gctime > 59.31 80.65 0.39 0.00 0.00 30.73 > > It’s a disappointingly small improvement. The reason is that (hashing > fixnums) adds another layer of opacity, where it ends up doing > essentially: > > (define fx32xor fxxor) > … > > Thus, no inlining, and no easy trick to solve that. :-/ I've pushed a Guile-specific version of (hashing fixnums) that inlines the generic arithmetic procedures. This and some other small changes improved the runtime: clock utime stime cutime cstime gctime before: 2.2.6 31.06 32.61 0.03 0.00 0.00 1.38 2.9.8 15.55 16.23 0.01 0.00 0.00 1.19 after: 2.2.6 2.95 3.01 0.00 0.00 0.00 0.10 2.9.8 1.98 1.99 0.00 0.00 0.00 0.08 That's about 100 times slower than sha256sum from coreutils. You might get some more performance out of it by unrolling the loop in sha-256-transform!. Regards, -- Göran Weinholt | https://weinholt.se/ Debian developer | 73 de SA6CJK ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-04 10:28 ` Göran Weinholt @ 2020-01-04 15:51 ` Nala Ginrut 2020-01-04 21:39 ` Arne Babenhauserheide 2020-01-06 9:37 ` Ludovic Courtès 2 siblings, 0 replies; 13+ messages in thread From: Nala Ginrut @ 2020-01-04 15:51 UTC (permalink / raw) To: Göran Weinholt; +Cc: Andy Wingo, Ludovic Courtès, Guile Devel [-- Attachment #1: Type: text/plain, Size: 2035 bytes --] Congrats! I just replaced Weinholt's hmac implementation with NSS binding for product consideration, but nice to know this great result! And thanks to Weinholt's work, Artanis had been relying on it for many years. Best regards. On Sat, Jan 4, 2020, 18:36 Göran Weinholt <goran@weinholt.se> wrote: > Ludovic Courtès <ludo@gnu.org> writes: > > > Ludovic Courtès <ludo@gnu.org> skribis: > > > >> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure > --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm > >> > >> ;;; (hash > "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") > >> clock utime stime cutime cstime gctime > >> 65.17 89.75 0.45 0.00 0.00 35.63 > > > > The patch below gives us: > > > > ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm > > > > ;;; (hash > "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") > > clock utime stime cutime cstime gctime > > 59.31 80.65 0.39 0.00 0.00 30.73 > > > > It’s a disappointingly small improvement. The reason is that (hashing > > fixnums) adds another layer of opacity, where it ends up doing > > essentially: > > > > (define fx32xor fxxor) > > … > > > > Thus, no inlining, and no easy trick to solve that. :-/ > > I've pushed a Guile-specific version of (hashing fixnums) that inlines > the generic arithmetic procedures. This and some other small changes > improved the runtime: > > clock utime stime cutime cstime gctime > before: > 2.2.6 31.06 32.61 0.03 0.00 0.00 1.38 > 2.9.8 15.55 16.23 0.01 0.00 0.00 1.19 > after: > 2.2.6 2.95 3.01 0.00 0.00 0.00 0.10 > 2.9.8 1.98 1.99 0.00 0.00 0.00 0.08 > > That's about 100 times slower than sha256sum from coreutils. You might > get some more performance out of it by unrolling the loop in > sha-256-transform!. > > Regards, > > -- > Göran Weinholt | https://weinholt.se/ > Debian developer | 73 de SA6CJK > > [-- Attachment #2: Type: text/html, Size: 2890 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-04 10:28 ` Göran Weinholt 2020-01-04 15:51 ` Nala Ginrut @ 2020-01-04 21:39 ` Arne Babenhauserheide 2020-01-05 14:28 ` Göran Weinholt 2020-01-06 9:37 ` Ludovic Courtès 2 siblings, 1 reply; 13+ messages in thread From: Arne Babenhauserheide @ 2020-01-04 21:39 UTC (permalink / raw) To: guile-devel; +Cc: Andy Wingo, Ludovic Courtès Göran Weinholt <goran@weinholt.se> writes: > I've pushed a Guile-specific version of (hashing fixnums) that inlines > the generic arithmetic procedures. This and some other small changes > improved the runtime: > > clock utime stime cutime cstime gctime > before: > 2.2.6 31.06 32.61 0.03 0.00 0.00 1.38 > 2.9.8 15.55 16.23 0.01 0.00 0.00 1.19 > after: > 2.2.6 2.95 3.01 0.00 0.00 0.00 0.10 > 2.9.8 1.98 1.99 0.00 0.00 0.00 0.08 > > That's about 100 times slower than sha256sum from coreutils. You might > get some more performance out of it by unrolling the loop in > sha-256-transform!. I’d like to highlight the understatement: The new version is about factor 10 faster on 2.2.6 and factor 8 faster on 2.9.8 (and the new version on Guile 2.9.8 is factor 15 faster than the old version with the old Guile). That’s an awesome speedup! (can you show the change/patch that did this? Sidenote: It would be great to have a list of performance hints for Guile) Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-04 21:39 ` Arne Babenhauserheide @ 2020-01-05 14:28 ` Göran Weinholt 0 siblings, 0 replies; 13+ messages in thread From: Göran Weinholt @ 2020-01-05 14:28 UTC (permalink / raw) To: Arne Babenhauserheide; +Cc: Andy Wingo, Ludovic Courtès, guile-devel Arne Babenhauserheide <arne_bab@web.de> writes: > Göran Weinholt <goran@weinholt.se> writes: >> I've pushed a Guile-specific version of (hashing fixnums) that inlines >> the generic arithmetic procedures. This and some other small changes >> improved the runtime: >> >> clock utime stime cutime cstime gctime >> before: >> 2.2.6 31.06 32.61 0.03 0.00 0.00 1.38 >> 2.9.8 15.55 16.23 0.01 0.00 0.00 1.19 >> after: >> 2.2.6 2.95 3.01 0.00 0.00 0.00 0.10 >> 2.9.8 1.98 1.99 0.00 0.00 0.00 0.08 >> >> That's about 100 times slower than sha256sum from coreutils. You might >> get some more performance out of it by unrolling the loop in >> sha-256-transform!. > > I’d like to highlight the understatement: The new version is about > factor 10 faster on 2.2.6 and factor 8 faster on 2.9.8 (and the new > version on Guile 2.9.8 is factor 15 faster than the old version with the > old Guile). > > That’s an awesome speedup! > > (can you show the change/patch that did this? Sidenote: It would be > great to have a list of performance hints for Guile) The commit is here: https://github.com/weinholt/hashing/commit/eb28080180b5fdfb6ffc74f8cdf2c1a7823ef0cb The relevant changes are: * The define-fx macro previously returned: #'(define name (if (> (fixnum-width) bit-width) fxname bitwise-name)) This was bad for three reasons: Guile does not do constant folding on (fixnum-width); in Guile the fixnum procedures are slower than the generic procedures; and Guile 2.2 did not inline those defines. It now returns: #'(define-syntax name (identifier-syntax bitwise-name)) So the generic procedures like +, bitwise-and, etc are used instead of fx+ and fxand. The use of define-syntax and identifier-syntax is basically manual inlining for the benefit of Guile 2.2. * I replaced some instances of fixnum operations with the ones from (hashing fixnums), which benefit from the above. * I changed some comparisons with constant fixnums to use eqv? instead of fx=?, e.g. (eqv? i 8) instead of (fx=? i 8). This is commonly faster, not only in Guile, because in implementations with immediate fixnums it can be done with only a simple compare + branch. I don't know if Guile takes advantage of this fact, but it's still faster than fx=?. Regards, -- Göran Weinholt | https://weinholt.se/ Debian developer | 73 de SA6CJK ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-04 10:28 ` Göran Weinholt 2020-01-04 15:51 ` Nala Ginrut 2020-01-04 21:39 ` Arne Babenhauserheide @ 2020-01-06 9:37 ` Ludovic Courtès 2 siblings, 0 replies; 13+ messages in thread From: Ludovic Courtès @ 2020-01-06 9:37 UTC (permalink / raw) To: Göran Weinholt; +Cc: Andy Wingo, Guile Devel Hi Göran, Göran Weinholt <goran@weinholt.se> skribis: > I've pushed a Guile-specific version of (hashing fixnums) that inlines > the generic arithmetic procedures. This and some other small changes > improved the runtime: > > clock utime stime cutime cstime gctime > before: > 2.2.6 31.06 32.61 0.03 0.00 0.00 1.38 > 2.9.8 15.55 16.23 0.01 0.00 0.00 1.19 > after: > 2.2.6 2.95 3.01 0.00 0.00 0.00 0.10 > 2.9.8 1.98 1.99 0.00 0.00 0.00 0.08 Excellent! > That's about 100 times slower than sha256sum from coreutils. You might > get some more performance out of it by unrolling the loop in > sha-256-transform!. I suppose that’s a challenge for Andy. ;-) Thanks, Ludo’. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-04 0:40 ` Ludovic Courtès 2020-01-04 10:28 ` Göran Weinholt @ 2020-01-05 9:48 ` Andy Wingo 2020-01-06 9:47 ` Ludovic Courtès 1 sibling, 1 reply; 13+ messages in thread From: Andy Wingo @ 2020-01-05 9:48 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Guile Devel On Sat 04 Jan 2020 01:40, Ludovic Courtès <ludo@gnu.org> writes: > Ludovic Courtès <ludo@gnu.org> skribis: > >> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm >> >> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") >> clock utime stime cutime cstime gctime >> 65.17 89.75 0.45 0.00 0.00 35.63 > > (define fx32xor fxxor) > … From a speed perspective I think there is one major issue and one minor issue. The major issue is that we don't do cross-module inlining. But now that we have declarative modules, this is a possibility: https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00026.html https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html With cross-module inlining of "small" definitions, I think we would solve a lot of this kind of problem. I think we could add this during 3.0 and for this reason I would hesitate to apply this patch for 3.0 because it changes "fx+" exports to be macros rather than "normal" values in the ABI. WDYT? The minor issue, at least relatively speaking, is that IMO the (rnrs arithmetic fixnums) API is not appropriate for bitwise operations. When you do bitwise operations and you want to ensure that you're within some fixed domain, it's best to do e.g. "(logand x #xffffffff)" on operands and results. Guile will optimize this well. The good optimization isn't fixnum vs other kinds of numbers, it's unboxing to raw unsigned integers; and you usually want to exclude negative numbers. fx+ doesn't help with that. Cheers, Andy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-05 9:48 ` Andy Wingo @ 2020-01-06 9:47 ` Ludovic Courtès 2020-01-06 19:52 ` Andy Wingo 0 siblings, 1 reply; 13+ messages in thread From: Ludovic Courtès @ 2020-01-06 9:47 UTC (permalink / raw) To: Andy Wingo; +Cc: Guile Devel Hello, Andy Wingo <wingo@pobox.com> skribis: > On Sat 04 Jan 2020 01:40, Ludovic Courtès <ludo@gnu.org> writes: > >> Ludovic Courtès <ludo@gnu.org> skribis: >> >>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm >>> >>> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988") >>> clock utime stime cutime cstime gctime >>> 65.17 89.75 0.45 0.00 0.00 35.63 >> >> (define fx32xor fxxor) >> … > > From a speed perspective I think there is one major issue and one minor > issue. > > The major issue is that we don't do cross-module inlining. But now that > we have declarative modules, this is a possibility: > > https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00026.html > https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html Neat. Storing Tree-IL in object files reminds me of LTO in GCC & co. > With cross-module inlining of "small" definitions, I think we would > solve a lot of this kind of problem. I think we could add this during > 3.0 and for this reason I would hesitate to apply this patch for 3.0 > because it changes "fx+" exports to be macros rather than "normal" > values in the ABI. WDYT? I agree that cross-module inlining is the better fix whereas this patch is the immediate workaround. Are you confident that cross-module inlining can happen be added without introducing incompatibilities over in the 3.0 series? (At first sight it seems tricky to me, notably because we’d have to store Tree-IL in object files, which introduces compatibility and thus external representation versioning considerations.) If you do, then it’s fine to drop this patch. If conversely cross-module inlining might take longer, then we can have this patch in and drop it in 3.2. Your call! (I guess I’m not being that helpful here. :-)) > The minor issue, at least relatively speaking, is that IMO the (rnrs > arithmetic fixnums) API is not appropriate for bitwise operations. When > you do bitwise operations and you want to ensure that you're within some > fixed domain, it's best to do e.g. "(logand x #xffffffff)" on operands > and results. Guile will optimize this well. The good optimization > isn't fixnum vs other kinds of numbers, it's unboxing to raw unsigned > integers; and you usually want to exclude negative numbers. fx+ doesn't > help with that. I agree, of course. The tragedy is that people would use this (clumsy) API in the hope of getting better performance, and the result ends up being worse performance, at least on today’s Guile (perhaps also on other implementations?). Thanks, Ludo’. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-06 9:47 ` Ludovic Courtès @ 2020-01-06 19:52 ` Andy Wingo 2020-01-07 11:08 ` Ludovic Courtès 0 siblings, 1 reply; 13+ messages in thread From: Andy Wingo @ 2020-01-06 19:52 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Andy Wingo, Guile Devel On Mon 06 Jan 2020 10:47, Ludovic Courtès <ludo@gnu.org> writes: > Andy Wingo <wingo@pobox.com> skribis: > >> With cross-module inlining of "small" definitions, I think we would >> solve a lot of this kind of problem. I think we could add this during >> 3.0 and for this reason I would hesitate to apply this patch for 3.0 >> because it changes "fx+" exports to be macros rather than "normal" >> values in the ABI. WDYT? > > I agree that cross-module inlining is the better fix whereas this patch > is the immediate workaround. > > Are you confident that cross-module inlining can happen be added without > introducing incompatibilities over in the 3.0 series? (At first sight > it seems tricky to me, notably because we’d have to store Tree-IL in > object files, which introduces compatibility and thus external > representation versioning considerations.) Concretely I would add a little part of the compiler to the Tree-IL phase to serialize a bytecode for the "small" definitions in the module, for declarative modules, both public and private (because public definitions may alias private definitions). This would be stored as a bytevector in an additional field of the module, and the program being compiled would be transformed to initialize the "lto" field (placeholder name) of the module, so that once the compiled module is loaded, we have the inlinable bindings. I think this can be done compatibly. > If you do, then it’s fine to drop this patch. If conversely > cross-module inlining might take longer, then we can have this patch in > and drop it in 3.2. Your call! (I guess I’m not being that helpful > here. :-)) :) I hesitate to land this patch because we haven't shown that it significantly helps things, it would need to be undone, and it makes the ABI more fragile. So if that's OK let's do nothing :) Cheers, Andy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-06 19:52 ` Andy Wingo @ 2020-01-07 11:08 ` Ludovic Courtès 2020-01-07 21:45 ` Andy Wingo 0 siblings, 1 reply; 13+ messages in thread From: Ludovic Courtès @ 2020-01-07 11:08 UTC (permalink / raw) To: Andy Wingo; +Cc: Guile Devel Hi! Andy Wingo <wingo@pobox.com> skribis: > On Mon 06 Jan 2020 10:47, Ludovic Courtès <ludo@gnu.org> writes: > >> Andy Wingo <wingo@pobox.com> skribis: >> >>> With cross-module inlining of "small" definitions, I think we would >>> solve a lot of this kind of problem. I think we could add this during >>> 3.0 and for this reason I would hesitate to apply this patch for 3.0 >>> because it changes "fx+" exports to be macros rather than "normal" >>> values in the ABI. WDYT? >> >> I agree that cross-module inlining is the better fix whereas this patch >> is the immediate workaround. >> >> Are you confident that cross-module inlining can happen be added without >> introducing incompatibilities over in the 3.0 series? (At first sight >> it seems tricky to me, notably because we’d have to store Tree-IL in >> object files, which introduces compatibility and thus external >> representation versioning considerations.) > > Concretely I would add a little part of the compiler to the Tree-IL > phase to serialize a bytecode for the "small" definitions in the module, > for declarative modules, both public and private (because public > definitions may alias private definitions). This would be stored as a > bytevector in an additional field of the module, and the program being > compiled would be transformed to initialize the "lto" field (placeholder > name) of the module, so that once the compiled module is loaded, we have > the inlinable bindings. I think this can be done compatibly. OK, sounds great. What are your thoughts about versioning that wire Tree-IL representation? >> If you do, then it’s fine to drop this patch. If conversely >> cross-module inlining might take longer, then we can have this patch in >> and drop it in 3.2. Your call! (I guess I’m not being that helpful >> here. :-)) > > :) > > I hesitate to land this patch because we haven't shown that it > significantly helps things, it would need to be undone, and it makes the > ABI more fragile. So if that's OK let's do nothing :) Alright, fine with me! Thanks, Ludo’. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-07 11:08 ` Ludovic Courtès @ 2020-01-07 21:45 ` Andy Wingo 2020-01-07 22:59 ` Ludovic Courtès 0 siblings, 1 reply; 13+ messages in thread From: Andy Wingo @ 2020-01-07 21:45 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Andy Wingo, Guile Devel On Tue 07 Jan 2020 12:08, Ludovic Courtès <ludo@gnu.org> writes: > Andy Wingo <wingo@pobox.com> skribis: > >> Concretely I would add a little part of the compiler to the Tree-IL >> phase to serialize a bytecode for the "small" definitions in the module, >> for declarative modules, both public and private (because public >> definitions may alias private definitions). This would be stored as a >> bytevector in an additional field of the module, and the program being >> compiled would be transformed to initialize the "lto" field (placeholder >> name) of the module, so that once the compiled module is loaded, we have >> the inlinable bindings. I think this can be done compatibly. > > OK, sounds great. What are your thoughts about versioning that wire > Tree-IL representation? It would be a little bytecode language, with its own versioning considerations. It would need to have a translation to and from Tree-IL, though not necessarily lossless. It would change only in ABI-compatible ways, using the bytecode version of the Guile doing the compilation as a proxy for what is OK to support. A ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0 2020-01-07 21:45 ` Andy Wingo @ 2020-01-07 22:59 ` Ludovic Courtès 0 siblings, 0 replies; 13+ messages in thread From: Ludovic Courtès @ 2020-01-07 22:59 UTC (permalink / raw) To: Andy Wingo; +Cc: Guile Devel Andy Wingo <wingo@pobox.com> skribis: > On Tue 07 Jan 2020 12:08, Ludovic Courtès <ludo@gnu.org> writes: > >> Andy Wingo <wingo@pobox.com> skribis: >> >>> Concretely I would add a little part of the compiler to the Tree-IL >>> phase to serialize a bytecode for the "small" definitions in the module, >>> for declarative modules, both public and private (because public >>> definitions may alias private definitions). This would be stored as a >>> bytevector in an additional field of the module, and the program being >>> compiled would be transformed to initialize the "lto" field (placeholder >>> name) of the module, so that once the compiled module is loaded, we have >>> the inlinable bindings. I think this can be done compatibly. >> >> OK, sounds great. What are your thoughts about versioning that wire >> Tree-IL representation? > > It would be a little bytecode language, with its own versioning > considerations. It would need to have a translation to and from > Tree-IL, though not necessarily lossless. It would change only in > ABI-compatible ways, using the bytecode version of the Guile doing the > compilation as a proxy for what is OK to support. I see, that makes a lot of sense to me. Thanks for explaining! Ludo’. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-01-07 22:59 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-03 23:55 SHA256 performance with Guile 2.2 vs. Guile 3.0 Ludovic Courtès 2020-01-04 0:40 ` Ludovic Courtès 2020-01-04 10:28 ` Göran Weinholt 2020-01-04 15:51 ` Nala Ginrut 2020-01-04 21:39 ` Arne Babenhauserheide 2020-01-05 14:28 ` Göran Weinholt 2020-01-06 9:37 ` Ludovic Courtès 2020-01-05 9:48 ` Andy Wingo 2020-01-06 9:47 ` Ludovic Courtès 2020-01-06 19:52 ` Andy Wingo 2020-01-07 11:08 ` Ludovic Courtès 2020-01-07 21:45 ` Andy Wingo 2020-01-07 22:59 ` Ludovic Courtès
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).