unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* SHA256 performance with Guile 2.2 vs. Guile 3.0
@ 2020-01-03 23:55 Ludovic Courtès
  2020-01-04  0:40 ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-01-03 23:55 UTC (permalink / raw)
  To: Guile Devel; +Cc: Andy Wingo

[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]

Hello Guilers!

Göran Weinholt wrote a pure-Scheme (R6RS) implementation of common
cryptographic hash functions:

  https://github.com/weinholt/hashing

I thought this would make a useful benchmark (and probably favorable to
JIT because it’s one hot loop), so here goes (computing the SHA256 hash
of ‘guile-2.2.6.tar.xz’ with the code below, compiled with -O2, Guile
2.2.6 vs. 2.9.8):

--8<---------------cut here---------------start------------->8---
ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile guile-hashing -- guile ~/tmp/sha256.scm

;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
clock utime stime cutime cstime gctime
135.07 164.13  0.52   0.00   0.00  42.14
ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm

;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
clock utime stime cutime cstime gctime
65.17 89.75  0.45   0.00   0.00  35.63
--8<---------------cut here---------------end--------------->8---

Guile 3 is twice as fast!  No difference if we force ‘-O3’.

Still far from the libgcrypt implementation in C + asm, but hey!

--8<---------------cut here---------------start------------->8---
ludo@ribbon ~/src/guix$ time guix hash -f hex "/gnu/store/zfp7d4wr5hbl5lrnzs8c15bsc3ygq74g-guile-2.2.6.tar.xz"
b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988

real	0m0.097s
user	0m0.093s
sys	0m0.024s
--8<---------------cut here---------------end--------------->8---

To be continued…

Ludo’.


[-- Attachment #2: The code --]
[-- Type: text/plain, Size: 756 bytes --]

(use-modules (hashing sha-2)
             (rnrs bytevectors)
             (ice-9 binary-ports)
             (ice-9 match)
             (ice-9 time))

(define (port-sha256 port)
  ;; Return the SHA256 of the data read from PORT.
  (define bv (make-bytevector 65536))
  (define hash (make-sha-256))

  (let loop ()
    (match (get-bytevector-n! port bv 0
                              (bytevector-length bv))
      ((? eof-object?)
       (sha-256-finish! hash)
       hash)
      (n
       (sha-256-update! hash bv 0 n)
       (loop)))))

(define (file-sha256 file)
  ;; Return the SHA256 of FILE.
  (call-with-input-file file port-sha256))

(time (pk 'hash (sha-256->string (file-sha256 "/gnu/store/zfp7d4wr5hbl5lrnzs8c15bsc3ygq74g-guile-2.2.6.tar.xz"))))

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-03 23:55 SHA256 performance with Guile 2.2 vs. Guile 3.0 Ludovic Courtès
@ 2020-01-04  0:40 ` Ludovic Courtès
  2020-01-04 10:28   ` Göran Weinholt
  2020-01-05  9:48   ` Andy Wingo
  0 siblings, 2 replies; 13+ messages in thread
From: Ludovic Courtès @ 2020-01-04  0:40 UTC (permalink / raw)
  To: Guile Devel; +Cc: Andy Wingo

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>
> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> clock utime stime cutime cstime gctime
> 65.17 89.75  0.45   0.00   0.00  35.63

The patch below gives us:

--8<---------------cut here---------------start------------->8---
ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm

;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
clock utime stime cutime cstime gctime
59.31 80.65  0.39   0.00   0.00  30.73
--8<---------------cut here---------------end--------------->8---

It’s a disappointingly small improvement.  The reason is that (hashing
fixnums) adds another layer of opacity, where it ends up doing
essentially:

  (define fx32xor fxxor)
  …

Thus, no inlining, and no easy trick to solve that.  :-/

Anyhow, I think the patch is probably a good idea.  WDYT?

Thanks,
Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 8653 bytes --]

diff --git a/module/rnrs/arithmetic/fixnums.scm b/module/rnrs/arithmetic/fixnums.scm
index 4ec1cae0c..c30807eb5 100644
--- a/module/rnrs/arithmetic/fixnums.scm
+++ b/module/rnrs/arithmetic/fixnums.scm
@@ -1,6 +1,6 @@
 ;;; fixnums.scm --- The R6RS fixnums arithmetic library
 
-;;      Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc.
+;;      Copyright (C) 2010, 2011, 2013, 2020 Free Software Foundation, Inc.
 ;;
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -75,25 +75,26 @@
 	  fxrotate-bit-field
 	  fxreverse-bit-field)
   (import (only (guile) ash
-		        cons*
-			define-inlinable
-			inexact->exact
-			logand
-			logbit?
-			logcount
-			logior
-			lognot
-			logxor
-			most-positive-fixnum 
-			most-negative-fixnum
-			object-address)
+		cons*
+		define-inlinable
+		inexact->exact
+		logand
+		logbit?
+		logcount
+		logior
+		lognot
+		logxor
+		most-positive-fixnum 
+		most-negative-fixnum
+		object-address)
 	  (ice-9 optargs)
 	  (rnrs base (6))
 	  (rnrs control (6))
 	  (rnrs arithmetic bitwise (6))
 	  (rnrs conditions (6))
 	  (rnrs exceptions (6))
-	  (rnrs lists (6)))
+          (rnrs lists (6))
+          (rnrs syntax-case (6)))
 
   (define fixnum-width
     (let ((w (do ((i 0 (+ 1 i))
@@ -121,70 +122,105 @@
     (or (for-all inline-fixnum? args) (raise (make-assertion-violation))))
 
   (define-syntax define-fxop*
+    (lambda (s)
+      (syntax-case s ()
+        ((_ name op)
+         (with-syntax ((proc (datum->syntax
+                              #'name
+                              (string->symbol
+                               (string-append "%"
+                                              (symbol->string
+                                               (syntax->datum #'name))
+                                              "-proc")))))
+           #'(begin
+               ;; Define a procedure for when the inline case doesn't
+               ;; apply.
+               (define proc
+                 (case-lambda
+                   ((x y)
+                    (assert-fixnum x y)
+                    (op x y))
+                   (args
+                    (assert-fixnums args)
+                    (apply op args))))
+
+               (define-syntax name
+                 (lambda (s)
+                   (syntax-case s ()
+                     ((_ args (... ...))
+                      #'(begin
+                          (assert-fixnum args (... ...))
+                          (op args (... ...))))
+                     (x
+                      (identifier? #'x)
+                      #'proc))))))))))
+
+  (define-syntax define-alias
     (syntax-rules ()
-      ((_ name op)
-       (define name
-	(case-lambda
-	  ((x y)
-	   (assert-fixnum x y)
-	   (op x y))
-	  (args
-	   (assert-fixnums args)
-           (apply op args)))))))
+      ((_ new old)
+       (define-syntax new (identifier-syntax old)))))
 
   ;; All these predicates don't check their arguments for fixnum-ness,
   ;; as this doesn't seem to be strictly required by R6RS.
 
-  (define fx=? =)
-  (define fx>? >)
-  (define fx<? <)
-  (define fx>=? >=)
-  (define fx<=? <=)
+  (define-alias fx=? =)
+  (define-alias fx>? >)
+  (define-alias fx<? <)
+  (define-alias fx>=? >=)
+  (define-alias fx<=? <=)
 
-  (define fxzero? zero?)
-  (define fxpositive? positive?)
-  (define fxnegative? negative?)
-  (define fxodd? odd?)
-  (define fxeven? even?)
+  (define-alias fxzero? zero?)
+  (define-alias fxpositive? positive?)
+  (define-alias fxnegative? negative?)
+  (define-alias fxodd? odd?)
+  (define-alias fxeven? even?)
 
   (define-fxop* fxmax max)
   (define-fxop* fxmin min)
 
-  (define (fx+ fx1 fx2)
+  (define-inlinable (fx+ fx1 fx2)
     (assert-fixnum fx1 fx2) 
     (let ((r (+ fx1 fx2))) 
       (or (inline-fixnum? r)
           (raise (make-implementation-restriction-violation)))
       r))
 
-  (define (fx* fx1 fx2)
+  (define-inlinable (fx* fx1 fx2)
     (assert-fixnum fx1 fx2) 
     (let ((r (* fx1 fx2))) 
       (or (inline-fixnum? r)
           (raise (make-implementation-restriction-violation)))
       r))
 
-  (define* (fx- fx1 #:optional fx2)
-    (assert-fixnum fx1)
-    (if fx2 
-	(begin 
-	  (assert-fixnum fx2) 
-	  (let ((r (- fx1 fx2))) 
-	    (or (inline-fixnum? r) (raise (make-assertion-violation)))
-	    r))
-	(let ((r (- fx1))) 
-	  (or (inline-fixnum? r) (raise (make-assertion-violation)))
-	  r)))
-
-  (define (fxdiv fx1 fx2)
+  (define-syntax fx-
+    (lambda (s)
+      (syntax-case s ()
+        ((_ fx)
+         #'(begin
+             (assert-fixnum fx)
+             (let ((r (- fx)))
+               (unless (inline-fixnum? r) (raise (make-assertion-violation)))
+               (- fx))))
+        ((_ fx1 fx2)
+         #'(begin
+             (assert-fixnum fx1)
+             (assert-fixnum fx2)
+             (let ((r (- fx1 fx2)))
+               (unless (inline-fixnum? r) (raise (make-assertion-violation)))
+               r)))
+        (x
+         (identifier? #'x)
+         #'-))))
+
+  (define-inlinable (fxdiv fx1 fx2)
     (assert-fixnum fx1 fx2)
     (div fx1 fx2))
 
-  (define (fxmod fx1 fx2)
+  (define-inlinable (fxmod fx1 fx2)
     (assert-fixnum fx1 fx2)
     (mod fx1 fx2))
 
-  (define (fxdiv-and-mod fx1 fx2)
+  (define-inlinable (fxdiv-and-mod fx1 fx2)
     (assert-fixnum fx1 fx2)
     (div-and-mod fx1 fx2))
 
@@ -221,71 +257,71 @@
 	   (s1 (div0 s (expt 2 (fixnum-width)))))
       (values s0 s1)))
 
-  (define (fxnot fx) (assert-fixnum fx) (lognot fx))
+  (define-inlinable (fxnot fx) (assert-fixnum fx) (lognot fx))
   (define-fxop* fxand logand)
   (define-fxop* fxior logior)
   (define-fxop* fxxor logxor)
 
-  (define (fxif fx1 fx2 fx3) 
+  (define-inlinable (fxif fx1 fx2 fx3) 
     (assert-fixnum fx1 fx2 fx3) 
     (bitwise-if fx1 fx2 fx3))
 
-  (define (fxbit-count fx)
+  (define-inlinable (fxbit-count fx)
     (assert-fixnum fx)
     (if (negative? fx)
         (bitwise-not (logcount fx))
         (logcount fx)))
 
-  (define (fxlength fx) (assert-fixnum fx) (bitwise-length fx))
-  (define (fxfirst-bit-set fx) (assert-fixnum fx) (bitwise-first-bit-set fx))
-  (define (fxbit-set? fx1 fx2) (assert-fixnum fx1 fx2) (logbit? fx2 fx1))
+  (define-inlinable (fxlength fx) (assert-fixnum fx) (bitwise-length fx))
+  (define-inlinable (fxfirst-bit-set fx) (assert-fixnum fx) (bitwise-first-bit-set fx))
+  (define-inlinable (fxbit-set? fx1 fx2) (assert-fixnum fx1 fx2) (logbit? fx2 fx1))
 
-  (define (fxcopy-bit fx1 fx2 fx3) 
+  (define-inlinable (fxcopy-bit fx1 fx2 fx3) 
     (assert-fixnum fx1 fx2 fx3) 
     (unless (and (<= 0 fx2) (< fx2 (fixnum-width)))
       (raise (make-assertion-violation)))
     (bitwise-copy-bit fx1 fx2 fx3))
 
-  (define (fxbit-field fx1 fx2 fx3)
+  (define-inlinable (fxbit-field fx1 fx2 fx3)
     (assert-fixnum fx1 fx2 fx3)
     (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width)))
       (raise (make-assertion-violation)))
     (bitwise-bit-field fx1 fx2 fx3))
 
-  (define (fxcopy-bit-field fx1 fx2 fx3 fx4)
+  (define-inlinable (fxcopy-bit-field fx1 fx2 fx3 fx4)
     (assert-fixnum fx1 fx2 fx3 fx4)
     (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width)))
       (raise (make-assertion-violation)))
     (bitwise-copy-bit-field fx1 fx2 fx3 fx4))
 
-  (define (fxarithmetic-shift fx1 fx2)
+  (define-inlinable (fxarithmetic-shift fx1 fx2)
     (assert-fixnum fx1 fx2)
     (unless (< (abs fx2) (fixnum-width))
       (raise (make-assertion-violation)))
     (ash fx1 fx2))
 
-  (define (fxarithmetic-shift-left fx1 fx2)
+  (define-inlinable (fxarithmetic-shift-left fx1 fx2)
     (assert-fixnum fx1 fx2)
     (unless (and (<= 0 fx2) (< fx2 (fixnum-width)))
       (raise (make-assertion-violation)))
     (ash fx1 fx2))
 
-  (define (fxarithmetic-shift-right fx1 fx2)
+  (define-inlinable (fxarithmetic-shift-right fx1 fx2)
     (assert-fixnum fx1 fx2)
     (unless (and (<= 0 fx2) (< fx2 (fixnum-width)))
       (raise (make-assertion-violation)))
     (ash fx1 (- fx2)))
 
-  (define (fxrotate-bit-field fx1 fx2 fx3 fx4)
+  (define-inlinable (fxrotate-bit-field fx1 fx2 fx3 fx4)
     (assert-fixnum fx1 fx2 fx3 fx4)
     (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width)) (< fx4 (- fx3 fx2)))
       (raise (make-assertion-violation)))
     (bitwise-rotate-bit-field fx1 fx2 fx3 fx4))
   
-  (define (fxreverse-bit-field fx1 fx2 fx3)
+  (define-inlinable (fxreverse-bit-field fx1 fx2 fx3)
     (assert-fixnum fx1 fx2 fx3)
     (unless (and (<= 0 fx2 fx3) (< fx3 (fixnum-width)))
       (raise (make-assertion-violation)))
     (bitwise-reverse-bit-field fx1 fx2 fx3))
 
-)
+  )

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-04  0:40 ` Ludovic Courtès
@ 2020-01-04 10:28   ` Göran Weinholt
  2020-01-04 15:51     ` Nala Ginrut
                       ` (2 more replies)
  2020-01-05  9:48   ` Andy Wingo
  1 sibling, 3 replies; 13+ messages in thread
From: Göran Weinholt @ 2020-01-04 10:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, Guile Devel

Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>>
>> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
>> clock utime stime cutime cstime gctime
>> 65.17 89.75  0.45   0.00   0.00  35.63
>
> The patch below gives us:
>
> ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm
>
> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> clock utime stime cutime cstime gctime
> 59.31 80.65  0.39   0.00   0.00  30.73
>
> It’s a disappointingly small improvement.  The reason is that (hashing
> fixnums) adds another layer of opacity, where it ends up doing
> essentially:
>
>   (define fx32xor fxxor)
>   …
>
> Thus, no inlining, and no easy trick to solve that.  :-/

I've pushed a Guile-specific version of (hashing fixnums) that inlines
the generic arithmetic procedures. This and some other small changes
improved the runtime:

           clock utime stime cutime cstime gctime
 before:
    2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
    2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
 after:
    2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
    2.9.8   1.98  1.99  0.00   0.00   0.00   0.08

That's about 100 times slower than sha256sum from coreutils. You might
get some more performance out of it by unrolling the loop in
sha-256-transform!.

Regards,

-- 
Göran Weinholt   | https://weinholt.se/
Debian developer | 73 de SA6CJK



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-04 10:28   ` Göran Weinholt
@ 2020-01-04 15:51     ` Nala Ginrut
  2020-01-04 21:39     ` Arne Babenhauserheide
  2020-01-06  9:37     ` Ludovic Courtès
  2 siblings, 0 replies; 13+ messages in thread
From: Nala Ginrut @ 2020-01-04 15:51 UTC (permalink / raw)
  To: Göran Weinholt; +Cc: Andy Wingo, Ludovic Courtès, Guile Devel

[-- Attachment #1: Type: text/plain, Size: 2035 bytes --]

Congrats!
I just replaced Weinholt's hmac implementation with NSS binding for product
consideration, but nice to know this great result! And thanks to Weinholt's
work, Artanis had been relying on it for many years.

Best regards.


On Sat, Jan 4, 2020, 18:36 Göran Weinholt <goran@weinholt.se> wrote:

> Ludovic Courtès <ludo@gnu.org> writes:
>
> > Ludovic Courtès <ludo@gnu.org> skribis:
> >
> >> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure
> --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
> >>
> >> ;;; (hash
> "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> >> clock utime stime cutime cstime gctime
> >> 65.17 89.75  0.45   0.00   0.00  35.63
> >
> > The patch below gives us:
> >
> > ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm
> >
> > ;;; (hash
> "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> > clock utime stime cutime cstime gctime
> > 59.31 80.65  0.39   0.00   0.00  30.73
> >
> > It’s a disappointingly small improvement.  The reason is that (hashing
> > fixnums) adds another layer of opacity, where it ends up doing
> > essentially:
> >
> >   (define fx32xor fxxor)
> >   …
> >
> > Thus, no inlining, and no easy trick to solve that.  :-/
>
> I've pushed a Guile-specific version of (hashing fixnums) that inlines
> the generic arithmetic procedures. This and some other small changes
> improved the runtime:
>
>            clock utime stime cutime cstime gctime
>  before:
>     2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
>     2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>  after:
>     2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
>     2.9.8   1.98  1.99  0.00   0.00   0.00   0.08
>
> That's about 100 times slower than sha256sum from coreutils. You might
> get some more performance out of it by unrolling the loop in
> sha-256-transform!.
>
> Regards,
>
> --
> Göran Weinholt   | https://weinholt.se/
> Debian developer | 73 de SA6CJK
>
>

[-- Attachment #2: Type: text/html, Size: 2890 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-04 10:28   ` Göran Weinholt
  2020-01-04 15:51     ` Nala Ginrut
@ 2020-01-04 21:39     ` Arne Babenhauserheide
  2020-01-05 14:28       ` Göran Weinholt
  2020-01-06  9:37     ` Ludovic Courtès
  2 siblings, 1 reply; 13+ messages in thread
From: Arne Babenhauserheide @ 2020-01-04 21:39 UTC (permalink / raw)
  To: guile-devel; +Cc: Andy Wingo, Ludovic Courtès

Göran Weinholt <goran@weinholt.se> writes:
> I've pushed a Guile-specific version of (hashing fixnums) that inlines
> the generic arithmetic procedures. This and some other small changes
> improved the runtime:
>
>            clock utime stime cutime cstime gctime
>  before:
>     2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
>     2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>  after:
>     2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
>     2.9.8   1.98  1.99  0.00   0.00   0.00   0.08
>
> That's about 100 times slower than sha256sum from coreutils. You might
> get some more performance out of it by unrolling the loop in
> sha-256-transform!.

I’d like to highlight the understatement: The new version is about
factor 10 faster on 2.2.6 and factor 8 faster on 2.9.8 (and the new
version on Guile 2.9.8 is factor 15 faster than the old version with the
old Guile).

That’s an awesome speedup!

(can you show the change/patch that did this? Sidenote: It would be
great to have a list of performance hints for Guile)

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-04  0:40 ` Ludovic Courtès
  2020-01-04 10:28   ` Göran Weinholt
@ 2020-01-05  9:48   ` Andy Wingo
  2020-01-06  9:47     ` Ludovic Courtès
  1 sibling, 1 reply; 13+ messages in thread
From: Andy Wingo @ 2020-01-05  9:48 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guile Devel

On Sat 04 Jan 2020 01:40, Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>>
>> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
>> clock utime stime cutime cstime gctime
>> 65.17 89.75  0.45   0.00   0.00  35.63
>
>   (define fx32xor fxxor)
>   …

From a speed perspective I think there is one major issue and one minor
issue.

The major issue is that we don't do cross-module inlining.  But now that
we have declarative modules, this is a possibility:

  https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00026.html
  https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html

With cross-module inlining of "small" definitions, I think we would
solve a lot of this kind of problem.  I think we could add this during
3.0 and for this reason I would hesitate to apply this patch for 3.0
because it changes "fx+" exports to be macros rather than "normal"
values in the ABI.  WDYT?

The minor issue, at least relatively speaking, is that IMO the (rnrs
arithmetic fixnums) API is not appropriate for bitwise operations.  When
you do bitwise operations and you want to ensure that you're within some
fixed domain, it's best to do e.g. "(logand x #xffffffff)" on operands
and results.  Guile will optimize this well.  The good optimization
isn't fixnum vs other kinds of numbers, it's unboxing to raw unsigned
integers; and you usually want to exclude negative numbers.  fx+ doesn't
help with that.

Cheers,

Andy



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-04 21:39     ` Arne Babenhauserheide
@ 2020-01-05 14:28       ` Göran Weinholt
  0 siblings, 0 replies; 13+ messages in thread
From: Göran Weinholt @ 2020-01-05 14:28 UTC (permalink / raw)
  To: Arne Babenhauserheide; +Cc: Andy Wingo, Ludovic Courtès, guile-devel

Arne Babenhauserheide <arne_bab@web.de> writes:

> Göran Weinholt <goran@weinholt.se> writes:
>> I've pushed a Guile-specific version of (hashing fixnums) that inlines
>> the generic arithmetic procedures. This and some other small changes
>> improved the runtime:
>>
>>            clock utime stime cutime cstime gctime
>>  before:
>>     2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
>>     2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>>  after:
>>     2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
>>     2.9.8   1.98  1.99  0.00   0.00   0.00   0.08
>>
>> That's about 100 times slower than sha256sum from coreutils. You might
>> get some more performance out of it by unrolling the loop in
>> sha-256-transform!.
>
> I’d like to highlight the understatement: The new version is about
> factor 10 faster on 2.2.6 and factor 8 faster on 2.9.8 (and the new
> version on Guile 2.9.8 is factor 15 faster than the old version with the
> old Guile).
>
> That’s an awesome speedup!
>
> (can you show the change/patch that did this? Sidenote: It would be
> great to have a list of performance hints for Guile)

The commit is here:

https://github.com/weinholt/hashing/commit/eb28080180b5fdfb6ffc74f8cdf2c1a7823ef0cb

The relevant changes are:

* The define-fx macro previously returned:

  #'(define name
      (if (> (fixnum-width) bit-width)
          fxname bitwise-name))

  This was bad for three reasons: Guile does not do constant folding on
  (fixnum-width); in Guile the fixnum procedures are slower than the
  generic procedures; and Guile 2.2 did not inline those defines. It now
  returns:

  #'(define-syntax name
      (identifier-syntax bitwise-name))

  So the generic procedures like +, bitwise-and, etc are used instead of
  fx+ and fxand. The use of define-syntax and identifier-syntax is
  basically manual inlining for the benefit of Guile 2.2.

* I replaced some instances of fixnum operations with the ones from
  (hashing fixnums), which benefit from the above.

* I changed some comparisons with constant fixnums to use eqv? instead
  of fx=?, e.g. (eqv? i 8) instead of (fx=? i 8). This is commonly
  faster, not only in Guile, because in implementations with immediate
  fixnums it can be done with only a simple compare + branch. I don't
  know if Guile takes advantage of this fact, but it's still faster than
  fx=?.

Regards,

-- 
Göran Weinholt   | https://weinholt.se/
Debian developer | 73 de SA6CJK



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-04 10:28   ` Göran Weinholt
  2020-01-04 15:51     ` Nala Ginrut
  2020-01-04 21:39     ` Arne Babenhauserheide
@ 2020-01-06  9:37     ` Ludovic Courtès
  2 siblings, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2020-01-06  9:37 UTC (permalink / raw)
  To: Göran Weinholt; +Cc: Andy Wingo, Guile Devel

Hi Göran,

Göran Weinholt <goran@weinholt.se> skribis:

> I've pushed a Guile-specific version of (hashing fixnums) that inlines
> the generic arithmetic procedures. This and some other small changes
> improved the runtime:
>
>            clock utime stime cutime cstime gctime
>  before:
>     2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
>     2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>  after:
>     2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
>     2.9.8   1.98  1.99  0.00   0.00   0.00   0.08

Excellent!

> That's about 100 times slower than sha256sum from coreutils. You might
> get some more performance out of it by unrolling the loop in
> sha-256-transform!.

I suppose that’s a challenge for Andy.  ;-)

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-05  9:48   ` Andy Wingo
@ 2020-01-06  9:47     ` Ludovic Courtès
  2020-01-06 19:52       ` Andy Wingo
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-01-06  9:47 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Guile Devel

Hello,

Andy Wingo <wingo@pobox.com> skribis:

> On Sat 04 Jan 2020 01:40, Ludovic Courtès <ludo@gnu.org> writes:
>
>> Ludovic Courtès <ludo@gnu.org> skribis:
>>
>>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>>>
>>> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
>>> clock utime stime cutime cstime gctime
>>> 65.17 89.75  0.45   0.00   0.00  35.63
>>
>>   (define fx32xor fxxor)
>>   …
>
> From a speed perspective I think there is one major issue and one minor
> issue.
>
> The major issue is that we don't do cross-module inlining.  But now that
> we have declarative modules, this is a possibility:
>
>   https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00026.html
>   https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html

Neat.  Storing Tree-IL in object files reminds me of LTO in GCC & co.

> With cross-module inlining of "small" definitions, I think we would
> solve a lot of this kind of problem.  I think we could add this during
> 3.0 and for this reason I would hesitate to apply this patch for 3.0
> because it changes "fx+" exports to be macros rather than "normal"
> values in the ABI.  WDYT?

I agree that cross-module inlining is the better fix whereas this patch
is the immediate workaround.

Are you confident that cross-module inlining can happen be added without
introducing incompatibilities over in the 3.0 series?  (At first sight
it seems tricky to me, notably because we’d have to store Tree-IL in
object files, which introduces compatibility and thus external
representation versioning considerations.)

If you do, then it’s fine to drop this patch.  If conversely
cross-module inlining might take longer, then we can have this patch in
and drop it in 3.2.  Your call!  (I guess I’m not being that helpful
here.  :-))

> The minor issue, at least relatively speaking, is that IMO the (rnrs
> arithmetic fixnums) API is not appropriate for bitwise operations.  When
> you do bitwise operations and you want to ensure that you're within some
> fixed domain, it's best to do e.g. "(logand x #xffffffff)" on operands
> and results.  Guile will optimize this well.  The good optimization
> isn't fixnum vs other kinds of numbers, it's unboxing to raw unsigned
> integers; and you usually want to exclude negative numbers.  fx+ doesn't
> help with that.

I agree, of course.  The tragedy is that people would use this (clumsy)
API in the hope of getting better performance, and the result ends up
being worse performance, at least on today’s Guile (perhaps also on
other implementations?).

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-06  9:47     ` Ludovic Courtès
@ 2020-01-06 19:52       ` Andy Wingo
  2020-01-07 11:08         ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Wingo @ 2020-01-06 19:52 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, Guile Devel

On Mon 06 Jan 2020 10:47, Ludovic Courtès <ludo@gnu.org> writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> With cross-module inlining of "small" definitions, I think we would
>> solve a lot of this kind of problem.  I think we could add this during
>> 3.0 and for this reason I would hesitate to apply this patch for 3.0
>> because it changes "fx+" exports to be macros rather than "normal"
>> values in the ABI.  WDYT?
>
> I agree that cross-module inlining is the better fix whereas this patch
> is the immediate workaround.
>
> Are you confident that cross-module inlining can happen be added without
> introducing incompatibilities over in the 3.0 series?  (At first sight
> it seems tricky to me, notably because we’d have to store Tree-IL in
> object files, which introduces compatibility and thus external
> representation versioning considerations.)

Concretely I would add a little part of the compiler to the Tree-IL
phase to serialize a bytecode for the "small" definitions in the module,
for declarative modules, both public and private (because public
definitions may alias private definitions).  This would be stored as a
bytevector in an additional field of the module, and the program being
compiled would be transformed to initialize the "lto" field (placeholder
name) of the module, so that once the compiled module is loaded, we have
the inlinable bindings.  I think this can be done compatibly.

> If you do, then it’s fine to drop this patch.  If conversely
> cross-module inlining might take longer, then we can have this patch in
> and drop it in 3.2.  Your call!  (I guess I’m not being that helpful
> here.  :-))

:)

I hesitate to land this patch because we haven't shown that it
significantly helps things, it would need to be undone, and it makes the
ABI more fragile.  So if that's OK let's do nothing :)

Cheers,

Andy



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-06 19:52       ` Andy Wingo
@ 2020-01-07 11:08         ` Ludovic Courtès
  2020-01-07 21:45           ` Andy Wingo
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-01-07 11:08 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Guile Devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> On Mon 06 Jan 2020 10:47, Ludovic Courtès <ludo@gnu.org> writes:
>
>> Andy Wingo <wingo@pobox.com> skribis:
>>
>>> With cross-module inlining of "small" definitions, I think we would
>>> solve a lot of this kind of problem.  I think we could add this during
>>> 3.0 and for this reason I would hesitate to apply this patch for 3.0
>>> because it changes "fx+" exports to be macros rather than "normal"
>>> values in the ABI.  WDYT?
>>
>> I agree that cross-module inlining is the better fix whereas this patch
>> is the immediate workaround.
>>
>> Are you confident that cross-module inlining can happen be added without
>> introducing incompatibilities over in the 3.0 series?  (At first sight
>> it seems tricky to me, notably because we’d have to store Tree-IL in
>> object files, which introduces compatibility and thus external
>> representation versioning considerations.)
>
> Concretely I would add a little part of the compiler to the Tree-IL
> phase to serialize a bytecode for the "small" definitions in the module,
> for declarative modules, both public and private (because public
> definitions may alias private definitions).  This would be stored as a
> bytevector in an additional field of the module, and the program being
> compiled would be transformed to initialize the "lto" field (placeholder
> name) of the module, so that once the compiled module is loaded, we have
> the inlinable bindings.  I think this can be done compatibly.

OK, sounds great.  What are your thoughts about versioning that wire
Tree-IL representation?

>> If you do, then it’s fine to drop this patch.  If conversely
>> cross-module inlining might take longer, then we can have this patch in
>> and drop it in 3.2.  Your call!  (I guess I’m not being that helpful
>> here.  :-))
>
> :)
>
> I hesitate to land this patch because we haven't shown that it
> significantly helps things, it would need to be undone, and it makes the
> ABI more fragile.  So if that's OK let's do nothing :)

Alright, fine with me!

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-07 11:08         ` Ludovic Courtès
@ 2020-01-07 21:45           ` Andy Wingo
  2020-01-07 22:59             ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Wingo @ 2020-01-07 21:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, Guile Devel

On Tue 07 Jan 2020 12:08, Ludovic Courtès <ludo@gnu.org> writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> Concretely I would add a little part of the compiler to the Tree-IL
>> phase to serialize a bytecode for the "small" definitions in the module,
>> for declarative modules, both public and private (because public
>> definitions may alias private definitions).  This would be stored as a
>> bytevector in an additional field of the module, and the program being
>> compiled would be transformed to initialize the "lto" field (placeholder
>> name) of the module, so that once the compiled module is loaded, we have
>> the inlinable bindings.  I think this can be done compatibly.
>
> OK, sounds great.  What are your thoughts about versioning that wire
> Tree-IL representation?

It would be a little bytecode language, with its own versioning
considerations.  It would need to have a translation to and from
Tree-IL, though not necessarily lossless.  It would change only in
ABI-compatible ways, using the bytecode version of the Guile doing the
compilation as a proxy for what is OK to support.

A



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SHA256 performance with Guile 2.2 vs. Guile 3.0
  2020-01-07 21:45           ` Andy Wingo
@ 2020-01-07 22:59             ` Ludovic Courtès
  0 siblings, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2020-01-07 22:59 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Guile Devel

Andy Wingo <wingo@pobox.com> skribis:

> On Tue 07 Jan 2020 12:08, Ludovic Courtès <ludo@gnu.org> writes:
>
>> Andy Wingo <wingo@pobox.com> skribis:
>>
>>> Concretely I would add a little part of the compiler to the Tree-IL
>>> phase to serialize a bytecode for the "small" definitions in the module,
>>> for declarative modules, both public and private (because public
>>> definitions may alias private definitions).  This would be stored as a
>>> bytevector in an additional field of the module, and the program being
>>> compiled would be transformed to initialize the "lto" field (placeholder
>>> name) of the module, so that once the compiled module is loaded, we have
>>> the inlinable bindings.  I think this can be done compatibly.
>>
>> OK, sounds great.  What are your thoughts about versioning that wire
>> Tree-IL representation?
>
> It would be a little bytecode language, with its own versioning
> considerations.  It would need to have a translation to and from
> Tree-IL, though not necessarily lossless.  It would change only in
> ABI-compatible ways, using the bytecode version of the Guile doing the
> compilation as a proxy for what is OK to support.

I see, that makes a lot of sense to me.  Thanks for explaining!

Ludo’.



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-01-07 22:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-03 23:55 SHA256 performance with Guile 2.2 vs. Guile 3.0 Ludovic Courtès
2020-01-04  0:40 ` Ludovic Courtès
2020-01-04 10:28   ` Göran Weinholt
2020-01-04 15:51     ` Nala Ginrut
2020-01-04 21:39     ` Arne Babenhauserheide
2020-01-05 14:28       ` Göran Weinholt
2020-01-06  9:37     ` Ludovic Courtès
2020-01-05  9:48   ` Andy Wingo
2020-01-06  9:47     ` Ludovic Courtès
2020-01-06 19:52       ` Andy Wingo
2020-01-07 11:08         ` Ludovic Courtès
2020-01-07 21:45           ` Andy Wingo
2020-01-07 22:59             ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).