string-for-each vs. for-each+string->list performance

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

* string-for-each vs. for-each+string->list performance
@ 2020-06-07  6:27 Aleix Conchillo Flaqué
  2020-06-07  6:33 ` Aleix Conchillo Flaqué
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Aleix Conchillo Flaqué @ 2020-06-07  6:27 UTC (permalink / raw)
  To: guile-user

Hi,

in the latest guile-json, 4.1.0. I changed some code to use
for-each+string->list. The performance seemed nice and I released it.

Christopher Lam pointed out that I could have used string-for-each instead.
I made the change but the performance degraded a lot:

string-for-each:

scheme@(json parser)> ,t (->bool (scm->json-string json))
$19 = #t
;; 17.909537s real time, 18.063382s run time.  0.207281s spent in GC.

vs

for-each + string->list:

scheme@(json parser)> ,t (->bool (scm->json-string json))
$20 = #t
;; 2.998381s real time, 3.319349s run time.  0.471969s spent in GC.

string-for-each is implemented in scheme here, if Im not wrong:

http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89

string->list and for-each would use C.

Is that huge gap expected?

This is with Guile 3.0.2.

Best,

Aleix


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: string-for-each vs. for-each+string->list performance
  2020-06-07  6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
@ 2020-06-07  6:33 ` Aleix Conchillo Flaqué
  2020-06-07 11:20 ` Linus Björnstam
  2020-06-07 13:50 ` Linus Björnstam
  2 siblings, 0 replies; 6+ messages in thread
From: Aleix Conchillo Flaqué @ 2020-06-07  6:33 UTC (permalink / raw)
  To: guile-user

On Sat, Jun 6, 2020 at 11:27 PM Aleix Conchillo Flaqué <aconchillo@gmail.com>
wrote:

> Hi,
>
> in the latest guile-json, 4.1.0. I changed some code to use
> for-each+string->list. The performance seemed nice and I released it.
>
> Christopher Lam pointed out that I could have used string-for-each
> instead. I made the change but the performance degraded a lot:
>
> string-for-each:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $19 = #t
> ;; 17.909537s real time, 18.063382s run time.  0.207281s spent in GC.
>
> vs
>
> for-each + string->list:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $20 = #t
> ;; 2.998381s real time, 3.319349s run time.  0.471969s spent in GC.
>
> string-for-each is implemented in scheme here, if Im not wrong:
>
> http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
>
> string->list and for-each would use C.
>
> Is that huge gap expected?
>
> This is with Guile 3.0.2.
>
>
Forgot to mention that Chritopher also pointed out that string->list has an
optimization for narrow strings:

http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/srfi-13.c#n294

which means it doesn't use string_ref where the scheme code has to do it.
So, it might be due to this?

Aleix


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: string-for-each vs. for-each+string->list performance
  2020-06-07  6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
  2020-06-07  6:33 ` Aleix Conchillo Flaqué
@ 2020-06-07 11:20 ` Linus Björnstam
  2020-06-07 13:50 ` Linus Björnstam
  2 siblings, 0 replies; 6+ messages in thread
From: Linus Björnstam @ 2020-06-07 11:20 UTC (permalink / raw)
  To: Aleix Conchillo Flaqué, guile-user

Try disassembling it. I suspect the for-each is inlined, and that whatever is going on involves instruction explosion which means you get a lot more optimization opportunities.  

That difference seems large though.  Try it in guile 2.2 (which shouldnt have the same opportunities for optimization). 

-- 
  Linus Björnstam

On Sun, 7 Jun 2020, at 08:27, Aleix Conchillo Flaqué wrote:
> Hi,
> 
> in the latest guile-json, 4.1.0. I changed some code to use
> for-each+string->list. The performance seemed nice and I released it.
> 
> Christopher Lam pointed out that I could have used string-for-each instead.
> I made the change but the performance degraded a lot:
> 
> string-for-each:
> 
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $19 = #t
> ;; 17.909537s real time, 18.063382s run time.  0.207281s spent in GC.
> 
> vs
> 
> for-each + string->list:
> 
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $20 = #t
> ;; 2.998381s real time, 3.319349s run time.  0.471969s spent in GC.
> 
> string-for-each is implemented in scheme here, if Im not wrong:
> 
> http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
> 
> string->list and for-each would use C.
> 
> Is that huge gap expected?
> 
> This is with Guile 3.0.2.
> 
> Best,
> 
> Aleix
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: string-for-each vs. for-each+string->list performance
  2020-06-07  6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
  2020-06-07  6:33 ` Aleix Conchillo Flaqué
  2020-06-07 11:20 ` Linus Björnstam
@ 2020-06-07 13:50 ` Linus Björnstam
  2020-06-12 20:13   ` Ludovic Courtès
  2 siblings, 1 reply; 6+ messages in thread
From: Linus Björnstam @ 2020-06-07 13:50 UTC (permalink / raw)
  To: Aleix Conchillo Flaqué, guile-user

You can cut another 15-ish % from that loop by making an inline loop, btw

(let loop ((pos 0))
  (when (< pos (string-length str))
    ...
    (loop (1+ pos)))

I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it. 

BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.

-- 
  Linus Björnstam

On Sun, 7 Jun 2020, at 08:27, Aleix Conchillo Flaqué wrote:
> Hi,
> 
> in the latest guile-json, 4.1.0. I changed some code to use
> for-each+string->list. The performance seemed nice and I released it.
> 
> Christopher Lam pointed out that I could have used string-for-each instead.
> I made the change but the performance degraded a lot:
> 
> string-for-each:
> 
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $19 = #t
> ;; 17.909537s real time, 18.063382s run time.  0.207281s spent in GC.
> 
> vs
> 
> for-each + string->list:
> 
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $20 = #t
> ;; 2.998381s real time, 3.319349s run time.  0.471969s spent in GC.
> 
> string-for-each is implemented in scheme here, if Im not wrong:
> 
> http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
> 
> string->list and for-each would use C.
> 
> Is that huge gap expected?
> 
> This is with Guile 3.0.2.
> 
> Best,
> 
> Aleix
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: string-for-each vs. for-each+string->list performance
  2020-06-07 13:50 ` Linus Björnstam
@ 2020-06-12 20:13   ` Ludovic Courtès
  2020-06-13  6:41     ` Linus Björnstam
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2020-06-12 20:13 UTC (permalink / raw)
  To: guile-user

Hi,

Linus Björnstam <linus.internet@fastmail.se> skribis:

> You can cut another 15-ish % from that loop by making an inline loop, btw
>
> (let loop ((pos 0))
>   (when (< pos (string-length str))
>     ...
>     (loop (1+ pos)))
>
> I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it. 
>
> BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.

‘string-for-each’ in C (the default) is slower than its Scheme counterpart:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define (sfe proc str)
		       (define len (string-length str))
		       (let loop ((i 0))
			 (unless (= i len)
			   (proc (string-ref str i))
			   (loop (+ 1 i)))))
scheme@(guile-user)> (define str (make-string 15000000))
scheme@(guile-user)> ,t (sfe identity str)
;; 0.263725s real time, 0.263722s run time.  0.000000s spent in GC.
scheme@(guile-user)> ,t (sfe identity str)
;; 0.259538s real time, 0.259529s run time.  0.000000s spent in GC.
scheme@(guile-user)> ,t (string-for-each identity str)
;; 0.841632s real time, 0.841624s run time.  0.000000s spent in GC.
scheme@(guile-user)> (version)
$2 = "3.0.2"
--8<---------------cut here---------------end--------------->8---

In general we seem to pay a high price for leaving (calling a subr) and
re-entering (via ‘scm_call_n’) the VM.  This is especially acute here
because there’s almost nothing happening in C, so we keep bouncing
between Scheme and C.

That’s another reason to start rewriting such primitives in Scheme and
have the C functions just call out to Scheme.

If we do:

  perf record guile -c '(string-for-each identity (make-string 15000000))'

we get this profile:

--8<---------------cut here---------------start------------->8---
Overhead  Command  Shared Object          Symbol
  31.10%  guile    libguile-3.0.so.1.1.1  [.] vm_regular_engine
  27.48%  guile    libguile-3.0.so.1.1.1  [.] scm_call_n
  14.34%  guile    libguile-3.0.so.1.1.1  [.] scm_jit_enter_mcode
   3.55%  guile    libguile-3.0.so.1.1.1  [.] scm_i_string_ref
   3.37%  guile    libguile-3.0.so.1.1.1  [.] get_callee_vcode
   2.34%  guile    libguile-3.0.so.1.1.1  [.] scm_call_1
   2.31%  guile    libguile-3.0.so.1.1.1  [.] scm_string_for_each
--8<---------------cut here---------------end--------------->8---

Indeed, we get better performance when turning off JIT:

--8<---------------cut here---------------start------------->8---
$ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity (make-string 15000000))'
0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata 26396maxresident)k
0inputs+0outputs (0major+1583minor)pagefaults 0swaps
$ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity (make-string 15000000))'
0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata 26948maxresident)k
0inputs+0outputs (0major+1748minor)pagefaults 0swaps
$ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity (make-string 15000000))'
0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata 27324maxresident)k
0inputs+0outputs (0major+2548minor)pagefaults 0swaps
--8<---------------cut here---------------end--------------->8---

So it seems that we just keep firing the JIT machinery on every
‘scm_call_n’ for no benefit.

That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
co. always show high in statprof:

  https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html  

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: string-for-each vs. for-each+string->list performance
  2020-06-12 20:13   ` Ludovic Courtès
@ 2020-06-13  6:41     ` Linus Björnstam
  0 siblings, 0 replies; 6+ messages in thread
From: Linus Björnstam @ 2020-06-13  6:41 UTC (permalink / raw)
  To: Ludovic Courtès, guile-user


Thanks for clearing that up.

I have an old implementation of large parts of srfi-13 if that would be of interest. I don't know how much you want to change. The licence situation of the reference implementation is weird iirc.

A beginning could be to replace all higher order functions since that would minimize the kind of performance problems discussed here. 
-- 
  Linus Björnstam

On Fri, 12 Jun 2020, at 22:13, Ludovic Courtès wrote:
> Hi,
> 
> Linus Björnstam <linus.internet@fastmail.se> skribis:
> 
> > You can cut another 15-ish % from that loop by making an inline loop, btw
> >
> > (let loop ((pos 0))
> >   (when (< pos (string-length str))
> >     ...
> >     (loop (1+ pos)))
> >
> > I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it. 
> >
> > BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.
> 
> ‘string-for-each’ in C (the default) is slower than its Scheme counterpart:
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define (sfe proc str)
> 		       (define len (string-length str))
> 		       (let loop ((i 0))
> 			 (unless (= i len)
> 			   (proc (string-ref str i))
> 			   (loop (+ 1 i)))))
> scheme@(guile-user)> (define str (make-string 15000000))
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.263725s real time, 0.263722s run time.  0.000000s spent in GC.
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.259538s real time, 0.259529s run time.  0.000000s spent in GC.
> scheme@(guile-user)> ,t (string-for-each identity str)
> ;; 0.841632s real time, 0.841624s run time.  0.000000s spent in GC.
> scheme@(guile-user)> (version)
> $2 = "3.0.2"
> --8<---------------cut here---------------end--------------->8---
> 
> In general we seem to pay a high price for leaving (calling a subr) and
> re-entering (via ‘scm_call_n’) the VM.  This is especially acute here
> because there’s almost nothing happening in C, so we keep bouncing
> between Scheme and C.
> 
> That’s another reason to start rewriting such primitives in Scheme and
> have the C functions just call out to Scheme.
> 
> If we do:
> 
>   perf record guile -c '(string-for-each identity (make-string 15000000))'
> 
> we get this profile:
> 
> --8<---------------cut here---------------start------------->8---
> Overhead  Command  Shared Object          Symbol
>   31.10%  guile    libguile-3.0.so.1.1.1  [.] vm_regular_engine
>   27.48%  guile    libguile-3.0.so.1.1.1  [.] scm_call_n
>   14.34%  guile    libguile-3.0.so.1.1.1  [.] scm_jit_enter_mcode
>    3.55%  guile    libguile-3.0.so.1.1.1  [.] scm_i_string_ref
>    3.37%  guile    libguile-3.0.so.1.1.1  [.] get_callee_vcode
>    2.34%  guile    libguile-3.0.so.1.1.1  [.] scm_call_1
>    2.31%  guile    libguile-3.0.so.1.1.1  [.] scm_string_for_each
> --8<---------------cut here---------------end--------------->8---
> 
> Indeed, we get better performance when turning off JIT:
> 
> --8<---------------cut here---------------start------------->8---
> $ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity 
> (make-string 15000000))'
> 0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata 
> 26396maxresident)k
> 0inputs+0outputs (0major+1583minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity 
> (make-string 15000000))'
> 0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata 
> 26948maxresident)k
> 0inputs+0outputs (0major+1748minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity 
> (make-string 15000000))'
> 0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata 
> 27324maxresident)k
> 0inputs+0outputs (0major+2548minor)pagefaults 0swaps
> --8<---------------cut here---------------end--------------->8---
> 
> So it seems that we just keep firing the JIT machinery on every
> ‘scm_call_n’ for no benefit.
> 
> That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
> co. always show high in statprof:
> 
>   https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html  
> 
> Thanks,
> Ludo’.
> 
> 
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-13  6:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-07  6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
2020-06-07  6:33 ` Aleix Conchillo Flaqué
2020-06-07 11:20 ` Linus Björnstam
2020-06-07 13:50 ` Linus Björnstam
2020-06-12 20:13   ` Ludovic Courtès
2020-06-13  6:41     ` Linus Björnstam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).