* string-for-each vs. for-each+string->list performance
@ 2020-06-07 6:27 Aleix Conchillo Flaqué
2020-06-07 6:33 ` Aleix Conchillo Flaqué
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Aleix Conchillo Flaqué @ 2020-06-07 6:27 UTC (permalink / raw)
To: guile-user
Hi,
in the latest guile-json, 4.1.0. I changed some code to use
for-each+string->list. The performance seemed nice and I released it.
Christopher Lam pointed out that I could have used string-for-each instead.
I made the change but the performance degraded a lot:
string-for-each:
scheme@(json parser)> ,t (->bool (scm->json-string json))
$19 = #t
;; 17.909537s real time, 18.063382s run time. 0.207281s spent in GC.
vs
for-each + string->list:
scheme@(json parser)> ,t (->bool (scm->json-string json))
$20 = #t
;; 2.998381s real time, 3.319349s run time. 0.471969s spent in GC.
string-for-each is implemented in scheme here, if Im not wrong:
http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
string->list and for-each would use C.
Is that huge gap expected?
This is with Guile 3.0.2.
Best,
Aleix
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: string-for-each vs. for-each+string->list performance
2020-06-07 6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
@ 2020-06-07 6:33 ` Aleix Conchillo Flaqué
2020-06-07 11:20 ` Linus Björnstam
2020-06-07 13:50 ` Linus Björnstam
2 siblings, 0 replies; 6+ messages in thread
From: Aleix Conchillo Flaqué @ 2020-06-07 6:33 UTC (permalink / raw)
To: guile-user
On Sat, Jun 6, 2020 at 11:27 PM Aleix Conchillo Flaqué <aconchillo@gmail.com>
wrote:
> Hi,
>
> in the latest guile-json, 4.1.0. I changed some code to use
> for-each+string->list. The performance seemed nice and I released it.
>
> Christopher Lam pointed out that I could have used string-for-each
> instead. I made the change but the performance degraded a lot:
>
> string-for-each:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $19 = #t
> ;; 17.909537s real time, 18.063382s run time. 0.207281s spent in GC.
>
> vs
>
> for-each + string->list:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $20 = #t
> ;; 2.998381s real time, 3.319349s run time. 0.471969s spent in GC.
>
> string-for-each is implemented in scheme here, if Im not wrong:
>
> http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
>
> string->list and for-each would use C.
>
> Is that huge gap expected?
>
> This is with Guile 3.0.2.
>
>
Forgot to mention that Chritopher also pointed out that string->list has an
optimization for narrow strings:
http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/srfi-13.c#n294
which means it doesn't use string_ref where the scheme code has to do it.
So, it might be due to this?
Aleix
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: string-for-each vs. for-each+string->list performance
2020-06-07 6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
2020-06-07 6:33 ` Aleix Conchillo Flaqué
@ 2020-06-07 11:20 ` Linus Björnstam
2020-06-07 13:50 ` Linus Björnstam
2 siblings, 0 replies; 6+ messages in thread
From: Linus Björnstam @ 2020-06-07 11:20 UTC (permalink / raw)
To: Aleix Conchillo Flaqué, guile-user
Try disassembling it. I suspect the for-each is inlined, and that whatever is going on involves instruction explosion which means you get a lot more optimization opportunities.
That difference seems large though. Try it in guile 2.2 (which shouldnt have the same opportunities for optimization).
--
Linus Björnstam
On Sun, 7 Jun 2020, at 08:27, Aleix Conchillo Flaqué wrote:
> Hi,
>
> in the latest guile-json, 4.1.0. I changed some code to use
> for-each+string->list. The performance seemed nice and I released it.
>
> Christopher Lam pointed out that I could have used string-for-each instead.
> I made the change but the performance degraded a lot:
>
> string-for-each:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $19 = #t
> ;; 17.909537s real time, 18.063382s run time. 0.207281s spent in GC.
>
> vs
>
> for-each + string->list:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $20 = #t
> ;; 2.998381s real time, 3.319349s run time. 0.471969s spent in GC.
>
> string-for-each is implemented in scheme here, if Im not wrong:
>
> http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
>
> string->list and for-each would use C.
>
> Is that huge gap expected?
>
> This is with Guile 3.0.2.
>
> Best,
>
> Aleix
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: string-for-each vs. for-each+string->list performance
2020-06-07 6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
2020-06-07 6:33 ` Aleix Conchillo Flaqué
2020-06-07 11:20 ` Linus Björnstam
@ 2020-06-07 13:50 ` Linus Björnstam
2020-06-12 20:13 ` Ludovic Courtès
2 siblings, 1 reply; 6+ messages in thread
From: Linus Björnstam @ 2020-06-07 13:50 UTC (permalink / raw)
To: Aleix Conchillo Flaqué, guile-user
You can cut another 15-ish % from that loop by making an inline loop, btw
(let loop ((pos 0))
(when (< pos (string-length str))
...
(loop (1+ pos)))
I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it.
BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.
--
Linus Björnstam
On Sun, 7 Jun 2020, at 08:27, Aleix Conchillo Flaqué wrote:
> Hi,
>
> in the latest guile-json, 4.1.0. I changed some code to use
> for-each+string->list. The performance seemed nice and I released it.
>
> Christopher Lam pointed out that I could have used string-for-each instead.
> I made the change but the performance degraded a lot:
>
> string-for-each:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $19 = #t
> ;; 17.909537s real time, 18.063382s run time. 0.207281s spent in GC.
>
> vs
>
> for-each + string->list:
>
> scheme@(json parser)> ,t (->bool (scm->json-string json))
> $20 = #t
> ;; 2.998381s real time, 3.319349s run time. 0.471969s spent in GC.
>
> string-for-each is implemented in scheme here, if Im not wrong:
>
> http://git.savannah.gnu.org/cgit/guile.git/tree/module/rnrs/base.scm#n89
>
> string->list and for-each would use C.
>
> Is that huge gap expected?
>
> This is with Guile 3.0.2.
>
> Best,
>
> Aleix
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: string-for-each vs. for-each+string->list performance
2020-06-07 13:50 ` Linus Björnstam
@ 2020-06-12 20:13 ` Ludovic Courtès
2020-06-13 6:41 ` Linus Björnstam
0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2020-06-12 20:13 UTC (permalink / raw)
To: guile-user
Hi,
Linus Björnstam <linus.internet@fastmail.se> skribis:
> You can cut another 15-ish % from that loop by making an inline loop, btw
>
> (let loop ((pos 0))
> (when (< pos (string-length str))
> ...
> (loop (1+ pos)))
>
> I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it.
>
> BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.
‘string-for-each’ in C (the default) is slower than its Scheme counterpart:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define (sfe proc str)
(define len (string-length str))
(let loop ((i 0))
(unless (= i len)
(proc (string-ref str i))
(loop (+ 1 i)))))
scheme@(guile-user)> (define str (make-string 15000000))
scheme@(guile-user)> ,t (sfe identity str)
;; 0.263725s real time, 0.263722s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,t (sfe identity str)
;; 0.259538s real time, 0.259529s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,t (string-for-each identity str)
;; 0.841632s real time, 0.841624s run time. 0.000000s spent in GC.
scheme@(guile-user)> (version)
$2 = "3.0.2"
--8<---------------cut here---------------end--------------->8---
In general we seem to pay a high price for leaving (calling a subr) and
re-entering (via ‘scm_call_n’) the VM. This is especially acute here
because there’s almost nothing happening in C, so we keep bouncing
between Scheme and C.
That’s another reason to start rewriting such primitives in Scheme and
have the C functions just call out to Scheme.
If we do:
perf record guile -c '(string-for-each identity (make-string 15000000))'
we get this profile:
--8<---------------cut here---------------start------------->8---
Overhead Command Shared Object Symbol
31.10% guile libguile-3.0.so.1.1.1 [.] vm_regular_engine
27.48% guile libguile-3.0.so.1.1.1 [.] scm_call_n
14.34% guile libguile-3.0.so.1.1.1 [.] scm_jit_enter_mcode
3.55% guile libguile-3.0.so.1.1.1 [.] scm_i_string_ref
3.37% guile libguile-3.0.so.1.1.1 [.] get_callee_vcode
2.34% guile libguile-3.0.so.1.1.1 [.] scm_call_1
2.31% guile libguile-3.0.so.1.1.1 [.] scm_string_for_each
--8<---------------cut here---------------end--------------->8---
Indeed, we get better performance when turning off JIT:
--8<---------------cut here---------------start------------->8---
$ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity (make-string 15000000))'
0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata 26396maxresident)k
0inputs+0outputs (0major+1583minor)pagefaults 0swaps
$ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity (make-string 15000000))'
0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata 26948maxresident)k
0inputs+0outputs (0major+1748minor)pagefaults 0swaps
$ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity (make-string 15000000))'
0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata 27324maxresident)k
0inputs+0outputs (0major+2548minor)pagefaults 0swaps
--8<---------------cut here---------------end--------------->8---
So it seems that we just keep firing the JIT machinery on every
‘scm_call_n’ for no benefit.
That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
co. always show high in statprof:
https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: string-for-each vs. for-each+string->list performance
2020-06-12 20:13 ` Ludovic Courtès
@ 2020-06-13 6:41 ` Linus Björnstam
0 siblings, 0 replies; 6+ messages in thread
From: Linus Björnstam @ 2020-06-13 6:41 UTC (permalink / raw)
To: Ludovic Courtès, guile-user
Thanks for clearing that up.
I have an old implementation of large parts of srfi-13 if that would be of interest. I don't know how much you want to change. The licence situation of the reference implementation is weird iirc.
A beginning could be to replace all higher order functions since that would minimize the kind of performance problems discussed here.
--
Linus Björnstam
On Fri, 12 Jun 2020, at 22:13, Ludovic Courtès wrote:
> Hi,
>
> Linus Björnstam <linus.internet@fastmail.se> skribis:
>
> > You can cut another 15-ish % from that loop by making an inline loop, btw
> >
> > (let loop ((pos 0))
> > (when (< pos (string-length str))
> > ...
> > (loop (1+ pos)))
> >
> > I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it.
> >
> > BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.
>
> ‘string-for-each’ in C (the default) is slower than its Scheme counterpart:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define (sfe proc str)
> (define len (string-length str))
> (let loop ((i 0))
> (unless (= i len)
> (proc (string-ref str i))
> (loop (+ 1 i)))))
> scheme@(guile-user)> (define str (make-string 15000000))
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.263725s real time, 0.263722s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.259538s real time, 0.259529s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (string-for-each identity str)
> ;; 0.841632s real time, 0.841624s run time. 0.000000s spent in GC.
> scheme@(guile-user)> (version)
> $2 = "3.0.2"
> --8<---------------cut here---------------end--------------->8---
>
> In general we seem to pay a high price for leaving (calling a subr) and
> re-entering (via ‘scm_call_n’) the VM. This is especially acute here
> because there’s almost nothing happening in C, so we keep bouncing
> between Scheme and C.
>
> That’s another reason to start rewriting such primitives in Scheme and
> have the C functions just call out to Scheme.
>
> If we do:
>
> perf record guile -c '(string-for-each identity (make-string 15000000))'
>
> we get this profile:
>
> --8<---------------cut here---------------start------------->8---
> Overhead Command Shared Object Symbol
> 31.10% guile libguile-3.0.so.1.1.1 [.] vm_regular_engine
> 27.48% guile libguile-3.0.so.1.1.1 [.] scm_call_n
> 14.34% guile libguile-3.0.so.1.1.1 [.] scm_jit_enter_mcode
> 3.55% guile libguile-3.0.so.1.1.1 [.] scm_i_string_ref
> 3.37% guile libguile-3.0.so.1.1.1 [.] get_callee_vcode
> 2.34% guile libguile-3.0.so.1.1.1 [.] scm_call_1
> 2.31% guile libguile-3.0.so.1.1.1 [.] scm_string_for_each
> --8<---------------cut here---------------end--------------->8---
>
> Indeed, we get better performance when turning off JIT:
>
> --8<---------------cut here---------------start------------->8---
> $ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata
> 26396maxresident)k
> 0inputs+0outputs (0major+1583minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata
> 26948maxresident)k
> 0inputs+0outputs (0major+1748minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity
> (make-string 15000000))'
> 0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata
> 27324maxresident)k
> 0inputs+0outputs (0major+2548minor)pagefaults 0swaps
> --8<---------------cut here---------------end--------------->8---
>
> So it seems that we just keep firing the JIT machinery on every
> ‘scm_call_n’ for no benefit.
>
> That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
> co. always show high in statprof:
>
> https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html
>
> Thanks,
> Ludo’.
>
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-06-13 6:41 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-07 6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
2020-06-07 6:33 ` Aleix Conchillo Flaqué
2020-06-07 11:20 ` Linus Björnstam
2020-06-07 13:50 ` Linus Björnstam
2020-06-12 20:13 ` Ludovic Courtès
2020-06-13 6:41 ` Linus Björnstam
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).