Re: string-for-each vs. for-each+string->list performance

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

From: "Linus Björnstam" <linus.internet@fastmail.se>
To: "Ludovic Courtès" <ludo@gnu.org>, guile-user@gnu.org
Subject: Re: string-for-each vs. for-each+string->list performance
Date: Sat, 13 Jun 2020 08:41:17 +0200	[thread overview]
Message-ID: <db2035dc-befd-4510-bb2f-23eb4afb1fca@www.fastmail.com> (raw)
In-Reply-To: <87k10c2gah.fsf@gnu.org>


Thanks for clearing that up.

I have an old implementation of large parts of srfi-13 if that would be of interest. I don't know how much you want to change. The licence situation of the reference implementation is weird iirc.

A beginning could be to replace all higher order functions since that would minimize the kind of performance problems discussed here. 
-- 
  Linus Björnstam

On Fri, 12 Jun 2020, at 22:13, Ludovic Courtès wrote:
> Hi,
> 
> Linus Björnstam <linus.internet@fastmail.se> skribis:
> 
> > You can cut another 15-ish % from that loop by making an inline loop, btw
> >
> > (let loop ((pos 0))
> >   (when (< pos (string-length str))
> >     ...
> >     (loop (1+ pos)))
> >
> > I have been looking at the disassembly, even for simpler cases, but I haven't been able to understand enough of it. 
> >
> > BTW: string-for-each is in the default environment, and is probably the same as the srfi-13 C implementation.
> 
> ‘string-for-each’ in C (the default) is slower than its Scheme counterpart:
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define (sfe proc str)
> 		       (define len (string-length str))
> 		       (let loop ((i 0))
> 			 (unless (= i len)
> 			   (proc (string-ref str i))
> 			   (loop (+ 1 i)))))
> scheme@(guile-user)> (define str (make-string 15000000))
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.263725s real time, 0.263722s run time.  0.000000s spent in GC.
> scheme@(guile-user)> ,t (sfe identity str)
> ;; 0.259538s real time, 0.259529s run time.  0.000000s spent in GC.
> scheme@(guile-user)> ,t (string-for-each identity str)
> ;; 0.841632s real time, 0.841624s run time.  0.000000s spent in GC.
> scheme@(guile-user)> (version)
> $2 = "3.0.2"
> --8<---------------cut here---------------end--------------->8---
> 
> In general we seem to pay a high price for leaving (calling a subr) and
> re-entering (via ‘scm_call_n’) the VM.  This is especially acute here
> because there’s almost nothing happening in C, so we keep bouncing
> between Scheme and C.
> 
> That’s another reason to start rewriting such primitives in Scheme and
> have the C functions just call out to Scheme.
> 
> If we do:
> 
>   perf record guile -c '(string-for-each identity (make-string 15000000))'
> 
> we get this profile:
> 
> --8<---------------cut here---------------start------------->8---
> Overhead  Command  Shared Object          Symbol
>   31.10%  guile    libguile-3.0.so.1.1.1  [.] vm_regular_engine
>   27.48%  guile    libguile-3.0.so.1.1.1  [.] scm_call_n
>   14.34%  guile    libguile-3.0.so.1.1.1  [.] scm_jit_enter_mcode
>    3.55%  guile    libguile-3.0.so.1.1.1  [.] scm_i_string_ref
>    3.37%  guile    libguile-3.0.so.1.1.1  [.] get_callee_vcode
>    2.34%  guile    libguile-3.0.so.1.1.1  [.] scm_call_1
>    2.31%  guile    libguile-3.0.so.1.1.1  [.] scm_string_for_each
> --8<---------------cut here---------------end--------------->8---
> 
> Indeed, we get better performance when turning off JIT:
> 
> --8<---------------cut here---------------start------------->8---
> $ GUILE_JIT_THRESHOLD=-1 time guile -c '(string-for-each identity 
> (make-string 15000000))'
> 0.47user 0.00system 0:00.47elapsed 100%CPU (0avgtext+0avgdata 
> 26396maxresident)k
> 0inputs+0outputs (0major+1583minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=100 time guile -c '(string-for-each identity 
> (make-string 15000000))'
> 0.83user 0.00system 0:00.83elapsed 100%CPU (0avgtext+0avgdata 
> 26948maxresident)k
> 0inputs+0outputs (0major+1748minor)pagefaults 0swaps
> $ GUILE_JIT_THRESHOLD=0 time guile -c '(string-for-each identity 
> (make-string 15000000))'
> 0.84user 0.00system 0:00.85elapsed 100%CPU (0avgtext+0avgdata 
> 27324maxresident)k
> 0inputs+0outputs (0major+2548minor)pagefaults 0swaps
> --8<---------------cut here---------------end--------------->8---
> 
> So it seems that we just keep firing the JIT machinery on every
> ‘scm_call_n’ for no benefit.
> 
> That’s probably also the reason why ‘%after-gc-hunk’, ‘reap-pipes’, &
> co. always show high in statprof:
> 
>   https://lists.gnu.org/archive/html/guile-devel/2020-05/msg00019.html  
> 
> Thanks,
> Ludo’.
> 
> 
>

     prev parent reply	other threads:[~2020-06-13  6:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-07  6:27 string-for-each vs. for-each+string->list performance Aleix Conchillo Flaqué
2020-06-07  6:33 ` Aleix Conchillo Flaqué
2020-06-07 11:20 ` Linus Björnstam
2020-06-07 13:50 ` Linus Björnstam
2020-06-12 20:13   ` Ludovic Courtès
2020-06-13  6:41     ` Linus Björnstam [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=db2035dc-befd-4510-bb2f-23eb4afb1fca@www.fastmail.com \
    --to=linus.internet@fastmail.se \
    --cc=guile-user@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).