Hi!

Andy Wingo <wingo@pobox.com> writes:

> At this point, to improve performance, we have two choices: (1) make
> string-set! cheaper, or (2) avoid string-set!. I do not know how to do
> (1) in the presence of threads[2]. (2) seems feasible, if we look at what
> functions are actually calling scm_c_string_set_x. The ones that show up
> in the profile are all in read.c:
>
>     ./read.c:628:	  scm_c_string_set_x (*tok_buf, j, SCM_MAKE_CHAR (c));
>     ./read.c:703:      scm_c_string_set_x (*tok_buf, j, SCM_MAKE_CHAR (c));
>     ./read.c:766:            scm_c_string_set_x (*tok_buf, j, SCM_MAKE_CHAR (c));
>
> All of these calls use the token buffer API, in which a SCM string is
> allocated and grown as necessary. The readers fill the buffer with
> string-set!.

I just committed the attached patch to HEAD.  It removes all uses of the
token buffer API and instead favors C on-stack buffers in the common
case; Scheme strings are used only when a larger buffer is needed.  The
rationale is that, in practice, tokens encountered in source files
(e.g., symbols, numbers) are quite short, so we can avoid allocating
many intermediary Scheme objects.  This idea (along with pieces of the
code) was already implemented in Guile-Reader.
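
To illustrate the approach, here is a minimal standalone sketch.  It is
not the code from the patch: `read_token', `is_delimiter' and
SMALL_TOKEN_SIZE are hypothetical names, and a malloc'd buffer stands in
for the Scheme string that the real reader falls back to.  Characters
accumulate in a small on-stack array, and only tokens that overflow it
cause an intermediate allocation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_TOKEN_SIZE 16   /* hypothetical size of the on-stack buffer */

/* Return nonzero if C ends a token: whitespace, a parenthesis, or the
   end of the input string.  */
static int
is_delimiter (int c)
{
  return c == '\0' || c == '(' || c == ')' || isspace (c);
}

/* Read one token starting at *INPUT and advance *INPUT past it.
   Return a freshly malloc'd, NUL-terminated copy of the token, or NULL
   if memory allocation fails.  */
char *
read_token (const char **input)
{
  char small[SMALL_TOKEN_SIZE];   /* common case: token fits here */
  char *big = NULL;               /* overflow storage, rarely used */
  size_t big_len = 0;
  size_t n = 0;
  char *result;
  const char *p;

  assert (input != NULL && *input != NULL);
  p = *input;

  while (!is_delimiter ((unsigned char) *p))
    {
      if (n == sizeof small)
        {
          /* The stack buffer is full: spill it to the heap and start
             filling it again from the beginning.  */
          char *grown = realloc (big, big_len + n);
          if (grown == NULL)
            {
              free (big);
              return NULL;
            }
          big = grown;
          memcpy (big + big_len, small, n);
          big_len += n;
          n = 0;
        }
      small[n++] = *p++;
    }

  /* Build the final object once: any spilled prefix followed by what
     is left in the stack buffer.  */
  result = realloc (big, big_len + n + 1);
  if (result == NULL)
    {
      free (big);
      return NULL;
    }
  memcpy (result + big_len, small, n);
  result[big_len + n] = '\0';
  *input = p;
  return result;
}
```

In this sketch a short token such as "define" costs a single allocation
(the final result), whereas a 29-character identifier spills once before
the result is built.  The final copy plays the role of creating the
resulting Scheme object (symbol, number, etc.) in the real reader.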

I tried hard to preserve the exact behavior of the previous reader,
including undocumented behavior that might be relied on (e.g.,
exceptions), so that we can eventually put it in the 1.8 branch (I'm
hoping that the next stable branch will not need it because it will have
a brand new Unicode-capable reader :-)).

The patch removes internal functions that were exported, namely:

  scm_grow_tok_buf, scm_flush_ws, scm_casei_streq, scm_lreadr,
  scm_lreadrecparen

I think these are safe to remove, even for the next 1.8 release.
Google's codesearch (http://www.google.com/codesearch) seems to agree
with this.  What do you think?


I'll let Andy provide more detailed performance analysis ;-), but here
is what I observe (after several runs of each).  With the new reader:

  $ time for i in `seq 1 100` ; do ./pre-inst-guile -c '0' ; done

  real    0m3.141s
  user    0m1.380s
  sys     0m1.748s

With the old one:

  $ time for i in `seq 1 100` ; do guile -c '0' ; done

  real    0m3.851s
  user    0m3.404s
  sys     0m0.448s

That would mean an 18% improvement in total (wall-clock) startup time:
(3.851 - 3.141) / 3.851 = 0.18.

Guile-Reader has a reader-specific benchmark (in the `tests' directory)
that is used to compare Guile-Reader's generated readers with Guile's
built-in reader.  With the new reader:

  * Comparing without position recording

    Guile's built-in reader:        65
    Guile-Reader's default reader:  66
    improvement:                    .98 times faster

  * Comparing with position recording

    Guile's built-in reader:        97
    Guile-Reader's default reader:  129
    improvement:                    .75 times faster

I.e., Guile-Reader is now roughly on par with the new built-in reader
without position recording (ratio .98) and noticeably slower with it
(ratio .75).

With the old reader:

  * Comparing without position recording

    Guile's built-in reader:        448
    Guile-Reader's default reader:  65
    improvement:                    6.89 times faster

  * Comparing with position recording

    Guile's built-in reader:        542
    Guile-Reader's default reader:  131
    improvement:                    4.14 times faster

I.e., Guile-Reader is 4 to 7 times faster than the previous built-in
reader.

Thanks,
Ludovic.