Hi!

Andy Wingo writes:

> At this point, to improve performance, we have two choices: (1) make
> string-set! cheaper, or (2) avoid string-set!. I do not know how to do
> (1) in the presence of threads[2]. (2) seems feasible, if we look at what
> functions are actually calling scm_c_string_set_x. The ones that show up
> in the profile are all in read.c:
>
>   ./read.c:628:  scm_c_string_set_x (*tok_buf, j, SCM_MAKE_CHAR (c));
>   ./read.c:703:  scm_c_string_set_x (*tok_buf, j, SCM_MAKE_CHAR (c));
>   ./read.c:766:  scm_c_string_set_x (*tok_buf, j, SCM_MAKE_CHAR (c));
>
> All of these calls use the token buffer API, in which a SCM string is
> allocated and grown as necessary. The readers fill the buffer with
> string-set!.

I just committed to HEAD the attached patch.  It removes all uses of the
token buffer API and instead favors C on-stack buffers in the common
case; when a larger buffer is needed, it falls back to Scheme strings.
The rationale is that, in practice, tokens encountered in source files
(e.g., symbols, numbers) are quite short, so we can avoid allocating
many intermediary Scheme objects.  This idea (and pieces of the code)
was already implemented in Guile-Reader.

I tried hard to preserve the exact behavior of the previous reader,
including undocumented behavior that might be relied on (e.g.,
exceptions), so that we can eventually put it in the 1.8 branch (I'm
hoping that the next stable branch will not need it because it will have
a brand new Unicode-capable reader :-)).

The patch removes internal functions that were exported, namely:

  scm_grow_tok_buf, scm_flush_ws, scm_casei_streq, scm_lreadr,
  scm_lreadrecparen

I think these are safe to remove, even for the next 1.8 release.
Google's codesearch (http://www.google.com/codesearch) seems to agree
with this.  What do you think?

I'll let Andy provide more detailed performance analysis ;-), but here
is what I observe (after several runs of each).

With the new reader:

  $ time for i in `seq 1 100` ; do ./pre-inst-guile -c '0' ; done

  real    0m3.141s
  user    0m1.380s
  sys     0m1.748s

With the old one:

  $ time for i in `seq 1 100` ; do guile -c '0' ; done

  real    0m3.851s
  user    0m3.404s
  sys     0m0.448s

That would mean an 18% improvement on total startup time.

Guile-Reader has a reader-specific benchmark (in the `tests' directory)
that is used to compare Guile-Reader's generated readers with Guile's
built-in reader.

With the new reader:

  * Comparing without position recording

    Guile's built-in reader:        65
    Guile-Reader's default reader:  66
    improvement:                    .98 times faster

  * Comparing with position recording

    Guile's built-in reader:        97
    Guile-Reader's default reader:  129
    improvement:                    .75 times faster

I.e., Guile-Reader is slightly slower than the new built-in reader.

With the old reader:

  * Comparing without position recording

    Guile's built-in reader:        448
    Guile-Reader's default reader:  65
    improvement:                    6.89 times faster

  * Comparing with position recording

    Guile's built-in reader:        542
    Guile-Reader's default reader:  131
    improvement:                    4.14 times faster

I.e., Guile-Reader is 4 to 7 times faster than the previous built-in
reader.

Thanks,
Ludovic.
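
For readers without the patch at hand, the buffering strategy described
above (an on-stack buffer for the common short token, with a fallback to
a growable buffer for long ones) can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the committed
code: `read_token', `STACK_TOKEN_SIZE' and the `used_heap' flag are
hypothetical names, the stack size is arbitrary, and a malloc'd buffer
stands in for the Scheme string that the real reader would use.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; the value used by the actual patch is not stated. */
#define STACK_TOKEN_SIZE 32

/* Copy the token that starts at SRC (it ends at whitespace or NUL) into a
   freshly malloc'd, NUL-terminated string.  Short tokens are accumulated
   in an on-stack buffer; only tokens that overflow it cause a growable
   heap buffer to be allocated (a stand-in for a Scheme string).
   *USED_HEAP is set to 1 if the fallback was taken, 0 otherwise.
   Returns NULL on allocation failure.  */
char *
read_token (const char *src, int *used_heap)
{
  char stack_buf[STACK_TOKEN_SIZE];
  char *heap_buf = NULL;          /* allocated only on overflow */
  char *buf = stack_buf;          /* where characters are currently stored */
  size_t cap = sizeof stack_buf;
  size_t len = 0;
  char *result;

  assert (src != NULL && used_heap != NULL);
  *used_heap = 0;

  for (; *src != '\0' && !isspace ((unsigned char) *src); src++)
    {
      if (len == cap)
        {
          /* Buffer full: double the capacity on the heap.  */
          size_t new_cap = cap * 2;
          char *grown = realloc (heap_buf, new_cap);
          if (grown == NULL)
            {
              free (heap_buf);
              return NULL;
            }
          if (heap_buf == NULL)
            /* First overflow: move what was gathered on the stack.  */
            memcpy (grown, stack_buf, len);
          heap_buf = grown;
          buf = heap_buf;
          cap = new_cap;
          *used_heap = 1;
        }
      buf[len++] = *src;
    }

  /* Build the final object once, from whichever buffer holds the token.  */
  result = malloc (len + 1);
  if (result != NULL)
    {
      memcpy (result, buf, len);
      result[len] = '\0';
    }
  free (heap_buf);
  return result;
}
```

Calling `read_token ("define x", &flag)' would return "define" with the
flag left at 0, i.e., no intermediate buffer is allocated for a short
token; only a token longer than STACK_TOKEN_SIZE characters takes the
heap path.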