* string port slow output on big string
@ 2005-02-14  0:22 Kevin Ryde
  2005-02-28  2:48 ` Marius Vollmer
From: Kevin Ryde @ 2005-02-14  0:22 UTC

I tried writing a biggish string to a string port, and it was very
slow.  Eg.

    (use-modules (ice-9 time))
    (let ((str (make-string 100000 #\x)))
      (call-with-output-string
        (lambda (port)
          (time (display str port))))
      #f)

gives on my poor 333mhz

    clock utime stime cutime cstime gctime
     7.63  7.58  0.05   0.00   0.00   4.17

I struck this trying to use regexp-substitute/global on a file slurped
into memory.  It was 130k, which is a decent size, but it's well
within the realm of reason.

I think strports.c st_write ends up doing a realloc and copy every 80
bytes of the block it's writing.  It knows the size, but it lets
st_flush just grow by 80 bytes at a time.

The change below speeds it up from 7 seconds to 10 ms for me.  But I
don't know if the read side bits of this change are right.  Is it
supposed to update read_pos, read_end and read_buf_size to be the end
of the string, or something?

(Of course what would be even nicer would be to avoid big reallocing
altogether, like keep a list of chunks and only join them when a
get-string call wants the entire block.  But that can wait.)

[-- Attachment #2: strports.c.write.diff --]

--- strports.c.~1.105.~  2005-01-28 08:25:34.000000000 +1100
+++ strports.c  2005-02-14 11:20:05.000000000 +1100
@@ -142,18 +142,14 @@
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
   const char *input = (char *) data;
 
-  while (size > 0)
-    {
-      int space = pt->write_end - pt->write_pos;
-      int write_len = (size > space) ? space : size;
-
-      memcpy ((char *) pt->write_pos, input, write_len);
-      pt->write_pos += write_len;
-      size -= write_len;
-      input += write_len;
-      if (write_len == space)
-        st_flush (port);
-    }
+  /* if not enough room for "size" then make that amount and an additional
+     SCM_WRITE_BLOCK */
+  if (size > pt->write_end - pt->write_pos)
+    st_resize_port (pt, pt->write_pos - pt->write_buf
+                    + size + SCM_WRITE_BLOCK);
+
+  memcpy ((char *) pt->write_pos, input, size);
+  pt->write_pos += size;
 }
 
 static void

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
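To make the difference between the two st_write strategies in the message above concrete, here is a minimal standalone sketch in C. It is not Guile code: `struct outbuf`, `outbuf_resize`, `outbuf_write_chunked` and `outbuf_write_sized` are hypothetical stand-ins for the port structure, st_resize_port, and the old and patched st_write. Only the 80-byte block size is taken from the patch, and the read-side fields the message asks about are left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WRITE_BLOCK 80   /* same value as SCM_WRITE_BLOCK in the patch */

/* Hypothetical stand-in for the write side of a string port. */
struct outbuf
{
  char *buf;              /* heap block holding the output */
  size_t size;            /* bytes allocated */
  size_t pos;             /* bytes written so far */
  unsigned long resizes;  /* how many times realloc was called */
};

/* Resize the block; realloc may have to copy the whole old contents. */
static void
outbuf_resize (struct outbuf *b, size_t new_size)
{
  char *p = realloc (b->buf, new_size);
  assert (p != NULL);
  b->buf = p;
  b->size = new_size;
  b->resizes++;
}

/* Old behaviour: copy what fits, grow by one block when full, repeat. */
static void
outbuf_write_chunked (struct outbuf *b, const char *data, size_t len)
{
  while (len > 0)
    {
      size_t space = b->size - b->pos;
      size_t n = len > space ? space : len;

      if (n > 0)
        memcpy (b->buf + b->pos, data, n);
      b->pos += n;
      data += n;
      len -= n;
      if (n == space)
        outbuf_resize (b, b->size + WRITE_BLOCK);
    }
}

/* Patched behaviour: the size is known, so make room once and copy once. */
static void
outbuf_write_sized (struct outbuf *b, const char *data, size_t len)
{
  if (len > b->size - b->pos)
    outbuf_resize (b, b->pos + len + WRITE_BLOCK);
  if (len > 0)
    memcpy (b->buf + b->pos, data, len);
  b->pos += len;
}
```

Writing 100000 bytes into an empty `struct outbuf` this way, the chunked version calls realloc 1251 times and the sized version once. Each of those reallocs may copy everything written so far, which is where the quadratic cost would come from.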
* Re: string port slow output on big string
  2005-02-14  0:22 string port slow output on big string Kevin Ryde
@ 2005-02-28  2:48 ` Marius Vollmer
       [not found]   ` <87r7ddrzgl.fsf@zip.com.au>
From: Marius Vollmer @ 2005-02-28  2:48 UTC

Kevin Ryde <user42@zip.com.au> writes:

> But I don't know if the read side bits of this change are right.  Is
> it supposed to update read_pos, read_end and read_buf_size to be the
> end of the string, or something?

I don't know.  Could you try to figure this out yourself?

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405
[parent not found: <87r7ddrzgl.fsf@zip.com.au>]
* Re: string port slow output on big string
       [not found]   ` <87r7ddrzgl.fsf@zip.com.au>
@ 2005-08-02 12:37     ` Alan Grover
  2005-08-03 22:43       ` Kevin Ryde
  2005-08-10 22:32     ` Marius Vollmer
From: Alan Grover @ 2005-08-02 12:37 UTC

Exponential growth set off a warning bell for me, but you probably have
other problems by the time it bites you (consider what you are doing to
the page-cache when you copy to the new block).

After about the 6th allocation, things converge such that 1/6 of that
total allocation is unused on average, i.e. there's a reserve on average
of 1/3 the string's size (1/3 of the string was just allocated).

E.g. A 1mb string implies an estimated 300kb reserve (actual: ~148k,
which is ~1/8).

A 1mb string takes 22 allocations/moves (~15000 under the previous
code), 1gb requires 39 allocations/moves (about ~15,000,000 under the
previous code).

Kevin Ryde wrote:
> I made the change below, it leaves the code alone, just grows the
> buffer more each time, by a factor 1.5x so copying time is no longer
> quadratic in the output size.
>
> I think I'll do this in the 1.6 branch too.  Backtraces there have
> been slow to the point of unusable for me in some parsing stuff I've
> been doing with say 50k or so strings in various parameters.  (The
> backtrace goes via an output string so that it can truncate big args
> like that.)
>
> ------------------------------------------------------------------------
>
> Index: strports.c
> ===================================================================
> RCS file: /cvsroot/guile/guile/guile-core/libguile/strports.c,v
> retrieving revision 1.108
> diff -u -u -r1.108 strports.c
> --- strports.c  23 May 2005 19:57:21 -0000  1.108
> +++ strports.c  1 Aug 2005 23:47:06 -0000
> @@ -65,7 +65,30 @@
>     has been written to, but this is only updated after a flush.
>     read_pos and write_pos in principle should be equal, but this is only true
>     when rw_active is SCM_PORT_NEITHER.
> -*/
> +
> +   ENHANCE-ME - output blocks:
> +
> +   The current code keeps an output string as a single block.  That means
> +   when the size is increased the entire old contents must be copied.  It'd
> +   be more efficient to begin a new block when the old one is full, so
> +   there's no re-copying of previous data.
> +
> +   To make seeking efficient, keeping the pieces in a vector might be best,
> +   though appending is probably the most common operation.  The size of each
> +   block could be progressively increased, so the bigger the string the
> +   bigger the blocks.
> +
> +   When `get-output-string' is called the blocks have to be coalesced into a
> +   string, the result could be kept as a single big block.  If blocks were
> +   strings then `get-output-string' could notice when there's just one and
> +   return that with a copy-on-write (though repeated calls to
> +   `get-output-string' are probably unlikely).
> +
> +   Another possibility would be to extend the port mechanism to let SCM
> +   strings come through directly from `display' and friends.  That way if a
> +   big string is written it can be kept as a copy-on-write, saving time
> +   copying and maybe saving some space.  */
> +
>  
>  scm_t_bits scm_tc16_strport;
>  
> @@ -117,7 +140,14 @@
>  #define SCM_WRITE_BLOCK 80
>  
>  /* ensure that write_pos < write_end by enlarging the buffer when
> -   necessary.  update read_buf to account for written chars.  */
> +   necessary.  update read_buf to account for written chars.
> +
> +   The buffer is enlarged by 1.5 times, plus SCM_WRITE_BLOCK.  Adding just a
> +   fixed amount is no good, because there's a block copy for each increment,
> +   and that copying would take quadratic time.  In the past it was found to
> +   be very slow just adding 80 bytes each time (eg. about 10 seconds for
> +   writing a 100kbyte string).  */
> +
>  static void
>  st_flush (SCM port)
>  {
> @@ -125,7 +155,7 @@
>  
>    if (pt->write_pos == pt->write_end)
>      {
> -      st_resize_port (pt, pt->write_buf_size + SCM_WRITE_BLOCK);
> +      st_resize_port (pt, pt->write_buf_size * 3 / 2 + SCM_WRITE_BLOCK);
>      }
>    pt->read_pos = pt->write_pos;
>    if (pt->read_pos > pt->read_end)

-- 
Alan Grover
awgrover@mail.msen.com
+1.734.476.0969
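The allocation counts in the message above can be checked with a small simulation. This is a minimal sketch in C, assuming the buffer starts at 0 bytes (an assumption, though it does reproduce the quoted figures) and using the same integer arithmetic as the patched st_flush. The names `grow_fixed`, `grow_geometric` and `count_resizes` are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_BLOCK 80   /* same value as SCM_WRITE_BLOCK in the patch */

/* Growth step before the patch: add one fixed block. */
static uint64_t
grow_fixed (uint64_t size)
{
  return size + WRITE_BLOCK;
}

/* Growth step after the patch: 1.5 times the old size, plus one block. */
static uint64_t
grow_geometric (uint64_t size)
{
  return size * 3 / 2 + WRITE_BLOCK;
}

/* Count how many resizes it takes for a buffer that starts empty to
   reach at least TARGET bytes under the given growth step.  If
   FINAL_SIZE is not NULL, the capacity reached is stored there.  */
static uint64_t
count_resizes (uint64_t target, uint64_t (*grow) (uint64_t),
               uint64_t *final_size)
{
  uint64_t size = 0;
  uint64_t n = 0;

  assert (target > 0);
  while (size < target)
    {
      size = grow (size);
      n++;
    }
  if (final_size != NULL)
    *final_size = size;
  return n;
}
```

Under that assumption, `count_resizes (1048576, grow_geometric, &cap)` gives 22 with `cap` at 1196292, which leaves a reserve of 147716 bytes (the ~148k above), and 1 GiB takes 39 steps. The fixed 80-byte step needs 13108 and 13421773 resizes for the same targets, a little under the rounded ~15000 and ~15,000,000 estimates.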
* Re: string port slow output on big string
  2005-08-02 12:37     ` Alan Grover
@ 2005-08-03 22:43       ` Kevin Ryde
From: Kevin Ryde @ 2005-08-03 22:43 UTC
Cc: guile-devel

Alan Grover <awgrover@mail.msen.com> writes:
>
> A 1mb string takes 22 allocations/moves (~15000 under the previous
> code),

Yep, 15000 being so slow that you think it's gone into an inf loop :-).
* Re: string port slow output on big string
       [not found]   ` <87r7ddrzgl.fsf@zip.com.au>
  2005-08-02 12:37     ` Alan Grover
@ 2005-08-10 22:32     ` Marius Vollmer
From: Marius Vollmer @ 2005-08-10 22:32 UTC

Kevin Ryde <user42@zip.com.au> writes:

> I think I'll do this in the 1.6 branch too.

Yes, sounds good.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405