From: Alan Grover <awgrover@mail.msen.com>
Subject: Re: string port slow output on big string
Date: Tue, 02 Aug 2005 08:37:41 -0400
Message-ID: <42EF6915.4050409@mail.msen.com>
In-Reply-To: <87r7ddrzgl.fsf@zip.com.au>
Exponential growth set off a warning bell for me, but you probably have
other problems by the time it bites you (consider what you are doing to
the page-cache when you copy to the new block).
After about the 6th allocation, things converge: on average 1/6 of the
total allocation is unused, because each resize leaves a fresh reserve of
about 1/3 of the new buffer, which is then gradually filled. E.g. a 1 MB
string implies an estimated 300 KB reserve (actual: ~148 KB, which is
~1/8).
A 1 MB string takes 22 allocations/moves (~15,000 under the previous
code); 1 GB requires 39 allocations/moves (~15,000,000 under the
previous code).
Kevin Ryde wrote:
> I made the change below, it leaves the code alone, just grows the
> buffer more each time, by a factor 1.5x so copying time is no longer
> quadratic in the output size.
>
> I think I'll do this in the 1.6 branch too. Backtraces there have
> been slow to the point of unusable for me in some parsing stuff I've
> been doing with say 50k or so strings in various parameters. (The
> backtrace goes via an output string so that it can truncate big args
> like that.)
>
>
>
>
> ------------------------------------------------------------------------
>
> Index: strports.c
> ===================================================================
> RCS file: /cvsroot/guile/guile/guile-core/libguile/strports.c,v
> retrieving revision 1.108
> diff -u -u -r1.108 strports.c
> --- strports.c 23 May 2005 19:57:21 -0000 1.108
> +++ strports.c 1 Aug 2005 23:47:06 -0000
> @@ -65,7 +65,30 @@
> has been written to, but this is only updated after a flush.
> read_pos and write_pos in principle should be equal, but this is only true
> when rw_active is SCM_PORT_NEITHER.
> -*/
> +
> + ENHANCE-ME - output blocks:
> +
> + The current code keeps an output string as a single block. That means
> + when the size is increased the entire old contents must be copied. It'd
> + be more efficient to begin a new block when the old one is full, so
> + there's no re-copying of previous data.
> +
> + To make seeking efficient, keeping the pieces in a vector might be best,
> + though appending is probably the most common operation. The size of each
> + block could be progressively increased, so the bigger the string the
> + bigger the blocks.
> +
> + When `get-output-string' is called the blocks have to be coalesced into a
> + string, the result could be kept as a single big block. If blocks were
> + strings then `get-output-string' could notice when there's just one and
> + return that with a copy-on-write (though repeated calls to
> + `get-output-string' are probably unlikely).
> +
> + Another possibility would be to extend the port mechanism to let SCM
> + strings come through directly from `display' and friends. That way if a
> + big string is written it can be kept as a copy-on-write, saving time
> + copying and maybe saving some space. */
> +
>
> scm_t_bits scm_tc16_strport;
>
> @@ -117,7 +140,14 @@
> #define SCM_WRITE_BLOCK 80
>
> /* ensure that write_pos < write_end by enlarging the buffer when
> - necessary. update read_buf to account for written chars. */
> + necessary. update read_buf to account for written chars.
> +
> + The buffer is enlarged by 1.5 times, plus SCM_WRITE_BLOCK. Adding just a
> + fixed amount is no good, because there's a block copy for each increment,
> + and that copying would take quadratic time. In the past it was found to
> + be very slow just adding 80 bytes each time (eg. about 10 seconds for
> + writing a 100kbyte string). */
> +
> static void
> st_flush (SCM port)
> {
> @@ -125,7 +155,7 @@
>
> if (pt->write_pos == pt->write_end)
> {
> - st_resize_port (pt, pt->write_buf_size + SCM_WRITE_BLOCK);
> + st_resize_port (pt, pt->write_buf_size * 3 / 2 + SCM_WRITE_BLOCK);
> }
> pt->read_pos = pt->write_pos;
> if (pt->read_pos > pt->read_end)
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Guile-devel mailing list
> Guile-devel@gnu.org
> http://lists.gnu.org/mailman/listinfo/guile-devel
--
Alan Grover
awgrover@mail.msen.com
+1.734.476.0969