From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Grover
Newsgroups: gmane.lisp.guile.devel
Subject: Re: string port slow output on big string
Date: Tue, 02 Aug 2005 08:37:41 -0400
Message-ID: <42EF6915.4050409@mail.msen.com>
References: <874qggvvmb.fsf@zip.com.au> <87mztpe6we.fsf@zagadka.de>
 <87r7ddrzgl.fsf@zip.com.au>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2058715606=="
X-Trace: sea.gmane.org 1122993671 4441 80.91.229.2 (2 Aug 2005 14:41:11 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 2 Aug 2005 14:41:11 +0000 (UTC)
Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Tue Aug 02 14:55:42 2005
Return-path:
Original-Received: from lists.gnu.org ([199.232.76.165])
 by ciao.gmane.org with esmtp (Exim 4.43) id 1DzwIJ-0004sf-ES
 for guile-devel@m.gmane.org; Tue, 02 Aug 2005 14:54:56 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.43) id 1DzwKz-0003my-VO
 for guile-devel@m.gmane.org; Tue, 02 Aug 2005 08:57:42 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
 id 1DzwEf-0000jj-NE
 for guile-devel@gnu.org; Tue, 02 Aug 2005 08:51:12 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
 id 1DzwEY-0000et-Ru
 for guile-devel@gnu.org; Tue, 02 Aug 2005 08:51:06 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.43) id 1DzwEV-0000Wi-WF
 for guile-devel@gnu.org; Tue, 02 Aug 2005 08:51:00 -0400
Original-Received: from [148.59.80.48] (helo=ww8.msen.com)
 by monty-python.gnu.org with esmtp (TLS-1.0:DHE_RSA_3DES_EDE_CBC_SHA:24)
 (Exim 4.34) id 1DzwEA-00027Z-H4
 for guile-devel@gnu.org; Tue, 02 Aug 2005 08:50:39 -0400
X-Sent-To:
Original-Received: from [192.168.1.220]
 (pool-151-196-115-140.balt.east.verizon.net [151.196.115.140])
 (authenticated bits=0)
 by ww8.msen.com (8.13.4/8.13.4) with ESMTP id j72CbksG089101
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
 for ; Tue, 2 Aug 2005 08:37:47 -0400 (EDT)
 (envelope-from awgrover@mail.msen.com)
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317)
X-Accept-Language: en-us, en
Original-To: guile-devel@gnu.org
In-Reply-To: <87r7ddrzgl.fsf@zip.com.au>
X-Enigmail-Version: 0.92.0.0
OpenPGP: id=5074AF60; url=
X-Milter: Spamilter (Reciever: ww8.msen.com; Sender-ip: 151.196.115.140;
 Sender-helo: [192.168.1.220]; )
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Developers list for Guile, the GNU extensibility library"
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.lisp.guile.devel:5181
X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.devel:5181

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--===============2058715606==
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enig7B8B12727BC7824617F7FDD1"

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig7B8B12727BC7824617F7FDD1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Exponential growth set off a warning bell for me, but you probably have
other problems by the time it bites you (consider what you are doing to
the page-cache when you copy to the new block).

After about the 6th allocation, things converge such that 1/6 of that
total allocation is unused on average, i.e. there's a reserve on average
of 1/3 the string's size (1/3 of the string was just allocated). E.g. A
1mb string implies an estimated 300kb reserve (actual: ~148k, which is
~1/8).
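
Figures like these can be checked by simulating the growth rule. What follows is only a rough sketch, assuming the output port starts with an empty buffer and that every resize applies the formula from the patch (old size * 3 / 2 + SCM_WRITE_BLOCK) using integer arithmetic. The names WRITE_BLOCK, simulate_geometric, resizes_geometric, final_size_geometric and resizes_fixed are made up for illustration and do not exist in strports.c.

```c
#include <assert.h>
#include <stdint.h>

#define WRITE_BLOCK 80 /* same value as SCM_WRITE_BLOCK in the patch */

/* Hypothetical helper (not in strports.c): apply the new policy,
   size * 3 / 2 + WRITE_BLOCK, starting from an empty buffer, until the
   buffer can hold `target` bytes.  Returns the final buffer size and
   stores the number of resizes in *resizes.  */
static uint64_t
simulate_geometric (uint64_t target, unsigned *resizes)
{
  uint64_t size = 0;

  /* Keep size * 3 well inside the range of uint64_t.  */
  assert (target <= UINT64_MAX / 4);

  *resizes = 0;
  while (size < target)
    {
      size = size * 3 / 2 + WRITE_BLOCK;
      (*resizes)++;
    }
  return size;
}

/* Number of resizes (each one a block copy) under the new policy.  */
unsigned
resizes_geometric (uint64_t target)
{
  unsigned n;
  simulate_geometric (target, &n);
  return n;
}

/* Buffer size at the moment `target` bytes first fit.  */
uint64_t
final_size_geometric (uint64_t target)
{
  unsigned n;
  return simulate_geometric (target, &n);
}

/* Number of resizes under the old policy, which added a fixed
   WRITE_BLOCK bytes each time: a ceiling division.  */
uint64_t
resizes_fixed (uint64_t target)
{
  return (target + WRITE_BLOCK - 1) / WRITE_BLOCK;
}
```

Under those assumptions, and taking 1mb as 2^20 bytes, resizes_geometric gives 22 and final_size_geometric gives 1,196,292 bytes, i.e. roughly 148k spare, against 13,108 resizes from resizes_fixed for the fixed 80-byte increment.
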
A 1mb string takes 22 allocations/moves (~15,000 under the previous
code); 1gb requires 39 allocations/moves (~15,000,000 under the
previous code).

Kevin Ryde wrote:
> I made the change below, it leaves the code alone, just grows the
> buffer more each time, by a factor 1.5x so copying time is no longer
> quadratic in the output size.
>
> I think I'll do this in the 1.6 branch too. Backtraces there have
> been slow to the point of unusable for me in some parsing stuff I've
> been doing with say 50k or so strings in various parameters. (The
> backtrace goes via an output string so that it can truncate big args
> like that.)
>
>
>
>
> ------------------------------------------------------------------------
>
> Index: strports.c
> ===================================================================
> RCS file: /cvsroot/guile/guile/guile-core/libguile/strports.c,v
> retrieving revision 1.108
> diff -u -u -r1.108 strports.c
> --- strports.c 23 May 2005 19:57:21 -0000 1.108
> +++ strports.c 1 Aug 2005 23:47:06 -0000
> @@ -65,7 +65,30 @@
>     has been written to, but this is only updated after a flush.
>     read_pos and write_pos in principle should be equal, but this is only true
>     when rw_active is SCM_PORT_NEITHER.
> -*/
> +
> +   ENHANCE-ME - output blocks:
> +
> +   The current code keeps an output string as a single block. That means
> +   when the size is increased the entire old contents must be copied. It'd
> +   be more efficient to begin a new block when the old one is full, so
> +   there's no re-copying of previous data.
> +
> +   To make seeking efficient, keeping the pieces in a vector might be best,
> +   though appending is probably the most common operation. The size of each
> +   block could be progressively increased, so the bigger the string the
> +   bigger the blocks.
> +
> +   When `get-output-string' is called the blocks have to be coalesced into a
> +   string, the result could be kept as a single big block. If blocks were
> +   strings then `get-output-string' could notice when there's just one and
> +   return that with a copy-on-write (though repeated calls to
> +   `get-output-string' are probably unlikely).
> +
> +   Another possibility would be to extend the port mechanism to let SCM
> +   strings come through directly from `display' and friends. That way if a
> +   big string is written it can be kept as a copy-on-write, saving time
> +   copying and maybe saving some space. */
> +
>
>  scm_t_bits scm_tc16_strport;
>
> @@ -117,7 +140,14 @@
>  #define SCM_WRITE_BLOCK 80
>
>  /* ensure that write_pos < write_end by enlarging the buffer when
> -   necessary. update read_buf to account for written chars. */
> +   necessary. update read_buf to account for written chars.
> +
> +   The buffer is enlarged by 1.5 times, plus SCM_WRITE_BLOCK. Adding just a
> +   fixed amount is no good, because there's a block copy for each increment,
> +   and that copying would take quadratic time. In the past it was found to
> +   be very slow just adding 80 bytes each time (eg. about 10 seconds for
> +   writing a 100kbyte string). */
> +
>  static void
>  st_flush (SCM port)
>  {
> @@ -125,7 +155,7 @@
>
>    if (pt->write_pos == pt->write_end)
>      {
> -      st_resize_port (pt, pt->write_buf_size + SCM_WRITE_BLOCK);
> +      st_resize_port (pt, pt->write_buf_size * 3 / 2 + SCM_WRITE_BLOCK);
>      }
>    pt->read_pos = pt->write_pos;
>    if (pt->read_pos > pt->read_end)
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Guile-devel mailing list
> Guile-devel@gnu.org
> http://lists.gnu.org/mailman/listinfo/guile-devel

-- 
Alan Grover
awgrover@mail.msen.com
+1.734.476.0969

--------------enig7B8B12727BC7824617F7FDD1
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC72kZbLLh2VB0r2ARAjC/AJ9nZqs5O9EnIqmtp6pCzDJh8cMrKgCeK1Sl
KV7eWXTj/CjCH9NIP7LpHik=
=w56Z
-----END PGP SIGNATURE-----

--------------enig7B8B12727BC7824617F7FDD1--

--===============2058715606==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel

--===============2058715606==--