unofficial mirror of guile-devel@gnu.org 
* string port slow output on big string
@ 2005-02-14  0:22 Kevin Ryde
  2005-02-28  2:48 ` Marius Vollmer
  0 siblings, 1 reply; 5+ messages in thread
From: Kevin Ryde @ 2005-02-14  0:22 UTC


[-- Attachment #1: Type: text/plain, Size: 1159 bytes --]

I tried writing a biggish string to a string port, and it was very
slow.  Eg.

    (use-modules (ice-9 time))
    (let ((str (make-string 100000 #\x)))
      (call-with-output-string (lambda (port)
                                 (time (display str port))))
      #f)

gives, on my poor 333 MHz machine,

    clock utime stime cutime cstime gctime
     7.63  7.58  0.05   0.00   0.00   4.17

I struck this trying to use regexp-substitute/global on a file slurped
into memory.  It was 130k, which is a decent size, but it's well
within the realm of reason.


I think strports.c st_write ends up doing a realloc and copy every 80
bytes of the block it's writing.  It knows the size, but it lets
st_flush just grow by 80 bytes at a time.
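A rough cost model makes the quadratic behaviour concrete. The sketch below mimics the pre-patch st_write loop and the patched single-resize version, counting resizes and bytes copied. It assumes the buffer starts at SCM_WRITE_BLOCK bytes and that every resize copies everything written so far (realloc may or may not be able to extend in place); `write_cost`, `old_write_cost` and `new_write_cost` are hypothetical names, not Guile code.

```c
#include <stddef.h>

#define SCM_WRITE_BLOCK 80   /* growth step, as defined in strports.c */

/* Hypothetical cost model, not Guile code.  Assumes every resize copies
   all bytes written so far (realloc may have to move the block). */
typedef struct {
  unsigned long resizes;            /* number of buffer enlargements */
  unsigned long long bytes_copied;  /* bytes moved by those enlargements */
} write_cost;

/* Pre-patch st_write: fill the free space, and whenever it is exhausted
   "flush", which grows the buffer by only SCM_WRITE_BLOCK bytes. */
static write_cost old_write_cost (size_t cap, size_t used, size_t size)
{
  write_cost c = {0, 0};
  while (size > 0)
    {
      size_t space = cap - used;
      size_t write_len = size > space ? space : size;
      used += write_len;
      size -= write_len;
      if (write_len == space)
        {
          c.bytes_copied += used;   /* old contents move to the new block */
          cap += SCM_WRITE_BLOCK;
          c.resizes++;
        }
    }
  return c;
}

/* Patched st_write: if the write does not fit, resize once to hold all of
   it (plus SCM_WRITE_BLOCK spare), then do a single memcpy. */
static write_cost new_write_cost (size_t cap, size_t used, size_t size)
{
  write_cost c = {0, 0};
  if (size > cap - used)
    {
      c.bytes_copied = used;   /* only the existing contents are moved */
      c.resizes = 1;
    }
  return c;
}
```

For the 100000-character string from the timing example, `old_write_cost (80, 0, 100000)` reports 1250 resizes and 62550000 bytes copied (about 62.5 MB moved to produce 100 KB of output), while `new_write_cost (80, 0, 100000)` reports a single resize.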

The change below speeds it up from 7 seconds to 10 ms for me.

But I don't know if the read side bits of this change are right.  Is
it supposed to update read_pos, read_end and read_buf_size to be the
end of the string, or something?


(Of course, even nicer would be to avoid big reallocs altogether,
e.g. keep a list of chunks and only join them when a get-output-string
call wants the entire block.  But that can wait.)
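The list-of-chunks idea can be sketched as follows. This is a minimal illustration, not Guile code: `chunk`, `chunked_buf`, `cb_append`, `cb_join`, `cb_free` and the 4096-byte `MIN_CHUNK` are hypothetical. Appends never move bytes already written; the single copy happens only when the whole contents are asked for.

```c
#include <stdlib.h>
#include <string.h>

#define MIN_CHUNK 4096   /* hypothetical minimum chunk size */

/* One piece of output; data[] is a C99 flexible array member. */
typedef struct chunk {
  struct chunk *next;
  size_t len;   /* bytes used in data[] */
  size_t cap;   /* bytes allocated for data[] */
  char data[];
} chunk;

typedef struct {
  chunk *head, *tail;
  size_t total;   /* total bytes written */
} chunked_buf;

/* Append n bytes.  Existing chunks are never reallocated or copied. */
static int cb_append (chunked_buf *b, const char *src, size_t n)
{
  while (n > 0)
    {
      if (b->tail == NULL || b->tail->len == b->tail->cap)
        {
          /* New chunk big enough for the rest of this write, so a big
             string lands in one piece. */
          size_t cap = n > MIN_CHUNK ? n : MIN_CHUNK;
          chunk *c = malloc (sizeof *c + cap);
          if (c == NULL)
            return -1;
          c->next = NULL;
          c->len = 0;
          c->cap = cap;
          if (b->tail)
            b->tail->next = c;
          else
            b->head = c;
          b->tail = c;
        }
      size_t space = b->tail->cap - b->tail->len;
      size_t k = n < space ? n : space;
      memcpy (b->tail->data + b->tail->len, src, k);
      b->tail->len += k;
      b->total += k;
      src += k;
      n -= k;
    }
  return 0;
}

/* Join all chunks into one NUL-terminated string (caller frees). */
static char *cb_join (const chunked_buf *b)
{
  char *out = malloc (b->total + 1);
  if (out == NULL)
    return NULL;
  size_t pos = 0;
  for (const chunk *c = b->head; c != NULL; c = c->next)
    {
      memcpy (out + pos, c->data, c->len);
      pos += c->len;
    }
  out[pos] = '\0';
  return out;
}

static void cb_free (chunked_buf *b)
{
  chunk *c = b->head;
  while (c != NULL)
    {
      chunk *next = c->next;
      free (c);
      c = next;
    }
  b->head = b->tail = NULL;
  b->total = 0;
}
```

Total copying is linear in the output size: each byte is copied once on append and once more in `cb_join`.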



[-- Attachment #2: strports.c.write.diff --]
[-- Type: text/x-patch, Size: 906 bytes --]

--- strports.c.~1.105.~	2005-01-28 08:25:34.000000000 +1100
+++ strports.c	2005-02-14 11:20:05.000000000 +1100
@@ -142,18 +142,14 @@
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
   const char *input = (char *) data;
 
-  while (size > 0)
-    {
-      int space = pt->write_end - pt->write_pos;
-      int write_len = (size > space) ? space : size;
-      
-      memcpy ((char *) pt->write_pos, input, write_len);
-      pt->write_pos += write_len;
-      size -= write_len;
-      input += write_len;
-      if (write_len == space)
-	st_flush (port);
-    }
+  /* if not enough room for "size" then make that amount and an additional
+     SCM_WRITE_BLOCK */
+  if (size > pt->write_end - pt->write_pos)
+    st_resize_port (pt, pt->write_pos - pt->write_buf
+                        + size + SCM_WRITE_BLOCK);
+
+  memcpy ((char *) pt->write_pos, input, size);
+  pt->write_pos += size;
 }
 
 static void

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel

* Re: string port slow output on big string
  2005-02-14  0:22 string port slow output on big string Kevin Ryde
@ 2005-02-28  2:48 ` Marius Vollmer
       [not found]   ` <87r7ddrzgl.fsf@zip.com.au>
  0 siblings, 1 reply; 5+ messages in thread
From: Marius Vollmer @ 2005-02-28  2:48 UTC


Kevin Ryde <user42@zip.com.au> writes:

> But I don't know if the read side bits of this change are right.  Is
> it supposed to update read_pos, read_end and read_buf_size to be the
> end of the string, or something?

I don't know.  Could you try to figure this out yourself?

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


* Re: string port slow output on big string
       [not found]   ` <87r7ddrzgl.fsf@zip.com.au>
@ 2005-08-02 12:37     ` Alan Grover
  2005-08-03 22:43       ` Kevin Ryde
  2005-08-10 22:32     ` Marius Vollmer
  1 sibling, 1 reply; 5+ messages in thread
From: Alan Grover @ 2005-08-02 12:37 UTC



[-- Attachment #1.1: Type: text/plain, Size: 4258 bytes --]

Exponential growth set off a warning bell for me, but you probably have
other problems by the time it bites you (consider what you are doing to
the page-cache when you copy to the new block).

After about the 6th allocation, things converge such that 1/6 of the
total allocation is unused on average, i.e. there's a reserve on average
of 1/3 the string's size (1/3 of the string was just allocated).  E.g. a
1 MB string implies an estimated 300 KB reserve (actual: ~148 KB, which
is ~1/8).
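The 1/6 figure follows from a simple averaging argument, assuming the SCM_WRITE_BLOCK term is negligible and the string grows roughly uniformly between resizes. Write c for the buffer capacity and u for the unused bytes:

```latex
c_{\mathrm{new}} = \tfrac{3}{2}\,c_{\mathrm{old}}, \qquad
u_{\mathrm{after}} = c_{\mathrm{new}} - c_{\mathrm{old}} = \tfrac{1}{3}\,c_{\mathrm{new}}, \qquad
u_{\mathrm{before\ next}} = 0
\\
\bar{u} \approx \tfrac{1}{2}\bigl(u_{\mathrm{after}} + u_{\mathrm{before\ next}}\bigr) = \tfrac{1}{6}\,c_{\mathrm{new}}
```

So right after a resize a third of the buffer is free, just before the next resize none is, and on average about a sixth of the allocation sits unused.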

A 1 MB string takes 22 allocations/moves (~15,000 under the previous
code); 1 GB requires 39 allocations/moves (~15,000,000 under the
previous code).
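These counts can be checked with a small simulation that applies each growth rule from the patches until the buffer reaches a target size. Assumptions: the buffer starts empty (size 0), every resize counts as one allocation/move, and `count_resizes` is a hypothetical helper, not Guile code.

```c
#include <stddef.h>

#define SCM_WRITE_BLOCK 80   /* growth step, as defined in strports.c */

/* Count the resizes needed for a buffer to reach `target` bytes.
   geometric != 0: size * 3 / 2 + SCM_WRITE_BLOCK  (the new st_flush rule)
   geometric == 0: size + SCM_WRITE_BLOCK          (the old st_flush rule)
   If final_size is non-NULL it receives the buffer size at the end. */
static unsigned count_resizes (unsigned long long target, int geometric,
                               unsigned long long *final_size)
{
  unsigned long long size = 0;
  unsigned count = 0;
  while (size < target)
    {
      size = geometric ? size * 3 / 2 + SCM_WRITE_BLOCK
                       : size + SCM_WRITE_BLOCK;
      count++;
    }
  if (final_size != NULL)
    *final_size = size;
  return count;
}
```

Under these assumptions the 1.5x rule needs 22 resizes for 1 MB (1048576 bytes), ending at a 1196292-byte buffer with 147716 bytes (about 148 KB) to spare, and 39 resizes for 1 GB, which agrees with the figures above. The +80 rule needs 13108 resizes for 1 MB and 13421773 for 1 GB.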

Kevin Ryde wrote:
> I made the change below.  It leaves the code alone and just grows the
> buffer more each time, by a factor of 1.5, so copying time is no longer
> quadratic in the output size.
> 
> I think I'll do this in the 1.6 branch too.  Backtraces there have
> been slow to the point of being unusable for me in some parsing stuff I've
> been doing with say 50k or so strings in various parameters.  (The
> backtrace goes via an output string so that it can truncate big args
> like that.)
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Index: strports.c
> ===================================================================
> RCS file: /cvsroot/guile/guile/guile-core/libguile/strports.c,v
> retrieving revision 1.108
> diff -u -u -r1.108 strports.c
> --- strports.c	23 May 2005 19:57:21 -0000	1.108
> +++ strports.c	1 Aug 2005 23:47:06 -0000
> @@ -65,7 +65,30 @@
>     has been written to, but this is only updated after a flush.
>     read_pos and write_pos in principle should be equal, but this is only true
>     when rw_active is SCM_PORT_NEITHER.
> -*/
> +
> +   ENHANCE-ME - output blocks:
> +
> +   The current code keeps an output string as a single block.  That means
> +   when the size is increased the entire old contents must be copied.  It'd
> +   be more efficient to begin a new block when the old one is full, so
> +   there's no re-copying of previous data.
> +
> +   To make seeking efficient, keeping the pieces in a vector might be best,
> +   though appending is probably the most common operation.  The size of each
> +   block could be progressively increased, so the bigger the string the
> +   bigger the blocks.
> +
> +   When `get-output-string' is called the blocks have to be coalesced into a
> +   string, the result could be kept as a single big block.  If blocks were
> +   strings then `get-output-string' could notice when there's just one and
> +   return that with a copy-on-write (though repeated calls to
> +   `get-output-string' are probably unlikely).
> +
> +   Another possibility would be to extend the port mechanism to let SCM
> +   strings come through directly from `display' and friends.  That way if a
> +   big string is written it can be kept as a copy-on-write, saving time
> +   copying and maybe saving some space.  */
> +
>  
>  scm_t_bits scm_tc16_strport;
>  
> @@ -117,7 +140,14 @@
>  #define SCM_WRITE_BLOCK 80
>  
>  /* ensure that write_pos < write_end by enlarging the buffer when
> -   necessary.  update read_buf to account for written chars.  */
> +   necessary.  update read_buf to account for written chars.
> +
> +   The buffer is enlarged by 1.5 times, plus SCM_WRITE_BLOCK.  Adding just a
> +   fixed amount is no good, because there's a block copy for each increment,
> +   and that copying would take quadratic time.  In the past it was found to
> +   be very slow just adding 80 bytes each time (eg. about 10 seconds for
> +   writing a 100kbyte string).  */
> +
>  static void
>  st_flush (SCM port)
>  {
> @@ -125,7 +155,7 @@
>  
>    if (pt->write_pos == pt->write_end)
>      {
> -      st_resize_port (pt, pt->write_buf_size + SCM_WRITE_BLOCK);
> +      st_resize_port (pt, pt->write_buf_size * 3 / 2 + SCM_WRITE_BLOCK);
>      }
>    pt->read_pos = pt->write_pos;
>    if (pt->read_pos > pt->read_end)
> 
> 
> ------------------------------------------------------------------------
> 
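The ENHANCE-ME note's "vector of blocks" with progressively bigger blocks also keeps seeking cheap if each block doubles in size, because a byte position then maps to its block arithmetically. A sketch, assuming doubling sizes starting from a hypothetical `BASE_BLOCK` of 80 bytes; `block_start` and `locate` are illustrative names, not part of strports.c.

```c
#include <stddef.h>

#define BASE_BLOCK 80   /* hypothetical size of block 0, in bytes */

/* Hypothetical "vector of blocks" layout, not Guile code: block i holds
   BASE_BLOCK << i bytes, so block i starts at BASE_BLOCK * (2^i - 1). */
static size_t block_start (unsigned i)
{
  return BASE_BLOCK * (((size_t) 1 << i) - 1);
}

/* Map a byte position to (block, offset) without walking earlier blocks. */
static void locate (size_t pos, unsigned *block, size_t *offset)
{
  /* Positions in block i satisfy 2^i <= pos / BASE_BLOCK + 1 < 2^(i+1),
     so i is the index of the highest set bit of q. */
  size_t q = pos / BASE_BLOCK + 1;
  unsigned i = 0;
  while (q >>= 1)
    i++;
  *block = i;
  *offset = pos - block_start (i);
}
```

For example, position 239 is the last byte of block 1 (offset 159 of its 160 bytes) and position 240 is the first byte of block 2. Appending only ever adds a block at the end, so nothing written earlier is recopied; `get-output-string` would still copy every block once to build its result.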

-- 
Alan Grover
awgrover@mail.msen.com
+1.734.476.0969

* Re: string port slow output on big string
  2005-08-02 12:37     ` Alan Grover
@ 2005-08-03 22:43       ` Kevin Ryde
  0 siblings, 0 replies; 5+ messages in thread
From: Kevin Ryde @ 2005-08-03 22:43 UTC
  Cc: guile-devel

Alan Grover <awgrover@mail.msen.com> writes:
>
> A 1mb string takes 22 allocations/moves (~15000 under the previous
> code),

Yep, 15000 being so slow that you think it's gone into an inf loop
:-).


* Re: string port slow output on big string
       [not found]   ` <87r7ddrzgl.fsf@zip.com.au>
  2005-08-02 12:37     ` Alan Grover
@ 2005-08-10 22:32     ` Marius Vollmer
  1 sibling, 0 replies; 5+ messages in thread
From: Marius Vollmer @ 2005-08-10 22:32 UTC


Kevin Ryde <user42@zip.com.au> writes:

> I think I'll do this in the 1.6 branch too.

Yes, sounds good.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405

