unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: "Neil Jerram" <neiljerram@googlemail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-devel@gnu.org
Subject: Re: [PATCH] Add a `read' method for ports
Date: Tue, 15 Jul 2008 23:08:01 +0100	[thread overview]
Message-ID: <49dd78620807151507t2a0d90cna48f64f86a38fe2b@mail.gmail.com> (raw)
In-Reply-To: <87tzfi8smz.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3570 bytes --]

2008/6/24 Ludovic Courtès <ludo@gnu.org>:
> Hi Neil,
>
> Sorry for the delay, I was off-line last week.

No problem!

> "Neil Jerram" <neiljerram@googlemail.com> writes:
>
>> Agreed.  Yesterday I was worried about possible confusion from
>> bypassing a port's read buffer at a time when the read buffer has some
>> data in it.  But in fact this is not a problem, because
>> scm_fill_input() should be (and in practice is) only called when the
>> port's read buffer is empty.
>
> Right.  Putting an assertion in there to make sure can't hurt.

OK, done.

> These arguments make a lot of sense in 1.8.  In the longer run, though,
> I'd prefer to see `fill_input' superseded by a `read' method akin to
> what I proposed.  What do you think?

To be honest, I don't yet see a clear need for it.  If there is no
drawback to what we have now, and there would be a compatibility cost
to introducing a new read method, then why do that?

>> Based on all this, I'd like to update my suggestion so that
>>
>> - scm_fill_input_buf always uses the caller's buffer
>>
>> - the remaining scm_fill_input_buf logic is inlined into scm_c_read
>> (since that is the only place that uses scm_fill_input_buf, and
>> inlining it permits some further simplifications).
>
> OK.  The issue is that if the caller-supplied buffer is smaller than the
> port's buffer, we end up reading less data than we'd otherwise do.  I
> think this scenario should be addressed (by comparing `read_buf_size'
> and `size'), as was the case in my proposal.

True in theory, but I feel very confident that this will never matter
in practice, and hence that we don't need to have code for this.

Confidence is based on experience from my day job, which is in network
protocols - where buffered reads are a very common operation.  On the
other hand, it is possible I've missed some other application where
different patterns/rules apply, so please let me know if you have
specific counterexamples in mind.

>> +  /* Now we will call scm_fill_input repeatedly until we have read the
>> +     requested number of bytes.  (Note that a single scm_fill_input
>> +     call does not guarantee to fill the whole of the port's read
>> +     buffer.)  For these calls, since we already have a buffer here to
>> +     read into, we bypass the port's own read buffer (if it has one),
>> +     by saving it off and modifying the port structure to point to our
>> +     own buffer. */
>> +  saved_buf = pt->read_buf;
>> +  saved_buf_size = pt->read_buf_size;
>> +  pt->read_pos = pt->read_buf = pt->read_end = buffer;
>> +  pt->read_buf_size = size;
>
> `fill_input' methods can raise an exception, as is the case with soft
> parts.  Thus, this code should be protected by a `dynwind' that
> restores the port's buffer and size.

Good point.  I've added this.

>> +      remaining -= scm_c_read (port_or_fd, base + off, remaining);
>> +      if (remaining % sz != 0)
>> +        SCM_MISC_ERROR ("unexpected EOF", SCM_EOL);
>> +      ans -= remaining / sz;
>
> Likewise, `scm_c_read ()' might raise an exception, so we may need a
> `dynwind' here (my original patch for `uniform-vector-read!' did that).

This is covered by the dynwind inside scm_c_read, isn't it?

Updated patch is attached.  As well as taking a look, could you check
that it really improves performance in the cases that motivated this?
I tried running the read.bm benchmarks, but in those cases it didn't
seem to make much difference; is that expected?

Regards,
        Neil

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: unbuffered-read.diff --]
[-- Type: text/x-patch; name=unbuffered-read.diff, Size: 5735 bytes --]

diff --git a/libguile/ChangeLog b/libguile/ChangeLog
index 5c30574..d1bf0d9 100644
--- a/libguile/ChangeLog
+++ b/libguile/ChangeLog
@@ -1,3 +1,15 @@
+2008-07-15  Neil Jerram  <neil@ossau.uklinux.net>
+
+	* srfi-4.c (scm_uniform_vector_read_x): Use scm_c_read, instead of
+	equivalent code here.
+
+	* ports.c (scm_fill_input): Add assertion that read buffer is
+	empty when called.
+	(port_and_swap_buffer, swap_buffer): New, for...
+	(scm_c_read): Use caller's buffer for reading, to avoid making N
+	1-byte low-level read calls, in the case where the port is
+	unbuffered (or has a very small buffer).
+
 2008-07-05  Ludovic Courtès  <ludo@gnu.org>
 
 	* strings.c (scm_c_symbol_length): New function.
diff --git a/libguile/ports.c b/libguile/ports.c
index b97c826..9563001 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -28,6 +28,7 @@
 #include <stdio.h>
 #include <errno.h>
 #include <fcntl.h>  /* for chsize on mingw */
+#include <assert.h>
 
 #include "libguile/_scm.h"
 #include "libguile/async.h"
@@ -974,6 +975,8 @@ scm_fill_input (SCM port)
 {
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
 
+  assert (pt->read_pos == pt->read_end);
+
   if (pt->read_buf == pt->putback_buf)
     {
       /* finished reading put-back chars.  */
@@ -1036,12 +1039,35 @@ scm_lfwrite (const char *ptr, size_t size, SCM port)
  *
  * Warning: Doesn't update port line and column counts!  */
 
+struct port_and_swap_buffer {
+  scm_t_port *pt;
+  unsigned char *buffer;
+  size_t size;
+};
+
+static void
+swap_buffer (void *data)
+{
+  struct port_and_swap_buffer *psb = (struct port_and_swap_buffer *) data;
+  unsigned char *old_buf = psb->pt->read_buf;
+  size_t old_size = psb->pt->read_buf_size;
+
+  /* Make the port use (buffer, size) from the struct. */
+  psb->pt->read_pos = psb->pt->read_buf = psb->pt->read_end = psb->buffer;
+  psb->pt->read_buf_size = psb->size;
+
+  /* Save the port's old (buffer, size) in the struct. */
+  psb->buffer = old_buf;
+  psb->size = old_size;
+}
+
 size_t
 scm_c_read (SCM port, void *buffer, size_t size)
 #define FUNC_NAME "scm_c_read"
 {
   scm_t_port *pt;
   size_t n_read = 0, n_available;
+  struct port_and_swap_buffer psb;
 
   SCM_VALIDATE_OPINPORT (1, port);
 
@@ -1052,35 +1078,48 @@ scm_c_read (SCM port, void *buffer, size_t size)
   if (pt->rw_random)
     pt->rw_active = SCM_PORT_READ;
 
-  if (SCM_READ_BUFFER_EMPTY_P (pt))
-    {
-      if (scm_fill_input (port) == EOF)
-	return 0;
-    }
-  
-  n_available = pt->read_end - pt->read_pos;
-  
-  while (n_available < size)
+  /* Take bytes first from the port's read buffer. */
+  if (pt->read_pos < pt->read_end)
     {
+      n_available = min (size, pt->read_end - pt->read_pos);
       memcpy (buffer, pt->read_pos, n_available);
       buffer = (char *) buffer + n_available;
       pt->read_pos += n_available;
       n_read += n_available;
-      
-      if (SCM_READ_BUFFER_EMPTY_P (pt))
-	{
-	  if (scm_fill_input (port) == EOF)
-	    return n_read;
-	}
-
       size -= n_available;
-      n_available = pt->read_end - pt->read_pos;
     }
 
-  memcpy (buffer, pt->read_pos, size);
-  pt->read_pos += size;
+  /* Now we will call scm_fill_input repeatedly until we have read the
+     requested number of bytes.  (Note that a single scm_fill_input
+     call does not guarantee to fill the whole of the port's read
+     buffer.)  For these calls, since we already have a buffer here to
+     read into, we bypass the port's own read buffer (if it has one),
+     by saving it off and modifying the port structure to point to our
+     own buffer.
+
+     We need to make sure that the port's normal buffer is reinstated
+     in case one of the scm_fill_input () calls throws an exception;
+     we use the scm_dynwind_* API to achieve that. */
+  psb.pt = pt;
+  psb.buffer = buffer;
+  psb.size = size;
+  scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE);
+  scm_dynwind_rewind_handler (swap_buffer, &psb, SCM_F_WIND_EXPLICITLY);
+  scm_dynwind_unwind_handler (swap_buffer, &psb, SCM_F_WIND_EXPLICITLY);
+
+  /* Call scm_fill_input until we have all the bytes that we need, or
+     we hit EOF. */
+  while (pt->read_buf_size && (scm_fill_input (port) != EOF))
+    {
+      pt->read_buf_size -= (pt->read_end - pt->read_pos);
+      pt->read_pos = pt->read_buf = pt->read_end;
+    }
+  n_read += pt->read_buf - (unsigned char *) buffer;
+
+  /* Reinstate the port's normal buffer. */
+  scm_dynwind_end ();
 
-  return n_read + size;
+  return n_read;
 }
 #undef FUNC_NAME
 
diff --git a/libguile/srfi-4.c b/libguile/srfi-4.c
index 7d22f8b..a01f86e 100644
--- a/libguile/srfi-4.c
+++ b/libguile/srfi-4.c
@@ -886,38 +886,11 @@ SCM_DEFINE (scm_uniform_vector_read_x, "uniform-vector-read!", 1, 3, 0,
 
   if (SCM_NIMP (port_or_fd))
     {
-      scm_t_port *pt = SCM_PTAB_ENTRY (port_or_fd);
-
-      if (pt->rw_active == SCM_PORT_WRITE)
-	scm_flush (port_or_fd);
-
       ans = cend - cstart;
-      while (remaining > 0)
-	{
-	  if (pt->read_pos < pt->read_end)
-	    {
-	      size_t to_copy = min (pt->read_end - pt->read_pos,
-				    remaining);
-	      
-	      memcpy (base + off, pt->read_pos, to_copy);
-	      pt->read_pos += to_copy;
-	      remaining -= to_copy;
-	      off += to_copy;
-	    }
-	  else
-	    {
-	      if (scm_fill_input (port_or_fd) == EOF)
-		{
-		  if (remaining % sz != 0)
-		    SCM_MISC_ERROR ("unexpected EOF", SCM_EOL);
-		  ans -= remaining / sz;
-		  break;
-		}
-	    }
-	}
-      
-      if (pt->rw_random)
-	pt->rw_active = SCM_PORT_READ;
+      remaining -= scm_c_read (port_or_fd, base + off, remaining);
+      if (remaining % sz != 0)
+        SCM_MISC_ERROR ("unexpected EOF", SCM_EOL);
+      ans -= remaining / sz;
     }
   else /* file descriptor.  */
     {

  parent reply	other threads:[~2008-07-15 22:08 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-01 18:58 [PATCH] Add a `read' method for ports Ludovic Courtès
2008-06-08 21:01 ` Ludovic Courtès
2008-06-09 17:01   ` Neil Jerram
2008-06-09 20:59     ` Ludovic Courtès
2008-06-11 21:38       ` Neil Jerram
2008-06-12  0:19         ` Neil Jerram
2008-06-12  7:58         ` Ludovic Courtès
2008-06-13  0:18           ` Neil Jerram
2008-06-24 20:50             ` Ludovic Courtès
2008-07-15 19:24               ` Ludovic Courtès
2008-07-15 20:05                 ` Neil Jerram
2008-07-15 22:08               ` Neil Jerram [this message]
2008-07-16 21:41                 ` Ludovic Courtès
2008-09-12 20:09                   ` Ludovic Courtès
2008-09-14 23:08                   ` Neil Jerram
2008-09-15  1:01                     ` Neil Jerram
2008-09-15  7:47                       ` Ludovic Courtès
2008-09-15  7:44                     ` Ludovic Courtès
2008-09-15 19:18                       ` Neil Jerram
2008-09-15 19:20                         ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49dd78620807151507t2a0d90cna48f64f86a38fe2b@mail.gmail.com \
    --to=neiljerram@googlemail.com \
    --cc=guile-devel@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).