unofficial mirror of bug-guile@gnu.org 
From: Andy Wingo <wingo@igalia.com>
To: Mark H Weaver <mhw@netris.org>
Cc: "Ludovic Courtès" <ludo@gnu.org>, 30066@debbugs.gnu.org
Subject: bug#30066: 'get-bytevector-some' returns only 1 byte from unbuffered ports
Date: Fri, 12 Jan 2018 10:01:11 +0100
Message-ID: <87373bpi20.fsf@igalia.com>
In-Reply-To: <87a7xkxdph.fsf@netris.org> (Mark H. Weaver's message of "Thu, 11 Jan 2018 16:55:38 -0500")

On Thu 11 Jan 2018 22:55, Mark H Weaver <mhw@netris.org> writes:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>> [...]
>>
>>>> +  if (SCM_UNBUFFEREDP (port) && (avail < max_buffer_size))
>>>> +    {
>>>> +      /* PORT is unbuffered.  Read as much as possible from PORT.  */
>>>> +      size_t read;
>>>> +
>>>> +      bv = scm_c_make_bytevector (max_buffer_size);
>>>> +      scm_port_buffer_take (buf, (scm_t_uint8 *) SCM_BYTEVECTOR_CONTENTS (bv),
>>>> +                            avail, cur, avail);
>>>> +
>>>> +      read = scm_i_read_bytes (port, bv, avail,
>>>> +                               SCM_BYTEVECTOR_LENGTH (bv) - avail);
>>>
>>> Here's the R6RS specification for 'get-bytevector-some':
>>>
>>>   "Reads from BINARY-INPUT-PORT, blocking as necessary, until bytes are
>>>    available from BINARY-INPUT-PORT or until an end of file is reached.
>>>    If bytes become available, 'get-bytevector-some' returns a freshly
>>>    allocated bytevector containing the initial available bytes (at least
>>>    one), and it updates BINARY-INPUT-PORT to point just past these
>>>    bytes.  If no input bytes are seen before an end of file is reached,
>>>    the end-of-file object is returned."
>>>
>>> By my reading of this, we should block only if necessary to ensure that
>>> we return at least one byte (or EOF).  In other words, if we can return
>>> at least one byte (or EOF), then we must not block, which means that we
>>> must not initiate another 'read'.
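The following minimal POSIX sketch (plain C, not Guile's port API) shows why an extra read matters. Once one read(2) has drained the bytes a peer wrote into a pipe, another read would block until the peer writes again. `would_block_on_read` and `one_read_then_check` are hypothetical helpers for illustration, and the pipe stands in for the socket in the use case discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if read(2) on FD would block right now,
   i.e. no bytes are pending and there is no EOF or hangup to report.
   A 0 ms timeout makes poll(2) check without ever waiting. */
static int would_block_on_read(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll(&pfd, 1, 0) == 0;
}

/* Hypothetical demo: a "peer" writes MSG (must be non-empty) into a pipe
   and keeps its end open.  We perform ONE read, then store in
   *SECOND_READ_WOULD_BLOCK whether a further read would block.
   Returns the byte count of the single read, or -1 on error. */
static ssize_t one_read_then_check(const char *msg, int *second_read_would_block)
{
    int fds[2];
    char buf[64];
    size_t len = strlen(msg);
    ssize_t got = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) == (ssize_t) len) {
        got = read(fds[0], buf, sizeof buf);  /* returns what is available */
        *second_read_would_block = would_block_on_read(fds[0]);
    }
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

With `"abc"`, the single read returns 3. The check then reports that a second read would block, because the write end is still open and nothing more is pending.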
>>
>> Indeed.  So perhaps the condition above should be changed to:
>>
>>   if (SCM_UNBUFFEREDP (port) && (avail == 0))
>>
>> ?
>
> That won't work, because the earlier call to 'scm_fill_input' will have
> already initiated a 'read' if the buffer was empty.  The read buffer
> size will determine the maximum number of bytes read, which will be 1 in
> the case of an unbuffered port.  So, at the point of this condition,
> 'avail == 0' will occur only if EOF was encountered, in which case you
> must return EOF without attempting another 'read'.
>
> In order to avoid unnecessary blocking, there must be only one 'read'
> call, and it must be initiated only if the buffer was already empty.
>
> So, in order to accomplish your goal here, I don't see how you can use
> 'scm_fill_input', unless you temporarily increase the size of the read
> buffer beforehand.
>
> Instead, I think you need to first check if the read buffer contains any
> bytes.  If so, empty the buffer and return them.  If the buffer is
> empty, the next thing to check is 'scm_port_buffer_has_eof_p'.  If it's
> set, then clear that flag and return EOF.
>
> Otherwise, if the buffer is empty and 'scm_port_buffer_has_eof_p' is
> false, then you must do what 'scm_fill_input' would have done, except
> using your larger buffer instead of the port's internal read buffer.  In
> particular, you must first switch the port to "reading" mode, flushing
> the write buffer if 'rw_random' is set.
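Here is a minimal sketch of the order of checks described above. It uses a hypothetical toy port struct rather than Guile's real `scm_t_port` and port-buffer API. `toy_port` and `toy_get_bytevector_some` are invented names, and the switch to reading mode (with the `rw_random` write-buffer flush) appears only as a comment. The property it illustrates is that read(2) is called at most once, and only when the buffer is empty and no EOF is pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, simplified port; NOT Guile's scm_t_port. */
struct toy_port {
    int fd;
    unsigned char rbuf[1];  /* 1-byte read buffer, as for an unbuffered port */
    size_t rcur, rend;      /* buffered bytes are rbuf[rcur .. rend) */
    int has_eof;            /* stands in for scm_port_buffer_has_eof_p */
};

/* Returns the number of bytes stored in a fresh malloc'd *OUT,
   0 for end of file (*OUT is NULL), or -1 on error. */
static ssize_t toy_get_bytevector_some(struct toy_port *p, size_t max,
                                       unsigned char **out)
{
    size_t avail;
    ssize_t n;

    assert(max > 0 && p->rcur <= p->rend);
    avail = p->rend - p->rcur;
    *out = NULL;

    /* 1. Bytes already buffered: return them without reading. */
    if (avail > 0) {
        if ((*out = malloc(avail)) == NULL)
            return -1;
        memcpy(*out, p->rbuf + p->rcur, avail);
        p->rcur = p->rend = 0;
        return (ssize_t) avail;
    }

    /* 2. Buffer empty but EOF pending: clear the flag and report EOF. */
    if (p->has_eof) {
        p->has_eof = 0;
        return 0;
    }

    /* 3. Buffer empty, no EOF: do what the fill step would do, but into
       the caller's larger buffer.  (A real port would first switch to
       reading mode, flushing the write buffer if rw_random is set.)
       Exactly one read(2), retried only if interrupted by a signal. */
    if ((*out = malloc(max)) == NULL)
        return -1;
    do
        n = read(p->fd, *out, max);
    while (n < 0 && errno == EINTR);

    if (n <= 0) {            /* 0 = EOF, -1 = error: no bytevector */
        free(*out);
        *out = NULL;
    }
    return n;
}
```

If the EOF check in step 2 were skipped and a read attempted instead, the call would hang on a pipe whose write end is still open. That hang is the unnecessary blocking the R6RS text rules out.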
>
> Also, I'd prefer to move this code to ports.c in order to avoid adding
> more internal declarations to ports.h and changing more functions from
> 'static' to global functions.

I agree with Mark here -- thanks for the close review.

>>> Out of curiosity, is there a reason why you're using an unbuffered port
>>> in your use case?
>>
>> It’s to implement redirect à la socat:
>>
>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=17af5d51de7c40756a4a39d336f81681de2ba447
>
> Why is an unbuffered port being used here?  Can we change it to a
> buffered port?

This was also a question I had!  If you make it a buffered port with a
4096-byte buffer (for example), then get-bytevector-some works exactly
as you want it to, no?
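This plain POSIX sketch (not Guile's API) shows why the buffer size matters. One refill of a port's read buffer is a single read(2) whose length is the buffer size. With a 1-byte buffer, that read yields 1 byte even when more data is waiting. A 4096-byte buffer picks up everything already in the pipe. `bytes_from_one_refill` is a hypothetical helper. In Guile itself, the port's buffer would be configured with `setvbuf`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write LEN bytes (1 <= LEN <= 512, small enough
   that the write does not block on common systems) into a fresh pipe,
   then do ONE read(2) of at most BUFSIZE bytes, like one refill of a
   port's read buffer.  Returns that read's byte count, or -1 on error. */
static ssize_t bytes_from_one_refill(size_t len, size_t bufsize)
{
    int fds[2];
    ssize_t got = -1;
    char *data = calloc(len, 1);
    char *buf = malloc(bufsize);

    if (data != NULL && buf != NULL && pipe(fds) == 0) {
        if (write(fds[1], data, len) == (ssize_t) len)
            got = read(fds[0], buf, bufsize);
        close(fds[0]);
        close(fds[1]);
    }
    free(data);
    free(buf);
    return got;
}
```

With 100 bytes waiting, a 1-byte buffer yields 1 byte per refill, while a 4096-byte buffer yields all 100 in a single read.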

Andy




