* bug#10627: char-ready? is broken for multibyte encodings
From: Mark H Weaver @ 2012-01-28 10:21 UTC
To: 10627

The R5RS specifies that if 'char-ready?' returns #t, then the next
'read-char' operation is guaranteed not to hang.  This is not currently
the case for ports using a multibyte encoding.

'char-ready?' currently returns #t whenever at least one _byte_ is
available.  This is not correct in general.  It should return #t only if
there is a complete _character_ available.

    Mark

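The byte-versus-character distinction in the report can be illustrated with a minimal Python sketch. It assumes UTF-8 and uses Python's incremental decoder as a stand-in for Guile's port decoding; the helper name `complete_chars` is hypothetical and does not correspond to anything in Guile.

```python
import codecs


def complete_chars(available: bytes, encoding: str = "utf-8") -> str:
    """Return only the characters that are fully decodable from the bytes seen so far."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # final=False keeps a trailing partial sequence pending instead of raising.
    return decoder.decode(available, final=False)


encoded = "λ".encode("utf-8")      # U+03BB is two bytes in UTF-8: 0xCE 0xBB
first_byte_only = encoded[:1]      # pretend only the first byte has arrived

# What a byte-level readiness check reports.
byte_ready = len(first_byte_only) > 0
# What the R5RS guarantee needs: a whole character is available.
char_ready = len(complete_chars(first_byte_only)) > 0
```

With only the lead byte available, a byte-level check says "ready" while no complete character can be produced, which is the situation in which a following 'read-char' could hang.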
* bug#10627: char-ready? is broken for multibyte encodings
From: Andy Wingo @ 2013-02-24 19:11 UTC
To: Mark H Weaver; +Cc: 10627

On Sat 28 Jan 2012 11:21, Mark H Weaver <mhw@netris.org> writes:

> The R5RS specifies that if 'char-ready?' returns #t, then the next
> 'read-char' operation is guaranteed not to hang.  This is not currently
> the case for ports using a multibyte encoding.
>
> 'char-ready?' currently returns #t whenever at least one _byte_ is
> available.  This is not correct in general.  It should return #t only if
> there is a complete _character_ available.

This procedure is omitted in the R6RS because it is not a good
interface.  Besides its semantic difficulties, can you think of a sane
implementation for multibyte characters?

I suggest we document that this procedure only works correctly in
encodings with 1-byte characters and recommend that people use u8-ready?
instead.

Andy
--
http://wingolog.org/

* bug#10627: char-ready? is broken for multibyte encodings
From: Mark H Weaver @ 2013-02-24 20:14 UTC
To: Andy Wingo; +Cc: 10627

Hi Andy,

Andy Wingo <wingo@pobox.com> writes:

> On Sat 28 Jan 2012 11:21, Mark H Weaver <mhw@netris.org> writes:
>
>> The R5RS specifies that if 'char-ready?' returns #t, then the next
>> 'read-char' operation is guaranteed not to hang.  This is not currently
>> the case for ports using a multibyte encoding.
>>
>> 'char-ready?' currently returns #t whenever at least one _byte_ is
>> available.  This is not correct in general.  It should return #t only if
>> there is a complete _character_ available.
>
> This procedure is omitted in the R6RS because it is not a good
> interface.  Besides its semantic difficulties, can you think of a sane
> implementation for multibyte characters?

Maybe I'm missing something, but I don't see any semantic problem here,
and it seems straightforward to implement.  'char-ready?' should simply
read bytes until either a complete character is available, or no more
bytes are ready.  In either case, all the bytes should then be 'unget'
before returning.  What's the problem?

The only reason I haven't yet fixed this is because it will require some
refactoring in ports.c.  I guess the most straightforward approach is to
generalize 'get_codepoint', 'get_utf8_codepoint', and
'get_iconv_codepoint' to support a non-blocking mode of operation.

What do you think?

    Regards,
      Mark

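The read-then-unget idea proposed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Guile's ports.c: `SketchPort`, `read_byte_nonblocking` and `unget_byte` are hypothetical names, the port is modelled as an in-memory queue of bytes that can be read without blocking, and treating an invalid byte sequence as "ready" (because a read would report an error rather than hang) is a choice made for this sketch.

```python
import codecs
from collections import deque


class SketchPort:
    """Hypothetical port: 'pending' holds the bytes readable without blocking."""

    def __init__(self, pending: bytes = b"", at_eof: bool = False):
        self.pending = deque(pending)
        self.at_eof = at_eof

    def read_byte_nonblocking(self):
        # Return the next byte as an int, or None when nothing is ready right now.
        return self.pending.popleft() if self.pending else None

    def unget_byte(self, byte: int) -> None:
        self.pending.appendleft(byte)


def char_ready(port: SketchPort, encoding: str = "utf-8") -> bool:
    decoder = codecs.getincrementaldecoder(encoding)()
    taken = []
    ready = False
    while True:
        byte = port.read_byte_nonblocking()
        if byte is None:
            # No more bytes are ready; at EOF a read would not hang either.
            ready = port.at_eof
            break
        taken.append(byte)
        try:
            if decoder.decode(bytes([byte])):
                ready = True          # a complete character was decoded
                break
        except UnicodeDecodeError:
            ready = True              # assumption: an error is also "not hanging"
            break
    # Put every byte back, last one first, so the port looks untouched.
    for byte in reversed(taken):
        port.unget_byte(byte)
    return ready
```

For example, `char_ready(SketchPort("λ".encode("utf-8")[:1]))` is false because only the lead byte is pending, and the port's pending bytes are unchanged afterwards.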
* bug#10627: char-ready? is broken for multibyte encodings
From: Andy Wingo @ 2013-02-24 22:15 UTC
To: Mark H Weaver; +Cc: 10627

Hi :)

On Sun 24 Feb 2013 21:14, Mark H Weaver <mhw@netris.org> writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>> On Sat 28 Jan 2012 11:21, Mark H Weaver <mhw@netris.org> writes:
>>
>>> The R5RS specifies that if 'char-ready?' returns #t, then the next
>>> 'read-char' operation is guaranteed not to hang.  This is not currently
>>> the case for ports using a multibyte encoding.
>>>
>>> 'char-ready?' currently returns #t whenever at least one _byte_ is
>>> available.  This is not correct in general.  It should return #t only if
>>> there is a complete _character_ available.
>>
>> This procedure is omitted in the R6RS because it is not a good
>> interface.  Besides its semantic difficulties, can you think of a sane
>> implementation for multibyte characters?
>
> Maybe I'm missing something, but I don't see any semantic problem here,
> and it seems straightforward to implement.  'char-ready?' should simply
> read bytes until either a complete character is available, or no more
> bytes are ready.  In either case, all the bytes should then be 'unget'
> before returning.  What's the problem?

The problem is that char-ready? should not read anything.

If you want to peek, use peek-char.

Note that if the stream is at EOF, char-ready? should return #t.

Andy
--
http://wingolog.org/

* bug#10627: char-ready? is broken for multibyte encodings
From: Mark H Weaver @ 2013-02-25 0:06 UTC
To: Andy Wingo; +Cc: 10627

Andy Wingo <wingo@pobox.com> writes:

> On Sun 24 Feb 2013 21:14, Mark H Weaver <mhw@netris.org> writes:
>
>> Maybe I'm missing something, but I don't see any semantic problem here,
>> and it seems straightforward to implement.  'char-ready?' should simply
>> read bytes until either a complete character is available, or no more
>> bytes are ready.  In either case, all the bytes should then be 'unget'
>> before returning.  What's the problem?
>
> The problem is that char-ready? should not read anything.

Okay, but if all bytes read are later *unread*, and the reads never
block, then why does it matter?  The reads in my proposed implementation
are just an internal implementation detail, and it seems to me that the
user cannot tell the difference, as long as he does not peek underneath
the Scheme port abstraction.

If you prefer, perhaps a nicer way to think about it is that
'char-ready?' looks ahead in the putback buffer and/or the read buffer
(refilling it in a non-blocking mode if needed), and returns #t iff a
complete character is present in the buffer(s), or EOF is reached.

However, it seems to me that implementing this in terms of read-byte and
unget-byte is simpler, because it avoids duplication of the logic
regarding putback buffers and refilling of buffers.  Maybe there's some
reason why this is a bad idea, but I haven't heard one.

I agree that 'char-ready?' is an antiquated interface, but it is
nonetheless part of the R5RS (and Guile since approximately forever),
and it is the only way to do a non-blocking read in portable R5RS.  It
seems to me that we ought to try to implement it as well as we can, no?

> If you want to peek, use peek-char.

Okay, but that's a totally different tool with a different use case.  It
cannot be used to do non-blocking reads.

> Note that if the stream is at EOF, char-ready? should return #t.

Agreed.

More thoughts?

    Thanks,
      Mark

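The alternative view in this message, where 'char-ready?' only looks ahead in the putback and read buffers, might look roughly like the Python sketch below. It assumes UTF-8 only, omits the non-blocking refill step, and uses hypothetical names (`utf8_sequence_length`, `char_ready_in_buffers`); it checks the length implied by the lead byte and does not validate continuation bytes, so it can answer "not ready" for some malformed input on which a read would in fact fail immediately.

```python
def utf8_sequence_length(lead: int) -> int:
    """Number of bytes a UTF-8 sequence needs, judged from its lead byte alone."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # Invalid lead byte: a decoder would report an error after this one byte.
    return 1


def char_ready_in_buffers(putback: bytes, read_buffer: bytes, at_eof: bool) -> bool:
    """Look ahead without consuming: putback bytes come first, then the read buffer."""
    if at_eof:
        # At EOF a read cannot hang, whatever is (or is not) buffered.
        return True
    buffered = putback + read_buffer
    if not buffered:
        return False
    return len(buffered) >= utf8_sequence_length(buffered[0])
```

Because nothing is consumed, there is nothing to unget afterwards; the cost is that the buffer-inspection logic has to know about both buffers, which is the duplication the message argues against.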
* bug#10627: char-ready? is broken for multibyte encodings
From: Daniel Hartwig @ 2013-02-25 1:23 UTC
To: Mark H Weaver; +Cc: 10627

On 25 February 2013 08:06, Mark H Weaver <mhw@netris.org> wrote:
> Andy Wingo <wingo@pobox.com> writes:
>
>> On Sun 24 Feb 2013 21:14, Mark H Weaver <mhw@netris.org> writes:
>>
>>> Maybe I'm missing something, but I don't see any semantic problem here,
>>> and it seems straightforward to implement.  'char-ready?' should simply
>>> read bytes until either a complete character is available, or no more
>>> bytes are ready.  In either case, all the bytes should then be 'unget'
>>> before returning.  What's the problem?
>>
>> The problem is that char-ready? should not read anything.
>
> Okay, but if all bytes read are later *unread*, and the reads never
> block, then why does it matter?

Taking care to still use sf_input_waiting for soft ports?  Reading bytes
from a soft port could have side effects (i.e. logging action or
similar).

* bug#10627: char-ready? is broken for multibyte encodings
From: Andy Wingo @ 2013-02-25 8:55 UTC
To: Mark H Weaver; +Cc: 10627

Hi Mark,

Are you proposing that `char-ready?' do a nonblocking read if
the buffer is empty?  That could work.

On Mon 25 Feb 2013 01:06, Mark H Weaver <mhw@netris.org> writes:

> However, it seems to me that implementing this in terms of read-byte and
> unget-byte is simpler, because it avoids duplication of the logic
> regarding putback buffers and refilling of buffers.

Could work, if the port is nonblocking to begin with.

> I agree that 'char-ready?' is an antiquated interface, but it is
> nonetheless part of the R5RS (and Guile since approximately forever),
> and it is the only way to do a non-blocking read in portable R5RS.  It
> seems to me that we ought to try to implement it as well as we can, no?

Do what you like to do :)  But if it were my time, I would simply
document that it checks for a byte and not a character and move on.

Andy
--
http://wingolog.org/

* bug#10627: char-ready? is broken for multibyte encodings
From: Mark H Weaver @ 2013-02-26 19:50 UTC
To: Andy Wingo; +Cc: 10627

Andy Wingo <wingo@pobox.com> writes:
> Are you proposing that `char-ready?' do a nonblocking read if
> the buffer is empty?  That could work.

Yes.  I suspect that something along these lines is already implemented,
because I don't see how 'u8-ready?' could work properly without it.

> Do what you like to do :)  But if it were my time, I would simply
> document that it checks for a byte and not a character and move on.

I'd like to fix it properly.  Let's keep this bug open until it's done.

    Thanks,
      Mark

* bug#10627: char-ready? is broken for multibyte encodings
From: Andy Wingo @ 2013-02-26 19:59 UTC
To: Mark H Weaver; +Cc: 10627

On Tue 26 Feb 2013 20:50, Mark H Weaver <mhw@netris.org> writes:

> Andy Wingo <wingo@pobox.com> writes:
>> Are you proposing that `char-ready?' do a nonblocking read if
>> the buffer is empty?  That could work.
>
> Yes.  I suspect that something along these lines is already implemented,
> because I don't see how 'u8-ready?' could work properly without it.

It does a poll with a timeout of 0.

Andy
--
http://wingolog.org/

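A poll with a timeout of 0 asks the kernel whether a descriptor is readable right now and returns immediately either way. The Python sketch below illustrates the idea using `select.select` on a local socket pair; it is an analogy for what the message describes, not Guile's implementation, and the function name `byte_ready` is hypothetical.

```python
import select
import socket


def byte_ready(sock: socket.socket) -> bool:
    """Zero-timeout readiness check: never waits, only reports the current state."""
    readable, _, _ = select.select([sock], [], [], 0)
    return bool(readable)


reader, writer = socket.socketpair()

before = byte_ready(reader)                  # nothing has been written yet

# Send only the first byte of a two-byte UTF-8 character.
writer.sendall("λ".encode("utf-8")[:1])
after = byte_ready(reader)                   # at least one byte is now queued

reader.close()
writer.close()
```

The check answers a question about bytes. Once the single lead byte has been queued the socket reports readable, even though no complete character has arrived, which is exactly the gap this bug report is about.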
* bug#10627: char-ready? is broken for multibyte encodings
From: Andy Wingo @ 2016-06-20 19:23 UTC
To: Mark H Weaver; +Cc: 10627-done

On Tue 26 Feb 2013 20:59, Andy Wingo <wingo@pobox.com> writes:

> On Tue 26 Feb 2013 20:50, Mark H Weaver <mhw@netris.org> writes:
>
>> Andy Wingo <wingo@pobox.com> writes:
>>> Are you proposing that `char-ready?' do a nonblocking read if
>>> the buffer is empty?  That could work.
>>
>> Yes.  I suspect that something along these lines is already implemented,
>> because I don't see how 'u8-ready?' could work properly without it.
>
> It does a poll with a timeout of 0.

In the end I added this to the manual:

    Note that @code{char-ready?} only works reliably for terminals and
    sockets with one-byte encodings.  Under the hood it will return
    @code{#t} if the port has any input buffered, or if the file
    descriptor that backs the port polls as readable, indicating that
    Guile can fetch more bytes from the kernel.  However being able to
    fetch one byte doesn't mean that a full character is available;
    @xref{Encoding}.  Also, on many systems it's possible for a file
    descriptor to poll as readable, but then block when it comes time to
    read bytes.  Note also that on Linux kernels, all file ports backed
    by files always poll as readable.  For non-file ports, this
    procedure always returns @code{#t}, except for soft ports, which
    have a @code{char-ready?} handler.  @xref{Soft Ports}.

    In short, this is a legacy procedure whose semantics are hard to
    provide.  However it is a useful check to see if any input is
    buffered.  @xref{Non-Blocking I/O}.

We could try a non-blocking read but at that point we should just
provide a non-blocking read-char, and allow users to unread-char.  That
would be a different bug :)

Andy

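The closing suggestion, a non-blocking read-char paired with unread-char, could be sketched in Python as below. This is a hypothetical illustration only: the class `NonBlockingCharReader` and its methods are invented for this sketch, it works on a non-blocking socket rather than a Guile port, and it assumes UTF-8 with strict error handling (a stream that ends in the middle of a character raises `UnicodeDecodeError`).

```python
import codecs
import socket


class NonBlockingCharReader:
    """Hypothetical reader: read_char() returns a character or None, never blocks."""

    def __init__(self, sock: socket.socket, encoding: str = "utf-8"):
        sock.setblocking(False)
        self._sock = sock
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._chars = []          # decoded (or unread) characters awaiting delivery
        self.eof = False

    def read_char(self):
        # Pull whatever bytes are available until a whole character is decoded.
        while not self._chars and not self.eof:
            try:
                data = self._sock.recv(4096)
            except BlockingIOError:
                return None       # no complete character yet; partial bytes are kept
            if data == b"":
                self.eof = True
                # Raises UnicodeDecodeError if the stream ended mid-character.
                self._chars.extend(self._decoder.decode(b"", final=True))
            else:
                self._chars.extend(self._decoder.decode(data))
        return self._chars.pop(0) if self._chars else None

    def unread_char(self, ch: str) -> None:
        # Push a character back so that the next read_char() returns it again.
        self._chars.insert(0, ch)
```

A caller would then test the result of `read_char()` instead of asking a separate readiness question: `None` with `eof` false means "try again later", and `None` with `eof` true means the stream has ended.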