* bug#22901: drain-input doesn't decode
From: Zefram @ 2016-03-04 3:09 UTC
To: 22901
The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get
from calling read-char some number of times. In fact it differs in a
significant respect: whereas read-char decodes input octets according to
the port's selected encoding, drain-input ignores the selected encoding
and always decodes according to ISO-8859-1 (thus preserving the octet
values in character form).
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (write (map char->integer (let r ((l '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object? c) (reverse l) (r (cons c l))))))) (newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (peek-char (current-input-port)) (write (map char->integer (string->list (drain-input (current-input-port))))) (newline)'
"UCS-2BE"
(1 97 2 98 3 99)
The practical upshot is that the input returned by drain-input can't
be used in the same way as regular input from read-char. It can still
be used if the code doing the reading is totally aware of the encoding,
so that it can perform the decoding manually, but this seems a failure
of abstraction. The value returned by drain-input ought to be coherent
with the abstraction level at which it is specified.
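
A minimal sketch of the manual decoding described above, assuming the
(ice-9 iconv) module is available in the Guile in use and that the
drained string really holds one character per input octet. The helper
name redecode-drained is hypothetical and not part of Guile.

```scheme
(use-modules (ice-9 iconv))

;; Hypothetical helper (not part of Guile): STR is a string whose
;; characters are really octet values, as drain-input returns in 2.0.
;; Map it back to octets via ISO-8859-1, then decode with ENCODING.
(define (redecode-drained str encoding)
  (bytevector->string (string->bytevector str "ISO-8859-1") encoding))

;; The six octets from the test case, in the form drain-input gave them.
(define drained
  (list->string (map integer->char '(1 97 2 98 3 99))))

(write (map char->integer
            (string->list (redecode-drained drained "UCS-2BE"))))
(newline)
;; Expected to print (353 610 867), matching the read-char result.
```

The caller has to pass the same encoding name that was set on the port,
which is exactly the knowledge the port abstraction is supposed to hide.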
I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle
of a character. If drain-input is to return decoded characters then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary. If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.
If, on the other hand, it is decided to endorse the current non-decoding
behaviour, then the break of abstraction needs to be documented.
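
To illustrate the boundary problem, here is a rough sketch that reports
how many further octets would have to be read to finish the last
character in a buffer. It assumes UTF-8 (the test case above uses
UCS-2BE, where the check would simply be whether the length is odd) and
well-formed input. Both procedure names are hypothetical.

```scheme
(use-modules (rnrs bytevectors))

;; Length in octets of a UTF-8 character, judged from its lead octet.
;; Returns #f for a continuation octet (10xxxxxx) or an invalid lead.
(define (utf8-char-length lead)
  (cond ((< lead #x80) 1)
        ((= (logand lead #xE0) #xC0) 2)
        ((= (logand lead #xF0) #xE0) 3)
        ((= (logand lead #xF8) #xF0) 4)
        (else #f)))

;; Hypothetical helper: number of octets still missing from the final
;; character of BUF, or 0 if BUF ends on a character boundary.
;; Walks back from the end (at most 4 octets) to find the lead octet.
(define (octets-missing buf)
  (let loop ((i (- (bytevector-length buf) 1)) (seen 1))
    (if (or (< i 0) (> seen 4))
        0
        (let ((len (utf8-char-length (bytevector-u8-ref buf i))))
          (if len
              (max 0 (- len seen))
              (loop (- i 1) (+ seen 1)))))))

;; U+03B2 is CE B2 in UTF-8; this buffer ends halfway through a second one.
(write (octets-missing #vu8(#xCE #xB2 #xCE)))   ; prints 1
(newline)
```

A decoding drain-input would need to fetch that many octets past the
buffered data before it could return a whole number of characters.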
-zefram
* bug#22901: drain-input doesn't decode
From: Andy Wingo @ 2016-06-20 16:12 UTC
To: Zefram; +Cc: 22901
On Fri 04 Mar 2016 04:09, Zefram <zefram@fysh.org> writes:
> The documentation for drain-input says that it returns a string of
> characters, implying that the result is equivalent to what you'd get
> from calling read-char some number of times. In fact it differs in a
> significant respect: whereas read-char decodes input octets according to
> the port's selected encoding, drain-input ignores the selected encoding
> and always decodes according to ISO-8859-1 (thus preserving the octet
> values in character form).
>
> $ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
> (current-input-port) "UCS-2BE") (write (port-encoding
> (current-input-port))) (newline) (write (map char->integer (let r ((l
> '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object?
> c) (reverse l) (r (cons c l))))))) (newline)'
> "UCS-2BE"
> (353 610 867)
> $ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
> (current-input-port) "UCS-2BE") (write (port-encoding
> (current-input-port))) (newline) (peek-char (current-input-port))
> (write (map char->integer (string->list (drain-input
> (current-input-port))))) (newline)'
> "UCS-2BE"
> (1 97 2 98 3 99)
Thanks for the test case! FWIW, this is fixed in Guile 2.1.3. I am not
sure what we should do about Guile 2.0. I guess we should make it do
the documented thing though!
Andy
* bug#22901: drain-input doesn't decode
From: Matt Wette @ 2017-02-26 17:46 UTC
To: 22901
[-- Attachment #1: Type: text/plain, Size: 543 bytes --]
I put together a test and tried it on 2.1.7; my test fails. See attached.

(pass-if "encoded input"
  (let ((fn (test-file))
        (nc "utf-8")
        (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
        ;;(st "hello, world\n")
        )
    (let ((p1 (open-output-file fn #:encoding nc)))
      ;;(display st p1)
      (string-for-each (lambda (ch) (write-char ch p1)) st)
      (close p1))
    (let* ((p0 (open-input-file fn #:encoding nc))
           (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
      (simple-format #t "~S\n" s0)
      (equal? s0 st))))
[-- Attachment #2: port-di.test --]
[-- Type: application/octet-stream, Size: 712 bytes --]
;; port-di.test -*- scheme -*-

(add-to-load-path "guile-2.1.7-dev3/test-suite")
(use-modules (test-suite lib))

;; Scratch file in the current directory.
(define (test-file)
  (string-append (getcwd) "/ports-test.tmp"))

(with-test-prefix "drain-input"
  (pass-if "encoded input"
    (let ((fn (test-file))
          (nc "utf-8")
          (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
          ;;(st "hello, world\n")
          )
      ;; Write the string out one character at a time in encoding NC.
      (let ((p1 (open-output-file fn #:encoding nc)))
        ;;(display st p1)
        (string-for-each (lambda (ch) (write-char ch p1)) st)
        (close p1))
      ;; Read and unread one character so the port's buffer is filled,
      ;; then drain the buffer and compare with the original string.
      (let* ((p0 (open-input-file fn #:encoding nc))
             (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
        (simple-format #t "~S\n" s0)
        (equal? s0 st))))
  )

;; --- last line ---
* bug#22901: drain-input doesn't decode
From: Matt Wette @ 2017-02-26 17:58 UTC
To: 22901
[-- Attachment #1: Type: text/plain, Size: 859 bytes --]
> On Feb 26, 2017, at 9:46 AM, Matt Wette <matt.wette@gmail.com> wrote:
>
> I put together a test and tried on 2.1.7 - my test fails. See attached.
>
> (pass-if "encoded input"
> (let ((fn (test-file))
> (nc "utf-8")
> (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
> ;;(st "hello, world\n")
> )
> (let ((p1 (open-output-file fn #:encoding nc)))
> ;;(display st p1)
> (string-for-each (lambda (ch) (write-char ch p1)) st)
> (close p1))
> (let* ((p0 (open-input-file fn #:encoding nc))
> (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
> (simple-format #t "~S\n" s0)
> (equal? s0 st))))
>
My bad. The failure was on guile-2.0.13. It seems to work on guile-2.1.7:
mwette$ guile-2.1.7-dev3/meta/guile port-di.test
"βαδ ασσ am I."
PASS: drain-input: encoded input
* bug#22901: drain-input doesn't decode
From: Taylan Kammer @ 2021-05-16 17:55 UTC
To: 22901, Zefram, Andy Wingo
Are we still maintaining 2.0, or can this issue be closed?
--
Taylan
* bug#22901: drain-input doesn't decode
From: Taylan Kammer @ 2021-05-19 11:41 UTC
To: 22901-done
Closing this since it's 5 years old and fixed in Guile 2.1 and higher.
--
Taylan