The documentation for drain-input says that it returns a string of characters, implying that the result is equivalent to what you'd get from calling read-char some number of times. In fact it differs in a significant respect: whereas read-char decodes input octets according to the port's selected encoding, drain-input ignores the selected encoding and always decodes according to ISO-8859-1 (thus preserving the octet values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (write (map char->integer (let r ((l '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object? c) (reverse l) (r (cons c l))))))) (newline)'
"UCS-2BE"
(353 610 867)

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (peek-char (current-input-port)) (write (map char->integer (string->list (drain-input (current-input-port))))) (newline)'
"UCS-2BE"
(1 97 2 98 3 99)

The practical upshot is that the input returned by drain-input can't be used in the same way as regular input from read-char. It can still be used if the code doing the reading is fully aware of the encoding, so that it can perform the decoding manually, but this seems a failure of abstraction. The value returned by drain-input ought to be coherent with the abstraction level at which it is specified.

I can see that there is a reason for drain-input to avoid performing decoding: the problem that occurs if the buffer ends in the middle of a character. If drain-input is to return decoded characters, then presumably in this case it would have to read further octets beyond the buffer contents, in an unbuffered manner, until it reaches a character boundary. If this is too unpalatable, perhaps drain-input should be permitted only on ports configured for single-octet character encodings.
If, on the other hand, it is decided to endorse the current non-decoding behaviour, then the break of abstraction needs to be documented. -zefram
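[The manual decoding escape hatch mentioned in the report can be illustrated outside Guile. The sketch below, in Python rather than Scheme and purely illustrative, shows the general technique: treat each character of the drained string as a raw octet (exploiting the 1:1 ISO-8859-1 round trip), then decode those octets with the encoding the port was actually configured for. The values mirror the transcript above; none of this is Guile API.]

```python
# Illustrative sketch (not Guile code): recovering properly decoded text
# from a drain-input-style string whose characters carry raw octet values.
drained = "\x01a\x02b\x03c"          # octets preserved as ISO-8859-1 chars
octets = drained.encode("latin-1")   # ISO-8859-1 maps chars 1:1 back to octets
text = octets.decode("utf-16-be")    # the port's real encoding (UCS-2BE here)
print([ord(c) for c in text])        # → [353, 610, 867], matching read-char
```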
On Fri 04 Mar 2016 04:09, Zefram <zefram@fysh.org> writes:
> The documentation for drain-input says that it returns a string of
> characters, implying that the result is equivalent to what you'd get
> from calling read-char some number of times. In fact it differs in a
> significant respect: whereas read-char decodes input octets according to
> the port's selected encoding, drain-input ignores the selected encoding
> and always decodes according to ISO-8859-1 (thus preserving the octet
> values in character form).
>
> $ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
> (current-input-port) "UCS-2BE") (write (port-encoding
> (current-input-port))) (newline) (write (map char->integer (let r ((l
> '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object?
> c) (reverse l) (r (cons c l))))))) (newline)'
> "UCS-2BE"
> (353 610 867)
> $ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding!
> (current-input-port) "UCS-2BE") (write (port-encoding
> (current-input-port))) (newline) (peek-char (current-input-port))
> (write (map char->integer (string->list (drain-input
> (current-input-port))))) (newline)'
> "UCS-2BE"
> (1 97 2 98 3 99)
Thanks for the test case! FWIW, this is fixed in Guile 2.1.3. I am not
sure what we should do about Guile 2.0. I guess we should make it do
the documented thing though!
Andy
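[For context on why a decoding drain-input is tricky: as the original report notes, the buffered octets may end in the middle of a multi-byte character. The sketch below, again in Python and purely illustrative, shows how an incremental decoder copes with such a truncated buffer by withholding the incomplete trailing octets until more input arrives.]

```python
import codecs

# A UTF-8 buffer that ends mid-character: "βα" encodes to 4 octets,
# and we keep only the first 3, splitting α in half.
buf = "\u03b2\u03b1".encode("utf-8")[:-1]

dec = codecs.getincrementaldecoder("utf-8")()
decoded = dec.decode(buf)   # the lone leading octet of α is held back
print(repr(decoded))        # → 'β'; a decoding drain-input would similarly
                            # need one more octet to finish the character
```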
[-- Attachment #1: Type: text/plain, Size: 543 bytes --]

I put together a test and tried on 2.1.7 -- my test fails. See attached.

(pass-if "encoded input"
  (let ((fn (test-file))
        (nc "utf-8")
        (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
        ;;(st "hello, world\n")
        )
    (let ((p1 (open-output-file fn #:encoding nc)))
      ;;(display st p1)
      (string-for-each (lambda (ch) (write-char ch p1)) st)
      (close p1))
    (let* ((p0 (open-input-file fn #:encoding nc))
           (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
      (simple-format #t "~S\n" s0)
      (equal? s0 st))))

[-- Attachment #2: port-di.test --]
[-- Type: application/octet-stream, Size: 712 bytes --]

;; port-di.test  -*- scheme -*-

(add-to-load-path "guile-2.1.7-dev3/test-suite")
(use-modules (test-suite lib))

(define (test-file)
  (string-append (getcwd) "/ports-test.tmp"))

(with-test-prefix "drain-input"

  (pass-if "encoded input"
    (let ((fn (test-file))
          (nc "utf-8")
          (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
          ;;(st "hello, world\n")
          )
      (let ((p1 (open-output-file fn #:encoding nc)))
        ;;(display st p1)
        (string-for-each (lambda (ch) (write-char ch p1)) st)
        (close p1))
      (let* ((p0 (open-input-file fn #:encoding nc))
             (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
        (simple-format #t "~S\n" s0)
        (equal? s0 st))))
  )
;; --- last line ---
[-- Attachment #1: Type: text/plain, Size: 859 bytes --]

> On Feb 26, 2017, at 9:46 AM, Matt Wette <matt.wette@gmail.com> wrote:
>
> I put together a test and tried on 2.1.7 -- my test fails. See attached.
>
> (pass-if "encoded input"
>   (let ((fn (test-file))
>         (nc "utf-8")
>         (st "\u03b2\u03b1\u03b4 \u03b1\u03c3\u03c3 am I.")
>         ;;(st "hello, world\n")
>         )
>     (let ((p1 (open-output-file fn #:encoding nc)))
>       ;;(display st p1)
>       (string-for-each (lambda (ch) (write-char ch p1)) st)
>       (close p1))
>     (let* ((p0 (open-input-file fn #:encoding nc))
>            (s0 (begin (unread-char (read-char p0) p0) (drain-input p0))))
>       (simple-format #t "~S\n" s0)
>       (equal? s0 st))))

My bad. The failure was on guile-2.0.13. It seems to work on guile-2.1.7:

mwette$ guile-2.1.7-dev3/meta/guile port-di.test
"βαδ ασσ am I."
PASS: drain-input: encoded input

[-- Attachment #2: Type: text/html, Size: 2931 bytes --]
Are we still maintaining 2.0, or can this issue be closed? -- Taylan
Closing this since it's 5 years old and fixed in Guile 2.1 and higher. -- Taylan