unofficial mirror of bug-guile@gnu.org 
* bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode
@ 2015-03-25 14:31 David Kastrup
  2015-03-26 22:57 ` Mark H Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: David Kastrup @ 2015-03-25 14:31 UTC (permalink / raw)
  To: 20200

Run the following code in a UTF-8-capable locale:


[-- Attachment #2: bad.scm --]
[-- Type: text/plain, Size: 555 bytes --]

(setlocale LC_ALL "")
(use-modules (rnrs io ports) (rnrs bytevectors) (ice-9 format))
(let ((p (open-bytevector-input-port
          (u8-list->bytevector '(#xc3 #x9f #xc3 #x9f)))))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (format #t "#x~x\n" (char->integer (read-char p)))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (set-port-encoding! p "ISO-8859-1")
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (format #t "#x~x\n" (char->integer (read-char p)))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p)))

This results in the output
#f #t
#xdf
#f #t
ISO-8859-1 #f
#xc3
ISO-8859-1 #f

The manual, however, states:

 -- Scheme Procedure: port-encoding port
 -- C Function: scm_port_encoding (port)
     Returns, as a string, the character encoding that PORT uses to
     interpret its input and output.  The value ‘#f’ is equivalent to
     ‘"ISO-8859-1"’.

That would appear to be false since the value #f here is treated as
equivalent to "UTF-8" rather than "ISO-8859-1".

In addition, the manual states

 -- Scheme Procedure: binary-port? port
     Return ‘#t’ if PORT is a "binary port", suitable for binary data
     input/output.

     Note that internally Guile does not differentiate between binary
     and textual ports, unlike the R6RS. Thus, this procedure returns
     true when PORT does not have an associated encoding—i.e., when
     ‘(port-encoding PORT)’ is ‘#f’ (*note port-encoding: Ports.).  This
     is the case for ports returned by R6RS procedures such as
     ‘open-bytevector-input-port’ and ‘make-custom-binary-output-port’.

     However, Guile currently does not prevent use of textual I/O
     procedures such as ‘display’ or ‘read-char’ with binary ports.
     Doing so “upgrades” the port from binary to textual, under the
     ISO-8859-1 encoding.  Likewise, Guile does not prevent use of
     ‘set-port-encoding!’ on a binary port, which also turns it into a
     “textual” port.

But it would appear that the only way to actually get binary-encoded
read-char behavior is to switch the port to textual.  While the port is
in "binary" mode, it decodes input as UTF-8 rather than delivering binary
data.  Nor does it automagically switch itself away from the nominal #f
encoding, which is not the encoding actually in effect.
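
For comparison, when byte values rather than characters are wanted, the
R6RS binary input procedures from (rnrs io ports) do no character
decoding at all.  The following is only a sketch of that alternative, not
a fix for the read-char behavior reported here; the variable names are
illustrative.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

(define p (open-bytevector-input-port
           (u8-list->bytevector '(#xc3 #x9f #xc3 #x9f))))

;; get-u8 returns the next byte as an integer (or the EOF object);
;; no character decoding is involved.
(define first-byte (get-u8 p))

;; get-bytevector-all returns the remaining bytes as a bytevector.
(define rest-bytes (get-bytevector-all p))
```

Here first-byte should be #xc3 and rest-bytes the remaining three bytes,
independent of the port's encoding.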

Putting (with-fluids ((%default-port-encoding #f)) ...) around the
open-bytevector-input-port call results in the output
#f #t
#xc3
ISO-8859-1 #f
ISO-8859-1 #f
#x9f
ISO-8859-1 #f
which actually corresponds to the documentation.
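
A minimal sketch of that wrapping, assuming it is acceptable to bind the
fluid only around the open call; the helper name open-binary-input-port
is hypothetical and not part of Guile:

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

;; Hypothetical helper: open the port while %default-port-encoding is #f,
;; so the new port does not inherit the locale's encoding.
(define (open-binary-input-port bv)
  (with-fluids ((%default-port-encoding #f))
    (open-bytevector-input-port bv)))

(define p (open-binary-input-port
           (u8-list->bytevector '(#xc3 #x9f #xc3 #x9f))))

;; With the fluid bound to #f, read-char should deliver one byte per
;; character, matching the output shown above.
(define first-char (read-char p))
```

With this wrapper, (char->integer first-char) should be #xc3, as in the
output above.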

-- 
David Kastrup

* bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode
  2015-03-25 14:31 bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode David Kastrup
@ 2015-03-26 22:57 ` Mark H Weaver
  2015-03-28 20:13   ` Mark H Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: Mark H Weaver @ 2015-03-26 22:57 UTC (permalink / raw)
  To: David Kastrup; +Cc: 20200

David Kastrup <dak@gnu.org> writes:

> Run the following code in an UTF-8 capable locale:
>
> (setlocale LC_ALL "")
> (use-modules (rnrs io ports) (rnrs bytevectors) (ice-9 format))
> (let ((p (open-bytevector-input-port
> 	  (u8-list->bytevector '(#xc3 #x9f #xc3 #X9f)))))
>   (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
>   (format #t "#x~x\n" (char->integer (read-char p)))
>   (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
>   (set-port-encoding! p "ISO-8859-1")
>   (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
>   (format #t "#x~x\n" (char->integer (read-char p)))
>   (format #t "~a ~a\n" (port-encoding p) (binary-port? p)))
>
> This results in the output
> #f #t
> #xdf
> #f #t
> ISO-8859-1 #f
> #xc3
> ISO-8859-1 #f
>
> The manual, however, states:
>
>  -- Scheme Procedure: port-encoding port
>  -- C Function: scm_port_encoding (port)
>      Returns, as a string, the character encoding that PORT uses to
>      interpret its input and output.  The value ‘#f’ is equivalent to
>      ‘"ISO-8859-1"’.
>
> That would appear to be false since the value #f here is treated as
> equivalent to "UTF-8" rather than "ISO-8859-1".

This is indeed a bug, introduced in Guile 2.0.9.  The workaround is to
explicitly set the encoding to "ISO-8859-1".
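
A sketch of that workaround, assuming the encoding is set before any
textual read; the helper name open-latin1-bytevector-port is
illustrative only:

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

;; Hypothetical helper: open a bytevector port and immediately pin its
;; encoding to ISO-8859-1, so each byte maps to one character.
(define (open-latin1-bytevector-port bv)
  (let ((p (open-bytevector-input-port bv)))
    (set-port-encoding! p "ISO-8859-1")
    p))

(define p (open-latin1-bytevector-port
           (u8-list->bytevector '(#xc3 #x9f))))

;; let* forces the two reads to happen in order.
(define codes
  (let* ((a (read-char p))
         (b (read-char p)))
    (list (char->integer a) (char->integer b))))
```

Here codes should be (#xc3 #x9f), i.e. the two bytes read back
unchanged.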

      Mark





* bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode
  2015-03-26 22:57 ` Mark H Weaver
@ 2015-03-28 20:13   ` Mark H Weaver
  0 siblings, 0 replies; 3+ messages in thread
From: Mark H Weaver @ 2015-03-28 20:13 UTC (permalink / raw)
  To: David Kastrup; +Cc: 20200-done

Fixed in d574d96f879c147c6c14df43f2e4ff9e8a6876b9, which will be in
Guile 2.0.12.  I'm closing this bug now.

    Thanks,
      Mark




