unofficial mirror of bug-guile@gnu.org 
* Please clarify docs for open-file procedure (in trunk)
@ 2011-08-10 17:27 b3timmons
  2011-08-18  9:14 ` Andy Wingo
From: b3timmons @ 2011-08-10 17:27 UTC
  To: bug-guile

Hi,

I think the documentation (in trunk) for the open-file procedure (in
file doc/ref/api-io.texi) needs clarification, especially for newbies to
encoding issues such as myself.

In particular, consider the description for the binary flag b:

----------------------------------------------------------------------
@item b
Use binary mode.  On DOS systems the default text mode converts CR+LF
in the file to newline for the program, whereas binary mode reads and
writes all bytes unchanged.  On Unix-like systems there is no such
distinction, text files already contain just newlines and no
conversion is ever made.  The @code{b} flag is accepted on all
systems, but has no effect on Unix-like systems.

(For reference, Guile leaves text versus binary up to the C library,
@code{b} here just adds @code{O_BINARY} to the underlying @code{open}
call, when that flag is available.)

Also, open the file using the 8-bit character encoding "ISO-8859-1",
ignoring any coding declaration or port encoding.
...
----------------------------------------------------------------------

I stopped reading here, thinking that the b flag "has no effect on" reading my
binary data.  Yet, as subsequently explained, it does indeed have an effect on
the encoding used to open the file.  How about something like:

----------------------------------------------------------------------
@item b
Use binary mode.  In general this might affect handling of line endings
and file encodings.

Regarding line endings, on DOS systems the default text mode converts
CR+LF in the file to newline for the program, whereas binary mode reads
and writes all bytes unchanged.  On Unix-like systems there is no such
distinction, text files already contain just newlines and no conversion
is ever made.  The @code{b} flag is accepted on all systems, but has no
effect on Unix-like systems.

(For reference, Guile leaves text versus binary up to the C library,
@code{b} here just adds @code{O_BINARY} to the underlying @code{open}
call, when that flag is available.)

Regarding file encodings, a file opened in binary mode uses the 8-bit
character encoding "ISO-8859-1", ignoring any coding declaration or port
encoding.
----------------------------------------------------------------------

A bit of redundancy like this might help newbies such as myself avoid a
misunderstanding here.

I should also point out a grammatical mistake further on:

----------------------------------------------------------------------
When the file is opened, this procedure will scan for a coding
declaration (@pxref{Character Encoding of Source Files}). If present
will use that encoding for interpreting the file.  Otherwise, the
port's encoding will be used.  To suppress this behavior, open
the file in binary mode and then set the port encoding explicitly
using @code{set-port-encoding!}.
----------------------------------------------------------------------

The paragraph contains in its middle the following fragment:
"If present will use that encoding for interpreting the file."

How about: "If it is found, the corresponding encoding will be used to
interpret the file." ?

Thanks,
Bake
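
For readers following along, the "open the file in binary mode and then
set the port encoding explicitly" recipe from the paragraph quoted above
might look like the sketch below.  It is only an illustration: the helper
name open-file/encoding and the file name in the usage note are made up
for this example, while open-file, set-port-encoding! and port-encoding
are existing Guile procedures.

```scheme
;; Hypothetical helper (not part of Guile): open PATH for reading in
;; binary mode, so that open-file ignores any coding declaration, then
;; choose the encoding explicitly.
(define (open-file/encoding path encoding)
  (let ((port (open-file path "rb")))     ; "b": ISO-8859-1, no coding scan
    (set-port-encoding! port encoding)    ; e.g. "UTF-8"
    port))
```

Usage, assuming a file named notes.txt exists:
(port-encoding (open-file/encoding "notes.txt" "UTF-8")) should report
"UTF-8", whatever coding declaration the file itself contains.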




* Re: Please clarify docs for open-file procedure (in trunk)
  2011-08-10 17:27 Please clarify docs for open-file procedure (in trunk) b3timmons
@ 2011-08-18  9:14 ` Andy Wingo
From: Andy Wingo @ 2011-08-18  9:14 UTC
  To: b3timmons; +Cc: bug-guile

Hi Bake,

On Wed 10 Aug 2011 19:27, b3timmons@speedymail.org writes:

> I think the documentation (in trunk) for the open-file procedure (in
> file doc/ref/api-io.texi) needs clarification, especially for newbies to
> encoding issues such as myself.

Thanks for the report.  I have rewritten it a bit, following your
suggestions.  The changeset is below.

Cheers,

Andy

commit 5261e74281b1150e3b2594c92e571d8887a4900d
Author: Andy Wingo <wingo@pobox.com>
Date:   Thu Aug 18 11:13:34 2011 +0200

    reword open-file docs
    
    * doc/ref/api-io.texi (File Ports): Refactor open-file docs.  Thanks to
      Bake Timmons for the report.

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 19c0665..afcde57 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -838,34 +838,34 @@ setvbuf}
 Add line-buffering to the port.  The port output buffer will be
 automatically flushed whenever a newline character is written.
 @item b
-Use binary mode.  On DOS systems the default text mode converts CR+LF
-in the file to newline for the program, whereas binary mode reads and
-writes all bytes unchanged.  On Unix-like systems there is no such
-distinction, text files already contain just newlines and no
-conversion is ever made.  The @code{b} flag is accepted on all
-systems, but has no effect on Unix-like systems.
-
-(For reference, Guile leaves text versus binary up to the C library,
-@code{b} here just adds @code{O_BINARY} to the underlying @code{open}
-call, when that flag is available.)
-
-Also, open the file using the 8-bit character encoding "ISO-8859-1",
-ignoring any coding declaration or port encoding.
-
-Note that, when reading or writing binary data with ports, the
-bytevector ports in the @code{(rnrs io ports)} module are preferred,
-as they return vectors, and not strings (@pxref{R6RS I/O Ports}).
+Use binary mode, ensuring that each byte in the file will be read as one
+Scheme character.
+
+To provide this property, the file will be opened with the 8-bit
+character encoding "ISO-8859-1", ignoring any coding declaration or port
+encoding.  @xref{Ports}, for more information on port encodings.
+
+Note that while it is possible to read and write binary data as
+characters or strings, it is usually better to treat bytes as octets,
+and byte sequences as bytevectors.  @xref{R6RS Binary Input}, and
+@ref{R6RS Binary Output}, for more.
+
+This option had another historical meaning, for DOS compatibility: in
+the default (textual) mode, DOS reads a CR-LF sequence as one LF byte.
+The @code{b} flag prevents this from happening, adding @code{O_BINARY}
+to the underlying @code{open} call.  Still, the flag is generally useful
+because of its port encoding ramifications.
 @end table
 
 If a file cannot be opened with the access
 requested, @code{open-file} throws an exception.
 
 When the file is opened, this procedure will scan for a coding
-declaration (@pxref{Character Encoding of Source Files}). If present
-will use that encoding for interpreting the file.  Otherwise, the
-port's encoding will be used.  To suppress this behavior, open
-the file in binary mode and then set the port encoding explicitly
-using @code{set-port-encoding!}.
+declaration (@pxref{Character Encoding of Source Files}). If a coding
+declaration is found, it will be used to interpret the file.  Otherwise,
+the port's encoding will be used.  To suppress this behavior, open the
+file in binary mode and then set the port encoding explicitly using
+@code{set-port-encoding!}.
 
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the

-- 
http://wingolog.org/
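
As a rough illustration of the reworded @item b text (each byte in the
file is read as one Scheme character), here is a small sketch.  The
scratch path and the two helper names are assumptions of the example;
put-bytevector and get-bytevector-all come from (rnrs io ports), and the
bytevector/list conversions from (rnrs bytevectors).

```scheme
(use-modules (rnrs bytevectors)   ; u8-list->bytevector, bytevector->u8-list
             (rnrs io ports))     ; put-bytevector, get-bytevector-all

;; Scratch file used only for this demonstration (hypothetical path).
(define path "/tmp/open-file-b-demo.bin")

;; Write four raw bytes; 200 and 255 are not valid UTF-8 on their own.
(let ((port (open-file path "wb")))
  (put-bytevector port (u8-list->bytevector '(0 65 200 255)))
  (close-port port))

;; Read the file as characters through a binary-mode port and collect
;; the integer value of each character.
(define (read-char-codes path)
  (let ((port (open-file path "rb")))
    (let loop ((codes '()))
      (let ((ch (read-char port)))
        (if (eof-object? ch)
            (begin (close-port port) (reverse codes))
            (loop (cons (char->integer ch) codes)))))))

;; Read the same file as octets, the approach the docs recommend for
;; binary data.
(define (read-octets path)
  (let* ((port (open-file path "rb"))
         (bv (get-bytevector-all port)))
    (close-port port)
    (bytevector->u8-list bv)))

(write (read-char-codes path)) (newline)
(write (read-octets path)) (newline)
```

Because ISO-8859-1 maps every byte value to the character with the same
code point, both lists are expected to match the bytes that were written.
The bytevector version avoids the detour through characters altogether.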


