unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: "Michael Gran" <spk121@yahoo.com>
Cc: guile-devel@gnu.org
Subject: `mike-port-encodings' branch (bug #29643)
Date: Wed, 09 Jun 2010 00:29:11 +0200	[thread overview]
Message-ID: <87typdyx8o.fsf@gnu.org> (raw)
In-Reply-To: <E1OL1A2-00071t-DR@vcs-noshell.in.savannah.gnu.org> (Michael Gran's message of "Sat, 05 Jun 2010 21:44:10 +0000")

Hello Mike,

Thanks for working on this!

Here’s a review of the ‘mike-port-encodings’ branch.

"Michael Gran" <spk121@yahoo.com> writes:

> commit 26e8fd7459581f09857cccc8ddf1c366d2caea87
> Author: Michael Gran <spk121@yahoo.com>
> Date:   Sat Jun 5 13:40:32 2010 -0700
>
>     Simplify scm_i_scan_for_encoding
>     
>     Note especially that the variable 'i' has two different uses in this
>     function, and they get confused.
>     
>     * libguile/read.c (scm_i_scan_for_encoding): cleanup

-  while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE 
-         && pos + i - header < bytes_read
-	 && (isalnum ((int) pos[i]) || strchr ("_-.:/,+=()", pos[i]) != NULL))
+  while (encoding_start + i - header <= SCM_ENCODING_SEARCH_SIZE
+         && encoding_start + i - header < bytes_read
+	 && (isalnum ((int) encoding_start[i])
+	     || strchr ("_-.:/,+=()", encoding_start[i]) != NULL))

Hmm, “simplification”?  :-)

Is this commit really needed?  What’s the operational effect?

> commit 44c705d173d52cb485d761a51425b4d8e310a67a
> Author: Michael Gran <spk121@yahoo.com>
> Date:   Sat Jun 5 14:26:14 2010 -0700
>
>     read-line should use port's encoding, not locale's encoding
>     
>     * libguile/rdelim.c (scm_read_line): modified, use port's encoding
>

OK.  Would it be possible to write a test case, perhaps with a string
port?

+	  line = scm_from_stringn (s, slen-1, enc, hndl);

Please leave spaces around binary operators.

> commit 62abf139798af3f35ca3c89200389dc75d51a159
> Author: Michael Gran <spk121@yahoo.com>
> Date:   Sat Jun 5 14:19:37 2010 -0700
>
>     open-file should handle binary mode and coding declarations
>     
>     The open-file port should use the binary-friendly ISO-8859-1 encoding when
>     a file is opened using mode "b".  Also, it should honor a "coding:"
>     declaration at the top of a file when reading files where it is present.
>     
>     * libguile/fports.c (scm_open_file): modified
>     * test-suite/tests/ports.test: more tests for open-file
>     * doc/ref/api-io.texi (File Ports): more documentation for open-file

  --- a/doc/ref/api-io.texi
  +++ b/doc/ref/api-io.texi
  @@ -130,8 +130,8 @@ this fluid otherwise.

   @deffn {Scheme Procedure} port-encoding port
   @deffnx {C Function} scm_port_encoding
  -Returns, as a string, the character encoding that @var{port} uses to
  -interpret its input and output.
  +Returns, as a string, the character encoding that @var{port} uses to interpret
  +its input and output.  The value @code{#f} is equivalent to @code{"ISO-8859-1"}.

Instead of having the #f special case, I’ve been thinking that it would
be nicer and easier to have ‘port-encoding’ always return a string
(similar for ‘set-port-encoding!’), which would be "ISO-8859-1" instead
of #f.  Perhaps it’s a bit late, though.

What do you think?

  +Also, open the file using the binary-compatible character encoding
  +"ISO-8859-1", ignoring any coding declaration or port encoding.

I dislike this whole notion of “binary-compatibility”.  What is meant
here is that for binary I/O, you’d better choose ISO-8859-1 as the
port’s encoding because non-ASCII byte values will go through as is,
right?  (I suppose that’s true of any 8-bit encoding.)

OTOH, the (rnrs io ports) API does binary I/O as expected regardless of
the port’s encoding, so that’s what users should choose when doing
binary I/O, I think.

Thus I’d suggest removing any references to “binary-compatibility”.
 
  +When the file is opened, this procedure will scan for a coding
  +declaration (@pxref{Character Encoding of Source Files}). If present
  +will use that encoding for interpreting the file.  Otherwise, the
  +port's encoding will be used.

How about adding: “, which defaults to the value of
@code{%default-port-encoding} (@pxref{Ports,
@code{%default-port-encoding}}).”

  +the file in binary mode and then set the port encoding explicitly
  +using @code{set-port-encoding!}.

Again, there’s no real binary mode.

              "The following additional characters can be appended:\n"
              "@table @samp\n"
              "@item b\n"
  -	    "Open the underlying file in binary mode, if supported by the operating system. "
  +	    "Open the underlying file in binary mode, if supported by the system.\n"
  +	    "Also, open the file using the binary-compatible character encoding\n"
  +	    "\"ISO-8859-1\", ignoring the port's encoding and the coding declaration\n"
  +	    "at the top of the input file, if any.\n"

OK, except for “binary compatible”.

  +;;; binary mode ignores port encoding
  +(with-fluids ((%default-port-encoding "UTF-8"))
  +  (let* ((filename (test-file))
  +         (port (open-file filename "w"))
  +         (test-string "一二三")
  +         (binary-test-string
  +          (apply string
  +                 (map integer->char
  +                      (uniform-vector->list (string->utf8 test-string))))))
  +    (write-line test-string port)
  +    (close-port port)
  +    (let* ((in-port (open-file filename "rb"))
  +           (line (read-line in-port)))
  +      (close-port in-port)
  +      (pass-if "file: binary mode ignores port encoding"
  +               (string=? line binary-test-string)))))

Rather put all the test under ‘pass-if’ so that errors are attributed to
this test.  There should also be a (delete-file filename) as the last
thunk of a ‘dynamic-wind’ (otherwise this could break ‘make distcheck’.)

Other than that the tests look good to me.

Thanks,
Ludo’.



       reply	other threads:[~2010-06-08 22:29 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <E1OL1A2-00071t-DR@vcs-noshell.in.savannah.gnu.org>
2010-06-08 22:29 ` Ludovic Courtès [this message]
2010-06-10 18:16   ` `mike-port-encodings' branch (bug #29643) Mike Gran
2010-06-16 21:13     ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87typdyx8o.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=guile-devel@gnu.org \
    --cc=spk121@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).