From: ludo@gnu.org (Ludovic Courtès)
To: "Michael Gran" <spk121@yahoo.com>
Cc: guile-devel@gnu.org
Subject: `mike-port-encodings' branch (bug #29643)
Date: Wed, 09 Jun 2010 00:29:11 +0200 [thread overview]
Message-ID: <87typdyx8o.fsf@gnu.org> (raw)
In-Reply-To: <E1OL1A2-00071t-DR@vcs-noshell.in.savannah.gnu.org> (Michael Gran's message of "Sat, 05 Jun 2010 21:44:10 +0000")
Hello Mike,
Thanks for working on this!
Here’s a review of the ‘mike-port-encodings’ branch.
"Michael Gran" <spk121@yahoo.com> writes:
> commit 26e8fd7459581f09857cccc8ddf1c366d2caea87
> Author: Michael Gran <spk121@yahoo.com>
> Date: Sat Jun 5 13:40:32 2010 -0700
>
> Simplify scm_i_scan_for_encoding
>
> Note especially that the variable 'i' has two different uses in this
> function, and they get confused.
>
> * libguile/read.c (scm_i_scan_for_encoding): cleanup
- while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
- && pos + i - header < bytes_read
- && (isalnum ((int) pos[i]) || strchr ("_-.:/,+=()", pos[i]) != NULL))
+ while (encoding_start + i - header <= SCM_ENCODING_SEARCH_SIZE
+ && encoding_start + i - header < bytes_read
+ && (isalnum ((int) encoding_start[i])
+ || strchr ("_-.:/,+=()", encoding_start[i]) != NULL))
Hmm, “simplification”? :-)
Is this commit really needed? What’s the operational effect?
> commit 44c705d173d52cb485d761a51425b4d8e310a67a
> Author: Michael Gran <spk121@yahoo.com>
> Date: Sat Jun 5 14:26:14 2010 -0700
>
> read-line should use port's encoding, not locale's encoding
>
> * libguile/rdelim.c (scm_read_line): modified, use port's encoding
>
OK. Would it be possible to write a test case, perhaps with a string
port?
+ line = scm_from_stringn (s, slen-1, enc, hndl);
Please leave spaces around binary operators.
> commit 62abf139798af3f35ca3c89200389dc75d51a159
> Author: Michael Gran <spk121@yahoo.com>
> Date: Sat Jun 5 14:19:37 2010 -0700
>
> open-file should handle binary mode and coding declarations
>
> The open-file port should use the binary-friendly ISO-8859-1 encoding when
> a file is opened using mode "b". Also, it should honor a "coding:"
> declaration at the top of a file when reading files where it is present.
>
> * libguile/fports.c (scm_open_file): modified
> * test-suite/tests/ports.test: more tests for open-file
> * doc/ref/api-io.texi (File Ports): more documentation for open-file
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -130,8 +130,8 @@ this fluid otherwise.
@deffn {Scheme Procedure} port-encoding port
@deffnx {C Function} scm_port_encoding
-Returns, as a string, the character encoding that @var{port} uses to
-interpret its input and output.
+Returns, as a string, the character encoding that @var{port} uses to interpret
+its input and output. The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
Instead of having the #f special case, I’ve been thinking that it would
be nicer and easier to have ‘port-encoding’ always return a string
(similar for ‘set-port-encoding!’), which would be "ISO-8859-1" instead
of #f. Perhaps it’s a bit late, though.
What do you think?
+Also, open the file using the binary-compatible character encoding
+"ISO-8859-1", ignoring any coding declaration or port encoding.
I dislike this whole notion of “binary-compatibility”. What is meant
here is that for binary I/O, you’d better choose ISO-8859-1 as the
port’s encoding because non-ASCII byte values will go through as is,
right? (I suppose that’s true of any 8-bit encoding.)
OTOH, the (rnrs io ports) API does binary I/O as expected regardless of
the port’s encoding, so that’s what users should choose when doing
binary I/O, I think.
Thus I’d suggest removing any references to “binary-compatibility”.
+When the file is opened, this procedure will scan for a coding
+declaration (@pxref{Character Encoding of Source Files}). If present
+will use that encoding for interpreting the file. Otherwise, the
+port's encoding will be used.
How about adding: “, which defaults to the value of
@code{%default-port-encoding} (@pxref{Ports,
@code{%default-port-encoding}}).”
+the file in binary mode and then set the port encoding explicitly
+using @code{set-port-encoding!}.
Again, there’s no real binary mode.
"The following additional characters can be appended:\n"
"@table @samp\n"
"@item b\n"
- "Open the underlying file in binary mode, if supported by the operating system. "
+ "Open the underlying file in binary mode, if supported by the system.\n"
+ "Also, open the file using the binary-compatible character encoding\n"
+ "\"ISO-8859-1\", ignoring the port's encoding and the coding declaration\n"
+ "at the top of the input file, if any.\n"
OK, except for “binary compatible”.
+;;; binary mode ignores port encoding
+(with-fluids ((%default-port-encoding "UTF-8"))
+ (let* ((filename (test-file))
+ (port (open-file filename "w"))
+ (test-string "一二三")
+ (binary-test-string
+ (apply string
+ (map integer->char
+ (uniform-vector->list (string->utf8 test-string))))))
+ (write-line test-string port)
+ (close-port port)
+ (let* ((in-port (open-file filename "rb"))
+ (line (read-line in-port)))
+ (close-port in-port)
+ (pass-if "file: binary mode ignores port encoding"
+ (string=? line binary-test-string)))))
Rather put all the test under ‘pass-if’ so that errors are attributed to
this test. There should also be a (delete-file filename) as the last
thunk of a ‘dynamic-wind’ (otherwise this could break ‘make distcheck’.)
Other than that the tests look good to me.
Thanks,
Ludo’.
next parent reply other threads:[~2010-06-08 22:29 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <E1OL1A2-00071t-DR@vcs-noshell.in.savannah.gnu.org>
2010-06-08 22:29 ` Ludovic Courtès [this message]
2010-06-10 18:16 ` `mike-port-encodings' branch (bug #29643) Mike Gran
2010-06-16 21:13 ` Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87typdyx8o.fsf@gnu.org \
--to=ludo@gnu.org \
--cc=guile-devel@gnu.org \
--cc=spk121@yahoo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).