* `mike-port-encodings' branch (bug #29643)
[not found] <E1OL1A2-00071t-DR@vcs-noshell.in.savannah.gnu.org>
@ 2010-06-08 22:29 ` Ludovic Courtès
2010-06-10 18:16 ` Mike Gran
0 siblings, 1 reply; 3+ messages in thread
From: Ludovic Courtès @ 2010-06-08 22:29 UTC (permalink / raw)
To: Michael Gran; +Cc: guile-devel
Hello Mike,
Thanks for working on this!
Here’s a review of the ‘mike-port-encodings’ branch.
"Michael Gran" <spk121@yahoo.com> writes:
> commit 26e8fd7459581f09857cccc8ddf1c366d2caea87
> Author: Michael Gran <spk121@yahoo.com>
> Date: Sat Jun 5 13:40:32 2010 -0700
>
> Simplify scm_i_scan_for_encoding
>
> Note especially that the variable 'i' has two different uses in this
> function, and they get confused.
>
> * libguile/read.c (scm_i_scan_for_encoding): cleanup
- while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
- && pos + i - header < bytes_read
- && (isalnum ((int) pos[i]) || strchr ("_-.:/,+=()", pos[i]) != NULL))
+ while (encoding_start + i - header <= SCM_ENCODING_SEARCH_SIZE
+ && encoding_start + i - header < bytes_read
+ && (isalnum ((int) encoding_start[i])
+ || strchr ("_-.:/,+=()", encoding_start[i]) != NULL))
Hmm, “simplification”? :-)
Is this commit really needed? What’s the operational effect?
> commit 44c705d173d52cb485d761a51425b4d8e310a67a
> Author: Michael Gran <spk121@yahoo.com>
> Date: Sat Jun 5 14:26:14 2010 -0700
>
> read-line should use port's encoding, not locale's encoding
>
> * libguile/rdelim.c (scm_read_line): modified, use port's encoding
>
OK. Would it be possible to write a test case, perhaps with a string
port?
+ line = scm_from_stringn (s, slen-1, enc, hndl);
Please leave spaces around binary operators.
> commit 62abf139798af3f35ca3c89200389dc75d51a159
> Author: Michael Gran <spk121@yahoo.com>
> Date: Sat Jun 5 14:19:37 2010 -0700
>
> open-file should handle binary mode and coding declarations
>
> The open-file port should use the binary-friendly ISO-8859-1 encoding when
> a file is opened using mode "b". Also, it should honor a "coding:"
> declaration at the top of a file when reading files where it is present.
>
> * libguile/fports.c (scm_open_file): modified
> * test-suite/tests/ports.test: more tests for open-file
> * doc/ref/api-io.texi (File Ports): more documentation for open-file
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -130,8 +130,8 @@ this fluid otherwise.
@deffn {Scheme Procedure} port-encoding port
@deffnx {C Function} scm_port_encoding
-Returns, as a string, the character encoding that @var{port} uses to
-interpret its input and output.
+Returns, as a string, the character encoding that @var{port} uses to interpret
+its input and output. The value @code{#f} is equivalent to @code{"ISO-8859-1"}.
Instead of having the #f special case, I’ve been thinking that it would
be nicer and easier to have ‘port-encoding’ always return a string
(similar for ‘set-port-encoding!’), which would be "ISO-8859-1" instead
of #f. Perhaps it’s a bit late, though.
What do you think?
+Also, open the file using the binary-compatible character encoding
+"ISO-8859-1", ignoring any coding declaration or port encoding.
I dislike this whole notion of “binary-compatibility”. What is meant
here is that for binary I/O, you’d better choose ISO-8859-1 as the
port’s encoding because non-ASCII byte values will go through as is,
right? (I suppose that’s true of any 8-bit encoding.)
OTOH, the (rnrs io ports) API does binary I/O as expected regardless of
the port’s encoding, so that’s what users should choose when doing
binary I/O, I think.
Thus I’d suggest removing any references to “binary-compatibility”.
+When the file is opened, this procedure will scan for a coding
+declaration (@pxref{Character Encoding of Source Files}). If present
+will use that encoding for interpreting the file. Otherwise, the
+port's encoding will be used.
How about adding: “, which defaults to the value of
@code{%default-port-encoding} (@pxref{Ports,
@code{%default-port-encoding}}).”
+the file in binary mode and then set the port encoding explicitly
+using @code{set-port-encoding!}.
Again, there’s no real binary mode.
"The following additional characters can be appended:\n"
"@table @samp\n"
"@item b\n"
- "Open the underlying file in binary mode, if supported by the operating system. "
+ "Open the underlying file in binary mode, if supported by the system.\n"
+ "Also, open the file using the binary-compatible character encoding\n"
+ "\"ISO-8859-1\", ignoring the port's encoding and the coding declaration\n"
+ "at the top of the input file, if any.\n"
OK, except for “binary compatible”.
+;;; binary mode ignores port encoding
+(with-fluids ((%default-port-encoding "UTF-8"))
+ (let* ((filename (test-file))
+ (port (open-file filename "w"))
+ (test-string "一二三")
+ (binary-test-string
+ (apply string
+ (map integer->char
+ (uniform-vector->list (string->utf8 test-string))))))
+ (write-line test-string port)
+ (close-port port)
+ (let* ((in-port (open-file filename "rb"))
+ (line (read-line in-port)))
+ (close-port in-port)
+ (pass-if "file: binary mode ignores port encoding"
+ (string=? line binary-test-string)))))
Rather put all the test under ‘pass-if’ so that errors are attributed to
this test. There should also be a (delete-file filename) as the last
thunk of a ‘dynamic-wind’ (otherwise this could break ‘make distcheck’.)
Other than that the tests look good to me.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 3+ messages in thread