unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCHES] Keyword args for file openers; coding scan off by default
@ 2013-04-07  6:52 Mark H Weaver
  2013-04-07 13:00 ` Ludovic Courtès
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Mark H Weaver @ 2013-04-07  6:52 UTC (permalink / raw
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

Hello all,

Here's the last major patch set that I very much hope to get into 2.0.8.
The first patch disables the coding declaration scan in 'open-file' by
default.  I feel quite strongly that this is important for robustness
and security reasons, and I'm pleased to report that Ludovic recently
gave his consent to this for 2.0.8 on IRC.

The last patch adds keyword arguments to the commonly-used file opening
procedures to conveniently set the encoding, binary mode, and also a way
to enable the coding declaration scan if desired.

Comments and suggestions solicited.

      Mark



[-- Attachment #2: [PATCH 1/3] Do not scan for coding declarations in open-file --]
[-- Type: text/x-diff, Size: 9166 bytes --]

From bb621dbbc0df691bcad541267a08c86c36d9062b Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Wed, 30 Jan 2013 14:45:28 -0500
Subject: [PATCH 1/3] Do not scan for coding declarations in open-file.

* libguile/fports.c (scm_open_file): Do not scan for coding
  declarations.  Replace 'use_encoding' local variable with
  'binary'.  Update documentation string.

* module/ice-9/psyntax.scm (include): Add the same file-encoding
  logic that's used in compile-file and scm_primitive_load.

* module/ice-9/psyntax-pp.scm: Regenerate.

* doc/ref/api-io.texi (File Ports): Update docs.

* test-suite/tests/ports.test: Change "open-file HONORS file coding
  declarations" test to "open-file IGNORES file coding declaration".

* test-suite/tests/coding.test (scan-coding): Use 'file-encoding' to
  scan for the encoding, since 'open-input-file' no longer does so.
---
 doc/ref/api-io.texi          |   11 ++---------
 libguile/fports.c            |   28 +++++-----------------------
 module/ice-9/psyntax-pp.scm  |   10 ++++++----
 module/ice-9/psyntax.scm     |   13 +++++++++----
 test-suite/tests/coding.test |    4 ++--
 test-suite/tests/ports.test  |    5 ++---
 6 files changed, 26 insertions(+), 45 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 9c3e1fc..e7e1bb2 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -885,8 +885,8 @@ Use binary mode, ensuring that each byte in the file will be read as one
 Scheme character.
 
 To provide this property, the file will be opened with the 8-bit
-character encoding "ISO-8859-1", ignoring any coding declaration or port
-encoding.  @xref{Ports}, for more information on port encodings.
+character encoding "ISO-8859-1", ignoring the default port encoding.
+@xref{Ports}, for more information on port encodings.
 
 Note that while it is possible to read and write binary data as
 characters or strings, it is usually better to treat bytes as octets,
@@ -903,13 +903,6 @@ because of its port encoding ramifications.
 If a file cannot be opened with the access
 requested, @code{open-file} throws an exception.
 
-When the file is opened, this procedure will scan for a coding
-declaration (@pxref{Character Encoding of Source Files}). If a coding
-declaration is found, it will be used to interpret the file.  Otherwise,
-the port's encoding will be used.  To suppress this behavior, open the
-file in binary mode and then set the port encoding explicitly using
-@code{set-port-encoding!}.
-
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the
 current interfaces.
diff --git a/libguile/fports.c b/libguile/fports.c
index f6c3c92..727fe27 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -349,8 +349,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 	    "@item b\n"
 	    "Open the underlying file in binary mode, if supported by the system.\n"
 	    "Also, open the file using the binary-compatible character encoding\n"
-	    "\"ISO-8859-1\", ignoring the port's encoding and the coding declaration\n"
-	    "at the top of the input file, if any.\n"
+	    "\"ISO-8859-1\", ignoring the default port encoding.\n"
 	    "@item +\n"
 	    "Open the port for both input and output.  E.g., @code{r+}: open\n"
 	    "an existing file for both input and output.\n"
@@ -365,11 +364,6 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 	    "Add line-buffering to the port.  The port output buffer will be\n"
 	    "automatically flushed whenever a newline character is written.\n"
 	    "@end table\n"
-	    "When the file is opened, this procedure will scan for a coding\n"
-	    "declaration@pxref{Character Encoding of Source Files}. If present\n"
-	    "will use that encoding for interpreting the file.  Otherwise, the\n"
-	    "port's encoding will be used.\n"
-	    "\n"
 	    "In theory we could create read/write ports which were buffered\n"
 	    "in one direction only.  However this isn't included in the\n"
 	    "current interfaces.  If a file cannot be opened with the access\n"
@@ -377,7 +371,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 #define FUNC_NAME s_scm_open_file
 {
   SCM port;
-  int fdes, flags = 0, use_encoding = 1;
+  int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
@@ -412,7 +406,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 	  flags = (flags & ~(O_RDONLY | O_WRONLY)) | O_RDWR;
 	  break;
 	case 'b':
-	  use_encoding = 0;
+	  binary = 1;
 #if defined (O_BINARY)
 	  flags |= O_BINARY;
 #endif
@@ -451,20 +445,8 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
   port = scm_i_fdes_to_port (fdes, scm_i_mode_bits (mode),
                              fport_canonicalize_filename (filename));
 
-  if (use_encoding)
-    {
-      /* If this file has a coding declaration, use that as the port
-	 encoding.  */
-      if (SCM_INPUT_PORT_P (port))
-	{
-	  char *enc = scm_i_scan_for_encoding (port);
-	  if (enc != NULL)
-	    scm_i_set_port_encoding_x (port, enc);
-	}
-    }
-  else
-    /* If this is a binary file, use the binary-friendly ISO-8859-1
-       encoding.  */
+  if (binary)
+    /* Use the binary-friendly ISO-8859-1 encoding. */
     scm_i_set_port_encoding_x (port, NULL);
 
   scm_dynwind_end ();
diff --git a/module/ice-9/psyntax-pp.scm b/module/ice-9/psyntax-pp.scm
index 8619d78..254f701 100644
--- a/module/ice-9/psyntax-pp.scm
+++ b/module/ice-9/psyntax-pp.scm
@@ -2975,10 +2975,12 @@
            (lambda (fn dir k)
              (let ((p (open-input-file
                         (if (absolute-file-name? fn) fn (in-vicinity dir fn)))))
-               (let f ((x (read p)) (result '()))
-                 (if (eof-object? x)
-                   (begin (close-input-port p) (reverse result))
-                   (f (read p) (cons (datum->syntax k x) result))))))))
+               (let ((enc (file-encoding p)))
+                 (set-port-encoding! p (let ((t enc)) (if t t "UTF-8")))
+                 (let f ((x (read p)) (result '()))
+                   (if (eof-object? x)
+                     (begin (close-input-port p) (reverse result))
+                     (f (read p) (cons (datum->syntax k x) result)))))))))
         (let ((src (syntax-source x)))
           (let ((file (if src (assq-ref src 'filename) #f)))
             (let ((dir (if (string? file) (dirname file) #f)))
diff --git a/module/ice-9/psyntax.scm b/module/ice-9/psyntax.scm
index b359fc1..d63861c 100644
--- a/module/ice-9/psyntax.scm
+++ b/module/ice-9/psyntax.scm
@@ -2945,10 +2945,15 @@
   (lambda (x)
     (define read-file
       (lambda (fn dir k)
-        (let ((p (open-input-file
-                  (if (absolute-file-name? fn)
-                      fn
-                      (in-vicinity dir fn)))))
+        (let* ((p (open-input-file
+                   (if (absolute-file-name? fn)
+                       fn
+                       (in-vicinity dir fn))))
+               (enc (file-encoding p)))
+
+          ;; Choose the input encoding deterministically.
+          (set-port-encoding! p (or enc "UTF-8"))
+
           (let f ((x (read p))
                   (result '()))
             (if (eof-object? x)
diff --git a/test-suite/tests/coding.test b/test-suite/tests/coding.test
index 4152af8..0a15d93 100644
--- a/test-suite/tests/coding.test
+++ b/test-suite/tests/coding.test
@@ -1,6 +1,6 @@
 ;;;; coding.test --- test suite for coding declarations. -*- mode: scheme -*-
 ;;;;
-;;;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;;; Copyright (C) 2011, 2013 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -40,7 +40,7 @@
      ;; relies on the opportunistic filling of the input buffer, which
      ;; doesn't happen after a seek.
      (let* ((port (open-input-file name))
-            (res (port-encoding port)))
+            (res (file-encoding port)))
        (close-port port)
        res))))
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 0dbd3b2..5d3c213 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -270,13 +270,12 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file honors file coding declarations
-(pass-if "file: open-file honors coding declarations"
+;; open-file ignores file coding declaration
+(pass-if "file: open-file ignores coding declarations"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
                       (test-string "€100"))
-                 (set-port-encoding! port "ISO-8859-15")
                  (write-line ";; coding: iso-8859-15" port)
                  (write-line test-string port)
                  (close-port port)
-- 
1.7.10.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: [PATCH 2/3] Remove byte-order mark check from 'scm_i_scan_for_encoding' --]
[-- Type: text/x-diff, Size: 1425 bytes --]

From 2b4fa986ac43b5a8705f2d24e5fc908e07a89b94 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Fri, 5 Apr 2013 17:33:03 -0400
Subject: [PATCH 2/3] Remove byte-order mark check from
 'scm_i_scan_for_encoding'.

* libguile/read.c (scm_i_scan_for_encoding): Remove byte-order mark
  check.
---
 libguile/read.c |   10 ----------
 1 file changed, 10 deletions(-)

diff --git a/libguile/read.c b/libguile/read.c
index 64b55c3..e2e2e4a 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1986,7 +1986,6 @@ scm_i_scan_for_encoding (SCM port)
   char header[SCM_ENCODING_SEARCH_SIZE+1];
   size_t bytes_read, encoding_length, i;
   char *encoding = NULL;
-  int utf8_bom = 0;
   char *pos, *encoding_start;
   int in_comment;
 
@@ -2031,10 +2030,6 @@ scm_i_scan_for_encoding (SCM port)
       scm_seek (port, scm_from_int (0), scm_from_int (SEEK_SET));
     }
 
-  if (bytes_read > 3 
-      && header[0] == '\xef' && header[1] == '\xbb' && header[2] == '\xbf')
-    utf8_bom = 1;
-
   /* search past "coding[:=]" */
   pos = header;
   while (1)
@@ -2103,11 +2098,6 @@ scm_i_scan_for_encoding (SCM port)
     /* This wasn't in a comment */
     return NULL;
 
-  if (utf8_bom && c_strcasecmp(encoding, "UTF-8"))
-    scm_misc_error (NULL,
-		    "the port input declares the encoding ~s but is encoded as UTF-8",
-		    scm_list_1 (scm_from_locale_string (encoding)));
-
   return encoding;
 }
 
-- 
1.7.10.4


[-- Attachment #4: [PATCH 3/3] Add keyword arguments to file opening procedures --]
[-- Type: text/x-diff, Size: 34042 bytes --]

From 951b9d224d84bfec271b51615bc095013d153694 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sat, 6 Apr 2013 23:19:55 -0400
Subject: [PATCH 3/3] Add keyword arguments to file opening procedures.

* libguile/fports.c (scm_open_file_with_encoding): New API function,
  containing the code previously found in 'scm_open_file', but modified
  to accept the new 'guess_encoding' and 'encoding' arguments.

  (scm_open_file): Now just a simple wrapper that calls
  'scm_open_file_with_encoding'.

  (scm_i_open_file): New implementation of 'open-file' that accepts
  keyword arguments '#:guess-encoding' and '#:encoding', and calls
  'scm_open_file_with_encoding'.

  (scm_init_fports_keywords): New initialization function that gets
  called after keywords are initialized.

* libguile/fports.h (scm_open_file_with_encoding,
  scm_init_fports_keywords): Add prototypes.

* libguile/init.c (scm_i_init_guile): Call 'scm_init_fports_keywords'.

* module/ice-9/boot-9.scm: Add enhanced versions of 'open-input-file',
  'open-output-file', 'call-with-input-file', 'call-with-output-file',
  'with-input-from-file', 'with-output-to-file', and
  'with-error-to-file', that accept keyword arguments '#:binary',
  '#:encoding', and (for input port constructors) '#:guess-encoding'.

* doc/ref/api-io.texi (File Ports): Update documentation.

* test-suite/tests/ports.test ("keyword arguments for file openers"):
  Add tests.
---
 doc/ref/api-io.texi         |   60 ++++++---
 libguile/fports.c           |  180 +++++++++++++++++++-------
 libguile/fports.h           |    3 +
 libguile/init.c             |    1 +
 module/ice-9/boot-9.scm     |  110 ++++++++++++++++
 test-suite/tests/ports.test |  291 ++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 579 insertions(+), 66 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index e7e1bb2..da57328 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -843,7 +843,10 @@ Most systems have limits on how many files can be open, so it's
 strongly recommended that file ports be closed explicitly when no
 longer required (@pxref{Ports}).
 
-@deffn {Scheme Procedure} open-file filename mode
+@deffn {Scheme Procedure} open-file filename mode @
+                          [#:guess-encoding=#f] [#:encoding=#f]
+@deffnx {C Function} scm_open_file_with_encoding @
+                     (filename, mode, guess_encoding, encoding)
 @deffnx {C Function} scm_open_file (filename, mode)
 Open the file whose name is @var{filename}, and return a port
 representing that file.  The attributes of the port are
@@ -900,8 +903,17 @@ to the underlying @code{open} call.  Still, the flag is generally useful
 because of its port encoding ramifications.
 @end table
 
-If a file cannot be opened with the access
-requested, @code{open-file} throws an exception.
+Unless binary mode is requested, the character encoding of the new port
+is determined as follows: First, if @var{guess-encoding} is true,
+heuristics will be used to guess the encoding of the file.  If it is
+false or if the heuristics are unsuccessful, @var{encoding} is used
+unless it is also false.  As a last resort, the default port encoding is
+used.  @xref{Ports}, for more information on port encodings.  It is an
+error to pass a non-false @var{guess-encoding} or @var{encoding} if
+binary mode is requested.
+
+If a file cannot be opened with the access requested, @code{open-file}
+throws an exception.
 
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the
@@ -909,23 +921,40 @@ current interfaces.
 @end deffn
 
 @rnindex open-input-file
-@deffn {Scheme Procedure} open-input-file filename
-Open @var{filename} for input.  Equivalent to
+@deffn {Scheme Procedure} open-input-file filename @
+       [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for input.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} and
+@var{guess-encoding} determine the character encoding as described above
+for @code{open-file}.  Equivalent to
 @lisp
-(open-file @var{filename} "r")
+(open-file @var{filename}
+           (if @var{binary} "rb" "r")
+           #:guess-encoding @var{guess-encoding}
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
 @rnindex open-output-file
-@deffn {Scheme Procedure} open-output-file filename
-Open @var{filename} for output.  Equivalent to
+@deffn {Scheme Procedure} open-output-file filename @
+       [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for output.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} specifies the
+character encoding as described above for @code{open-file}.  Equivalent
+to
 @lisp
-(open-file @var{filename} "w")
+(open-file @var{filename}
+           (if @var{binary} "wb" "w")
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
-@deffn {Scheme Procedure} call-with-input-file filename proc
-@deffnx {Scheme Procedure} call-with-output-file filename proc
+@deffn {Scheme Procedure} call-with-input-file filename proc @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} call-with-output-file filename proc @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex call-with-input-file
 @rnindex call-with-output-file
 Open @var{filename} for input or output, and call @code{(@var{proc}
@@ -940,9 +969,12 @@ closed automatically, though it will be garbage collected in the usual
 way if not otherwise referenced.
 @end deffn
 
-@deffn {Scheme Procedure} with-input-from-file filename thunk
-@deffnx {Scheme Procedure} with-output-to-file filename thunk
-@deffnx {Scheme Procedure} with-error-to-file filename thunk
+@deffn {Scheme Procedure} with-input-from-file filename thunk @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} with-output-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} with-error-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex with-input-from-file
 @rnindex with-output-to-file
 Open @var{filename} and call @code{(@var{thunk})} with the new port
diff --git a/libguile/fports.c b/libguile/fports.c
index 727fe27..442b628 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -316,65 +316,36 @@ fport_canonicalize_filename (SCM filename)
     }
 }
 
-
-/* scm_open_file
+/* scm_open_file_with_encoding
  * Return a new port open on a given file.
  *
+ * Use heuristics to guess the encoding is GUESS_ENCODING
+ * is true, else use ENCODING if not false, else use the
+ * default port encoding.
+ *
  * The mode string must match the pattern: [rwa+]** which
  * is interpreted in the usual unix way.
  *
  * Return the new port.
  */
-SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
-	    (SCM filename, SCM mode),
-	    "Open the file whose name is @var{filename}, and return a port\n"
-	    "representing that file.  The attributes of the port are\n"
-	    "determined by the @var{mode} string.  The way in which this is\n"
-	    "interpreted is similar to C stdio.  The first character must be\n"
-	    "one of the following:\n"
-	    "@table @samp\n"
-	    "@item r\n"
-	    "Open an existing file for input.\n"
-	    "@item w\n"
-	    "Open a file for output, creating it if it doesn't already exist\n"
-	    "or removing its contents if it does.\n"
-	    "@item a\n"
-	    "Open a file for output, creating it if it doesn't already\n"
-	    "exist.  All writes to the port will go to the end of the file.\n"
-	    "The \"append mode\" can be turned off while the port is in use\n"
-	    "@pxref{Ports and File Descriptors, fcntl}\n"
-	    "@end table\n"
-	    "The following additional characters can be appended:\n"
-	    "@table @samp\n"
-	    "@item b\n"
-	    "Open the underlying file in binary mode, if supported by the system.\n"
-	    "Also, open the file using the binary-compatible character encoding\n"
-	    "\"ISO-8859-1\", ignoring the default port encoding.\n"
-	    "@item +\n"
-	    "Open the port for both input and output.  E.g., @code{r+}: open\n"
-	    "an existing file for both input and output.\n"
-	    "@item 0\n"
-	    "Create an \"unbuffered\" port.  In this case input and output\n"
-	    "operations are passed directly to the underlying port\n"
-	    "implementation without additional buffering.  This is likely to\n"
-	    "slow down I/O operations.  The buffering mode can be changed\n"
-	    "while a port is in use @pxref{Ports and File Descriptors,\n"
-	    "setvbuf}\n"
-	    "@item l\n"
-	    "Add line-buffering to the port.  The port output buffer will be\n"
-	    "automatically flushed whenever a newline character is written.\n"
-	    "@end table\n"
-	    "In theory we could create read/write ports which were buffered\n"
-	    "in one direction only.  However this isn't included in the\n"
-	    "current interfaces.  If a file cannot be opened with the access\n"
-	    "requested, @code{open-file} throws an exception.")
-#define FUNC_NAME s_scm_open_file
+SCM
+scm_open_file_with_encoding (SCM filename, SCM mode,
+                             SCM guess_encoding, SCM encoding)
+#define FUNC_NAME "open-file"
 {
   SCM port;
   int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
+  if (SCM_UNLIKELY (!scm_is_bool (guess_encoding)))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, guess_encoding,
+                            "guess-encoding to be boolean");
+
+  if (SCM_UNLIKELY (!(scm_is_false (encoding) || scm_is_string (encoding))))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, encoding,
+                            "encoding to be string or false");
+
   scm_dynwind_begin (0);
 
   file = scm_to_locale_string (filename);
@@ -446,8 +417,43 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
                              fport_canonicalize_filename (filename));
 
   if (binary)
-    /* Use the binary-friendly ISO-8859-1 encoding. */
-    scm_i_set_port_encoding_x (port, NULL);
+    {
+      if (scm_is_true (encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Encoding specified on a binary port",
+                        scm_list_1 (encoding));
+      if (scm_is_true (guess_encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Request to guess encoding on a binary port",
+                        SCM_EOL);
+
+      /* Use the binary-friendly ISO-8859-1 encoding. */
+      scm_i_set_port_encoding_x (port, NULL);
+    }
+  else
+    {
+      char *enc = NULL;
+
+      if (scm_is_true (guess_encoding))
+        {
+          if (SCM_INPUT_PORT_P (port))
+            enc = scm_i_scan_for_encoding (port);
+          else
+            scm_misc_error (FUNC_NAME,
+                            "Request to guess encoding on an output-only port",
+                            SCM_EOL);
+        }
+
+      if (!enc && scm_is_true (encoding))
+        {
+          char *buf = scm_to_latin1_string (encoding);
+          enc = scm_gc_strdup (buf, "encoding");
+          free (buf);
+        }
+
+      if (enc)
+        scm_i_set_port_encoding_x (port, enc);
+    }
 
   scm_dynwind_end ();
 
@@ -455,6 +461,75 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+SCM
+scm_open_file (SCM filename, SCM mode)
+{
+  return scm_open_file_with_encoding (filename, mode, SCM_BOOL_F, SCM_BOOL_F);
+}
+
+/* We can't define these using SCM_KEYWORD, because keywords have not
+   yet been initialized when scm_init_fports is called.  */
+SCM k_guess_encoding = SCM_UNDEFINED;
+SCM k_encoding       = SCM_UNDEFINED;
+
+SCM_DEFINE (scm_i_open_file, "open-file", 2, 0, 1,
+	    (SCM filename, SCM mode, SCM keyword_args),
+	    "Open the file whose name is @var{filename}, and return a port\n"
+	    "representing that file.  The attributes of the port are\n"
+	    "determined by the @var{mode} string.  The way in which this is\n"
+	    "interpreted is similar to C stdio.  The first character must be\n"
+	    "one of the following:\n"
+	    "@table @samp\n"
+	    "@item r\n"
+	    "Open an existing file for input.\n"
+	    "@item w\n"
+	    "Open a file for output, creating it if it doesn't already exist\n"
+	    "or removing its contents if it does.\n"
+	    "@item a\n"
+	    "Open a file for output, creating it if it doesn't already\n"
+	    "exist.  All writes to the port will go to the end of the file.\n"
+	    "The \"append mode\" can be turned off while the port is in use\n"
+	    "@pxref{Ports and File Descriptors, fcntl}\n"
+	    "@end table\n"
+	    "The following additional characters can be appended:\n"
+	    "@table @samp\n"
+	    "@item b\n"
+	    "Open the underlying file in binary mode, if supported by the system.\n"
+	    "Also, open the file using the binary-compatible character encoding\n"
+	    "\"ISO-8859-1\", ignoring the default port encoding.\n"
+	    "@item +\n"
+	    "Open the port for both input and output.  E.g., @code{r+}: open\n"
+	    "an existing file for both input and output.\n"
+	    "@item 0\n"
+	    "Create an \"unbuffered\" port.  In this case input and output\n"
+	    "operations are passed directly to the underlying port\n"
+	    "implementation without additional buffering.  This is likely to\n"
+	    "slow down I/O operations.  The buffering mode can be changed\n"
+	    "while a port is in use @pxref{Ports and File Descriptors,\n"
+	    "setvbuf}\n"
+	    "@item l\n"
+	    "Add line-buffering to the port.  The port output buffer will be\n"
+	    "automatically flushed whenever a newline character is written.\n"
+	    "@end table\n"
+	    "In theory we could create read/write ports which were buffered\n"
+	    "in one direction only.  However this isn't included in the\n"
+	    "current interfaces.  If a file cannot be opened with the access\n"
+	    "requested, @code{open-file} throws an exception.")
+#define FUNC_NAME s_scm_i_open_file
+{
+  SCM encoding = SCM_BOOL_F;
+  SCM guess_encoding = SCM_BOOL_F;
+
+  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
+                                k_guess_encoding, &guess_encoding,
+                                k_encoding, &encoding,
+                                SCM_UNDEFINED);
+
+  return scm_open_file_with_encoding (filename, mode,
+                                      guess_encoding, encoding);
+}
+#undef FUNC_NAME
+
 \f
 /* Building Guile ports from a file descriptor.  */
 
@@ -805,6 +880,15 @@ scm_make_fptob ()
   return tc;
 }
 
+/* We can't initialize the keywords from 'scm_init_fports', because
+   keywords haven't yet been initialized at that point.  */
+void
+scm_init_fports_keywords ()
+{
+  k_guess_encoding = scm_from_latin1_keyword ("guess-encoding");
+  k_encoding       = scm_from_latin1_keyword ("encoding");
+}
+
 void
 scm_init_fports ()
 {
diff --git a/libguile/fports.h b/libguile/fports.h
index cbef0f8..c32ed95 100644
--- a/libguile/fports.h
+++ b/libguile/fports.h
@@ -51,9 +51,12 @@ SCM_API scm_t_bits scm_tc16_fport;
 SCM_API SCM scm_setbuf0 (SCM port);
 SCM_API SCM scm_setvbuf (SCM port, SCM mode, SCM size);
 SCM_API void scm_evict_ports (int fd);
+SCM_API SCM scm_open_file_with_encoding (SCM filename, SCM modes,
+                                         SCM guess_encoding, SCM encoding);
 SCM_API SCM scm_open_file (SCM filename, SCM modes);
 SCM_API SCM scm_fdes_to_port (int fdes, char *mode, SCM name);
 SCM_API SCM scm_file_port_p (SCM obj);
+SCM_INTERNAL void scm_init_fports_keywords (void);
 SCM_INTERNAL void scm_init_fports (void);
 
 /* internal functions */
diff --git a/libguile/init.c b/libguile/init.c
index 57e4902..455a772 100644
--- a/libguile/init.c
+++ b/libguile/init.c
@@ -444,6 +444,7 @@ scm_i_init_guile (void *base)
   scm_init_gettext ();
   scm_init_ioext ();
   scm_init_keywords ();    /* Requires smob_prehistory */
+  scm_init_fports_keywords ();
   scm_init_list ();
   scm_init_random ();      /* Requires smob_prehistory */
   scm_init_macros ();      /* Requires smob_prehistory and random */
diff --git a/module/ice-9/boot-9.scm b/module/ice-9/boot-9.scm
index 8461ee8..4fdfe1e 100644
--- a/module/ice-9/boot-9.scm
+++ b/module/ice-9/boot-9.scm
@@ -753,6 +753,116 @@ information is unavailable."
 \f
 
 ;;;
+;;; Enhanced file opening procedures
+;;;
+
+(define* (open-input-file
+          str #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "Takes a string naming an existing file and returns an input port
+capable of delivering characters from the file.  If the file
+cannot be opened, an error is signalled."
+  (open-file str (if binary "rb" "r")
+             #:encoding encoding
+             #:guess-encoding guess-encoding))
+
+(define* (open-output-file str #:key (binary #f) (encoding #f))
+  "Takes a string naming an output file to be created and returns an
+output port capable of writing characters to a new file by that
+name.  If the file cannot be opened, an error is signalled.  If a
+file with the given name already exists, the effect is unspecified."
+  (open-file str (if binary "wb" "w")
+             #:encoding encoding))
+
+(define* (call-with-input-file
+          str proc #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "PROC should be a procedure of one argument, and STR should be a
+string naming a file.  The file must
+already exist. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-input-file str
+                            #:binary binary
+                            #:encoding encoding
+                            #:guess-encoding guess-encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-input-port p)
+        (apply values vals)))))
+
+(define* (call-with-output-file str proc #:key (binary #f) (encoding #f))
+  "PROC should be a procedure of one argument, and STR should be a
+string naming a file.  The behaviour is unspecified if the file
+already exists. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-output-file str #:binary binary #:encoding encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-output-port p)
+        (apply values vals)))))
+
+(define* (with-input-from-file
+          file thunk #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The file must already exist. The file is opened for
+input, an input port connected to it is made
+the default value returned by `current-input-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-input-file file
+   (lambda (p) (with-input-from-port p thunk))
+   #:binary binary
+   #:encoding encoding
+   #:guess-encoding guess-encoding))
+
+(define* (with-output-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-output-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-output-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+(define* (with-error-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-error-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-error-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+\f
+
+;;;
 ;;; Extensible exception printing.
 ;;;
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 5d3c213..313cd36 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -24,8 +24,10 @@
   #:use-module (ice-9 popen)
   #:use-module (ice-9 rdelim)
   #:use-module (rnrs bytevectors)
-  #:use-module ((rnrs io ports) #:select (open-bytevector-input-port
-                                          open-bytevector-output-port)))
+  #:use-module ((ice-9 binary-ports) #:select (open-bytevector-input-port
+                                               open-bytevector-output-port
+                                               put-bytevector
+                                               get-bytevector-all)))
 
 (define (display-line . args)
   (for-each display args)
@@ -270,8 +272,8 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file ignores file coding declaration
-(pass-if "file: open-file ignores coding declarations"
+;; open-file ignores file coding declaration by default
+(pass-if "file: open-file ignores coding declaration by default"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
@@ -286,6 +288,287 @@
                    (delete-file filename)
                    (string=? line2 test-string)))))
 
+;; open-input-file with guess-encoding honors coding declaration
+(pass-if "file: open-input-file with guess-encoding honors coding declaration"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+               (let* ((filename (test-file))
+                      (port (open-output-file filename))
+                      (test-string "€100"))
+                 (set-port-encoding! port "iso-8859-15")
+                 (write-line ";; coding: iso-8859-15" port)
+                 (write-line test-string port)
+                 (close-port port)
+                 (let* ((in-port (open-input-file filename
+                                                  #:guess-encoding #t))
+                        (line1 (read-line in-port))
+                        (line2 (read-line in-port)))
+                   (close-port in-port)
+                   (delete-file filename)
+                   (string=? line2 test-string)))))
+
+(with-test-prefix "keyword arguments for file openers"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    (let ((filename (test-file)))
+
+      (with-test-prefix "write #:encoding"
+
+        (pass-if-equal "open-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-file filename "w"
+                                   #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "open-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-output-file filename
+                                          #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "call-with-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (call-with-output-file filename
+              (lambda (port)
+                (display "test" port))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-output-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-output-to-file filename
+              (lambda ()
+                (display "test"))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-error-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-error-to-file
+             filename
+             (lambda ()
+               (display "test" (current-error-port)))
+             #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv)))
+
+      (with-test-prefix "write #:binary"
+
+        (pass-if-equal "open-output-file"
+            "ISO-8859-1"
+          (let* ((port (open-output-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-output-file"
+            "ISO-8859-1"
+          (call-with-output-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-output-to-file"
+            "ISO-8859-1"
+          (with-output-to-file filename
+            (lambda () (port-encoding (current-output-port)))
+            #:binary #t))
+
+        (pass-if-equal "with-error-to-file"
+            "ISO-8859-1"
+          (with-error-to-file
+           filename
+           (lambda () (port-encoding (current-error-port)))
+           #:binary #t)))
+
+      (with-test-prefix "read #:encoding"
+
+        (pass-if-equal "open-file read #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-file filename "r" #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-input-file filename #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (call-with-input-file filename
+            read-string
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (with-input-from-file filename
+            read-string
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "read #:binary"
+
+        (pass-if-equal "open-input-file"
+            "ISO-8859-1"
+          (let* ((port (open-input-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-input-file"
+            "ISO-8859-1"
+          (call-with-input-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-input-from-file"
+            "ISO-8859-1"
+          (with-input-from-file filename
+            (lambda () (port-encoding (current-input-port)))
+            #:binary #t)))
+
+      (with-test-prefix "#:guess-encoding with coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            (lambda (port)
+              (read-line port)
+              (read-line port)
+              (read-line port))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            (lambda ()
+              (read-line)
+              (read-line)
+              (read-line))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "#:guess-encoding without coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15")))
+
+      (delete-file filename))))
+
 ;;; ungetting characters and strings.
 (with-input-from-string "walk on the moon\nmoon"
 			(lambda ()
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCHES] Keyword args for file openers; coding scan off by default
  2013-04-07  6:52 [PATCHES] Keyword args for file openers; coding scan off by default Mark H Weaver
@ 2013-04-07 13:00 ` Ludovic Courtès
  2013-04-07 13:09 ` Ludovic Courtès
  2013-04-07 13:24 ` Ludovic Courtès
  2 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2013-04-07 13:00 UTC (permalink / raw
  To: guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> From bb621dbbc0df691bcad541267a08c86c36d9062b Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <mhw@netris.org>
> Date: Wed, 30 Jan 2013 14:45:28 -0500
> Subject: [PATCH 1/3] Do not scan for coding declarations in open-file.
>
> * libguile/fports.c (scm_open_file): Do not scan for coding
>   declarations.  Replace 'use_encoding' local variable with
>   'binary'.  Update documentation string.
>
> * module/ice-9/psyntax.scm (include): Add the same file-encoding
>   logic that's used in compile-file and scm_primitive_load.
>
> * module/ice-9/psyntax-pp.scm: Regenerate.
>
> * doc/ref/api-io.texi (File Ports): Update docs.
>
> * test-suite/tests/ports.test: Change "open-file HONORS file coding
>   declarations" test to "open-file IGNORES file coding declaration".
>
> * test-suite/tests/coding.test (scan-coding): Use 'file-encoding' to
>   scan for the encoding, since 'open-input-file' no longer does so.

Perfect!

> @@ -903,13 +903,6 @@ because of its port encoding ramifications.
>  If a file cannot be opened with the access
>  requested, @code{open-file} throws an exception.
>  
> -When the file is opened, this procedure will scan for a coding
> -declaration (@pxref{Character Encoding of Source Files}). If a coding
> -declaration is found, it will be used to interpret the file.  Otherwise,
> -the port's encoding will be used.  To suppress this behavior, open the
> -file in binary mode and then set the port encoding explicitly using
> -@code{set-port-encoding!}.

Can we change that to something like:

  When the file is opened, its encoding is inherited set to the current
  @code{%default-port-encoding}, unless the @code{b} flag was supplied.
  Sometimes it is desirable to honor Emacs-style coding declarations in
  files@footnote{Guile 2.0.0 to 2.0.7 would do this by default.  This
  behavior was deemed inappropriate and disabled starting from Guile
  2.0.8.}.  When that is the case, the @code{file-encoding} declaration
  can be used as follows (@pxref{Character Encoding of Source Files,
  @code{file-encoding}}):

  @example
  (let* ((port     (open-input-file file))
         (encoding (file-encoding port)))
    (set-port-encoding! port (or encoding (port-encoding port))))
  @end example

(I let you choose whether to also add it to the docstring or not.)

Ludo’.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCHES] Keyword args for file openers; coding scan off by default
  2013-04-07  6:52 [PATCHES] Keyword args for file openers; coding scan off by default Mark H Weaver
  2013-04-07 13:00 ` Ludovic Courtès
@ 2013-04-07 13:09 ` Ludovic Courtès
  2013-04-07 13:24 ` Ludovic Courtès
  2 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2013-04-07 13:09 UTC (permalink / raw
  To: guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> From 2b4fa986ac43b5a8705f2d24e5fc908e07a89b94 Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <mhw@netris.org>
> Date: Fri, 5 Apr 2013 17:33:03 -0400
> Subject: [PATCH 2/3] Remove byte-order mark check from
>  'scm_i_scan_for_encoding'.
>
> * libguile/read.c (scm_i_scan_for_encoding): Remove byte-order mark
>   check.

OK!

Ludo'.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCHES] Keyword args for file openers; coding scan off by default
  2013-04-07  6:52 [PATCHES] Keyword args for file openers; coding scan off by default Mark H Weaver
  2013-04-07 13:00 ` Ludovic Courtès
  2013-04-07 13:09 ` Ludovic Courtès
@ 2013-04-07 13:24 ` Ludovic Courtès
  2013-04-07 16:33   ` Mark H Weaver
  2 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2013-04-07 13:24 UTC (permalink / raw
  To: guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> From 951b9d224d84bfec271b51615bc095013d153694 Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <mhw@netris.org>
> Date: Sat, 6 Apr 2013 23:19:55 -0400
> Subject: [PATCH 3/3] Add keyword arguments to file opening procedures.
>
> * libguile/fports.c (scm_open_file_with_encoding): New API function,
>   containing the code previously found in 'scm_open_file', but modified
>   to accept the new 'guess_encoding' and 'encoding' arguments.
>
>   (scm_open_file): Now just a simple wrapper that calls
>   'scm_open_file_with_encoding'.
>
>   (scm_i_open_file): New implementation of 'open-file' that accepts
>   keyword arguments '#:guess-encoding' and '#:encoding', and calls
>   'scm_open_file_with_encoding'.
>
>   (scm_init_fports_keywords): New initialization function that gets
>   called after keywords are initialized.
>
> * libguile/fports.h (scm_open_file_with_encoding,
>   scm_init_fports_keywords): Add prototypes.
>
> * libguile/init.c (scm_i_init_guile): Call 'scm_init_fports_keywords'.
>
> * module/ice-9/boot-9.scm: Add enhanced versions of 'open-input-file',
>   'open-output-file', 'call-with-input-file', 'call-with-output-file',
>   'with-input-from-file', 'with-output-to-file', and
>   'with-error-to-file', that accept keyword arguments '#:binary',
>   '#:encoding', and (for input port constructors) '#:guess-encoding'.
>
> * doc/ref/api-io.texi (File Ports): Update documentation.
>
> * test-suite/tests/ports.test ("keyword arguments for file openers"):
>   Add tests.

Looks good.

Minor comments:

> +@deffn {Scheme Procedure} open-file filename mode @
> +                          [#:guess-encoding=#f] [#:encoding=#f]
> +@deffnx {C Function} scm_open_file_with_encoding @
> +                     (filename, mode, guess_encoding, encoding)
>  @deffnx {C Function} scm_open_file (filename, mode)
>  Open the file whose name is @var{filename}, and return a port
>  representing that file.  The attributes of the port are
> @@ -900,8 +903,17 @@ to the underlying @code{open} call.  Still, the flag is generally useful
>  because of its port encoding ramifications.
>  @end table
>  
> -If a file cannot be opened with the access
> -requested, @code{open-file} throws an exception.
> +Unless binary mode is requested, the character encoding of the new port
> +is determined as follows: First, if @var{guess-encoding} is true,
> +heuristics will be used to guess the encoding of the file.  If it is

“heuristics” is vague.  I’d prefer “the ‘file-encoding’ procedure is
called to check for Emacs-style coding declarations (@pxref{Character
Encoding of Source Files})”.  Should BOMs also be mentioned?

> +/* scm_open_file_with_encoding
>   * Return a new port open on a given file.
>   *
> + * Use heuristics to guess the encoding is GUESS_ENCODING
> + * is true, else use ENCODING if not false, else use the
> + * default port encoding.

Likewise.

And you’re welcome to remove the leading stars also.  :-)

> +SCM k_guess_encoding = SCM_UNDEFINED;
> +SCM k_encoding       = SCM_UNDEFINED;

Add ‘static’.

> +  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
> +                                k_guess_encoding, &guess_encoding,
> +                                k_encoding, &encoding,
> +                                SCM_UNDEFINED);

Comes in handy.  ;-)

> +(define* (open-input-file
> +          str #:key (binary #f) (encoding #f) (guess-encoding #f))
> +  "Takes a string naming an existing file and returns an input port
> +capable of delivering characters from the file.  If the file
> +cannot be opened, an error is signalled."

It’s a detail, for these procedures, I would s/str/file/, and in
docstrings s/file STR/FILE/.

The test suite looks comprehensive, that’s great.

Thanks!

Ludo’.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCHES] Keyword args for file openers; coding scan off by default
  2013-04-07 13:24 ` Ludovic Courtès
@ 2013-04-07 16:33   ` Mark H Weaver
  2013-04-07 19:18     ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Mark H Weaver @ 2013-04-07 16:33 UTC (permalink / raw
  To: Ludovic Courtès; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3125 bytes --]

Hi Ludovic,

Thanks for the quick reviews.  I pushed the first two patches, after
incorporating your suggestions.

ludo@gnu.org (Ludovic Courtès) writes:

> Looks good.
>
> Minor comments:
>
>> +@deffn {Scheme Procedure} open-file filename mode @
>> +                          [#:guess-encoding=#f] [#:encoding=#f]
>> +@deffnx {C Function} scm_open_file_with_encoding @
>> +                     (filename, mode, guess_encoding, encoding)
>>  @deffnx {C Function} scm_open_file (filename, mode)
>>  Open the file whose name is @var{filename}, and return a port
>>  representing that file.  The attributes of the port are
>> @@ -900,8 +903,17 @@ to the underlying @code{open} call.  Still, the flag is generally useful
>>  because of its port encoding ramifications.
>>  @end table
>>  
>> -If a file cannot be opened with the access
>> -requested, @code{open-file} throws an exception.
>> +Unless binary mode is requested, the character encoding of the new port
>> +is determined as follows: First, if @var{guess-encoding} is true,
>> +heuristics will be used to guess the encoding of the file.  If it is
>
> “heuristics” is vague.  I’d prefer “the ‘file-encoding’ procedure is
> called to check for Emacs-style coding declarations (@pxref{Character
> Encoding of Source Files})”.  Should BOMs also be mentioned?

Makes sense.  Following our discussion on IRC, the patch below has the
following wording:

  Unless binary mode is requested, the character encoding of the new
  port is determined as follows: First, if @var{guess-encoding} is true,
  the @code{file-encoding} procedure is used to guess the encoding of
  the file (@pxref{Character Encoding of Source Files}).  [...]

I left out the "Emacs-style coding declarations" language here, because
I want to leave open the possibility of adding additional heuristics to
'file-encoding' in the future.

Furthermore, I've attached an additional patch below that changes the
"Character Encoding of Source Files" node to make that more clear.
Please let me know what you think.

>> +SCM k_guess_encoding = SCM_UNDEFINED;
>> +SCM k_encoding       = SCM_UNDEFINED;
>
> Add ‘static’.

Gah, thanks for catching that.  Oops!

>> +  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
>> +                                k_guess_encoding, &guess_encoding,
>> +                                k_encoding, &encoding,
>> +                                SCM_UNDEFINED);
>
> Comes in handy.  ;-)

This patch was the motivation for adding that :)

>> +(define* (open-input-file
>> +          str #:key (binary #f) (encoding #f) (guess-encoding #f))
>> +  "Takes a string naming an existing file and returns an input port
>> +capable of delivering characters from the file.  If the file
>> +cannot be opened, an error is signalled."
>
> It’s a detail, for these procedures, I would s/str/file/, and in
> docstrings s/file STR/FILE/.

Good idea.

An updated patch, and the aforementioned new patch for the
'file-encoding' docs, follow.  More thoughts?

    Thanks!
      Mark



[-- Attachment #2: [PATCH 1/2] Add keyword arguments to file opening procedures --]
[-- Type: text/x-diff, Size: 33862 bytes --]

From d7a915ba0fec4cc7d007a5582ed31b0135ff8123 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sat, 6 Apr 2013 23:19:55 -0400
Subject: [PATCH 1/2] Add keyword arguments to file opening procedures.

* libguile/fports.c (scm_open_file_with_encoding): New API function,
  containing the code previously found in 'scm_open_file', but modified
  to accept the new 'guess_encoding' and 'encoding' arguments.

  (scm_open_file): Now just a simple wrapper that calls
  'scm_open_file_with_encoding'.

  (scm_i_open_file): New implementation of 'open-file' that accepts
  keyword arguments '#:guess-encoding' and '#:encoding', and calls
  'scm_open_file_with_encoding'.

  (scm_init_fports_keywords): New initialization function that gets
  called after keywords are initialized.

* libguile/fports.h (scm_open_file_with_encoding,
  scm_init_fports_keywords): Add prototypes.

* libguile/init.c (scm_i_init_guile): Call 'scm_init_fports_keywords'.

* module/ice-9/boot-9.scm: Add enhanced versions of 'open-input-file',
  'open-output-file', 'call-with-input-file', 'call-with-output-file',
  'with-input-from-file', 'with-output-to-file', and
  'with-error-to-file', that accept keyword arguments '#:binary',
  '#:encoding', and (for input port constructors) '#:guess-encoding'.

* doc/ref/api-io.texi (File Ports): Update documentation.

* test-suite/tests/ports.test ("keyword arguments for file openers"):
  Add tests.
---
 doc/ref/api-io.texi         |   61 ++++++---
 libguile/fports.c           |  191 +++++++++++++++++++++--------
 libguile/fports.h           |    3 +
 libguile/init.c             |    1 +
 module/ice-9/boot-9.scm     |  110 +++++++++++++++++
 test-suite/tests/ports.test |  285 ++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 581 insertions(+), 70 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 9483166..4c42de8 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -843,7 +843,10 @@ Most systems have limits on how many files can be open, so it's
 strongly recommended that file ports be closed explicitly when no
 longer required (@pxref{Ports}).
 
-@deffn {Scheme Procedure} open-file filename mode
+@deffn {Scheme Procedure} open-file filename mode @
+                          [#:guess-encoding=#f] [#:encoding=#f]
+@deffnx {C Function} scm_open_file_with_encoding @
+                     (filename, mode, guess_encoding, encoding)
 @deffnx {C Function} scm_open_file (filename, mode)
 Open the file whose name is @var{filename}, and return a port
 representing that file.  The attributes of the port are
@@ -900,8 +903,18 @@ to the underlying @code{open} call.  Still, the flag is generally useful
 because of its port encoding ramifications.
 @end table
 
-If a file cannot be opened with the access
-requested, @code{open-file} throws an exception.
+Unless binary mode is requested, the character encoding of the new port
+is determined as follows: First, if @var{guess-encoding} is true, the
+@code{file-encoding} procedure is used to guess the encoding of the file
+(@pxref{Character Encoding of Source Files}).  If @var{guess-encoding}
+is false or if @code{file-encoding} fails, @var{encoding} is used unless
+it is also false.  As a last resort, the default port encoding is used.
+@xref{Ports}, for more information on port encodings.  It is an error to
+pass a non-false @var{guess-encoding} or @var{encoding} if binary mode
+is requested.
+
+If a file cannot be opened with the access requested, @code{open-file}
+throws an exception.
 
 When the file is opened, its encoding is set to the current
 @code{%default-port-encoding}, unless the @code{b} flag was supplied.
@@ -924,23 +937,40 @@ current interfaces.
 @end deffn
 
 @rnindex open-input-file
-@deffn {Scheme Procedure} open-input-file filename
-Open @var{filename} for input.  Equivalent to
+@deffn {Scheme Procedure} open-input-file filename @
+       [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for input.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} and
+@var{guess-encoding} determine the character encoding as described above
+for @code{open-file}.  Equivalent to
 @lisp
-(open-file @var{filename} "r")
+(open-file @var{filename}
+           (if @var{binary} "rb" "r")
+           #:guess-encoding @var{guess-encoding}
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
 @rnindex open-output-file
-@deffn {Scheme Procedure} open-output-file filename
-Open @var{filename} for output.  Equivalent to
+@deffn {Scheme Procedure} open-output-file filename @
+       [#:encoding=#f] [#:binary=#f]
+
+Open @var{filename} for output.  If @var{binary} is true, open the port
+in binary mode, otherwise use text mode.  @var{encoding} specifies the
+character encoding as described above for @code{open-file}.  Equivalent
+to
 @lisp
-(open-file @var{filename} "w")
+(open-file @var{filename}
+           (if @var{binary} "wb" "w")
+           #:encoding @var{encoding})
 @end lisp
 @end deffn
 
-@deffn {Scheme Procedure} call-with-input-file filename proc
-@deffnx {Scheme Procedure} call-with-output-file filename proc
+@deffn {Scheme Procedure} call-with-input-file filename proc @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} call-with-output-file filename proc @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex call-with-input-file
 @rnindex call-with-output-file
 Open @var{filename} for input or output, and call @code{(@var{proc}
@@ -955,9 +985,12 @@ closed automatically, though it will be garbage collected in the usual
 way if not otherwise referenced.
 @end deffn
 
-@deffn {Scheme Procedure} with-input-from-file filename thunk
-@deffnx {Scheme Procedure} with-output-to-file filename thunk
-@deffnx {Scheme Procedure} with-error-to-file filename thunk
+@deffn {Scheme Procedure} with-input-from-file filename thunk @
+        [#:guess-encoding=#f] [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} with-output-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
+@deffnx {Scheme Procedure} with-error-to-file filename thunk @
+        [#:encoding=#f] [#:binary=#f]
 @rnindex with-input-from-file
 @rnindex with-output-to-file
 Open @var{filename} and call @code{(@var{thunk})} with the new port
diff --git a/libguile/fports.c b/libguile/fports.c
index b9a9942..70732e5 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -315,65 +315,35 @@ fport_canonicalize_filename (SCM filename)
     }
 }
 
+/* scm_open_file_with_encoding
+   Return a new port open on a given file.
 
-/* scm_open_file
- * Return a new port open on a given file.
- *
- * The mode string must match the pattern: [rwa+]** which
- * is interpreted in the usual unix way.
- *
- * Return the new port.
- */
-SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
-	    (SCM filename, SCM mode),
-	    "Open the file whose name is @var{filename}, and return a port\n"
-	    "representing that file.  The attributes of the port are\n"
-	    "determined by the @var{mode} string.  The way in which this is\n"
-	    "interpreted is similar to C stdio.  The first character must be\n"
-	    "one of the following:\n"
-	    "@table @samp\n"
-	    "@item r\n"
-	    "Open an existing file for input.\n"
-	    "@item w\n"
-	    "Open a file for output, creating it if it doesn't already exist\n"
-	    "or removing its contents if it does.\n"
-	    "@item a\n"
-	    "Open a file for output, creating it if it doesn't already\n"
-	    "exist.  All writes to the port will go to the end of the file.\n"
-	    "The \"append mode\" can be turned off while the port is in use\n"
-	    "@pxref{Ports and File Descriptors, fcntl}\n"
-	    "@end table\n"
-	    "The following additional characters can be appended:\n"
-	    "@table @samp\n"
-	    "@item b\n"
-	    "Open the underlying file in binary mode, if supported by the system.\n"
-	    "Also, open the file using the binary-compatible character encoding\n"
-	    "\"ISO-8859-1\", ignoring the default port encoding.\n"
-	    "@item +\n"
-	    "Open the port for both input and output.  E.g., @code{r+}: open\n"
-	    "an existing file for both input and output.\n"
-	    "@item 0\n"
-	    "Create an \"unbuffered\" port.  In this case input and output\n"
-	    "operations are passed directly to the underlying port\n"
-	    "implementation without additional buffering.  This is likely to\n"
-	    "slow down I/O operations.  The buffering mode can be changed\n"
-	    "while a port is in use @pxref{Ports and File Descriptors,\n"
-	    "setvbuf}\n"
-	    "@item l\n"
-	    "Add line-buffering to the port.  The port output buffer will be\n"
-	    "automatically flushed whenever a newline character is written.\n"
-	    "@end table\n"
-	    "In theory we could create read/write ports which were buffered\n"
-	    "in one direction only.  However this isn't included in the\n"
-	    "current interfaces.  If a file cannot be opened with the access\n"
-	    "requested, @code{open-file} throws an exception.")
-#define FUNC_NAME s_scm_open_file
+   The mode string must match the pattern: [rwa+]** which
+   is interpreted in the usual unix way.
+
+   Unless binary mode is requested, the character encoding of the new
+   port is determined as follows: First, if GUESS_ENCODING is true,
+   'file-encoding' is used to guess the encoding of the file.  If
+   GUESS_ENCODING is false or if 'file-encoding' fails, ENCODING is used
+   unless it is also false.  As a last resort, the default port encoding
+   is used.  It is an error to pass a non-false GUESS_ENCODING or
+   ENCODING if binary mode is requested.
+
+   Return the new port. */
+SCM
+scm_open_file_with_encoding (SCM filename, SCM mode,
+                             SCM guess_encoding, SCM encoding)
+#define FUNC_NAME "open-file"
 {
   SCM port;
   int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
+  if (SCM_UNLIKELY (!(scm_is_false (encoding) || scm_is_string (encoding))))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, encoding,
+                            "encoding to be string or false");
+
   scm_dynwind_begin (0);
 
   file = scm_to_locale_string (filename);
@@ -445,8 +415,43 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
                              fport_canonicalize_filename (filename));
 
   if (binary)
-    /* Use the binary-friendly ISO-8859-1 encoding. */
-    scm_i_set_port_encoding_x (port, NULL);
+    {
+      if (scm_is_true (encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Encoding specified on a binary port",
+                        scm_list_1 (encoding));
+      if (scm_is_true (guess_encoding))
+        scm_misc_error (FUNC_NAME,
+                        "Request to guess encoding on a binary port",
+                        SCM_EOL);
+
+      /* Use the binary-friendly ISO-8859-1 encoding. */
+      scm_i_set_port_encoding_x (port, NULL);
+    }
+  else
+    {
+      char *enc = NULL;
+
+      if (scm_is_true (guess_encoding))
+        {
+          if (SCM_INPUT_PORT_P (port))
+            enc = scm_i_scan_for_encoding (port);
+          else
+            scm_misc_error (FUNC_NAME,
+                            "Request to guess encoding on an output-only port",
+                            SCM_EOL);
+        }
+
+      if (!enc && scm_is_true (encoding))
+        {
+          char *buf = scm_to_latin1_string (encoding);
+          enc = scm_gc_strdup (buf, "encoding");
+          free (buf);
+        }
+
+      if (enc)
+        scm_i_set_port_encoding_x (port, enc);
+    }
 
   scm_dynwind_end ();
 
@@ -454,6 +459,75 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+SCM
+scm_open_file (SCM filename, SCM mode)
+{
+  return scm_open_file_with_encoding (filename, mode, SCM_BOOL_F, SCM_BOOL_F);
+}
+
+/* We can't define these using SCM_KEYWORD, because keywords have not
+   yet been initialized when scm_init_fports is called.  */
+static SCM k_guess_encoding = SCM_UNDEFINED;
+static SCM k_encoding       = SCM_UNDEFINED;
+
+SCM_DEFINE (scm_i_open_file, "open-file", 2, 0, 1,
+	    (SCM filename, SCM mode, SCM keyword_args),
+	    "Open the file whose name is @var{filename}, and return a port\n"
+	    "representing that file.  The attributes of the port are\n"
+	    "determined by the @var{mode} string.  The way in which this is\n"
+	    "interpreted is similar to C stdio.  The first character must be\n"
+	    "one of the following:\n"
+	    "@table @samp\n"
+	    "@item r\n"
+	    "Open an existing file for input.\n"
+	    "@item w\n"
+	    "Open a file for output, creating it if it doesn't already exist\n"
+	    "or removing its contents if it does.\n"
+	    "@item a\n"
+	    "Open a file for output, creating it if it doesn't already\n"
+	    "exist.  All writes to the port will go to the end of the file.\n"
+	    "The \"append mode\" can be turned off while the port is in use\n"
+	    "@pxref{Ports and File Descriptors, fcntl}\n"
+	    "@end table\n"
+	    "The following additional characters can be appended:\n"
+	    "@table @samp\n"
+	    "@item b\n"
+	    "Open the underlying file in binary mode, if supported by the system.\n"
+	    "Also, open the file using the binary-compatible character encoding\n"
+	    "\"ISO-8859-1\", ignoring the default port encoding.\n"
+	    "@item +\n"
+	    "Open the port for both input and output.  E.g., @code{r+}: open\n"
+	    "an existing file for both input and output.\n"
+	    "@item 0\n"
+	    "Create an \"unbuffered\" port.  In this case input and output\n"
+	    "operations are passed directly to the underlying port\n"
+	    "implementation without additional buffering.  This is likely to\n"
+	    "slow down I/O operations.  The buffering mode can be changed\n"
+	    "while a port is in use @pxref{Ports and File Descriptors,\n"
+	    "setvbuf}\n"
+	    "@item l\n"
+	    "Add line-buffering to the port.  The port output buffer will be\n"
+	    "automatically flushed whenever a newline character is written.\n"
+	    "@end table\n"
+	    "In theory we could create read/write ports which were buffered\n"
+	    "in one direction only.  However this isn't included in the\n"
+	    "current interfaces.  If a file cannot be opened with the access\n"
+	    "requested, @code{open-file} throws an exception.")
+#define FUNC_NAME s_scm_i_open_file
+{
+  SCM encoding = SCM_BOOL_F;
+  SCM guess_encoding = SCM_BOOL_F;
+
+  scm_c_bind_keyword_arguments (FUNC_NAME, keyword_args, 0,
+                                k_guess_encoding, &guess_encoding,
+                                k_encoding, &encoding,
+                                SCM_UNDEFINED);
+
+  return scm_open_file_with_encoding (filename, mode,
+                                      guess_encoding, encoding);
+}
+#undef FUNC_NAME
+
 \f
 /* Building Guile ports from a file descriptor.  */
 
@@ -804,6 +878,15 @@ scm_make_fptob ()
   return tc;
 }
 
+/* We can't initialize the keywords from 'scm_init_fports', because
+   keywords haven't yet been initialized at that point.  */
+void
+scm_init_fports_keywords ()
+{
+  k_guess_encoding = scm_from_latin1_keyword ("guess-encoding");
+  k_encoding       = scm_from_latin1_keyword ("encoding");
+}
+
 void
 scm_init_fports ()
 {
diff --git a/libguile/fports.h b/libguile/fports.h
index cbef0f8..c32ed95 100644
--- a/libguile/fports.h
+++ b/libguile/fports.h
@@ -51,9 +51,12 @@ SCM_API scm_t_bits scm_tc16_fport;
 SCM_API SCM scm_setbuf0 (SCM port);
 SCM_API SCM scm_setvbuf (SCM port, SCM mode, SCM size);
 SCM_API void scm_evict_ports (int fd);
+SCM_API SCM scm_open_file_with_encoding (SCM filename, SCM modes,
+                                         SCM guess_encoding, SCM encoding);
 SCM_API SCM scm_open_file (SCM filename, SCM modes);
 SCM_API SCM scm_fdes_to_port (int fdes, char *mode, SCM name);
 SCM_API SCM scm_file_port_p (SCM obj);
+SCM_INTERNAL void scm_init_fports_keywords (void);
 SCM_INTERNAL void scm_init_fports (void);
 
 /* internal functions */
diff --git a/libguile/init.c b/libguile/init.c
index 57e4902..455a772 100644
--- a/libguile/init.c
+++ b/libguile/init.c
@@ -444,6 +444,7 @@ scm_i_init_guile (void *base)
   scm_init_gettext ();
   scm_init_ioext ();
   scm_init_keywords ();    /* Requires smob_prehistory */
+  scm_init_fports_keywords ();
   scm_init_list ();
   scm_init_random ();      /* Requires smob_prehistory */
   scm_init_macros ();      /* Requires smob_prehistory and random */
diff --git a/module/ice-9/boot-9.scm b/module/ice-9/boot-9.scm
index 8461ee8..0779d27 100644
--- a/module/ice-9/boot-9.scm
+++ b/module/ice-9/boot-9.scm
@@ -753,6 +753,116 @@ information is unavailable."
 \f
 
 ;;;
+;;; Enhanced file opening procedures
+;;;
+
+(define* (open-input-file
+          file #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "Takes a string naming an existing file and returns an input port
+capable of delivering characters from the file.  If the file
+cannot be opened, an error is signalled."
+  (open-file file (if binary "rb" "r")
+             #:encoding encoding
+             #:guess-encoding guess-encoding))
+
+(define* (open-output-file file #:key (binary #f) (encoding #f))
+  "Takes a string naming an output file to be created and returns an
+output port capable of writing characters to a new file by that
+name.  If the file cannot be opened, an error is signalled.  If a
+file with the given name already exists, the effect is unspecified."
+  (open-file file (if binary "wb" "w")
+             #:encoding encoding))
+
+(define* (call-with-input-file
+          file proc #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "PROC should be a procedure of one argument, and FILE should be a
+string naming a file.  The file must
+already exist. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-input-file file
+                            #:binary binary
+                            #:encoding encoding
+                            #:guess-encoding guess-encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-input-port p)
+        (apply values vals)))))
+
+(define* (call-with-output-file file proc #:key (binary #f) (encoding #f))
+  "PROC should be a procedure of one argument, and FILE should be a
+string naming a file.  The behaviour is unspecified if the file
+already exists. These procedures call PROC
+with one argument: the port obtained by opening the named file for
+input or output.  If the file cannot be opened, an error is
+signalled.  If the procedure returns, then the port is closed
+automatically and the values yielded by the procedure are returned.
+If the procedure does not return, then the port will not be closed
+automatically unless it is possible to prove that the port will
+never again be used for a read or write operation."
+  (let ((p (open-output-file file #:binary binary #:encoding encoding)))
+    (call-with-values
+      (lambda () (proc p))
+      (lambda vals
+        (close-output-port p)
+        (apply values vals)))))
+
+(define* (with-input-from-file
+          file thunk #:key (binary #f) (encoding #f) (guess-encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The file must already exist. The file is opened for
+input, an input port connected to it is made
+the default value returned by `current-input-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-input-file file
+   (lambda (p) (with-input-from-port p thunk))
+   #:binary binary
+   #:encoding encoding
+   #:guess-encoding guess-encoding))
+
+(define* (with-output-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-output-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-output-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+(define* (with-error-to-file file thunk #:key (binary #f) (encoding #f))
+  "THUNK must be a procedure of no arguments, and FILE must be a
+string naming a file.  The effect is unspecified if the file already exists.
+The file is opened for output, an output port connected to it is made
+the default value returned by `current-error-port',
+and the THUNK is called with no arguments.
+When the THUNK returns, the port is closed and the previous
+default is restored.  Returns the values yielded by THUNK.  If an
+escape procedure is used to escape from the continuation of these
+procedures, their behavior is implementation dependent."
+  (call-with-output-file file
+   (lambda (p) (with-error-to-port p thunk))
+   #:binary binary
+   #:encoding encoding))
+
+\f
+
+;;;
 ;;; Extensible exception printing.
 ;;;
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index fc6d087..8e3df5b 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -274,8 +274,8 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file ignores file coding declaration
-(pass-if "file: open-file ignores coding declarations"
+;; open-file ignores file coding declaration by default
+(pass-if "file: open-file ignores coding declaration by default"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
@@ -290,6 +290,287 @@
                    (delete-file filename)
                    (string=? line2 test-string)))))
 
+;; open-input-file with guess-encoding honors coding declaration
+(pass-if "file: open-input-file with guess-encoding honors coding declaration"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+               (let* ((filename (test-file))
+                      (port (open-output-file filename))
+                      (test-string "€100"))
+                 (set-port-encoding! port "iso-8859-15")
+                 (write-line ";; coding: iso-8859-15" port)
+                 (write-line test-string port)
+                 (close-port port)
+                 (let* ((in-port (open-input-file filename
+                                                  #:guess-encoding #t))
+                        (line1 (read-line in-port))
+                        (line2 (read-line in-port)))
+                   (close-port in-port)
+                   (delete-file filename)
+                   (string=? line2 test-string)))))
+
+(with-test-prefix "keyword arguments for file openers"
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    (let ((filename (test-file)))
+
+      (with-test-prefix "write #:encoding"
+
+        (pass-if-equal "open-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-file filename "w"
+                                   #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "open-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (let ((port (open-output-file filename
+                                          #:encoding "UTF-16LE")))
+              (display "test" port)
+              (close-port port))
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "call-with-output-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (call-with-output-file filename
+              (lambda (port)
+                (display "test" port))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-output-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-output-to-file filename
+              (lambda ()
+                (display "test"))
+              #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv))
+
+        (pass-if-equal "with-error-to-file"
+            #vu8(116 0 101 0 115 0 116 0)
+            (with-error-to-file
+             filename
+             (lambda ()
+               (display "test" (current-error-port)))
+             #:encoding "UTF-16LE")
+            (let* ((port (open-file filename "rb"))
+                   (bv (get-bytevector-all port)))
+              (close-port port)
+              bv)))
+
+      (with-test-prefix "write #:binary"
+
+        (pass-if-equal "open-output-file"
+            "ISO-8859-1"
+          (let* ((port (open-output-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-output-file"
+            "ISO-8859-1"
+          (call-with-output-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-output-to-file"
+            "ISO-8859-1"
+          (with-output-to-file filename
+            (lambda () (port-encoding (current-output-port)))
+            #:binary #t))
+
+        (pass-if-equal "with-error-to-file"
+            "ISO-8859-1"
+          (with-error-to-file
+           filename
+           (lambda () (port-encoding (current-error-port)))
+           #:binary #t)))
+
+      (with-test-prefix "read #:encoding"
+
+        (pass-if-equal "open-file read #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-file filename "r" #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (let* ((port (open-input-file filename #:encoding "UTF-16LE"))
+                 (str  (read-string port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (call-with-input-file filename
+            read-string
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file #:encoding"
+            "test"
+          (call-with-output-file filename
+            (lambda (port)
+              (put-bytevector port #vu8(116 0 101 0 115 0 116 0))))
+          (with-input-from-file filename
+            read-string
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "read #:binary"
+
+        (pass-if-equal "open-input-file"
+            "ISO-8859-1"
+          (let* ((port (open-input-file filename #:binary #t))
+                 (enc (port-encoding port)))
+            (close-port port)
+            enc))
+
+        (pass-if-equal "call-with-input-file"
+            "ISO-8859-1"
+          (call-with-input-file filename port-encoding #:binary #t))
+
+        (pass-if-equal "with-input-from-file"
+            "ISO-8859-1"
+          (with-input-from-file filename
+            (lambda () (port-encoding (current-input-port)))
+            #:binary #t)))
+
+      (with-test-prefix "#:guess-encoding with coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "UTF-16LE"))
+                 (str (begin (read-line port)
+                             (read-line port)
+                             (read-line port))))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            (lambda (port)
+              (read-line port)
+              (read-line port)
+              (read-line port))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda ()
+              (write-line "test")
+              (write-line "; coding: ISO-8859-15")
+              (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            (lambda ()
+              (read-line)
+              (read-line)
+              (read-line))
+            #:guess-encoding #t
+            #:encoding "UTF-16LE")))
+
+      (with-test-prefix "#:guess-encoding without coding declaration"
+
+        (pass-if-equal "open-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-file filename "r"
+                                  #:guess-encoding #t
+                                  #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "open-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (let* ((port (open-input-file filename
+                                        #:guess-encoding #t
+                                        #:encoding "ISO-8859-15"))
+                 (str (read-line port)))
+            (close-port port)
+            str))
+
+        (pass-if-equal "call-with-input-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (call-with-input-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15"))
+
+        (pass-if-equal "with-input-from-file"
+            "€100"
+          (with-output-to-file filename
+            (lambda () (write-line "€100"))
+            #:encoding "ISO-8859-15")
+          (with-input-from-file filename
+            read-line
+            #:guess-encoding #t
+            #:encoding "ISO-8859-15")))
+
+      (delete-file filename))))
+
 ;;; ungetting characters and strings.
 (with-input-from-string "walk on the moon\nmoon"
 			(lambda ()
-- 
1.7.10.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: [PATCH 2/2] Clarify 'file-encoding' docs: heuristics may be improved later --]
[-- Type: text/x-diff, Size: 3949 bytes --]

From 3e9890ea809bd77a36c1005935b73cf4ca18e691 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sun, 7 Apr 2013 12:07:33 -0400
Subject: [PATCH 2/2] Clarify 'file-encoding' docs: heuristics may be improved
 later.

* doc/ref/api-evaluation.texi (Character Encoding of Source Files):
  Mention UTF-8 as another common encoding used for Scheme source files,
  and that it is used by default.  Change the description to leave open
  the possibility of adding additional heuristics in the future.
  Mention that if the coding declaration is in a #!-style block comment,
  it must be the first such comment in the file.  Mention the
  '#:guess-encoding' keyword argument.
---
 doc/ref/api-evaluation.texi |   36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index 7afbcfa..63b1d60 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -991,17 +991,19 @@ three arguments.
 @cindex source file encoding
 @cindex primitive-load
 @cindex load
-Scheme source code files are usually encoded in ASCII, but, the
-built-in reader can interpret other character encodings.  The
-procedure @code{primitive-load}, and by extension the functions that
-call it, such as @code{load}, first scan the top 500 characters of the
-file for a coding declaration.
+Scheme source code files are usually encoded in ASCII or UTF-8, but the
+built-in reader can interpret other character encodings as well.  When
+Guile loads Scheme source code, it uses the @code{file-encoding}
+procedure (described below) to try to guess the encoding of the file.
+In the absence of any hints, UTF-8 is assumed.  One way to provide a
+hint about the encoding of a source file is to place a coding
+declaration in the top 500 characters of the file.
 
 A coding declaration has the form @code{coding: XXXXXX}, where
 @code{XXXXXX} is the name of a character encoding in which the source
 code file has been encoded.  The coding declaration must appear in a
-scheme comment.  It can either be a semicolon-initiated comment or a block
-@code{#!} comment.
+scheme comment.  It can either be a semicolon-initiated comment, or the
+first block @code{#!} comment in the file.
 
 The name of the character encoding in the coding declaration is
 typically lower case and containing only letters, numbers, and hyphens,
@@ -1050,15 +1052,21 @@ the port's character encoding should be set to the encoding returned
 by @code{file-encoding}, if any, again by using
 @code{set-port-encoding!}.  Then the code can be read as normal.
 
+Alternatively, one can use the @code{#:guess-encoding} keyword argument
+of @code{open-file} and related procedures.  @xref{File Ports}.
+
 @deffn {Scheme Procedure} file-encoding port
 @deffnx {C Function} scm_file_encoding (port)
-Scan the port for an Emacs-like character coding declaration near the
-top of the contents of a port with random-accessible contents
-(@pxref{Recognize Coding, how Emacs recognizes file encoding,, emacs,
-The GNU Emacs Reference Manual}).  The coding declaration is of the form
-@code{coding: XXXXX} and must appear in a Scheme comment.  Return a
-string containing the character encoding of the file if a declaration
-was found, or @code{#f} otherwise.  The port is rewound.
+Attempt to scan the first few hundred bytes from the @var{port} for
+hints about its character encoding.  Return a string containing the
+encoding name or @code{#f} if the encoding cannot be determined.  The
+port is rewound.
+
+Currently, the only supported method is to look for an Emacs-like
+character coding declaration (@pxref{Recognize Coding, how Emacs
+recognizes file encoding,, emacs, The GNU Emacs Reference Manual}).  The
+coding declaration is of the form @code{coding: XXXXX} and must appear
+in a Scheme comment.  Additional heuristics may be added in the future.
 @end deffn
 
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCHES] Keyword args for file openers; coding scan off by default
  2013-04-07 16:33   ` Mark H Weaver
@ 2013-04-07 19:18     ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2013-04-07 19:18 UTC (permalink / raw
  To: Mark H Weaver; +Cc: guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Looks good.
>>
>> Minor comments:

[...]

>>> +Unless binary mode is requested, the character encoding of the new port
>>> +is determined as follows: First, if @var{guess-encoding} is true,
>>> +heuristics will be used to guess the encoding of the file.  If it is
>>
>> “heuristics” is vague.  I’d prefer “the ‘file-encoding’ procedure is
>> called to check for Emacs-style coding declarations (@pxref{Character
>> Encoding of Source Files})”.  Should BOMs also be mentioned?
>
> Makes sense.  Following our discussion on IRC, the patch below has the
> following wording:
>
>   Unless binary mode is requested, the character encoding of the new
>   port is determined as follows: First, if @var{guess-encoding} is true,
>   the @code{file-encoding} procedure is used to guess the encoding of
>   the file (@pxref{Character Encoding of Source Files}).  [...]

Perfect.

> I left out the "Emacs-style coding declarations" language here, because
> I want to leave open the possibility of adding additional heuristics to
> 'file-encoding' in the future.

Good idea.

[...]

> From 3e9890ea809bd77a36c1005935b73cf4ca18e691 Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <mhw@netris.org>
> Date: Sun, 7 Apr 2013 12:07:33 -0400
> Subject: [PATCH 2/2] Clarify 'file-encoding' docs: heuristics may be improved
>  later.
>
> * doc/ref/api-evaluation.texi (Character Encoding of Source Files):
>   Mention UTF-8 as another common encoding used for Scheme source files,
>   and that it is used by default.  Change the description to leave open
>   the possibility of adding additional heuristics in the future.
>   Mention that if the coding declaration is in a #!-style block comment,
>   it must be the first such comment in the file.  Mention the
>   '#:guess-encoding' keyword argument.

Looks good too.

Please push both!

Ludo’.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-04-07 19:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-07  6:52 [PATCHES] Keyword args for file openers; coding scan off by default Mark H Weaver
2013-04-07 13:00 ` Ludovic Courtès
2013-04-07 13:09 ` Ludovic Courtès
2013-04-07 13:24 ` Ludovic Courtès
2013-04-07 16:33   ` Mark H Weaver
2013-04-07 19:18     ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).