unofficial mirror of guile-devel@gnu.org 
* Scanning for coding declarations in all files (not just source)
@ 2013-01-13 18:25 Mark H Weaver
  2013-01-13 19:51 ` Mike Gran
  2013-01-15  9:32 ` Ludovic Courtès
  0 siblings, 2 replies; 11+ messages in thread
From: Mark H Weaver @ 2013-01-13 18:25 UTC
  To: guile-devel

I just discovered that Guile is scanning for coding declarations in
*all* files opened with 'open-file', not just source files.

For source files, we are scanning for coding declarations twice: once
when the file is opened, and a second time when 'compile-file' or
'primitive-load' explicitly scans for it.

The relevant commit is 211683cc5c99542dfb6e2a33f7cb8c1f9abbc702.
I was unable to find any discussion of this on guile-devel.

I don't like this.  I don't want 'open-file' to second-guess the
encoding I have asked for in my program, based on data in the file.
Also, the manual is misleading.  Section 6.17.8 gives the impression
that the scanning is only done for source files.

What do other people think?

      Mark




* Re: Scanning for coding declarations in all files (not just source)
  2013-01-13 18:25 Scanning for coding declarations in all files (not just source) Mark H Weaver
@ 2013-01-13 19:51 ` Mike Gran
  2013-01-15  9:32 ` Ludovic Courtès
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Gran @ 2013-01-13 19:51 UTC
  To: Mark H Weaver, guile-devel@gnu.org

> From: Mark H Weaver <mhw@netris.org>
> To: guile-devel@gnu.org
> Cc: Michael Gran <spk121@yahoo.com>
> Sent: Sunday, January 13, 2013 10:25 AM
> Subject: Scanning for coding declarations in all files (not just source)
> 
Hi Mark,

> I just discovered that Guile is scanning for coding declarations in
> *all* files opened with 'open-file', not just source files.
True

> For source files, we are scanning for coding declarations twice: once
> when the file is opened, and a second time when 'compile-file' or
> 'primitive-load' explicitly scans for it.

If there was a reason for scanning the coding twice, I don't recall it.

> 
> The relevant commit is 211683cc5c99542dfb6e2a33f7cb8c1f9abbc702.
> I was unable to find any discussion of this on guile-devel.
> 
> I don't like this.  I don't want 'open-file' to second-guess the
> encoding I have asked for in my program, based on data in the file.
> Also, the manual is misleading.  Section 6.17.8 gives the impression
> that the scanning is only done for source files.
> 
> What do other people think?

Opening a file that contains a coding declaration using an encoding other
than binary or the coding declared in the file seems like it would be
something of a corner case.  So, IMHO it makes sense that opening a file
using its self-declared encoding should be the simple case, and that
opening a text file in a different (non-binary) text encoding should
be the more complicated case, in an API sense.

There are also obscure possibilities to consider, like reading code from
a file or pipe into a string, and then eval-ing the string.

I can see your point, though.  Guile does seem to be having a tough
time deciding how automatic or manual string encoding should be.

So, whatever makes people happy.

-Mike 





* Re: Scanning for coding declarations in all files (not just source)
  2013-01-13 18:25 Scanning for coding declarations in all files (not just source) Mark H Weaver
  2013-01-13 19:51 ` Mike Gran
@ 2013-01-15  9:32 ` Ludovic Courtès
  2013-01-22 11:38   ` Andy Wingo
  1 sibling, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2013-01-15  9:32 UTC
  To: guile-devel

Hi,

Mark H Weaver <mhw@netris.org> skribis:

> I just discovered that Guile is scanning for coding declarations in
> *all* files opened with 'open-file', not just source files.

Yeah, I don’t really like it either.

> For source files, we are scanning for coding declarations twice: once
> when the file is opened, and a second time when 'compile-file' or
> 'primitive-load' explicitly scans for it.

Yes, I think I’m used to writing things like:

  (call-with-input-file file
    (lambda (p)
      (set-port-encoding! p (file-encoding p))
      ...))
      
when I really want that behavior.

One thing I noticed lately is that ‘call-with-input-file’ & co. never
use the “b” flag, so if what you want is really a binary input file,
you have to resort to hacks like:

  (define (call-with-latin1-input-file file proc)
    "Open FILE as a latin1 or binary file, and pass the resulting port to
  PROC.  FILE is closed when PROC's dynamic extent is left.  Return the
  return values of applying PROC to the port."
    (let ((port (with-fluids ((%default-port-encoding #f))
                  ;; Use "b" so that `open-file' ignores `coding:' cookies.
                  (open-file file "rb"))))
      (dynamic-wind
        (lambda ()
          #t)
        (lambda ()
          (proc port))
        (lambda ()
          (close-input-port port)))))

Also, guessing for all files may cause problems with non-seekable files,
or pseudo files.

> I don't like this.  I don't want 'open-file' to second-guess the
> encoding I have asked for in my program, based on data in the file.
> Also, the manual is misleading.  Section 6.17.8 gives the impression
> that the scanning is only done for source files.

That may reflect what I’ve been thinking for some time.  ;-)

Mike Gran <spk121@yahoo.com> skribis:

> Opening a file that contains a coding declaration using an encoding other
> than binary or the coding declared in the file seems like it would be
> something of a corner case.  So, IMHO it makes sense that opening a file
> using its self-declared encoding should be the simple case, and that
> opening a text file in a different (non-binary) text encoding should
> be the more complicated case, in an API sense.

That’s also true.  In Emacs, we want files to be opened with the right
encoding, whatever happens; so in some cases we could have the same
expectations here.

As usual, backward-compatibility gives us an incentive not to change
anything in 2.0.  But perhaps we should change that in 2.2.

Thoughts?

Ludo’.





* Re: Scanning for coding declarations in all files (not just source)
  2013-01-15  9:32 ` Ludovic Courtès
@ 2013-01-22 11:38   ` Andy Wingo
  2013-01-31  5:06     ` [PATCH] Do not scan for coding declarations in open-file Mark H Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2013-01-22 11:38 UTC
  To: Ludovic Courtès; +Cc: guile-devel

On Tue 15 Jan 2013 10:32, ludo@gnu.org (Ludovic Courtès) writes:

> Mike Gran <spk121@yahoo.com> skribis:
>
>> Opening a file that contains a coding declaration using an encoding other
>> than binary or the coding declared in the file seems like it would be
>> something of a corner case.  So, IMHO it makes sense that opening a file
>> using its self-declared encoding should be the simple case, and that
>> opening a text file in a different (non-binary) text encoding should
>> be the more complicated case, in an API sense.

I am sympathetic to this, but also to Mike's "it's a tough problem;
whatever makes people happy" sentiment.

> As usual, backward-compatibility gives us an incentive not to change
> anything in 2.0.  But perhaps we should change that in 2.2.
>
> Thoughts?

IMO we should update the docs and leave it as it is, though I don't care
much.

Mark?

Andy
-- 
http://wingolog.org/




* [PATCH] Do not scan for coding declarations in open-file
  2013-01-22 11:38   ` Andy Wingo
@ 2013-01-31  5:06     ` Mark H Weaver
  2013-01-31 10:00       ` Andy Wingo
  2013-01-31 21:51       ` Ludovic Courtès
  0 siblings, 2 replies; 11+ messages in thread
From: Mark H Weaver @ 2013-01-31  5:06 UTC
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

Andy Wingo <wingo@pobox.com> writes:
> On Tue 15 Jan 2013 10:32, ludo@gnu.org (Ludovic Courtès) writes:
>> As usual, backward-compatibility gives us an incentive not to change
>> anything in 2.0.  But perhaps we should change that in 2.2.
>>
>> Thoughts?
>
> IMO we should update the docs and leave it as it is, though I don't care
> much.
>
> Mark?

My position is that the current coding-auto-detection behavior of
'open-file' is likely to lead to security flaws in software built using
Guile.  The issue is that a program that receives text from an untrusted
source, writes those strings to a file, and then reads them back in is
potentially vulnerable to hostile coding declarations inserted within
those strings.
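
A minimal sketch of that scenario and of one defensive way to read the
file back, assuming only procedures named in this thread plus
'read-line' from (ice-9 rdelim).  The file name, the sample text, and
the helper names 'write-utf8-file' and 'read-utf8-lines' are
hypothetical.  It relies on the documented behavior that the "b" flag
makes 'open-file' ignore coding declarations, after which the program
sets the encoding it actually wants:

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical scratch file used only for this demonstration.
(define demo-file
  (string-append (or (getenv "TMPDIR") "/tmp") "/cookie-demo.txt"))

;; Hypothetical untrusted text whose first line looks like an encoding
;; cookie.  The literal is split so that this example's own source does
;; not carry a cookie; "\u20ac" is the euro sign.
(define cookie-line (string-append ";; cod" "ing: iso-8859-15"))
(define untrusted-text (string-append cookie-line "\n\u20ac100\n"))

(define (write-utf8-file name text)
  ;; Write TEXT to NAME as UTF-8, whatever the default port encoding is.
  (let ((port (open-file name "wb")))
    (set-port-encoding! port "UTF-8")
    (display text port)
    (close-port port)))

(define (read-utf8-lines name)
  ;; "b" tells `open-file' to ignore any cookie found in the file; the
  ;; program then states explicitly which encoding it wants.
  (let ((port (open-file name "rb")))
    (set-port-encoding! port "UTF-8")
    (let loop ((line (read-line port)) (lines '()))
      (if (eof-object? line)
          (begin (close-port port) (reverse lines))
          (loop (read-line port) (cons line lines))))))

(write-utf8-file demo-file untrusted-text)
(define lines (read-utf8-lines demo-file))
```

With the cookie ignored, the second element of 'lines' should be the
original euro amount rather than text re-decoded as ISO-8859-15.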

I feel strongly that this security issue is important enough to risk the
possibility that someone's code might have become dependent upon the
guesswork in 'open-file'.

While it is possible to work around this problem in various non-portable
ways, it is unlikely that most people would think to apply such
workarounds.  Frankly, I find it quite embarrassing that our basic
low-level text I/O is not robust.

I've attached a patch to fix this.  Please consider it.

  Thoughts?
    Mark



[-- Attachment #2: [PATCH] Do not scan for coding declarations in open-file --]
[-- Type: text/x-diff, Size: 9977 bytes --]

From f936b2553c809967fd6703d8ec8fc9a7ef7bd5af Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Wed, 30 Jan 2013 14:45:28 -0500
Subject: [PATCH] Do not scan for coding declarations in open-file.

* libguile/fports.c (scm_open_file): Do not scan for coding
  declarations.  Replace 'use_encoding' local variable with
  'binary'.  Update documentation string.

* module/ice-9/psyntax.scm (include): Add the same file-encoding
  logic that's used in compile-file and scm_primitive_load.

* module/ice-9/psyntax-pp.scm: Regenerate.

* doc/ref/api-io.texi (File Ports): Update docs.

* test-suite/tests/ports.test: Change "open-file HONORS file coding
  declarations" test to "open-file IGNORES file coding declaration".

* test-suite/tests/coding.test (scan-coding): Use 'file-encoding' to
  scan for the encoding, since 'open-input-file' no longer does so.
---
 doc/ref/api-io.texi          |   13 +++----------
 libguile/fports.c            |   33 ++++++++-------------------------
 module/ice-9/psyntax-pp.scm  |   10 ++++++----
 module/ice-9/psyntax.scm     |   13 +++++++++----
 test-suite/tests/coding.test |    4 ++--
 test-suite/tests/ports.test  |    5 ++---
 6 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 11ae580..fca3875 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -1,7 +1,7 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
 @c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009,
-@c   2010, 2011  Free Software Foundation, Inc.
+@c   2010, 2011, 2013  Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
 
 @node Input and Output
@@ -884,8 +884,8 @@ Use binary mode, ensuring that each byte in the file will be read as one
 Scheme character.
 
 To provide this property, the file will be opened with the 8-bit
-character encoding "ISO-8859-1", ignoring any coding declaration or port
-encoding.  @xref{Ports}, for more information on port encodings.
+character encoding "ISO-8859-1", ignoring the default port encoding.
+@xref{Ports}, for more information on port encodings.
 
 Note that while it is possible to read and write binary data as
 characters or strings, it is usually better to treat bytes as octets,
@@ -902,13 +902,6 @@ because of its port encoding ramifications.
 If a file cannot be opened with the access
 requested, @code{open-file} throws an exception.
 
-When the file is opened, this procedure will scan for a coding
-declaration (@pxref{Character Encoding of Source Files}). If a coding
-declaration is found, it will be used to interpret the file.  Otherwise,
-the port's encoding will be used.  To suppress this behavior, open the
-file in binary mode and then set the port encoding explicitly using
-@code{set-port-encoding!}.
-
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the
 current interfaces.
diff --git a/libguile/fports.c b/libguile/fports.c
index 10cf671..c3af8b4 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -1,5 +1,6 @@
-/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
- *   2004, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
+/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001,
+ *   2002, 2003, 2004, 2006, 2007, 2008, 2009, 2010, 2011,
+ *   2012, 2013 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -371,8 +372,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 	    "@item b\n"
 	    "Open the underlying file in binary mode, if supported by the system.\n"
 	    "Also, open the file using the binary-compatible character encoding\n"
-	    "\"ISO-8859-1\", ignoring the port's encoding and the coding declaration\n"
-	    "at the top of the input file, if any.\n"
+	    "\"ISO-8859-1\", ignoring the default port encoding.\n"
 	    "@item +\n"
 	    "Open the port for both input and output.  E.g., @code{r+}: open\n"
 	    "an existing file for both input and output.\n"
@@ -387,11 +387,6 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 	    "Add line-buffering to the port.  The port output buffer will be\n"
 	    "automatically flushed whenever a newline character is written.\n"
 	    "@end table\n"
-	    "When the file is opened, this procedure will scan for a coding\n"
-	    "declaration@pxref{Character Encoding of Source Files}. If present\n"
-	    "will use that encoding for interpreting the file.  Otherwise, the\n"
-	    "port's encoding will be used.\n"
-	    "\n"
 	    "In theory we could create read/write ports which were buffered\n"
 	    "in one direction only.  However this isn't included in the\n"
 	    "current interfaces.  If a file cannot be opened with the access\n"
@@ -399,7 +394,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 #define FUNC_NAME s_scm_open_file
 {
   SCM port;
-  int fdes, flags = 0, use_encoding = 1;
+  int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
@@ -434,7 +429,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 	  flags = (flags & ~(O_RDONLY | O_WRONLY)) | O_RDWR;
 	  break;
 	case 'b':
-	  use_encoding = 0;
+	  binary = 1;
 #if defined (O_BINARY)
 	  flags |= O_BINARY;
 #endif
@@ -473,20 +468,8 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
   port = scm_i_fdes_to_port (fdes, scm_i_mode_bits (mode),
                              fport_canonicalize_filename (filename));
 
-  if (use_encoding)
-    {
-      /* If this file has a coding declaration, use that as the port
-	 encoding.  */
-      if (SCM_INPUT_PORT_P (port))
-	{
-	  char *enc = scm_i_scan_for_encoding (port);
-	  if (enc != NULL)
-	    scm_i_set_port_encoding_x (port, enc);
-	}
-    }
-  else
-    /* If this is a binary file, use the binary-friendly ISO-8859-1
-       encoding.  */
+  if (binary)
+    /* Use the binary-friendly ISO-8859-1 encoding. */
     scm_i_set_port_encoding_x (port, NULL);
 
   scm_dynwind_end ();
diff --git a/module/ice-9/psyntax-pp.scm b/module/ice-9/psyntax-pp.scm
index 139c02b..032034e 100644
--- a/module/ice-9/psyntax-pp.scm
+++ b/module/ice-9/psyntax-pp.scm
@@ -2968,10 +2968,12 @@
          (read-file
            (lambda (fn dir k)
              (let ((p (open-input-file (if (absolute-path? fn) fn (in-vicinity dir fn)))))
-               (let f ((x (read p)) (result '()))
-                 (if (eof-object? x)
-                   (begin (close-input-port p) (reverse result))
-                   (f (read p) (cons (datum->syntax k x) result))))))))
+               (let ((enc (file-encoding p)))
+                 (set-port-encoding! p (let ((t enc)) (if t t "UTF-8")))
+                 (let f ((x (read p)) (result '()))
+                   (if (eof-object? x)
+                     (begin (close-input-port p) (reverse result))
+                     (f (read p) (cons (datum->syntax k x) result)))))))))
         (let ((src (syntax-source x)))
           (let ((file (if src (assq-ref src 'filename) #f)))
             (let ((dir (if (string? file) (dirname file) #f)))
diff --git a/module/ice-9/psyntax.scm b/module/ice-9/psyntax.scm
index 4abd3c9..d3e2616 100644
--- a/module/ice-9/psyntax.scm
+++ b/module/ice-9/psyntax.scm
@@ -2940,10 +2940,15 @@
 
     (define read-file
       (lambda (fn dir k)
-        (let ((p (open-input-file
-                  (if (absolute-path? fn)
-                      fn
-                      (in-vicinity dir fn)))))
+        (let* ((p (open-input-file
+                   (if (absolute-path? fn)
+                       fn
+                       (in-vicinity dir fn))))
+               (enc (file-encoding p)))
+
+          ;; Choose the input encoding deterministically.
+          (set-port-encoding! p (or enc "UTF-8"))
+
           (let f ((x (read p))
                   (result '()))
             (if (eof-object? x)
diff --git a/test-suite/tests/coding.test b/test-suite/tests/coding.test
index 4152af8..0a15d93 100644
--- a/test-suite/tests/coding.test
+++ b/test-suite/tests/coding.test
@@ -1,6 +1,6 @@
 ;;;; coding.test --- test suite for coding declarations. -*- mode: scheme -*-
 ;;;;
-;;;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;;; Copyright (C) 2011, 2013 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -40,7 +40,7 @@
      ;; relies on the opportunistic filling of the input buffer, which
      ;; doesn't happen after a seek.
      (let* ((port (open-input-file name))
-            (res (port-encoding port)))
+            (res (file-encoding port)))
        (close-port port)
        res))))
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 65c3b3f..5109583 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -269,13 +269,12 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file honors file coding declarations
-(pass-if "file: open-file honors coding declarations"
+;; open-file ignores file coding declaration
+(pass-if "file: open-file ignores coding declarations"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
                       (test-string "€100"))
-                 (set-port-encoding! port "ISO-8859-15")
                  (write-line ";; coding: iso-8859-15" port)
                  (write-line test-string port)
                  (close-port port)
-- 
1.7.10.4



* Re: [PATCH] Do not scan for coding declarations in open-file
  2013-01-31  5:06     ` [PATCH] Do not scan for coding declarations in open-file Mark H Weaver
@ 2013-01-31 10:00       ` Andy Wingo
  2013-01-31 18:58         ` Mark H Weaver
  2013-01-31 22:00         ` Ludovic Courtès
  2013-01-31 21:51       ` Ludovic Courtès
  1 sibling, 2 replies; 11+ messages in thread
From: Andy Wingo @ 2013-01-31 10:00 UTC
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Thu 31 Jan 2013 06:06, Mark H Weaver <mhw@netris.org> writes:

> From: Mark H Weaver <mhw@netris.org>
> Date: Wed, 30 Jan 2013 14:45:28 -0500
> Subject: [PATCH] Do not scan for coding declarations in open-file.

The patch looks good to me but I am concerned about the behavior
change, and that it is inconvenient to get the previous behavior.

My instinct is that we should not merge this patch without including a
way to enable the coding sniff; which seems to mean adding keywords or
somehow extending the arguments of:

  open-file
  with-input-from-file
  with-output-to-file
  call-with-output-file
  call-with-input-file
  open-input-file

Dunno; that is a larger task.  I'd be interested in Ludovic's thoughts
as well.

Andy
-- 
http://wingolog.org/




* Re: [PATCH] Do not scan for coding declarations in open-file
  2013-01-31 10:00       ` Andy Wingo
@ 2013-01-31 18:58         ` Mark H Weaver
  2013-01-31 20:04           ` Andy Wingo
  2013-01-31 22:00         ` Ludovic Courtès
  1 sibling, 1 reply; 11+ messages in thread
From: Mark H Weaver @ 2013-01-31 18:58 UTC
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

Hi Andy,

Andy Wingo <wingo@pobox.com> writes:
> The patch looks good to me but I am concerned about the behavior
> change, and that it is inconvenient to get the previous behavior.
>
> My instinct is that we should not merge this patch without including a
> way to enable the coding sniff; which seems to mean adding keywords or
> somehow extending the arguments of:
>
>   open-file
>   with-input-from-file
>   with-output-to-file
>   call-with-output-file
>   call-with-input-file
>   open-input-file

I'd be glad to do this.  I've long wanted these to accept keyword
arguments for encoding and binary mode.  We could also have a keyword to
ask Guile to guess the encoding.  This could be used to simplify the
code used in 'compile-file' etc.

We could also add a fluid to specify whether 'open-file' should try to
guess the encoding, if that helps.
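
One possible shape for such keyword arguments, as a rough sketch only.
The wrapper name 'open-file/kw' and the keywords '#:encoding' and
'#:guess-encoding' are hypothetical, and the sketch assumes an
'open-file' that no longer scans for coding declarations itself:

```scheme
;; Hypothetical wrapper around `open-file'; the name and both keywords
;; are illustrative, not an existing interface.
(define* (open-file/kw filename mode
                       #:key (encoding #f) (guess-encoding #f))
  (let* ((port (open-file filename mode))
         ;; Scan for a declaration only when the caller asks for it.
         ;; `file-encoding' returns #f when none is found.
         (guessed (and guess-encoding (file-encoding port))))
    (cond (guessed  (set-port-encoding! port guessed))
          (encoding (set-port-encoding! port encoding)))
    ;; With neither keyword, the port keeps the encoding that
    ;; `open-file' gave it.
    port))
```

A caller would then write (open-file/kw name "r" #:guess-encoding #t)
to opt in to the guess, or pass #:encoding "UTF-8" to state the
encoding outright; in this sketch a successful guess takes precedence
and '#:encoding' acts as the fallback.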

What do you think?

      Mark




* Re: [PATCH] Do not scan for coding declarations in open-file
  2013-01-31 18:58         ` Mark H Weaver
@ 2013-01-31 20:04           ` Andy Wingo
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2013-01-31 20:04 UTC
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Thu 31 Jan 2013 19:58, Mark H Weaver <mhw@netris.org> writes:

>> My instinct is that we should not merge this patch without including a
>> way to enable the coding sniff; which seems to mean adding keywords or
>> somehow extending the arguments of:
>>
>>   open-file
>>   with-input-from-file
>>   with-output-to-file
>>   call-with-output-file
>>   call-with-input-file
>>   open-input-file
>
> I'd be glad to do this.  I've long wanted these to accept keyword
> arguments for encoding and binary mode.  We could also have a keyword to
> ask Guile to guess the encoding.  This could be used to simplify the
> code used in 'compile-file' etc.
>
> We could also add a fluid to specify whether 'open-file' should try to
> guess the encoding, if that helps.
>
> What do you think?

Sounds great to me :)

I would add the parameter only if you think it makes sense as an
interface going forward -- i.e. I wouldn't add it if it's only useful
for the rest of the life of the 2.0.x series.

Cheers,

Andy
-- 
http://wingolog.org/




* Re: [PATCH] Do not scan for coding declarations in open-file
  2013-01-31  5:06     ` [PATCH] Do not scan for coding declarations in open-file Mark H Weaver
  2013-01-31 10:00       ` Andy Wingo
@ 2013-01-31 21:51       ` Ludovic Courtès
  1 sibling, 0 replies; 11+ messages in thread
From: Ludovic Courtès @ 2013-01-31 21:51 UTC
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> My position is that the current coding-auto-detection behavior of
> 'open-file' is likely to lead to security flaws in software built using
> Guile.  The issue is that a program that receives text from an untrusted
> source, writes those strings to a file, and then reads them back in is
> potentially vulnerable to hostile coding declarations inserted within
> those strings.

The way Emacs handles this is that it detects the ‘coding:’ cookie and
automatically switches the encoding accordingly.

Just mentioning it, because we seem to be hesitant between two opposite
solutions in the design space: one is Emacs, designed to make things
work by default in practical cases, and the other is POSIX, designed to
leave programmers with all the power of a chainsaw.

Ludo’.




* Re: [PATCH] Do not scan for coding declarations in open-file
  2013-01-31 10:00       ` Andy Wingo
  2013-01-31 18:58         ` Mark H Weaver
@ 2013-01-31 22:00         ` Ludovic Courtès
  2013-01-31 22:19           ` Noah Lavine
  1 sibling, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2013-01-31 22:00 UTC
  To: Andy Wingo; +Cc: Mark H Weaver, guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Thu 31 Jan 2013 06:06, Mark H Weaver <mhw@netris.org> writes:
>
>> From: Mark H Weaver <mhw@netris.org>
>> Date: Wed, 30 Jan 2013 14:45:28 -0500
>> Subject: [PATCH] Do not scan for coding declarations in open-file.
>
> The patch looks good to me but I am concerned about the behavior
> change, and that it is inconvenient to get the previous behavior.

I’m concerned too.

However, I’ve been explicitly using ‘file-encoding’ “forever” when I
really wanted to handle coding cookies.  Actually the doc even
explicitly recommends this (info "(guile) Character Encoding of Source
Files"):

     If a port is used to read code of unknown character encoding, it can
  accomplish this in three steps.  First, the character encoding of the
  port should be set to ISO-8859-1 using `set-port-encoding!'.  Then, the
  procedure `file-encoding', described below, is used to scan for a
  coding declaration when reading from the port.  As a side effect, it
  rewinds the port after its scan is complete. After that, the port's
  character encoding should be set to the encoding returned by
  `file-encoding', if any, again by using `set-port-encoding!'.  Then the
  code can be read as normal.
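
The three steps quoted above can be sketched roughly as follows.  The
helper name 'open-source-port' is hypothetical; the procedures it calls
are the ones the manual passage names:

```scheme
;; Hypothetical helper following the manual's three steps.
(define (open-source-port filename)
  (let ((port (open-file filename "r")))
    ;; Step 1: start from ISO-8859-1 so that every byte reads as one
    ;; character during the scan.
    (set-port-encoding! port "ISO-8859-1")
    ;; Step 2: scan for a declaration.  `file-encoding' rewinds the
    ;; port afterwards and returns #f if it finds nothing.
    (let ((declared (file-encoding port)))
      ;; Step 3: switch to the declared encoding, if any.  Without a
      ;; declaration the port simply stays in ISO-8859-1 here.
      (when declared
        (set-port-encoding! port declared))
      port)))
```

Code can then be read from the returned port with 'read' as usual.  A
caller that prefers a different fallback, such as the "UTF-8" default
the patch uses for 'include', could set it when no declaration is
found.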

Considering this, it is tempting to think that removing that
scm_i_scan_for_encoding call would be a bug fix.

WDYT?

> My instinct is that we should not merge this patch without including a
> way to enable the coding sniff; which seems to mean adding keywords or
> somehow extending the arguments of:
>
>   open-file
>   with-input-from-file
>   with-output-to-file
>   call-with-output-file
>   call-with-input-file
>   open-input-file
>
> Dunno; that is a larger task.  I'd be interested in Ludovic's thoughts
> as well.

There are several issues IMO.  First, some are subrs, so handling
keyword arguments is going to be painful.  Second, keyword arguments are
inelegant IMO compared to:

  (set-port-encoding! port (file-encoding port))

Thanks,
Ludo’.




* Re: [PATCH] Do not scan for coding declarations in open-file
  2013-01-31 22:00         ` Ludovic Courtès
@ 2013-01-31 22:19           ` Noah Lavine
  0 siblings, 0 replies; 11+ messages in thread
From: Noah Lavine @ 2013-01-31 22:19 UTC
  To: Ludovic Courtès; +Cc: Andy Wingo, Mark H Weaver, guile-devel

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

Hello,


On Thu, Jan 31, 2013 at 5:00 PM, Ludovic Courtès <ludo@gnu.org> wrote:

> There are several issues IMO.  First, some are subrs, so handling
> keyword arguments is going to be painful.  Second, keyword arguments are
> inelegant IMO compared to:
>
>   (set-port-encoding! port (file-encoding port))
>

I don't have much experience in this area, but what about making a Scheme
binding for scm_i_scan_for_encoding, and then doing something like

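(define* (open-file* filename mode #:key encoding)
  ;; `scan-for-encoding' stands for the proposed Scheme binding of
  ;; scm_i_scan_for_encoding; it does not exist yet.
  (let ((port (open-file filename mode)))
    (set-port-encoding! port
                        (if (eq? encoding 'guess-encoding)
                            (scan-for-encoding port)
                            encoding))
    port))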

Of course, that doesn't address your second point.

Best,
Noah


