unofficial mirror of guile-devel@gnu.org 
From: Mark H Weaver <mhw@netris.org>
To: Andy Wingo <wingo@pobox.com>
Cc: guile-devel <guile-devel@gnu.org>
Subject: Re: byte-order marks
Date: Tue, 29 Jan 2013 03:22:17 -0500
Message-ID: <87y5fcjt52.fsf@tines.lan>
In-Reply-To: <87boc956j2.fsf@pobox.com> (Andy Wingo's message of "Mon, 28 Jan 2013 22:42:09 +0100")

Hi Andy,

Andy Wingo <wingo@pobox.com> writes:
> What do people think about this attached patch?

I'm strongly opposed to making 'open-input-file' any more clever than it
already is.  Furthermore, I strongly believe that it should be much less
clever than it is now.  Our basic textual I/O should be robust by
default, and should not second-guess the specified encoding based on
flimsy heuristics that work 99% of the time.

IMO, our default behavior should allow portable scheme code to write an
arbitrary string of characters to a file in some encoding, and later
read it back, without having to worry about whether the string starts
with something that looks like a BOM, or contains a string that looks
like a coding declaration.  The string might be from a network, and thus
potentially from a malicious source.
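To make the round-trip concern concrete, here is a minimal sketch in
Python rather than Guile. It uses Python's standard "utf-8-sig" codec,
which strips a leading BOM, as a stand-in for any reader that
auto-consumes BOMs. The helper name 'roundtrip' is hypothetical and
exists only for this illustration.

```python
def roundtrip(text, read_encoding):
    """Encode TEXT as UTF-8, then decode it with READ_ENCODING."""
    data = text.encode("utf-8")
    return data.decode(read_encoding)

# A payload whose first character happens to be U+FEFF, e.g. received
# from the network.
payload = "\ufeffhello"

# A reader that does exactly what the encoding says returns the payload
# unchanged.
plain = roundtrip(payload, "utf-8")

# A reader that silently drops a leading BOM loses the first character.
clever = roundtrip(payload, "utf-8-sig")
```

With the plain reader the string survives intact. With the BOM-stripping
reader the caller gets back a different string from the one written,
with no error raised.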

Frankly, I consider this to be a potential source of security flaws in
software built using Guile, and on that basis would advocate removing
the existing cleverness from 'open-input-file' in stable-2.0.  At the
very least it should be removed from master.

Regarding byte-order marks, my preference is that users should explicitly
consume BOMs if that's what they want (ideally using some convenience
procedure provided by Guile).  Sometimes consuming the BOM is the wrong
thing.  For example, if the user is copying a file to another file, or
to a socket, it may be important to preserve the BOM.
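As a rough illustration of the opt-in approach, again in Python rather
than Guile, the sketch below shows a hypothetical convenience procedure
'skip_utf8_bom' next to a verbatim copy that never calls it. It assumes
a seekable binary stream. It is not an existing Guile API.

```python
import io

UTF8_BOM = b"\xef\xbb\xbf"

def skip_utf8_bom(stream):
    """Consume a leading UTF-8 BOM from a seekable binary stream.

    Returns True if a BOM was consumed; otherwise the stream position
    is restored and False is returned.
    """
    start = stream.tell()
    if stream.read(len(UTF8_BOM)) == UTF8_BOM:
        return True
    stream.seek(start)
    return False

data = UTF8_BOM + "hello".encode("utf-8")

# Copying: the BOM is never touched, so the destination is byte-identical.
src = io.BytesIO(data)
dst = io.BytesIO()
dst.write(src.read())
copied = dst.getvalue()

# Reading text: the caller explicitly asks for the BOM to be skipped.
src = io.BytesIO(data)
had_bom = skip_utf8_bom(src)
text = src.read().decode("utf-8")
```

The point of the design is that the decision sits with the caller: the
copy preserves the BOM because nothing consumed it implicitly.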

If others feel strongly that BOMs should be consumed by default, then
the following compromise is about as far as I'd (reluctantly) consider
going:

* 'open-input-file' could perhaps auto-consume a BOM at the beginning of
  the stream, but *only* if the BOM is already in the encoding specified
  by the user (possibly via an explicit call to 'file-encoding').  For
  example, if the specified port encoding is UTF-8, then EF BB BF would
  be consumed, but FE FF or FF FE would be left alone.

* BOMs absolutely should *not* be used to determine the encoding unless
  the user has explicitly asked for coding auto-detection.
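The compromise above could be sketched roughly as follows, in Python
rather than Guile. The table of BOMs per encoding and the helper name
'strip_matching_bom' are assumptions made for this illustration, and the
encoding lookup is simplified to lower-cased names. Note that the
function never changes the encoding. It only drops a BOM that is already
encoded in the encoding the caller specified.

```python
# BOM byte sequences keyed by the encoding the caller specified.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16be": b"\xfe\xff",
    "utf-16le": b"\xff\xfe",
    "utf-32be": b"\x00\x00\xfe\xff",
    "utf-32le": b"\xff\xfe\x00\x00",
}

def strip_matching_bom(data, encoding):
    """Drop a leading BOM only if it matches the specified ENCODING.

    A BOM belonging to any other encoding is left in place, and the
    encoding itself is never inferred from the bytes.
    """
    bom = BOMS.get(encoding.lower())
    if bom is not None and data.startswith(bom):
        return data[len(bom):]
    return data

# EF BB BF is consumed when the specified encoding is UTF-8 ...
utf8_case = strip_matching_bom(b"\xef\xbb\xbfhi", "utf-8")

# ... but FE FF and FF FE are left alone under UTF-8.
be_case = strip_matching_bom(b"\xfe\xffhi", "utf-8")
le_case = strip_matching_bom(b"\xff\xfehi", "utf-8")
```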

Having said all this, if 'open-input-file' is changed to no longer call
'scm_i_scan_for_file_encoding', then I think it's a fine idea to add
BOMs to that function's list of heuristics, though I tend to agree with
Mike that a coding declaration should take precedence, for the reasons
he described.

However, I strongly believe that 'scm_i_scan_for_file_encoding' is the
wrong place to consume BOMs.

What do you think?

      Mark




Thread overview: 22+ messages
2013-01-28 21:42 byte-order marks Andy Wingo
2013-01-28 22:20 ` Mike Gran
2013-01-29  9:03   ` Andy Wingo
2013-01-29  8:22 ` Mark H Weaver [this message]
2013-01-29  9:03   ` Andy Wingo
2013-01-29 13:27     ` Ludovic Courtès
2013-01-29 14:04       ` Andy Wingo
2013-01-29 17:09         ` Mark H Weaver
2013-01-29 19:09           ` Mark H Weaver
2013-01-29 20:52             ` Ludovic Courtès
2013-01-29 20:53           ` Ludovic Courtès
2013-01-30  9:20           ` Andy Wingo
2013-01-30 21:18             ` Ludovic Courtès
2013-01-31  8:52               ` Andy Wingo
2013-01-31  4:40             ` [PATCHES] Discard BOMs at stream start for UTF-{8,16,32} encodings Mark H Weaver
2013-01-31  9:39               ` Andy Wingo
2013-01-31 10:33                 ` Andy Wingo
2013-01-31 18:01                   ` [PATCHES] Discard BOMs at stream start for UTF-{8, 16, 32} encodings Mark H Weaver
2013-01-31 21:42               ` Ludovic Courtès
2013-01-29 19:22 ` byte-order marks Neil Jerram
2013-01-29 21:09   ` Andy Wingo
2013-01-29 21:12     ` Neil Jerram
