From: Ken Raeburn <raeburn@raeburn.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-devel@gnu.org
Subject: Re: fencepost error in encoding processing
Date: Mon, 16 Nov 2009 12:25:17 -0500
Message-ID: <06D624B4-D409-4FC3-9EF5-12E90DBE37D0@raeburn.org>
In-Reply-To: <87k4xqoc3k.fsf@gnu.org>

On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
> As far as encoding names are concerned, Bruno Haible pointed me to
> http://www.iana.org/assignments/character-sets and I added a link to
> it in the manual a couple of days ago.

Between your link and Mike's, it looks to me like we should add  
several more characters.

The GNU libc code adds ":" and "," to the list.  The comment in  
iconv_open doesn't list the comma, but the function it calls does  
permit it.  There's also some special handling of "/".
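
One way to see what the local iconv actually accepts is to probe it
directly.  A minimal sketch; the helper name encoding_supported is made
up for illustration, and the answers depend entirely on the platform's
iconv implementation:

```c
#include <iconv.h>

/* Hypothetical helper, not part of Guile or libc: return 1 if the
   local iconv can convert from NAME to UTF-8, 0 otherwise.  */
int
encoding_supported (const char *name)
{
  iconv_t cd = iconv_open ("UTF-8", name);

  if (cd == (iconv_t) -1)
    return 0;            /* unknown name, or conversion not available */

  iconv_close (cd);
  return 1;
}
```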

The IANA list shows names using "+" and parens ("ebcdic-us-37+euro",  
"NF_Z_62-010_(1973)"), as well as colons.

I've skimmed the ICU page Mike pointed to, and it includes names like  
"UTF-16BE,version=1" and "ibm-1149_P100-197,swaplfnl" as well as "+"  
and ":" names, when showing "all aliases".  If we only try to support,  
say, IANA and MIME, then "+" and ":" are used but not "=".

Since we're scanning an Emacs-style coding specification, as long as
whitespace and semicolon aren't on the list, I think we can be
expansive, so let's go ahead and add all of ":,+=/()" to the
allowed set.  The results will still be constrained by whatever the OS
supports; we just don't want Guile to impose additional constraints.
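
As a rough illustration of the scan with the wider set, here is a
standalone sketch.  It is not the read.c code; coding_name_length is a
made-up helper, and it assumes the caller passes the number of bytes
available.  It also guards against NUL bytes explicitly, since strchr
matches the string terminator, and casts to unsigned char before
calling isalnum:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libguile: return how many leading
   bytes of BUF (at most LEN) could belong to a coding system name.  */
size_t
coding_name_length (const char *buf, size_t len)
{
  size_t i = 0;

  while (i < len
         && buf[i] != '\0'   /* strchr would also match the terminator */
         && (isalnum ((unsigned char) buf[i])
             || strchr ("_-.:/,+=()", buf[i]) != NULL))
    i++;

  return i;
}
```

With this, a name such as "ebcdic-us-37+euro" or "NF_Z_62-010_(1973)"
is consumed whole, and the scan still stops at whitespace or a
semicolon.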

Should we allow punctuation in general by calling ispunct (and
explicitly checking for semicolon) instead?  (Note that isalnum and
ispunct are locale-dependent, so they will also accept locale-specific
characters... of course, the new encoding spec hasn't come into effect
yet at that point....)
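
For comparison, a sketch of what the ispunct variant might look like.
Again this is a standalone illustration, not a proposed patch;
coding_name_length_punct is a made-up name, and the behaviour depends
on the current locale:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of libguile: accept any alphanumeric
   or punctuation byte except ';' as part of a coding system name.  */
size_t
coding_name_length_punct (const char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = (unsigned char) buf[i];

      /* Semicolon ends the name; so does anything that is neither
         alphanumeric nor punctuation (whitespace, NUL, controls).  */
      if (c == ';' || !(isalnum (c) || ispunct (c)))
        break;
      i++;
    }

  return i;
}
```

One trade-off to keep in mind: "*" counts as punctuation, so a
declaration written without a space before the closing "-*-" would have
that marker swallowed into the name.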

Ken

     Allow more characters in coding system names in Emacs-style
     declarations.

     * libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
       symbols in coding system names.

diff --git a/libguile/read.c b/libguile/read.c
index 775612a..657e101 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
    i = 0;
    while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
           && pos + i - header < bytes_read
-	 && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
-             || pos[i] == '.'))
+	 && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))
      i++;

    if (i == 0)




