From: Ken Raeburn <raeburn@raeburn.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-devel@gnu.org
Subject: Re: fencepost error in encoding processing
Date: Mon, 16 Nov 2009 12:25:17 -0500
Message-ID: <06D624B4-D409-4FC3-9EF5-12E90DBE37D0@raeburn.org>
In-Reply-To: <87k4xqoc3k.fsf@gnu.org>
On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
> As far as encoding names are concerned, Bruno Haible pointed me to
> http://www.iana.org/assignments/character-sets and I added a link to
> it in the manual a couple of days ago.
Between your link and Mike's, it looks to me like we should add
several more characters.
The GNU libc code adds ":" and "," to the list. The comment in
iconv_open doesn't list the comma, but the function it calls does
permit it. There's also some special handling of "/".
The IANA list shows names using "+" and parens ("ebcdic-us-37+euro",
"NF_Z_62-010_(1973)"), as well as colons.
I've skimmed the ICU page Mike pointed to, and it includes names like
"UTF-16BE,version=1" and "ibm-1149_P100-197,swaplfnl" as well as "+"
and ":" names, when showing "all aliases". If we only try to support,
say, IANA and MIME, then "+" and ":" are used but not "=".
Since we're scanning an Emacs-style coding specification, as long as
whitespace and semicolon aren't on the list, I think we can be
expansive, so let's go ahead and add all of ":,+=/()" to the
allowed set. The results will still be constrained by whatever the OS
supports; we just don't want Guile to impose additional constraints.
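
A minimal sketch of the expansive character test described above, written as a standalone predicate. The names is_coding_name_char and coding_name_length are hypothetical and are not taken from Guile. Two details are assumptions about how one might harden such a test: strchr also matches the string's terminating NUL, so a NUL byte is rejected explicitly, and the argument to isalnum is cast to unsigned char because the ctype functions are only defined for values in that range (or EOF).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Guile): nonzero if C may appear in a
   coding system name under the expansive rule discussed above.  */
static int
is_coding_name_char (char c)
{
  /* strchr also finds the terminating NUL, so reject it up front.  */
  if (c == '\0')
    return 0;
  /* isalnum needs a value representable as unsigned char (or EOF).  */
  return isalnum ((unsigned char) c) || strchr ("_-.:/,+=()", c) != NULL;
}

/* Hypothetical helper: length of the leading run of name characters in
   BUF, looking at no more than LEN bytes.  */
static size_t
coding_name_length (const char *buf, size_t len)
{
  size_t i = 0;
  while (i < len && is_coding_name_char (buf[i]))
    i++;
  return i;
}
```

With this sketch, scanning "utf-8; foo" stops at the semicolon, and a name such as "NF_Z_62-010_(1973)" is accepted in full.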
Should we allow punctuation in general by calling ispunct (and
explicitly checking for semicolon) instead? (Note that isalnum and
ispunct will also check for locale-specific characters... of course,
the new encoding spec hasn't come into effect yet....)
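
For comparison, a rough sketch of the ispunct-based alternative. The function name is hypothetical, and the behavior shown assumes the default "C" locale; under other locales isalnum and ispunct may accept additional characters, as noted above.

```c
#include <ctype.h>

/* Hypothetical alternative (not part of Guile): accept any alphanumeric
   or punctuation character except the semicolon.  */
static int
is_coding_name_char_punct (char c)
{
  /* The ctype functions need a value representable as unsigned char.  */
  unsigned char uc = (unsigned char) c;

  if (c == ';')
    return 0;
  return isalnum (uc) || ispunct (uc);
}
```

One consequence of this variant is that it also accepts characters such as "*" and quotes, so a declaration written without whitespace before the closing "-*-" would have that marker absorbed into the name.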
Ken
Allow more characters in coding system names in Emacs-style
declarations.
* libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
symbols in coding system names.
diff --git a/libguile/read.c b/libguile/read.c
index 775612a..657e101 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
i = 0;
while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
&& pos + i - header < bytes_read
- && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
- || pos[i] == '.'))
+ && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))
i++;
if (i == 0)