* fencepost error in encoding processing
@ 2009-11-15 4:36 Ken Raeburn
2009-11-15 8:48 ` Mike Gran
2009-11-16 13:03 ` Ludovic Courtès
0 siblings, 2 replies; 8+ messages in thread
From: Ken Raeburn @ 2009-11-15 4:36 UTC (permalink / raw)
To: guile-devel
The Mac build started failing for me again, complaining about an
unknown encoding "UTF-8;" -- yes, with a semicolon on the end. So it
may not be surprising to find that it's a minor fencepost error in
processing the emacs-style encoding spec in boot-9.scm.
Why did this not show up before for you GNU/Linux guys? :-) Well, it
turns out, the GNU libc version of iconv_open is being "helpful":
iconv_t
iconv_open (const char *tocode, const char *fromcode)
{
/* Normalize the name. We remove all characters beside alpha-
numeric,
'_', '-', '/', '.', and ':'. */
...
If there's reason to believe that all these characters might show up
in valid encoding names, we might want to borrow that list for
scm_i_scan_for_encoding too. In fact, since we don't control the
iconv implementation, we should probably be *at least* as lenient as
glibc in accepting random characters.
Okay to check in?
Ken
Fix fencepost error in processing Emacs-style coding declaration.
diff --git a/libguile/read.c b/libguile/read.c
index e403cc3..775612a 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1513,7 +1513,7 @@ scm_i_scan_for_encoding (SCM port)
if (i == 0)
return NULL;
- encoding = scm_gc_strndup (pos, i + 1, "encoding");
+ encoding = scm_gc_strndup (pos, i, "encoding");
for (i = 0; i < strlen (encoding); i++)
encoding[i] = toupper ((int) encoding[i]);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-15 4:36 fencepost error in encoding processing Ken Raeburn
@ 2009-11-15 8:48 ` Mike Gran
2009-11-15 22:46 ` Neil Jerram
2009-11-16 13:03 ` Ludovic Courtès
1 sibling, 1 reply; 8+ messages in thread
From: Mike Gran @ 2009-11-15 8:48 UTC (permalink / raw)
To: Ken Raeburn, guile-devel
> From: Ken Raeburn raeburn@raeburn.org
> iconv_t
> iconv_open (const char *tocode, const char *fromcode)
> {
> /* Normalize the name. We remove all characters beside alpha-numeric,
> '_', '-', '/', '.', and ':'. */
> ...
>
> If there's reason to believe that all these characters might show up in valid
> encoding names, we might want to borrow that list for scm_i_scan_for_encoding
> too. In fact, since we don't control the iconv implementation, we should
> probably be *at least* as lenient as glibc in accepting random characters.
You're probably right. A rather complete set of aliases for encoding names
can be found by exploring ICU's database of aliases at
http://demo.icu-project.org/icu-bin/convexp
Among those aliases, there are examples of the punctuation in the
iconv_open code snippet above (underscore, hyphen, forward slash, period, and colon)
as well as three others: equals sign, plus sign, and comma. The specification
on iconv_open doesn't place any limits on the characters allowed as input to
iconv_open; however, these eight symbols above should cover any case we're likely
to see.
>
> Okay to check in?
FWIW, it looks right to me.
-Mike Gran
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-15 8:48 ` Mike Gran
@ 2009-11-15 22:46 ` Neil Jerram
2009-11-16 7:32 ` Mike Gran
0 siblings, 1 reply; 8+ messages in thread
From: Neil Jerram @ 2009-11-15 22:46 UTC (permalink / raw)
To: Mike Gran; +Cc: Ken Raeburn, guile-devel
Mike Gran <spk121@yahoo.com> writes:
>> Okay to check in?
>
> FWIW, it looks right to me.
Yes, it looks good to me too. My only comment is that it took me a
while to see what you meant by 'fencepost'. I'm not familiar with
that expression, and thought it was something to do with building on
fencepost.gnu.org. So you might like to add some further explanation to
the commit message when you commit.
Regards,
Neil
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-15 22:46 ` Neil Jerram
@ 2009-11-16 7:32 ` Mike Gran
2009-11-16 22:03 ` Richard E. Harke
0 siblings, 1 reply; 8+ messages in thread
From: Mike Gran @ 2009-11-16 7:32 UTC (permalink / raw)
To: Neil Jerram; +Cc: Ken Raeburn, guile-devel
> From: Neil Jerram <neil@ossau.uklinux.net>
> Yes, it looks good to me too. My only comment is that it took me a
> while to see what you meant by 'fencepost'. I'm not familiar with
> that expression, and thought it was something to do with building on
> fencepost.gnu.org. So you might like to add some further explanation to
> the commit message when you commit.
Sigh. I suddenly feel old. What do they teach you kids these days? ;)
http://www.catb.org/jargon/
http://www.catb.org/jargon/html/F/fencepost-error.html
-Mike
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-15 4:36 fencepost error in encoding processing Ken Raeburn
2009-11-15 8:48 ` Mike Gran
@ 2009-11-16 13:03 ` Ludovic Courtès
2009-11-16 17:25 ` Ken Raeburn
1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2009-11-16 13:03 UTC (permalink / raw)
To: guile-devel
Hello,
Good catch, thanks!
As far as encoding names are concerned, Bruno Haible pointed me to
http://www.iana.org/assignments/character-sets and I added a link to it
in the manual a couple of days ago.
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-16 13:03 ` Ludovic Courtès
@ 2009-11-16 17:25 ` Ken Raeburn
2009-11-16 21:51 ` Ludovic Courtès
0 siblings, 1 reply; 8+ messages in thread
From: Ken Raeburn @ 2009-11-16 17:25 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel
On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
> As far as encoding names are concerned, Bruno Haible pointed me to
> http://www.iana.org/assignments/character-sets and I added a link to
> it
> in the manual a couple of days ago.
Between your link and Mike's, it looks to me like we should add
several more characters.
The GNU libc code adds ":" and "," to the list. The comment in
iconv_open doesn't list the comma, but the function it calls does
permit it. There's also some special handling of "/".
The IANA list shows names using "+" and parens ("ebcdic-us-37+euro",
"NF_Z_62-010_(1973)"), as well as colons.
I've skimmed the ICU page Mike pointed to, and it includes names like
"UTF-16BE,version=1" and "ibm-1149_P100-197,swaplfnl" as well as "+"
and ":" names, when showing "all aliases". If we only try to support,
say, IANA and MIME, then "+" and ":" are used but not "=".
Since we're scanning an Emacs-style coding specification, as long as
whitespace and semicolon aren't on the list, I think we can be
expansive, so let's go ahead and include all of ":,+=/()" to the
allowed set. The results will still be constrained by whatever the OS
supports; we just don't want Guile to impose additional constraints.
Should we allow punctuation in general by calling ispunct (and
explicitly checking for semicolon) instead? (Note that isalnum and
ispunct will also check for locale-specific characters... of course,
the new encoding spec hasn't come into effect yet....)
Ken
Allow more characters in coding system names in Emacs-style
declarations.
* libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
symbols in coding system names.
diff --git a/libguile/read.c b/libguile/read.c
index 775612a..657e101 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
i = 0;
while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
&& pos + i - header < bytes_read
- && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
- || pos[i] == '.'))
+ && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))
i++;
if (i == 0)
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-16 17:25 ` Ken Raeburn
@ 2009-11-16 21:51 ` Ludovic Courtès
0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2009-11-16 21:51 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel
Hi Ken,
Ken Raeburn <raeburn@raeburn.org> writes:
> On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
>> As far as encoding names are concerned, Bruno Haible pointed me to
>> http://www.iana.org/assignments/character-sets and I added a link to
>> it
>> in the manual a couple of days ago.
>
> Between your link and Mike's, it looks to me like we should add
> several more characters.
Yes.
> Since we're scanning an Emacs-style coding specification, as long as
> whitespace and semicolon aren't on the list, I think we can be
> expansive, so let's go ahead and include all of ":,+=/()" to the
> allowed set. The results will still be constrained by whatever the OS
> supports; we just don't want Guile to impose additional constraints.
Agreed.
Note that we follow whatever libunistring implements, which happens to
be IANA AIUI (though it’s case-insensitive.)
> * libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
> symbols in coding system names.
>
> diff --git a/libguile/read.c b/libguile/read.c
> index 775612a..657e101 100644
> --- a/libguile/read.c
> +++ b/libguile/read.c
> @@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
> i = 0;
> while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
> && pos + i - header < bytes_read
> - && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
> - || pos[i] == '.'))
> + && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))
Sounds good to me, except for the missing whitespace before ‘(‘. ;-)
Ludo’.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fencepost error in encoding processing
2009-11-16 7:32 ` Mike Gran
@ 2009-11-16 22:03 ` Richard E. Harke
0 siblings, 0 replies; 8+ messages in thread
From: Richard E. Harke @ 2009-11-16 22:03 UTC (permalink / raw)
To: guile-devel
On Sunday 15 November 2009 11:32:52 pm Mike Gran wrote:
> > From: Neil Jerram <neil@ossau.uklinux.net>
> > Yes, it looks good to me too. My only comment is that it took me a
> > while to see what you meant by 'fencepost'. I'm not familiar with
> > that expression, and thought it was something to do with building on
> > fencepost.gnu.org. So you might like to add some further explanation to
> > the commit message when you commit.
>
> Sigh. I suddenly feel old. What do they teach you kids these days? ;)
>
> http://www.catb.org/jargon/
>
> http://www.catb.org/jargon/html/F/fencepost-error.html
>
> -Mike
Hey. I am old. Been programming since 1969. This the first time
I've heard of a "fence-post error".
Richard
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-11-16 22:03 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-15 4:36 fencepost error in encoding processing Ken Raeburn
2009-11-15 8:48 ` Mike Gran
2009-11-15 22:46 ` Neil Jerram
2009-11-16 7:32 ` Mike Gran
2009-11-16 22:03 ` Richard E. Harke
2009-11-16 13:03 ` Ludovic Courtès
2009-11-16 17:25 ` Ken Raeburn
2009-11-16 21:51 ` Ludovic Courtès
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).