From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ken Raeburn
Newsgroups: gmane.lisp.guile.devel
Subject: Re: fencepost error in encoding processing
Date: Mon, 16 Nov 2009 12:25:17 -0500
Message-ID: <06D624B4-D409-4FC3-9EF5-12E90DBE37D0@raeburn.org>
References: <30581C9F-01D3-4B5F-B413-EF46E1A3D365@raeburn.org> <87k4xqoc3k.fsf@gnu.org>
In-Reply-To: <87k4xqoc3k.fsf@gnu.org>
To: Ludovic Courtès
Cc: guile-devel@gnu.org
List-Id: "Developers list for Guile, the GNU extensibility library"

On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
> As far as encoding names are concerned, Bruno Haible pointed me to
> http://www.iana.org/assignments/character-sets and I added a link to it
> in the manual a couple of days ago.

Between your link and Mike's, it looks to me like we should add
several more characters.

The GNU libc code adds ":" and "," to the list.  The comment in
iconv_open doesn't list the comma, but the function it calls does
permit it.  There's also some special handling of "/".

The IANA list shows names using "+" and parens ("ebcdic-us-37+euro",
"NF_Z_62-010_(1973)"), as well as colons.

I've skimmed the ICU page Mike pointed to, and it includes names like
"UTF-16BE,version=1" and "ibm-1149_P100-197,swaplfnl" as well as "+"
and ":" names, when showing "all aliases".  If we only try to support,
say, IANA and MIME, then "+" and ":" are used but not "=".

Since we're scanning an Emacs-style coding specification, as long as
whitespace and semicolon aren't on the list, I think we can be
expansive, so let's go ahead and include all of ":,+=/()" in the
allowed set.
The results will still be constrained by whatever the OS
supports; we just don't want Guile to impose additional constraints.

Should we allow punctuation in general by calling ispunct (and
explicitly checking for semicolon) instead?  (Note that isalnum and
ispunct will also check for locale-specific characters... of course,
the new encoding spec hasn't come into effect yet....)

Ken


Allow more characters in coding system names in Emacs-style
declarations.

* libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
  symbols in coding system names.

diff --git a/libguile/read.c b/libguile/read.c
index 775612a..657e101 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
   i = 0;
   while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
          && pos + i - header < bytes_read
-         && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
-             || pos[i] == '.'))
+         && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))
     i++;

   if (i == 0)
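
For illustration only, the character test from the patch can be factored
into a standalone predicate.  This is a minimal sketch, not code from
libguile: the names coding_name_char_p and coding_name_length are
hypothetical.  It deviates from the patch in two places, both of which
are assumptions about what is desirable rather than anything settled in
this thread: it rejects NUL explicitly, because strchr counts the
terminating NUL as part of the string and so returns non-NULL for it,
and it casts through unsigned char before calling isalnum, because
passing a negative char value to isalnum is undefined.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in libguile): return nonzero if C may appear
   in a coding system name, using the character set from the patch.  */
static int
coding_name_char_p (char c)
{
  /* strchr (s, '\0') returns a pointer to the terminator, not NULL,
     so NUL has to be excluded before the strchr test.  */
  if (c == '\0')
    return 0;

  /* Cast through unsigned char: isalnum is only defined for values
     representable as unsigned char, or EOF.  */
  return isalnum ((unsigned char) c) || strchr ("_-.:/,+=()", c) != NULL;
}

/* Hypothetical helper: length of the coding system name that starts at
   BUF, examining at most LEN bytes.  Stops at the first character that
   is not allowed, such as whitespace or a semicolon.  */
static size_t
coding_name_length (const char *buf, size_t len)
{
  size_t i = 0;

  while (i < len && coding_name_char_p (buf[i]))
    i++;
  return i;
}
```

With this sketch, coding_name_length applied to "ebcdic-us-37+euro; x"
stops at the semicolon, and names such as "NF_Z_62-010_(1973)" and
"UTF-16BE,version=1" are accepted in full.  As with the patch, whether
the OS recognizes the resulting name is a separate question.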