From: "Stefan Monnier"
Newsgroups: gmane.emacs.devel
Subject: Re: iso-8859-1 and non-latin-1 chars
Date: Thu, 07 Nov 2002 18:58:50 -0500
Message-ID: <200211072358.gA7Nwoh16551@rum.cs.yale.edu>
References: <200211071457.gA7EvoL08222@rum.cs.yale.edu> <2110-Thu07Nov2002182535+0200-eliz@is.elta.co.il> <200211071706.gA7H6hW09141@rum.cs.yale.edu> <200211072342.IAA10345@etlken.m17n.org>
Original-To: Kenichi Handa
Cc: monnier+gnu/emacs@rum.cs.yale.edu, eliz@is.elta.co.il, emacs-devel@gnu.org

> In article <200211071706.gA7H6hW09141@rum.cs.yale.edu>, "Stefan Monnier" writes:
>
> >> > When encoding text containing non-latin-1 chars with the latin-1
> >> > coding-system, they get output as some kind of escape sequence.
> >>
> >> Yes. IIRC, this is hard-coded in the encoder's C code: it works as if
> >> latin-1 was actually iso-latin-1-wth-esc.
>
> > How can we change that ?
>
> This change will do.
>
> *** european.el.~1.75.~ Wed Nov 6 09:13:16 2002
> --- european.el Fri Nov 8 08:32:12 2002
> ***************
> *** 36,42 ****
>   'iso-latin-1 2 ?1
>   "ISO 2022 based 8-bit encoding for Latin-1 (MIME:ISO-8859-1)."
>   '(ascii latin-iso8859-1 nil nil
> !   nil nil nil nil nil nil nil nil nil nil nil nil t)
>   '((safe-charsets ascii latin-iso8859-1)
>     (mime-charset . iso-8859-1)))
>
> --- 36,42 ----
>   'iso-latin-1 2 ?1
>   "ISO 2022 based 8-bit encoding for Latin-1 (MIME:ISO-8859-1)."
>   '(ascii latin-iso8859-1 nil nil
> !   nil nil nil nil nil nil nil nil nil nil nil t t)
>   '((safe-charsets ascii latin-iso8859-1)
>     (mime-charset . iso-8859-1)))
>
> Or, if this is a problem only for ispell, we can make series
> of "safe" coding-systems for ispell.

I think the problem is only known to bite ispell, but I doubt there are
many other applications that need to (or try to) encode a piece of text
with unsafe chars, so the above patch should be safe.
I also think the patch is correct, since without it the encoder outputs
codes that are not part of latin-1, strictly speaking. If you want such
behavior, you should use iso-latin-1-with-esc.
The same patch should also be applied to the other iso8859-N charsets,
I suppose.

> Or, we can add a global flag, say
> `inhibit-unsafe-iso-escape, to tell encoding routine not to
> produces those escape sequences. Then, ispell can let-bind
> that variable to t on encoding.

That seems overkill since you can use iso-latin-1-with-esc instead.
But it would save us from changing all the coding-systems.


        Stefan
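
For readers of the archive: independently of which fix is chosen, a caller such as ispell could also avoid handing unsafe chars to the encoder in the first place. The sketch below is only an illustration under stated assumptions. It relies on `unencodable-char-position`, which is available in Emacs releases later than the one discussed in this thread, and the `my-` function names are hypothetical, not part of Emacs, ispell, or the patch above.

```elisp
;; Sketch only: `my-latin-1-safe-p' and `my-encode-latin-1-or-nil' are
;; hypothetical helper names, not part of Emacs or ispell.

(defun my-latin-1-safe-p (string)
  "Return non-nil if every char in STRING is encodable in `iso-latin-1'."
  ;; With a non-nil STRING argument, START and END are indexes into
  ;; STRING.  `unencodable-char-position' returns nil when it finds no
  ;; unencodable character in that range.
  (null (unencodable-char-position 0 (length string)
                                   'iso-latin-1 nil string)))

(defun my-encode-latin-1-or-nil (string)
  "Encode STRING with `iso-latin-1', or return nil if it has unsafe chars."
  ;; Refuse to encode rather than let the encoder decide what to emit
  ;; for characters outside ascii and latin-iso8859-1.
  (when (my-latin-1-safe-p string)
    (encode-coding-string string 'iso-latin-1)))
```

A caller could then fall back to another coding system, or skip the text, whenever `my-encode-latin-1-or-nil` returns nil. This does not replace either proposal in the thread; it only shows what a check on the caller's side might look like.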