From: Oliver Scholz
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Unicode Lisp reader escapes
Date: Mon, 15 May 2006 04:49:04 +0200
Message-ID: <87ves8ngtb.fsf@gmx.de>
References: <17491.34779.959316.484740@parhasard.net> <877j4z5had.fsf@gmx.de> <87irohfrx1.fsf@gmx.de> <87iroarr9i.fsf-monnier+emacs@gnu.org> <87d5egrb4c.fsf-monnier+emacs@gnu.org> <87ves8p0us.fsf-monnier+emacs@gnu.org>
Cc: emacs-devel@gnu.org, rms@gnu.org, handa@m17n.org, alkibiades@gmx.de
To: Stefan Monnier
In-Reply-To: <87ves8p0us.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Sun, 14 May 2006 20:55:50 -0400")

Stefan Monnier writes:

>>> Handa says that telling people "don't use utf-8" solves the problem.
>> In addition to "don't use unify-8859-on-decoding", which causes
>> similar problems (which we already bumped into a few years ago when we
>> included unify-8859-on-decoding) with iso8859 chars and coding systems
>> like iso-2022.
>
>> There is a way for a Lisp file to specify a coding system which isn't
>> utf-8.  Is there a way for a Lisp file to specify that
>> unify-8859-on-decoding should not be used when reading it?
>
>> If not, maybe we should make one.
>
>> Here's one idea: if the -*- line specifies `coding' and specifies
>> the mode `emacs-lisp', then force unify-8859-on-decoding to nil
>> for that file.

Besides the work already mentioned, this would also require turning
unify-8859-on-decoding-mode into a buffer-local minor mode. That, in
turn, would require making the necessary translation tables
buffer-local somehow (!).

> Forcing it to nil for a particular file is maybe too much work to implement
> compared to the benefit.
> Maybe an easier solution is to add a file-local variable
> `no-8859-unification' such that if that file is loaded in an Emacs which
> is configured to use unify-8859-on-decoding it signals an error.
>
> It could then be added to files like ucs-tables.el.

[Nitpick: ucs-tables.el is encoded in ISO 2022. Most of Emacs' files
containing m17n characters are, AFAIK. I don't know the reason; maybe
it is because ISO 2022 is 7-bit but still ASCII-compatible.]

How about just issuing a warning whose message describes the effects
and explains how to change the settings? E.g.:

(when (and (memq (coding-system-base buffer-file-coding-system)
                 '(mule-utf-8 utf-7 mule-utf-16 ; ...
                   mule-utf-16be-with-signature))
           utf-fragment-on-decoding ; default is nil
           (let ((charsets (find-charset-region (point-min) (point-max))))
             (or (memq 'greek-iso8859-7 charsets)
                 (memq 'cyrillic-iso8859-5 charsets))))
  (warn "You have enabled ... but this source file contains characters from ...
Emacs has ... This might or might not be what you want ...
To restore the defaults do ... bla bla ...
... you might want to use `emacs-mule' as coding system for Emacs Lisp source files ..."))

And similarly for the other cases.

[FWIW, I think that `emacs-mule' (as Handa suggested) is a perfectly
valid file encoding for Emacs Lisp source files. Since it is, by
definition, unambiguous w.r.t. the specified charsets, emacs-mule has
none of the problems we are discussing. Of course, Emacs is probably
the only text editor that can deal with emacs-mule, but that would
hardly matter for Elisp sources. I can think of only two drawbacks:

1. You can't simply insert such files into, or attach them to, mail or
   Usenet postings. You have to zip, tar, base64 etc. them first.

2. Specifying particular charsets might be exactly what an author does
   *not* want.
   The only way to deal with the latter, though, would be to modify the
   Lisp printer for writing *.elc files so that it escapes non-ASCII
   characters wherever possible with the new \u syntax. This would be
   another solution to the problem we are discussing.]

    Oliver
-- 
Oliver Scholz               26 Floréal an 214 de la Révolution
Ostendstr. 61               Liberté, Egalité, Fraternité!
60314 Frankfurt a. M.
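A minimal sketch of the printer-side idea mentioned in the aside, i.e. writing the non-ASCII characters of a string as \u escapes so that the printed form is pure ASCII. It assumes a current Emacs, where characters are Unicode code points and the reader accepts \uXXXX and \UXXXXXXXX escapes in string literals. The names my-escape-non-ascii and my-prin1-ascii are hypothetical helpers, not part of Emacs, and nothing here hooks into the byte compiler.

```elisp
;; Hypothetical helpers, not part of Emacs: they only illustrate
;; printing strings with non-ASCII characters as \u/\U escapes.

(defun my-escape-non-ascii (string)
  "Return STRING with each non-ASCII character replaced by a \\u or \\U escape.
ASCII characters are copied unchanged.  Characters that have no
Unicode code point (such as raw bytes) are also left as they are."
  (mapconcat
   (lambda (char)
     ;; `encode-char' returns nil when the `ucs' charset lacks CHAR.
     (let ((code (and (>= char 128) (encode-char char 'ucs))))
       (cond ((null code) (char-to-string char))        ; ASCII or unmappable
             ((<= code #xFFFF) (format "\\u%04X" code)) ; BMP: 4 hex digits
             (t (format "\\U%08X" code)))))             ; beyond BMP: 8 digits
   string ""))

(defun my-prin1-ascii (string)
  "Return the printed representation of STRING using only ASCII.
`prin1-to-string' already adds the surrounding double quotes and
escapes embedded quotes and backslashes, so the non-ASCII escapes
can safely be applied to its result afterwards."
  (my-escape-non-ascii (prin1-to-string string)))
```

Since the escapes are only applied after `prin1-to-string' has done its own quoting, reading the result back with `read' should yield a string equal to the original.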