From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oliver Scholz
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Unicode Lisp reader escapes
Date: Wed, 17 May 2006 14:37:02 +0200
Message-ID: <87iro4hlox.fsf@gmx.de>
References: <17491.34779.959316.484740@parhasard.net>
 <87iroarr9i.fsf-monnier+emacs@gnu.org>
 <87d5egrb4c.fsf-monnier+emacs@gnu.org>
 <87ves8p0us.fsf-monnier+emacs@gnu.org>
 <87ves8ngtb.fsf@gmx.de> <87u07qcnaa.fsf@gmx.de> <87y7x289lr.fsf@gmx.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Original-To: rms@gnu.org
Cc: storm@cua.dk, emacs-devel@gnu.org, monnier@iro.umontreal.ca,
 handa@m17n.org, Oliver Scholz
In-Reply-To: (Richard Stallman's message of "Tue, 16 May 2006 23:45:33 -0400")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/23.0.0 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:54629

Growing a bit tired of this discussion, I hacked a kludge that might
do what you want. It introduces a variable
`byte-compile-no-char-translation' that is meant to be put into the
Local Variables section of an Emacs Lisp source file in order to
inhibit the effects of `utf-fragment-on-decoding' and
`unify-8859-on-decoding-mode'.

In other words: This patch deals only with the issues that *I* can
understand. I seem to recall that Handa also mentioned some effects of
certain CJK language environments.

It is *absolutely vital* that Kenichi Handa reviews this patch. I am
not entirely sure whether this breaks something or not.

With my patch, decode_coding_iso2022 does not look up characters in
Vstandard_translation_table_for_decode at all if
`byte-compile-no-char-translation' is non-nil. This might be wrong.
Vstandard_translation_table_for_decode is not empty by default. I
guess instead of inhibiting its use one could just temporarily set its
parent at about the same place. But maybe this is unnecessary.
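To make the intended usage concrete, here is a minimal sketch of a source file that opts in, assuming the patch in this message is applied. The file name and the `my-greek-table-alpha' variable are hypothetical; only `byte-compile-no-char-translation' comes from the patch.

```elisp
;;; my-greek-table.el --- hypothetical package containing non-ASCII literals

;; Hypothetical constant: a character whose charset should not depend on
;; the decoding options of whoever byte-compiles this file.
(defvar my-greek-table-alpha ?α
  "Greek small letter alpha, kept in its default mule charset when compiled.")

(provide 'my-greek-table)

;; Local Variables:
;; byte-compile-no-char-translation: t
;; End:

;;; my-greek-table.el ends here
```

Since the patch registers the variable as a safe local variable with the predicate `booleanp', a value of `t' should be accepted without a prompt. Without the patch, or with the variable left at nil, decoding follows the user's `utf-fragment-on-decoding' and `unify-8859-on-decoding-mode' settings as before.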
decode_coding_sjis_big5 refers to
Vstandard_translation_table_for_decode, too. I did not modify it,
though, thus introducing a possible inconsistency. The reason is that
I don't understand CJK issues and I don't understand this encoding.

Note: Even with the remaining issues weeded out, IMNSHO this patch is
worse than the two other solutions:

(1) Tell users to use emacs-mule. Or:

(2) Remove `unify-8859-on-decoding-mode' and
    `utf-fragment-on-decoding'.

The reasoning goes as follows:

Check: Are `unify-8859-on-decoding-mode' and
`utf-fragment-on-decoding' useful options?

    If no: Remove them, since they cause only trouble.

    If yes: Then a user who set them will want them for all affected
        characters. The choice for unification/fragmentation should
        not be the choice of the programmer of the Lisp package; it
        should be the choice of the user. (To quote a future user,
        complaining on gnu-emacs-help: "The heck! Why do I have only
        hollow boxes for my Greek characters after byte
        compilation??? It's all fine in the source file!!!")

        Exception: In the event that the particular choice of
        charsets is important for a Lisp package: Use `emacs-mule'!

    Oliver

Index: lisp/files.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/files.el,v
retrieving revision 1.836
diff -u -r1.836 files.el
--- lisp/files.el    16 May 2006 18:33:31 -0000    1.836
+++ lisp/files.el    17 May 2006 12:08:43 -0000
@@ -2361,6 +2361,7 @@
     (left-margin . integerp) ;; C source code
     (no-update-autoloads . booleanp)
     (tab-width . integerp) ;; C source code
+    (byte-compile-no-char-translation . booleanp) ;; C source code
     (truncate-lines . booleanp))) ;; C source code
 
 (put 'c-set-style 'safe-local-eval-function t)
Index: lisp/emacs-lisp/bytecomp.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/emacs-lisp/bytecomp.el,v
retrieving revision 2.185
diff -u -r2.185 bytecomp.el
--- lisp/emacs-lisp/bytecomp.el    16 May 2006 10:05:09 -0000    2.185
+++ lisp/emacs-lisp/bytecomp.el    17 May 2006 12:08:45 -0000
@@ -1673,6 +1673,14 @@
             (enable-local-eval nil))
         ;; Arg of t means don't alter enable-local-variables.
         (normal-mode t)
+
+        ;; KLUDGE: `byte-compile-no-char-translation' should affect
+        ;; how characters are decoded.  But at this point decoding
+        ;; already happened.  So we insert the file contents again.
+        (when byte-compile-no-char-translation
+          (erase-buffer)
+          (insert-file-contents filename))
+        
         (setq filename buffer-file-name))
       ;; Set the default directory, in case an eval-when-compile uses it.
       (setq default-directory (file-name-directory filename)))
Index: src/coding.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/coding.c,v
retrieving revision 1.336
diff -u -r1.336 coding.c
--- src/coding.c    8 May 2006 05:25:02 -0000    1.336
+++ src/coding.c    17 May 2006 12:08:50 -0000
@@ -405,6 +405,15 @@
 
 Lisp_Object Qcoding_system_p, Qcoding_system_error;
 
+/* This variable is meant to turn off character translation during byte
+   compilation.  */
+
+Lisp_Object Vbyte_compile_no_char_translation;
+
+Lisp_Object empty_translation_table;
+Lisp_Object Qucs_translation_table_for_decode, Qutf_translation_table_for_decode;
+Lisp_Object Qunify_8859_on_decoding_mode, Qutf_fragment_on_decoding;
+
 /* Coding system emacs-mule and raw-text are for converting only
    end-of-line format.  */
 Lisp_Object Qemacs_mule, Qraw_text;
@@ -1849,7 +1858,7 @@
               else
                 {
                   translation_table = coding->translation_table_for_decode;
-                  if (NILP (translation_table))
+                  if (NILP (translation_table) && NILP (Vbyte_compile_no_char_translation))
                     translation_table = Vstandard_translation_table_for_decode;
                 }
 
@@ -4938,8 +4947,48 @@
           dst_bytes--;
           extra = coding->spec.ccl.cr_carryover;
         }
-      ccl_coding_driver (coding, source, destination + extra,
-                         src_bytes, dst_bytes, 0);
+
+      /* KLUDGE: Inhibit unification and/or fragmentation.  This is
+         meant for byte compiling Emacs Lisp source files.  For CCL
+         based coding systems it has to be done here, because we want
+         it only for decoding.  We temporarily swap the affected
+         translation tables in Vtranslation_table_vector with an empty
+         translation table.  */
+      if (! NILP (Vbyte_compile_no_char_translation)
+          && (! NILP (SYMBOL_VALUE (Qunify_8859_on_decoding_mode))
+              || ! NILP (SYMBOL_VALUE (Qutf_fragment_on_decoding))))
+        {
+          if (NILP (empty_translation_table))
+            {
+              empty_translation_table =
+                call0 (intern ("make-translation-table"));
+            }
+
+          Lisp_Object ucs_tt = Fget (Qucs_translation_table_for_decode, Qtranslation_table);
+          Lisp_Object ucs_id = Fget (Qucs_translation_table_for_decode, Qtranslation_table_id);
+
+          Lisp_Object utf_tt = Fget (Qutf_translation_table_for_decode, Qtranslation_table);
+          Lisp_Object utf_id = Fget (Qutf_translation_table_for_decode, Qtranslation_table_id);
+
+          /* Should this be `unwind-protect'ed?  */
+
+          Faset (Vtranslation_table_vector, ucs_id, Fcons (Qucs_translation_table_for_decode,
+                                                           empty_translation_table));
+          Faset (Vtranslation_table_vector, utf_id, Fcons (Qutf_translation_table_for_decode,
+                                                           empty_translation_table));
+
+          ccl_coding_driver (coding, source, destination + extra,
+                             src_bytes, dst_bytes, 0);
+
+          Faset (Vtranslation_table_vector, ucs_id, Fcons (Qucs_translation_table_for_decode,
+                                                           ucs_tt));
+          Faset (Vtranslation_table_vector, utf_id, Fcons (Qutf_translation_table_for_decode,
+                                                           utf_tt));
+
+        }
+      else ccl_coding_driver (coding, source, destination + extra,
+                              src_bytes, dst_bytes, 0);
+      
       if (coding->eol_type != CODING_EOL_LF)
         {
           coding->produced += extra;
@@ -7852,6 +7901,34 @@
   defsubr (&Sset_coding_priority_internal);
   defsubr (&Sdefine_coding_system_internal);
 
+  DEFVAR_LISP ("byte-compile-no-char-translation", &Vbyte_compile_no_char_translation,
+               doc: /* Don't translate characters during byte compilation.
+
+Options like `utf-fragment-on-decoding' or the minor mode
+`unify-8859-on-decoding-mode' modify the way Emacs maps file encodings
+to mule charsets.  Since *.elc files are encoded in emacs-mule, such
+settings are preserved in the compiled file.  If this variable is
+non-nil, Emacs uses the default mule charsets.
+
+You can set this variable in the local variables section of a file.  */);
+  Vbyte_compile_no_char_translation = Qnil;
+
+  empty_translation_table = Qnil;
+  staticpro (&empty_translation_table);
+  
+  Qucs_translation_table_for_decode = intern ("ucs-translation-table-for-decode");
+  staticpro (&Qucs_translation_table_for_decode);
+
+  Qutf_translation_table_for_decode = intern ("utf-translation-table-for-decode");
+  staticpro (&Qutf_translation_table_for_decode);
+
+  Qunify_8859_on_decoding_mode = intern ("unify-8859-on-decoding-mode");
+  staticpro (&Qunify_8859_on_decoding_mode);
+
+  Qutf_fragment_on_decoding = intern ("utf-fragment-on-decoding");
+  staticpro (&Qutf_fragment_on_decoding);
+  
+  
   DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
                doc: /* List of coding systems.
 

-- 
Oliver Scholz               28 Floréal an 214 de la Révolution
Ostendstr. 61               Liberté, Egalité, Fraternité!
60314 Frankfurt a. M.