From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Several serious problems
Date: Mon, 2 Sep 2002 10:28:25 +0900 (JST)
Message-ID: <200209020128.KAA08644@etlken.m17n.org>
References: <200208190748.QAA14278@etlken.m17n.org> <200208291325.WAA03596@etlken.m17n.org>
In-Reply-To: (message from Richard Stallman on Sun, 01 Sep 2002 20:01:54 -0400)
Original-To: rms@gnu.org
Cc: d.love@dl.ac.uk, monnier+gnu/emacs@rum.cs.yale.edu, keichwa@gmx.net, emacs-devel@gnu.org

Richard Stallman writes:

> That depends on whether you include code in utf-8.el that encodes
> those charsets. If not, you need that change.

> In that case, I will install that change presently, and then we can
> study the question of whether to include the code in utf-8.el instead.

> What does that code in utf-8.el do, and how safe a change is it?

It defines two CCL programs that decode and encode UTF-8 byte
sequences, and defines the coding system mule-utf-8 using those CCL
programs.

I'll attach the change needed to enable RC's utf-8 to encode latin-X
plus alpha (e.g. Thai). The docstring of mule-utf-8 may need
improvement.

As the change is very small and that code has been in HEAD for more
than one month, I think the change is quite safe. I recommend
installing it in RC.

I also checked the code to some extent with this test suite:

(dolist (charset (delq 'ascii
                       (delq 'eight-bit-control
                             (delq 'eight-bit-graphic
                                   (coding-system-get 'mule-utf-8 'safe-charsets)))))
  (let ((dimension (charset-dimension charset))
        str)
    (if (= dimension 1)
        (setq str (string (make-char charset 33) (make-char charset 34)))
      (setq str (string (make-char charset 33 33) (make-char charset 33 34))))
    (or (memq 'mule-utf-8 (find-coding-systems-string str))
        (not (string-match "\357\277\275" ; UTF-8 form of U+FFFD
                           (encode-coding-string str 'mule-utf-8)))
        (error (format "%s is not supported" charset)))))

---
Ken'ichi HANDA
handa@etl.go.jp

*** utf-8.el.~1.9.4.2.~	Tue Jul 23 13:54:13 2002
--- utf-8.el	Mon Sep  2 10:28:26 2002
***************
*** 269,275 ****
      (loop
       (if (r5 < 0)
           ((r1 = -1)
!           (read-multibyte-character r0 r1))
         (;; We have already done read-multibyte-character.
          (r0 = r5)
          (r1 = r6)
--- 269,277 ----
      (loop
       (if (r5 < 0)
           ((r1 = -1)
!           (read-multibyte-character r0 r1)
!           (translate-character ucs-mule-to-mule-unicode r0 r1))
! 
         (;; We have already done read-multibyte-character.
          (r0 = r5)
          (r1 = r6)
***************
*** 392,397 ****
--- 394,423 ----
    mule-unicode-0100-24ff
    mule-unicode-2500-33ff
    mule-unicode-e000-ffff
+   latin-iso8859-2 (*)
+   latin-iso8859-3 (*)
+   latin-iso8859-4 (*)
+   cyrillic-iso8859-5 (*)
+   arabic-iso8859-6 (*)
+   greek-iso8859-7 (*)
+   hebrew-iso8859-8 (*)
+   latin-iso8859-9 (*)
+   latin-iso8859-14 (*)
+   latin-iso8859-15 (*)
+   chinese-sisheng (*)
+   ethiopic (*)
+   ipa (*)
+   lao (*)
+   katakana-jisx0201 (*)
+   thai-tis620 (*)
+   tibetan (*)
+   vietnamese-viscii-lower (*)
+   vietnamese-viscii-upper (*)
+ 
+ Among them, the charsets labeled \"(*)\" are supported only on
+ encoding.  That means, they are correctly encoded to UTF-8, but are
+ decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
+ mule-unicode-2500-33ff, not to the original charsets.
  
  Unicode characters out of the ranges U+0000-U+33FF and U+E200-U+FFFF
  are decoded into sequences of eight-bit-control and eight-bit-graphic
***************
*** 409,415 ****
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
!     mule-unicode-e000-ffff)
    (mime-charset . utf-8)
    (coding-category . coding-category-utf-8)
    (valid-codes (0 . 255))))
--- 435,460 ----
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
!     mule-unicode-e000-ffff
!     latin-iso8859-2
!     latin-iso8859-3
!     latin-iso8859-4
!     cyrillic-iso8859-5
!     arabic-iso8859-6
!     greek-iso8859-7
!     hebrew-iso8859-8
!     latin-iso8859-9
!     latin-iso8859-14
!     latin-iso8859-15
!     chinese-sisheng
!     ethiopic
!     ipa
!     lao
!     katakana-jisx0201
!     thai-tis620
!     tibetan
!     vietnamese-viscii-lower
!     vietnamese-viscii-upper)
    (mime-charset . utf-8)
    (coding-category . coding-category-utf-8)
    (valid-codes (0 . 255))))
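
A note on the "\357\277\275" literal in the test above: it is the UTF-8 byte sequence of U+FFFD (REPLACEMENT CHARACTER) written as octal escapes, so a match in the encoded string would mean that a character could not be encoded. The following is only a minimal sketch for confirming that byte sequence. It assumes a recent Emacs (23 or later) in which `utf-8' is a built-in coding system and strings hold Unicode code points. The helper name `utf-8-octal-bytes' is hypothetical and is not part of utf-8.el or of the patch.

```elisp
;; Hypothetical helper, not part of utf-8.el or of the patch above.
;; Assumes Emacs 23 or later, where `utf-8' is a built-in coding system.
(defun utf-8-octal-bytes (code-point)
  "Return the UTF-8 bytes of CODE-POINT as a list of octal strings."
  (mapcar (lambda (byte) (format "%o" byte))
          ;; `encode-coding-string' returns a unibyte string, so each
          ;; element seen by `mapcar' is one raw byte (0..255).
          (encode-coding-string (string code-point) 'utf-8)))

;; U+FFFD REPLACEMENT CHARACTER, the marker the test suite searches for.
(utf-8-octal-bytes #xFFFD)
```

Under those assumptions the last form should evaluate to ("357" "277" "275"), which corresponds to the bytes EF BF BD and to the octal escapes used in the test.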