From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: status of utf-8.el, etc [Re: Several serious problems]
Date: Wed, 25 Sep 2002 16:01:45 +0900 (JST)
Message-ID: <200209250701.QAA10989@etlken.m17n.org>
References: <200208190748.QAA14278@etlken.m17n.org> <200208291325.WAA03596@etlken.m17n.org> <200208291732.g7THWRU11411@rum.cs.yale.edu>
To: d.love@dl.ac.uk
Cc: rms@gnu.org, monnier+gnu/emacs@rum.cs.yale.edu, keichwa@gmx.net, emacs-devel@gnu.org
In-reply-to: (message from Dave Love on 12 Sep 2002 23:38:48 +0100)

In article , Dave Love writes:

> Richard Stallman writes:
>> For instance, the RC version of mule-utf-8 doesn't translate
>> cyrillic-iso8859-5, and the Cyrillic coding systems don't translate
>> mule-unicode-0100-24ff.
>>
>> We could consider adding that support in RC.  Is it a safe change?

> It won't break anything if done correctly, but I don't remember how
> much of a change it is relative to the 21.2 code and I don't know who
> might have been testing it, if anyone.

I noticed that some combinations of unify-8859-on-encoding-mode,
utf-8-fragment-on-decoding, and utf-8-translate-cjk don't work in
HEAD.  So I wrote a fairly comprehensive test suite for them (attached
at the end).  Since the test suite revealed several bugs, I decided to
fix them in HEAD first, before working on RC.  I've finished the
following:

(1) Fixed the following bugs.

  (1-1) unify-8859-on-encoding-mode can't be turned off safely.
        For instance, after it is turned off, iso-latin-1 can no longer
        encode Latin-1 chars.
  (1-2) utf-8-translate-cjk can never be turned off once it has been
        turned on.
  (1-3) When utf-8-fragment-on-decoding is non-nil, the utf-16-* coding
        systems don't encode CJK chars correctly even if
        utf-8-translate-cjk is non-nil.
  (1-4) encode-char and decode-char don't reflect utf-8-translate-cjk.

(2) Renamed tables and variables.  We should have cleaner names before
    people start using them.

  (2-1) Since utf-8-fragment-on-decoding and utf-8-translate-cjk also
        apply to utf-16, I dropped "-8" from their names (they are now
        utf-fragment-on-decoding and utf-translate-cjk).
  (2-2) Made the names of the translation tables differ from the names
        of their body char-tables, to avoid confusion.  The result is
        as follows:

    (2-2-1) Translation tables and translation hash tables (not
            variables)

      old                                 new
      ---                                 ---
      ucs-mule-to-mule-unicode            utf-translation-table-for-encode
        (mule-utf-8/16 use it for encoding)
      utf-translation-table-for-decode    utf-translation-table-for-decode
        (mule-utf-8/16 use it for decoding)
      utf-8-subst-rev-table               utf-subst-table-for-encode
        (mule-utf-8/16 use it for encoding)
      utf-8-subst-table                   utf-subst-table-for-decode
        (mule-utf-8/16 use it for decoding)

    (2-2-2) Mapping tables (variables) that populate the tables above

      old                                 new
      ---                                 ---
      ucs-mule-to-mule-unicode            ucs-mule-to-mule-unicode
        (populates utf-translation-table-for-encode when
        unify-8859-on-encoding-mode is non-nil)
      utf-8-subst-table                   ucs-unicode-to-mule-cjk
        (populates utf-subst-table-for-decode when utf-translate-cjk
        is non-nil)
      utf-8-subst-rev-table               ucs-mule-cjk-to-unicode
        (populates utf-subst-table-for-encode when utf-translate-cjk
        is non-nil)
      utf-8-fragmentation-table           utf-fragmentation-table
        (populates utf-translation-table-for-decode when
        utf-fragment-on-decoding is non-nil)
      --not_exist--                       utf-defragmentation-table
        (populates utf-translation-table-for-encode when
        unify-8859-on-encoding-mode is nil and utf-fragment-on-decoding
        is non-nil)
      utf-8-translation-table-for-decode  --deleted--

Do you have better ideas for these names?
If not, I'll install the changes soon.

---
Ken'ichi HANDA
handa@etl.go.jp

The attachment contains two files: utf-test.el and result.txt.
result.txt is the result of loading utf-test.el, running
M-x utf-testsuite RET, and viewing the variable utf-testsuite-result
in the current Emacs.  After my modifications, all elements are `t'.

[uuencoded attachment temp.tar.gz not reproduced]