Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again

unofficial mirror of emacs-devel@gnu.org 

* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
       [not found] ` <20200105203053.36B932105B@vcs0.savannah.gnu.org>
@ 2020-01-05 20:55   ` Stefan Monnier
  2020-01-06 17:07     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2020-01-05 20:55 UTC
  To: emacs-devel; +Cc: Paul Eggert

>     Go back to iso-2022-7bit for titdic-cnv.el again
>     
>     * admin/notes/unicode: Mention this.
>     * lisp/international/titdic-cnv.el:
>     Go back to iso-2022-7bit for this file, since utf-8-emacs unified
>     characters that tsang-quick-converter did not want unified.
>     Problem reported by Eli Zaretskii in:
>     https://lists.gnu.org/r/emacs-devel/2020-01/msg00156.html

FWIW, when I converted it to utf-8, I (tried to) make sure the generated
.elc files were bit-for-bit identical.  Looking at the URL you mention,
I don't see any concrete example of a problem introduced by utf-8.

IOW, AFAICT, running the code from titdic-cnv.el may give different
results, but running it from titdic-cnv.elc probably does not.  At least
when I tested it, those differences got lost somewhere, probably when we
byte-compiled some of the files.  If my analysis is right, this has been
the case for many years (probably since Emacs-23).

If there really is a concrete difference, then I think we should add
a test for it (and find a better fix that still lets us use utf-8 ;-).
And if there isn't, then I'd much prefer that we revert to utf-8-emacs
(as it stands, this is the one and only bundled Elisp file that needs
a non-nil load-source-file-function).
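
A minimal sketch of what such a check might look like, assuming the
test only needs to know which entries of a (KEY BIG5 CNS) table really
differ.  The helper name `my-titdic-differing-entries' is hypothetical
and does not exist in titdic-cnv.el; the two sample entries are copied
from the UTF-8 side of the patch.

```elisp
;; Hypothetical helper for illustration; not part of titdic-cnv.el.
(defun my-titdic-differing-entries (table)
  "Return the keys in TABLE whose second and third strings differ.
TABLE is a list of (KEY BIG5-STRING CNS-STRING) entries.  The
comparison uses `string=', which ignores text properties."
  (let (keys)
    (dolist (entry table (nreverse keys))
      (unless (string= (nth 1 entry) (nth 2 entry))
        (push (car entry) keys)))))

;; Two entries from the UTF-8 version of the punctuation table.
(my-titdic-differing-entries
 '(("=" "＝﹦" "＝﹥")
   ("+" "＋﹢" "＋﹢")))
;; => ("=")
```

Run over the whole table, a helper like this would list the few keys
where the choice between the two columns can matter at all.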


        Stefan


>  admin/notes/unicode              |  10 +-
>  lisp/international/titdic-cnv.el | 237 ++++++++++++++++++++-------------------
>  2 files changed, 128 insertions(+), 119 deletions(-)
>
> diff --git a/admin/notes/unicode b/admin/notes/unicode
> index 6cb1b76..1e41859 100644
> --- a/admin/notes/unicode
> +++ b/admin/notes/unicode
> @@ -256,11 +256,19 @@ nontrivial changes to the build process.
>  
>  	etc/tutorials/TUTORIAL.ja
>  
> + * iso-2022-7bit
> +
> +     This file contains multiple Chinese charsets, and converting it
> +     to UTF-8 would lose the charset property and would change the
> +     code's behavior.  Although this could be worked around by
> +     propertizing the strings, that hasn't been done.
> +
> +	lisp/international/titdic-cnv.el
> +
>   * utf-8-emacs
>  
>       These files contain characters that cannot be encoded in UTF-8.
>  
> -	lisp/international/titdic-cnv.el
>  	lisp/language/ethio-util.el
>  	lisp/language/ethiopic.el
>  	lisp/language/ind-util.el
> diff --git a/lisp/international/titdic-cnv.el b/lisp/international/titdic-cnv.el
> index e95e399..02d1e2a 100644
> --- a/lisp/international/titdic-cnv.el
> +++ b/lisp/international/titdic-cnv.el
> @@ -1,4 +1,4 @@
> -;;; titdic-cnv.el --- convert cxterm dictionary (TIT format) to Quail package -*- coding: utf-8-emacs; lexical-binding:t -*-
> +;;; titdic-cnv.el --- convert cxterm dictionary (TIT format) to Quail package -*- coding:iso-2022-7bit; lexical-binding:t -*-
>  
>  ;; Copyright (C) 1997-1998, 2000-2020 Free Software Foundation, Inc.
>  ;; Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
> @@ -83,9 +83,9 @@
>  ;; how to select a translation from a list of candidates.
>  
>  (defvar quail-cxterm-package-ext-info
> -  '(("chinese-4corner" "四角")
> -    ("chinese-array30" "３０")
> -    ("chinese-ccdospy" "缩拼"
> +  '(("chinese-4corner" "$(0(?-F(B")
> +    ("chinese-array30" "$(0#R#O(B")
> +    ("chinese-ccdospy" "$AKuF4(B"
>       "Pinyin base input method for Chinese charset GB2312 (`chinese-gb2312').
>  
>  Pinyin is the standard Roman transliteration method for Chinese.
> @@ -94,10 +94,10 @@ method `chinese-py'.
>  
>  This input method works almost the same way as `chinese-py'.  The
>  difference is that you type a single key for these Pinyin spelling.
> -    Pinyin:  zh  en  eng ang ch  an  ao  ai  ong sh  ing  yu(ü)
> +    Pinyin:  zh  en  eng ang ch  an  ao  ai  ong sh  ing  yu($A(9(B)
>      keyseq:   a   f   g   h   i   j   k   l   s   u   y   v
>  For example:
> -    Chinese:  啊    果    中    文    光    玉    全
> +    Chinese:  $A0!(B    $A9{(B    $AVP(B    $AND(B    $A9b(B    $ASq(B    $AH+(B
>      Pinyin:   a    guo   zhong  wen  guang  yu   quan
>      Keyseq:   a1   guo4   as1   wf4  guh1  yu..6 qvj6
>  
> @@ -106,14 +106,14 @@ For example:
>  For double-width GB2312 characters corresponding to ASCII, use the
>  input method `chinese-qj'.")
>  
> -    ("chinese-ecdict" "英漢"
> +    ("chinese-ecdict" "$(05CKH(B"
>  "In this input method, you enter a Chinese (Big5) character or word
>  by typing the corresponding English word.  For example, if you type
> -\"computer\", \"電腦\" is input.
> +\"computer\", \"$(0IZH+(B\" is input.
>  
>  \\<quail-translation-docstring>")
>  
> -    ("chinese-etzy" "倚注"
> +    ("chinese-etzy" "$(06/0D(B"
>  "Zhuyin base input method for Chinese Big5 characters (`chinese-big5-1',
>  `chinese-big5-2').
>  
> @@ -122,20 +122,20 @@ compose one Chinese character.
>  
>  In this input method, you enter a Chinese character by first typing
>  keys corresponding to Zhuyin symbols (see the above table) followed by
> -SPC, 1, 2, 3, or 4 specifying a tone (SPC:陰平, 1:輕聲, 2:陽平, 3: 上聲,
> -4:去聲).
> +SPC, 1, 2, 3, or 4 specifying a tone (SPC:$(0?v(N(B, 1:$(0M=Vy(B, 2:$(0Dm(N(B, 3: $(0&9Vy(B,
> +4:$(0(+Vy(B).
>  
>  \\<quail-translation-docstring>")
>  
> -    ("chinese-punct-b5" "標B"
> +    ("chinese-punct-b5" "$(0O:(BB"
>       "Input method for Chinese punctuation and symbols of Big5
>  \(`chinese-big5-1' and `chinese-big5-2').")
>  
> -    ("chinese-punct" "标G"
> +    ("chinese-punct" "$A1j(BG"
>       "Input method for Chinese punctuation and symbols of GB2312
>  \(`chinese-gb2312').")
>  
> -    ("chinese-py-b5" "拼B"
> +    ("chinese-py-b5" "$(03<(BB"
>       "Pinyin base input method for Chinese Big5 characters
>  \(`chinese-big5-1', `chinese-big5-2').
>  
> @@ -153,28 +153,28 @@ method `chinese-qj-b5'.
>  The input method `chinese-py' and `chinese-tonepy' are also Pinyin
>  based, but for the character set GB2312 (`chinese-gb2312').")
>  
> -    ("chinese-qj-b5" "全B")
> +    ("chinese-qj-b5" "$(0)A(BB")
>  
> -    ("chinese-qj" "全G")
> +    ("chinese-qj" "$AH+(BG")
>  
> -    ("chinese-sw" "首尾"
> +    ("chinese-sw" "$AJWN2(B"
>  "Radical base input method for Chinese charset GB2312 (`chinese-gb2312').
>  
>  In this input method, you enter a Chinese character by typing two
> -keys.  The first key corresponds to the first (首) radical, the second
> -key corresponds to the last (尾) radical.  The correspondence of keys
> +keys.  The first key corresponds to the first ($AJW(B) radical, the second
> +key corresponds to the last ($AN2(B) radical.  The correspondence of keys
>  and radicals is as below:
>  
>   first radical:
>   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
> - 心 冖 尸 丶 火 口 扌 氵 讠 艹 亻 木 礻 饣 月 纟 石 王 八 丿 日 辶 犭 竹 一 人
> + $APD(B $AZ"(B $AJ,(B $AX<(B $A;p(B $A?Z(B $A^P(B $Ac_(B $AZ%(B $A\3(B $AXi(B $AD>(B $Alj(B $Ab;(B $ATB(B $Afy(B $AJ/(B $AMu(B $A0K(B $AX/(B $AHU(B $AeA(B $Aak(B $AVq(B $AR;(B $AHK(B
>   last radical:
>   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
> - 又 山 土 刀 阝 口 衣 疋 大 丁 厶 灬 十 歹 冂 门 今 丨 女 乙 囗 小 厂 虫 弋 卜
> + $ASV(B $AI=(B $AMA(B $A56(B $AZb(B $A?Z(B $ARB(B $Aqb(B $A4s(B $A6!(B $A[L(B $Ala(B $AJ.(B $A4u(B $AXg(B $ACE(B $A=q(B $AX-(B $AE.(B $ARR(B $A`m(B $AP!(B $A3'(B $A3f(B $A_.(B $A27(B
>  
>  \\<quail-translation-docstring>")
>  
> -    ("chinese-tonepy" "调拼"
> +    ("chinese-tonepy" "$A5wF4(B"
>       "Pinyin base input method for Chinese charset GB2312 (`chinese-gb2312').
>  
>  Pinyin is the standard roman transliteration method for Chinese.
> @@ -183,18 +183,18 @@ method `chinese-py'.
>  
>  This input method works almost the same way as `chinese-py'.  The
>  difference is that you must type 1..5 after each Pinyin spelling to
> -specify a tone (1:阴平, 2:阳平, 3:上声, 4下声, 5:轻声).
> +specify a tone (1:$ARuF=(B, 2:$AQtF=(B, 3:$AIOIy(B, 4$AOBIy(B, 5:$AGaIy(B).
>  
>  \\<quail-translation-docstring>
>  
> -For instance, to input 你, you type \"n i 3 3\", the first \"n i\" is
> +For instance, to input $ADc(B, you type \"n i 3 3\", the first \"n i\" is
>  a Pinyin, the next \"3\" specifies tone, and the last \"3\" selects
>  the third character from the candidate list.
>  
>  For double-width GB2312 characters corresponding to ASCII, use the
>  input method `chinese-qj'.")
>  
> -    ("chinese-zozy" "零注"
> +    ("chinese-zozy" "$(0I\0D(B"
>  "Zhuyin base input method for Chinese Big5 characters (`chinese-big5-1',
>  `chinese-big5-2').
>  
> @@ -203,8 +203,8 @@ compose a Chinese character.
>  
>  In this input method, you enter a Chinese character by first typing
>  keys corresponding to Zhuyin symbols (see the above table) followed by
> -SPC, 6, 3, 4, or 7 specifying a tone (SPC:陰平, 6:陽平, 3:上聲, 4:去聲,
> -7:輕聲).
> +SPC, 6, 3, 4, or 7 specifying a tone (SPC:$(0?v(N(B, 6:$(0Dm(N(B, 3:$(0&9Vy(B, 4:$(0(+Vy(B,
> +7:$(0M=Vy(B).
>  
>  \\<quail-translation-docstring>")))
>  
> @@ -354,7 +354,7 @@ SPC, 6, 3, 4, or 7 specifying a tone (SPC:陰平, 6:陽平, 3:上聲, 4:去聲,
>        (princ (nth 2 (assoc tit-encode tit-encode-list)))
>        (princ "\" \"")
>        (princ (or title
> -		 (if (string-match "[:∷：【]+\\([^:∷：】]+\\)" tit-prompt)
> +		 (if (string-match "[:$A!K$(0!(!J(B]+\\([^:$A!K$(0!(!K(B]+\\)" tit-prompt)
>  		     (substring tit-prompt (match-beginning 1) (match-end 1))
>  		   tit-prompt)))
>        (princ "\"\n"))
> @@ -580,7 +580,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;;    )
>  
>  (defvar quail-misc-package-ext-info
> -  '(("chinese-b5-tsangchi" "倉B"
> +  '(("chinese-b5-tsangchi" "$(06A(BB"
>       "cangjie-table.b5" big5 "tsang-b5.el"
>       tsang-b5-converter
>       "\
> @@ -590,7 +590,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; # unmodified versions is granted without royalty provided
>  ;; # this notice is preserved.")
>  
> -    ("chinese-b5-quick" "簡B"
> +    ("chinese-b5-quick" "$(0X|(BB"
>       "cangjie-table.b5" big5 "quick-b5.el"
>       quick-b5-converter
>       "\
> @@ -600,7 +600,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; # unmodified versions is granted without royalty provided
>  ;; # this notice is preserved.")
>  
> -    ("chinese-cns-tsangchi" "倉C"
> +    ("chinese-cns-tsangchi" "$(GT?(BC"
>       "cangjie-table.cns" iso-2022-cn-ext "tsang-cns.el"
>       tsang-cns-converter
>       "\
> @@ -610,7 +610,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; # unmodified versions is granted without royalty provided
>  ;; # this notice is preserved.")
>  
> -    ("chinese-cns-quick" "簡C"
> +    ("chinese-cns-quick" "$(Gv|(BC"
>       "cangjie-table.cns" iso-2022-cn-ext "quick-cns.el"
>       quick-cns-converter
>       "\
> @@ -620,7 +620,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; # unmodified versions is granted without royalty provided
>  ;; # this notice is preserved.")
>  
> -    ("chinese-py" "拼G"
> +    ("chinese-py" "$AF4(BG"
>       "pinyin.map" cn-gb-2312 "PY.el"
>       py-converter
>       "\
> @@ -648,7 +648,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; You should have received a copy of the GNU General Public License along with
>  ;; CCE.  If not, see <https://www.gnu.org/licenses/>.")
>  
> -    ("chinese-ziranma" "自然"
> +    ("chinese-ziranma" "$AWTH;(B"
>       "ziranma.cin" cn-gb-2312 "ZIRANMA.el"
>       ziranma-converter
>       "\
> @@ -676,7 +676,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; You should have received a copy of the GNU General Public License along with
>  ;; CCE.  If not, see <https://www.gnu.org/licenses/>.")
>  
> -    ("chinese-ctlau" "刘粤"
> +    ("chinese-ctlau" "$AAuTA(B"
>       "CTLau.html" cn-gb-2312 "CTLau.el"
>       ctlau-gb-converter
>       "\
> @@ -701,7 +701,7 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; # You should have received a copy of the GNU General Public License
>  ;; # along with this program.  If not, see <https://www.gnu.org/licenses/>.")
>  
> -    ("chinese-ctlaub" "劉粵"
> +    ("chinese-ctlaub" "$(0N,Gn(B"
>       "CTLau-b5.html" big5 "CTLau-b5.el"
>       ctlau-b5-converter
>       "\
> @@ -731,37 +731,38 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  ;; dictionary in the buffer DICBUF.  The input method name of the
>  ;; Quail package is NAME, and the title string is TITLE.
>  
> -;; TSANG-P is non-nil, generate 倉頡 input method.  Otherwise
> -;; generate 簡易 (simple version of 倉頡).  If BIG5-P is non-nil, the
> +;; TSANG-P is non-nil, generate $(06AQo(B input method.  Otherwise
> +;; generate $(0X|/y(B (simple version of $(06AQo(B).  If BIG5-P is non-nil, the
>  ;; input method is for inputting Big5 characters.  Otherwise the input
>  ;; method is for inputting CNS characters.
>  
> -(defun tsang-quick-converter (dicbuf tsang-p big5-p)
> -  (let ((fulltitle (if tsang-p "倉頡" "簡易"))
> +(defun tsang-quick-converter (dicbuf name title tsang-p big5-p)
> +  (let ((fulltitle (if tsang-p (if big5-p "$(06AQo(B" "$(GT?on(B")
> +		     (if big5-p "$(0X|/y(B" "$(Gv|Mx(B")))
>  	dic)
>      (goto-char (point-max))
>      (if big5-p
> -	(insert (format "\"中文輸入【%s】BIG5
> +	(insert (format "\"$(0&d'GTT&,!J(B%s$(0!K(BBIG5
>  
> -	漢語%s輸入鍵盤
> +	$(0KHM$(B%s$(0TT&,WoOu(B
>  
> -   [Q 手] [W 田] [E 水] [R 口] [T 廿] [Y 卜] [U 山] [I 戈] [O 人] [P 心]
> +   [Q $(0'D(B] [W $(0(q(B] [E $(0'V(B] [R $(0&H(B] [T $(0'>(B] [Y $(0&4(B] [U $(0&U(B] [I $(0'B(B] [O $(0&*(B] [P $(0'A(B]
>  
> -    [A 日] [S 尸] [D 木] [F 火] [G 土] [H 竹] [J 十] [L 中]
> +    [A $(0'K(B] [S $(0&T(B] [D $(0'N(B] [F $(0'W(B] [G $(0&I(B] [H $(0*M(B] [J $(0&3(B] [L $(0&d(B]
>  
> -      [Z  ] [X 難] [C 金] [V 女] [B 月] [N 弓] [M 一]
> +      [Z  ] [X $(0[E(B] [C $(01[(B] [V $(0&M(B] [B $(0'M(B] [N $(0&_(B] [M $(0&"(B]
>  
>  \\\\<quail-translation-docstring>\"\n"
>  			fulltitle fulltitle))
> -      (insert (format "\"中文輸入【%s】CNS
> +      (insert (format "\"$(GDcEFrSD+!J(B%s$(G!K(BCNS
>  
> -	漢語%s輸入鍵盤
> +	$(GiGk#(B%s$(GrSD+uomu(B
>  
> -   [Q 手] [W 田] [E 水] [R 口] [T 廿] [Y 卜] [U 山] [I 戈] [O 人] [P 心]
> +   [Q $(GEC(B] [W $(GFp(B] [E $(GEU(B] [R $(GDG(B] [T $(GE=(B] [Y $(GD3(B] [U $(GDT(B] [I $(GEA(B] [O $(GD)(B] [P $(GE@(B]
>  
> -    [A 日] [S 尸] [D 木] [F 火] [G 土] [H 竹] [J 十] [L 中]
> +    [A $(GEJ(B] [S $(GDS(B] [D $(GEM(B] [F $(GEV(B] [G $(GDH(B] [H $(GHL(B] [J $(GD2(B] [L $(GDc(B]
>  
> -      [Z  ] [X 難] [C 金] [V 女] [B 月] [N 弓] [M 一]
> +      [Z  ] [X $(GyE(B] [C $(GOZ(B] [V $(GDL(B] [B $(GEL(B] [N $(GD^(B] [M $(GD!(B]
>  
>  \\\\<quail-translation-docstring>\"\n"
>  		      fulltitle fulltitle)))
> @@ -797,35 +798,35 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>      (setq dic (sort dic (function (lambda (x y) (string< (car x ) (car y))))))
>      (dolist (elt dic)
>        (insert (format "(%S\t%S)\n" (car elt) (cdr elt))))
> -    (let ((punctuation '((";" "；﹔，、﹐﹑" "；﹔，、﹐﹑")
> -			 (":" "：︰﹕．。‧﹒·" "：︰﹕．。・﹒·")
> -			 ("'" "’‘" "’‘")
> -			 ("\"" "”“〝〞〃" "”“〝〞〃")
> -			 ("\\" "＼﹨╲" "＼﹨╲")
> -			 ("|" "｜︱︳∣" "︱︲��｜")
> -			 ("/" "／∕╱" "／∕╱")
> -			 ("?" "？﹖" "？﹖")
> -			 ("<" "〈＜﹤︿∠" "〈＜﹤︿∠")
> -			 (">" "〉＞﹥﹀" "〉＞﹦﹀")
> -			 ("[" "〔【﹝︹︻「『﹁﹃" "〔【﹝︹︻「『﹁﹃")
> -			 ("]" "〕】﹞︺︼」』﹂﹄" "〕】﹞︺︼」』﹂﹄")
> -			 ("{" "｛﹛︷ " "｛﹛︷ ")
> -			 ("}" "｝﹜︸" "｝﹜︸")
> -			 ("`" "‵′" "′‵")
> -			 ("~" "～﹋﹌︴﹏" "∼﹋﹌����")
> -			 ("!" "！﹗" "！﹗")
> -			 ("@" "＠﹫" "＠﹫")
> -			 ("#" "＃﹟" "＃﹟")
> -			 ("$" "＄﹩" "＄﹩")
> -			 ("%" "％﹪" "％﹪")
> -			 ("&" "＆﹠" "＆﹠")
> -			 ("*" "＊﹡※☆★" "＊﹡※☆★")
> -			 ("(" "（﹙︵" "（﹙︵")
> -			 (")" "）﹚︶" "）﹚︶")
> -			 ("-" "–—¯￣－﹣" "—–‾��－﹣")
> -			 ("_" "＿ˍ" "＿��")
> -			 ("=" "＝﹦" "＝﹥")
> -			 ("+" "＋﹢" "＋﹢"))))
> +    (let ((punctuation '((";" "$(0!'!2!"!#!.!/(B" "$(G!'!2!"!#!.!/(B")
> +			 (":" "$(0!(!+!3!%!$!&!0!1(B" "$(G!(!+!3!%!$!&!0!1(B")
> +			 ("'" "$(0!e!d(B" "$(G!e!d(B")
> +			 ("\"" "$(0!g!f!h!i!q(B" "$(G!g!f!h!i!q(B")
> +			 ("\\" "$(0"`"b#M(B" "$(G"`"b#M(B")
> +			 ("|" "$(0!6!8!:"^(B" "$(G!6!8!:"^(B")
> +			 ("/" "$(0"_"a#L(B" "$(G"_"a#L(B")
> +			 ("?" "$(0!)!4(B" "$(G!)!4(B")
> +			 ("<" "$(0!R"6"A!T"H(B" "$(G!R"6"A!T"H(B")
> +			 (">" "$(0!S"7"B!U(B" "$(G!S"7"B!U(B")
> +			 ("[" "$(0!F!J!b!H!L!V!Z!X!\(B" "$(G!F!J!b!H!L!V!Z!X!\(B")
> +			 ("]" "$(0!G!K!c!I!M!W![!Y!](B" "$(G!G!K!c!I!M!W![!Y!](B")
> +			 ("{" "$(0!B!`!D(B " "$(G!B!`!D(B ")
> +			 ("}" "$(0!C!a!E(B" "$(G!C!a!E(B")
> +			 ("`" "$(0!j!k(B" "$(G!j!k(B")
> +			 ("~" "$(0"D"+",!<!=(B" "$(G"D"+",!<!=(B")
> +			 ("!" "$(0!*!5(B" "$(G!*!5(B")
> +			 ("@" "$(0"i"n(B" "$(G"i"n(B")
> +			 ("#" "$(0!l"-(B" "$(G!l"-(B")
> +			 ("$" "$(0"c"l(B" "$(G"c"l(B")
> +			 ("%" "$(0"h"m(B" "$(G"h"m(B")
> +			 ("&" "$(0!m".(B" "$(G!m".(B")
> +			 ("*" "$(0!n"/!o!w!x(B" "$(G!n"/!o!w!x(B")
> +			 ("(" "$(0!>!^!@(B" "$(G!>!^!@(B")
> +			 (")" "$(0!?!_!A(B" "$(G!?!_!A(B")
> +			 ("-" "$(0!7!9"#"$"1"@(B" "$(G!7!9"#"$"1"@(B")
> +			 ("_" "$(0"%"&(B" "$(G"%"&(B")
> +			 ("=" "$(0"8"C(B" "$(G"8"C(B")
> +			 ("+" "$(0"0"?(B" "$(G"0"?(B"))))
>      (dolist (elt punctuation)
>        (insert (format "(%S %S)\n" (concat "z" (car elt))
>  		      (if big5-p (nth 1 elt) (nth 2 elt))))))
> @@ -849,11 +850,11 @@ To get complete usage, invoke \"emacs -batch -f batch-titdic-convert -h\"."
>  
>  (defun py-converter (dicbuf)
>    (goto-char (point-max))
> -  (insert (format "%S\n" "汉字输入∷拼音∷
> +  (insert (format "%S\n" "$A::WVJdHk!KF4Rt!K(B
>  
> -	拼音方案
> +	$AF4Rt7=08(B
>  
> - 小写英文字母代表「拼音」符号， \"u(yu) 则用 u: 表示∶
> + $AP!P4S"NDWVD84z1m!8F4Rt!97{:E#,(B \"u(yu) $ATrSC(B u: $A1mJ>!C(B
>  
>  Pinyin base input method for Chinese charset GB2312 (`chinese-gb2312').
>  
> @@ -867,14 +868,14 @@ character.  The sequence is made by the combination of the initials
>            iang ing iong u ua uo uai ui uan un uan ueng yu yue yuan yun
>  
>    (Note: In the correct Pinyin writing, the sequence \"yu\" in the last
> -   four finals should be written by the character u-umlaut `ü'.)
> +   four finals should be written by the character u-umlaut `$A(9(B'.)
>  
>  With this input method, you enter a Chinese character by first
>  entering its pinyin spelling.
>  
>  \\<quail-translation-docstring>
>  
> -For instance, to input 你, you type \"n i C-n 3\".  The first \"n i\"
> +For instance, to input $ADc(B, you type \"n i C-n 3\".  The first \"n i\"
>  is a Pinyin, \"C-n\" selects the next group of candidates (each group
>  contains at most 10 characters), \"3\" select the third character in
>  that group.
> @@ -957,22 +958,22 @@ method `chinese-tonepy' with which you must specify tones by digits
>  		 table)))
>      (setq dic (sort dic (function (lambda (x y) (string< (car x) (car y))))))
>      (goto-char (point-max))
> -    (insert (format "%S\n" "汉字输入∷【自然】∷
> -
> -                            键盘对照表:
> - ┏━━┳━━┳━━┳━━┳━━┳━━┳━━┳━━┳━━┳━━┓
> - ┃Ｑ  ┃Ｗ  ┃Ｅ  ┃Ｒ  ┃Ｔ  ┃Ｙ  ┃Ｕsh┃Ｉch┃Ｏ  ┃Ｐ  ┃
> - ┃  iu┃  ua┃   e┃ uan┃  ue┃ uai┃   u┃   i┃   o┃  un┃
> - ┃    ┃  ia┃    ┃ van┃  ve┃ ing┃    ┃    ┃  uo┃  vn┃
> - ┗┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┛
> -   ┃Ａ  ┃Ｓ  ┃Ｄ  ┃Ｆ  ┃Ｇ  ┃Ｈ  ┃Ｊ  ┃Ｋ  ┃Ｌ  ┃
> -   ┃   a┃iong┃uang┃  en┃ eng┃ ang┃  an┃  ao┃  ai┃
> -   ┃    ┃ ong┃iang┃    ┃  ng┃    ┃    ┃    ┃    ┃
> -   ┗┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━┻┳━━┓
> -     ┃Ｚ  ┃Ｘ  ┃Ｃ  ┃Ｖzh┃Ｂ  ┃Ｎ  ┃Ｍ  ┃，  ┃．  ┃ ／ ┃
> -     ┃  ei┃  ie┃ iao┃  ui┃  ou┃  in┃ ian┃前页┃后页┃符号┃
> -     ┃    ┃    ┃    ┃   v┃    ┃    ┃    ┃    ┃    ┃    ┃
> -     ┗━━┻━━┻━━┻━━┻━━┻━━┻━━┻━━┻━━┻━━┛
> +    (insert (format "%S\n" "$A::WVJdHk!K!>WTH;!?!K(B
> +
> +                            $A<|EL6TUU1m(B:
> + $A)3)%)%)W)%)%)W)%)%)W)%)%)W)%)%)W)%)%)W)%)%)W)%)%)W)%)%)W)%)%)7(B
> + $A)'#Q(B  $A)'#W(B  $A)'#E(B  $A)'#R(B  $A)'#T(B  $A)'#Y(B  $A)'#U(Bsh$A)'#I(Bch$A)'#O(B  $A)'#P(B  $A)'(B
> + $A)'(B  iu$A)'(B  ua$A)'(B   e$A)'(B uan$A)'(B  ue$A)'(B uai$A)'(B   u$A)'(B   i$A)'(B   o$A)'(B  un$A)'(B
> + $A)'(B    $A)'(B  ia$A)'(B    $A)'(B van$A)'(B  ve$A)'(B ing$A)'(B    $A)'(B    $A)'(B  uo$A)'(B  vn$A)'(B
> + $A);)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)?(B
> +   $A)'#A(B  $A)'#S(B  $A)'#D(B  $A)'#F(B  $A)'#G(B  $A)'#H(B  $A)'#J(B  $A)'#K(B  $A)'#L(B  $A)'(B
> +   $A)'(B   a$A)'(Biong$A)'(Buang$A)'(B  en$A)'(B eng$A)'(B ang$A)'(B  an$A)'(B  ao$A)'(B  ai$A)'(B
> +   $A)'(B    $A)'(B ong$A)'(Biang$A)'(B    $A)'(B  ng$A)'(B    $A)'(B    $A)'(B    $A)'(B    $A)'(B
> +   $A);)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)_)W)%)%)7(B
> +     $A)'#Z(B  $A)'#X(B  $A)'#C(B  $A)'#V(Bzh$A)'#B(B  $A)'#N(B  $A)'#M(B  $A)'#,(B  $A)'#.(B  $A)'(B $A#/(B $A)'(B
> +     $A)'(B  ei$A)'(B  ie$A)'(B iao$A)'(B  ui$A)'(B  ou$A)'(B  in$A)'(B ian$A)'G0R3)':sR3)'7{:E)'(B
> +     $A)'(B    $A)'(B    $A)'(B    $A)'(B   v$A)'(B    $A)'(B    $A)'(B    $A)'(B    $A)'(B    $A)'(B    $A)'(B
> +     $A);)%)%)_)%)%)_)%)%)_)%)%)_)%)%)_)%)%)_)%)%)_)%)%)_)%)%)_)%)%)?(B
>  
>  
>  Pinyin base input method for Chinese GB2312 characters (`chinese-gb2312').
> @@ -984,34 +985,34 @@ method `chinese-py'.
>  Unlike the standard spelling of Pinyin, in this input method all
>  initials and finals are assigned to single keys (see the above table).
>  For instance, the initial \"ch\" is assigned to the key `i', the final
> -\"iu\" is assigned to the key `q', and tones 1, 2, 3, 4, and 轻声 are
> +\"iu\" is assigned to the key `q', and tones 1, 2, 3, 4, and $AGaIy(B are
>  assigned to the keys `q', `w', `e', `r', `t' respectively.
>  
>  \\<quail-translation-docstring>
>  
>  To input one-letter words, you type 4 keys, the first two for the
>  Pinyin of the letter, next one for tone, and the last one is always a
> -quote (').  For instance, \"vsq'\" input 中.  Exceptions are these
> +quote (').  For instance, \"vsq'\" input $AVP(B.  Exceptions are these
>  letters.  You can input them just by typing a single key.
>  
> -	Character: 按 不 次 的 二 发 个 和 出 及 可 了 没
> +	Character: $A04(B $A2;(B $A4N(B $A5D(B $A6~(B $A7"(B $A8v(B $A:M(B $A3v(B $A<0(B $A?I(B $AAK(B $AC;(B
>  	Key:	   a  b  c  d  e  f  g  h  i  j  k  l  m
> -	Character: 你 欧 片 七 人 三 他 是 着 我 小 一 在
> +	Character: $ADc(B $AE7(B $AF,(B $AF_(B $AHK(B $AH}(B $AK{(B $AJG(B $AWE(B $ANR(B $AP!(B $AR;(B $ATZ(B
>  	Key:	   n  o  p  q  r  s  t  u  v  w  x  y  z
>  
>  To input two-letter words, you have two ways.  One way is to type 4
>  keys, two for the first Pinyin, two for the second Pinyin.  For
> -instance, \"vsgo\" inputs 中国.  Another way is to type 3 keys: 2
> +instance, \"vsgo\" inputs $AVP9z(B.  Another way is to type 3 keys: 2
>  initials of two letters, and quote (').  For instance, \"vg'\" also
> -inputs 中国.
> +inputs $AVP9z(B.
>  
>  To input three-letter words, you type 4 keys: initials of three
> -letters, and the last is quote (').  For instance, \"bjy'2\" inputs 北
> -京鸭 (the last `2' is to select one of the candidates).
> +letters, and the last is quote (').  For instance, \"bjy'2\" inputs $A11(B
> +$A>)Q<(B (the last `2' is to select one of the candidates).
>  
>  To input words of more than three letters, you type 4 keys, initials
>  of the first three letters and the last letter.  For instance,
> -\"bjdt\" inputs 北京电视台.
> +\"bjdt\" inputs $A11>)5gJSL((B.
>  
>  To input symbols and punctuation, type `/' followed by one of `a' to
>  `z', then select one of the candidates."))
> @@ -1058,7 +1059,7 @@ To input symbols and punctuation, type `/' followed by one of `a' to
>      ;; which the file is converted have no Big5 equivalent.  Go
>      ;; through and delete them.
>      (goto-char pos)
> -    (while (search-forward "□" nil t)
> +    (while (search-forward "$(0!{(B" nil t)
>        (delete-char -1))
>      ;; Uppercase keys in dictionary need to be downcased.  Backslashes
>      ;; at the beginning of keys need to be turned into double
> @@ -1082,31 +1083,31 @@ To input symbols and punctuation, type `/' followed by one of `a' to
>  
>  (defun ctlau-gb-converter (dicbuf)
>    (ctlau-converter dicbuf
> -"汉字输入∷刘锡祥式粤音∷
> +"$A::WVJdHk!KAuN}OiJ=TARt!K(B
>  
> - 刘锡祥式粤语注音方案
> + $AAuN}OiJ=TASoW"Rt7=08(B
>   Sidney Lau's Cantonese transcription scheme as described in his book
>   \"Elementary Cantonese\", The Government Printer, Hong Kong, 1972.
> - This file was prepared by Fung Fung Lee (李枫峰).
> + This file was prepared by Fung Fung Lee ($A@n7c7e(B).
>   Originally converted from CTCPS3.tit
>   Last modified: June 2, 1993.
>  
>   Some infrequent GB characters are accessed by typing \\, followed by
> - the Cantonese romanization of the respective radical (部首)."))
> + the Cantonese romanization of the respective radical ($A2?JW(B)."))
>  
>  (defun ctlau-b5-converter (dicbuf)
>    (ctlau-converter dicbuf
> -"漢字輸入：劉錫祥式粵音：
> +"$(0KH)tTT&,!(N,Tg>A*#Gn5x!((B
>  
> - 劉錫祥式粵語注音方案
> + $(0N,Tg>A*#GnM$0D5x'J7{(B
>   Sidney Lau's Cantonese transcription scheme as described in his book
>   \"Elementary Cantonese\", The Government Printer, Hong Kong, 1972.
> - This file was prepared by Fung Fung Lee (李楓峰).
> + This file was prepared by Fung Fung Lee ($(0,XFS76(B).
>   Originally converted from CTCPS3.tit
>   Last modified: June 2, 1993.
>  
>   Some infrequent characters are accessed by typing \\, followed by
> - the Cantonese romanization of the respective radical (部首)."))
> + the Cantonese romanization of the respective radical ($(0?f5}(B)."))
>  
>  (declare-function dos-8+3-filename "dos-fns.el" (filename))
>  





* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
  2020-01-05 20:55   ` master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again Stefan Monnier
@ 2020-01-06 17:07     ` Eli Zaretskii
  2020-01-06 17:53       ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2020-01-06 17:07 UTC
  To: Stefan Monnier; +Cc: eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 05 Jan 2020 15:55:56 -0500
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> 
> FWIW, when I converted it to utf-8, I (tried to) make sure the generated
> .elc files were bit-for-bit identical.  Looking at the URL you mention,
> I don't see any concrete example of a problem introduced by utf-8.

I'm not saying the code is wrong.  But surely, you'll agree that code
like this:

    (let ((foo '("x" "foo" "foo")))
      (if baz (nth 1 foo) (nth 2 foo)))

just wastes CPU cycles (and some memory), and does little otherwise?

When titdic-cnv.el was encoded in ISO-2022, the 2 "foo"s belonged to 2
different charsets, so the result was less trivial.  (And yes, a
couple of string pairs are not really identical, but almost all of
them are, so treating all of them as different just doesn't look right
to me.)
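
The effect can be sketched with `decode-coding-string'.  This assumes
the escape sequences used for the L key in the patch (shown there
without the ESC byte, which the archive does not render): ESC $ ( 0
designates `chinese-big5-1' and ESC $ ( G designates
`chinese-cns11643-1'.  The `my-demo-' names are made up for
illustration.

```elisp
;; Hypothetical demo; not part of titdic-cnv.el.
(defun my-demo-decode (bytes)
  "Decode BYTES, an ASCII-only ISO-2022 string, into Emacs text."
  (decode-coding-string bytes 'iso-2022-7bit))

;; The same ideograph as written for Big5 and for CNS in the patch.
(defconst my-demo-big5 (my-demo-decode "\e$(0&d\e(B"))
(defconst my-demo-cns  (my-demo-decode "\e$(GDc\e(B"))

;; `equal' compares characters only, so unified characters compare equal.
(message "same characters: %S" (equal my-demo-big5 my-demo-cns))

;; The decoder may record the source charset as a `charset' text
;; property; that property is the only thing left to tell them apart.
(message "charset properties: %S %S"
         (get-text-property 0 'charset my-demo-big5)
         (get-text-property 0 'charset my-demo-cns))
```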

If the charset properties are not important here at all (I'm not sure
yet), then we should rewrite that code not to "select" one of 2
identical strings.  And if the charset is important, we should still
have only one copy of each string, but propertize it with the charset
selected according to the big5-p flag.
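
A rough sketch of that second option, under the assumption that one
charset per string is enough.  Real Big5 text is split between
`chinese-big5-1' and `chinese-big5-2', so this is a simplification,
and the function name is hypothetical.

```elisp
;; Hypothetical helper; not part of titdic-cnv.el.
(defun my-titdic-tag-charset (string big5-p)
  "Return a copy of STRING carrying a `charset' text property.
The charset is `chinese-big5-1' when BIG5-P is non-nil, and
`chinese-cns11643-1' otherwise."
  (propertize string
              'charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1)))

;; One copy of the string, tagged at the point of use.
(let ((punct "＋﹢")
      (big5-p t))
  (get-text-property 0 'charset (my-titdic-tag-charset punct big5-p)))
;; => chinese-big5-1
```

With something like this, the table would need only one string per key
instead of two.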


* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
  2020-01-06 17:07     ` Eli Zaretskii
@ 2020-01-06 17:53       ` Stefan Monnier
  2020-01-06 18:12         ` Andreas Schwab
  2020-01-06 18:24         ` Eli Zaretskii
  0 siblings, 2 replies; 7+ messages in thread
From: Stefan Monnier @ 2020-01-06 17:53 UTC
  To: Eli Zaretskii; +Cc: eggert, emacs-devel

> If the charset properties are not important here at all (I'm not sure
> yet), then we should rewrite that code not to "select" one of 2
> identical strings.  And if the charset is important, we should still
> have only one copy of each string, but propertize it with the charset
> selected according to the big5-p flag.

Agreed.  My point was mostly that reverting the encoding to iso-2022 is
probably not the best course of action and that other than "uselessly
complex code" I couldn't find a concrete problem description that can
help decide whether the charset properties are actually relevant.

I'm beginning to believe they're not: they may have been back in
Emacs<23 when it was important to pick the right characters from the
right charset, but since we switched to Unicode for our internals, the
distinction has disappeared.

        Stefan


* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
  2020-01-06 17:53       ` Stefan Monnier
@ 2020-01-06 18:12         ` Andreas Schwab
  2020-01-06 18:21           ` Stefan Monnier
  2020-01-06 18:26           ` Eli Zaretskii
  2020-01-06 18:24         ` Eli Zaretskii
  1 sibling, 2 replies; 7+ messages in thread
From: Andreas Schwab @ 2020-01-06 18:12 UTC
  To: Stefan Monnier; +Cc: Eli Zaretskii, eggert, emacs-devel

On Jan 06 2020, Stefan Monnier wrote:

> Agreed.  My point was mostly that reverting the encoding to iso-2022 is
> probably not the best course of action and that other than "uselessly
> complex code" I couldn't find a concrete problem description that can
> help decide whether the charset properties are actually relevant.

In the compiled file, the strings lose their charset property, thus the
reencoding into iso-2022-7bit was a no-op.
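
What that loss means for the selection code can be illustrated without
the byte compiler.  The sketch below does not reproduce how the
compiler drops the property; it only shows the consequence, using
`substring-no-properties' as a stand-in.  All `my-props-' names are
hypothetical.

```elisp
;; Hypothetical demo; not part of titdic-cnv.el.
(defconst my-props-big5 (propertize "foo" 'charset 'chinese-big5-1))
(defconst my-props-cns  (propertize "foo" 'charset 'chinese-cns11643-1))

(defun my-props-distinguishable-p (a b)
  "Return non-nil if A and B differ in characters or text properties."
  (not (equal-including-properties a b)))

;; With the properties in place the two strings can be told apart.
(message "with properties: %S"
         (my-props-distinguishable-p my-props-big5 my-props-cns))

;; Once the properties are gone, nothing distinguishes them.
(message "without properties: %S"
         (my-props-distinguishable-p
          (substring-no-properties my-props-big5)
          (substring-no-properties my-props-cns)))
```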

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
  2020-01-06 18:12         ` Andreas Schwab
@ 2020-01-06 18:21           ` Stefan Monnier
  2020-01-06 18:26           ` Eli Zaretskii
  1 sibling, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2020-01-06 18:21 UTC
  To: Andreas Schwab; +Cc: Eli Zaretskii, eggert, emacs-devel

>> Agreed.  My point was mostly that reverting the encoding to iso-2022 is
>> probably not the best course of action and that other than "uselessly
>> complex code" I couldn't find a concrete problem description that can
>> help decide whether the charset properties are actually relevant.
>
> In the compiled file, the strings lose their charset property, thus the
> reencoding into iso-2022-7bit was a no-op.

I know, but the question is whether this loss of the charset property
(which AFAICT dates back to Emacs-23) is a problem (with which we've
lived for many years) or not.


        Stefan





* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
  2020-01-06 18:12         ` Andreas Schwab
  2020-01-06 18:21           ` Stefan Monnier
@ 2020-01-06 18:26           ` Eli Zaretskii
  1 sibling, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2020-01-06 18:26 UTC
  To: Andreas Schwab; +Cc: eggert, monnier, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Mon, 06 Jan 2020 19:12:35 +0100
> 
> On Jan 06 2020, Stefan Monnier wrote:
> 
> > Agreed.  My point was mostly that reverting the encoding to iso-2022 is
> > probably not the best course of action and that other than "uselessly
> > complex code" I couldn't find a concrete problem description that can
> > help decide whether the charset properties are actually relevant.
> 
> In the compiled file, the strings lose their charset property, thus the
> reencoding into iso-2022-7bit was a no-op.

Indeed.  My idea was to have the call to propertize in the generated
input method.  But again, I'm not yet sure this is worth the hassle.




* Re: master 3fc859d: Go back to iso-2022-7bit for titdic-cnv.el again
  2020-01-06 17:53       ` Stefan Monnier
  2020-01-06 18:12         ` Andreas Schwab
@ 2020-01-06 18:24         ` Eli Zaretskii
  1 sibling, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2020-01-06 18:24 UTC
  To: Stefan Monnier; +Cc: eggert, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Mon, 06 Jan 2020 12:53:05 -0500
> 
> > If the charset properties are not important here at all (I'm not sure
> > yet), then we should rewrite that code not to "select" one of 2
> > identical strings.  And if the charset is important, we should still
> > have only one copy of each string, but propertize it with the charset
> > selected according to the big5-p flag.
> 
> Agreed.  My point was mostly that reverting the encoding to iso-2022 is
> probably not the best course of action and that other than "uselessly
> complex code" I couldn't find a concrete problem description that can
> help decide whether the charset properties are actually relevant.

Agreed.

> I'm beginning to believe they're not: they may have been back in
> Emacs<23 when it was important to pick the right characters from the
> right charset, but since we switched to Unicode for our internals, the
> distinction has disappeared.

Probably.  I added this issue to my todo.



