From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken conditional forms)
Date: Wed, 27 Jan 2021 18:16:28 +0200
Message-ID: <83tur2z76r.fsf@gnu.org>
References: <1abe6fdc-7466-193a-cbd3-4e2d3bf2660b@cs.ucla.edu> <831rsfggf8.fsf@gnu.org> <4f677e8f-86c0-3753-c272-d5acf4f568cb@cs.ucla.edu> <83imlpgb5r.fsf@gnu.org>
Cc: mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org, handa@m17n.org
To: Stefan Monnier
In-Reply-To: (message from Stefan Monnier on Tue, 26 Jan 2021 22:02:35 -0500)

> From: Stefan Monnier
> Cc: Kenichi Handa, Eli Zaretskii,
>   mattiase@acm.org, emacs-devel@gnu.org
> Date: Tue, 26 Jan 2021 22:02:35 -0500
>
> So, I think using `iso-2022-jp` is a bad idea here: it gives the
> illusion that the two branches are different where they really aren't.
> If we do want to recover the difference (the one we presumably lost in
> Emacs-23), we need to make those two branches return
> properly-propertized strings with something like:
>
>     (defun tsang-quick-converter (dicbuf tsang-p big5-p)
>       (let* ((charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1))
>              (fulltitle
>               (propertize (if tsang-p "倉頡" "簡易")
>                           'charset charset))
>
> Tho I'm not sure even that would be sufficient, since that function
> generates a file so if it just prints those strings into an Elisp file,
> the info would again be lost, at least when that Elisp file
> gets compiled.
>
> Given that we lived blissfully unaware of the problem for the last 10
> years (plus another year with some vague awareness of it but still
> without doing anything about it), I suggest we get rid of the `if
> big5-p` tests and switch the file to `utf-8`.

I discussed this with Handa-san a year ago, and we arrived at the
conclusion that the charset information is indeed no longer important.
However, if you look carefully at the part of tsang-quick-converter
that begins with

    (let ((punctuation '((";" ";﹔,、﹐﹑" ";﹔,、﹐﹑")

and ends with

      (dolist (elt punctuation)
        (insert (format "(%S %S)\n"
                        (concat "z" (car elt))
                        (if big5-p (nth 1 elt) (nth 2 elt))))))

you will see that some of the characters in the punctuation structure
are actually different between the big5-p and non-big5-p branches,
although most of them are identical.  So either these are artifacts of
converting this file from its original encoding, or there are actual
differences between these two branches, and we cannot simply delete
one of them.

This puzzle has been sitting in my TODO since I discovered these
differences a year ago.  If you (or someone else) are willing to
unlock the mystery and simplify the file accordingly, that would be
welcome indeed.
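
As a possible starting point for that investigation, here is a minimal
sketch of a helper that lists exactly which characters differ between
the two branches.  It is not part of titdic-cnv.el: the function name
is made up for illustration, and it assumes every element of the
punctuation list has the shape (KEY BIG5-STRING OTHER-STRING), as in
the fragment quoted above.

```elisp
;; Hypothetical helper, not from titdic-cnv.el.  It assumes each
;; element of PUNCTUATION looks like (KEY BIG5-STRING OTHER-STRING).
(defun my/punctuation-branch-diffs (punctuation)
  "Return the entries of PUNCTUATION whose two branch strings differ.
Each returned element has the form (KEY (POS CHAR1 CHAR2)...), where
POS is a zero-based index and CHAR1/CHAR2 are the characters found at
that index in the big5-p and non-big5-p strings.  A character is nil
when its string is shorter than the other one."
  (let (result)
    (dolist (elt punctuation (nreverse result))
      (let* ((s1 (nth 1 elt))
             (s2 (nth 2 elt))
             (len (max (length s1) (length s2)))
             diffs)
        ;; Compare the two strings position by position.
        (dotimes (i len)
          (let ((c1 (and (< i (length s1)) (aref s1 i)))
                (c2 (and (< i (length s2)) (aref s2 i))))
            (unless (eql c1 c2)
              (push (list i c1 c2) diffs))))
        ;; Keep only the entries that actually differ.
        (when diffs
          (push (cons (car elt) (nreverse diffs)) result))))))
```

With made-up data, (my/punctuation-branch-diffs '((";" "ab" "ab")
(":" "ab" "ac"))) returns ((":" (1 98 99))), i.e. only the second
entry differs, at index 1.  Running it on the real punctuation value
and inspecting the reported code points should help tell conversion
artifacts from intentional differences.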