From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken
 conditional forms)
Date: Wed, 27 Jan 2021 18:16:28 +0200
Message-ID: <83tur2z76r.fsf@gnu.org>
References: <F7AAA9E8-100A-47DF-BDE4-E0C4C6F48C44@acm.org>
 <1abe6fdc-7466-193a-cbd3-4e2d3bf2660b@cs.ucla.edu>
 <831rsfggf8.fsf@gnu.org>
 <4f677e8f-86c0-3753-c272-d5acf4f568cb@cs.ucla.edu>
 <83imlpgb5r.fsf@gnu.org>
 <fb8373f1-acc7-d02e-2500-b3da0d1e56ec@cs.ucla.edu>
 <jwv7dnz15ip.fsf-monnier+emacs@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="10969"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org, handa@m17n.org
To: Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed Jan 27 17:18:12 2021
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1l4nWG-0002kk-7a
	for ged-emacs-devel@m.gmane-mx.org; Wed, 27 Jan 2021 17:18:12 +0100
Original-Received: from localhost ([::1]:34790 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1l4nWF-0004PR-9N
	for ged-emacs-devel@m.gmane-mx.org; Wed, 27 Jan 2021 11:18:11 -0500
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:47266)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>) id 1l4nUQ-0003Nx-Qy
 for emacs-devel@gnu.org; Wed, 27 Jan 2021 11:16:18 -0500
Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:49823)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <eliz@gnu.org>)
 id 1l4nUP-0006YL-1U; Wed, 27 Jan 2021 11:16:17 -0500
Original-Received: from 84.94.185.95.cable.012.net.il ([84.94.185.95]:1051
 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <eliz@gnu.org>)
 id 1l4nUO-0002eH-9h; Wed, 27 Jan 2021 11:16:16 -0500
In-Reply-To: <jwv7dnz15ip.fsf-monnier+emacs@gnu.org> (message from Stefan
 Monnier on Tue, 26 Jan 2021 22:02:35 -0500)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:263510
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/263510>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@m17n.org>, Eli Zaretskii <eliz@gnu.org>,
>   mattiase@acm.org,  emacs-devel@gnu.org
> Date: Tue, 26 Jan 2021 22:02:35 -0500
> 
> So, I think using `iso-2022-jp` is a bad idea here: it gives the
> illusion that the two branches are different where they really aren't.
> If we do want to recover the difference (the one we presumably lost in
> Emacs-23), we need to make those two branches return
> properly-propertized strings with something like:
> 
>     (defun tsang-quick-converter (dicbuf tsang-p big5-p)
>       (let* ((charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1))
>              (fulltitle
>               (propertize (if tsang-p "倉頡" "簡易")
>                           'charset charset))
> 
> Tho I'm not sure even that would be sufficient, since that function
> generates a file so if it just prints those strings into an Elisp file,
> the info would again be lost, at least when that Elisp file
> gets compiled.
> 
> Given that we lived blissfully unaware of the problem for the last 10
> years (plus another year with some vague awareness of it but still
> without doing anything about it), I suggest we get rid of the `if
> big5-p` tests and switch the file to `utf-8`.

I've discussed this with Handa-san a year ago, and we arrived at the
conclusion that the charset information is indeed no longer important.

However, if you look carefully at the part of tsang-quick-converter
that begins with

    (let ((punctuation '((";" "；﹔，、﹐﹑" "；﹔，、﹐﹑")

and ends with

    (dolist (elt punctuation)
      (insert (format "(%S %S)\n" (concat "z" (car elt))
		      (if big5-p (nth 1 elt) (nth 2 elt))))))

you will see that some of the characters in the punctuation structure
are actually different between the big5-p and non-big5-p branches,
although most of them are identical.  So either these are artifacts of
converting this file from its original encoding, or there are actual
differences between these two branches, and we cannot simply delete
one of them.

This puzzle has been sitting in my TODO since I discovered these
differences a year ago.  If you (or someone else) are willing to
unlock the mystery and simplify the file accordingly, that would be
welcome indeed.