From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken conditional forms)
Date: Tue, 26 Jan 2021 22:02:35 -0500
To: Paul Eggert
Cc: mattiase@acm.org, Eli Zaretskii, emacs-devel@gnu.org, Kenichi Handa
In-Reply-To: (Paul Eggert's message of "Sun, 5 Jan 2020 12:48:29 -0800")

[ This dates back to Jan 2020.
  To recap we have in titdic-cnv.el code like:

    (defun tsang-quick-converter (dicbuf tsang-p big5-p)
      (let ((fulltitle (if tsang-p (if big5-p "倉頡" "倉頡")
                         (if big5-p "簡易" "簡易")))

  where the `if big5-p` tests appear to do nothing.
  It turns out that those two strings have the same Unicode chars, but
  because the file is encoded using iso-2022-jp they have a different
  `charset` property applied to them, which Emacs can use to render
  them differently.

  When we bumped into this code, tho, the file had been converted to
  `utf-8` (by yours truly) so that the nuance had been lost.
  Paul reverted this part of my change to recover the subtle rendering.  ]

Paul Eggert [2020-01-05 12:48:29] wrote:
> On 1/5/20 7:45 AM, Eli Zaretskii wrote:
>> let's install this only on master, please.
> OK, I did that.
>> Btw, the change in titdic-cnv.el, by itself, makes no sense ...
>> When Stefan recoded this file, he left that code intact, which now
>> makes no sense at all.  We should probably propertize the strings with
>> the 2 corresponding charset properties, something that before the
>> recoding happened automagically (because ISO-2022 records the charset
>> in the encoding), and which was the whole purpose of this function.
> I worked around the problem by converting titdic-cnv.el back to
> iso-2022-7bit on master, as this conversion was simple.  Stefan (or
> anybody) can look into this later if they want to do it in
> a better way.

I just looked into it and I still can't see what's wrong with using
utf-8 here.  AFAICT those `if big5-p` tests have been doing nothing ever
since Emacs's internal encoding was changed to be based on Unicode
(i.e. Emacs-23).

While it's true that using the iso-2022-jp encoding on the file does
allow Emacs to render the two strings differently, this only applies to
the source file.  The .elc files all use `utf-8-emacs` encoding anyway,
so that info is lost.
And the difference is even lost before we write the .elc file because
when Emacs byte-compiles that code the byte-compiler considers those two
strings as "equal" and emits only one string in the byte-code (so the
two branches return `eq` strings).

So, I think using `iso-2022-jp` is a bad idea here: it gives the
illusion that the two branches are different where they really aren't.

If we do want to recover the difference (the one we presumably lost in
Emacs-23), we need to make those two branches return
properly-propertized strings with something like:

    (defun tsang-quick-converter (dicbuf tsang-p big5-p)
      (let* ((charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1))
             (fulltitle (propertize (if tsang-p "倉頡" "簡易")
                                    'charset charset))

Tho I'm not sure even that would be sufficient, since that function
generates a file so if it just prints those strings into an Elisp file,
the info would again be lost, at least when that Elisp file
gets compiled.

Given that we lived blissfully unaware of the problem for the last 10
years (plus another year with some vague awareness of it but still
without doing anything about it), I suggest we get rid of the `if
big5-p` tests and switch the file to `utf-8`.


        Stefan
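
To illustrate why the charset information has to live in a text property
rather than in the string's characters, here is a minimal sketch.  The
helper name `my-title-with-charset` is hypothetical and not part of
titdic-cnv.el; it only mirrors the `propertize` idea above.  It shows
that `equal` ignores text properties (so the two variants look
identical to anything that compares strings with `equal`), while
`equal-including-properties` and `get-text-property` can still tell
them apart.

```elisp
;; Hypothetical helper (not in titdic-cnv.el): tag TITLE with a
;; `charset' text property chosen by BIG5-P, as suggested above.
(defun my-title-with-charset (title big5-p)
  "Return TITLE with a `charset' text property chosen by BIG5-P."
  (propertize title
              'charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1)))

(let ((big5 (my-title-with-charset "倉頡" t))
      (cns  (my-title-with-charset "倉頡" nil)))
  (list
   ;; `equal' compares only the characters, not the text properties.
   (equal big5 cns)
   ;; `equal-including-properties' also compares the text properties.
   (equal-including-properties big5 cns)
   ;; The property itself is still retrievable from each string.
   (get-text-property 0 'charset big5)
   (get-text-property 0 'charset cns)))
;; => (t nil chinese-big5-1 chinese-cns11643-1)
```

Whether such a property survives being printed into a generated Elisp
file and then byte-compiled is a separate question, as noted above; this
sketch only covers the in-memory strings.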