unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: mattiase@acm.org, Eli Zaretskii <eliz@gnu.org>,
	emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
Subject: Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken conditional forms)
Date: Tue, 26 Jan 2021 22:02:35 -0500	[thread overview]
Message-ID: <jwv7dnz15ip.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <fb8373f1-acc7-d02e-2500-b3da0d1e56ec@cs.ucla.edu> (Paul Eggert's message of "Sun, 5 Jan 2020 12:48:29 -0800")

[ This dates back to Jan 2020.

  To recap we have in titdic-cnv.el code like:

    (defun tsang-quick-converter (dicbuf tsang-p big5-p)
      (let ((fulltitle (if tsang-p (if big5-p "倉頡" "倉頡")
                         (if big5-p "簡易" "簡易")))

  where the `if big5-p` tests appear to do nothing.  It turns out that
  those two strings have the same unicode chars but because the file is
  encoded using iso-2022-jp they have a different `charset` property
  applied to them which Emacs can use to render them differently.
  
  When we bumped into this code, tho, the file has been converted to
  `utf-8` (by yours truly) so that the nuance had been lost.
  Paul reverted this part of my change to recover the subtle rendering.  ]

Paul Eggert [2020-01-05 12:48:29] wrote:
> On 1/5/20 7:45 AM, Eli Zaretskii wrote:
>> let's install this only on master, please.
> OK, I did that.
>> Btw, the change in titdic-cnv.el, by itself, makes no sense ...
>> When Stefan recoded this file, he left that code intact, which now
>> makes no sense at all.  We should probably propertize the strings with
>> the 2 corresponding charset properties, something that before the
>> recoding happened automagically (because ISO-2022 records the charset
>> in the encoding), and which was the whole purpose of this function.
> I worked around the problem by converting titdic-cnv.el back to
> iso-2022-7bit on master, as this conversion was simple.  Stefan (or
> anybody) can look into this later if they want to do it in
> a better way.

I just looked into it and I still can't see what's wrong with using
utf-8 here.  AFAICT those `if big5-p` tests have been doing nothing ever
since Emacs's internal encoding was changed to be based on Unicode
(i.e. Emacs-23).

While it's true that using the iso-2022-jp encoding on the file does
allow Emacs to render the two strings differently, this only applies to
the source file.  The .elc files all use `utf-8-emacs` encoding anyway,
so that info is lost.  And the difference is even lost before we write
the .elc file because when Emacs byte-compiles that code the
byte-compiler considers those two strings as "equal" and emits only one
string in the byte-code (so the two branches return `eq` strings).

So, I think using `iso-2022-jp` is a bad idea here: it gives the
illusion that the two branches are different where they really aren't.
If we do want to recover the difference (the one we presumably lost in
Emacs-23), we need to make those two branches return
properly-propertized strings with something like:

    (defun tsang-quick-converter (dicbuf tsang-p big5-p)
      (let* ((charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1))
             (fulltitle
              (propertize (if tsang-p "倉頡" "簡易")
                          'charset charset))

Tho I'm not sure even that would be sufficient, since that function
generates a file so if it just prints those strings into an Elisp file,
the info would again be lost, at least when that Elisp file
gets compiled.

Given that we lived blissfully unaware of the problem for the last 10
years (plus another year with some vague awareness of it but still
without doing anything about it), I suggest we get rid of the `if
big5-p` tests and switch the file to `utf-8`.


        Stefan




  parent reply	other threads:[~2021-01-27  3:02 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-04 12:37 Scan of broken conditional forms Mattias Engdegård
2020-01-04 13:03 ` Michael Albinus
2020-01-04 19:23 ` Paul Eggert
2020-01-04 19:39   ` Eli Zaretskii
2020-01-04 21:40     ` Paul Eggert
2020-01-05 15:45       ` Eli Zaretskii
2020-01-05 20:48         ` Paul Eggert
2020-01-05 20:57           ` Stefan Monnier
2021-01-27  3:02           ` Stefan Monnier [this message]
2021-01-27  8:18             ` Broken `if big5-p` code in titdic-cnv.el Andreas Schwab
2021-01-27 16:16             ` Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken conditional forms) Eli Zaretskii
2021-01-27 17:35               ` Broken `if big5-p` code in titdic-cnv.el Stefan Monnier
2020-01-04 22:04   ` Scan of broken conditional forms Mattias Engdegård
2020-01-04 22:11     ` Paul Eggert
2020-01-31 16:22 ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwv7dnz15ip.fsf-monnier+emacs@gnu.org \
    --to=monnier@iro.umontreal.ca \
    --cc=eggert@cs.ucla.edu \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=handa@m17n.org \
    --cc=mattiase@acm.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).