unofficial mirror of emacs-devel@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: mattiase@acm.org, eggert@cs.ucla.edu, emacs-devel@gnu.org,
	handa@m17n.org
Subject: Re: Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken conditional forms)
Date: Wed, 27 Jan 2021 18:16:28 +0200
Message-ID: <83tur2z76r.fsf@gnu.org>
In-Reply-To: <jwv7dnz15ip.fsf-monnier+emacs@gnu.org> (message from Stefan Monnier on Tue, 26 Jan 2021 22:02:35 -0500)

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@m17n.org>, Eli Zaretskii <eliz@gnu.org>,
>   mattiase@acm.org,  emacs-devel@gnu.org
> Date: Tue, 26 Jan 2021 22:02:35 -0500
> 
> So, I think using `iso-2022-jp` is a bad idea here: it gives the
> illusion that the two branches are different where they really aren't.
> If we do want to recover the difference (the one we presumably lost in
> Emacs-23), we need to make those two branches return
> properly-propertized strings with something like:
> 
>     (defun tsang-quick-converter (dicbuf tsang-p big5-p)
>       (let* ((charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1))
>              (fulltitle
>               (propertize (if tsang-p "倉頡" "簡易")
>                           'charset charset))
> 
> Tho I'm not sure even that would be sufficient, since that function
> generates a file so if it just prints those strings into an Elisp file,
> the info would again be lost, at least when that Elisp file
> gets compiled.
> 
> Given that we lived blissfully unaware of the problem for the last 10
> years (plus another year with some vague awareness of it but still
> without doing anything about it), I suggest we get rid of the `if
> big5-p` tests and switch the file to `utf-8`.

I discussed this with Handa-san a year ago, and we concluded that the
charset information is indeed no longer important.

However, if you look carefully at the part of tsang-quick-converter
that begins with

    (let ((punctuation '((";" ";﹔,、﹐﹑" ";﹔,、﹐﹑")

and ends with

    (dolist (elt punctuation)
      (insert (format "(%S %S)\n" (concat "z" (car elt))
		      (if big5-p (nth 1 elt) (nth 2 elt))))))

you will see that some of the characters in the punctuation structure
are actually different between the big5-p and non-big5-p branches,
although most of them are identical.  So either these are artifacts of
converting this file from its original encoding, or the two branches
genuinely differ, in which case we cannot simply delete one of them.
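
One way to see which entries actually differ is to compare the two
strings of each entry.  What follows is only a minimal sketch: it
assumes every entry has the form (KEY BIG5-STRING OTHER-STRING), as in
the excerpt above, and `punctuation-variants-diff' is a hypothetical
helper name, not something that exists in titdic-cnv.el.

```elisp
;; Hypothetical helper, for illustration only; not part of titdic-cnv.el.
(defun punctuation-variants-diff (punctuation)
  "Return the entries of PUNCTUATION whose two variant strings differ.
Each entry is assumed to have the form (KEY BIG5-STRING OTHER-STRING)."
  (let (result)
    ;; Collect, in original order, every entry whose second and third
    ;; elements are not the same string.
    (dolist (elt punctuation (nreverse result))
      (unless (string= (nth 1 elt) (nth 2 elt))
        (push elt result)))))

;; Usage with made-up data (not the real table):
;; (punctuation-variants-diff '(("a" "x" "x") ("b" "y" "z")))
```

Running it on the real `punctuation' value would list only the entries
that need a closer look; it cannot by itself tell whether a difference
is intentional or an encoding-conversion artifact.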

This puzzle has been sitting in my TODO since I discovered these
differences a year ago.  If you (or someone else) are willing to
unlock the mystery and simplify the file accordingly, that would be
welcome indeed.


