From: Eli Zaretskii <eliz@gnu.org>
To: rms@gnu.org
Cc: psainty@orcon.net.nz, luangruo@yahoo.com, emacs-devel@gnu.org,
kevin.legouguec@gmail.com
Subject: Re: Can watermarking Unicode text using invisible differences sneak through Emacs, or can Emacs detect it?
Date: Sun, 06 Feb 2022 10:56:47 +0200
Message-ID: <83sfswz834.fsf@gnu.org>
In-Reply-To: <E1nGYvh-00058o-9o@fencepost.gnu.org> (message from Richard Stallman on Sat, 05 Feb 2022 23:13:37 -0500)
> From: Richard Stallman <rms@gnu.org>
> Cc: psainty@orcon.net.nz, luangruo@yahoo.com,
> kevin.legouguec@gmail.com, emacs-devel@gnu.org
> Date: Sat, 05 Feb 2022 23:13:37 -0500
>
> > I don't understand the specification of these functions. How would
> > diacriticize decide/know that ?~ is equivalent to the ?̃ (U+0303
> > COMBINING TILDE) that is part of ?ã ?
>
> You know more about Unicode than I do, so I'm sure it is true _in some
> sense_ that "U+0303 (COMBINING TILDE) is part of ?ã".
>
> But I have doubts that that particular sense is the one that is
> pertinent to the job `diacriticize' is meant to do.
>
> I think you mean that one can represent the glyph image `ã' in Unicode
> as a composition using a sequence of `a' and COMBINING TILDE. Please
> tell me if I am mistaken.

You are not mistaken.  The character 'ã' can be "decomposed" into 2
characters, 'a' and COMBINING TILDE.  This is called "canonical
decomposition" in Unicode.

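
As a small illustration (a sketch only, assuming the ucs-normalize
library that ships with Emacs; the variable name my-a-tilde is made up
for this example), the canonical decomposition can be inspected from
Lisp like this:

```elisp
(require 'ucs-normalize)

;; U+00E3 LATIN SMALL LETTER A WITH TILDE, written with an escape so
;; that the example does not depend on how this message is encoded.
(defvar my-a-tilde "\u00e3"
  "Hypothetical name used only for this illustration.")

;; Canonical decomposition (NFD) yields two characters:
;; ?a (97) followed by U+0303 COMBINING TILDE (771).
(append (ucs-normalize-NFD-string my-a-tilde) nil)   ; => (97 771)

;; The per-character data comes from the Unicode database; this
;; returns the list of characters the character decomposes into.
(get-char-code-property ?\u00e3 'decomposition)

;; Recomposing (NFC) gives back the single precomposed character.
(string= (ucs-normalize-NFC-string (string ?a #x0303))
         my-a-tilde)                                 ; => t
```
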
> The ã in this sentence is not a composition. It is a single
> Unicode character, which is also in Latin-1. I don't think that
> COMBINING TILDE is "part of it".

It is, in the sense that the original character can be decomposed.

> But how do you propose
> to make the leap from ?̃ to ?~ ?
>
>
>
> (defconst unicode-combining-chars-alist '(... (?~ . ?̃ ) ...))

So you mean we should create a database of ASCII characters that
approximate the combining diacriticals?  But if so, how is it better
than having a database of complete characters and their ASCII
equivalents, like we have now in latin1-disp.el?  Your proposal may
make the database smaller (and even that mostly only for Latin
characters), but a database of complete characters makes it easier to
make sure the results are optimal, because you see the original
complete character and the complete equivalent, instead of "composing"
them in your head for all the combinations.

I think reasonable appearance is more important than memory
consumption in this case, and other than that, your proposal just
means replacing one database by another, right?

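
To make the comparison concrete, here is a rough sketch of what the
"database of combining characters" approach might look like.  The names
my-combining-to-ascii-alist and my-ascii-approximation are made up for
this illustration and do not exist in Emacs, and the table is
deliberately tiny:

```elisp
(require 'ucs-normalize)

(defconst my-combining-to-ascii-alist
  '((#x0303 . ?~)    ; COMBINING TILDE
    (#x0301 . ?\')   ; COMBINING ACUTE ACCENT
    (#x0302 . ?^))   ; COMBINING CIRCUMFLEX ACCENT
  "Hypothetical table: combining mark -> ASCII look-alike.")

(defun my-ascii-approximation (string)
  "Return STRING with known combining marks replaced by ASCII look-alikes.
Illustration only: STRING is canonically decomposed first, and any
combining mark missing from the table is left untouched."
  (concat
   (mapcar (lambda (ch)
             (or (cdr (assq ch my-combining-to-ascii-alist)) ch))
           (ucs-normalize-NFD-string string))))

(my-ascii-approximation "\u00e3")   ; => "a~"
```

Note that this sketch always puts the ASCII stand-in after the base
letter.  A table of complete characters could instead pick the
best-looking equivalent for each character individually, which is the
tuning opportunity described above.
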
> However, `ucs-normalize-NFD-string' does not know anything about
> ligatures. Given the fi ligature, it returns the fi ligature.

You need a different kind of decomposition for that, called
"compatibility decomposition":

  (ucs-normalize-NFKD-string "ﬁ") => "fi"

You can use ucs-normalize-NFKD-string for the job of
ucs-normalize-NFD-string as well:

  (append (ucs-normalize-NFKD-string "ã") nil) => (97 771)

(I used 'append' here to make it evident that the result of the
decomposition is 2 characters, not one, since the Emacs display will
by default combine them into the same glyph as the original non-ASCII
character, and an innocent reader could think the decomposition didn't
work.)
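
For a side-by-side view of the two kinds of decomposition, here is a
minimal sketch, again assuming only the ucs-normalize library bundled
with Emacs.  The ligature is written as the escape \ufb01 (U+FB01 LATIN
SMALL LIGATURE FI) so that it cannot be silently replaced by plain "fi"
in transit:

```elisp
(require 'ucs-normalize)

(let ((lig "\ufb01"))   ; U+FB01 LATIN SMALL LIGATURE FI
  (list
   ;; NFD: the ligature has no canonical decomposition, so the
   ;; string comes back unchanged.
   (string= (ucs-normalize-NFD-string lig) lig)
   ;; NFKD: the compatibility decomposition yields the two ASCII
   ;; letters "f" and "i".
   (ucs-normalize-NFKD-string lig)))
;; => (t "fi")
```
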