From: Eli Zaretskii <eliz@gnu.org>
To: Reini Urban <reini.urban@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Wed, 03 Nov 2021 19:24:29 +0200
Message-ID: <83pmrhgnjm.fsf@gnu.org>
In-Reply-To: <CAHiT=DHQN34ba5pYvdLy7kWb_02G4SuWmDxkL4P66BhXNX3B5A@mail.gmail.com> (message from Reini Urban on Wed, 3 Nov 2021 16:07:51 +0100)
> From: Reini Urban <reini.urban@gmail.com>
> Date: Wed, 3 Nov 2021 16:07:51 +0100
>
> The issue is that libc, the C standard committee, Linux, and most others are ignoring the Unicode identifier
> security guidelines.
> Identifiers must be identifiable, but strings should not be touched.
>
> Identifiers are all names, pathnames, variable names, user names, ... but not arbitrary strings.
> IDEs are just one place to fix it (that's why glib does it), but the core is more important.
>
> Those that do care, such as Java (the compiler), my cperl (the compiler and runtime, because it is
> dynamic), Rust (the compiler), and glib (the library), do follow these guidelines.
> All C compilers and most others are insecure. Linux filesystems are insecure. The old Apple filesystem
> was secure; the new one is insecure again.
> Also, the libcs cannot deal with denormalized characters at all. grep, sed, and coreutils all have outstanding
> unorm patches because libunicode is too slow: it iterates over the string via callbacks.
>
> In short you need to normalize each identifier, check for proper XID_Start/XID_Continue,
> check your document for mixed scripts (several combinations are allowed, several disallowed,
> Han unification did a good job, but Greek vs. Cyrillic is the worst), and forbid bidi changes.
I'm not sure I follow: the examples in the original paper which
sparked all this brouhaha didn't touch any identifiers. All the
identifiers in those examples were perfectly compliant with the
Unicode guidelines, AFAIR. What the examples did was insert
directional format controls so as to reorder _punctuation_ characters,
in a way that changes the visual appearance and the interpreted
semantics of the code. All of the format controls were inserted
within whitespace, not inside any identifiers.
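
To make the distinction concrete, here is a minimal Python sketch that scans source text for the explicit directional format controls, independently of any identifier checks. The function name `find_bidi_controls` and the choice to report 1-based line and column numbers are assumptions made for illustration; this is not how Emacs or any particular compiler detects them.

```python
# Explicit bidirectional format controls (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(source):
    """Return a list of (line, column, name) for each bidi control found.

    Lines and columns are 1-based. Hypothetical helper, for illustration only.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, BIDI_CONTROLS[ch]))
    return hits
```

A scan like this flags the controls wherever they occur, including inside whitespace, comments, and string literals, which is where the examples placed them.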

So I'm not sure how what you describe is relevant to the issue at hand;
could you perhaps explain?
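
For reference, the identifier checks listed in the quoted message (normalize, check XID_Start/XID_Continue, look for mixed scripts, forbid bidi controls) might be sketched in Python roughly as follows. Several things here are assumptions rather than anything stated in the thread: NFC is used as the normalization form, Python's `str.isidentifier()` stands in for the XID_Start/XID_Continue test (it also accepts the underscore), and the script of a character is guessed from the first word of its Unicode name, which is only a crude heuristic limited to Latin, Greek, and Cyrillic. The names `check_identifier` and `script_hint` are hypothetical.

```python
import unicodedata

# Explicit bidi format controls: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

# Assumption: only these scripts are compared. A real checker would use the
# Unicode Script property, which the standard library does not expose.
CONFUSABLE_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC"}


def script_hint(ch):
    """Guess a script from the first word of the character name (heuristic)."""
    return unicodedata.name(ch, "").split(" ")[0]


def check_identifier(ident):
    """Return a list of problems found in ident; an empty list means none."""
    problems = []
    if unicodedata.normalize("NFC", ident) != ident:
        problems.append("not in NFC form")
    if not ident.isidentifier():
        problems.append("not a valid XID_Start/XID_Continue identifier")
    scripts = {script_hint(c) for c in ident if c.isalpha()} & CONFUSABLE_SCRIPTS
    if len(scripts) > 1:
        problems.append("mixed scripts: " + ", ".join(sorted(scripts)))
    if any(c in BIDI_CONTROLS for c in ident):
        problems.append("contains a bidi format control")
    return problems
```

Note that checks of this kind operate on identifiers only, so they would not flag format controls placed in whitespace or comments between identifiers.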