all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Kangas <stefan@marxist.se>
Cc: db48x@db48x.net, cpitclaudel@gmail.com, yuri.v.khan@gmail.com,
	monnier@iro.umontreal.ca, emacs-devel@gnu.org
Subject: Re: Unicode confusables and reordering characters considered harmful,  a simple solution
Date: Fri, 05 Nov 2021 16:19:56 +0200	[thread overview]
Message-ID: <831r3uelbn.fsf@gnu.org> (raw)
In-Reply-To: <CADwFkmnanWDECKs139eFEd_-N7sMD7AB6f0if+WFMm36d3v=5w@mail.gmail.com> (message from Stefan Kangas on Fri, 5 Nov 2021 06:08:42 -0700)

> From: Stefan Kangas <stefan@marxist.se>
> Date: Fri, 5 Nov 2021 06:08:42 -0700
> Cc: db48x@db48x.net, cpitclaudel@gmail.com, emacs-devel@gnu.org, 
> 	monnier@iro.umontreal.ca, yuri.v.khan@gmail.com
> 
> I didn't study `bidi-find-overridden-directionality' yet, but the
> "Trojan Source" paper writes:
> 
>     "By banning all directionality-control characters, users with
>     legitimate Bidi-override use cases in comments are penalized.
>     Therefore, a better defense might be to ban the use of
>     _unterminated_ Bidi override characters within string literals and
>     comments.  By ensuring that each override is terminated – that is,
>     for example, that every LRI has a matching PDI– it becomes
>     impossible to distort legitimate source code outside of string
>     literals and comments."  (p. 8, their emphasis)
> 
> So, IIUC, the problematic cases are "unterminated Bidi override
> characters", and those are the ones worth warning about.  Does that
> sound correct to you?

No.  What they say is simply wrong: such unterminated overrides and
embeddings are perfectly valid.  The Unicode Bidirectional Algorithm
(UBA) mandates (https://unicode.org/reports/tr9/#X8):

  X8. All explicit directional embeddings, overrides and isolates are
  completely terminated at the end of each paragraph.

      Explicit paragraph separators (bidirectional character type B)
      indicate the end of a paragraph. As such, they are not included in
      any embedding, override or isolate. They are simply assigned the
      paragraph embedding level.

And in https://unicode.org/reports/tr9/#Bidirectional_Character_Types
you can see that newline is one of the characters whose bidi type is
B; compare:

  (get-char-code-property ?\n 'bidi-class) => B

So when the UBA says "at the end of each paragraph", it means in
practice at EOL, since all the other paragraph separators are rarely
if ever used in human-readable text.  (And Emacs, of course,
implements that rule.)

The authors of the paper simply don't understand the bidi stuff well
enough to make useful proposals about this.  They should have bring
this up on the Unicode mailing list, where at least the experts (and I
don't mean myself, I mean the people who wrote the UBA) could set them
straight.

I encourage you to read the comments in the implementation I wrote, to
see which cases I consider "suspicious".  The comments need to be read
with the UBA spec in mind, at least its Xn rules.  I will be happy to
explain or clarify if something is unclear there.  This is a complex
issue, and discussing it rationally could really enhance our
understanding and handling of these cases.

> > Adding one line is a nuisance.  If it can be avoided, we should avoid
> > it.  Since we are capable of detecting the really suspicious uses of
> > those controls, it is much better to use that, because in that case
> > users will not have to add anything.
> 
> I agree that it does sound better to prefer such an approach if
> possible.

Then let's try to implement that.  If there's a need for more
bidi-specific infrastructure, let me know and I will see what I can
do.

Thanks.



  reply	other threads:[~2021-11-05 14:19 UTC|newest]

Thread overview: 172+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-02 12:57 Unicode confusables and reordering characters considered harmful Vasilij Schneidermann
2021-11-02 13:18 ` Po Lu
2021-11-02 13:54   ` Uwe Brauer
2021-11-02 14:53     ` Eli Zaretskii
2021-11-02 15:16       ` Eli Zaretskii
2021-11-02 15:21         ` Uwe Brauer
2021-11-02 16:24       ` Clément Pit-Claudel
2021-11-02 16:47         ` Eli Zaretskii
2021-11-02 17:01           ` Stefan Kangas
2021-11-02 17:10             ` Eli Zaretskii
2021-11-02 18:43               ` Stefan Kangas
2021-11-02 18:49                 ` Eli Zaretskii
2021-11-02 19:12                   ` Stefan Monnier
2021-11-02 19:36                     ` Eli Zaretskii
2021-11-02 19:47                       ` Stefan Monnier
2021-11-02 19:51                         ` Eli Zaretskii
2021-11-02 21:28                           ` Unicode confusables and reordering characters considered harmful, a simple solution Daniel Brooks
2021-11-03 13:30                             ` Eli Zaretskii
2021-11-03 17:41                             ` Yuri Khan
2021-11-03 17:56                               ` Eli Zaretskii
2021-11-03 18:20                                 ` Juri Linkov
2021-11-03 19:02                                   ` Gregory Heytings
2021-11-03 19:46                                     ` Eli Zaretskii
2021-11-03 19:58                                       ` Yuri Khan
2021-11-03 20:21                                       ` Gregory Heytings
2021-11-03 20:31                                         ` Eli Zaretskii
2021-11-03 21:16                                           ` Gregory Heytings
2021-11-04  7:16                                             ` Eli Zaretskii
2021-11-04  9:06                                               ` Gregory Heytings
2021-11-04  9:19                                                 ` Eli Zaretskii
2021-11-04  9:48                                                   ` Eli Zaretskii
2021-11-04  8:44                                     ` Juri Linkov
2021-11-03 18:45                                 ` Yuri Khan
2021-11-03 19:09                                   ` Eli Zaretskii
2021-11-03 19:35                                     ` Yuri Khan
2021-11-03 20:01                                       ` Eli Zaretskii
2021-11-03 20:45                                         ` Gregory Heytings
2021-11-03 20:53                                           ` Eli Zaretskii
2021-11-03 21:23                                             ` Gregory Heytings
2021-11-04  6:58                                               ` Eli Zaretskii
2021-11-04  8:53                                                 ` Gregory Heytings
2021-11-04  9:15                                                   ` Eli Zaretskii
2021-11-03 19:54                                     ` Daniel Brooks
2021-11-03 20:08                                       ` Eli Zaretskii
2021-11-04  6:00                                         ` Daniel Brooks
2021-11-04  7:44                                           ` Eli Zaretskii
2021-11-04  9:14                                             ` Gregory Heytings
2021-11-04  9:45                                               ` Eli Zaretskii
2021-11-04 10:41                                                 ` Gregory Heytings
2021-11-04 11:03                                                   ` Po Lu
2021-11-04 11:27                                                     ` Gregory Heytings
2021-11-04 11:20                                                   ` Eli Zaretskii
2021-11-04 11:34                                                     ` Gregory Heytings
2021-11-04 13:25                                                       ` Eli Zaretskii
2021-11-04 14:10                                                         ` Gregory Heytings
2021-11-04 16:50                                                           ` Eli Zaretskii
2021-11-04 17:04                                                             ` Gregory Heytings
2021-11-04 19:16                                                           ` Stefan Monnier
2021-11-05 23:31                                                             ` Gregory Heytings
2021-11-06  7:25                                                               ` Eli Zaretskii
2021-11-04 19:22                                                           ` Stefan Monnier
2021-11-04 19:55                                                             ` Eli Zaretskii
2021-11-05 23:32                                                             ` Gregory Heytings
2021-11-04 19:08                                                     ` Eli Zaretskii
2021-11-04 20:00                                                       ` Eli Zaretskii
2021-11-05  2:23                                             ` Daniel Brooks
2021-11-05  3:52                                               ` Stefan Kangas
2021-11-05  5:21                                                 ` code annotations Daniel Brooks
2021-11-05  5:53                                                   ` Stefan Kangas
2021-11-05  5:23                                                 ` Unicode confusables and reordering characters considered harmful, a simple solution Daniel Brooks
2021-11-05  6:13                                                 ` Po Lu
2021-11-05  7:37                                                 ` Eli Zaretskii
2021-11-05  8:00                                                   ` Stefan Kangas
2021-11-05  8:07                                                     ` Eli Zaretskii
2021-11-05  9:58                                                       ` Stefan Kangas
2021-11-05 12:12                                                         ` Eli Zaretskii
2021-11-05 13:08                                                           ` Stefan Kangas
2021-11-05 14:19                                                             ` Eli Zaretskii [this message]
2021-11-05 23:33                                                               ` Gregory Heytings
2021-11-06  0:54                                                                 ` Daniel Brooks
2021-11-06 10:56                                                                   ` Eli Zaretskii
2021-11-06 10:48                                                                 ` Eli Zaretskii
2021-11-08 19:58                                                                   ` Gregory Heytings
2021-11-08 20:27                                                                     ` Eli Zaretskii
2021-11-08 21:59                                                                       ` Stefan Monnier
2021-11-09  3:28                                                                         ` Eli Zaretskii
2021-11-06 13:58                                                               ` Benjamin Riefenstahl
2021-11-06 15:34                                                                 ` Eli Zaretskii
2021-11-06 17:09                                                                   ` Benjamin Riefenstahl
2021-11-06 17:35                                                                     ` Eli Zaretskii
2021-11-05  8:09                                               ` tomas
2021-11-06  1:09                                                 ` Daniel Brooks
2021-11-05  8:31                                               ` Eli Zaretskii
2021-11-05  9:34                                                 ` Juri Linkov
2021-11-04 19:05                                           ` Stefan Monnier
2021-11-03 21:13                                 ` Daniel Brooks
2021-11-04  6:52                                   ` Eli Zaretskii
2021-11-02 20:18                       ` Unicode confusables and reordering characters considered harmful Tim Cross
2021-11-03  0:28                     ` Gregory Heytings
2021-11-03  1:07                       ` Stefan Monnier
2021-11-03  1:59                         ` Daniel Brooks
2021-11-03 13:35                           ` Eli Zaretskii
2021-11-03  9:59                         ` Gregory Heytings
2021-11-03 11:19                           ` Stefan Kangas
2021-11-03 11:31                             ` Gregory Heytings
2021-11-03 12:20                               ` Stefan Monnier
2021-11-03 12:41                                 ` tomas
2021-11-03 13:15                                   ` Eli Zaretskii
2021-11-03 14:46                                     ` tomas
2021-11-03 17:13                                       ` Eli Zaretskii
2021-11-03 17:34                                         ` tomas
2021-11-03 13:46                                 ` Eli Zaretskii
2021-11-03 13:45                               ` Eli Zaretskii
2021-11-03 13:44                             ` Eli Zaretskii
2021-11-03 14:29                               ` Gregory Heytings
2021-11-03 14:37                                 ` Eli Zaretskii
2021-11-03 16:01                                   ` Gregory Heytings
2021-11-03 17:44                                     ` Eli Zaretskii
2021-11-03 17:53                                       ` Gregory Heytings
2021-11-03 11:29                           ` Andreas Schwab
2021-11-03 18:47                             ` Stefan Monnier
2021-11-03 18:52                               ` Yuri Khan
2021-11-03 19:19                                 ` Stefan Monnier
2021-11-03 19:28                               ` Gregory Heytings
2021-11-03 19:32                                 ` Stefan Monnier
2021-11-03 19:41                                   ` Yuri Khan
2021-11-03 20:12                                   ` Gregory Heytings
2021-11-03 22:03                                     ` Gregory Heytings
2021-11-04  8:50                                       ` Gregory Heytings
2021-11-03 19:51                                 ` Eli Zaretskii
2021-11-03 19:30                               ` Eli Zaretskii
2021-11-03 19:34                                 ` Andreas Schwab
2021-11-03 19:54                                   ` Eli Zaretskii
2021-11-03 13:37                           ` Eli Zaretskii
2021-11-03 18:53                             ` Manuel Giraud
2021-11-03 19:36                               ` Eli Zaretskii
2021-11-03 21:15                                 ` Manuel Giraud
2021-11-04  6:56                                   ` Eli Zaretskii
2021-11-04 19:04                                     ` Eli Zaretskii
2021-11-03 13:33                         ` Eli Zaretskii
2021-11-03 13:31                       ` Eli Zaretskii
2021-11-02 19:26                   ` Stefan Kangas
2021-11-02 19:44                     ` Eli Zaretskii
2021-11-02 19:49                     ` Stefan Monnier
2021-11-02 18:16           ` Clément Pit-Claudel
2021-11-02 18:37             ` Eli Zaretskii
2021-11-02 19:17         ` Yuri Khan
2021-11-02 19:37           ` Eli Zaretskii
2021-11-02 17:24       ` [authors: default bidi-display-reordering is set to t] (was: Unicode confusables and reordering characters considered harmful) Uwe Brauer
2021-11-02 17:37         ` Eli Zaretskii
2021-11-02 14:31   ` Unicode confusables and reordering characters considered harmful Eli Zaretskii
2021-11-02 15:13     ` Uwe Brauer
2021-11-02 13:42 ` tomas
2021-11-02 14:57   ` Stefan Kangas
2021-11-02 14:30 ` Eli Zaretskii
2021-11-02 14:43 ` Clément Pit-Claudel
2021-11-03 15:07   ` Reini Urban
2021-11-03 15:43     ` Stefan Monnier
2021-11-04  7:50       ` Reini Urban
2021-11-04  8:21         ` Eli Zaretskii
2021-11-03 17:24     ` Eli Zaretskii
2021-11-02 14:57 ` Stefan Kangas
2021-11-05 18:53 ` Unicode confusables " Vasilij Schneidermann
2021-11-05 20:03   ` Eli Zaretskii
2021-11-06 11:56     ` Vasilij Schneidermann
2021-11-06 12:20       ` Eli Zaretskii
2021-11-06 13:10         ` Vasilij Schneidermann
2021-11-06 13:29           ` Eli Zaretskii
2021-11-05 21:36   ` Stefan Monnier
2021-11-10 15:47 ` Unicode confusables and reordering characters " Dmitry Gutov
2021-11-10 17:03   ` Eli Zaretskii
2021-11-10 17:15     ` Dmitry Gutov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=831r3uelbn.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=cpitclaudel@gmail.com \
    --cc=db48x@db48x.net \
    --cc=emacs-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    --cc=stefan@marxist.se \
    --cc=yuri.v.khan@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.