unofficial mirror of emacs-devel@gnu.org 
* bidi-string-strip-control-characters
@ 2022-01-20  9:23 Eli Zaretskii
  2022-01-20  9:29 ` bidi-string-strip-control-characters Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-01-20  9:23 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Lars, I'm not sure I understand the purpose of this function.  Can you
explain?

The way it is currently used is also strange, to say the least: you
apply it to a string made of a single character, so either it does
nothing to the string, or it will return an empty string.  So the
following code will present the user with a riddle:

  (textsec-email-address-header-suspicious-p
   "Lars Ingebrigtsen <larsi@\N{RIGHT-TO-LEFT OVERRIDE}gnus.org>")
  "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

The empty string between quotes is the riddle.

I think I understand the original problem: displaying a literal U+202E
there will mess up the text on display, but if that is the reason, the
right way is not to remove the character, it is to append to it the
necessary bidi controls to prevent the messup (and make the appended
controls be invisible).

Here's an example:

  (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
		(concat (string ?\x202e)
			(propertize (string ?\x202c ?\x200e) 'invisible t))))

This displays the RLO character, but doesn't mess up the description
after it.

We do something like that in descr-text.el, so I guess we need to
factor out that code and use it here.
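
A minimal sketch of what such a factored-out helper might look like.
The name my-bidi-protect-char and the choice of terminators are
assumptions made for illustration, not the code in descr-text.el:
embedding and override initiators get an invisible PDF, isolate
initiators get an invisible PDI, and an invisible LRM follows either
one, as in the example above.

```elisp
(defun my-bidi-protect-char (char)
  "Return CHAR as a string that should not reorder the text after it.
Hypothetical helper: invisible terminating controls are appended
when CHAR opens an embedding, an override, or an isolate."
  (let ((terminator
         (cond
          ;; LRE, RLE, LRO, RLO: close with PDF (U+202C), then LRM.
          ((memq char '(#x202a #x202b #x202d #x202e))
           (string #x202c #x200e))
          ;; LRI, RLI, FSI: close with PDI (U+2069), then LRM.
          ((memq char '(#x2066 #x2067 #x2068))
           (string #x2069 #x200e))
          ;; Anything else is returned unchanged.
          (t ""))))
    (concat (string char)
            (propertize terminator 'invisible t))))
```

A caller could then pass (my-bidi-protect-char ?\x202e) to `format'
in place of the explicit concat/propertize form.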




* Re: bidi-string-strip-control-characters
  2022-01-20  9:23 bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-20  9:29 ` Lars Ingebrigtsen
  2022-01-20 10:14   ` bidi-string-strip-control-characters Eli Zaretskii
  2022-01-20 11:04   ` bidi-string-strip-control-characters Po Lu
  0 siblings, 2 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-20  9:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Lars, I'm not sure I understand the purpose of this function.  Can you
> explain?

Like the NEWS item says, it's for cases where you want to ensure that
there's no bidiness going on.

> The way it is currently used is also strange, to say the least: you
> apply it to a string made of a single character, so either it does
> nothing to the string, or it will return an empty string.  So the
> following code will present the user with a riddle:
>
>   (textsec-email-address-header-suspicious-p
>    "Lars Ingebrigtsen <larsi@\N{RIGHT-TO-LEFT OVERRIDE}gnus.org>")
>   "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
>
> The empty string between quotes is the riddle.

Well...  perhaps not optimal, but not really a riddle.  The function
will probably be used elsewhere in textsec, too; I just haven't gotten
round to auditing all the strings yet.

> I think I understand the original problem: displaying a literal U+202E
> there will mess up the text on display, but if that is the reason, the
> right way is not to remove the character, it is to append to it the
> necessary bidi controls to prevent the messup (and make the appended
> controls be invisible).
>
> Here's an example:
>
>   (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
> 		(concat (string ?\x202e)
> 			(propertize (string ?\x202c ?\x200e) 'invisible t))))
>
> This displays the RLO character, but doesn't mess up the description
> after it.

The display is identical to the one we have now, though:

   "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

So still a riddle.

But removing the bidi chars is "obviously correct" (and impervious to
future attacks) for somebody that's not that familiar with the bidi
machinery, so I prefer to remove the chars instead here.

> We do something like that in descr-text.el, so I guess we need to
> factor out that code and use it here.

Isn't that bidi-string-mark-left-to-right?  I forget.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: bidi-string-strip-control-characters
  2022-01-20  9:29 ` bidi-string-strip-control-characters Lars Ingebrigtsen
@ 2022-01-20 10:14   ` Eli Zaretskii
  2022-01-20 12:47     ` bidi-string-strip-control-characters Lars Ingebrigtsen
  2022-01-20 11:04   ` bidi-string-strip-control-characters Po Lu
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-01-20 10:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 10:29:26 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Lars, I'm not sure I understand the purpose of this function.  Can you
> > explain?
> 
> Like the NEWS item says, it's for cases where you want to ensure that
> there's no bidiness going on.

But when is that useful, the current specific use case aside?

> >   (textsec-email-address-header-suspicious-p
> >    "Lars Ingebrigtsen <larsi@\N{RIGHT-TO-LEFT OVERRIDE}gnus.org>")
> >   "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
> >
> > The empty string between quotes is the riddle.
> 
> > Well...  perhaps not optimal, but not really a riddle.  The function
> > will probably be used elsewhere in textsec, too; I just haven't gotten
> > round to auditing all the strings yet.

That's why I think we should discuss the issue now.  I don't think
removing the bidi controls is TRT, as it will make some text hard to
read and interpret.  We can do better.

> >   (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
> > 		(concat (string ?\x202e)
> > 			(propertize (string ?\x202c ?\x200e) 'invisible t))))
> >
> > This displays the RLO character, but doesn't mess up the description
> > after it.
> 
> The display is identical to the one we have now, though:
> 
>    "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

No, it isn't identical, because in the latter case the U+202E glyph is
retained on display.  (It disappeared from your email for some reason,
but if I eval the form, I see it between the quotes.)

> But removing the bidi chars is "obviously correct" (and impervious to
> future attacks) for somebody that's not that familiar with the bidi
> machinery, so I prefer to remove the chars instead here.

You make this stuff hard to read for a reason that doesn't sound right
to me: we do have better solutions that still avoid messing up the
display.  We use those other solutions elsewhere in Emacs, so why not
here?

> Isn't that bidi-string-mark-left-to-right?

Yes, but bidi-string-mark-left-to-right will not help with overrides,
it only helps with "normal" RTL characters.  We do need a new API,
just not one that removes the bidi controls entirely, that is too
drastic.  What we do in descr-text.el provides a full solution, we
just need to factor it out into a separate function.




* Re: bidi-string-strip-control-characters
  2022-01-20  9:29 ` bidi-string-strip-control-characters Lars Ingebrigtsen
  2022-01-20 10:14   ` bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-20 11:04   ` Po Lu
  2022-01-20 11:19     ` bidi-string-strip-control-characters Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: Po Lu @ 2022-01-20 11:04 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> The display is identical to the one we have now, though:
>
>    "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

I think the best thing in this case would be to get rid of the quotation
marks entirely, and just display:

    "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"




* Re: bidi-string-strip-control-characters
  2022-01-20 11:04   ` bidi-string-strip-control-characters Po Lu
@ 2022-01-20 11:19     ` Eli Zaretskii
  2022-01-20 11:21       ` bidi-string-strip-control-characters Po Lu
  2022-01-20 11:23       ` bidi-string-strip-control-characters Lars Ingebrigtsen
  0 siblings, 2 replies; 14+ messages in thread
From: Eli Zaretskii @ 2022-01-20 11:19 UTC (permalink / raw)
  To: Po Lu; +Cc: larsi, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 19:04:25 +0800
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > The display is identical to the one we have now, though:
> >
> >    "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
> 
> I think the best thing in this case would be to get rid of the quotation
> marks entirely, and just display:
> 
>     "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"

What is "this case" in this context?  IOW, under what conditions do
you suggest to omit the character itself?

And in any case, the question about the function is more general.
Unless you suggest to remove it, that is.




* Re: bidi-string-strip-control-characters
  2022-01-20 11:19     ` bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-20 11:21       ` Po Lu
  2022-01-20 11:23       ` bidi-string-strip-control-characters Lars Ingebrigtsen
  1 sibling, 0 replies; 14+ messages in thread
From: Po Lu @ 2022-01-20 11:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I think the best thing in this case would be to get rid of the quotation
>> marks entirely, and just display:
>> 
>>     "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"

> What is "this case" in this context?  IOW, under what conditions do
> you suggest to omit the character itself?

When it is stripped by the function, I think.  Or perhaps I
misunderstood what this function is supposed to do.

> And in any case, the question about the function is more general.

Fair enough, thanks.




* Re: bidi-string-strip-control-characters
  2022-01-20 11:19     ` bidi-string-strip-control-characters Eli Zaretskii
  2022-01-20 11:21       ` bidi-string-strip-control-characters Po Lu
@ 2022-01-20 11:23       ` Lars Ingebrigtsen
  2022-01-20 11:33         ` bidi-string-strip-control-characters Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-20 11:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Po Lu, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>>     "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
>
> What is "this case" in this context?  IOW, under what conditions do
> you suggest to omit the character itself?

I think it makes sense to change the display to what Po Lu suggested
when the char is glyphless.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: bidi-string-strip-control-characters
  2022-01-20 11:23       ` bidi-string-strip-control-characters Lars Ingebrigtsen
@ 2022-01-20 11:33         ` Eli Zaretskii
  2022-01-20 12:46           ` bidi-string-strip-control-characters Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-01-20 11:33 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: luangruo, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Po Lu <luangruo@yahoo.com>,  emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 12:23:20 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >>     "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
> >
> > What is "this case" in this context?  IOW, under what conditions do
> > you suggest to omit the character itself?
> 
> I think it makes sense to change the display to what Po Lu suggested
> when the char is glyphless.

That'd be fine by me for this particular use case.

But what are we going to do with the function in the Subject?  Is it
still useful enough to have it, if we make that change?





* Re: bidi-string-strip-control-characters
  2022-01-20 11:33         ` bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-20 12:46           ` Lars Ingebrigtsen
  2022-01-20 13:02             ` bidi-string-strip-control-characters Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-20 12:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> But what are we going to do with the function in the Subject?  Is it
> still useful enough to have it, if we make that change?

We're still using it in textsec-link-suspicious-p...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: bidi-string-strip-control-characters
  2022-01-20 10:14   ` bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-20 12:47     ` Lars Ingebrigtsen
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-20 12:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Isn't that bidi-string-mark-left-to-right?
>
> Yes, but bidi-string-mark-left-to-right will not help with overrides,
> it only helps with "normal" RTL characters.  We do need a new API,
> just not one that removes the bidi controls entirely, that is too
> drastic.  What we do in descr-text.el provides a full solution, we
> just need to factor it out into a separate function.

Right -- sounds like a useful function to have in general, too.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: bidi-string-strip-control-characters
  2022-01-20 12:46           ` bidi-string-strip-control-characters Lars Ingebrigtsen
@ 2022-01-20 13:02             ` Eli Zaretskii
  2022-01-20 13:36               ` bidi-string-strip-control-characters Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-01-20 13:02 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: luangruo, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: luangruo@yahoo.com,  emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 13:46:52 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > But what are we going to do with the function in the Subject?  Is it
> > still useful enough to have it, if we make that change?
> 
> We're still using it in textsec-link-suspicious-p...

How about if, instead of filtering out these controls, the function
would replace them with their printed representation, like \u200e?
Removing them might leave the user wondering what and where is wrong
with the original text.
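
As a rough sketch of that idea, assuming a hypothetical helper named
my-bidi-escape-controls (not an existing Emacs function) and assuming
the controls of interest are ALM, LRM, RLM, the embedding and override
controls U+202A..U+202E, and the isolate controls U+2066..U+2069:

```elisp
(defun my-bidi-escape-controls (string)
  "Return STRING with bidi control characters shown as \\uNNNN escapes.
Hypothetical helper; the character set below is an assumption."
  (replace-regexp-in-string
   ;; ALM, LRM, RLM, LRE..RLO (including PDF), LRI..PDI.
   "[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]"
   ;; Each match is a one-character string; print its code point.
   (lambda (match) (format "\\u%04x" (aref match 0)))
   ;; FIXEDCASE and LITERAL are both t, so the backslash in the
   ;; replacement is inserted as-is.
   string t t))
```

With this, the address from the first message would come out as
larsi@\u202egnus.org, which shows both what the offending character
is and where it sits.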




* Re: bidi-string-strip-control-characters
  2022-01-20 13:02             ` bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-20 13:36               ` Lars Ingebrigtsen
  2022-01-20 16:51                 ` bidi-string-strip-control-characters Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-20 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> How about if, instead of filtering out these controls, the function
> would replace them with their printed representation, like \u200e?
> Removing them might leave the user wondering what and where is wrong
> with the original text.

Yes, that's even better.  Or even with the \N{NAME} syntax, which would
spell things out even clearer?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: bidi-string-strip-control-characters
  2022-01-20 13:36               ` bidi-string-strip-control-characters Lars Ingebrigtsen
@ 2022-01-20 16:51                 ` Eli Zaretskii
  2022-01-21  9:18                   ` bidi-string-strip-control-characters Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-01-20 16:51 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: luangruo, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: luangruo@yahoo.com,  emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 14:36:19 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > How about if, instead of filtering out these controls, the function
> > would replace them with their printed representation, like \u200e?
> > Removing them might leave the user wondering what and where is wrong
> > with the original text.
> 
> Yes, that's even better.  Or even with the \N{NAME} syntax, which would
> spell things out even clearer?

The problem with \N{NAME} is that it's very long, which makes the text
inconvenient to display and read.  So I tend to favor the \uNNNN
alternative.




* Re: bidi-string-strip-control-characters
  2022-01-20 16:51                 ` bidi-string-strip-control-characters Eli Zaretskii
@ 2022-01-21  9:18                   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-21  9:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> The problem with \N{NAME} is that it's very long, which makes the text
> inconvenient to display and read.  So I tend to favor the \uNNNN
> alternative.

Yes, having a too-long string would be annoying, too, so \uNNNN is fine
by me.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




