* bidi-string-strip-control-characters
From: Eli Zaretskii @ 2022-01-20 9:23 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel

Lars, I'm not sure I understand the purpose of this function. Can you
explain?

The way it is currently used is also strange, to say the least: you
apply it to a string made of a single character, so either it does
nothing to the string, or it will return an empty string. So the
following code will present the user with a riddle:

  (textsec-email-address-header-suspicious-p
   "Lars Ingebrigtsen <larsi@\N{RIGHT-TO-LEFT OVERRIDE}gnus.org>")
  "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

The empty string between quotes is the riddle.

I think I understand the original problem: displaying a literal U+202E
there will mess up the text on display, but if that is the reason, the
right way is not to remove the character, it is to append to it the
necessary bidi controls to prevent the messup (and make the appended
controls be invisible).

Here's an example:

  (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
                  (concat (string ?\x202e)
                          (propertize (string ?\x202c ?\x200e) 'invisible t))))

This displays the RLO character, but doesn't mess up the description
after it.

We do something like that in descr-text.el, so I guess we need to
factor out that code and use it here.
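Eli's proposal keeps the control character visible and follows it with invisible characters that end its effect. The sketch below shows one possible shape for the factored-out helper he mentions. It assumes that embeddings and overrides are closed with POP DIRECTIONAL FORMATTING (U+202C) and isolates with POP DIRECTIONAL ISOLATE (U+2069), and that each closer is followed by a LEFT-TO-RIGHT MARK, as in his example. The name `my-bidi-terminate-control` is hypothetical. This is not the descr-text.el code.

```elisp
;; Hypothetical helper, not an existing Emacs API and not the code in
;; descr-text.el.  It keeps CH visible and appends invisible controls
;; that end CH's directional effect.
(defun my-bidi-terminate-control (ch)
  "Return CH as a string, followed by invisible closing controls.
Embeddings and overrides (LRE, RLE, LRO, RLO) are closed with PDF,
U+202C; isolates (LRI, RLI, FSI) are closed with PDI, U+2069.  In
both cases an LRM, U+200E, is appended as well.  Any other
character is returned as a one-character string."
  (let* ((class (get-char-code-property ch 'bidi-class))
         (closer (cond ((memq class '(LRE RLE LRO RLO)) #x202c)
                       ((memq class '(LRI RLI FSI)) #x2069))))
    (if closer
        (concat (string ch)
                ;; Only the appended controls are invisible.
                (propertize (string closer #x200e) 'invisible t))
      (string ch))))
```

With this helper, the `concat` form in Eli's example reduces to `(my-bidi-terminate-control ?\x202e)`. Characters that open no embedding, override, or isolate are returned unchanged.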
* Re: bidi-string-strip-control-characters
From: Lars Ingebrigtsen @ 2022-01-20 9:29 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Lars, I'm not sure I understand the purpose of this function. Can you
> explain?

Like the NEWS item says, it's for cases where you want to ensure that
there's no bidiness going on.

> The way it is currently used is also strange, to say the least: you
> apply it to a string made of a single character, so either it does
> nothing to the string, or it will return an empty string. So the
> following code will present the user with a riddle:
>
>   (textsec-email-address-header-suspicious-p
>    "Lars Ingebrigtsen <larsi@\N{RIGHT-TO-LEFT OVERRIDE}gnus.org>")
>   "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
>
> The empty string between quotes is the riddle.

Well... perhaps not optimal, but not really a riddle. But the function
will probably be used elsewhere in textsec, too, but I haven't gotten
round to auditing all the strings yet.

> I think I understand the original problem: displaying a literal U+202E
> there will mess up the text on display, but if that is the reason, the
> right way is not to remove the character, it is to append to it the
> necessary bidi controls to prevent the messup (and make the appended
> controls be invisible).
>
> Here's an example:
>
>   (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
>                   (concat (string ?\x202e)
>                           (propertize (string ?\x202c ?\x200e) 'invisible t))))
>
> This displays the RLO character, but doesn't mess up the description
> after it.

The display is identical to the one we have now, though:

"Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

So still a riddle.

But removing the bidi chars is "obviously correct" (and impervious to
future attacks) for somebody that's not that familiar with the bidi
machinery, so I prefer to remove the chars instead here.

> We do something like that in descr-text.el, so I guess we need to
> factor out that code and use it here.

Isn't that bidi-string-mark-left-to-right? I forget.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
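For comparison, the stripping approach Lars prefers can be sketched as follows. The character list is an assumption: ALM (U+061C), LRM and RLM (U+200E, U+200F), the embedding and override controls U+202A..U+202E, and the isolate controls U+2066..U+2069. `my-bidi-strip-controls` is a hypothetical name, and the sketch does not claim to be how `bidi-string-strip-control-characters` is implemented. Applied to the one-character string from the thread, it produces the empty quotes that Eli calls a riddle.

```elisp
;; Hypothetical sketch of stripping bidi formatting characters; not
;; necessarily how `bidi-string-strip-control-characters' works.
(defconst my-bidi-control-chars
  '(#x061c #x200e #x200f #x202a #x202b #x202c #x202d #x202e
    #x2066 #x2067 #x2068 #x2069)
  "Bidi formatting characters assumed to be stripped in this sketch.")

(defun my-bidi-strip-controls (string)
  "Return a copy of STRING with all `my-bidi-control-chars' removed."
  (mapconcat (lambda (ch)
               ;; Drop controls, keep every other character.
               (if (memq ch my-bidi-control-chars) "" (string ch)))
             string ""))

;; The single-character case from the thread leaves nothing to show:
(format "Disallowed character: `%s'" (my-bidi-strip-controls (string #x202e)))
;; => "Disallowed character: `'"
```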
* Re: bidi-string-strip-control-characters
From: Eli Zaretskii @ 2022-01-20 10:14 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 10:29:26 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Lars, I'm not sure I understand the purpose of this function. Can you
> > explain?
>
> Like the NEWS item says, it's for cases where you want to ensure that
> there's no bidiness going on.

But when is that useful, the current specific use case aside?

> >   (textsec-email-address-header-suspicious-p
> >    "Lars Ingebrigtsen <larsi@\N{RIGHT-TO-LEFT OVERRIDE}gnus.org>")
> >   "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
> >
> > The empty string between quotes is the riddle.
>
> Well... perhaps not optimal, but not really a riddle. But the function
> will probably be used elsewhere in textsec, too, but I haven't gotten
> round to auditing all the strings yet.

That's why I think we should discuss the issue now. I don't think
removing the bidi controls is TRT, as it will make some text hard to
read and interpret. We can do better.

> >   (insert (format "Disallowed character: `%s' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
> >                   (concat (string ?\x202e)
> >                           (propertize (string ?\x202c ?\x200e) 'invisible t))))
> >
> > This displays the RLO character, but doesn't mess up the description
> > after it.
>
> The display is identical to the one we have now, though:
>
> "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

No, it isn't identical, because in the latter case the U+202E glyph is
retained on display. (It disappeared from your email for some reason,
but if I eval the form, I see it between the quotes.)

> But removing the bidi chars is "obviously correct" (and impervious to
> future attacks) for somebody that's not that familiar with the bidi
> machinery, so I prefer to remove the chars instead here.

You make this stuff hard to read for a reason that doesn't sound right
to me: we do have better solutions that still avoid messing up the
display. We use those other solutions elsewhere in Emacs, so why not
here?

> Isn't that bidi-string-mark-left-to-right?

Yes, but bidi-string-mark-left-to-right will not help with overrides,
it only helps with "normal" RTL characters. We do need a new API,
just not one that removes the bidi controls entirely, that is too
drastic. What we do in descr-text.el provides a full solution, we
just need to factor it out into a separate function.
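Eli's distinction between overrides and "normal" RTL characters corresponds to the Unicode bidi class that Emacs records for each character, and `get-char-code-property` can report that class. The small sketch below lists the classes of a Latin letter, HEBREW LETTER ALEF (U+05D0), and RIGHT-TO-LEFT OVERRIDE. The helper name `my-bidi-classes` is hypothetical.

```elisp
;; Hypothetical helper: map each character to its Unicode bidi class.
(defun my-bidi-classes (chars)
  "Return an alist mapping each of CHARS to its Unicode bidi class."
  (mapcar (lambda (ch)
            (cons ch (get-char-code-property ch 'bidi-class)))
          chars))

;; ?a is a Latin letter, #x05d0 is HEBREW LETTER ALEF, and #x202e is
;; RIGHT-TO-LEFT OVERRIDE.
(my-bidi-classes '(?a #x05d0 #x202e))
;; => ((97 . L) (1488 . R) (8238 . RLO))
```

A strong right-to-left letter has class R. U+202E instead has its own class, RLO. This fits Eli's point that a helper aimed at strong RTL text does not, by itself, deal with overrides.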
* Re: bidi-string-strip-control-characters
From: Lars Ingebrigtsen @ 2022-01-20 12:47 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Isn't that bidi-string-mark-left-to-right?
>
> Yes, but bidi-string-mark-left-to-right will not help with overrides,
> it only helps with "normal" RTL characters. We do need a new API,
> just not one that removes the bidi controls entirely, that is too
> drastic. What we do in descr-text.el provides a full solution, we
> just need to factor it out into a separate function.

Right -- sounds like a useful function to have in general, too.
* Re: bidi-string-strip-control-characters
From: Po Lu @ 2022-01-20 11:04 UTC
To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> The display is identical to the one we have now, though:
>
> "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"

I think the best thing in this case would be to get rid of the quotation
marks entirely, and just display:

"Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
* Re: bidi-string-strip-control-characters
From: Eli Zaretskii @ 2022-01-20 11:19 UTC
To: Po Lu; +Cc: larsi, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 19:04:25 +0800
>
> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
> > The display is identical to the one we have now, though:
> >
> > "Disallowed character: `' (#x202e, RIGHT-TO-LEFT OVERRIDE)"
>
> I think the best thing in this case would be to get rid of the quotation
> marks entirely, and just display:
>
> "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"

What is "this case" in this context? IOW, under what conditions do
you suggest to omit the character itself?

And in any case, the question about the function is more general.
Unless you suggest to remove it, that is.
* Re: bidi-string-strip-control-characters
From: Po Lu @ 2022-01-20 11:21 UTC
To: Eli Zaretskii; +Cc: larsi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I think the best thing in this case would be to get rid of the quotation
>> marks entirely, and just display:
>>
>> "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
> What is "this case" in this context? IOW, under what conditions do
> you suggest to omit the character itself?

When it is stripped by the function, I think. Or perhaps I
misunderstood what this function is supposed to do.

> And in any case, the question about the function is more general.

Fair enough, thanks.
* Re: bidi-string-strip-control-characters
From: Lars Ingebrigtsen @ 2022-01-20 11:23 UTC
To: Eli Zaretskii; +Cc: Po Lu, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
>
> What is "this case" in this context? IOW, under what conditions do
> you suggest to omit the character itself?

I think it makes sense to change the display to what Po Lu suggested
when the char is glyphless.
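Po Lu's format, used only when the character would not show up on its own, could look roughly like the following. Lars's condition is "when the char is glyphless". As a stand-in for that test, this sketch treats explicit bidi formatting characters, identified by their bidi class, as glyphless. That substitution is an assumption. `my-disallowed-char-message` is hypothetical and is not textsec's code.

```elisp
;; Hypothetical message builder, not textsec's actual code.  As an
;; assumption, explicit bidi formatting characters stand in for
;; "glyphless" characters.
(defconst my-bidi-formatting-classes
  '(LRE RLE LRO RLO PDF LRI RLI FSI PDI)
  "Bidi classes treated as glyphless in this sketch.")

(defun my-disallowed-char-message (ch)
  "Describe CH, omitting the character itself if it is a bidi control."
  (let ((name (get-char-code-property ch 'name)))
    (if (memq (get-char-code-property ch 'bidi-class)
              my-bidi-formatting-classes)
        ;; Po Lu's format: no quotes and no literal character.
        (format "Disallowed character: #x%x, %s" ch name)
      ;; Otherwise keep the character between quotes, in the format
      ;; shown earlier in the thread.
      (format "Disallowed character: `%s' (#x%x, %s)" (string ch) ch name))))

(my-disallowed-char-message #x202e)
;; => "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
```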
* Re: bidi-string-strip-control-characters
From: Eli Zaretskii @ 2022-01-20 11:33 UTC
To: Lars Ingebrigtsen; +Cc: luangruo, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Po Lu <luangruo@yahoo.com>, emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 12:23:20 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> "Disallowed character: #x202e, RIGHT-TO-LEFT OVERRIDE"
> >
> > What is "this case" in this context? IOW, under what conditions do
> > you suggest to omit the character itself?
>
> I think it makes sense to change the display to what Po Lu suggested
> when the char is glyphless.

That'd be fine by me for this particular use case.

But what are we going to do with the function in the Subject? Is it
still useful enough to have it, if we make that change?
* Re: bidi-string-strip-control-characters
From: Lars Ingebrigtsen @ 2022-01-20 12:46 UTC
To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> But what are we going to do with the function in the Subject? Is it
> still useful enough to have it, if we make that change?

We're still using it in textsec-link-suspicious-p...
* Re: bidi-string-strip-control-characters
From: Eli Zaretskii @ 2022-01-20 13:02 UTC
To: Lars Ingebrigtsen; +Cc: luangruo, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: luangruo@yahoo.com, emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 13:46:52 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > But what are we going to do with the function in the Subject? Is it
> > still useful enough to have it, if we make that change?
>
> We're still using it in textsec-link-suspicious-p...

How about if, instead of filtering out these controls, the function
would replace them with their printed representation, like \u200e?
Removing them might leave the user wondering what and where is wrong
with the original text.
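Eli's alternative replaces each control with its \uNNNN spelling instead of deleting it. One way to sketch this uses `replace-regexp-in-string`. The set of characters matched is an assumption: ALM (U+061C), LRM and RLM (U+200E, U+200F), U+202A..U+202E, and U+2066..U+2069. `my-bidi-escape-controls` is a hypothetical name.

```elisp
;; Hypothetical sketch: show bidi controls as \uNNNN instead of
;; removing them.  The character set below is an assumption.
(defconst my-bidi-control-regexp
  "[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]"
  "Regexp matching a single bidi control character.")

(defun my-bidi-escape-controls (string)
  "Return STRING with each bidi control replaced by its \\uNNNN spelling."
  (replace-regexp-in-string
   my-bidi-control-regexp
   ;; Each match is one character; format its code point in hex.
   (lambda (match) (format "\\u%04x" (aref match 0)))
   string
   t     ; FIXEDCASE: do not alter case of the replacement
   t))   ; LITERAL: insert the backslash as-is

(my-bidi-escape-controls "larsi@\u202egnus.org")
;; => "larsi@\\u202egnus.org"
```

The non-nil LITERAL argument matters here. Without it, `replace-match` would try to interpret `\u` in the replacement as a backslash construct and signal an error.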
* Re: bidi-string-strip-control-characters
From: Lars Ingebrigtsen @ 2022-01-20 13:36 UTC
To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> How about if, instead of filtering out these controls, the function
> would replace them with their printed representation, like \u200e?
> Removing them might leave the user wondering what and where is wrong
> with the original text.

Yes, that's even better. Or even with the \N{NAME} syntax, which would
spell things out even clearer?
* Re: bidi-string-strip-control-characters
From: Eli Zaretskii @ 2022-01-20 16:51 UTC
To: Lars Ingebrigtsen; +Cc: luangruo, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: luangruo@yahoo.com, emacs-devel@gnu.org
> Date: Thu, 20 Jan 2022 14:36:19 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > How about if, instead of filtering out these controls, the function
> > would replace them with their printed representation, like \u200e?
> > Removing them might leave the user wondering what and where is wrong
> > with the original text.
>
> Yes, that's even better. Or even with the \N{NAME} syntax, which would
> spell things out even clearer?

The problem with \N{NAME} is that it's very long, which makes the text
inconvenient to display and read. So I tend to favor the \uNNNN
alternative.
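The sketch below makes the length trade-off concrete. It builds both spellings for one character, taking the character's Unicode name from `get-char-code-property`. The function name `my-bidi-char-escapes` is hypothetical.

```elisp
;; Hypothetical helper: the two candidate spellings for one character.
(defun my-bidi-char-escapes (ch)
  "Return a cons of the \\uNNNN and \\N{NAME} spellings of CH."
  (cons (format "\\u%04x" ch)
        (format "\\N{%s}" (get-char-code-property ch 'name))))

(my-bidi-char-escapes #x202e)
;; => ("\\u202e" . "\\N{RIGHT-TO-LEFT OVERRIDE}")
```

For U+202E, the \uNNNN form is 6 characters long and the \N{NAME} form is 26. That difference is the display cost Eli is weighing against readability.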
* Re: bidi-string-strip-control-characters
From: Lars Ingebrigtsen @ 2022-01-21 9:18 UTC
To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> The problem with \N{NAME} is that it's very long, which makes the text
> inconvenient to display and read. So I tend to favor the \uNNNN
> alternative.

Yes, having a too-long string would be annoying, too, so \uNNNN is fine
by me.