unofficial mirror of emacs-devel@gnu.org 
* Bidirectional text and URLs
@ 2014-11-28  2:51 Lars Magne Ingebrigtsen
  2014-11-28  3:27 ` Stephen J. Turnbull
                   ` (3 more replies)
  0 siblings, 4 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-28  2:51 UTC (permalink / raw)
  To: emacs-devel

Using right-to-left markers to do phishing and obscure URLs has gotten
some attention on the webs today.  For instance, can you easily tell
where the link below takes you if you click on it in Gnus and
(presumably) rmail?

     Works on URLs too.                                               
                                                                      
     ‮http://myspace.com/#/segami/moc.koobecaf//:sptth                 
                                                                      
Unless I messed something up while cut'n'pasting that, you should see
the problem.

Now, should we do something about that?  And if so -- what?
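
What the example does can be modeled in a few lines.  The following is
a minimal sketch in Python (chosen only for illustration;
`displayed_form` is a made-up name, not anything in Emacs or Gnus),
assuming a single U+202E RIGHT-TO-LEFT OVERRIDE with no terminator.  It
is a rough model of the visual effect, not an implementation of UAX#9.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def displayed_form(logical: str) -> str:
    """Rough model: everything after an RLO is shown reversed.

    Assumption: at most one RLO and no terminating PDF; a real display
    engine follows UAX#9, which is far more involved.
    """
    head, sep, tail = logical.partition(RLO)
    if not sep:
        return logical
    return head + tail[::-1]


# What is stored in the message (and what a click would follow):
logical = RLO + "http://myspace.com/#/segami/moc.koobecaf//:sptth"
# What the reader sees on a bidi-aware display, roughly:
print(displayed_form(logical))
# https://facebook.com/images/#/moc.ecapsym//:ptth
```

The stored URL points at myspace.com, while the displayed text starts
with what looks like a facebook.com address.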

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Bidirectional text and URLs
  2014-11-28  2:51 Bidirectional text and URLs Lars Magne Ingebrigtsen
@ 2014-11-28  3:27 ` Stephen J. Turnbull
  2014-11-28 14:54   ` Eli Zaretskii
  2014-11-28 11:19 ` Ted Zlatanov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 133+ messages in thread
From: Stephen J. Turnbull @ 2014-11-28  3:27 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

Lars Magne Ingebrigtsen writes:

 > Using right-to-left markers to do phishing and obscure URLs has gotten
 > some attention on the webs today.  For instance, can you easily tell
 > where the link below takes you if you click on it in Gnus and
 > (presumably) rmail?

Eli's the expert, but I would say that given that the UAX#9 bidi
algorithm does what's wanted 99.44% of the time, it makes sense to
mark text reordered by RTL markers with a warning face, and to the
extent that your UI recognizes URLs, you could even query the user:

    This link appears to have been obfuscated by using unusual
    characters or presentation techniques.  This link points to

    http://myspace.com/#/...

    Is that your intended destination?

if you recognize that the URL was obfuscated (not limited to RTL, but
also out-of-block confusable characters such as a Cyrillic A in an
otherwise ASCII URL and HTML A elements where the displayed text
appears to be a URL that doesn't match the href, etc).

Personally I'll probably just add RTL characters to my .procmailrc,
and never see them in the first place. :-)  Sorry about not noticing
your post, larsi! ;^)

 >      Works on URLs too.                                               
 >                                                                       
 >      ‮http://myspace.com/#/segami/moc.koobecaf//:sptth                 
 >                                                                       
 > Unless I messed something up while cut'n'pasting that, you should see
 > the problem.

Interestingly, it worked temporarily in Terminal.app but then stopped,
I'm not sure why.  A wormy Apple, I guess! ;-)

 > Now, should we do something about that?  And if so -- what?

I think that the query and the statistical analysis of confusables is
likely to be a fair amount of work, if you want to avoid confusing the
user more than the obfuscation does.  A different face should be easy
enough in cases where you have RTL markers or mixed charset blocks.
You do need a way to turn it off, or to make it reasonably smart, in
the case of ASCII which is often mixed with other charsets.





* Re: Bidirectional text and URLs
  2014-11-28  2:51 Bidirectional text and URLs Lars Magne Ingebrigtsen
  2014-11-28  3:27 ` Stephen J. Turnbull
@ 2014-11-28 11:19 ` Ted Zlatanov
  2014-11-28 13:58   ` Lars Magne Ingebrigtsen
                     ` (3 more replies)
  2014-11-28 14:45 ` Eli Zaretskii
  2014-11-28 17:09 ` Richard Stallman
  3 siblings, 4 replies; 133+ messages in thread
From: Ted Zlatanov @ 2014-11-28 11:19 UTC (permalink / raw)
  To: emacs-devel

On Fri, 28 Nov 2014 03:51:14 +0100 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote: 

LMI> Using right-to-left markers to do phishing and obscure URLs has gotten
LMI> some attention on the webs today.  For instance, can you easily tell
LMI> where the link below takes you if you click on it in Gnus and
LMI> (presumably) rmail?

LMI>      Works on URLs too.                                               
                                                                      
LMI>      ‮http://myspace.com/#/segami/moc.koobecaf//:sptth                 
                                                                      
LMI> Unless I messed something up while cut'n'pasting that, you should see
LMI> the problem.

LMI> Now, should we do something about that?  And if so -- what?

My uni-confusables package in the GNU ELPA would help detect things like
б (CYRILLIC SMALL LETTER BE) confused with the number 6.  The relevant
line from confusables.txt is:

0431 ;	0036 ;	SL	# ( б → 6 ) CYRILLIC SMALL LETTER BE → DIGIT SIX	#

which maps to (1073 "6") in `uni-confusables-char-table-single'. EWW and
SHR could opportunistically use that table to highlight such characters.
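
As an illustration of how that line turns into the (1073 "6") entry,
here is a small sketch in Python.  It is not the code uni-confusables
uses; `parse_confusable_line` is a hypothetical helper, and it assumes
the semicolon-separated layout shown in the line above.

```python
def parse_confusable_line(line: str):
    """Parse one data line of confusables.txt into (codepoint, target)."""
    data = line.split("#", 1)[0]                 # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    source = int(fields[0], 16)                  # "0431" -> 1073
    # The target may be a sequence of several code points.
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return source, target


line = "0431 ;\t0036 ;\tSL\t# ( б → 6 ) CYRILLIC SMALL LETTER BE → DIGIT SIX\t#"
print(parse_confusable_line(line))  # (1073, '6')
```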

I could also add RTL markers and other useful things to uni-confusables
if you think it's the right place, and maybe provide the function for
EWW and SHR and others to use when looking for suspicious characters. Or
I could keep the package to a single purpose. I'm not sure of the right
thing because this feels a little bit like core functionality.

Ted





* Re: Bidirectional text and URLs
  2014-11-28 11:19 ` Ted Zlatanov
@ 2014-11-28 13:58   ` Lars Magne Ingebrigtsen
  2014-11-28 19:49     ` Ted Zlatanov
  2014-11-28 14:24   ` Stefan Monnier
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-28 13:58 UTC (permalink / raw)
  To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> My uni-confusables package in the GNU ELPA would help detect things like
> б (CYRILLIC SMALL LETTER BE) confused with the number 6.  The relevant
> line from confusables.txt is:
>
> 0431 ;	0036 ;	SL	# ( б → 6 ) CYRILLIC SMALL LETTER BE → DIGIT SIX	#
>
> which maps to (1073 "6") in `uni-confusables-char-table-single'. EWW and
> SHR could opportunistically use that table to highlight such characters.

Yes, and perhaps use that to do an "are you sure?" if a user tries to
visit https://𝐩𝐚𝐲𝐩𝐚𝐥.com or https://paypal.com.  
But then uni-confusables should perhaps be moved from ELPA to Emacs so
that we can use it generally?

> I could also add RTL markers and other useful things to uni-confusables
> if you think it's the right place, and maybe provide the function for
> EWW and SHR and others to use when looking for suspicious characters. Or
> I could keep the package to a single purpose. I'm not sure of the right
> thing because this feels a little bit like core functionality.

Yeah, I think the RTL stuff sounds kinda like a separate issue that's
even more fundamental than the confusables, perhaps.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Bidirectional text and URLs
  2014-11-28 11:19 ` Ted Zlatanov
  2014-11-28 13:58   ` Lars Magne Ingebrigtsen
@ 2014-11-28 14:24   ` Stefan Monnier
  2014-11-28 14:57   ` Eli Zaretskii
  2014-11-29  6:17   ` Stephen J. Turnbull
  3 siblings, 0 replies; 133+ messages in thread
From: Stefan Monnier @ 2014-11-28 14:24 UTC (permalink / raw)
  To: emacs-devel

> which maps to (1073 "6") in `uni-confusables-char-table-single'.  EWW and
> SHR could opportunistically use that table to highlight such characters.

I don't think SHR/EWW can really do that for the buffer's main text,
since AFAIK it doesn't know whether what it displays is supposed to be
a URL or just plain human text (or rather, to do it well it would have
to somehow detect a particular mix of characters).

OTOH it can&should indeed do something (including a bigfat warning for
bidi-ordering codes) when displaying something it knows to be a URL.


        Stefan




* Re: Bidirectional text and URLs
  2014-11-28  2:51 Bidirectional text and URLs Lars Magne Ingebrigtsen
  2014-11-28  3:27 ` Stephen J. Turnbull
  2014-11-28 11:19 ` Ted Zlatanov
@ 2014-11-28 14:45 ` Eli Zaretskii
  2014-11-28 17:09 ` Richard Stallman
  3 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-28 14:45 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Fri, 28 Nov 2014 03:51:14 +0100
> 
> Using right-to-left markers to do phishing and obscure URLs has gotten
> some attention on the webs today.  For instance, can you easily tell
> where the link below takes you if you click on it in Gnus and
> (presumably) rmail?
> 
>      Works on URLs too.                                               
>                                                                       
>      ‮http://myspace.com/#/segami/moc.koobecaf//:sptth                 
>                                                                       
> Unless I messed something up while cut'n'pasting that, you should see
> the problem.
> 
> Now, should we do something about that?  And if so -- what?

It depends on what we _want_ to do.  All I can do at this stage is
point to the relevant resources (which unfortunately are not helpful
enough IMO when it comes to recommendations for browser-type
applications that need to display such URLs without fooling users):

  http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing
  http://www.unicode.org/reports/tr39/
  http://www.ietf.org/rfc/rfc3987.txt






* Re: Bidirectional text and URLs
  2014-11-28  3:27 ` Stephen J. Turnbull
@ 2014-11-28 14:54   ` Eli Zaretskii
  2014-11-29  6:09     ` Stephen J. Turnbull
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-28 14:54 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: larsi, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Date: Fri, 28 Nov 2014 12:27:28 +0900
> Cc: emacs-devel@gnu.org
> 
> Lars Magne Ingebrigtsen writes:
> 
>  > Using right-to-left markers to do phishing and obscure URLs has gotten
>  > some attention on the webs today.  For instance, can you easily tell
>  > where the link below takes you if you click on it in Gnus and
>  > (presumably) rmail?
> 
> Eli's the expert

Not really, not in this particular field.

> but I would say that given that the UAX#9 bidi algorithm does what's
> wanted 99.44% of the time, it makes sense to mark text reordered by
> RTL markers with a warning face

That might be considered an annoyance by users of bidi scripts.
There's any number of perfectly valid URLs that use the same
formatting control characters.

What you suggest might be TRT when left-to-right text is enclosed
within directional override controls (which is what Lars did in his
example).  These controls assign right-to-left directionality to all
the enclosed characters, which is indeed highly suspicious in URLs.

In addition to using a special face, another possibility is to present
the directional overrides in these cases in percent-hex notation,
which will disable their effect on the enclosed text.  Of course, this
should be only done when the enclosed text is entirely made of LTR
characters and neutrals.
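
A minimal sketch of the percent-hex idea, in Python for illustration
only.  `neutralize_overrides` is a hypothetical name, and the condition
is simplified: it looks at the whole string rather than only at the
text enclosed by the override, and it treats "no strong right-to-left
character anywhere" as "entirely LTR characters and neutrals".

```python
import unicodedata
from urllib.parse import quote

OVERRIDES = {"\u202d", "\u202e"}   # LRO, RLO
STRONG_RTL = {"R", "AL"}           # bidi classes of strong RTL characters


def neutralize_overrides(text: str) -> str:
    """Show directional overrides as percent-hex when no RTL text is present."""
    if any(unicodedata.bidirectional(c) in STRONG_RTL for c in text):
        return text  # may be a legitimate bidi URL; leave it alone
    # quote() percent-encodes the UTF-8 bytes of the control character.
    return "".join(quote(c) if c in OVERRIDES else c for c in text)


print(neutralize_overrides("\u202ehttp://myspace.com/"))
# %E2%80%AEhttp://myspace.com/
```

Once the override is shown as %E2%80%AE it is ordinary text, so it no
longer reverses the characters that follow it.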

Like I said: we should first decide what we want to do in these cases,
and then look around for machinery to implement that.

> You do need a way to turn it off, or to make it reasonably smart, in
> the case of ASCII which is often mixed with other charsets.

Not sure what you mean here.  Care to elaborate?  "Turn off" how?  And
how do you do that without unduly punishing perfectly valid URLs that
need these controls to avoid visual "jumbles"?




* Re: Bidirectional text and URLs
  2014-11-28 11:19 ` Ted Zlatanov
  2014-11-28 13:58   ` Lars Magne Ingebrigtsen
  2014-11-28 14:24   ` Stefan Monnier
@ 2014-11-28 14:57   ` Eli Zaretskii
  2014-11-29  6:17   ` Stephen J. Turnbull
  3 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-28 14:57 UTC (permalink / raw)
  To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 28 Nov 2014 06:19:31 -0500
> 
> I could also add RTL markers and other useful things to uni-confusables
> if you think it's the right place

I don't think it's TRT to highlight these controls regardless of what
characters they affect.  See my other message for why.

> I'm not sure of the right thing because this feels a little bit like
> core functionality.

What is "core functionality" here?  For that matter, what
functionality are we talking about?  We should first decide what we
want to do with these cases, and only then discuss whether that
functionality belongs to the core.




* Re: Bidirectional text and URLs
  2014-11-28  2:51 Bidirectional text and URLs Lars Magne Ingebrigtsen
                   ` (2 preceding siblings ...)
  2014-11-28 14:45 ` Eli Zaretskii
@ 2014-11-28 17:09 ` Richard Stallman
  2014-11-28 18:28   ` Eli Zaretskii
  2014-11-28 19:28   ` Andreas Schwab
  3 siblings, 2 replies; 133+ messages in thread
From: Richard Stallman @ 2014-11-28 17:09 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

There is no legitimate need for such URLs to "work."
Perhaps the Emacs programs that follow a URL
should give an error if there is any special RTL flag character
in the URL.  Or anything else strange or dangerous.


-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-11-28 17:09 ` Richard Stallman
@ 2014-11-28 18:28   ` Eli Zaretskii
  2014-11-29 17:03     ` Richard Stallman
  2014-11-28 19:28   ` Andreas Schwab
  1 sibling, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-28 18:28 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Fri, 28 Nov 2014 12:09:51 -0500
> From: Richard Stallman <rms@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> There is no legitimate need for such URLs to "work."

Yes, there is.  Some bidirectional texts can be hard to read without
these control characters.

> Perhaps the Emacs programs that follow a URL
> should give an error if there is any special RTL flag character
> in the URL.  Or anything else strange or dangerous.

That'd be a mistake, IMO.  If we can detect unreasonable or suspicious
uses of these control characters (like when strictly left-to-right
text is included in a right-to-left override embedding), then we
should flag only those.
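
One way such a check could look, sketched in Python.  The name
`suspicious_rlo_spans` is hypothetical, and the sketch assumes
non-nested RLO ... PDF pairs (an unterminated RLO is taken to run to the
end of the text); it ignores embeddings and isolates entirely.

```python
import unicodedata

RLO, PDF = "\u202e", "\u202c"  # RIGHT-TO-LEFT OVERRIDE, POP DIRECTIONAL FORMATTING


def suspicious_rlo_spans(text: str):
    """Return spans forced right-to-left that contain only LTR letters."""
    spans = []
    start = text.find(RLO)
    while start != -1:
        end = text.find(PDF, start + 1)
        if end == -1:
            end = len(text)
        body = text[start + 1:end]
        classes = {unicodedata.bidirectional(c) for c in body}
        # Strong LTR letters present, no strong RTL ones: worth flagging.
        if "L" in classes and not classes & {"R", "AL"}:
            spans.append(body)
        start = text.find(RLO, end)
    return spans


print(suspicious_rlo_spans("see \u202emoc.elpmaxe\u202c now"))  # ['moc.elpmaxe']
```

Right-to-left text inside an override is left alone, so a legitimate
use of the control is not flagged by this test.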




* Re: Bidirectional text and URLs
  2014-11-28 17:09 ` Richard Stallman
  2014-11-28 18:28   ` Eli Zaretskii
@ 2014-11-28 19:28   ` Andreas Schwab
  2014-11-29 17:04     ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: Andreas Schwab @ 2014-11-28 19:28 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Lars Magne Ingebrigtsen, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> There is no legitimate need for such URLs to "work."
> Perhaps the Emacs programs that follow a URL
> should give an error if there is any special RTL flag character
> in the URL.  Or anything else strange or dangerous.

The RTL flag character in the example isn't part of the URL, it only
precedes it.
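
This matters for detection: a check that only looks inside the URL
would miss it.  A small Python sketch of looking at the character just
before a recognized URL; the regular expression and the name
`urls_with_bidi_context` are assumptions made for the example, and a
control further back on the line is not caught.

```python
import re

# Embedding, override and isolate controls (U+202A..U+202E, U+2066..U+2069).
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
# Deliberately simple URL pattern; real URL recognition is more involved.
URL_RE = re.compile(r"https?://[^\s\u202a-\u202e\u2066-\u2069]+")


def urls_with_bidi_context(text: str):
    """Return (url, flagged) pairs; flagged means a bidi control precedes the URL."""
    results = []
    for m in URL_RE.finditer(text):
        preceded = m.start() > 0 and text[m.start() - 1] in BIDI_CONTROLS
        results.append((m.group(), preceded))
    return results


print(urls_with_bidi_context("click \u202ehttp://myspace.com/#/x"))
# [('http://myspace.com/#/x', True)]
```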

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Bidirectional text and URLs
  2014-11-28 13:58   ` Lars Magne Ingebrigtsen
@ 2014-11-28 19:49     ` Ted Zlatanov
  2014-11-28 21:02       ` Stefan Monnier
  2014-11-28 22:26       ` Eli Zaretskii
  0 siblings, 2 replies; 133+ messages in thread
From: Ted Zlatanov @ 2014-11-28 19:49 UTC (permalink / raw)
  To: emacs-devel

On Fri, 28 Nov 2014 14:58:27 +0100 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote: 

LMI> Ted Zlatanov <tzz@lifelogs.com> writes:
>> My uni-confusables package in the GNU ELPA would help detect things like
>> б (CYRILLIC SMALL LETTER BE) confused with the number 6.  The relevant
>> line from confusables.txt is:
>> 
>> 0431 ;	0036 ;	SL	# ( б → 6 ) CYRILLIC SMALL LETTER BE → DIGIT SIX	#
>> 
>> which maps to (1073 "6") in `uni-confusables-char-table-single'. EWW and
>> SHR could opportunistically use that table to highlight such characters.

LMI> Yes, and perhaps use that to do an "are you sure?" if a user tries to
LMI> visit https://𝐩𝐚𝐲𝐩𝐚𝐥.com or https://paypal.com.  

Right.  At least in the SHR/EWW context we can control that experience,
and also perhaps in places like `browse-url' or `ffap-url-at-point'.

LMI> But then uni-confusables should perhaps be moved from ELPA to Emacs so
LMI> that we can use it generally?

It would probably improve the user experience, yes.  Stefan, WDYT?

On Fri, 28 Nov 2014 09:24:21 -0500 Stefan Monnier <monnier@IRO.UMontreal.CA> wrote: 

>> which maps to (1073 "6") in `uni-confusables-char-table-single'.  EWW and
>> SHR could opportunistically use that table to highlight such characters.

SM> I don't think SHR/EWW can really do that for the buffer's main text,
SM> since AFAIK it doesn't know whether what it displays is supposed to be
SM> a URL or just plain human text (or rather, to do it well it would have
SM> to somehow detect a particular mix of characters).

For Gnus users, for instance, the buffer would be using SHR so there's
some control over the experience and metadata about the content.  You're
right that in general this is not clear, which is why interactive
functions like `browse-url' and others may need to be advised.

SM> OTOH it can&should indeed do something (including a bigfat warning for
SM> bidi-ordering codes) when displaying something it knows to be a URL.

I'm not sure about the bidi markers, Eli can discuss that side.  I'll
try to get the confusables in there and maybe write general code that
bidi markers and others can hook into.

On Fri, 28 Nov 2014 16:57:27 +0200 Eli Zaretskii <eliz@gnu.org> wrote: 

>> I could also add RTL markers and other useful things to uni-confusables
>> if you think it's the right place

EZ> I don't think it's TRT to highlight these controls regardless of what
EZ> characters they affect.  See my other message for why.

OK.

>> I'm not sure of the right thing because this feels a little bit like
>> core functionality.

EZ> What is "core functionality" here?

Things that work in Emacs without customization.

EZ> For that matter, what functionality are we talking about?

The uni-confusables package from the GNU ELPA and glue code to let SHR
and EWW know that a URL includes such characters.

EZ> We should first decide what we want to do with these cases, and only
EZ> then discuss whether that functionality belongs to the core.

I think Lars' suggestion is decent, see above.

Ted





* Re: Bidirectional text and URLs
  2014-11-28 19:49     ` Ted Zlatanov
@ 2014-11-28 21:02       ` Stefan Monnier
  2014-11-29  0:26         ` Ted Zlatanov
  2014-11-28 22:26       ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Stefan Monnier @ 2014-11-28 21:02 UTC (permalink / raw)
  To: emacs-devel

> For Gnus users, for instance, the buffer would be using SHR so there's
> some control over the experience and metadata about the content.  You're
> right that in general this is not clear, which is why interactive
> functions like `browse-url' and others may need to be advised.

What I meant is that in the SHR case, the text displayed is not the URL
but some random piece of text that should be highlighted as a button
(although in some cases it is the same text as the URL itself, SHR has
no idea whether that's the case or not).

We can do something in the Gnus case rendering non-HTML contents, where
the URL is highlighted as a button, because at that point we do display
something which we know is supposed to be interpreted by the user as a URL.


        Stefan




* Re: Bidirectional text and URLs
  2014-11-28 19:49     ` Ted Zlatanov
  2014-11-28 21:02       ` Stefan Monnier
@ 2014-11-28 22:26       ` Eli Zaretskii
  1 sibling, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-28 22:26 UTC (permalink / raw)
  To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 28 Nov 2014 14:49:59 -0500
> 
> I'm not sure about the bidi markers, Eli can discuss that side.  I'll
> try to get the confusables in there and maybe write general code that
> bidi markers and others can hook into.

I cannot say I can follow that.  Those "bidi markers" are just
characters, so how can they hook into something?

> EZ> For that matter, what functionality are we talking about?
> 
> The uni-confusables package from the GNU ELPA and glue code to let SHR
> and EWW know that a URL includes such characters.

Once again, these characters are not confusables.  Their use around
the URL is.  So highlighting them wherever we see them is not
necessarily the best way.

> EZ> We should first decide what we want to do with these cases, and only
> EZ> then discuss whether that functionality belongs to the core.
> 
> I think Lars' suggestion is decent, see above.

What suggestion?  And what does decency have to do with this?




* Re: Bidirectional text and URLs
  2014-11-28 21:02       ` Stefan Monnier
@ 2014-11-29  0:26         ` Ted Zlatanov
  0 siblings, 0 replies; 133+ messages in thread
From: Ted Zlatanov @ 2014-11-29  0:26 UTC (permalink / raw)
  To: emacs-devel

On Fri, 28 Nov 2014 16:02:02 -0500 Stefan Monnier <monnier@IRO.UMontreal.CA> wrote: 

>> For Gnus users, for instance, the buffer would be using SHR so there's
>> some control over the experience and metadata about the content.  You're
>> right that in general this is not clear, which is why interactive
>> functions like `browse-url' and others may need to be advised.

SM> What I meant is that in the SHR case, the text displayed is not the URL
SM> but some random piece of text that should be highlighted as a button
SM> (although in some cases it is the same text as the URL itself, SHR has
SM> no idea whether that's the case or not).

SM> We can do something in the Gnus case rendering non-HTML contents, where
SM> the URL is highlighted as a button, because at that point we do display
SM> something which we know is supposed to be interpreted by the user as a URL.

I see what you mean. You're right that it should be rendered
differently, but there are too many ways the rendering can be modified by
the buffer mode, so whatever SHR does will not be enough. Intercepting
the `browse-url' action, on the other hand, is definitely going to
interrupt the user in order to warn them, no matter how they got that
URL.
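
The shape of such an interception, sketched in Python rather than as
Emacs Lisp advice.  Everything here is hypothetical: `guarded_open`,
`has_bidi_controls` and the callbacks stand in for whatever the real
`browse-url` glue would be, and the default test only looks for bidi
formatting controls inside the string it is given.

```python
import unicodedata

# Bidi classes of the explicit embedding, override and isolate controls.
BIDI_FORMAT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF",
                       "LRI", "RLI", "FSI", "PDI"}


def has_bidi_controls(url: str) -> bool:
    """True if the string contains any explicit bidi formatting control."""
    return any(unicodedata.bidirectional(c) in BIDI_FORMAT_CLASSES for c in url)


def guarded_open(url, open_fn, confirm_fn, is_suspicious=has_bidi_controls) -> bool:
    """Call open_fn(url) unless the URL looks suspicious and the user declines."""
    if is_suspicious(url) and not confirm_fn(f"Suspicious URL {url!r}; open anyway?"):
        return False
    open_fn(url)
    return True


opened = []
guarded_open("http://example.com/", opened.append, lambda prompt: False)
guarded_open("\u202ehttp://example.com/", opened.append, lambda prompt: False)
print(opened)  # ['http://example.com/']
```

Because the check sits at the point where the URL is followed, it
applies no matter which mode rendered the link.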

For rendering, I think some help from the core would be nice for modes
that want it; see below about "markchars" and `prettify-symbols-mode' etc.

On Sat, 29 Nov 2014 00:26:01 +0200 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Fri, 28 Nov 2014 14:49:59 -0500
>> 
>> I'm not sure about the bidi markers, Eli can discuss that side.  I'll
>> try to get the confusables in there and maybe write general code that
>> bidi markers and others can hook into.

EZ> I cannot say I can follow that.  Those "bidi markers" are just
EZ> characters, so how can they hook into something?

Sorry, I meant "code that detects suspicious bidi markers" instead of
"bidi markers." We have the "markchars" package in the GNU ELPA, which
can currently highlight Unicode confusables and others with a special
face (magenta underline by default). For confusables specifically, it
just looks for more than one Unicode script within a word, so it's not
exactly what Lars asked originally. There was an epic discussion about
"markchars" back in 2011: http://comments.gmane.org/gmane.emacs.devel/122200

Anyhow, I was thinking of bringing something like "markchars" into the
core and also making the "uni-confusables" package (which is just a
conversion of the Unicode confusables.txt) available by default as a
char-table. I'm not sure what it will look like, so if anyone can think
of precedents, let me know. I think the `prettify-symbols-mode' approach
is one possibility, and in fact it was just suggested recently that it
should support regexps... any others?
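
For reference, the "more than one Unicode script within a word" test
can be approximated as follows.  This is a Python sketch, not the
markchars code: Python's standard library has no script property, so
the first word of the character name (LATIN, CYRILLIC, ...) is used as
a stand-in, which is an assumption that holds only roughly.

```python
import unicodedata


def scripts_in_word(word: str):
    """Approximate the set of scripts used by the letters of a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Heuristic: "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_words(text: str):
    """Return the whitespace-separated words that mix several scripts."""
    return [w for w in text.split() if len(scripts_in_word(w)) > 1]


# The second letter below is U+0430 CYRILLIC SMALL LETTER A.
print(mixed_script_words("visit p\u0430ypal today") == ["p\u0430ypal"])  # True
```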

EZ> For that matter, what functionality are we talking about?
>> 
>> The uni-confusables package from the GNU ELPA and glue code to let SHR
>> and EWW know that a URL includes such characters.

EZ> Once again, these characters are not confusables.  Their use around
EZ> the URL is.  So highlighting them wherever we see them is not
EZ> necessarily the best way.

OK, understood.  See above about rendering vs. interrupting UI flow.
The latter is what Lars suggested and I agree is more useful.

Ted





* Re: Bidirectional text and URLs
  2014-11-28 14:54   ` Eli Zaretskii
@ 2014-11-29  6:09     ` Stephen J. Turnbull
  2014-11-29  8:22       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Stephen J. Turnbull @ 2014-11-29  6:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

Eli Zaretskii writes:

 > Not really, not in this particular field.
 > 
 > > but I would say that given that the UAX#9 bidi algorithm does what's
 > > wanted 99.44% of the time, it makes sense to mark text reordered by
 > > RTL markers with a warning face
 > 
 > That might be considered an annoyance by users of bidi scripts.
 > There's any number of perfectly valid URLs that use the same
 > formatting control characters.

Why?  Because many displays don't implement UAX#9?  Or is it because
UAX#9 defines segments in a way that would reorder the components of a
domain name or path?  That is, the logical URL

    http://www.example.com/ABC/DEF/

is expected by a bidi reader to appear as

    http://www.example.com/CBA/FED/

but UAX#9 would display it as

    http://www.example.com/FED/CBA/

(the natural direction of lowercase characters is LTR, the natural
direction of uppercase characters is RTL)?  (Or perhaps the reverse
misdisplay.)
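
The reordering in that example can be reproduced with a toy model.
This Python sketch follows the convention above (uppercase stands for
RTL letters, lowercase for LTR, everything else is neutral) and assumes
an LTR paragraph; it is a drastic simplification of UAX#9 (no digits,
no mirroring, no explicit controls), and `toy_bidi_display` is a
made-up name.

```python
def toy_bidi_display(logical: str) -> str:
    """Toy reordering: uppercase = RTL, lowercase = LTR, the rest neutral."""
    def strong(ch):
        if ch.isupper():
            return "R"
        if ch.islower():
            return "L"
        return None  # neutral

    n = len(logical)
    dirs = [strong(c) for c in logical]
    # Neutrals become RTL only when the strong characters on both sides
    # are RTL; otherwise they follow the LTR paragraph direction.
    i = 0
    while i < n:
        if dirs[i] is None:
            j = i
            while j < n and dirs[j] is None:
                j += 1
            before = dirs[i - 1] if i > 0 else "L"
            after = dirs[j] if j < n else "L"
            fill = "R" if before == after == "R" else "L"
            for k in range(i, j):
                dirs[k] = fill
            i = j
        else:
            i += 1
    # Reverse each maximal RTL run for display.
    out, i = [], 0
    while i < n:
        j = i
        while j < n and dirs[j] == dirs[i]:
            j += 1
        run = logical[i:j]
        out.append(run[::-1] if dirs[i] == "R" else run)
        i = j
    return "".join(out)


print(toy_bidi_display("http://www.example.com/ABC/DEF/"))
# http://www.example.com/FED/CBA/
```

The slash between ABC and DEF sits between two RTL runs, so it joins
them and the whole "ABC/DEF" stretch is reversed as one unit.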

Whatever the reason, I'd have to say that's too bad for users of bidi
languages, because that means *any* bidi URL is ambiguous, and
therefore subject to being deliberately obfuscated by reflection
and/or jumbling, regardless of the presence of directional controls.

 > What you suggest might be TRT when left-to-right text is enclosed
 > within directional override controls (which is what Lars did in his
 > example).  These controls assign right-to-left directionality to all
 > the enclosed characters, which is indeed highly suspicious in URLs.

This isn't hard to detect.  But there is also the case where you have
a word which is a different word when reflected.  I assume that this
is the case in bidi languages as well, and of course any jumble is
possible as a domain or path component which is an abbreviation.  And
any useful jumble can probably be registered as a domain, and
certainly incorporated in a path.

 > In addition to using a special face, another possibility is to present
 > the directional overrides in these cases in percent-hex notation,
 > which will disable their effect on the enclosed text.  Of course, this
 > should be only done when the enclosed text is entirely made of LTR
 > characters and neutrals.

Well, no.  I assume that bidi readers are as vulnerable to phishing
and other frauds as non-bidi readers (hard as that may be to believe
for you bidi readers).  That is not yet clear.

 > > You do need a way to turn it off, or to make it reasonably smart, in
 > > the case of ASCII which is often mixed with other charsets.
 > 
 > Not sure what you mean here.

As above, where the domain name is ASCII and the path is RTL.  Or the
path (or the domain) might be mixed.

 > "Turn off" how?

"We need to decide what we want to do, and then look for a mechanism."

 > And how do you do that without unduly punishing perfectly valid
 > URLs that need these controls to avoid visual "jumbles"?

I hate to tell you, but the phishers have *already* started punishing
those perfectly valid URLs.  You have a choice of punishment, that's
all: "jumbled display" vs. "defrauded users".

Except that as I say above, apparently all bidi URLs must now be
considered to offer suspicious display under some circumstances, so
maybe you have no choice about the defrauded users.  In that case I
suppose avoiding jumbles does take precedence.






* Re: Bidirectional text and URLs
  2014-11-28 11:19 ` Ted Zlatanov
                     ` (2 preceding siblings ...)
  2014-11-28 14:57   ` Eli Zaretskii
@ 2014-11-29  6:17   ` Stephen J. Turnbull
  3 siblings, 0 replies; 133+ messages in thread
From: Stephen J. Turnbull @ 2014-11-29  6:17 UTC (permalink / raw)
  To: emacs-devel

Ted Zlatanov writes:

 > I could also add RTL markers and other useful things to uni-confusables

If you do, change the name of the package or at least use a different
library name.  "Confusable" is a technical term in Unicode, and people
familiar with Unicode would not expect directionality related features
to be in the uni-confusables library.

 > suspicious

Eureka!  How about the "uni-suspicious" package, with uni-confusables
and uni-directional libraries?

 > I'm not sure of the right thing because this feels a little bit
 > like core functionality.

+1  Any text (including web documents and programs) might contain a
URL or other "problematic if copied and pasted" phrase.  I've also
seen many students copy math symbols and the like from different
blocks, so "confusables" might be useful in lexing such documents.

Steve






* Re: Bidirectional text and URLs
  2014-11-29  6:09     ` Stephen J. Turnbull
@ 2014-11-29  8:22       ` Eli Zaretskii
  2014-11-29 17:05         ` Richard Stallman
                           ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29  8:22 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: larsi, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: larsi@gnus.org,
>     emacs-devel@gnu.org
> Date: Sat, 29 Nov 2014 15:09:02 +0900
> 
>  > > but I would say that given that the UAX#9 bidi algorithm does what's
>  > > wanted 99.44% of the time, it makes sense to mark text reordered by
>  > > RTL markers with a warning face
>  > 
>  > That might be considered an annoyance by users of bidi scripts.
>  > There's any number of perfectly valid URLs that use the same
>  > formatting control characters.
> 
> Why?  Because many displays don't implement UAX#9?  Or is it because
> UAX#9 defines segments in a way that would reorder the components of a
> domain name or path?  That is, the logical URL
> 
>     http://www.example.com/ABC/DEF/
> 
> is expected by a bidi reader to appear as
> 
>     http://www.example.com/CBA/FED/
> 
> but UAX#9 would display it as
> 
>     http://www.example.com/FED/CBA/

Yes.  And there are worse examples (e.g., try an HTML link which
includes both a URL and a link text).

The problem here is that all those /, :, <, and > characters are
neutrals, so they take the direction of surrounding text, i.e. are
reversed for display when the surrounding text is RTL.  In addition, <
and > are mirrored in that case.  That can make quite a jumble.
(Unicode 6.3 added special handling for "paired-bracket" characters,
which makes the situation with < and > somewhat better, but we only
support that on master, Emacs 24.4 doesn't.)
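
To see which characters are affected, one can look up the bidi class of
each URL delimiter in the Unicode Character Database.  The following is
only a minimal Python sketch using the standard unicodedata module; the
helper name `describe' is made up for illustration.  Strictly speaking,
UAX#9 puts the slash, colon and period in the weak class CS and < and >
in the neutral class ON, but as far as I can tell a CS that does not sit
between digits ends up being treated like a neutral anyway.

```python
import unicodedata

def describe(ch):
    """Return (bidi class, mirrored?) for one character, per the UCD."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))

# Print the properties of the usual URL delimiters.
for ch in "/:.<>":
    bidi_class, mirrored = describe(ch)
    print(f"U+{ord(ch):04X} {ch!r}: class={bidi_class} mirrored={mirrored}")
```

None of these delimiters has a strong direction of its own, which is why
a run of RTL letters on either side can pull them into the reordering.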

> Whatever the reason, I'd have to say that's too bad for users of bidi
> languages, because that means *any* bidi URL is ambiguous, and
> therefore subject to being deliberately obfuscated by reflection
> and/or jumbling, regardless of the presence of directional controls.

I agree, but the issue discussed here is different: it's AFAIU about
users of LTR scripts that can fall victim to use of directional
controls that are by default (almost) invisible on Emacs display.  I
think we would like to have at least that situation "handled" in some
way.  My point above was that the way we handle that should not unduly
punish users of bidi scripts, i.e. legitimate uses of these controls.

>  > What you suggest might be TRT when left-to-right text is enclosed
>  > within directional override controls (which is what Lars did in his
>  > example).  These controls assign right-to-left directionality to all
>  > the enclosed characters, which is indeed highly suspicious in URLs.
> 
> This isn't hard to detect.  But there is also the case where you have
> a word which is a different word when reflected.

If we have a dictionary, we can detect that, too.  If we don't, then
detecting only the enclosed-LTR case is better than nothing, I think.

Another possibility is to modify the way these control characters are
displayed by manipulating their entries in the glyphless-char-display
char-table.  It should probably be enough to display them as hex-code
in a box, to make the user aware of the possible problem.  This should
be done by applications that display URLs, like eww, Gnus, Rmail,
etc.; not globally.
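
Outside Emacs, the same idea can be sketched in a few lines of Python.
This is an analogy only, not how glyphless-char-display works: the
char-table changes the display and leaves the buffer text alone, whereas
this hypothetical helper `show_bidi_controls' rewrites the string.  The
set of characters is my assumption of what "these control characters"
should cover.

```python
import re

# Explicit directional formatting characters: the marks LRM/RLM, the
# embeddings and overrides LRE, RLE, PDF, LRO, RLO, and the isolates
# LRI, RLI, FSI, PDI.
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")

def show_bidi_controls(text):
    """Replace each bidi control with a visible tag such as [U+202E]."""
    return BIDI_CONTROLS.sub(lambda m: f"[U+{ord(m.group()):04X}]", text)
```

For instance, show_bidi_controls("\u202ehttp://example.com") yields
"[U+202E]http://example.com", which is hard to overlook next to a URL.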

> I assume that this is the case in bidi languages as well

Yes, but that would require RTL text embedded in a left-to-right
overriding embedding, which is easily detectable, like the opposite
case that started this thread.

> and of course any jumble is possible as a domain or path component
> which is an abbreviation.  And any useful jumble can probably be
> registered as a domain, and certainly incorporated in a path.

I doubt that a domain like this could be registered, as using such
characters in a domain name is AFAIU against the regulations, see
RFC3987.

>  > In addition to using a special face, another possibility is to present
>  > the directional overrides in these cases in percent-hex notation,
>  > which will disable their effect on the enclosed text.  Of course, this
>  > should be only done when the enclosed text is entirely made of LTR
>  > characters and neutrals.
> 
> Well, no.  I assume that bidi readers are as vulnerable to phishing
> and other frauds as non-bidi readers (hard as that may be to believe
> for you bidi readers).  That is not yet clear.

The easy cases with RTL text, as mentioned above, should be also
easily detectable, and I agree they should get the same treatment.

>  > > You do need a way to turn it off, or to make it reasonably smart, in
>  > > the case of ASCII which is often mixed with other charsets.
>  > 
>  > Not sure what you mean here.
> 
> As above, where the domain name is ASCII and the path is RTL.  Or the
> path (or the domain) might be mixed.
> 
>  > "Turn off" how?
> 
> "We need to decide what we want to do, and then look for a mechanism."

OK, let me rephrase: what effect will "turning off" have on display?

>  > And how do you do that without unduly punishing perfectly valid
>  > URLs that need these controls to avoid visual "jumbles"?
> 
> I hate to tell you, but the phishers have *already* started punishing
> those perfectly valid URLs.  You have a choice of punishment, that's
> all: "jumbled display" vs. "defrauded users".

I very much hope we will find a sane middle ground, possibly subject
to user control.  I'd hate to see Emacs become another case of the TSA
disaster.

> Except that as I say above, apparently all bidi URLs must now be
> considered to offer suspicious display under some circumstances, so
> maybe you have no choice about the defrauded users.  In that case I
> suppose avoiding jumbles does take precedence.

Once we decide which cases we want to avoid or flag, we could be smart
there, by comparing the original and reordered strings, perhaps aided
by some dictionary lookup.  The infrastructure is either already there
or easy to add.  It's "just" a matter of deciding what to do and when.

Someone(TM) should present a list of well-thought-out requirements, and
we can take it from there.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-28 18:28   ` Eli Zaretskii
@ 2014-11-29 17:03     ` Richard Stallman
  2014-11-29 17:06       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-29 17:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > There is no legitimate need for such URLs to "work."

  > Yes, there is.  Some bidirectional texts can be hard to read without
  > these control characters.

We seem to be talking about different questions.
You're talking about "some...text" but the question was specifically URLs.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-28 19:28   ` Andreas Schwab
@ 2014-11-29 17:04     ` Richard Stallman
  2014-11-29 17:11       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-29 17:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The RTL flag character in the example isn't part of the URL, it only
  > precedes it.

(I couldn't see it in any case.)

This suggests we need to provide a primitive to tell Lisp programs a
guaranteed answer for which direction the text at a certain point is
displayed in.  Also, a primitive to verify that a certain region of
text has no bidi strangeness within it.  It could return the position
of the first bidi strangeness in the region, or nil.

On issues like this, better safe than sorry.  The user who wants to
override the safety measure can easily do that.  For instance,
inserting line breaks around the URL would make it be considered safe,
right?

I think the precaution I suggested about bidi flags inside the URL is
needed also.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29  8:22       ` Eli Zaretskii
@ 2014-11-29 17:05         ` Richard Stallman
  2014-11-29 17:13           ` Lars Magne Ingebrigtsen
  2014-11-29 17:14         ` Ted Zlatanov
  2014-11-30 13:42         ` Stephen J. Turnbull
  2 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-29 17:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen, larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Whatever the reason, I'd have to say that's too bad for users of bidi
  > > languages, because that means *any* bidi URL is ambiguous, and
  > > therefore subject to being deliberately obfuscated by reflection
  > > and/or jumbling, regardless of the presence of directional controls.

  > I agree, but the issue discussed here is different: it's AFAIU about
  > users of LTR scripts that can fall victim to use of directional
  > controls that are by default (almost) invisible on Emacs display.

We need to address both issues --- with two different solutions, if
necessary.

I have a feeling that the problem that LTR URLs get reordered
strangely must have presented itself in other software, such as
browsers.  What do they do about it?

If the host NAME isn't confused, perhaps it is not really dangerous.
So perhaps it is enough to make sure to avoid confusion about the host
name.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:03     ` Richard Stallman
@ 2014-11-29 17:06       ` Eli Zaretskii
  2014-11-30  9:37         ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29 17:06 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sat, 29 Nov 2014 12:03:43 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > There is no legitimate need for such URLs to "work."
> 
>   > Yes, there is.  Some bidirectional texts can be hard to read without
>   > these control characters.
> 
> We seem to be talking about different questions.
> You're talking about "some...text" but the question was specifically URLs.

URLs are a special case of human-readable text.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:04     ` Richard Stallman
@ 2014-11-29 17:11       ` Eli Zaretskii
  2014-11-30  9:38         ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29 17:11 UTC (permalink / raw)
  To: rms; +Cc: larsi, schwab, emacs-devel

> Date: Sat, 29 Nov 2014 12:04:17 -0500
> From: Richard Stallman <rms@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > The RTL flag character in the example isn't part of the URL, it only
>   > precedes it.
> 
> (I couldn't see it in any case.)

It's displayed as a very thin space.

> This suggests we need to provide a primitive to tell Lisp programs a
> guaranteed answer for which direction the text at a certain point is
> displayed in.

The directionality of the text is determined by the display engine,
and by design is not subject to control by Lisp programs, with two
notable exceptions (neither of which is relevant to the issue at hand):

 . Lisp programs can disable bidi reordering in a buffer

 . Lisp programs can define the base paragraph direction

> Also, a primitive to verify that a certain region of text has no
> bidi strangeness within it.

We need to have a good instrumental definition of "bidi strangeness"
for that.  The simple job of determining whether the region of text
includes RTL characters or bidi formatting controls is already
possible by using suitable regular expressions, of course.
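
As an illustration of that simple job (in Python rather than Emacs
regexps, and only a sketch): the function below returns the position of
the first bidi formatting control or RTL-block character in a region, or
None.  The name `first_bidi_suspect' and the block ranges are my own
assumptions; the ranges are approximate and also cover digits and
combining marks inside those blocks.

```python
import re

SUSPECT = re.compile(
    "[\u200e\u200f\u202a-\u202e\u2066-\u2069"   # bidi formatting controls
    "\u0590-\u08ff\ufb1d-\ufdff\ufe70-\ufefc]"  # main RTL script blocks (approximate)
)

def first_bidi_suspect(text, start=0, end=None):
    """Return the index of the first suspect character in text[start:end], or None."""
    end = len(text) if end is None else end
    m = SUSPECT.search(text, start, end)
    return m.start() if m else None
```

This says nothing about whether the use is legitimate; that still needs
the definition of "strangeness" asked for above.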

> On issues like this, better safe than sorry.  The user who wants to
> override the safety measure can easily do that.  For instance,
> inserting line breaks around the URL would make it be considered safe,
> right?

No.  In fact, it won't change at all the (jumbled) display of the
example presented by Lars.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:05         ` Richard Stallman
@ 2014-11-29 17:13           ` Lars Magne Ingebrigtsen
  2014-11-29 17:49             ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-29 17:13 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Eli Zaretskii, stephen, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> I have a feeling that the problem that LTR URLs get reordered
> strangely must have presented itself in other software, such as
> browsers.  What do they do about it?

Most browsers do nothing about it -- Firefox, for instance, will just
display the reordered URL, and clicking it will take you to unexpected
places.

While this problem has existed for years, it seems like it's only been
getting attention lately, and perhaps the other browser maintainers are
also scratching their heads about what the right approach to take here
is...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29  8:22       ` Eli Zaretskii
  2014-11-29 17:05         ` Richard Stallman
@ 2014-11-29 17:14         ` Ted Zlatanov
  2014-11-30 13:42         ` Stephen J. Turnbull
  2 siblings, 0 replies; 133+ messages in thread
From: Ted Zlatanov @ 2014-11-29 17:14 UTC (permalink / raw)
  To: emacs-devel

On Sat, 29 Nov 2014 10:22:45 +0200 Eli Zaretskii <eliz@gnu.org> wrote: 

EZ> Once we decide which cases we want to avoid or flag, we could be smart
EZ> there, by comparing the original and reordered strings, perhaps aided
EZ> by some dictionary lookup.  The infrastructure is either already there
EZ> or easy to add.  It's "just" a matter of deciding what to do and when.

EZ> Someone(TM) should present a list of well-thought-out requirements,
EZ> and we can take it from there.

Well, here are the pieces I think will be useful for SHR and EWW. I
don't claim they are well thought out :)

Items 1-3 could be used through font-lock and just set some special text
properties in the buffer in text modes that request it (so this will be
an optional piece that is always available). Then themes and packages
can add special highlighting or handling for those properties.

1) bring uni-confusables in the core. In regular expressions, support
either a new syntax char class \s~ to mean "confusable" or a new
character class [:confusable:] (or some other way to easily search for
such characters, especially if they are used outside of their
native script).  Possible text property: 'uni-confusable

2) in regular expressions, support a new character class [:unicodemeta:]
for any characters that have meta meaning in Unicode and no printable
representation, from bidi markers to composition. I'm not sure if that's
already possible. That will allow packages to detect these characters in
places where they are not expected, e.g. inside URL buttons. Possible
text property: 'uni-meta

3) make it easy in the core to scan the buffer for places where scripts
are mixed in a single sentence, string, word, symbol, or other syntactic
unit. markchars.el does that but only inside words. Possible text
property: 'uni-mixedscripts

4) modify `browse-url' to intercept suspicious URLs where any of the
above happened in the source buffer. I think the calling package will
have to help set the context. I don't know if it can be automated...
maybe the function could look for those special text properties around
point in the buffer where it was invoked?

5) modify SHR/EWW to highlight these text properties and interrupt the
user when the text or content of the URL button has them.
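
To make item 3 a bit more concrete, here is a rough Python sketch of
word-level mixed-script detection.  It is not how markchars.el works;
the function names are invented, and the "script" is only guessed from
the first word of each character's Unicode name, which is a heuristic
and not the real Script property.

```python
import re
import unicodedata

def scripts_in(word):
    """Approximate the set of scripts in WORD from Unicode character names."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixed_script_words(text):
    """Return the words in TEXT whose letters come from more than one script."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]
```

A word such as "p\u0430ypal" (with a Cyrillic "a") is reported, while
plain Latin words are not.  Languages that legitimately mix scripts
within a word would need an exception list.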

Does that seem useful?

Ted




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:13           ` Lars Magne Ingebrigtsen
@ 2014-11-29 17:49             ` Lars Magne Ingebrigtsen
  2014-11-29 17:54               ` Lars Magne Ingebrigtsen
                                 ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-29 17:49 UTC (permalink / raw)
  To: emacs-devel

Phishing using this method is a problem mainly on the web and in mail,
so I wonder whether the solution we're looking for would be applied to
mail and web modes instead of having a more general mechanism.

It seems pretty clear that stuff like

    ‮http://myspace.com/#/segami/moc.koobecaf//:sptth

where you have a buffer with only left-to-right text, but then you have
a single right-to-left indicator, is suspicious.  And since Latin
characters are strongly left-to-right, you don't get confusing URLs in
the middle of right-to-left text:

הממשלה בכך שהוא http://myspace.com/#/segami/moc.koobecaf//:sptth "משתף פעולה עם 

(I hope that's nothing rude, I just cut'n'pasted text at random from a
Hebrew web page.)

So...  would a possible solution here be as simple as removing all
right-to-left indicators in mail and web modes if those right-to-left
indicators apply to URLs?  That is, after the modes mark the regions
they think are URLs, they would check whether any RTL characters apply
to those regions?

But currently Emacs doesn't really have a mechanism for querying the
directionality of a buffer region, I think?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:49             ` Lars Magne Ingebrigtsen
@ 2014-11-29 17:54               ` Lars Magne Ingebrigtsen
  2014-11-29 18:24                 ` Eli Zaretskii
  2014-11-29 18:18               ` Eli Zaretskii
  2014-11-30  9:38               ` Richard Stallman
  2 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-29 17:54 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> So...  would a possible solution here be as simple as removing all
> right-to-left indicators in mail and web modes if those right-to-left
> indicators apply to URLs?

Or even simpler: The URL-finding functions would explicitly place
left-to-right markers over the bits of the URL that have left-to-right
characters if there are any RTL markers in the buffer.

This would make all the bits that say "http://example.com" etc be
left-to-right, and if there are bits in the URL later that contain,
say, Hebrew, those would still be displayed correctly.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:49             ` Lars Magne Ingebrigtsen
  2014-11-29 17:54               ` Lars Magne Ingebrigtsen
@ 2014-11-29 18:18               ` Eli Zaretskii
  2014-11-29 18:33                 ` Lars Magne Ingebrigtsen
  2014-11-30 16:26                 ` Lars Magne Ingebrigtsen
  2014-11-30  9:38               ` Richard Stallman
  2 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29 18:18 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Sat, 29 Nov 2014 18:49:21 +0100
> 
> It seems pretty clear that stuff like
> 
>     ‮http://myspace.com/#/segami/moc.koobecaf//:sptth
> 
> where you have a buffer with only left-to-right text, but then you have
> a single right-to-left indicator, is suspicious.

The "single right-to-left indicator" is a fallacy: the correct use of
these formatting controls calls for a U+202E RIGHT-TO-LEFT OVERRIDE
(RLO) character before the text and a U+202C POP DIRECTIONAL
FORMATTING (PDF) character after the text.  Your example only works
because the UBA mandates that all embeddings end at the end of a
physical line, so omitting a PDF here doesn't affect the display,
since the URL stands out on its own line.

So you could actually see a URL enclosed in the RLO..PDF pair as well,
and we need to handle that in the same manner.
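
A minimal Python sketch of detecting both forms follows: an RLO closed
by a PDF, and an RLO left open until the end of the line.  It flags only
spans that contain LTR letters and no strong RTL characters, which is
the suspicious case discussed here.  The function name is hypothetical,
and nested embeddings are deliberately ignored.

```python
import re
import unicodedata

RLO, PDF = "\u202e", "\u202c"
# An RLO, then everything up to a PDF or the end of the line.
OVERRIDE_SPAN = re.compile(f"{RLO}([^{PDF}\n]*)")

def forced_rtl_ltr_spans(text):
    """Return spans forced RTL by an RLO that contain no strong RTL characters."""
    spans = []
    for m in OVERRIDE_SPAN.finditer(text):
        enclosed = m.group(1)
        classes = {unicodedata.bidirectional(ch) for ch in enclosed}
        # LTR letters present, no Hebrew/Arabic-type letters: suspicious.
        if "L" in classes and not classes & {"R", "AL"}:
            spans.append(enclosed)
    return spans
```

For the example URL the single returned span, reversed with [::-1],
reads "https://facebook.com/images/#/moc.ecapsym//:ptth", which is
roughly what the reader sees on display.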

> And since Latin characters are strongly left-to-right, you don't get
> confusing URLs in the middle of right-to-left text:

As Stephen pointed out earlier, the same effect can be achieved with
RTL text by using the LRO..PDF embedding (LRO is U+202D).

> So...  would a possible solution here be as simple as removing all
> right-to-left indicators in mail and web modes if those right-to-left
> indicators apply to URLs?

I think instead of removing them it is better to display them
prominently, e.g., by changing their entry in the
glyphless-char-display char-table.  The advantage is that you don't
accidentally harm the display where these controls are used
legitimately, and OTOH make their presence acutely evident.

> But currently Emacs doesn't really have a mechanism for querying the
> directionality of a buffer region, I think?

What do you mean by "directionality of a buffer region"?  At least
under some definitions of that, I can think of a very easy
implementation.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:54               ` Lars Magne Ingebrigtsen
@ 2014-11-29 18:24                 ` Eli Zaretskii
  2014-11-29 18:29                   ` Lars Magne Ingebrigtsen
  2014-11-30  9:38                   ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29 18:24 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Sat, 29 Nov 2014 18:54:43 +0100
> 
> Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> 
> > So...  would a possible solution here be as simple as removing all
> > right-to-left indicators in mail and web modes if those right-to-left
> > indicators apply to URLs?
> 
> Or even simpler: The URL-finding functions would explicitly place
> left-to-right markers over the bits of the URL that have left-to-right
> characters if there are any RTL markers in the buffer.
> 
> This would make all the bits that say "http://example.com" etc be
> left-to-right, and if there are bits in the URL later that contain,
> say, Hebrew, those would still be displayed correctly.

Please don't: you will never be able to do that correctly without
re-implementing bidi.c in Lisp.  The UBA rules are much more complex
than what you seem to envision; in particular, a character can be
neither RTL nor LTR (so called "weak" and "neutral" characters, like
the slash and the period).

In any case, I think what you suggest is too drastic.  We don't need
to change the display of these URLs from their intended one, we just
need to make the user aware of the possible phishing.  E.g., with your
suggestion, a Web page that explains how the URL you posted at the
beginning could be dangerous won't be able to make its point clearly
visible ;-)



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 18:24                 ` Eli Zaretskii
@ 2014-11-29 18:29                   ` Lars Magne Ingebrigtsen
  2014-11-30  9:38                   ` Richard Stallman
  1 sibling, 0 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-29 18:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> In any case, I think what you suggest is too drastic.  We don't need
> to change the display of these URLs from their intended one, we just
> need to make the user aware of the possible phishing.  E.g., with your
> suggestion, a Web page that explains how the URL you posted at the
> beginning could be dangerous won't be able to make its point clearly
> visible ;-)

Yeah, that's true.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 18:18               ` Eli Zaretskii
@ 2014-11-29 18:33                 ` Lars Magne Ingebrigtsen
  2014-11-29 18:47                   ` Eli Zaretskii
  2014-11-30 16:26                 ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-29 18:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I think instead of removing them it is better to display them
> prominently, e.g., by changing their entry in the
> glyphless-char-display char-table.  The advantage is that you don't
> accidentally harm the display where these controls are used
> legitimately, and OTOH make their presence acutely evident.

Yeah, isn't that a bit too intrusive if done generally?  If we display
these markers very visibly, then buffers where they are legitimately
used would be kinda ugly.  And I don't think users would necessarily
know that the URL is displayed the wrong way around just because there's
an ugly control character displayed before or after the URL...

>> But currently Emacs doesn't really have a mechanism for querying the
>> directionality of a buffer region, I think?
>
> What do you mean by "directionality of a buffer region"?  At least
> under some definitions of that, I can think of a very easy
> implementation.

When hitting RET on an URL, the function that handles that could ask
Emacs "is the http://domain.com bit displayed RTL or LTR"?  If it's RTL,
then that function could "are you sure?" the user.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 18:33                 ` Lars Magne Ingebrigtsen
@ 2014-11-29 18:47                   ` Eli Zaretskii
  2014-11-29 19:12                     ` Andreas Schwab
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29 18:47 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 29 Nov 2014 19:33:47 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I think instead of removing them it is better to display them
> > prominently, e.g., by changing their entry in the
> > glyphless-char-display char-table.  The advantage is that you don't
> > accidentally harm the display where these controls are used
> > legitimately, and OTOH make their presence acutely evident.
> 
> Yeah, isn't that a bit too intrusive if done generally?

I didn't suggest to do that generally, just in Web pages.  These
format controls are discouraged in Web pages anyway; the use of HTML
bidi markup dir="rtl" etc. is advised instead.

> If we display these markers very visibly, then buffers where they
> are legitimately used would be kinda ugly.

I don't know why it would be "ugly".  The text will still be displayed
correctly, so it will be as legible as with our current bidi display.

> And I don't think users would necessarily know that the URL is
> displayed the wrong way around just because there's an ugly control
> character displayed before or after the URL...

I think the existence of a strange unprintable character in or around
a URL should attract attention, which is all we need to accomplish.

> >> But currently Emacs doesn't really have a mechanism for querying the
> >> directionality of a buffer region, I think?
> >
> > What do you mean by "directionality of a buffer region"?  At least
> > under some definitions of that, I can think of a very easy
> > implementation.
> 
> When hitting RET on an URL, the function that handles that could ask
> Emacs "is the http://domain.com bit displayed RTL or LTR"?  If it's RTL,
> then that function could "are you sure?" the user.

You just replaced one not well-defined term with another.  So now my
question becomes what do you mean by "displayed RTL or LTR"?  And mind
you: the "domain" part can legitimately consist of RTL characters, if
my reading of the respective RFCs is correct.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 18:47                   ` Eli Zaretskii
@ 2014-11-29 19:12                     ` Andreas Schwab
  2014-11-29 19:31                       ` Lars Magne Ingebrigtsen
  2014-11-29 20:13                       ` Eli Zaretskii
  0 siblings, 2 replies; 133+ messages in thread
From: Andreas Schwab @ 2014-11-29 19:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Magne Ingebrigtsen, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
>> Cc: emacs-devel@gnu.org
>> Date: Sat, 29 Nov 2014 19:33:47 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > I think instead of removing them it is better to display them
>> > prominently, e.g., by changing their entry in the
>> > glyphless-char-display char-table.  The advantage is that you don't
>> > accidentally harm the display where these controls are used
>> > legitimately, and OTOH make their presence acutely evident.
>> 
>> Yeah, isn't that a bit too intrusive if done generally?
>
> I didn't suggest to do that generally, just in Web pages.  These
> format controls are discouraged in Web pages anyway; the use of HTML
> bidi markup dir="rtl" etc. is advised instead.

But the problem at hand is not relevant to Web pages.  The URL in an
anchor is always a separate entity.  Only non-HTML text where URLs are
made active by heuristics is the case to worry about.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 19:12                     ` Andreas Schwab
@ 2014-11-29 19:31                       ` Lars Magne Ingebrigtsen
  2014-11-29 19:39                         ` Andreas Schwab
  2014-11-29 20:13                       ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-29 19:31 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eli Zaretskii, emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> But the problem at hand is not relevant to Web pages.  The URL in an
> anchor is always a separate entity.  Only non-HTML text where URLs are
> made active by heuristics is the case to worry about.

It's sort of relevant to web pages, too:

`M-x eww RET http://permalink.gmane.org/gmane.emacs.devel/178392 RET'

Of course, the <a> text could just contain "http://facebook.com" without
any RTL, like

<a href="http://myspace.com">http://facebook.com</a>

shr should warn when the <a> text is also an URL and when it's
different from the href.  But in this case, the <a> text and the href
are identical, so that check wouldn't do anything helpful, and the user
would end up on the dreaded Myspace...
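
The mismatch check itself could look roughly like the Python sketch
below.  This is not shr's code; the function name is invented, and
comparing only host names is my assumption about what matters most.

```python
from urllib.parse import urlsplit

def link_text_mismatch(href, link_text):
    """True if LINK_TEXT looks like a URL naming a different host than HREF."""
    shown = urlsplit(link_text.strip())
    if shown.scheme not in ("http", "https") or not shown.hostname:
        return False  # link text is not a URL; nothing to compare
    return shown.hostname != urlsplit(href).hostname
```

It catches the plain <a href="http://myspace.com">http://facebook.com</a>
case, but when the text and the href are the same string it has nothing
to compare, so the override trick goes through unnoticed.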

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 19:31                       ` Lars Magne Ingebrigtsen
@ 2014-11-29 19:39                         ` Andreas Schwab
  0 siblings, 0 replies; 133+ messages in thread
From: Andreas Schwab @ 2014-11-29 19:39 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> shr should warn when the <a> text is also an URL and when it's
> different from the href.  But in this case, the <a> text and the href
> are identical, so that check wouldn't do anything helpful, and the user
> would end up on the dreaded Myspace...

You can always show the URL unambiguously before following it.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 19:12                     ` Andreas Schwab
  2014-11-29 19:31                       ` Lars Magne Ingebrigtsen
@ 2014-11-29 20:13                       ` Eli Zaretskii
  1 sibling, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-29 20:13 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: larsi, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Lars Magne Ingebrigtsen <larsi@gnus.org>,  emacs-devel@gnu.org
> Date: Sat, 29 Nov 2014 20:12:24 +0100
> 
> >> Yeah, isn't that a bit too intrusive if done generally?
> >
> > I didn't suggest to do that generally, just in Web pages.  These
> > format controls are discouraged in Web pages anyway; the use of HTML
> > bidi markup dir="rtl" etc. is advised instead.
> 
> But the problem at hand is not relevant to Web pages.  The URL in an
> anchor is always a separate entity.  Only non-HTML text where URLs are
> made active by heuristics are the case to worry about.

Then I guess it's even easier.  Of course, we still have ffap and the
likes, which do their thing even in general-purpose text.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:06       ` Eli Zaretskii
@ 2014-11-30  9:37         ` Richard Stallman
  2014-11-30 15:16           ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-30  9:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > We seem to be talking about different questions.
  > > You're talking about "some...text" but the question was specifically URLs.

  > URLs are a special case of human-readable text.

Yes, but that's not the point.  The point is that your special cases

    The places where bidi characters should work are human-readable text.

don't overlap with URLs.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:11       ` Eli Zaretskii
@ 2014-11-30  9:38         ` Richard Stallman
  2014-11-30 15:20           ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-30  9:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, schwab, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > This suggests we need to provide a primitive to tell Lisp programs a
  > > guaranteed answer for which direction the text at a certain point is
  > > displayed in.

  > The directionality of the text is determined by the display engine,
  > and by design is not subject to control by Lisp programs,

I think we are talking about different issues.  You're talking about
whether Lisp programs control the directionality.  I'm talking about
providing a way for them to inquire what display will do.

  > > Also, a primitive to verify that a certain region of text has no
  > > bidi strangeness within it.

  > We need to have a good instrumental definition of "bidi strangeness"
  > for that.

I suggest the definition: whatever would cause the displayed order of
characters to be perhaps misleading if the text is interpreted as a
URL or anything else with programmatic significance.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 17:49             ` Lars Magne Ingebrigtsen
  2014-11-29 17:54               ` Lars Magne Ingebrigtsen
  2014-11-29 18:18               ` Eli Zaretskii
@ 2014-11-30  9:38               ` Richard Stallman
  2014-11-30 15:27                 ` Eli Zaretskii
  2 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-30  9:38 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > It seems pretty clear that stuff like

  >      http://myspace.com/#/segami/moc.koobecaf//:sptth

This is the first time I've observed RTL display in Emacs.  I don't see
any way to detect the magic character that specifies it.  (I am using
a terminal as usual.)

I think we need to provide a way to make them visible.
Perhaps it should even be the default.

Also, is there a way to disable bidi in the current buffer?
If not, I think we need one.

  > But currently Emacs doesn't really have a mechanism for querying the
  > directionality of a buffer region, I think?

I think we need to add this.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 18:24                 ` Eli Zaretskii
  2014-11-29 18:29                   ` Lars Magne Ingebrigtsen
@ 2014-11-30  9:38                   ` Richard Stallman
  2014-11-30 15:21                     ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-11-30  9:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Please don't: you will never be able to do that correctly without
  > re-implementing bidi.c in Lisp.

Rather than re-implementing bidi.c in Lisp, I suggest we provide
primitives to make all the relevant inquiries from Lisp code
through the same code in bidi.c.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29  8:22       ` Eli Zaretskii
  2014-11-29 17:05         ` Richard Stallman
  2014-11-29 17:14         ` Ted Zlatanov
@ 2014-11-30 13:42         ` Stephen J. Turnbull
  2014-11-30 15:36           ` Eli Zaretskii
  2014-12-01 10:18           ` Richard Stallman
  2 siblings, 2 replies; 133+ messages in thread
From: Stephen J. Turnbull @ 2014-11-30 13:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

Eli Zaretskii writes:

 > I agree, but the issue discussed here is different:

I have to disagree.  The issue is about *any* technology that can be
used to convince the user that one URL is being accessed when in fact
another one is.

Whether one should try to warn the user is a separate question, which
depends on the probabilities of legitimate vs. fraudulent displays,
and the cost of annoyance vs the *avoidable* cost to fraud victims.

Unfortunately, the HCI evidence suggests that few potential victims
listen to warnings (or even understand them), so you're probably right
that it's a bad idea to warn if RTL characters are present.

 > detecting only the enclosed-LTR case is better than nothing, I
 > think.

Agreed.

 > > and of course any jumble is possible as a domain or path component
 > > which is an abbreviation.  And any useful jumble can probably be
 > > registered as a domain, and certainly incorporated in a path.
 > 
 > I doubt that a domain like this could be registered, as using such
 > characters in a domain name is AFAIU against the regulations, see
 > RFC3987.

If you mean the controls, you're probably right, although RFC3987 has
been updated for international domain names.  I suppose those controls
are not permitted, though.

 > The easy cases with RTL text, as mentioned above, should be also
 > easily detectable, and I agree they should get the same treatment.

OK, good enough for me.

 > > "We need to decide what we want to do, and then look for a mechanism."
 > 
 > OK, let me rephrase: what effect will "turning off" have on
 > display?

Whatever the display would be in the absence of an attempt to detect
and warn about instances of possibly fraudulent use of directional
controls.

 > I very much hope we will find a sane middle ground, possibly subject
 > to user control.  I'd hate to see Emacs become another case of the TSA
 > disaster.

The best I've been able to come up with given the unfortunate conflict
between UAX#9 and the "normal" display of URLs as I understand it is a
one-off warning (or use of something like the novice mechanism so the
user can easily "turn it off" as defined above as soon as it becomes
annoying -- I expect your judgment to be that it would *always* be
annoying, just mentioning the possibility for completeness).

 > Someone(TM) should present a list of well-thought requirements, and we
 > can take it from there.

Unfortunately, besides LTR in RTL control, and RTL in LTR control, I
can't help, not being familiar with the expected display.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30  9:37         ` Richard Stallman
@ 2014-11-30 15:16           ` Eli Zaretskii
  2014-12-01 10:18             ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 15:16 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sun, 30 Nov 2014 04:37:42 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > We seem to be talking about different questions.
>   > > You're talking about "some...text" but the question was specifically URLs.
> 
>   > URLs are a special case of human-readable text.
> 
> Yes, but that's not the point.  The point is that your special cases
> 
>     The places where bidi characters should work are human-readable text.
> 
> don't overlap with URLs.

My conclusion is the opposite: This issue happens _precisely_
_because_ humans review the URLs presented to them before they decide
to follow the link to those URLs.

The issue here is that bidirectional display features are being
(ab)used to trick humans into thinking they will follow a link to some
place, while in fact the link leads to a very different place.  This
problem would not have existed without humans reading the URLs, and
without the discrepancy between what those humans perceive visually
and the actual URL as seen by the program which interprets it.  A
program always reads and processes a URL in the logical order of its
characters, i.e. in the strictly increasing order of the character
positions in the string, so a program will never see any strangeness
here.
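
The gap between the logical order and what a reader perceives can be
illustrated outside Emacs.  The Python sketch below is not the UBA: it
only assumes that a leading U+202E makes the rest of the line display
right to left, which is enough for the example that started this
thread.  The helper name is hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

# Logical order: what a program that follows the link actually receives.
logical = RLO + "http://myspace.com/#/segami/moc.koobecaf//:sptth"


def naive_visual(text):
    """Approximate the displayed order of TEXT when it starts with RLO.

    The override itself is (nearly) invisible and the remaining
    characters are laid out right to left.  Deliberate simplification.
    """
    if text.startswith(RLO):
        return text[1:][::-1]
    return text


seen_by_program = logical.lstrip(RLO)
seen_by_human = naive_visual(logical)
```

Here seen_by_program begins with "http://myspace.com/", while
seen_by_human begins with "https://facebook.com/images/".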



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30  9:38         ` Richard Stallman
@ 2014-11-30 15:20           ` Eli Zaretskii
  2014-11-30 23:39             ` chad
  2014-12-01 10:18             ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 15:20 UTC (permalink / raw)
  To: rms; +Cc: larsi, schwab, emacs-devel

> Date: Sun, 30 Nov 2014 04:38:04 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: schwab@linux-m68k.org, larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > This suggests we need to provide a primitive to tell Lisp programs a
>   > > guaranteed answer for which direction the text at a certain point is
>   > > displayed in.
> 
>   > The directionality of the text is determined by the display engine,
>   > and by design is not subject to control by Lisp programs,
> 
> I think we are talking about different issues.  You're talking about
> whether Lisp programs control the directionality.  I'm talking about
> providing a way for them to inquire what display will do.

I apologize for my misunderstanding.

>   > > Also, a primitive to verify that a certain region of text has no
>   > > bidi strangeness within it.
> 
>   > We need to have a good instrumental definition of "bidi strangeness"
>   > for that.
> 
> I suggest the definition: whatever would cause the displayed order of
> characters to be perhaps misleading if the text is interpreted as a
> URL or anything else with programmatic significance.

I'm sorry, but this is not instrumental: it doesn't specify what
"misleading" means.  We need a detailed spec for that.  The underlying
problem here is that many cases of what readers of RTL scripts will
perceive as perfectly valid reordering might appear "misleading" to
people who don't read those scripts.  We should strive to arrive at a
definition that detects unreasonable and suspicious reordering, not
just any reordering.

One possible definition for "misleading" was suggested earlier:
strict left-to-right text which is reordered for display due to
directional control characters.  If this is what we want, I can work
on providing infrastructure for detecting these cases (and perhaps
also similar ones for when similar games are played with URLs that use
RTL characters).

If that is not what we want, then we need to continue discussing the
requirements.
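
As a rough illustration of the definition above (strict left-to-right
text that is reordered by directional controls), here is a Python
sketch using the standard unicodedata module.  It is not the proposed
Emacs infrastructure; the function name and the exact set of controls
checked are assumptions made for the example.

```python
import unicodedata

# Explicit embeddings, overrides, isolates and their terminators:
# LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def suspicious_ltr_reordering(text):
    """Return True if TEXT has directional controls but no strong RTL letters.

    Format characters (category Cf, e.g. RLM) are ignored when looking
    for strong RTL characters, so they cannot mask an override.
    """
    has_control = any(ch in BIDI_CONTROLS for ch in text)
    has_strong_rtl = any(
        unicodedata.bidirectional(ch) in ("R", "AL")
        and unicodedata.category(ch) != "Cf"
        for ch in text
    )
    return has_control and not has_strong_rtl
```

A URL made only of Latin letters and punctuation but preceded by U+202E
is flagged; text that really contains Hebrew or Arabic letters is left
alone, even when it uses directional controls.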



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30  9:38                   ` Richard Stallman
@ 2014-11-30 15:21                     ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 15:21 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sun, 30 Nov 2014 04:38:22 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> Rather than re-implementing bidi.c in Lisp, I suggest we provide
> primitives to make all the relevant inquiries from Lisp code
> through the same code in bidi.c.

I agree, and we already have that for every inquiry of this kind that
surfaced until now.  One example is current-bidi-paragraph-direction.

The issue here is what exactly is the inquiry we are talking about
this time.  I don't yet see what exactly is required.  Maybe someone
else does.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30  9:38               ` Richard Stallman
@ 2014-11-30 15:27                 ` Eli Zaretskii
  2014-12-01 10:17                   ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 15:27 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sun, 30 Nov 2014 04:38:14 -0500
> From: Richard Stallman <rms@gnu.org>
> Cc: emacs-devel@gnu.org
> 
>   > It seems pretty clear that stuff like
> 
>   >      http://myspace.com/#/segami/moc.koobecaf//:sptth
> 
> This is the first time I've observed RTL display in Emacs.  I don't see
> any way to detect the magic character that specifies it.

That's because there isn't one, in the citation you provided.  The
original example was this:

    ‮http://myspace.com/#/segami/moc.koobecaf//:sptth

where there is a U+202E character at the rightmost (visual) edge of
the line.  If you move point with C-f from the beginning of that line,
you should see it jump to the right edge of the line after the leading
whitespace, and then continue to "advance backwards", i.e. to the left.

You can search for this character by typing

  C-s C-x 8 RET 202e RET

After typing this, you should see the offending character highlighted
in some reddish background.

> (I am using a terminal as usual.)

These characters are by default displayed as spaces on a TTY, and as a
very thin (1-pixel) space on GUI frames.

> I think we need to provide a way to make them visible.

We already have it: the glyphless-char-display char-table.

> Perhaps it should even be the default.

I don't think so: these controls should normally be all but invisible.
The Unicode Standard actually recommends to remove them from display,
but when I worked on the bidi display engine, I decided that removing
characters by infrastructure is un-Emacsy, so I left them alone.  Lisp
programs and specialized major modes can make them invisible by using
text properties, if they want.

Making these controls visible by default will uglify the display for
no good reason.  These controls are perfectly valid in email messages
with RTL text, for example.

> Also, is there a way to disable bidi in the current buffer?
> If not, I think we need one.

There is a way, but it is not meant for Lisp programs, only for
debugging the display engine.

In any case, I don't think disabling display reordering is the right
solution for the problem at hand.  It's a cure that is worse than the
disease, since Web pages and email messages with RTL text will be
displayed incorrectly, and be almost illegible.

>   > But currently Emacs doesn't really have a mechanism for querying the
>   > directionality of a buffer region, I think?
> 
> I think we need to add this.

We are still discussing what that means, exactly.  When we reach
conclusions, we can start working on implementing whatever is needed.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 13:42         ` Stephen J. Turnbull
@ 2014-11-30 15:36           ` Eli Zaretskii
  2014-12-01 10:18           ` Richard Stallman
  1 sibling, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 15:36 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: larsi, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: larsi@gnus.org,
>     emacs-devel@gnu.org
> Date: Sun, 30 Nov 2014 22:42:18 +0900
> 
> Eli Zaretskii writes:
> 
>  > I agree, but the issue discussed here is different:
> 
> I have to disagree.  The issue is about *any* technology that can be
> used to convince the user that one URL is being accessed when in fact
> another one is.

Well, I thought "bidirectional" in the subject does mean just that.

> Whether one should try to warn the user is a separate question, which
> depends on the probabilities of legitimate vs. fraudulent displays,
> and the cost of annoyance vs the *avoidable* cost to fraud victims.

I don't think the probability of legitimate vs fraudulent displays is
so low that it justifies the annoyance.

>  > > "We need to decide what we want to do, and then look for a mechanism."
>  > 
>  > OK, let me rephrase: what effect will "turning off" have on
>  > display?
> 
> Whatever the display would be in the absence of an attempt to detect
> and warn about instances of possibly fraudulent use of directional
> controls.

Sorry, couldn't parse this.

>  > Someone(TM) should present a list of well-thought requirements, and we
>  > can take it from there.
> 
> Unfortunately, besides LTR in RTL control, and RTL in LTR control, I
> can't help, not being familiar with the expected display.

Maybe we should simply start with that, and take it from there if
needed.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-29 18:18               ` Eli Zaretskii
  2014-11-29 18:33                 ` Lars Magne Ingebrigtsen
@ 2014-11-30 16:26                 ` Lars Magne Ingebrigtsen
  2014-11-30 17:29                   ` Yuri Khan
  2014-11-30 17:53                   ` Eli Zaretskii
  1 sibling, 2 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 16:26 UTC (permalink / raw)
  To: emacs-devel

Just a point of clarification: When people embed URLs in paragraphs with
mainly right-to-left script (like Hebrew), do they expect to see
http://myspace.com or ‮?http://myspace.com

(If I did that correctly, the latter URL should have an RLO character
preceding it so that it reads right to left.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 16:26                 ` Lars Magne Ingebrigtsen
@ 2014-11-30 17:29                   ` Yuri Khan
  2014-11-30 17:57                     ` Lars Magne Ingebrigtsen
  2014-11-30 17:53                   ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Yuri Khan @ 2014-11-30 17:29 UTC (permalink / raw)
  To: Emacs developers

On Sun, Nov 30, 2014 at 10:26 PM, Lars Magne Ingebrigtsen
<larsi@gnus.org> wrote:

> Just a point of clarification: When people embed URLs in paragraphs with
> mainly right-to-left script (like Hebrew), do they expect to see
> http://myspace.com or ‮?http://myspace.com

As a person who has never spoken or written an RTL language but who
understands the logic behind RTL, I think in an RTL context I might
expect a rendering which is visually identical to that of
com.myspace//:http or maybe com.myspace\\:http.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 16:26                 ` Lars Magne Ingebrigtsen
  2014-11-30 17:29                   ` Yuri Khan
@ 2014-11-30 17:53                   ` Eli Zaretskii
  2014-11-30 18:13                     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 17:53 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 30 Nov 2014 17:26:33 +0100
> 
> Just a point of clarification: When people embed URLs in paragraphs with
> mainly right-to-left script (like Hebrew)

Let's clear up terminology first, OK?

There's no distinction in bidi display and bidi scripts between
"paragraphs with mainly right-to-left scripts" and "paragraphs with
mainly left-to-right scripts".  Instead, there's "the base direction
of a paragraph", which can be either left-to-right (LTR) or
right-to-left (RTL).  The former is displayed with the first character
(in the _visual_ order!) at the left edge of the window, while the
latter at the right edge.

It is true that the LTR paragraphs make most sense when most of the
paragraph text is made of LTR characters, and the RTL paragraphs in
the opposite case.  But nothing prevents me from having a paragraph
whose base direction is LTR which is nevertheless full of RTL
characters.  It is entirely legitimate and sometimes even necessary.

Emacs determines the base direction of a paragraph by searching for
the first strong directional character in the paragraph (this is a
simplification, the actual rules described in the UBA are more
complex).  Buffer-local variable bidi-paragraph-direction overrides
this dynamic calculation and forces a specific base direction on all
paragraphs of the buffer.
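
The simplified "first strong character" rule can be sketched in a few
lines of Python.  This is only the simplification described above, not
what bidi.c or the full UBA does; the function name and the default
argument are hypothetical.

```python
import unicodedata


def base_direction(paragraph, default="L"):
    """Return "L" or "R" according to the first strong character.

    Characters with bidi class L give "L"; classes R and AL give "R".
    If the paragraph has no strong character, DEFAULT is returned.
    """
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "L"
        if cls in ("R", "AL"):
            return "R"
    return default
```

A paragraph that starts with a Latin letter gets base direction "L"
and one that starts with a Hebrew letter gets "R", regardless of what
follows.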

With this out of our way, I will assume that you were asking about
URLs that are part of paragraphs whose base direction is RTL.  Now
let's go back to your question:

> do they expect to see http://myspace.com or ‮?http://myspace.com

The answer to your question is "it depends".  Here are 3 examples; to
see them as I intended, make sure you are viewing them in a buffer
whose bidi-paragraph-direction is set to nil:

abc http://אבג.דהוזחט.קום

אבג http://foo.bar.com

אבג http://אבג.דהוזחט.קום

The leading 3 letters (1 would be enough) cause Emacs to decide that
the paragraph has LTR base direction in the 1st example and RTL base
direction in the last 2 examples.

Now move the cursor with C-f from the beginning of each of these three
lines (you can get to the beginning of a line with C-a or Home, as
usual), and I hope you will see what's going on: cursor movement with
C-f follows the "reading order", i.e. the order in which a human is
supposed to read these URLs.

To summarize: Latin characters are displayed left to right, even in
RTL paragraphs, while right-to-left characters are always displayed
right to left.  Neutral characters (slash, period) take the direction
of the surrounding text.
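
To see which characters of a URL are strong and which are neutral or
weak, one can group them by their Unicode bidi class.  The Python
sketch below only classifies characters; it does not resolve the
direction of the neutral runs, which is the UBA's job.  The function
name and the three labels are made up for this example.

```python
import unicodedata
from itertools import groupby


def bidi_runs(text):
    """Split TEXT into runs of strong-LTR, strong-RTL and other characters."""

    def kind(ch):
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
        return "neutral/weak"

    return [(label, "".join(chars)) for label, chars in groupby(text, key=kind)]
```

For "http://foo.com" this yields the runs "http", "://", "foo", "." and
"com", with the punctuation runs labelled neutral/weak.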

> (If I did that correctly, the latter URL should have an RLO character
> preceding it so that it reads right to left.)

As you see above, there's no need to use any directional overrides to
see what users expect: Emacs does that automatically, by following the
Unicode Bidirectional Algorithm (UBA).  You just need to arrange for
the paragraph to have a RTL base direction, which is very easy, as
shown above.

RLO and LRO (and the other directional control characters) are needed
when you need to override the normal reordering for some reason,
typically because you want punctuation characters to take a different
directionality from its default.  This is rarely needed when rendering
URLs.

HTH

May I ask why you came up with the question?




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 17:29                   ` Yuri Khan
@ 2014-11-30 17:57                     ` Lars Magne Ingebrigtsen
  2014-11-30 18:18                       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 17:57 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Emacs developers

Yuri Khan <yuri.v.khan@gmail.com> writes:

> As a person who has never spoken or written an RTL language but who
> understands the logic behind RTL, I think in an RTL context I might
> expect a rendering which is visually identical to that of
> com.myspace//:http or maybe com.myspace\\:http.

Well, I had a look at a Hebrew mailing list, and I found paragraphs like

המילון שאמור לרכז את המונחים ולתקנן את המינוח העברי במיזמי הקוד הפתוח
הינו מילון כרמ"ל (כרמל איננה רשימת מילים לתרגום), ניתן למצוא את המילון
בכתובת: http://carmel.whatsup.org.il



and


בניתי חבילה לקבצי התרגום לעברית של אופן אופיס לארצ'. הבעייה שאני לא
יודע מה הרשיון שלה.
http://aur.archlinux.org/packages.php?do_Details=1&ID=9791

where all the URLs are displayed left-to-right.  I don't know whether
this is a representative sample, though.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 17:53                   ` Eli Zaretskii
@ 2014-11-30 18:13                     ` Lars Magne Ingebrigtsen
  2014-11-30 19:06                       ` Lars Magne Ingebrigtsen
                                         ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 18:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Let's clear up terminology first, OK?

Thanks for the explanation.

> To summarize: Latin characters are displayed left to right, even in
> RTL paragraphs, while right-to-left characters are always displayed
> right to left.  Neutral characters (slash, period) take the direction
> of the surrounding text.

Right.

> HTH

It does, yes.

> May I ask why you came up with the question?

Because I was wondering whether my suggestion from yesterday (that we
insert LRO/PDF characters into URLs if there is an RLO present in the
buffer when recognising URLs) is at all feasible, and from your
explanation, it seems like it would be.

And it would not require reimplementing bidi.c in Lisp.

I agreed with your objection that if we used such a scheme, then the
discussion we're doing here would look pretty incomprehensible.
However, thinking about it a bit more, this is really favouring
meta-discussion over usage, and I think we should be leery of doing
that.

Here's my proposal again, fleshed out with examples, for the algorithms
that recognise (and make buttons out of) URLs and the like in email
(etc.) buffers:

1) If there are no right-to-left overrides in the buffer, then do
nothing special.  This will cover 99.996% of all buffers.

2) If there is an RLO in the buffer, then each URL is processed further
after it has been recognised.

* If it contains no strongly right-to-left characters, we just wrap it
  in an LRO/PDF pair.  URLs like "http://myspace.com" will then be
  guaranteed to be displayed reading left-to-right.

* If the URL is like http://אבג.דהוזחט.קום, we would segment the URL
  into strongly-left-to-right-with-weak-chars and
  strongly-right-to-left-with-weak-chars segments.  We wrap each
  left-to-right-with-weak-chars in LRO/PDF pairs.

  For that URL, this would be

  LRO http:// PDF אבג.דהוזחט.קום 

Emacs already exposes the weak/strong/LTR/RTL status of each character,
so a function to do this LRO/PDF insertion is trivial.  It's like a
seven-line Elisp function or something.
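
For the simple case only (a URL with no strongly right-to-left
characters), the idea can be sketched in Python as follows.  This is an
illustration, not the Elisp function; the name protect_url and the
decision to leave mixed-direction URLs untouched are assumptions of the
sketch.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def protect_url(url, buffer_text):
    """Wrap URL in LRO/PDF when BUFFER_TEXT contains an RLO.

    URLs with strong RTL characters would need the segmentation step
    and are returned unchanged here.
    """
    if RLO not in buffer_text:
        return url  # step 1: nothing special to do
    if any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in url):
        return url  # mixed-direction URL: not handled by this sketch
    return LRO + url + PDF
```

In a buffer without any RLO the URL comes back unchanged, so the common
case costs one substring search.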

From what you say, it sounds like it would make the display of these URLs
acceptable for bidi readers, too -- this would be the normal display of
these URLs, anyway.  The only thing we're protecting the users from is
shenanigans.

And discussions like this, of course, since all the URLs would display
"correctly".  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 17:57                     ` Lars Magne Ingebrigtsen
@ 2014-11-30 18:18                       ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 18:18 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel, yuri.v.khan

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 30 Nov 2014 18:57:33 +0100
> Cc: Emacs developers <emacs-devel@gnu.org>
> 
> Well, I had a look at a Hebrew mailing list, and I found paragraphs like

> המילון שאמור לרכז את המונחים ולתקנן את המינוח העברי במיזמי הקוד הפתוח
> הינו מילון כרמ"ל (כרמל איננה רשימת מילים לתרגום), ניתן למצוא את המילון
> בכתובת: http://carmel.whatsup.org.il

> and

> בניתי חבילה לקבצי התרגום לעברית של אופן אופיס לארצ'. הבעייה שאני לא
> יודע מה הרשיון שלה.
> http://aur.archlinux.org/packages.php?do_Details=1&ID=9791

> where all the URLs are displayed left-to-right.  I don't know whether
> this is a representative sample, though.

They are.  Since all the characters in the URL are either strong LTR
or weak/neutral characters, the entire URL is displayed left to right,
no matter whether the paragraph's base direction is LTR or RTL.

But if the part after the "?" will include RTL characters, that part
will be rendered right to left.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-11-30 18:13                     ` Lars Magne Ingebrigtsen
@ 2014-11-30 19:06                       ` Lars Magne Ingebrigtsen
  2014-11-30 19:10                         ` Lars Magne Ingebrigtsen
  2014-11-30 19:19                       ` Lars Magne Ingebrigtsen
  2014-11-30 21:05                       ` Eli Zaretskii
  2 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 19:06 UTC (permalink / raw)
  To: emacs-devel

Ok, it was a bit longer than 7 lines, and I'm not quite sure what to do
about further embedded LRM, RLM, ALM, LRE, RLE, LRO, RLO, PDF, LRI, RLI,
FSI, PDI characters (perhaps just drop them?), but here's a kinda weak
proof of concept.

Eval the following, and you sort of get what you'd expect.

(concat (string ?\x202e) "---" (ensure-left-to-right-string "http://אבג.דהוזחט.קום/yes/indeed.קום///"))

?\x202e is the right-to-left override.

Compare with the output you get if you don't hack up the URL and
sprinkle LROs:

(concat (string ?\x202e) "---" "http://אבג.דהוזחט.קום/yes/indeed.קום///")

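(require 'cl-lib)  ; for `cl-incf' and `cl-notany'

(defun ensure-left-to-right-string (string)
  (let ((prev (get-char-code-property (aref string 0) 'bidi-class))
        (start 0)
        (pos 0)
        (bits nil)
        (current nil))  ; bound here so the `setq' below stays local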
    (while (< pos (length string))
      (setq current (get-char-code-property (aref string pos) 'bidi-class))
      (when (or (and (eq prev 'L)
                     (memq current '(R AL)))
                (and (memq prev '(R AL))
                     (eq current 'L)))
        (push (substring string start pos) bits)
        (when (memq current '(L R AL))
          (setq prev current))
        (setq start pos))
      (cl-incf pos))
    (push (substring string start pos) bits)
    (mapconcat
     (lambda (bit)
       (if (cl-notany (lambda (char)
                        (memq (get-char-code-property char 'bidi-class) '(R AL)))
                      bit)
           ;; Wrap the string in LRO and PDF.
           (concat (string ?\x202d) bit (string ?\x202C))
         ;; And RLO and PDF for the right-to-left bits.
         (concat (string ?\x202e) bit (string ?\x202C))))
     (nreverse bits)
     "")))

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* Re: Bidirectional text and URLs
  2014-11-30 19:06                       ` Lars Magne Ingebrigtsen
@ 2014-11-30 19:10                         ` Lars Magne Ingebrigtsen
  2014-11-30 20:41                           ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 19:10 UTC (permalink / raw)
  To: emacs-devel

Bug fix!  Leading neutralish characters would defeat it.

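(require 'cl-lib)  ; for `cl-incf' and `cl-notany'

(defun ensure-left-to-right-string (string)
  (let ((prev (get-char-code-property (aref string 0) 'bidi-class))
        (start 0)
        (pos 0)
        (bits nil)
        (current nil))  ; bound here so the `setq' below stays local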
    (while (< pos (length string))
      (setq current (get-char-code-property (aref string pos) 'bidi-class))
      (when (or (and (eq prev 'L)
		     (memq current '(R AL)))
		(and (memq prev '(R AL))
		     (eq current 'L)))
	(push (substring string start pos) bits)
	(setq start pos))
      (when (memq current '(L R AL))
	(setq prev current))
      (cl-incf pos))
    (push (substring string start pos) bits)
    (mapconcat
     (lambda (bit)
       (if (cl-notany (lambda (char)
			(memq (get-char-code-property char 'bidi-class) '(R AL)))
		      bit)
	   ;; Wrap the string in LRO and PDF.
	   (concat (string ?\x202d) bit (string ?\x202C))
	 ;; And RLO and PDF for the right-to-left bits.
	 (concat (string ?\x202e) bit (string ?\x202C))))
     (nreverse bits)
     "")))

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* Re: Bidirectional text and URLs
  2014-11-30 18:13                     ` Lars Magne Ingebrigtsen
  2014-11-30 19:06                       ` Lars Magne Ingebrigtsen
@ 2014-11-30 19:19                       ` Lars Magne Ingebrigtsen
  2014-11-30 21:05                       ` Eli Zaretskii
  2 siblings, 0 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 19:19 UTC (permalink / raw)
  To: emacs-devel

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> 1) If there are no right-to-left overrides in the buffer, then do
> nothing special.  This will cover 99.996% of all buffers.

And by that I mean all the right-to-left indicator characters, I
think.  RLE, RLM, ALM, etc.  Probably.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Bidirectional text and URLs
  2014-11-30 19:10                         ` Lars Magne Ingebrigtsen
@ 2014-11-30 20:41                           ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 20:41 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 30 Nov 2014 20:10:29 +0100
> 
> Bug fix!  Leading neutralish characters would defeat it.

You are well on your way to re-implement bidi.c.  Good luck.

That's not how this problem should be handled.




* Re: Bidirectional text and URLs
  2014-11-30 18:13                     ` Lars Magne Ingebrigtsen
  2014-11-30 19:06                       ` Lars Magne Ingebrigtsen
  2014-11-30 19:19                       ` Lars Magne Ingebrigtsen
@ 2014-11-30 21:05                       ` Eli Zaretskii
  2014-11-30 21:36                         ` Lars Magne Ingebrigtsen
                                           ` (2 more replies)
  2 siblings, 3 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-11-30 21:05 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 30 Nov 2014 19:13:54 +0100
> Cc: emacs-devel@gnu.org
> 
> Because I was wondering whether my suggestion from yesterday (that we
> insert LRO/PDF characters into URLs if there is an LRO present in the
> buffer when recognising URLs) is at all feasible, and from your
> explanation, it seems like it would be.

IMO, you are jumping to solutions too early, without a good
understanding of the real problem.

I also guess that you meant RLO, not LRO.  The latter makes the
embedded text render like strict left-to-right characters, so it
doesn't need any special handling and cannot do any harm in URLs that
use left-to-right characters (which is 99.99% of URLs).

Can we please take a step back and try to identify the real problem
here?  What exactly are we trying to detect and handle?  Is it true
that we are trying to detect URLs whose characters got their "normal"
bidirectional properties overridden by some directional control
characters?  If so, I can write a primitive that will take a region of
buffer text and examine it to detect this.

If it is something else, please tell what that is, and chances are you
can have it without having to go through a crash course in UBA.

In any case, it is IMO wrong to look for specific controls that you
just happened to learn about yesterday.  They are not what you need to
look for, they are just one sign of what you are looking for.  The UBA
is too complex an algorithm, and it keeps evolving, so chances are
there will be more ways to do these tricks.  You need to define what
it is that you are looking for, not search for this or that sign.

Next, given that you have detected the spoofed URL, what do you want
to do with it?  Do you want to highlight it, do you want to de-spoof
(i.e. undo the spoofing) in some way, but still leave some indication
of the fact that it was spoofed, or maybe you want to remove any trace
of the spoofing as if it never happened (and leave the user oblivious
to the fact it did)?

Given the answers to those questions, there's any number of possible
solutions that do NOT require inserting more directional controls.
Some of the possible solutions were already mentioned in this thread.
Here's another: cover the offending RLO with a display property
showing whatever you want -- a warning sign, a smiley, a string made
of a SPC character, anything.  You can try it with your example: you
will see the spoofing gone immediately.  Why is this worse than
inserting directional controls whose effect on the surrounding text
can be far reaching?

> 2) If there is an LRO in the buffer, then, after recognising an URL, it
> is further treated.
> 
> * If it contains no strongly right-to-left characters, we just wrap it
>   in an LRO/PDF pair.  URLs like "http://myspace.com" will then be
>   guaranteed to be displayed reading left-to-right.
> 
> * If the URL is like http://אבג.דהוזחט.קום, we would segment the URL
>   into strongly-left-to-right-with-weak-chars and
>   strongly-right-to-left-with-weak-chars segments.  We wrap each
>   left-to-right-with-weak-chars in LRO/PDF pairs.

This will change how these URLs are displayed, in a way that users
will not like, and personally it sounds to me like another kind of
phishing.

> Emacs already exposes the weak/strong/LTR/RTL status of each character,
> so a function to do this LRO/PDF insertion is trivial.  It's like a
> seven-line Elisp function or something.

It's easy to insert them, yes.  But the effect is not what you or our
users necessarily want.  More importantly, there are better ways to
deal with that, provided that we DEFINE WHAT PROBLEMS WE WANT TO
SOLVE, AND HOW.

> > From what you say, sounds like it would make the display of these URLs
> acceptable for bidi readers, too -- this would be the normal display of
> these URLs, anyway.

No, it isn't.  You cannot get the correct display by overriding the
bidi properties with LRO or its ilk.  You can see the differences by
moving point with C-f.





* Re: Bidirectional text and URLs
  2014-11-30 21:05                       ` Eli Zaretskii
@ 2014-11-30 21:36                         ` Lars Magne Ingebrigtsen
  2014-12-01  3:45                           ` Eli Zaretskii
  2014-12-01 19:15                         ` Richard Stallman
  2014-12-01 19:15                         ` Richard Stallman
  2 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-11-30 21:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Can we please take a step back and try to identify the real problem
> here?  What exactly are we trying to detect and handle?  Is it true
> that we are trying to detect URLs whose characters got their "normal"
> bidirectional properties overridden by some directional control
> characters?  If so, I can write a primitive that will take a region of
> buffer text and examine it to detect this.

Oh, great.  My impression was that such functionality was off the table.

> Next, given that you have detected the spoofed URL, what do you want
> to do with it?  Do you want to highlight it, do you want to de-spoof
> (i.e. undo the spoofing) in some way, but still leave some indication
> of the fact that it was spoofed, or maybe you want to remove any trace
> of the spoofing as if it never happened (and leave the user oblivious
> to the fact it did)?

Yes, I want to unspoof the URL.  Adding some markings to notify that
this has been done would also be nice, perhaps by adding a 'warning face
to the text or the like.

> Given the answers to those questions, there's any number of possible
> solutions that do NOT require inserting more directional controls.
> Some of the possible solutions were already mentioned in this thread.
> Here's another: cover the offending RLO with a display property
> showing whatever you want -- a warning sign, a smiley, a string made
> of a SPC character, anything.  You can try it with your example: you
> will see the spoofing gone immediately.  Why is this worse than
> inserting directional controls whose effect on the surrounding text
> can be far reaching?

RLOs are used legitimately, and I think the display you've selected for
them now (a thin blank line) is good.  So I don't want to uglify mail
mode buffers just to handle this quite obscure URL UI problem.  I mean,
why shouldn't ‮people be able to‬ do this if they want to in a smooth way?
(Ok, bad example, but these overrides are used legitimately in the bidi
community, if I understand my extensive research correctly.)

And displaying ‮http://myspace.com/#/segami/moc.koobecaf//:sptth‬ with a
couple of visible control characters doesn't really solve the problem,
because most people will still assume that that's a link to Facebook,
not to Myspace.  Most people are not even aware that this bidi stuff
exists.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Bidirectional text and URLs
  2014-11-30 15:20           ` Eli Zaretskii
@ 2014-11-30 23:39             ` chad
  2014-12-01  3:49               ` Eli Zaretskii
  2014-12-01 10:18             ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: chad @ 2014-11-30 23:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, schwab, Richard Stallman, emacs-devel


> On 30 Nov 2014, at 07:20, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> I'm sorry, but this is not instrumental: it doesn't specify what
> "misleading" means.  We need a detailed spec for that. 

Given that we're already identifying the URL in text, is it
possible/easy to check for a different directionality of any part
of a URL text (including the entire url) compared to the text (not
whitespace) before and after the URL?

In order to make phishing-style surprises work, the mal-ordered
text probably wants to have the left-side string "http[s]://" and
the right-side string "//:[s]ptth", right? That should be reasonably
easy to check, and would be a good heuristic.

I suppose there are non-HTTP schemes that might be troublesome also.
Some that come to mind are: ftp, file, imap, jabber, nntp, sip,
sips, and xmpp. I can't think of a way offhand to abuse mailto: or
about:, but I might just be missing it.
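
The heuristic described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the names
`my-slash-schemes' and `my-reversed-scheme-in-string' are made up,
the scheme list is a subset of the ones named above, and only schemes
normally written with "//" are covered (sip:, for instance, carries
no "//" and would need another pattern).  The string is examined in
logical (buffer) order, which is where a reversed "//:sptth" would
show up.

```elisp
;; Hypothetical names; the list covers only schemes written with "//".
(defvar my-slash-schemes '("http" "https" "ftp" "file" "imap" "nntp")
  "Schemes whose reversed form is searched for.")

(defun my-reversed-scheme-in-string (string)
  "Return the first scheme found reversed in STRING, or nil.
A reversed scheme looks like \"//:ptth\" for \"http\"."
  (let ((schemes my-slash-schemes)
        (hit nil))
    (while (and schemes (not hit))
      (let* ((scheme (car schemes))
             ;; Reverse via a list of characters.
             (needle (concat "//:"
                             (concat (reverse (string-to-list scheme))))))
        (when (string-match-p (regexp-quote needle) string)
          (setq hit scheme)))
      (setq schemes (cdr schemes)))
    hit))

(my-reversed-scheme-in-string
 "http://myspace.com/#/segami/moc.koobecaf//:sptth")   ; => "https"
(my-reversed-scheme-in-string "http://myspace.com/")   ; => nil
```

A hit is only a hint that something may be off; it is not proof of
spoofing.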

Hope that helps,
~Chad





* Re: Bidirectional text and URLs
  2014-11-30 21:36                         ` Lars Magne Ingebrigtsen
@ 2014-12-01  3:45                           ` Eli Zaretskii
  2014-12-01 16:19                             ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01  3:45 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sun, 30 Nov 2014 22:36:41 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Can we please take a step back and try to identify the real problem
> > here?  What exactly are we trying to detect and handle?  Is it true
> > that we are trying to detect URLs whose characters got their "normal"
> > bidirectional properties overridden by some directional control
> > characters?  If so, I can write a primitive that will take a region of
> > buffer text and examine it to detect this.
> 
> Oh, great.  My impression was that such functionality was off the table.

Why would it be off the table?

Anyway, if you want this, please show the API of the function -- what
it should return and how.

> > Next, given that you have detected the spoofed URL, what do you want
> > to do with it?  Do you want to highlight it, do you want to de-spoof
> > (i.e. undo the spoofing) in some way, but still leave some indication
> > of the fact that it was spoofed, or maybe you want to remove any trace
> > of the spoofing as if it never happened (and leave the user oblivious
> > to the fact it did)?
> 
> Yes, I want to unspoof the URL.  Adding some markings to notify that
> this has been done would also be nice, perhaps by adding a 'warning face
> to the text or the like.

Then putting a display property on the offending RLO might be the best
solution.
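
As a rough illustration of that idea (a sketch under assumptions, not
a tested patch): the function below covers every U+202E in a region
with a `display' string shown in the `warning' face.  The function
name and the "[RLO]" placeholder text are made up for this example.

```elisp
;; Hypothetical helper: cover each RLO (U+202E) between START and END
;; with a visible placeholder via the `display' text property.
(defun my-cover-rlo-in-region (start end)
  "Cover every U+202E between START and END with a display string.
Return the number of characters covered."
  (let ((count 0))
    (save-excursion
      (goto-char start)
      (while (search-forward (string #x202e) end t)
        (put-text-property (match-beginning 0) (match-end 0)
                           'display
                           (propertize "[RLO]" 'face 'warning))
        (setq count (1+ count))))
    count))
```

The buffer text itself is left unchanged; only its display is
altered.  In a read-only article buffer the call would presumably
have to be wrapped in (let ((inhibit-read-only t)) ...).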

> > Given the answers to those questions, there's any number of possible
> > solutions that do NOT require inserting more directional controls.
> > Some of the possible solutions were already mentioned in this thread.
> > Here's another: cover the offending RLO with a display property
> > showing whatever you want -- a warning sign, a smiley, a string made
> > of a SPC character, anything.  You can try it with your example: you
> > will see the spoofing gone immediately.  Why is this worse than
> > inserting directional controls whose effect on the surrounding text
> > can be far reaching?
> 
> RLOs are used legitimately, and I think the display you've selected for
> them now (a thin blank line) is good.

Yes, but adding RLOs or LROs just to undo some evil effect is
something I think we should avoid, because its effect is non-local and
can frequently be surprising and unintended.  It is better to use
other means we have.

> So I don't want to uglify mail mode buffers just to handle this
> quite obscure URL UI problem.

Where do you see uglification in my suggestions?

> (Ok, bad example, but these overrides are used legitimately in the bidi
> community, if I understand my extensive research correctly.)

They are meant for very specific situations, and this one isn't one of
them.

> And displaying ‮http://myspace.com/#/segami/moc.koobecaf//:sptth‬ with a
> couple of visible control characters doesn't really solve the problem,
> because most people will still assume that that's a link to Facebook,
> not to Myspace.  Most people are not even aware that this bidi stuff
> exists.

Under my suggestion to cover the overrides with a display property,
the URL will not be reversed on display.  Did you try that?





* Re: Bidirectional text and URLs
  2014-11-30 23:39             ` chad
@ 2014-12-01  3:49               ` Eli Zaretskii
  2014-12-01  8:01                 ` chad
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01  3:49 UTC (permalink / raw)
  To: chad; +Cc: larsi, schwab, rms, emacs-devel

> From: chad <yandros@gmail.com>
> Date: Sun, 30 Nov 2014 15:39:15 -0800
> Cc: Richard Stallman <rms@gnu.org>,
>  larsi@gnus.org,
>  schwab@linux-m68k.org,
>  emacs-devel@gnu.org
> 
> 
> > On 30 Nov 2014, at 07:20, Eli Zaretskii <eliz@gnu.org> wrote:
> > 
> > I'm sorry, but this is not instrumental: it doesn't specify what
> > "misleading" means.  We need a detailed spec for that. 
> 
> Given that we're already identifying the URL in text, is it
> possible/easy to check for a different directionality of any part
> of a URL text (including the entire url) compared to the text (not
> whitespace) before and after the URL?

Yes, but this would only be a sign of trouble if the rest of the buffer
text is strictly left to right.  And even then, there are legitimate
URLs that have RTL characters, e.g. in Google queries.

So I don't see how this would help.

> In order to make phishing-style surprises work, the mal-ordered
> text probably wants to have the left-side string "http[s]://" and
> the right-side string "//:[s]ptth", right?

I don't think we can count on that.  Villains might surprise us.  This
is just one example.

But if someone does the research and comes up with such a conclusion,
then yes, it makes our job easier.




* Re: Bidirectional text and URLs
  2014-12-01  3:49               ` Eli Zaretskii
@ 2014-12-01  8:01                 ` chad
  2014-12-01 15:58                   ` Eli Zaretskii
  2014-12-01 19:17                   ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: chad @ 2014-12-01  8:01 UTC (permalink / raw)
  To: Eli Zaretskii, emacs


> On 30 Nov 2014, at 19:49, Eli Zaretskii <eliz@gnu.org> wrote:
>> Given that we're already identifying the URL in text, is it
>> possible/easy to check for a different directionality of any part
>> of a URL text (including the entire url) compared to the text (not
>> whitespace) before and after the URL?
> 
> Yes, but this would only be a sign of trouble if the rest of the buffer
> text is strictly left to right.  And even then, there are legitimate
> URLs that have RTL characters, e.g. in Google queries.

This is a great point. Does this happen often enough that it would
be troublesome to add a warning or prompt about it?

~Chad





* Re: Bidirectional text and URLs
  2014-11-30 15:27                 ` Eli Zaretskii
@ 2014-12-01 10:17                   ` Richard Stallman
  2014-12-01 16:17                     ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 10:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

We need to make Emacs safe and clear for users who don't know anything
about bidi and don't want to.

One idea: change the mode line color when there is any RTL text
(in the buffer, or on the screen, whichever is easier).

Another idea: make magic bidi characters visible by default.  People
who edit in RTL languages and get used to bidi could set a user option
to make them invisible.

    > This is the first time I've observed RTL display in Emacs.  I don't see
    > any way to detect the magic character that specifies it.

    That's because there isn't one, in the citation you provided.

Yes there was -- you said so yourself:

  > where there is a u+202e character

The point is that I could not tell what it was, or where it was, or
anything about it, from my ordinary Emacs commands -- even though I
knew I was observing RTL text display and that some magic bidi
character was probably the reason for it.

Plenty of users wouldn't even know that much.

  > [...] at the rightmost (visual) edge of
  > the line.  If you move point with C-f from the beginning of that line,
  > you should see it jump to the right edge of the line after the leading
  > whitespace, and then continue to "advance backwards", i.e. to the left.

Yes, I observed that strange behavior.  As I said, it was the first time
I saw Emacs's bidi display functionality actually operate.

But I could not tell how to detect the presence of that magic
character directly.  I could see the bidi effect, but I could not tell
what was causing it.

  > These characters are by default displayed as spaces on a TTY, and as a
  > very thin (1-pixel) space on GUI frames.

  > > I think we need to provide a way to make them visible.

  > We already have it: the glyphless-char-display char-table.

We need a convenient _user-level_ feature to make them visible.
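
For reference, the Lisp-level mechanism mentioned in the quote can be
driven roughly as below.  This is a sketch of the existing char-table,
not the user-level feature being asked for; it assumes that `hex-code'
is an acceptable display method for these two characters, and it
changes the table globally rather than per buffer.

```elisp
;; Show LRO (U+202D) and RLO (U+202E) as boxed hex codes instead of
;; thin spaces.  The setting is global, not buffer-local.
(dolist (char '(#x202d #x202e))
  (set-char-table-range glyphless-char-display char 'hex-code))

;; Inspect the method now recorded for RLO:
(aref glyphless-char-display #x202e)   ; => hex-code
```

Windows that already show such characters may need a redisplay
before the change becomes visible.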

  > I don't think so: these controls should normally be all but invisible.

We need to make it easy to see them.  Otherwise people can't tell why
strangeness is happening on their screens.

  > > Also, is there a way to disable bidi in the current buffer?
  > > If not, I think we need one.

  > There is a way, but it is not meant for Lisp programs, only for
  > debugging the display engine.

It needs to be made convenient for users.  Especially for users
who never use bidi.

You use an RTL language, so you see bidi text often and it doesn't
surprise you.  When you see it, you know what is going on.  You know
what in the buffer is likely to cause what visual results.

I don't speak any RTL language (and those characters won't display on
this tty anyway).  So I never see bidi at work, or at least not in a
way I would notice.  I get mail that might be in Arabic script, but
that's just a guess.  The messages are spam, so I delete them.

Even so, I am more knowledgeable about bidi than most Emacs users.  I
once read the Unicode bidi rules, I just don't remember them.  I
think most Emacs users have even less knowledge of this issue.

We need to make Emacs safe and clear for users who don't know anything
about bidi and don't want to.



-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-11-30 15:20           ` Eli Zaretskii
  2014-11-30 23:39             ` chad
@ 2014-12-01 10:18             ` Richard Stallman
  1 sibling, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 10:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, schwab, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > >   > We need to have a good instrumental definition of "bidi strangeness"
  > >   > for that.
  > > 
  > > I suggest the definition: whatever would cause the displayed order of
  > > characters to be perhaps misleading if the text is interpreted as a
  > > URL or anything else with programmatic significance.

  > I'm sorry, but this is not instrumental: it doesn't specify what
  > "misleading" means.  We need a detailed spec for that.

Yes, my proposal is a first step that needs to be fleshed out.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-11-30 15:16           ` Eli Zaretskii
@ 2014-12-01 10:18             ` Richard Stallman
  2014-12-01 16:02               ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 10:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The issue here is that bidirectional display features are being
  > (ab)used to trick humans into thinking they will follow a link to some
  > place, while in fact the link leads to a very different place.  This
  > problem would not have existed without humans reading the URLs, and
  > without the discrepancy between what those humans perceive visually
  > and the actual URL as seen by the program which interprets it.

That is true.  These magic characters have the same effect in URLs
as everywhere else, because Emacs display does not distinguish.

But URLs are not the places where these magic characters are useful
and meant to be used.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-11-30 13:42         ` Stephen J. Turnbull
  2014-11-30 15:36           ` Eli Zaretskii
@ 2014-12-01 10:18           ` Richard Stallman
  2014-12-01 16:18             ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 10:18 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: eliz, larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  >  > I agree, but the issue discussed here is different:

  > I have to disagree.  The issue is about *any* technology that can be
  > used to convince the user that one URL is being accessed when in fact
  > another one is.

In general, yes, but at present we're looking at two specific cases
of that.  They may need different solutions.

1. There are magic bidi characters inside the URL.

2. The bidi context of the URL could cause the URL to appear strangely
even though the URL itself does not contain any magic bidi characters.

Mixing up these two cases has caused a lot of confusion in this
discussion.  Things said about one of them were mistakenly applied to
the other, resulting in nonsense.

I proposed checking the URL for bidi magic, for case 1, and someone
interpreted the suggestion based on case 2 and said it would be
ineffective.

For case 2 I proposed the user could insert newlines around the URL to
see what it really says.  Someone replied that this would be
ineffective because he interpreted it based on case 1.
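
A check for case 1 could look roughly like this.  It is a minimal
sketch under assumptions, not an agreed design: the names
`my-bidi-control-chars' and `my-bidi-controls-in-string' are made up,
and the character list is simply the set of directional controls
named earlier in this thread.  It does nothing for case 2.

```elisp
;; Hypothetical list: the directional controls named in this thread.
(defconst my-bidi-control-chars
  '(#x061c                              ; ALM
    #x200e #x200f                       ; LRM RLM
    #x202a #x202b #x202c #x202d #x202e  ; LRE RLE PDF LRO RLO
    #x2066 #x2067 #x2068 #x2069)        ; LRI RLI FSI PDI
  "Directional formatting characters to look for inside a URL.")

(defun my-bidi-controls-in-string (string)
  "Return a list of (INDEX . CHAR) for each bidi control in STRING."
  (let ((i 0)
        (hits nil))
    (while (< i (length string))
      (let ((c (aref string i)))
        (when (memq c my-bidi-control-chars)
          (push (cons i c) hits)))
      (setq i (1+ i)))
    (nreverse hits)))

;; An RLO at index 0 is reported as (0 . 8238), i.e. (0 . #x202e).
(my-bidi-controls-in-string (concat (string #x202e) "http://myspace.com/"))
```

An empty result means the URL text itself carries no directional
controls; the surrounding text would still have to be examined
separately.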

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-01  8:01                 ` chad
@ 2014-12-01 15:58                   ` Eli Zaretskii
  2014-12-02 14:41                     ` Richard Stallman
  2014-12-01 19:17                   ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 15:58 UTC (permalink / raw)
  To: chad; +Cc: emacs-devel

> From: chad <yandros@gmail.com>
> Date: Mon, 1 Dec 2014 00:01:43 -0800
> 
> > Yes, but this would only be a sign of trouble if the rest of the buffer
> > text is strictly left to right.  And even then, there are legitimate
> > URLs that have RTL characters, e.g. in Google queries.
> 
> This is a great point. Does this happen often enough that it would
> be troublesome to add a warning or prompt about it?

What happens often enough? Google queries with RTL text?  For me, all
the time.  Here's a random example:

  https://www.google.co.il/search?q=זאפ+השוואת+מחירים&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb&gfe_rd=cr&ei=g498VJeSJ8Sg8wfD5IDwBQ

Note that copy-pasting this from Firefox's address bar actually
pastes this instead:

  https://www.google.co.il/search?q=%D7%96%D7%90%D7%A4+%D7%94%D7%A9%D7%95%D7%95%D7%90%D7%AA+%D7%9E%D7%97%D7%99%D7%A8%D7%99%D7%9D&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb&gfe_rd=cr&ei=g498VJeSJ8Sg8wfD5IDwBQ

which might be something we could consider (I suggested that earlier
as one of the possible ways to fight the malicious directional
overrides).
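
For what it's worth, the percent-encoded form shown above can be
produced from Lisp with `url-hexify-string' from url-util.  The small
wrapper below is a made-up helper for illustration, and it assumes
the query words are to be joined with "+" as in the example.

```elisp
(require 'url-util)

;; One Hebrew word, encoded as UTF-8 and percent-escaped:
(url-hexify-string "זאפ")   ; => "%D7%96%D7%90%D7%A4"

;; Hypothetical helper: encode each word and join with "+".
(defun my-encode-query-words (words)
  "Percent-encode each string in WORDS and join the results with \"+\"."
  (mapconcat #'url-hexify-string words "+"))

(my-encode-query-words '("זאפ" "השוואת" "מחירים"))
```

Displaying the encoded form removes RTL characters from the visible
URL, at the cost of making the query unreadable to humans.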





* Re: Bidirectional text and URLs
  2014-12-01 10:18             ` Richard Stallman
@ 2014-12-01 16:02               ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 16:02 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Mon, 01 Dec 2014 05:18:01 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > The issue here is that bidirectional display features are being
>   > (ab)used to trick humans into thinking they will follow a link to some
>   > place, while in fact the link leads to a very different place.  This
>   > problem would not have existed without humans reading the URLs, and
>   > without the discrepancy between what those humans perceive visually
>   > and the actual URL as seen by the program which interprets it.
> 
> That is true.  These magic characters have the same effect in URLs
> as everywhere else, because Emacs display does not distinguish.
> 
> But URLs are not the places where these magic characters are useful
> and meant to be used.

Not in the host.domain parts, but URLs can hold more than just that.
The query part, the one after the "?", might very well use it.

Anyway, if we want to detect the cases that are simple for detection,
we can start there; it's probably better than nothing.  But we need to
have a very specific definition of those cases.  Many people in this
thread talk in terms of vague concepts, such as "directionality",
which sound intuitive, but break down as soon as we need to translate
them into requirements for what Emacs should do.  Not their fault, of
course: the issue is complex and most people don't know the details,
or need to.  But it does make the discussion more difficult.
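
As one possible example of a precisely defined "simple" case, the
following Python sketch (not Emacs code) flags a URL only when its
host part contains a directional formatting character, and ignores
the query part.  The function name and the exact set of characters
are assumptions made for illustration, not something agreed on in
this thread.

```python
from urllib.parse import urlsplit

# Assumed set of suspicious characters: explicit embeddings, overrides,
# isolates, and the two implicit directional marks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE RLE PDF LRO RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI RLI FSI PDI
    "\u200e", "\u200f",                                # LRM RLM
}

def host_has_bidi_controls(url):
    """Return True if the host part of URL contains a directional control."""
    host = urlsplit(url).netloc
    return any(ch in BIDI_CONTROLS for ch in host)

# A control inside the host is flagged.
print(host_has_bidi_controls("http://exam\u202eple.com/x"))
# RTL letters in the query part alone are not flagged.
print(host_has_bidi_controls("https://www.google.co.il/search?q=\u05d6\u05d0\u05e4"))
```

Note that this check alone would miss the example that started the
thread, where the override precedes the whole URL rather than sitting
inside the host.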




* Re: Bidirectional text and URLs
  2014-12-01 10:17                   ` Richard Stallman
@ 2014-12-01 16:17                     ` Eli Zaretskii
  2014-12-02 14:42                       ` Richard Stallman
  2014-12-02 14:42                       ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 16:17 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Mon, 01 Dec 2014 05:17:58 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> We need to make Emacs safe and clear for users who don't know anything
> about bidi and don't want to.

I think we are in violent agreement here.  The question is how to do
that, not whether or not to do it.

> One idea: change the mode line color when there is any RTL text
> (in the buffer, or on the screen, whichever is easier).

That's possible, but I think it's too drastic.  Just having RTL text
doesn't yet constitute any danger or require special vigilance on the
part of the user, even if she doesn't want to know anything about
bidi, let alone if she does.  And of course, the display engine only
examines the visible portion of the buffer and sometimes a small
region above and below, so it cannot really tell what's in the rest of
the buffer.

OTOH, we have indications on the mode line, such as "(DOS)", which
users in the past said they didn't pay attention to.  My conclusion
from that is that mode-line indication is only effective when we know
users will look at the mode line at the right moment.

> Another idea: make magic bidi characters visible by default.  People
> who edit in RTL languages and get used to bidi could set a user option
> to make them invisible.

This is both possible and easy, we already have infrastructure for
this.  Not sure it's enough, though: the reordering effect on URLs,
like in the example that started this thread, will still be there, and
seeing the actual URL to which the link will take the user when
clicked will still not be easy enough, IMO.

>     > This is the first time I've observed RTL display in Emacs.  I don't see
>     > any way to detect the magic character that specifies it.
>
>     That's because there isn't one, in the citation you provided.
> 
> Yes there was -- you said so yourself:
> 
>   > where there is a u+202e character

There was no such character in your mail, only in the one sent by
Lars.  So I assumed you somehow lost it.  My bad.

>   > > I think we need to provide a way to make them visible.
> 
>   > We already have it: the glyphless-char-display char-table.
> 
> We need a convenient _user-level_ feature to make them visible.

We have glyphless-char-display-control, which is a defcustom.  If that
is still too technical, we can have a minor mode to set that for these
directional controls, or maybe just for some subset of them (most of
them cannot cause such disastrous effects on display).

>   > I don't think so: these controls should normally be all but invisible.
> 
> We need to make it easy to see them.  Otherwise people can't tell why
> strangeness is happening on their screens.

I think we should prefer making them visible only in the context where
they could cause harm.  Making them visible everywhere could be an
annoyance.

>   > > Also, is there a way to disable bidi in the current buffer?
>   > > If not, I think we need one.
> 
>   > There is a way, but it is not meant for Lisp programs, only for
>   > debugging the display engine.
> 
> It needs to be made convenient for users.

I don't think this is needed.  People who don't read and don't
understand about bidi will not find it useful, because they cannot
read text affected by the reordering anyway, regardless of its order.

This could only help in the rare situations such as the one discussed
here.  But in those cases, I think we all agree that Emacs should
detect them and act on them automatically; passing the buck to the
user would be a mistake on our part.

But even if I'd agree with you, making a convenient and reliable way
of going back to unidirectional display of Emacs 23 and before would
require a lot of work, because the current display engine no longer
supports unidirectional display without reordering, at least not
reliably.  The old unidirectional code was left in some of the places,
either as a debugging aid or for special corner cases, like unibyte
buffers.  In other places, the code was simply rewritten to work only
through the reordering engine, and the old code no longer exists.  For
example, display strings and overlay strings are rendered exclusively
by the reordering engine.

IOW, the unidirectional display code is for all practical purposes
gone; what's left is not reliable enough for users to use it.  So we
simply cannot turn reordering off and get an otherwise identical Emacs.

> I don't speak any RTL language (and those characters won't display on
> this tty anyway).  So I never see bidi at work, or at least not in a
> way I would notice.  I get mail that might be in Arabic script, but
> that's just a guess.  The messages are spam, so I delete them.

Once again, the only dangerous situation with bidi we are aware of is
the one that started this thread: a malicious use of directional
overrides that changes the visual appearance of what is otherwise
strict left-to-right text.  Let's concentrate on solving this rather
unique situation.  It is IMO wrong to try to generalize these rare
cases into a view that bidi reordering is somehow a menace that users
need to turn off every now and then; it isn't.

> We need to make Emacs safe and clear for users who don't know anything
> about bidi and don't want to.

Again, I think we are in violent agreement here.  The question is how to do
that, not whether or not to do it.

But disabling bidi is not the way.

Several useful ideas were raised in this discussion.  I suggest that
we implement some of them and see if they are enough.




* Re: Bidirectional text and URLs
  2014-12-01 10:18           ` Richard Stallman
@ 2014-12-01 16:18             ` Eli Zaretskii
  2014-12-01 18:32               ` Stephen J. Turnbull
  2014-12-02 14:42               ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 16:18 UTC (permalink / raw)
  To: rms; +Cc: stephen, larsi, emacs-devel

> Date: Mon, 01 Dec 2014 05:18:07 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, larsi@gnus.org, emacs-devel@gnu.org
> 
> 1. There are magic bidi characters inside the URL.

By "magic bidi characters" do you mean printable characters from RTL
scripts, or do you mean the directional controls?  (RTL characters are
also "magic" in some sense, because they might cause reordering of
surrounding text, e.g. if it contains numerical characters.)

> 2. The bidi context of the URL could cause the URL to appear strangely
> even though the URL itself does not contain any magic bidi characters.
> 
> Mixing up these two cases has caused a lot of confusion in this
> discussion.  Things said about one of them were mistakenly applied to
> the other, resulting in nonsense.
> 
> I proposed checking the URL for bidi magic, for case 1, and someone
> interpreted the suggestion based on case 2 and said it would be
> ineffective.

I, for one, don't understand how such a check would help us.  As I
wrote elsewhere, at least some parts of a legitimate URL can include
such characters, and we shouldn't treat those as suspicious.  Maybe
you are talking only about some parts of the URL, like the host and
the domain.

> For case 2 I proposed the user could insert newlines around the URL to
> see what it really says.  Someone replied that this would be
> ineffective because he interpreted it based on case 1.

I think it's impractical to insert newlines before and after each
URL.  It will make Web pages and HTML mail all but illegible, because
modern Web text includes URLs in the normal flow of text, which will
be interrupted by these newlines.

We might do that for URLs where we detect an attempt at
spoofing/phishing, but once those are detected, there are better
methods to undo the effects of phishing.  They were suggested earlier
in this thread, let me reiterate the alternatives:

 . modify the way the relevant directional controls are displayed to
   make them prominently apparent

 . allow the user to request a temporary display of the URL in its
   original logical order, before the reordering, or maybe do that
   automatically in a tooltip

 . replace the relevant directional controls with percent-hex encoded
   representation, which will as a result disable the reordering

 . cover the relevant directional controls with a display property
   (e.g., with a display string " "), which will also disable
   reordering

Let's pick one of these alternatives and use it, or maybe allow
users to choose any one of them.
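
The third alternative above (percent-hex encoding of the directional
controls) can be sketched in Python as follows.  This is not Emacs
code; the function name and the set of characters treated as
"relevant" are assumptions made for illustration.

```python
from urllib.parse import quote

# Assumed set of "relevant" controls: embeddings, overrides, isolates.
DIRECTIONAL_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

def encode_directional_controls(text):
    """Percent-encode only the directional controls in TEXT."""
    return "".join(
        quote(ch) if ch in DIRECTIONAL_CONTROLS else ch
        for ch in text
    )

# The URL from the start of the thread, in logical order, preceded by RLO.
spoofed = "\u202ehttp://myspace.com/#/segami/moc.koobecaf//:sptth"
print(encode_directional_controls(spoofed))
```

Once the RLO is replaced by its visible %E2%80%AE form, nothing forces
the remaining plain ASCII text to be reordered.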




* Re: Bidirectional text and URLs
  2014-12-01  3:45                           ` Eli Zaretskii
@ 2014-12-01 16:19                             ` Lars Magne Ingebrigtsen
  2014-12-01 17:39                               ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-12-01 16:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Anyway, if you want this, please show the API of the function -- what
> it should return and how.

Actually, I'm not sure.  :-) Would it make any sense to have a function
like `(displayed-directionality POSITION)' that returns either
`right-to-left' or `left-to-right'?  If so, the URL-finding function
would query about the start of the URL (which would normally be the HTTP
part), and if that's `right-to-left', Here There Be Shenanigans.

>> Yes, I want to unspoof the URL.  Adding some markings to notify that
>> this has been done would also be nice, perhaps by adding a 'warning face
>> to the text or the like.
>
> Then putting a display property on the offending RLO might be the best
> solution.

On the RLO character itself or the URL affected by the RLO?  I'd rather
limit the impact of whatever we do to the URL itself, since the
presentation of the URL is the user interface question here.

> Yes, but adding RLOs or LROs just to undo some evil effect is
> something I think we should avoid, because its effect is non-local and
> can frequently be surprising and unintended.  It is better to use
> other means we have.

Sure, if a different method is available that allows us to display these
URLs in a non-spoofed way, I'm all for that.

>> And displaying ‮http://myspace.com/#/segami/moc.koobecaf//:sptth‬ with a
>> couple of visible control characters doesn't really solve the problem,
>> because most people will still assume that that's a link to Facebook,
>> not to Myspace.  Most people are not even aware that this bidi stuff
>> exists.
>
> Under my suggestion to cover the overrides with a display property,
> the URL will not be reversed on display.  Did you try that?

Oh, they won't?  I thought you meant adding a display property to the
RLO in addition to having it do what it normally does.

So is your suggestion here to disable all RLO (etc.) characters in mail
buffers?





* Re: Bidirectional text and URLs
  2014-12-01 16:19                             ` Lars Magne Ingebrigtsen
@ 2014-12-01 17:39                               ` Eli Zaretskii
  2014-12-01 17:49                                 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 17:39 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 01 Dec 2014 17:19:30 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Anyway, if you want this, please show the API of the function -- what
> > it should return and how.
> 
> Actually, I'm not sure.  :-) Would it make any sense to have a function
> like `(displayed-directionality POSITION)' that returns either
> `right-to-left' or `left-to-right?  If so, the URL-finding function
> would query about the start of the URL (which would normally be the HTTP
> part), and if that's `right-to-left', Here There Be Shenanigans.

How is this different from the previous suggestion?

> >> Yes, I want to unspoof the URL.  Adding some markings to notify that
> >> this has been done would also be nice, perhaps by adding a 'warning face
> >> to the text or the like.
> >
> > Then putting a display property on the offending RLO might be the best
> > solution.
> 
> On the RLO character itself or the URL affected by the RLO?

On the RLO.  The URL will be left intact, and will show correctly
after you put the display property.

> >> And displaying ‮http://myspace.com/#/segami/moc.koobecaf//:sptth‬ with a
> >> couple of visible control characters doesn't really solve the problem,
> >> because most people will still assume that that's a link to Facebook,
> >> not to Myspace.  Most people are not even aware that this bidi stuff
> >> exists.
> >
> > Under my suggestion to cover the overrides with a display property,
> > the URL will not be reversed on display.  Did you try that?
> 
> Oh, they won't?  I thought you meant adding a display property to the
> RLO in addition to having it do what it normally does.

Any character covered by a display property effectively loses its bidi
properties, as described by this paragraph in the ELisp manual:

     Text covered by `display' text properties, by overlays with
  `display' properties whose value is a string, and by any other
  properties that replace buffer text, is treated as a single unit when
  it is reordered for display.  That is, the entire chunk of text covered
  by these properties is reordered together.  Moreover, the bidirectional
  properties of the characters in such a chunk of text are ignored, and
  Emacs reorders them as if they were replaced with a single character
  `U+FFFC', known as the "Object Replacement Character".  This means that
  placing a display property over a portion of text may change the way
  that the surrounding text is reordered for display.  To prevent this
  unexpected effect, always place such properties on text whose
  directionality is identical with text that surrounds it.

> So is your suggestion here to disable all RLO (etc.) characters in mail
> buffers?

No, only RLOs that affect URLs.

Specifically, I suggest looking for an RLO before a URL on the same
physical line, and a PDF or hard newline after it, and if found,
covering the RLO with a display property whose value is e.g. a string
" ".  Since just the fact that you find an RLO before doesn't yet mean
that it's a malicious RLO (other bidirectional controls which you
don't want to know about can countermand the RLO before it affects
the URL display), I suggest augmenting that by checking that the
URL's host and domain parts consist of LTR characters whose
directionality was overridden.  The latter part is to be done by
calling a new primitive mentioned above.

Given all this evidence, I think it's pretty much certain that we
found our offending RLO.
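
The procedure just described can be approximated by the Python sketch
below (not Emacs code).  The function name, the URL regexp, and the
use of unicodedata.bidirectional as a stand-in for the proposed
primitive are all assumptions; in particular, the sketch only checks
for a PDF between the RLO and the URL, not for every control that
could countermand the override.

```python
import re
import unicodedata

RLO, PDF = "\u202e", "\u202c"

# Simplified URL matcher: the URL ends at whitespace or at a PDF.
URL_RE = re.compile("https?://[^\\s" + PDF + "]+")

def find_offending_rlo(line):
    """Return the index of an RLO that overrides a URL on LINE, or None."""
    m = URL_RE.search(line)
    if m is None:
        return None
    rlo = line.rfind(RLO, 0, m.start())
    if rlo == -1:
        return None
    if PDF in line[rlo:m.start()]:
        return None  # the override was closed before the URL starts
    host = re.match("https?://([^/?#]+)", m.group()).group(1)
    # Stand-in for the new primitive: the host letters are strong LTR.
    if all(unicodedata.bidirectional(c) == "L" for c in host if c.isalpha()):
        return rlo
    return None

line = "Click \u202ehttp://myspace.com/#/segami/moc.koobecaf//:sptth"
print(find_offending_rlo(line))
```

The returned index is where a display property (or any other
neutralizing treatment) would be applied.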





* Re: Bidirectional text and URLs
  2014-12-01 17:39                               ` Eli Zaretskii
@ 2014-12-01 17:49                                 ` Lars Magne Ingebrigtsen
  2014-12-01 18:22                                   ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-12-01 17:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> > Anyway, if you want this, please show the API of the function -- what
>> > it should return and how.
>> 
>> Actually, I'm not sure.  :-) Would it make any sense to have a function
>> like `(displayed-directionality POSITION)' that returns either
>> `right-to-left' or `left-to-right?  If so, the URL-finding function
>> would query about the start of the URL (which would normally be the HTTP
>> part), and if that's `right-to-left', Here There Be Shenanigans.
>
> How is this different from the previous suggestion?

I'm not sure what you are referring to.

>> So is your suggestion here to disable all RLO (etc.) characters in mail
>> buffers?
>
> No, only RLOs that affect URLs.
>
> Specifically, I suggest to look for RLO before a URL on the same
> physical line, and PDF or hard newline after it, and if found, cover
> it by a display property whose value is e.g. a string " ".  Since just
> the fact that you find an RLO before doesn't yet mean that it's a
> malicious RLO (other bidirectional controls which you don't want to
> know about can countermand the RLO before it affects the URL display),
> I suggest to augment that by checking that the URL's host and domain
> parts consist of LTR characters whose directionality was overridden.
> The latter part is to be done by calling a new primitive mentioned
> above.
>
> Given all this evidence, I think it's pretty much certain that we
> found our offending RLO.

If you think that that's sufficient (that we only need to look for
preceding RLOs on the same line), then this sounds like a good solution
to me.





* Re: Bidirectional text and URLs
  2014-12-01 17:49                                 ` Lars Magne Ingebrigtsen
@ 2014-12-01 18:22                                   ` Eli Zaretskii
  2014-12-01 18:28                                     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 18:22 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 01 Dec 2014 18:49:58 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> > Anyway, if you want this, please show the API of the function -- what
> >> > it should return and how.
> >> 
> >> Actually, I'm not sure.  :-) Would it make any sense to have a function
> >> like `(displayed-directionality POSITION)' that returns either
> >> `right-to-left' or `left-to-right?  If so, the URL-finding function
> >> would query about the start of the URL (which would normally be the HTTP
> >> part), and if that's `right-to-left', Here There Be Shenanigans.
> >
> > How is this different from the previous suggestion?
> 
> I'm not sure what you are referring to.

I'm saying that asking about "characters between FROM and TO that were
supposed to be LTR, but were forced to display as RTL", and asking
essentially the same question about a character at POS, is actually
asking the same question.  IOW, the same API will be able to satisfy
both needs.

  (defun bidi-find-overridden-directionality (from to)
     "Return position between FROM and TO where directionality was overridden.

   This function returns the first character position in the specified
   region where there is a character whose `bidi-class' property is `L',
   but which was forced to display as `R' by a directional override,
   and likewise with characters whose `bidi-class' is `R' or `AL'
   that were forced to display as `L'.

   Strong directional characters `L', `R', and `AL' can have their
   intrinsic directionality overridden by directional override
   control characters RLO \(u+202e) and LRO \(u+202d)."

OK?

If you want, the function can return a cons cell (POS . DIR), where
POS is the position and DIR is the intrinsic directionality of the
overridden character.  Or even (POS DIR-ORIG DIR-OVERRIDDEN).
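
To make the proposed return value concrete, here is a rough Python
model of such a function (not the Emacs primitive itself).  It is a
simplification of the UBA: it tracks only LRE, RLE, LRO, RLO and PDF,
treats a newline as ending all overrides, ignores isolates and the
depth limit, and uses 0-based string indices where Emacs would use
buffer positions.

```python
import unicodedata

# Controls that open a new level; the value is the forced direction,
# or None for plain embeddings, which do not override anything.
PUSH = {"\u202a": None, "\u202b": None, "\u202d": "L", "\u202e": "R"}
PDF = "\u202c"

def find_overridden_directionality(text, start, end):
    """Return (pos, intrinsic, forced) for the first strong character in
    text[start:end] whose direction is overridden, or None."""
    stack = []
    for pos, ch in enumerate(text[:end]):
        if ch in PUSH:
            stack.append(PUSH[ch])
        elif ch == PDF:
            if stack:
                stack.pop()
        elif ch == "\n":
            stack.clear()  # simplification: overrides end at a newline
        elif pos >= start and stack and stack[-1]:
            intrinsic = unicodedata.bidirectional(ch)
            forced = stack[-1]
            # Overridden if an L character is forced R, or R/AL forced L.
            if intrinsic in ("L", "R", "AL") and (intrinsic == "L") != (forced == "L"):
                return (pos, intrinsic, forced)
    return None

text = "\u202ehttp://myspace.com"
print(find_overridden_directionality(text, 0, len(text)))
```

Scanning starts at the beginning of the text rather than at START,
because an override opened earlier still affects the region.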

> > No, only RLOs that affect URLs.
> >
> > Specifically, I suggest to look for RLO before a URL on the same
> > physical line, and PDF or hard newline after it, and if found, cover
> > it by a display property whose value is e.g. a string " ".  Since just
> > the fact that you find an RLO before doesn't yet mean that it's a
> > malicious RLO (other bidirectional controls which you don't want to
> > know about can countermand the RLO before it affects the URL display),
> > I suggest to augment that by checking that the URL's host and domain
> > parts consist of LTR characters whose directionality was overridden.
> > The latter part is to be done by calling a new primitive mentioned
> > above.
> >
> > Given all this evidence, I think it's pretty much certain that we
> > found our offending RLO.
> 
> If you think that that's sufficient (that we only need to look for
> preceding RLOs on the same line), then this sounds like a good solution
> to me.

We need to look for an RLO on the same line when an LTR character was
forced to display as RTL, and for LRO in the opposite case.

This will detect the case you've demonstrated at the beginning of this
thread.  I don't know about other similar cases, so if you don't know
either, I suggest we treat this problem, and take it from there.




* Re: Bidirectional text and URLs
  2014-12-01 18:22                                   ` Eli Zaretskii
@ 2014-12-01 18:28                                     ` Lars Magne Ingebrigtsen
  2014-12-02 14:17                                       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-12-01 18:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>   (defun bidi-find-overridden-directionality (from to)
>      "Return position between FROM and TO where directionality was overridden.
>
>    This function returns the first character position in the specified
>    region where there is a character whose `bidi-class' property is `L',
>    but which was forced to display as `R' by a directional override,
>    and likewise with characters whose `bidi-class' is `R' or `AL'
>    that were forced to display as `L'.
>
>    Strong directional characters `L', `R', and `AL' can have their
>    intrinsic directionality overridden by directional override
>    control characters RLO \(u+202e) and LRO \(u+202d)."
>
> OK?

Yes, that sounds perfect.





* Re: Bidirectional text and URLs
  2014-12-01 16:18             ` Eli Zaretskii
@ 2014-12-01 18:32               ` Stephen J. Turnbull
  2014-12-01 19:12                 ` Eli Zaretskii
  2014-12-02 14:42               ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: Stephen J. Turnbull @ 2014-12-01 18:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, rms, emacs-devel

Eli Zaretskii writes:

 >  . modify the way the relevant directional controls are displayed to
 >    make them prominently apparent

-0 I don't think this will help enough, especially for the users who
would most benefit from Emacs's automated paranoia (ie, those who read
bidi but not RFCs).

 >  . allow the user to request a temporary display of the URL in its
 >    original logical order, before the reordering, or maybe do that
 >    automatically in a tooltip

+1 for the tooltip, with url-encoding for format characters, which are
non-conforming to RFC 3987 anyway.

Note that RFC 3987 specifies that bidirectional IRIs must *always* be
displayed with the UBA, and as if in an LRE embedding.  I'm not sure
how you would enforce it, but I believe this would defang larsi's
example (ie, at the start of the URI proper in logical order insert a
LRE, and at the end a PDF -- any directional format characters between
those points are nonconforming to RFC 3987, section 4.1, last
paragraph).

 >  . replace the relevant directional controls with percent-hex encoded
 >    representation, which will as result disable the reordering

-1  If they're outside of the IRI, this will just make things ugly.
If they're inside the IRI, they're non-conforming and therefore bogus,
and would be caught by the tooltip.

 >  . cover the relevant directional controls with a display property
 >    (e.g., with a display string " "), which will also disable
 >    reordering

-0 This is just a specific implementation of the first option above, right?





* Re: Bidirectional text and URLs
  2014-12-01 18:32               ` Stephen J. Turnbull
@ 2014-12-01 19:12                 ` Eli Zaretskii
  2014-12-01 20:08                   ` Stephen J. Turnbull
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 19:12 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: larsi, rms, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: rms@gnu.org,
>     larsi@gnus.org,
>     emacs-devel@gnu.org
> Date: Tue, 02 Dec 2014 03:32:07 +0900
> 
> Eli Zaretskii writes:
> 
>  >  . modify the way the relevant directional controls are displayed to
>  >    make them prominently apparent
> 
> -0 I don't think this will help enough, especially for the users who
> would most benefit from Emacs's automated paranoia (ie, those who read
> bidi but not RFCs).

This alternative makes the fewest changes to the display.

> Note that RFC 3987 specifies that bidirectional IRIs must *always* be
> displayed with the UBA, and as if in an LRE embedding.  I'm not sure
> how you would enforce it, but I believe this would defang larsi's
> example (ie, at the start of the URI proper in logical order insert a
> LRE, and at the end a PDF -- any directional format characters between
> those points are nonconforming to RFC 3987, section 4.1, last
> paragraph).

Using an LRE..PDF embedding is a possibility, but it can be defeated:
the UBA mandates that any embeddings above some predefined fixed depth
are to be ignored.  So malicious text could insert a large enough
number of RLOs that any LRE would be ignored.
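
The depth-limit attack can be illustrated with a small Python model of
how embedding levels are assigned (not Emacs code).  It assumes the
limit of 125 levels used by UAX#9 since Unicode 6.3 (earlier versions
used 61), and it leaves out PDF handling and the overflow counters of
the full algorithm; the function name is made up for this example.

```python
MAX_DEPTH = 125  # assumed UBA limit (Unicode 6.3 and later)

def level_after(controls, max_depth=MAX_DEPTH):
    """Return the embedding level after a sequence of control names."""
    level = 0
    for c in controls:
        if c in ("RLO", "RLE"):
            # Right-to-left controls move to the next higher odd level.
            new = level + 1 if level % 2 == 0 else level + 2
        else:
            # LRO and LRE move to the next higher even level.
            new = level + 2 if level % 2 == 0 else level + 1
        if new <= max_depth:
            level = new  # a control that would exceed the limit is ignored
    return level

# One RLO followed by a protective LRE: the level becomes even (LTR).
print(level_after(["RLO", "LRE"]))
# 63 RLOs reach the maximum level, so the LRE is ignored (odd, RTL).
print(level_after(["RLO"] * 63 + ["LRE"]))
```

An even final level means the protective LRE took effect; an odd one
means the text is still laid out right to left.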

That's one of the reasons why I prefer not to poke the text with
additional directional controls.

>  >  . replace the relevant directional controls with percent-hex encoded
>  >    representation, which will as result disable the reordering
> 
> -1  If they're outside of the IRI, this will just make things ugly.

Ugly, yes.  But if these cases are sufficiently rare, that ugliness is
useful, I think, as it will attract attention.

> If they're inside the IRI, they're non-conforming and therefore bogus,
> and would be caught by the tooltip.

Yes, but tooltips could be overlooked (or even disabled globally by
the user).

>  >  . cover the relevant directional controls with a display property
>  >    (e.g., with a display string " "), which will also disable
>  >    reordering
> 
> -0 This is just a specific implementation of the first option above, right?

No, it also disables reordering, whereas the first one doesn't.




* Re: Bidirectional text and URLs
  2014-11-30 21:05                       ` Eli Zaretskii
  2014-11-30 21:36                         ` Lars Magne Ingebrigtsen
@ 2014-12-01 19:15                         ` Richard Stallman
  2014-12-01 19:15                         ` Richard Stallman
  2 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 19:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Next, given that you have detected the spoofed URL, what do you want
  > to do with it?  Do you want to highlight it, do you want to de-spoof
  > (i.e. undo the spoofing) in some way, but still leave some indication
  > of the fact that it was spoofed, or maybe you want to remove any trace
  > of the spoofing as if it never happened (and leave the user oblivious
  > to the fact it did)?

I think that all commands to fetch a URL should ask for confirmation
about a URL whose display may have been confusing due to bidi.

The message should appear in a window, so it doesn't have to be terse.
It should present everything that is interesting, including the URL as
it appears in the actual context, and the URL as would appear in a
normal LTR context, and the real URL that will be fetched.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-11-30 21:05                       ` Eli Zaretskii
  2014-11-30 21:36                         ` Lars Magne Ingebrigtsen
  2014-12-01 19:15                         ` Richard Stallman
@ 2014-12-01 19:15                         ` Richard Stallman
  2014-12-01 19:34                           ` Eli Zaretskii
  2 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 19:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

To be able to copy some text into another buffer and have that text
display there just as it displayed in the original buffer is important
for user messages about what's going on with bidi.

I think it requires two facilities.  (Please correct me if I'm wrong.)

1. A way for a Lisp program to get, for a specified region, a
short description of the outside bidi context that affects bidi
treatment of that region.

The result should be a small amount of data, computed solely from the
text outside the specified region.  The result should encapsulate
everything about the text outside the specified region that can
possibly affect the bidi treatment of whatever text might be inside
the region.

Thus, any change in the text outside the specified region, which gives
the same encapsulated data, will not affect bidi treatment of text
inside the region.

Ideally, this data should have a transparent documented format.

It could be called 'bidi-context'.

If this can't be done in a way that is independent of the text inside
the specified region, as a fallback it could be done in a way that
works only for the current text inside that region.

2. Given such encapsulated context data, a straightforward way to
create an equivalent bidi context in the current buffer.  I expect it
would work by inserting some magic bidi characters.  (Can all such
contexts be replicated by inserting some magic bidi characters?)

It could be called 'replicate-bidi-context'.

Are these feasible to implement?






* Re: Bidirectional text and URLs
  2014-12-01  8:01                 ` chad
  2014-12-01 15:58                   ` Eli Zaretskii
@ 2014-12-01 19:17                   ` Richard Stallman
  1 sibling, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-01 19:17 UTC (permalink / raw)
  To: chad; +Cc: eliz, emacs-devel

  > > Yes, but this would only be a sign of trouble if the rest of buffer
  > > text is strictly left to right.  And even then, there are legitimate
  > > URLs that have RTL characters, e.g. in Google queries.

  > This is a great point. Does this happen often enough that it would
  > be troublesome to add a warning or prompt about it?

Maybe it is enough to ensure that the host name is not
confused.
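
A minimal sketch of such a host-name check, assuming "confused" means
that the host contains directional controls or mixes left-to-right
and right-to-left letters.  The function names are hypothetical and
the host extraction is deliberately crude (it does not handle IPv6
literals, for instance).  Python's standard unicodedata module is
used for the bidi classes.

```python
import unicodedata

CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def host_of(url: str) -> str:
    """Crude host extraction: text between '://' and the next '/', '?' or '#'."""
    rest = url.split("://", 1)[-1]
    for sep in "/?#":
        rest = rest.split(sep, 1)[0]
    rest = rest.rsplit("@", 1)[-1]      # drop any userinfo
    return rest.split(":", 1)[0]        # drop any port

def host_is_confusable(url: str) -> bool:
    """True if the host has directional controls or mixes LTR and RTL letters."""
    classes = {unicodedata.bidirectional(ch) for ch in host_of(url)}
    has_ltr = "L" in classes
    has_rtl = bool(classes & {"R", "AL"})
    return bool(classes & CONTROLS) or (has_ltr and has_rtl)
```

RTL characters in the query part do not trigger this check, which
matches the Google query case.  It would flag a legitimate host that
mixes scripts, and it cannot see an override that starts before the
URL, so it would be one test among several rather than a complete one.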


* Re: Bidirectional text and URLs
  2014-12-01 19:15                         ` Richard Stallman
@ 2014-12-01 19:34                           ` Eli Zaretskii
  2014-12-01 20:21                             ` Eli Zaretskii
  2014-12-02 14:44                             ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 19:34 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Mon, 01 Dec 2014 14:15:41 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> 1. A way for a Lisp program to get, for a specified region, a
> short description of the outside bidi context that affects bidi
> treatment of that region.
> 
> The result should be a small amount of data, computed solely from the
> text outside the specified region.  The result should encapsulate
> everything about the text outside the specified region that can
> possibly affect the bidi treatment of whatever text might be inside
> the region.
> 
> Thus, any change in the text outside the specified region, which gives
> the same encapsulated data, will not affect bidi treatment of text
> inside the region.
> 
> Ideally, this data should have a transparent documented format.
> 
> It could be called 'bidi-context'.
> 
> If this can't be done in a way that is independent of the text inside
> the specified region, as a fallback it could be done in a way that
> works only for the current text inside that region.
> 
> 2. Given such encapsulated context data, a straightforward way to
> create an equivalent bidi context in the current buffer.  I expect it
> would work by inserting some magic bidi characters.  (Can all such
> contexts be replicated by inserting some magic bidi characters?)
> 
> It could be called 'replicate-bidi-context'.
> 
> Are these feasible to implement?

The first one sounds pretty complicated.  I need to think about its
feasibility.  It could require analysis of a very large chunk of
buffer text, at least in theory.  What's more, the UBA specifies how
to reorder text given the contents, but not how to do the reverse.

Anyway, what's more important: you can have 2 without 1.  The trick is
to capture the visual order of the text you want to copy (can be done
by looking at the current glyph matrix), and then create a string
whose logical order is identical to the captured visual order, and
embed that string in LRO..PDF, which will ensure the visual order will
not change on display.

The disadvantage of this is that you recreate the order, but not the
reordering, so e.g. cursor motion will be different -- you won't see
the jumps as in the URL phishing example.
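
The wrapping step can be sketched as follows.  This is a Python
illustration, not Emacs code: the visual order is simply passed in as
a sequence of characters (in Emacs it would come from the glyph
matrix), and the function name is hypothetical.  Directional controls
are dropped from the captured text so that they cannot interfere with
the enclosing override; mirrored characters such as parentheses are
not handled.

```python
import unicodedata

LRO, PDF = "\u202d", "\u202c"
FORMAT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def freeze_visual_order(visual_chars) -> str:
    """Build a string that displays VISUAL_CHARS left to right, as given."""
    kept = [ch for ch in visual_chars
            if unicodedata.bidirectional(ch) not in FORMAT_CLASSES]
    # Inside LRO..PDF every strong character is treated as left-to-right,
    # so the logical order of the string is also its display order.
    return LRO + "".join(kept) + PDF
```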




* Re: Bidirectional text and URLs
  2014-12-01 19:12                 ` Eli Zaretskii
@ 2014-12-01 20:08                   ` Stephen J. Turnbull
  2014-12-01 20:42                     ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Stephen J. Turnbull @ 2014-12-01 20:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, rms, emacs-devel

Eli Zaretskii writes:

 > > Note that RFC 3987 specifies that bidirectional IRIs must *always* be
 > > displayed with the UBA, and as if in an LRE embedding.  I'm not sure
 > > how you would enforce it, but I believe this would defang larsi's
 > > example (ie, at the start of the URI proper in logical order insert a
 > > LRE, and at the end a PDF -- any directional format characters between
 > > those points are nonconforming to RFC 3987, section 4.1, last
 > > paragraph).
 > 
 > Using an LRE..PDF embedding is a possibility, but it can be defeated:
 > the UBA mandates that any embeddings above some predefined fixed depth
 > are to be ignored.  So malicious code could insert a large enough
 > number of RLOs such that any LRE would be ignored.

Note that RFC 3987 is a MUST, and OTOH does not specify an
implementation (probably precisely because of the nesting issue).
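
The nesting issue can be made concrete with a small Python sketch of
the level arithmetic in UAX#9 rules X2 to X5.  It is a simplification
(no isolates, no overflow counters), the function names are
hypothetical, and it assumes a left-to-right paragraph and the
max_depth of 125 from current versions of UAX#9 (older versions used
61).

```python
MAX_DEPTH = 125  # max_depth in current UAX#9; older revisions used 61

def next_level(level: int, control: str):
    """Level entered by CONTROL from LEVEL, or None if it overflows."""
    if control in ("RLE", "RLO"):
        new = level + 1 if level % 2 == 0 else level + 2   # least greater odd level
    elif control in ("LRE", "LRO"):
        new = level + 2 if level % 2 == 0 else level + 1   # least greater even level
    else:
        raise ValueError(f"not an embedding or override: {control}")
    return new if new <= MAX_DEPTH else None

def lre_takes_effect(rlo_count: int) -> bool:
    """Does an LRE still open a new level after RLO_COUNT nested RLOs?"""
    level = 0                      # assume a left-to-right paragraph
    for _ in range(rlo_count):
        entered = next_level(level, "RLO")
        if entered is not None:    # overflowing controls are ignored
            level = entered
    return next_level(level, "LRE") is not None
```

Under these rules, 62 nested RLOs leave room for one more LRE (level
123 to 124), while 63 put the text at level 125 and any further LRE
is ignored.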

 > That's one of the reasons why I prefer not to poke the text with
 > additional directional controls.

You don't need to poke them into the text.  You just MUST display IRIs
"as if" there were an effective embedding.  I'm aware of the GNU
mantra "standards are sometimes not a terrible idea -- but only
sometimes".  But in this case I think conformance is a very good idea.

 > > If they're inside the IRI, they're non-conforming and therefore bogus,
 > > and would be caught by the tooltip.
 > 
 > Yes, but tooltips could be overlooked (or even disabled globally by
 > the user).

I think for the cases we've identified so far (LTR-only text in a RTL
context, RTL-only text in an LTR context, and directional controls
embedded in an IRI) you probably want to require the user who clicks
on them to confirm that they want to follow this misleading link,
anyway.





* Re: Bidirectional text and URLs
  2014-12-01 19:34                           ` Eli Zaretskii
@ 2014-12-01 20:21                             ` Eli Zaretskii
  2014-12-01 20:30                               ` David Kastrup
  2014-12-02 14:45                               ` Richard Stallman
  2014-12-02 14:44                             ` Richard Stallman
  1 sibling, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 20:21 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Mon, 01 Dec 2014 21:34:46 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
> The first one sounds pretty complicated.  I need to think about its
> feasibility.

A simple (as in "KISS") strategy that should always work is to copy
the entire physical line around the region.  The disadvantage is, of
course, that it could be very long in some rare cases.  Optimizing
that would probably require replacing runs of certain types of
characters with a single representative character of the same type,
and keeping all the directional controls.

We could also replace strong directional characters L/R/AL with the
corresponding mark (LRM/RLM/ALM), which are displayed as (thin)
spaces, and so will be almost invisible, keeping an illusion of
copying just the region of text and nothing else.

Is this good enough?
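
To make the idea concrete, here is a Python sketch of the compression
step (an illustration under assumptions, not the Emacs
implementation).  It keeps every directional control, replaces each
run of strong characters with the matching mark, and collapses each
run of any other class to its first character.  The function name is
hypothetical, and whether this always yields an equivalent context
has not been verified.

```python
import unicodedata

MARK = {"L": "\u200e", "R": "\u200f", "AL": "\u061c"}   # LRM, RLM, ALM
CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def compress_context(text: str) -> str:
    """Shrink TEXT to one character per run of the same bidi class."""
    out = []
    prev = None
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in CONTROLS:
            out.append(ch)              # keep every directional control
            prev = None                 # a control ends the current run
            continue
        if cls != prev:                 # first character of a new run
            out.append(MARK.get(cls, ch))
            prev = cls
    return "".join(out)
```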




* Re: Bidirectional text and URLs
  2014-12-01 20:21                             ` Eli Zaretskii
@ 2014-12-01 20:30                               ` David Kastrup
  2014-12-01 20:45                                 ` Eli Zaretskii
  2014-12-02 14:45                               ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: David Kastrup @ 2014-12-01 20:30 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Mon, 01 Dec 2014 21:34:46 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: larsi@gnus.org, emacs-devel@gnu.org
>> 
>> The first one sounds pretty complicated.  I need to think about its
>> feasibility.
>
> A simple (as in "KISS") strategy that should always work is to copy
> the entire physical line around the region.  The disadvantage is, of
> course, that it could be very long in some rare cases.  Optimizing
> that would probably require replacing runs of certain types of
> characters with a single representative character of the same type,
> and keeping all the directional controls.
>
> We could also replace strong directional characters L/R/AL with the
> corresponding mark (LRM/RLM/ALM), which are displayed as (thin)
> spaces, and so will be almost invisible, keeping an illusion of
> copying just the region of text and nothing else.
>
> Is this good enough?

Wouldn't it just be enough to turn off bidi-display-reordering in the
minibuffer when inputting/displaying the URL?

-- 
David Kastrup





* Re: Bidirectional text and URLs
  2014-12-01 20:08                   ` Stephen J. Turnbull
@ 2014-12-01 20:42                     ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 20:42 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: larsi, rms, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: larsi@gnus.org,
>     rms@gnu.org,
>     emacs-devel@gnu.org
> Date: Tue, 02 Dec 2014 05:08:11 +0900
> 
>  > That's one of the reasons why I prefer not to poke the text with
>  > additional directional controls.
> 
> You don't need to poke them into the text.  You just MUST display IRIs
> "as if" there were an effective embedding.

We don't (yet) have the machinery to do that, except by inserting an
LRE.

> I think for the cases we've identified so far (LTR-only text in a RTL
> context, RTL-only text in an LTR context, and directional controls
> embedded in an IRI) you probably want to require the user who clicks
> on them to confirm that they want to follow this misleading link,
> anyway.

That's something for Lars to worry about, I will just provide the
detection infrastructure.




* Re: Bidirectional text and URLs
  2014-12-01 20:30                               ` David Kastrup
@ 2014-12-01 20:45                                 ` Eli Zaretskii
  2014-12-02 14:45                                   ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-01 20:45 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Mon, 01 Dec 2014 21:30:03 +0100
> 
> Wouldn't it just be enough to turn off bidi-display-reordering in the
> minibuffer when inputting/displaying the URL?

That's not what Richard wanted, AFAIU.  He wanted a way of citing a
chunk of text in a mail message, in a way that ensures the cited text
will have the same visual order as the original.

And it is not only about URLs.  Or maybe I'm confused.




* Re: Bidirectional text and URLs
  2014-12-01 18:28                                     ` Lars Magne Ingebrigtsen
@ 2014-12-02 14:17                                       ` Eli Zaretskii
  2014-12-02 16:31                                         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 14:17 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 01 Dec 2014 19:28:31 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >   (defun bidi-find-overridden-directionality (from to)
> >      "Return position between FROM and TO where directionality was overridden.
> >
> >    This function returns the first character position in the specified
> >    region where there is a character whose `bidi-class' property is `L',
> >    but which was forced to display as `R' by a directional override,
> >    and likewise with characters whose `bidi-class' is `R' or `AL'
> >    that were forced to display as `L'.
> >
> >    Strong directional characters `L', `R', and `AL' can have their
> >    intrinsic directionality overridden by directional override
> >    control characters RLO \(u+202e) and LRO \(u+202d)."
> >
> > OK?
> 
> Yes, that sounds perfect.

It is now implemented on master.  (Please read the doc string, as I
did slightly more than I promised, hope you will find those additions
useful.)
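
For readers who want to see the behavior described in the quoted doc
string outside Emacs, here is a Python approximation.  It is not the
code on master: it tracks only LRE/RLE/LRO/RLO/PDF, ignores isolates
and the depth limit, and the function name and signature are
assumptions made for this sketch.

```python
import unicodedata

def find_overridden_directionality(text: str, start: int = 0, end=None):
    """Return index of the first strong character displayed against its class."""
    end = len(text) if end is None else end
    stack = []                                   # one entry per open LRE/RLE/LRO/RLO
    for i, ch in enumerate(text[:end]):
        cls = unicodedata.bidirectional(ch)
        if cls in ("LRO", "RLO"):
            stack.append("L" if cls == "LRO" else "R")
        elif cls in ("LRE", "RLE"):
            stack.append(None)                   # embeddings do not override
        elif cls == "PDF":
            if stack:
                stack.pop()
        elif i >= start and stack and stack[-1] is not None:
            forced = stack[-1]
            natural = "R" if cls in ("R", "AL") else cls
            if natural in ("L", "R") and natural != forced:
                return i
    return None
```

Controls before START are still scanned, since an override opened
earlier in the text remains in effect inside the region.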




* Re: Bidirectional text and URLs
  2014-12-01 15:58                   ` Eli Zaretskii
@ 2014-12-02 14:41                     ` Richard Stallman
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yandros, emacs-devel

    > https://www.google.co.il/search?q=   +      +      &ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a&channel=sb&gfe_rd=cr&ei=g498VJeSJ8Sg8wfD5IDwBQ

In this example, one of the arguments is RTL.  Maybe we can consider
the case where some arguments are RTL to be safe enough, and avoid
warning for it.


* Re: Bidirectional text and URLs
  2014-12-01 16:17                     ` Eli Zaretskii
@ 2014-12-02 14:42                       ` Richard Stallman
  2014-12-02 14:48                         ` Eli Zaretskii
  2014-12-02 14:42                       ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > One idea: change the mode line color when there is any RTL text
  > > (in the buffer, or on the screen, whichever is easier).

  > That's possible, but I think it's too drastic.  Just having RTL text
  > doesn't yet constitute any danger or require special vigilance on the
  > part of the user,

It requires special vigilance if the user isn't expecting it!

I am not saying that RTL per se is dangerous.  I'm suggesting we
should warn users very visibly about RTL text if they don't
normally use it and are perhaps not expecting it.

Changing the color of the mode line was my first idea.  Another idea
is to display "This buffer contains right-to-left text\n\n" at the start
of the buffer.

People like you who are accustomed to RTL editing would set a flag
to disable those messages.

  > > Another idea: make magic bidi characters visible by default.  People
  > > who edit in RTL languages and get used to bidi could set a user option
  > > to make them invisible.

  > This is both possible and easy, we already have infrastructure for
  > this.  Not sure it's enough, though:

I don't think it is enough by itself.  We should continue with the
other proposed measures too.


* Re: Bidirectional text and URLs
  2014-12-01 16:17                     ` Eli Zaretskii
  2014-12-02 14:42                       ` Richard Stallman
@ 2014-12-02 14:42                       ` Richard Stallman
  2014-12-02 14:52                         ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > We need a convenient _user-level_ feature to make them visible.

  > We have glyphless-char-display-control, which is a defcustom.  If that
  > is still too technical, we can have a minor mode to set that for these
  > directional controls,

A minor mode would be convenient enough for non-wizard users.
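
What such a mode would show can be illustrated with a short Python
sketch (not Emacs code; in Emacs the display would go through
glyphless-char-display-control as mentioned above).  The function
name and the <...> tag format are assumptions made for this example.

```python
import unicodedata

NAMES = {"\u200e": "LRM", "\u200f": "RLM", "\u061c": "ALM"}
EXPLICIT = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def show_bidi_controls(text: str) -> str:
    """Replace each invisible directional control with a tag such as <RLO>."""
    out = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if ch in NAMES:
            out.append(f"<{NAMES[ch]}>")
        elif cls in EXPLICIT:
            out.append(f"<{cls}>")      # the class name doubles as the acronym
        else:
            out.append(ch)
    return "".join(out)
```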

  > I think we should prefer making them visible only in the context where
  > they could cause harm.  Making them visible everywhere could be an
  > annoyance.

It would only be an annoyance for users who really use bidi,
and they would turn it off so it would not annoy them again.

  > But even if I'd agree with you, making a convenient and reliable way
  > of going back to unidirectional display of Emacs 23 and before would
  > require a lot of work, because the current display engine no longer
  > supports unidirectional display without reordering, at least not
  > reliably.

It is easy to make an option turn off bidi processing.
All it has to do is make all characters seem LTR.


* Re: Bidirectional text and URLs
  2014-12-01 16:18             ` Eli Zaretskii
  2014-12-01 18:32               ` Stephen J. Turnbull
@ 2014-12-02 14:42               ` Richard Stallman
  2014-12-02 14:54                 ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen, larsi, emacs-devel

  > > 1. There are magic bidi characters inside the URL.

  > By "magic bidi characters" do you mean printable characters from RTL
  > scripts, or do you mean the directional controls?

I think I mean the directional controls, but I can't be sure.  I don't
know this terminology enough.

  > I think it's impractical to insert newlines before and after each
  > URL.

We are miscommunicating.  What I said is that the USER can insert
newlines in order to see what a certain URL looks like, free of
influence from its surroundings.



* Re: Bidirectional text and URLs
  2014-12-01 19:34                           ` Eli Zaretskii
  2014-12-01 20:21                             ` Eli Zaretskii
@ 2014-12-02 14:44                             ` Richard Stallman
  2014-12-02 15:00                               ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > The first one sounds pretty complicated.  I need to think about its
  > feasibility.  It could require analysis of a very large chunk of
  > buffer text, at least in theory.

Doesn't each paragraph do bidi separately?  If so, at most this requires
analyzing one paragraph before and after the region.

  > What's more, the UBA specifies how
  > to reorder text given the contents, but not how to do the reverse.

How does this relate to what I proposed?  I don't see it so I suspect
a misunderstanding.

  > Anyway, what's more important: you can have 2 without 1.

I don't understand what that would mean.

  >   The trick is
  > to capture the visual order of the text you want to copy (can be done
  > by looking at the current glyph matrix), and then create a string
  > whose logical order is identical to the captured visual order,

That seems more complicated and less desirable.  For the job I have in
mind, it is more elegant to COPY the text in question into the
message.  But one needs to make sure it will display the same in this
new context as in the original context.  That's what the proposed
feature is for.

The facility you propose here might be useful too, for other purposes.


* Re: Bidirectional text and URLs
  2014-12-01 20:21                             ` Eli Zaretskii
  2014-12-01 20:30                               ` David Kastrup
@ 2014-12-02 14:45                               ` Richard Stallman
  2014-12-02 15:03                                 ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > A simple (as in "KISS") strategy that should always work is to copy
  > the entire physical line around the region.

1. Is that physical line sufficient to determine the bidi context for
the region?  I don't know.  If you say it is, I believe you.

2. It would be unclear to include the whole line in the message
if the message is about just part of it (such as, a URL).

So what I am looking for is a way to simplify the rest of that line
into something that would create an equivalent bidi context
for the region to be copied.

  >   Optimizing
  > that would probably require replacing runs of certain types of
  > characters with a single representative character of the same type,
  > and keeping all the directional controls.

  > We could also replace strong directional characters L/R/AL with the
  > corresponding mark (LRM/RLM/ALM), which are displayed as (thin)
  > spaces, and so will be almost invisible, keeping an illusion of
  > copying just the region of text and nothing else.

This sounds like the sort of thing I proposed.

Another possible interface would be
'buffer-substring-preserve-bidi-context'.
It would copy a specified part of the buffer, but prefix and suffix it
with whatever is necessary to cause that part to display the same,
bidi-wise, as it did in its original buffer.


* Re: Bidirectional text and URLs
  2014-12-01 20:45                                 ` Eli Zaretskii
@ 2014-12-02 14:45                                   ` Richard Stallman
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-02 14:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dak, emacs-devel

  > That's not what Richard wanted, AFAIU.  He wanted a way of citing a
  > chunk of text in a mail message,

Actually, the messages I am thinking about are not email.

I am thinking about messages to display (in an Emacs temp buffer)
to give information to the user, or query the user.
We may want to include a pertinent part of the buffer into the message,
and we should make sure it gets bidi-formatted the same way in the message
that it does in its original context.

The text to copy from the buffer might be a URL, or anything.  This
facility would be general, but we might want to use it as part of
handling strange URLs.


* Re: Bidirectional text and URLs
  2014-12-02 14:42                       ` Richard Stallman
@ 2014-12-02 14:48                         ` Eli Zaretskii
  2014-12-03  8:38                           ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 14:48 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Tue, 02 Dec 2014 09:42:38 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > One idea: change the mode line color when there is any RTL text
>   > > (in the buffer, or on the screen, whichever is easier).
> 
>   > That's possible, but I think it's too drastic.  Just having RTL text
>   > doesn't yet constitute any danger or require special vigilance on the
>   > part of the user,
> 
> It requires special vigilance if the user isn't expecting it!
> 
> I am not saying that RTL per se is dangerous.  I'm suggesting we
> should warn users very visibly about RTL text if they don't
> normally use it and are perhaps not expecting it.

We don't know if this particular user normally uses RTL.  We could
introduce an option through which users could tell us that they want
such warnings.  But in general, things that are not dangerous don't
warrant a warning.




* Re: Bidirectional text and URLs
  2014-12-02 14:42                       ` Richard Stallman
@ 2014-12-02 14:52                         ` Eli Zaretskii
  2014-12-02 18:05                           ` Eli Zaretskii
                                             ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 14:52 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Tue, 02 Dec 2014 09:42:42 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > We need a convenient _user-level_ feature to make them visible.
> 
>   > We have glyphless-char-display-control, which is a defcustom.  If that
>   > is still too technical, we can have a minor mode to set that for these
>   > directional controls,
> 
> A minor mode would be convenient enough for non-wizard users.
> 
>   > I think we should prefer making them visible only in the context where
>   > they could cause harm.  Making them visible everywhere could be an
>   > annoyance.
> 
> It would only be an annoyance for users who really use bidi,
> and they would turn it off so it would not annoy them again.

But even users who do use bidi would like to be warned when these
controls are part of potential URL phishing.  So there's a
contradiction here, at least for those users: they would like a
warning when these controls could be harmful, but would like to avoid
the warning when they aren't.

>   > But even if I'd agree with you, making a convenient and reliable way
>   > of going back to unidirectional display of Emacs 23 and before would
>   > require a lot of work, because the current display engine no longer
>   > supports unidirectional display without reordering, at least not
>   > reliably.
> 
> It is easy to make an option turn off bidi processing.
> All it has to do is make all characters seem LTR.

That doesn't disable reordering, it just makes the results
indistinguishable.  Perhaps I don't understand what you want to do
with this option.




* Re: Bidirectional text and URLs
  2014-12-02 14:42               ` Richard Stallman
@ 2014-12-02 14:54                 ` Eli Zaretskii
  2014-12-03  8:39                   ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 14:54 UTC (permalink / raw)
  To: rms; +Cc: stephen, larsi, emacs-devel

> Date: Tue, 02 Dec 2014 09:42:54 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: stephen@xemacs.org, larsi@gnus.org, emacs-devel@gnu.org
> 
>   > I think it's impractical to insert newlines before and after each
>   > URL.
> 
> We are miscommunicating.  What I said is that the USER can insert
> newlines in order to see what a certain URL looks like, free of
> influence from its surroundings.

In that case, it's not a very good idea, IMO.  First, some buffers are
read-only.  Second, when the display is sufficiently jumbled by
directional controls, users who are not acquainted with bidi will have
trouble figuring out where to insert the newlines.  Even I sometimes
fail to insert them in the correct position.




* Re: Bidirectional text and URLs
  2014-12-02 14:44                             ` Richard Stallman
@ 2014-12-02 15:00                               ` Eli Zaretskii
  2014-12-03  8:39                                 ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 15:00 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Tue, 02 Dec 2014 09:44:17 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > The first one sounds pretty complicated.  I need to think about its
>   > feasibility.  It could require analysis of a very large chunk of
>   > buffer text, at least in theory.
> 
> Doesn't each paragraph do bidi separately?

Yes.

> If so, at most this requires analyzing one paragraph before and
> after the region.

That's correct, but a paragraph can be very long in some specialized
cases.  E.g., log files written by software frequently have very long
paragraphs.

>   > What's more, the UBA specifies how
>   > to reorder text given the contents, but not how to do the reverse.
> 
> How does this relate to what I proposed?  I don't see it so I suspect
> a misunderstanding.

One way of looking at your request is to think of it as an interface
that takes reordered text in the visual order and reconstructs the
bidi context that leads to it.  The way the UBA is described doesn't
lend itself easily to such a reconstruction.

>   > Anyway, what's more important: you can have 2 without 1.
> 
> I don't understand what that would mean.

It means we can display the copied text in the same visual order
without analyzing the context that caused that visual order.

> The facility you propose here might be useful too, for other purposes.

It is already being used (I needed it in the Emacs test suite to
visually compare the results of reordering with the reference
implementation).




* Re: Bidirectional text and URLs
  2014-12-02 14:45                               ` Richard Stallman
@ 2014-12-02 15:03                                 ` Eli Zaretskii
  2014-12-03  8:39                                   ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 15:03 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Tue, 02 Dec 2014 09:45:08 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > A simple (as in "KISS") strategy that should always work is to copy
>   > the entire physical line around the region.
> 
> 1. Is that physical line sufficient to determine the bidi context for
> the region?  I don't know.  If you say it is, I believe you.

I say yes.  For this purpose, line == paragraph.

>   >   Optimizing
>   > that would probably require replacing runs of certain types of
>   > characters with a single representative character of the same type,
>   > and keeping all the directional controls.
> 
>   > We could also replace strong directional characters L/R/AL with the
>   > corresponding mark (LRM/RLM/ALM), which are displayed as (thin)
>   > spaces, and so will be almost invisible, keeping an illusion of
>   > copying just the region of text and nothing else.
> 
> This sounds like the sort of thing I proposed.

OK, I will work on it.

> Another possible interface would be
> 'buffer-substring-preserve-bidi-context'.
> It would copy a specified part of the buffer, but prefix and suffix it
> with whatever is necessary to cause that part to display the same,
> bidi-wise, as it did in its original buffer.

How is this different (you say "another possible interface")?



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 14:17                                       ` Eli Zaretskii
@ 2014-12-02 16:31                                         ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 133+ messages in thread
From: Lars Magne Ingebrigtsen @ 2014-12-02 16:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> It is now implemented on master.  (Please read the doc string, as I
> did slightly more than I promised, hope you will find those additions
> useful.)

Great!

It looks like I won't get a chance to do much, if any, work on the
URL-recognising code until the weekend, so if somebody else wants to
handle that bit -- please do.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 14:52                         ` Eli Zaretskii
@ 2014-12-02 18:05                           ` Eli Zaretskii
  2014-12-03 17:13                             ` Richard Stallman
  2014-12-03 17:13                           ` Richard Stallman
  2014-12-03 17:13                           ` Richard Stallman
  2 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-02 18:05 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Tue, 02 Dec 2014 16:52:15 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
> > It is easy to make an option turn off bidi processing.
> > All it has to do is make all characters seem LTR.
> 
> That doesn't disable reordering, it just makes the results
> indistinguishable.

Actually, even this is not true: the directional overrides will still
have their effect.  So deeper changes are needed to countermand that
as well.

And I still don't understand the purpose of such a feature.  Users who
cannot read RTL won't be able to understand the text either way, and
don't know what is "the right" display to make any sense out of what
will be presented when bidi processing is "turned off".



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 14:48                         ` Eli Zaretskii
@ 2014-12-03  8:38                           ` Richard Stallman
  2014-12-03 11:56                             ` Nicolas Richard
  2014-12-03 17:38                             ` Eli Zaretskii
  0 siblings, 2 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-03  8:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > I am not saying that RTL per se is dangerous.  I'm suggesting we
  > > should warn users very visibly about RTL text if they don't
  > > normally use it and are perhaps not expecting it.

  > We don't know if this particular user normally uses RTL.  We could
  > introduce an option through which users could tell us that they want
  > such warnings.

Exactly.  If we introduce a variable to set if you use RTL text,
we will know who normally uses RTL text.

		    But in general, things that are not dangerous don't
  > warrant a warning.

RTL is dangerous in SOME CASES, and that's enough reason to warn
about it.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 14:54                 ` Eli Zaretskii
@ 2014-12-03  8:39                   ` Richard Stallman
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-03  8:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen, larsi, emacs-devel

  > > We are miscommunicating.  What I said is that the USER can insert
  > > newlines in order to see what a certain URL looks like, free of
  > > influence from its surroundings.

  > In that case, it's not a very good idea, IMO.  First, some buffers are
  > read-only.  Second, when the display is sufficiently jumbled by
  > directional controls, users who are not acquainted with bidi will have
  > trouble figuring out where to insert the newlines.  Even I sometimes
  > fail to insert them in the correct position.

This increases the need to do something else about the problem.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 15:00                               ` Eli Zaretskii
@ 2014-12-03  8:39                                 ` Richard Stallman
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-03  8:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > One way of looking at your request is to think of it as an interface
  > that takes reordered text in the visual order and reconstructs the
  > bidi context that leads to it.

However, that's not what I requested.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 15:03                                 ` Eli Zaretskii
@ 2014-12-03  8:39                                   ` Richard Stallman
  2014-12-03 17:39                                     ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-03  8:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > Another possible interface would be
  > > 'buffer-substring-preserve-bidi-context'.
  > > It would copy a specified part of the buffer, but prefix and suffix it
  > > with whatever is necessary to cause that part to display the same,
  > > bidi-wise, as it did in its original buffer.

  > How is this different (you say "another possible interface")?

First I proposed an interface that would return a representation of
the bidi context that affects a certain region.  This representation
would NOT include the text of that region.  It would only represent
the context _around_ that region, not the contents of that region.

Along with that I proposed a function to convert that representation
of context into magic bidi characters that will reproduce that context.

The second proposed interface would copy the text of a region, while
adding to it something to reproduce the bidi effect of its context.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03  8:38                           ` Richard Stallman
@ 2014-12-03 11:56                             ` Nicolas Richard
  2014-12-03 17:12                               ` Richard Stallman
  2014-12-03 17:38                             ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Nicolas Richard @ 2014-12-03 11:56 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Eli Zaretskii, larsi, emacs-devel

Richard Stallman <rms@gnu.org> writes:
> RTL is dangerous in SOME CASES, and that's enough reason to warn
> about it.

IMO this implies that RTL users are the ones that should adjust to the
"normal" LTR world. While it may reflect the current state of the
world, I don't think it's the right thing to do.

-- 
Nicolas



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03 11:56                             ` Nicolas Richard
@ 2014-12-03 17:12                               ` Richard Stallman
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-03 17:12 UTC (permalink / raw)
  To: Nicolas Richard; +Cc: eliz, larsi, emacs-devel

  > IMO this implies that RTL users are the ones that should adjust to the
  > "normal" LTR world. While it may reflect the current state of the
  > world, I don't think it's the right thing to do.

This is not a symbolic gesture.  It's a matter of practicality.
There are already warnings and notifications that Emacs gives by
default, and that you can turn off by setting a flag.  This
would be one more kind.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 18:05                           ` Eli Zaretskii
@ 2014-12-03 17:13                             ` Richard Stallman
  2014-12-03 18:14                               ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-03 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > > It is easy to make an option turn off bidi processing.
  > > > All it has to do is make all characters seem LTR.
  > > 
  > > That doesn't disable reordering, it just makes the results
  > > indistinguishable.

  > Actually, even this is not true: the directional overrides will still
  > have their effect.  So deeper changes are needed to countermand that
  > as well.

It should not be hard for the same flag to tell the code not to
recognize those characters.

  > And I still don't understand the purpose of such a feature.  Users who
  > cannot read RTL won't be able to understand the text either way, and
  > don't know what is "the right" display to make any sense out of what
  > will be presented when bidi processing is "turned off".

One use of disabling bidi is that you'll see what the strange URL
really consists of.  And likewise any other texts that involve
bidi: you'll see what the real sequence of characters is.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 14:52                         ` Eli Zaretskii
  2014-12-02 18:05                           ` Eli Zaretskii
@ 2014-12-03 17:13                           ` Richard Stallman
  2014-12-03 17:13                           ` Richard Stallman
  2 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-03 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > It would only be an annoyance for users who really use bidi,
  > > and they would turn it off so it would not annoy them again.

  > But even users who do use bidi would like to be warned when these
  > controls are part of potential URL phishing.  So there's a
  > contradiction here, at least for those users: they would like a
  > warning when these controls could be harmful, but would like to avoid
  > the warning when they aren't.

I agree we want other features to deal specifically with these
confusing URLs.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-02 14:52                         ` Eli Zaretskii
  2014-12-02 18:05                           ` Eli Zaretskii
  2014-12-03 17:13                           ` Richard Stallman
@ 2014-12-03 17:13                           ` Richard Stallman
  2 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-03 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > It is easy to make an option turn off bidi processing.
  > > All it has to do is make all characters seem LTR.

  > That doesn't disable reordering, it just makes the results
  > indistinguishable.  Perhaps I don't understand what you want to do
  > with this option.

I think we are miscommunicating.  If every character is considered
to imply left-to-right, the ordering will be what it was before
we had bidi support.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03  8:38                           ` Richard Stallman
  2014-12-03 11:56                             ` Nicolas Richard
@ 2014-12-03 17:38                             ` Eli Zaretskii
  2014-12-04 14:30                               ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-03 17:38 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Wed, 03 Dec 2014 03:38:59 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> 		    But in general, things that are not dangerous don't
>   > warrant a warning.
> 
> RTL is dangerous in SOME CASES, and that's enough reason to warn
> about it.

My point is that we should try to narrow down the cases where we issue
a warning, ideally only to those SOME CASES where they can actually be
harmful.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03  8:39                                   ` Richard Stallman
@ 2014-12-03 17:39                                     ` Eli Zaretskii
  2014-12-04  9:41                                       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-03 17:39 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Wed, 03 Dec 2014 03:39:03 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> The second proposed interface would copy the text of a region, while
> adding to it something to reproduce the bidi effect of its context.

That was how I understood the first suggestion, so that's what I'm
working on.  It is easier to do that than invent a representation.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03 17:13                             ` Richard Stallman
@ 2014-12-03 18:14                               ` Eli Zaretskii
  2014-12-05 22:44                                 ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-03 18:14 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Wed, 03 Dec 2014 12:13:04 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > Actually, even this is not true: the directional overrides will still
>   > have their effect.  So deeper changes are needed to countermand that
>   > as well.
> 
> It should not be hard for the same flag to tell the code not to
> recognize those characters.

The point is it's not just a change in some table.  The code needs to
be changed as well, then tested, debugged, and maintained.  Without a
good reason, that's just a waste of resources.

>   > And I still don't understand the purpose of such a feature.  Users who
>   > cannot read RTL won't be able to understand the text either way, and
>   > don't know what is "the right" display to make any sense out of what
>   > will be presented when bidi processing is "turned off".
> 
> One use of disabling bidi is that you'll see what the strange URL
> really consists of.

We already have a better solution for that, I just added yesterday the
infrastructure that enables such a solution.  We can now stop talking
about the "reversed URL" case, it's a problem that is all but solved.

> And likewise any other texts that involve bidi: you'll see what the
> real sequence of characters is.

If it's the same case as with reversed URL, i.e. obfuscation by using
directional overrides, then the same solution will work there.  If
it's something else, seeing RTL text in logical order will not help
anyone who doesn't already know how to read that text in its
reordered-for-display form.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03 17:39                                     ` Eli Zaretskii
@ 2014-12-04  9:41                                       ` Eli Zaretskii
  2014-12-05 11:16                                         ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-04  9:41 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Wed, 03 Dec 2014 19:39:42 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
> > Date: Wed, 03 Dec 2014 03:39:03 -0500
> > From: Richard Stallman <rms@gnu.org>
> > CC: larsi@gnus.org, emacs-devel@gnu.org
> > 
> > The second proposed interface would copy the text of a region, while
> > adding to it something to reproduce the bidi effect of its context.
> 
> That was how I understood the first suggestion, so that's what I'm
> working on.  It is easier to do that than invent a representation.

I have now implemented on master:

  (defun buffer-substring-with-bidi-context (start end &optional no-properties)
    "Return portion of current buffer between START and END with bidi context.

  This function works similar to `buffer-substring', but it prepends and
  appends to the text bidi directional control characters necessary to
  preserve the visual appearance of the text if it is inserted at another
  place.  This is useful when the buffer substring includes bidirectional
  text and control characters that cause non-trivial reordering on display.
  If copied verbatim, such text can have a very different visual appearance,
  and can also change the visual appearance of the surrounding text at the
  destination of the copy.

  Optional argument NO-PROPERTIES, if non-nil, means copy the text without
  the text properties."

Based on the fuss this generated, I now expect to see Lisp programs
using this to start popping up like mushrooms after the rain ;-)
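
A minimal sketch of how a Lisp program might call this, assuming the
function behaves as its doc string above describes.  The names
`my-bidi-safe-substring' and `my-copy-region-keeping-bidi' are made up
for illustration and are not part of Emacs:

```elisp
;; Hypothetical helper names; a sketch, not part of Emacs.
(defun my-bidi-safe-substring (start end)
  "Return the text between START and END together with its bidi context."
  ;; A non-nil third argument (NO-PROPERTIES) drops the text properties.
  (buffer-substring-with-bidi-context start end t))

(defun my-copy-region-keeping-bidi (start end)
  "Save the region to the kill ring along with its bidi context.
The intent is that the text looks the same when yanked elsewhere."
  (interactive "r")
  (kill-new (my-bidi-safe-substring start end)))
```

Keeping the string-returning helper separate from the command should
make it easy to reuse the former in code that inserts the text
somewhere other than the kill ring.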



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-03 17:38                             ` Eli Zaretskii
@ 2014-12-04 14:30                               ` Richard Stallman
  2014-12-04 15:53                                 ` Stefan Monnier
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-04 14:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > My point is that we should try to narrow down the cases where we issue
  > a warning, ideally only to those SOME CASES where they can actually be
  > harmful.

I agree that we should do this.  But it is also useful to warn
users that a buffer contains RTL text when they don't expect any.

If the buffer is all RTL text, the user will see that, and none of it
will make sense to him anyway.  So no warning is needed.

But if the buffer is mostly ordinary LTR text, but has a little RTL
text in it, the non-bidi user will probably not notice that and could
get fooled.  That is the case for which I think a warning is useful.

But there is no harm in giving the warning in both of these cases.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-04 14:30                               ` Richard Stallman
@ 2014-12-04 15:53                                 ` Stefan Monnier
  2014-12-04 17:30                                   ` Eli Zaretskii
  2014-12-04 20:25                                   ` Paul Eggert
  0 siblings, 2 replies; 133+ messages in thread
From: Stefan Monnier @ 2014-12-04 15:53 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Eli Zaretskii, larsi, emacs-devel

> But if the buffer is mostly ordinary LTR text, but has a little RTL
> text in it, the non-bidi user will probably not notice that and could
> get fooled.  That is the case for which I think a warning is useful.

When I see a bit of Hebrew text in a buffer, I wouldn't know if it's
displayed L2R or R2L and either way wouldn't make any difference to me,
so I'm definitely not "fooled".

This happens reasonably often, and I wouldn't want to be "warned" that
there's some R2L script in my buffer, since I can see it plainly since
the characters are different anyway.

The problematic case that started this thread was because strongly L2R
characters were displayed in R2L fashion because of their context.
And *that* is indeed a problem, because there was no obvious visual
clue: the reversed chars were all Latin chars.

So, if we want to emit a warning, it should not be when "there's some
R2L text in an L2R context" but only when L2R characters end up laid out in
R2L because of the context.

I'm not familiar enough with bidi uses to know for sure whether such
"forced wrong-way layout" is something that can occur regularly in
normal/legitimate situations, but at least it's something that would
fool me every time, so I think a warning would be OK for those cases.


        Stefan



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-04 15:53                                 ` Stefan Monnier
@ 2014-12-04 17:30                                   ` Eli Zaretskii
  2014-12-04 20:25                                   ` Paul Eggert
  1 sibling, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-04 17:30 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: larsi, rms, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  larsi@gnus.org,  emacs-devel@gnu.org
> Date: Thu, 04 Dec 2014 10:53:29 -0500
> 
> The problematic case that started this thread was because strongly L2R
> characters were displayed in R2L fashion because of their context.
> And *that* is indeed a problem, because there was no obvious visual
> clue: the reversed chars were all Latin chars.

We now have a primitive that can be used to detect such regions in a
buffer.  So we can implement a warning in those cases.

> So, if we want to emit a warning, it should not be when "there's some
> R2L text in an L2R context" but only when L2R characters end up laid out in
> R2L because of the context.

And likewise with R2L characters that end up displayed left to right
(although the target audience for this would be much smaller).

> I'm not familiar enough with bidi uses to know for sure whether such
> "forced wrong-way layout" is something that can occur regularly in
> normal/legitimate situations

There's no reason for it to occur regularly.  Its main purpose is to
satisfy very specific and rare circumstances, like when you need to
show R2L text in logical order (e.g., for didactic reasons), or force
punctuation characters to display in a particular visual order.
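
For the reversed-URL case, a check along the following lines might be
enough.  This is only a sketch: it assumes the primitive mentioned at
the top of this message is the one named
`bidi-find-overridden-directionality' in Emacs 25 and later, and the
`my-' names are hypothetical.

```elisp
;; Hypothetical helper names; a sketch, not part of Emacs.
(defun my-url-direction-overridden-p (url)
  "Return non-nil if some character of the string URL has its
directionality overridden by bidi controls, nil otherwise."
  (bidi-find-overridden-directionality 0 (length url) url))

(defun my-warn-about-url (url)
  "Return a warning string if URL looks visually reordered, else nil."
  (when (my-url-direction-overridden-p url)
    ;; %S prints the string in read syntax, so the caller sees it quoted.
    (format "Suspicious URL, reordered by bidi controls: %S" url)))
```

A URL-following command could call `my-warn-about-url' and ask for
confirmation only when it returns non-nil, which keeps ordinary RTL
text free of warnings.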



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-04 15:53                                 ` Stefan Monnier
  2014-12-04 17:30                                   ` Eli Zaretskii
@ 2014-12-04 20:25                                   ` Paul Eggert
  1 sibling, 0 replies; 133+ messages in thread
From: Paul Eggert @ 2014-12-04 20:25 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On 12/04/2014 07:53 AM, Stefan Monnier wrote:
> So, if we want to emit a warning, it should not be when "there's some
> R2L text in an L2R context" but only when L2R characters end up laid out in
> R2L because of the context.

How about if we reverse the letters as well as issue a warning? That is, 
instead of merely displaying "ces" for a reversed "sec", we also display 
the individual characters reversed (so it would display like "ↄɘƨ").  On 
a graphical display we should be able to do that reasonably well, and 
it'd be a strong visual cue.



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-04  9:41                                       ` Eli Zaretskii
@ 2014-12-05 11:16                                         ` Richard Stallman
  2014-12-05 11:28                                           ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-05 11:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

Thanks.  This construct will be useful for warning users
about strange bidi in URLs.

Do we need any new features to make it possible to show
how the strange bidi text would really be interpreted?

I think the feature you proposed, which would examine how text is
actually displayed and represent that with text that is
straightforward, may be useful too.




^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-05 11:16                                         ` Richard Stallman
@ 2014-12-05 11:28                                           ` Eli Zaretskii
  2014-12-05 22:43                                             ` Richard Stallman
  2014-12-05 22:43                                             ` Richard Stallman
  0 siblings, 2 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-05 11:28 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Fri, 05 Dec 2014 06:16:22 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> Do we need any new features to make it possible to show
> how the strange bidi text would really be interpreted?

Not sure I understand what you mean here, but if I do, then this is up
to applications, because only they know the meaning of a particular
piece of displayed text and its interpretation.

> I think the feature you proposed, which would examine how text is
> actually displayed and represent that with text that is
> straightforward, may be useful too.

Again, not sure what proposition you allude to here.  Doesn't
buffer-substring-with-bidi-context already do that?  If not, what is
missing?



^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: Bidirectional text and URLs
  2014-12-05 11:28                                           ` Eli Zaretskii
@ 2014-12-05 22:43                                             ` Richard Stallman
  2014-12-05 23:15                                               ` Eli Zaretskii
  2014-12-05 22:43                                             ` Richard Stallman
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-05 22:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

  > > Do we need any new features to make it possible to show
  > > how the strange bidi text would really be interpreted?

  > Not sure I understand what you mean here, but if I do, then this is up
  > to applications, because only they know the meaning of a particular
  > piece of displayed text and its interpretation.

In principle they might vary, but in practice I think most of them
will use the characters in the order they appear in the buffer.

So we need a way to show what a certain piece of text would look like
with all bidi effects suppressed.  One that would force them to
display in strict LTR order.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-05 11:28                                           ` Eli Zaretskii
  2014-12-05 22:43                                             ` Richard Stallman
@ 2014-12-05 22:43                                             ` Richard Stallman
  2014-12-05 23:17                                               ` Eli Zaretskii
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-05 22:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > I think the feature you proposed, which would examine how text is
  > > actually displayed and represent that with text that is
  > > straightforward, may be useful too.

  > Again, not sure what proposition you allude to here.  Doesn't
  > buffer-substring-with-bidi-context already do that?  If not, what is
  > missing?

A few days ago we had a misunderstanding -- I proposed the feature
which you've now implemented, but you proposed a different feature.
You proposed that Emacs would examine the text as actually reordered
by display, and present that as a string in the display order.

That was a different thing from what I had proposed.
But I think it is a good idea.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-03 18:14                               ` Eli Zaretskii
@ 2014-12-05 22:44                                 ` Richard Stallman
  2014-12-05 23:19                                   ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-05 22:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > One use of disabling bidi is that you'll see what the strange URL
  > > really consists of.

  > We already have a better solution for that, I just added yesterday the
  > infrastructure that enables such a solution.

Could you tell me what that solution is?  I'm concerned that we
may be miscommunicating again.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-05 22:43                                             ` Richard Stallman
@ 2014-12-05 23:15                                               ` Eli Zaretskii
  2014-12-06 12:06                                                 ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-05 23:15 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Fri, 05 Dec 2014 17:43:42 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > Do we need any new features to make it possible to show
>   > > how the strange bidi text would really be interpreted?
> 
>   > Not sure I understand what you mean here, but if I do, then this is up
>   > to applications, because only they know the meaning of a particular
>   > piece of displayed text and its interpretation.
> 
> In principle they might vary, but in practice I think most of them
> will use the characters in the order they appear in the buffer.

That's true, but that still doesn't say how each application should
show that to the user.

> So we need a way to show what a certain piece of text would look like
> with all bidi effects suppressed.  One that would force them to
> display in strict LTR order.

We've been through this: it won't help, unless the logical-order text
consists only of LTR characters.  And for that, we already have a
solution that detects the fraud.




* Re: Bidirectional text and URLs
  2014-12-05 22:43                                             ` Richard Stallman
@ 2014-12-05 23:17                                               ` Eli Zaretskii
  2014-12-06 12:06                                                 ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-05 23:17 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Fri, 05 Dec 2014 17:43:43 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
> A few days ago we had a misunderstanding -- I proposed the feature
> which you've now implemented, but you proposed a different feature.
> You proposed that Emacs would examine the text as actually reordered
> by display, and present that as a string in the display order.
> 
> That was a different thing from what I had proposed.
> But I think it is a good idea.

I can do that.  But since the feature you suggested is already
implemented, what would be the use of the alternative?  They both try
to achieve the same goal.




* Re: Bidirectional text and URLs
  2014-12-05 22:44                                 ` Richard Stallman
@ 2014-12-05 23:19                                   ` Eli Zaretskii
  2014-12-07  9:20                                     ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-05 23:19 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Fri, 05 Dec 2014 17:44:37 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > One use of disabling bidi is that you'll see what the strange URL
>   > > really consists of.
> 
>   > We already have a better solution for that, I just added yesterday the
>   > infrastructure that enables such a solution.
> 
> Could you tell me what that solution is?  I'm concerned that we
> may be miscommunicating again.

I meant this primitive:

  (bidi-find-overridden-directionality FROM TO &optional OBJECT)

  Return position between FROM and TO where directionality was overridden.

  This function returns the first character position in the specified
  region of OBJECT where there is a character whose `bidi-class' property
  is `L', but which was forced to display as `R' by a directional
  override, and likewise with characters whose `bidi-class' is `R'
  or `AL' that were forced to display as `L'.

  If no such character is found, the function returns nil.

  OBJECT is a Lisp string or buffer to search for overridden
  directionality, and defaults to the current buffer if nil or omitted.
  OBJECT can also be a window, in which case the function will search
  the buffer displayed in that window.  Passing the window instead of
  a buffer is preferable when the buffer is displayed in some window,
  because this function will then be able to correctly account for
  window-specific overlays, which can affect the results.

  Strong directional characters `L', `R', and `AL' can have their
  intrinsic directionality overridden by directional override
  control characters RLO (u+202e) and LRO (u+202d).  See the
  function `get-char-code-property' for a way to inquire about
  the `bidi-class' property of a character.
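
A minimal sketch of how an application could call this primitive on a
string, for instance link text taken from a message.  The helper name
`my-url-bidi-overridden-p' is hypothetical; only
`bidi-find-overridden-directionality' comes from the description
above, and the sketch assumes that FROM and TO are string indices when
OBJECT is a string.

```elisp
;; Hypothetical helper, not part of Emacs: report whether TEXT contains
;; a character whose directionality was forced by RLO or LRO.
(defun my-url-bidi-overridden-p (text)
  "Return non-nil if some character in TEXT has overridden directionality.
The value is whatever `bidi-find-overridden-directionality' returns
when asked about the whole of TEXT."
  ;; Assumption: for a string OBJECT, FROM and TO are string indices.
  (bidi-find-overridden-directionality 0 (length text) text))

;; The spoofed link from the start of this thread begins with U+202E (RLO).
(my-url-bidi-overridden-p
 "\u202ehttp://myspace.com/#/segami/moc.koobecaf//:sptth")

;; Plain left-to-right text has nothing overridden.
(my-url-bidi-overridden-p "http://www.gnu.org/")
```

What the caller does with a non-nil result (warn, query, refuse) is
left open here, as it is in the discussion.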




* Re: Bidirectional text and URLs
  2014-12-05 23:17                                               ` Eli Zaretskii
@ 2014-12-06 12:06                                                 ` Richard Stallman
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Stallman @ 2014-12-06 12:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > A few days ago we had a misunderstanding -- I proposed the feature
  > > which you've now implemented, but you proposed a different feature.
  > > You proposed that Emacs would examine the text as actually reordered
  > > by display, and present that as a string in the display order.
  > > 
  > > That was a different thing from what I had proposed.
  > > But I think it is a good idea.

  > I can do that.  But since the feature you suggested is already
  > implemented, what would be the use of the alternative?  They both try
  > to achieve the same goal.

Maybe you are right, since they would look the same in display.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-05 23:15                                               ` Eli Zaretskii
@ 2014-12-06 12:06                                                 ` Richard Stallman
  2014-12-06 12:59                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-06 12:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > In principle they might vary, but in practice I think most of them
  > > will use the characters in the order they appear in the buffer.

  > That's true, but that still doesn't say how each application should
  > show that to the user.

I don't entirely understand what sort of variation you have in mind,
but I think we should make all such applications handle this
as uniformly as possible.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-06 12:06                                                 ` Richard Stallman
@ 2014-12-06 12:59                                                   ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-06 12:59 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sat, 06 Dec 2014 07:06:50 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > In principle they might vary, but in practice I think most of them
>   > > will use the characters in the order they appear in the buffer.
> 
>   > That's true, but that still doesn't say how each application should
>   > show that to the user.
> 
> I don't entirely understand what sort of variation you have in mind,
> but I think we should make all such applications handle this
> as uniformly as possible.

The danger in using such obfuscated strings is different in each
application.  That's because each application assigns different
semantics to the various portions of the string, and does different
things with each portion.  IOW, the semantics of these strings depends
on the application, and thus our solution to warn the user about the
dangers is probably going to be different in each case.

Until now we had only one use case: the URL.  For that use case, we
understood the implications, and we now have the infrastructure to
detect the obfuscation.  We still don't know what the application
using URLs (in this case, eww) will want to do to warn the user and ask for
their permission.  One way is to show the "real" URL to the user,
which will automatically solve the obfuscation problem and display the
URL in its "normal" form -- without the need to turn off the bidi
reordering.  Maybe there are other, better ways -- we just need to
wait and see.

And that's just a single application for which we have a use case we
understand quite well.  Other use cases are yet to come.  When they
do, we should analyze them as we did with this one.

It could be that eventually we come to the conclusion you are
proposing now: that we need a way to display some string in its
logical order of characters.  If and when we arrive at such a
conclusion, there will be sufficient weight to it to justify the
change in the code.  We are not there yet, and it is not clear to me
that we will indeed arrive at that conclusion.  We have at least
partial evidence that this might not be required: no other application
out there does this, AFAIK.




* Re: Bidirectional text and URLs
  2014-12-05 23:19                                   ` Eli Zaretskii
@ 2014-12-07  9:20                                     ` Richard Stallman
  2014-12-07 15:50                                       ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-07  9:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > >   > We already have a better solution for that, I just added yesterday the
  > >   > infrastructure that enables such a solution.
  > > 
  > > Could you tell me what that solution is?  I'm concerned that we
  > > may be miscommunicating again.

  > I meant this primitive:

  >   (bidi-find-overridden-directionality FROM TO &optional OBJECT)

  >   Return position between FROM and TO where directionality was overridden.

This looks like a way to _test_ part of a buffer or string to see if
it has any bidi strangeness.  Could you confirm?

If so, the question is: once you detect the strangeness, what then?
I suppose the next step is either an error message or a query.
In either case, I think we should show the user (1) what the text
looks like and (2) what's actually in it.

With your implementation of context-regeneration, we can show what
the text looks like.

How can we show what it really is?

Perhaps what we want is a suppress-bidi property, or a bidi property
that would specify the direction for certain text.  These properties
would override all bidi attributes of the characters themselves.


-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-07  9:20                                     ` Richard Stallman
@ 2014-12-07 15:50                                       ` Eli Zaretskii
  2014-12-08  0:26                                         ` Richard Stallman
  0 siblings, 1 reply; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-07 15:50 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sun, 07 Dec 2014 04:20:31 -0500
> From: Richard Stallman <rms@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > >   > We already have a better solution for that, I just added yesterday the
>   > >   > infrastructure that enables such a solution.
>   > > 
>   > > Could you tell me what that solution is?  I'm concerned that we
>   > > may be miscommunicating again.
> 
>   > I meant this primitive:
> 
>   >   (bidi-find-overridden-directionality FROM TO &optional OBJECT)
> 
>   >   Return position between FROM and TO where directionality was overridden.
> 
> This looks like a way to _test_ part of a buffer or string to see if
> it has any bidi strangeness.  Could you confirm?

Yes, that's the purpose of that primitive.

> If so, the question is: once you detect the strangeness, what then?

It's up to the application.  Lars requested the above infrastructure
for eww, so I guess we will need to see what eww does to handle these
"reversed" URLs.  It's possible that eww will need some further
assistance in that matter, in which case it should come up with the
requirements, and we (probably I) should implement whatever is needed.

> I suppose the next step is either an error message or a query.
> In either case, I think we should show the user (1) what the text
> looks like and (2) what's actually in it.
> 
> With your implementation of context-regeneration, we can show what
> the text looks like.
> 
> How can we show what it really is?

That's easy: copy the text without the directional override and
display it in some other buffer.  The position returned by
bidi-find-overridden-directionality is of the 1st character following
the override control, so copying the text starting at that position
will exclude the override and avoid its effects.
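
A rough sketch of that copy step, under the assumption stated above
that the returned position is that of the first character after the
override control.  Both function names and the buffer name are
hypothetical, and only the first override is handled.

```elisp
;; Hypothetical helper: copy the part of TEXT that starts at the first
;; character whose directionality was overridden, which leaves out the
;; override control in front of it.
(defun my-bidi-text-after-override (text)
  "Return TEXT starting at its first overridden character.
If nothing in TEXT is overridden, return TEXT unchanged."
  (let ((pos (bidi-find-overridden-directionality 0 (length text) text)))
    (if pos
        (substring text pos)
      text)))

;; Hypothetical command: show the copy in a separate buffer, where it
;; is displayed without that override in effect.
(defun my-bidi-show-text-after-override (text)
  "Display TEXT, minus its leading override, in a temporary buffer."
  (with-output-to-temp-buffer "*Bidi override*"
    (princ (my-bidi-text-after-override text))))
```

Anything in TEXT before the override is dropped by this sketch; an
application that needs the surrounding text would have to keep it
separately.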

The advantage of this method as compared to presenting the text
non-reordered (a.k.a. "disable bidi") is that the above method works
for RTL text that is similarly obfuscated by the LRO character,
whereas disabling bidi reordering will show RTL text in an order that
is very hard, sometimes impossible, to read correctly (it has the same
effect as showing words in reversed order to a user of a left-to-right
script).

> Perhaps what we want is a suppress-bidi property, or a bidi property
> that would specify the direction for certain text.  These properties
> would override all bidi attributes of the characters themselves.

I think this won't be needed, but if it is, then it certainly can be
done.




* Re: Bidirectional text and URLs
  2014-12-07 15:50                                       ` Eli Zaretskii
@ 2014-12-08  0:26                                         ` Richard Stallman
  2014-12-08 15:46                                           ` Eli Zaretskii
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Stallman @ 2014-12-08  0:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > If so, the question is: once you detect the strangeness, what then?

  > It's up to the application.

Alas, that's ducking the issue.  We need to confront this issue.

  > That's easy: copy the text without the directional override and
  > display it in some other buffer.  The position returned by
  > bidi-find-overridden-directionality is of the 1st character following
  > the override control, so copying the text starting at that position
  > will exclude the override and avoid its effects.

That is the first magic bidi char, but there could be more.  It would
be necessary to remove them all.

However, is simply removing them correct?  In general, do magic bidi
characters get included in the URL that is passed to the browser?  I
would expect so.

If so, a string which does not include them is inaccurate, and the
accurate thing to do is to include them and display them (perhaps in
hex) while suppressing their bidi effect.

Also, don't some RTL characters cause some normally LTR characters to
display RTL?  That too could cause confusion, right?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: Bidirectional text and URLs
  2014-12-08  0:26                                         ` Richard Stallman
@ 2014-12-08 15:46                                           ` Eli Zaretskii
  0 siblings, 0 replies; 133+ messages in thread
From: Eli Zaretskii @ 2014-12-08 15:46 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel

> Date: Sun, 07 Dec 2014 19:26:33 -0500
> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, emacs-devel@gnu.org
> 
>   > > If so, the question is: once you detect the strangeness, what then?
> 
>   > It's up to the application.
> 
> Alas, that's ducking the issue.  We need to confront this issue.

We _are_ confronting it.  We are methodically analyzing the issue
piecemeal, identifying the separate parts of it, and providing
solutions to each part as soon as it is well-defined and understood.

The problem we are dealing with is a very complex one.  It involves
multiple disciplines: bidi reordering, URL construction and display,
Internet security, cultural differences, human perception of visual
cues, etc.  Part of the solution should be in the infrastructure and
primitives, part at the application and UI level.  Moreover, we are in
uncharted territory, with no prior art or standards to guide us.
Plus, we don't have any single individual on board who'd have a good
understanding of all the aspects of the problem.

When dealing with such hard issues, it is IME methodologically wrong
to charge ahead without a sufficiently clear definition and
understanding of each part of the problem and the alternatives for
their solutions.

We have now identified the first part: how to find the potentially
fraudulent URL, and we have a clear understanding of it.  We have a
solution for that part of the problem that seems to satisfy the
requirements of the application programmer who brought up this issue.

The next step should be for the application to try using this
infrastructure to address the issue on the application and UI levels.
It is possible that such an attempt will result in feedback that
will require changes in the infrastructure, or some additional
functionality there.  Or the application developers will decide that
this part of the problem is successfully solved, and will request
assistance in solving the next part, which will need to be defined in
clear terms.

And so on and so forth -- we will break this complex issue into
individual parts and solve them one by one on the level each part
belongs to.  That's not "ducking the issue" in my book.

What you seem to expect is that we start coding solutions to problems
that are at best very vaguely defined, without any practical
experience to back that up, guided only by some intuition.  IME, this
is a recipe for wrong solutions and for waste of time and energy.  I
submit that there's no one around here, including myself, whose
intuition in this matter I would trust, because intuition is only
reliable when it is based on knowledge and experience in the subject
matter, and we don't have such individuals at our disposal.

So I don't see any reasons to rush into coding under the
circumstances.

>   > That's easy: copy the text without the directional override and
>   > display it in some other buffer.  The position returned by
>   > bidi-find-overridden-directionality is of the 1st character following
>   > the override control, so copying the text starting at that position
>   > will exclude the override and avoid its effects.
> 
> That is the first magic bidi char, but there could be more.

Inside the URL?  Extremely unlikely, see below.  In any case, the
presented use case didn't have them.  I'd like to see a complete
solution for this simple use case, before we move to more complex ones
(if they exist).

> It would be necessary to remove them all.

I don't think it's a problem, not a likely one anyway.  But if it is,
it should be almost trivial to use that primitive iteratively to
reconstruct the string with all the overrides removed.
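
For illustration, here is a simpler substitute for that iterative use
of the primitive: a sketch that deletes every RLO (U+202E) and LRO
(U+202D) character with a regexp.  The name `my-bidi-strip-overrides'
is hypothetical, and the sketch does not touch other bidi controls.

```elisp
;; Hypothetical helper: drop the two directional override controls.
(defun my-bidi-strip-overrides (text)
  "Return a copy of TEXT with every LRO and RLO character removed."
  ;; U+202D is LRO and U+202E is RLO; each match is replaced by nothing.
  (replace-regexp-in-string "[\u202d\u202e]" "" text))

(my-bidi-strip-overrides "\u202ehttp://example.com/")
;; => "http://example.com/"
```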

> However, is simply removing them correct?

Yes, I think so.

> In general, do magic bidi characters get included in the URL that is
> passed to the browser?  I would expect so.

Using the directional control characters as part of the URL is
forbidden by the relevant standards.  The authorities that approve
domain names will reject them if they include such characters.  So I
think URLs which include them will be non-existent, or at least very
rare.  The use case which started this thread of discussion had the
control characters outside the URL itself, even outside the protocol
part of it.

> If so, a string which does not include them is inaccurate, and the
> accurate thing to do is to include them and display them (perhaps in
> hex) while suppressing their bidi effect.

Removing them and suppressing their effect give rise to the same
visual appearance, since these controls display as very thin spaces,
and thus are almost invisible on the screen.  That's why this type of
fraud came into existence in the first place.

As for using hex, that was one alternative I suggested earlier in this
thread.  It is still on the table, and doesn't require any
infrastructure changes to do its job.  But people liked this proposal
less, so eventually I coded the primitive to find the spoofed
characters as a means for supporting other solutions.
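
A sketch of the hex alternative, using only ordinary string functions.
The name `my-bidi-controls-as-hex', the `<U+XXXX>' notation, and the
particular set of control characters matched are assumptions made for
this example.

```elisp
;; Hypothetical helper: make bidi controls visible instead of letting
;; them act on the display.
(defun my-bidi-controls-as-hex (text)
  "Return a copy of TEXT with bidi control characters shown as <U+XXXX>."
  (replace-regexp-in-string
   ;; LRM and RLM, LRE through RLO (U+202A..U+202E), and LRI through
   ;; PDI (U+2066..U+2069).
   "[\u200e\u200f\u202a-\u202e\u2066-\u2069]"
   ;; Each match is a one-character string; format its code point.
   (lambda (match) (format "<U+%04X>" (aref match 0)))
   text t t))

(my-bidi-controls-as-hex "\u202ehttp://example.com/")
;; => "<U+202E>http://example.com/"
```

Because the control character itself is replaced by visible text, the
result contains nothing that could reorder the characters after it.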

> Also, don't some RTL characters cause some normally LTR characters to
> display RTL?

No.  LTR characters always display left to right, unless overridden by
the RLO control (which simply makes every character act as an RTL
character).
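
The intrinsic directionality mentioned here can be inspected with
`get-char-code-property'.  The characters below are arbitrary examples
chosen for this sketch.

```elisp
;; Each form returns the character's `bidi-class' as a symbol.
(get-char-code-property ?a 'bidi-class)       ; Latin small letter a
;; => L
(get-char-code-property ?\u05d0 'bidi-class)  ; Hebrew letter alef
;; => R
(get-char-code-property ?\u0627 'bidi-class)  ; Arabic letter alef
;; => AL
(get-char-code-property ?\u202e 'bidi-class)  ; RIGHT-TO-LEFT OVERRIDE
;; => RLO
```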



