From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Sun, 30 Nov 2014 23:05:36 +0200
Message-ID: <83oaro8km7.fsf@gnu.org>
To: Lars Magne Ingebrigtsen
Cc: emacs-devel@gnu.org

> From: Lars Magne Ingebrigtsen
> Date: Sun, 30 Nov 2014 19:13:54 +0100
> Cc: emacs-devel@gnu.org
>
> Because I was wondering whether my suggestion from yesterday (that we
> insert LRO/PDF characters into URLs if there is an LRO present in the
> buffer when recognising URLs) is at all feasible, and from your
> explanation, it seems like it would be.

IMO, you are jumping to solutions too early, without a good
understanding of the real problem.

I also guess that you meant RLO, not LRO.  The latter makes the
embedded text render like strict left-to-right characters, so it
doesn't need any special handling and cannot do any harm in URLs that
use left-to-right characters (which is 99.99% of URLs).

Can we please take a step back and try to identify the real problem
here?  What exactly are we trying to detect and handle?  Is it true
that we are trying to detect URLs whose characters got their "normal"
bidirectional properties overridden by some directional control
characters?
If so, I can write a primitive that will take a region of buffer text
and examine it to detect this.  If it is something else, please tell
us what that is, and chances are you can have it without having to go
through a crash course in UBA.

In any case, it is IMO wrong to look for specific controls that you
just happened to learn about yesterday.  They are not what you need to
look for, they are just one sign of what you are looking for.  The UBA
is too complex an algorithm, and it keeps evolving, so chances are
there will be more ways to do these tricks.  You need to define what
it is that you are looking for, not search for this or that sign.

Next, given that you have detected the spoofed URL, what do you want
to do with it?  Do you want to highlight it, do you want to de-spoof
it (i.e. undo the spoofing) in some way, but still leave some
indication of the fact that it was spoofed, or maybe you want to
remove any trace of the spoofing as if it never happened (and leave
the user oblivious to the fact it did)?

Given the answers to those questions, there are any number of possible
solutions that do NOT require inserting more directional controls.
Some of the possible solutions were already mentioned in this thread.
Here's another: cover the offending RLO with a display property
showing whatever you want -- a warning sign, a smiley, a string made
of a SPC character, anything.  You can try it with your example: you
will see the spoofing gone immediately.  Why is this worse than
inserting directional controls whose effect on the surrounding text
can be far reaching?

> 2) If there is an LRO in the buffer, then, after recognising an URL, it
>    is further treated.
>
> * If it contains no strongly right-to-left characters, we just wrap it
>   in an LRO/PDF pair.  URLs like "http://myspace.com" will then be
>   guaranteed to be displayed reading left-to-right.
>
> * If the URL is like http://אבג.דהוזחט.קום, we would segment the URL
>   into strongly-left-to-right-with-weak-chars and
>   strongly-right-to-left-with-weak-chars segments.  We wrap each
>   left-to-right-with-weak-chars in LRO/PDF pairs.

This will change how these URLs are displayed, in a way that users
will not like, and personally it sounds to me like another kind of
phishing.

> Emacs already exposes the weak/strong/LTR/RTL status of each character,
> so function to do this LRO/PDF insertion is trivial.  It's like a
> seven-line Elisp function or something.

It's easy to insert them, yes.  But the effect is not what you or our
users necessarily want.  More importantly, there are better ways to
deal with that, provided that we DEFINE WHAT PROBLEMS WE WANT TO
SOLVE, AND HOW.

> From what you say, sounds like it would make the display of these URLs
> acceptable for bidi readers, too -- this would be the normal display of
> these URLs, anyway.

No, it isn't.  You cannot get the correct display by overriding the
bidi properties with LRO or its ilk.  You can see the differences by
moving point with C-f.
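
To make the "define what you are looking for" point concrete, here is
a rough sketch of one possible definition: report the characters whose
own strong direction is contradicted by an override that is in effect
at their position, rather than grepping for one particular control.
It is written in Python only so that it is self-contained; the name
overridden_positions is hypothetical, and the treatment of embeddings
and isolates is a deliberate simplification of the UBA's explicit
formatting rules.  It is not the primitive mentioned above, and it
ignores nesting limits and everything else the full algorithm does.

```python
import unicodedata

# Explicit directional formatting characters (Unicode bidi controls).
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def overridden_positions(text):
    """Return indices of strong characters whose direction is overridden.

    Simplified model (an assumption of this sketch, not the full UBA):
    LRO/RLO force a direction until the matching PDF; embeddings and
    isolates open a nested scope in which no override is in effect;
    a paragraph separator closes every open scope.
    """
    stack = []  # entries are (kind, override); override is "L", "R" or None
    hits = []
    for i, ch in enumerate(text):
        if ch in (LRO, RLO):
            stack.append(("embed", "L" if ch == LRO else "R"))
        elif ch in (LRE, RLE):
            stack.append(("embed", None))
        elif ch in (LRI, RLI, FSI):
            stack.append(("isolate", None))
        elif ch == PDF:
            # PDF closes only an embedding or override, never an isolate.
            if stack and stack[-1][0] == "embed":
                stack.pop()
        elif ch == PDI:
            # PDI closes the nearest isolate and anything opened inside it.
            if any(kind == "isolate" for kind, _ in stack):
                while stack.pop()[0] != "isolate":
                    pass
        elif unicodedata.bidirectional(ch) == "B":
            stack.clear()
        else:
            override = stack[-1][1] if stack else None
            natural = unicodedata.bidirectional(ch)
            if override == "R" and natural == "L":
                hits.append(i)
            elif override == "L" and natural in ("R", "AL"):
                hits.append(i)
    return hits


if __name__ == "__main__":
    spoofed = "http://" + RLO + "moc.elpmaxe" + PDF + "/"
    print(overridden_positions(spoofed))
```

With this definition, an LRO wrapped around an all-Latin URL reports
nothing, which matches the remark above that LRO cannot do any harm in
URLs that use left-to-right characters, while an RLO over the same
characters reports every letter it covers.  What to do with the
reported positions (highlight, cover with a display property, and so
on) is a separate decision.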