From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Mon, 01 Dec 2014 19:39:35 +0200
Message-ID: <83y4qr6zhk.fsf@gnu.org>
To: Lars Magne Ingebrigtsen
Cc: emacs-devel@gnu.org

> From: Lars Magne Ingebrigtsen
> Cc: emacs-devel@gnu.org
> Date: Mon, 01 Dec 2014 17:19:30 +0100
>
> Eli Zaretskii writes:
>
> > Anyway, if you want this, please show the API of the function -- what
> > it should return and how.
>
> Actually, I'm not sure. :-) Would it make any sense to have a function
> like `(displayed-directionality POSITION)' that returns either
> `right-to-left' or `left-to-right'? If so, the URL-finding function
> would query about the start of the URL (which would normally be the HTTP
> part), and if that's `right-to-left', Here There Be Shenanigans.

How is this different from the previous suggestion?

> >> Yes, I want to unspoof the URL. Adding some markings to notify that
> >> this has been done would also be nice, perhaps by adding a 'warning face
> >> to the text or the like.
> >
> > Then putting a display property on the offending RLO might be the best
> > solution.
>
> On the RLO character itself or the URL affected by the RLO?

On the RLO.
The URL will be left intact, and will show correctly after you put the
display property on the RLO.

> >> And displaying ‮http://myspace.com/#/segami/moc.koobecaf//:sptth‬ with a
> >> couple of visible control characters doesn't really solve the problem,
> >> because most people will still assume that that's a link to Facebook,
> >> not to Myspace. Most people are not even aware that this bidi stuff
> >> exists.
> >
> > Under my suggestion to cover the overrides with a display property,
> > the URL will not be reversed on display. Did you try that?
>
> Oh, they won't? I thought you meant adding a display property to the
> RLO in addition to having it do what it normally does.

Any character covered by a display property effectively loses its bidi
properties, as described by this paragraph in the ELisp manual:

     Text covered by `display' text properties, by overlays with
     `display' properties whose value is a string, and by any other
     properties that replace buffer text, is treated as a single unit
     when it is reordered for display. That is, the entire chunk of
     text covered by these properties is reordered together. Moreover,
     the bidirectional properties of the characters in such a chunk of
     text are ignored, and Emacs reorders them as if they were replaced
     with a single character `U+FFFC', known as the "Object Replacement
     Character". This means that placing a display property over a
     portion of text may change the way that the surrounding text is
     reordered for display. To prevent this unexpected effect, always
     place such properties on text whose directionality is identical
     with text that surrounds it.

> So is your suggestion here to disable all RLO (etc.) characters in mail
> buffers?

No, only RLOs that affect URLs. Specifically, I suggest looking for an
RLO before a URL on the same physical line, and a PDF or hard newline
after it, and, if found, covering the RLO with a display property whose
value is, e.g., a string " ".
Since merely finding an RLO before the URL doesn't yet mean that it's a
malicious RLO (other bidirectional controls, which you don't want to know
about, can countermand the RLO before it affects the URL display), I
suggest augmenting that by checking that the URL's host and domain parts
consist of LTR characters whose directionality was overridden. The
latter part is to be done by calling the new primitive mentioned above.
Given all this evidence, I think it's pretty much certain that we found
our offending RLO.
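The detection heuristic described above can be sketched outside Emacs. The Python below is only an illustration under stated assumptions, not the Emacs implementation. The names URL_RE, host_is_overridden_ltr, find_spoofing_rlos and neutralize are hypothetical, and the URL regexp is deliberately crude. unicodedata.bidirectional stands in for the directionality primitive discussed in the thread: an LTR letter inside an RLO's scope is assumed to be overridden. Any other explicit bidi control between the RLO and the URL is conservatively treated as countermanding it. Replacing the RLO with a visible "<RLO>" marker stands in for covering it with a display property, which likewise strips the RLO of its bidi effect.

```python
import re
import unicodedata
from urllib.parse import urlsplit

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
# Explicit bidi controls: LRE, RLE, PDF, LRO, RLO (U+202A..U+202E)
# and LRI, RLI, FSI, PDI (U+2066..U+2069).
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

# Hypothetical, deliberately simple URL pattern: stops at whitespace and bidi controls.
URL_RE = re.compile(r"https?://[^\s\u202a-\u202e\u2066-\u2069]+")


def host_is_overridden_ltr(url):
    """True if every letter in the URL's host has bidi class 'L' (left-to-right)."""
    host = urlsplit(url).hostname or ""
    letters = [ch for ch in host if ch.isalpha()]
    return bool(letters) and all(unicodedata.bidirectional(ch) == "L" for ch in letters)


def find_spoofing_rlos(text):
    """Return offsets in TEXT of RLO characters that appear to spoof a URL."""
    offsets = []
    line_start = 0
    for line in text.split("\n"):  # a hard newline ends the RLO's scope
        for m in URL_RE.finditer(line):
            rlo = line.rfind(RLO, 0, m.start())  # nearest RLO before the URL
            if rlo < 0:
                continue
            # Another bidi control between the RLO and the URL may countermand it.
            if any(ch in BIDI_CONTROLS for ch in line[rlo + 1:m.start()]):
                continue
            # After the URL, the override must be closed by a PDF or the end of the line.
            after = [ch for ch in line[m.end():] if ch in BIDI_CONTROLS]
            if after and after[0] != PDF:
                continue
            # Only flag hosts made of LTR letters whose direction the RLO overrides.
            if host_is_overridden_ltr(m.group()) and line_start + rlo not in offsets:
                offsets.append(line_start + rlo)
        line_start += len(line) + 1
    return offsets


def neutralize(text, marker="<RLO>"):
    """Replace spoofing RLOs with a visible marker (stand-in for a display property)."""
    for off in reversed(find_spoofing_rlos(text)):
        text = text[:off] + marker + text[off + 1:]
    return text


if __name__ == "__main__":
    url = "http://myspace.com/#/segami/moc.koobecaf//:sptth"
    spoof = "See " + RLO + url + PDF + " for photos."
    print(find_spoofing_rlos(spoof))  # [4]
    # Visual order under the RLO: https://facebook.com/images/#/moc.ecapsym//:ptth
    print(url[::-1])
    print(neutralize(spoof))
```

With the spoofed link from the thread, the RLO at offset 4 is flagged, while a URL preceded by an LRO, or one whose RLO sits on an earlier line, is left alone. Because none of the URL's characters are mirrored, the text under the override is displayed as the exact reverse of its logical order, which is why the link passes for a Facebook address.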