From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Mon, 01 Dec 2014 20:22:36 +0200
Message-ID: <83r3wj6xhv.fsf@gnu.org>
To: Lars Magne Ingebrigtsen
Cc: emacs-devel@gnu.org

> From: Lars Magne Ingebrigtsen
> Cc: emacs-devel@gnu.org
> Date: Mon, 01 Dec 2014 18:49:58 +0100
>
> Eli Zaretskii writes:
>
> >> > Anyway, if you want this, please show the API of the function -- what
> >> > it should return and how.
> >>
> >> Actually, I'm not sure. :-)  Would it make any sense to have a function
> >> like `(displayed-directionality POSITION)' that returns either
> >> `right-to-left' or `left-to-right'?  If so, the URL-finding function
> >> would query about the start of the URL (which would normally be the HTTP
> >> part), and if that's `right-to-left', Here There Be Shenanigans.
> >
> > How is this different from the previous suggestion?
>
> I'm not sure what you are referring to.

I'm saying that asking about "characters between FROM and TO that were
supposed to be LTR, but were forced to display as RTL" and asking the
same question about the character at POS are really one question: the
latter is just the former with TO = POS + 1.  IOW, the same API can
satisfy both needs.
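The equivalence can be sketched in Emacs Lisp: a position query of the kind quoted above, answered entirely through a region query over [POS, POS + 1).  This is a minimal sketch under stated assumptions: `my-displayed-directionality` is a hypothetical name, and the code assumes a region primitive `bidi-find-overridden-directionality` taking FROM and TO and returning the first overridden position or nil, as in the proposal that follows.  Emacs 25.1 and later do ship a primitive with that name, but treat its exact behavior here as an assumption of the sketch.

```elisp
;; Hypothetical sketch: `my-displayed-directionality' is not an Emacs API.
;; It assumes the region primitive `bidi-find-overridden-directionality'
;; (FROM TO), which returns the first overridden position or nil.

(defun my-displayed-directionality (pos)
  "Return the direction in which the strong character at POS is displayed.
The value is `left-to-right' or `right-to-left'.  Return nil if there is
no character at POS or it is not strongly directional, since the display
direction of such characters is resolved from their context."
  (let* ((ch (char-after pos))
         (class (and ch (get-char-code-property ch 'bidi-class)))
         ;; The single-position question is the region question with
         ;; TO = POS + 1.
         (overridden (and class
                          (bidi-find-overridden-directionality pos (1+ pos)))))
    (pcase class
      ;; An overridden L character is shown as RTL, otherwise as LTR.
      ('L (if overridden 'right-to-left 'left-to-right))
      ;; An overridden R or AL character is shown as LTR, otherwise as RTL.
      ((or 'R 'AL) (if overridden 'left-to-right 'right-to-left))
      (_ nil))))
```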
(defun bidi-find-overridden-directionality (from to)
  "Return position between FROM and TO where directionality was overridden.
This function returns the first character position in the specified
region where there is a character whose `bidi-class' property is `L',
but which was forced to display as `R' by a directional override, and
likewise with characters whose `bidi-class' is `R' or `AL' that were
forced to display as `L'.

Strong directional characters `L', `R', and `AL' can have their
intrinsic directionality overridden by directional override control
characters RLO \(U+202E) and LRO \(U+202D).")

OK?

If you want, the function can return a cons cell (POS . DIR), where
POS is the position and DIR is the intrinsic directionality of the
overridden character.  Or even a list (POS DIR-ORIG DIR-OVERRIDDEN).

> > No, only RLOs that affect URLs.
> >
> > Specifically, I suggest to look for RLO before a URL on the same
> > physical line, and PDF or hard newline after it, and if found, cover
> > it by a display property whose value is e.g. a string " ".  Since just
> > the fact that you find an RLO before doesn't yet mean that it's a
> > malicious RLO (other bidirectional controls which you don't want to
> > know about can countermand the RLO before it affects the URL display),
> > I suggest to augment that by checking that the URL's host and domain
> > parts consist of LTR characters whose directionality was overridden.
> > The latter part is to be done by calling a new primitive mentioned
> > above.
> >
> > Given all this evidence, I think it's pretty much certain that we
> > found our offending RLO.
>
> If you think that that's sufficient (that we only need to look for
> preceding RLOs on the same line), then this sounds like a good solution
> to me.

We need to look for an RLO on the same line when an LTR character was
forced to display as RTL, and for an LRO in the opposite case.  This
will detect the case you've demonstrated at the beginning of this
thread.
I don't know of other similar cases, so unless you do, I suggest we
handle this one and take it from there.
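The URL heuristic discussed above (find an override control before the URL on the same physical line, confirm that characters in the URL really had their directionality overridden, then cover the control character with a display property) can be sketched in Emacs Lisp.  All `my-` names are hypothetical.  The sketch makes simplifying assumptions: a PDF between the override and the URL is taken to terminate it, nested embeddings are ignored, the whole URL is checked rather than only its host and domain parts, and `bidi-find-overridden-directionality` (a primitive present in Emacs 25.1 and later) is called only when it is available.

```elisp
;; Hypothetical sketch: the `my-' names are illustrative, not Emacs APIs.

(defun my-url-override-position (url-start override)
  "Return the position of an OVERRIDE character in effect at URL-START.
OVERRIDE is the RLO (U+202E) or LRO (U+202D) character.  Only the
physical line before URL-START is searched, and a PDF (U+202C) found
between the override and the URL is taken to terminate it.  Nested
embeddings are ignored."
  (save-excursion
    (goto-char url-start)
    ;; Find the nearest PDF, LRO or RLO before the URL on this line.
    (when (re-search-backward "[\u202C\u202D\u202E]"
                              (line-beginning-position) t)
      ;; `re-search-backward' leaves point at the start of the match.
      (and (eq (char-after) override) (point)))))

(defun my-neutralize-url-override (url-start url-end override)
  "Hide the OVERRIDE character affecting URL-START..URL-END behind a display string.
Return the position of that character, or nil if nothing was changed."
  (let ((pos (my-url-override-position url-start override)))
    (when (and pos
               ;; Confirm that some character in the URL really had its
               ;; directionality overridden.  The check is skipped in an
               ;; Emacs that lacks this primitive.
               (or (not (fboundp 'bidi-find-overridden-directionality))
                   (bidi-find-overridden-directionality url-start url-end)))
      ;; Cover the control character with a display string, as suggested.
      (put-text-property pos (1+ pos) 'display " ")
      pos)))
```

For an LTR URL forced to display as RTL, pass the RLO character (?\u202E) as OVERRIDE; for the opposite case, pass LRO (?\u202D), following the rule stated in the reply.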