From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Sat, 29 Nov 2014 15:09:02 +0900
Message-ID: <874mtio7wh.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87a93cngwv.fsf@uwakimon.sk.tsukuba.ac.jp> <837fyfml31.fsf@gnu.org>
In-Reply-To: <837fyfml31.fsf@gnu.org>
To: Eli Zaretskii
Cc: larsi@gnus.org, emacs-devel@gnu.org

Eli Zaretskii writes:

> Not really, not in this particular field.
>
> > but I would say that given that the UAX#9 bidi algorithm does what's
> > wanted 99.44% of the time, it makes sense to mark text reordered by
> > RTL markers with a warning face
>
> That might be considered an annoyance by users of bidi scripts.
> There's any number of perfectly valid URLs that use the same
> formatting control characters.

Why?  Because many displays don't implement UAX#9?  Or is it because
UAX#9 defines segments in a way that would reorder the components of a
domain name or path?  That is, the logical URL

    http://www.example.com/ABC/DEF/

is expected by a bidi reader to appear as

    http://www.example.com/CBA/FED/

but UAX#9 would display it as

    http://www.example.com/FED/CBA/

(the natural direction of lowercase characters is LTR, the natural
direction of uppercase characters is RTL)?  (Or perhaps the reverse
misdisplay.)
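
To make the two displays in that example concrete, here is a toy sketch in Python.  It is not an implementation of UAX#9: it only follows the convention used above (uppercase ASCII stands in for RTL letters, lowercase for LTR letters) and assumes every other character behaves as a neutral.  The function names are made up for illustration.

```python
import re

# Toy model only: uppercase ASCII plays the role of RTL letters,
# lowercase ASCII plays the role of LTR letters, and everything else
# is treated as a neutral.  This is NOT the UAX#9 algorithm.

# A maximal RTL run: RTL letters, plus any neutrals sandwiched
# between two RTL letters (such neutrals take the RTL direction).
RTL_RUN = re.compile(r"[A-Z](?:[^A-Za-z]*[A-Z])*")

# A single RTL "word": consecutive RTL letters with nothing between.
RTL_WORD = re.compile(r"[A-Z]+")


def display_run_based(logical: str) -> str:
    """Reverse each maximal RTL run, separators included."""
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], logical)


def display_per_component(logical: str) -> str:
    """Reverse each RTL word but keep the components in logical order."""
    return RTL_WORD.sub(lambda m: m.group(0)[::-1], logical)


if __name__ == "__main__":
    url = "http://www.example.com/ABC/DEF/"
    print(display_per_component(url))  # component order preserved
    print(display_run_based(url))      # components swapped as well
```

In this model the "/" between ABC and DEF sits between two RTL letters, so it joins the run and the two path components change places.  The trailing "/" has no RTL letter after it and stays where it is.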
Whatever the reason, I'd have to say that's too bad for users of bidi
languages, because that means *any* bidi URL is ambiguous, and
therefore subject to being deliberately obfuscated by reflection
and/or jumbling, regardless of the presence of directional controls.

> What you suggest might be TRT when left-to-right text is enclosed
> within directional override controls (which is what Lars did in his
> example).  These controls assign right-to-left directionality to all
> the enclosed characters, which is indeed highly suspicious in URLs.

This isn't hard to detect.  But there is also the case where you have
a word which is a different word when reflected.  I assume that this
is the case in bidi languages as well, and of course any jumble is
possible as a domain or path component which is an abbreviation.  And
any useful jumble can probably be registered as a domain, and
certainly incorporated in a path.

> In addition to using a special face, another possibility is to present
> the directional overrides in these cases in percent-hex notation,
> which will disable their effect on the enclosed text.  Of course, this
> should be only done when the enclosed text is entirely made of LTR
> characters and neutrals.

Well, no.  I assume that bidi readers are as vulnerable to phishing
and other frauds as non-bidi readers (hard as that may be to believe
for you bidi readers).  That is not yet clear.

> > You do need a way to turn it off, or to make it reasonably smart, in
> > the case of ASCII which is often mixed with other charsets.
>
> Not sure what you mean here.

As above, where the domain name is ASCII and the path is RTL.  Or the
path (or the domain) might be mixed.

> "Turn off" how?

"We need to decide what we want to do, and then look for a mechanism."

> And how do you do that without unduly punishing perfectly valid
> URLs that need these controls to avoid visual "jumbles"?

I hate to tell you, but the phishers have *already* started punishing
those perfectly valid URLs.
You have a choice of punishment, that's all: "jumbled display" vs.
"defrauded users".  Except that as I say above, apparently all bidi
URLs must now be considered to offer suspicious display under some
circumstances, so maybe you have no choice about the defrauded users.
In that case I suppose avoiding jumbles does take precedence.
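
For reference, the narrow case quoted earlier (an RLO override wrapped around text that contains no RTL letters, shown in percent-hex) could be sketched roughly as follows.  This is only an illustration in Python, not a proposal for the Emacs display code.  The function names are hypothetical, and "LTR characters and neutrals" is approximated here as "no character of bidi class R or AL", which is an assumption rather than anything settled in this thread.

```python
import unicodedata
from urllib.parse import quote

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

STRONG_RTL = {"R", "AL"}  # bidi classes of strong right-to-left characters


def has_no_rtl_letters(text: str) -> bool:
    """Approximation of 'entirely made of LTR characters and neutrals'."""
    return all(unicodedata.bidirectional(ch) not in STRONG_RTL for ch in text)


def escape_suspicious_overrides(url: str) -> str:
    """Percent-encode RLO (and its closing PDF) when the enclosed text
    has no RTL letters; leave every other override untouched."""
    out = []
    i = 0
    while i < len(url):
        ch = url[i]
        if ch == RLO:
            end = url.find(PDF, i + 1)
            stop = end if end != -1 else len(url)
            enclosed = url[i + 1:stop]
            if has_no_rtl_letters(enclosed):
                out.append(quote(RLO))       # UTF-8 bytes in percent-hex
                out.append(enclosed)
                if end != -1:
                    out.append(quote(PDF))
                    stop += 1                # skip past the PDF as well
                i = stop
                continue
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    spoofed = "http://example.com/" + RLO + "moc.live" + PDF
    print(escape_suspicious_overrides(spoofed))
```

Note that this catches only the override case.  It does nothing about the reflection and jumbling problem described above, which needs no directional controls at all.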