From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Mon, 08 Dec 2014 17:46:49 +0200
Message-ID: <83zjayxhxy.fsf@gnu.org>
To: rms@gnu.org
Cc: larsi@gnus.org, emacs-devel@gnu.org

> Date: Sun, 07 Dec 2014 19:26:33 -0500
> From: Richard Stallman
> CC: larsi@gnus.org, emacs-devel@gnu.org
>
> > > If so, the question is: once you detect the strangeness, what then?
>
> > It's up to the application.
>
> Alas, that's ducking the issue.  We need to confront this issue.

We _are_ confronting it.  We are methodically analyzing the issue
piecemeal, identifying the separate parts of it, and providing
solutions to each part as soon as it is well-defined and understood.

The problem we are dealing with is a very complex one.  It involves
multiple disciplines: bidi reordering, URL construction and display,
Internet security, cultural differences, human perception of visual
cues, etc.  Part of the solution should be in the infrastructure and
primitives, part on the application and UI level.  Moreover, we are in
uncharted territory, with no prior art or standards to guide us.
Plus, we don't have any single individual on board who'd have a good
understanding of all the aspects of the problem.

When dealing with such hard issues, it is IME methodologically wrong
to charge ahead without a sufficiently clear definition and
understanding of each part of the problem and the alternatives for
their solutions.

We have now identified the first part: how to find the potentially
fraudulent URL, and we have a clear understanding of it.  We have a
solution for that part of the problem that seems to satisfy the
requirements of the application programmer who brought up this issue.
The next step should be for the application to try using this
infrastructure to address the issue on the application and UI levels.

It is possible that such an attempt will result in feedback that will
require changes in the infrastructure, or some additional
functionality there.  Or the application developers will decide that
this part of the problem is successfully solved, and will request
assistance in solving the next part, which will need to be defined in
clear terms.  And so on and so forth -- we will break this complex
issue into individual parts and solve them one by one on the level
each part belongs to.  That's not "ducking the issue" in my book.

What you seem to expect is that we start coding solutions to problems
that are at best very vaguely defined, without any practical
experience to back that up, guided only by some intuition.  IME, this
is a recipe for wrong solutions and for waste of time and energy.  I
submit that there's no one around here, including myself, whose
intuition in this matter I would trust, because intuition is only
reliable when it is based on knowledge and experience in the subject
matter, and we don't have such individuals at our disposal.  So I
don't see any reasons to rush into coding under the circumstances.

> > That's easy: copy the text without the directional override and
> > display it in some other buffer.
> > The position returned by
> > bidi-find-overridden-directionality is of the 1st character following
> > the override control, so copying the text starting at that position
> > will exclude the override and avoid its effects.
>
> That is the first magic bidi char, but there could be more.

Inside the URL?  Extremely unlikely, see below.  In any case, the
presented use case didn't have them.  I'd like to see a complete
solution for this simple use case, before we move to more complex ones
(if they exist).

> It would be necessary to remove them all.

I don't think it's a problem, not a likely one anyway.  But if it is,
it should be almost trivial to use that primitive iteratively to
reconstruct the string with all the overrides removed.

> However, is simply removing them correct?

Yes, I think so.

> In general, do magic bidi characters get included in the URL that is
> passed to the browser?  I would expect so.

Using the directional control characters as part of the URL is
forbidden by the relevant standards.  The authorities that approve
domain names will reject them if they include such characters.  So I
think URLs which include them will be non-existent, or at least very
rare.  The use case which started this thread of discussion had the
control characters outside the URL itself, even outside the protocol
part of it.

> If so, a string which does not include them is inaccurate, and the
> accurate thing to do is to include them and display them (perhaps in
> hex) while suppressing their bidi effect.

Removing them and suppressing their effect give rise to the same
visual appearance, since these controls display as very thin spaces,
and thus are almost invisible on the screen.  That's why this type of
fraud came into existence in the first place.

As for using hex, that was one alternative I suggested earlier in this
thread.  It is still on the table, and doesn't require any
infrastructure changes to do its job.
But people liked this proposal less, so eventually I coded the
primitive to find the spoofed characters as a means for supporting
other solutions.

> Also, don't some RTL characters cause some normally LTR characters to
> display RTL?

No.  LTR characters always display left to right, unless overridden by
the RLO control (which simply makes every character act as an RTL
character).
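
To make the two alternatives discussed earlier in this message more
concrete (removing the directional controls versus showing them in
hex), here is a minimal sketch in Python.  It is not Emacs code and
does not use bidi-find-overridden-directionality; the names
BIDI_CONTROLS, strip_bidi_controls and show_bidi_controls are
hypothetical, and the choice of which code points count as
"directional controls" is an assumption made for the illustration.

```python
# Illustrative only: a plain-string approximation of the two options,
# not the Emacs primitive discussed in this thread.

# Assumed set of explicit directional formatting characters:
# embeddings, overrides, their terminator, and the isolate controls.
BIDI_CONTROLS = {
    "\u202a",  # LRE, left-to-right embedding
    "\u202b",  # RLE, right-to-left embedding
    "\u202c",  # PDF, pop directional formatting
    "\u202d",  # LRO, left-to-right override
    "\u202e",  # RLO, right-to-left override
    "\u2066",  # LRI, left-to-right isolate
    "\u2067",  # RLI, right-to-left isolate
    "\u2068",  # FSI, first strong isolate
    "\u2069",  # PDI, pop directional isolate
}


def strip_bidi_controls(text):
    """Return TEXT with every character in BIDI_CONTROLS removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


def show_bidi_controls(text):
    """Return TEXT with each control replaced by a visible <U+XXXX> tag."""
    return "".join(
        "<U+%04X>" % ord(ch) if ch in BIDI_CONTROLS else ch
        for ch in text
    )


if __name__ == "__main__":
    spoofed = "\u202ehttp://example.com/"
    print(strip_bidi_controls(spoofed))
    print(show_bidi_controls(spoofed))
```

Either function removes the reordering effect of the controls; the
second one additionally keeps a visible record of where they were,
which is the trade-off raised in the quoted question above.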