From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Sun, 30 Nov 2014 19:53:32 +0200
Message-ID: <831toka82r.fsf@gnu.org>
To: Lars Magne Ingebrigtsen
Cc: emacs-devel@gnu.org

> From: Lars Magne Ingebrigtsen
> Date: Sun, 30 Nov 2014 17:26:33 +0100
>
> Just a point of clarification: When people embed URLs in paragraphs with
> mainly right-to-left script (like Hebrew)

Let's clear up terminology first, OK?

There's no distinction in bidi display and bidi scripts between
"paragraphs with mainly right-to-left scripts" and "paragraphs with
mainly left-to-right scripts".  Instead, there's "the base direction
of a paragraph", which can be either left-to-right (LTR) or
right-to-left (RTL).  The former is displayed with the first character
(in the _visual_ order!) at the left edge of the window, while the
latter at the right edge.

It is true that the LTR paragraphs make most sense when most of the
paragraph text is made of LTR characters, and the RTL paragraphs in
the opposite case.  But nothing prevents me from having a paragraph
whose base direction is LTR which is nevertheless full of RTL
characters.  It is entirely legitimate and sometimes even necessary.

Emacs determines the base direction of a paragraph by searching for
the first strong directional character in the paragraph (this is a
simplification; the actual rules described in the UBA are more
complex).  The buffer-local variable bidi-paragraph-direction
overrides this dynamic calculation and forces a specific base
direction on all paragraphs of the buffer.

With this out of the way, I will assume that you were asking about
URLs that are part of paragraphs whose base direction is RTL.

Now let's go back to your question:

> do they expect to see http://myspace.com or ‮?http://myspace.com

The answer to your question is "it depends".  Here are 3 examples; to
see them as I intended, make sure you are viewing them in a buffer
whose bidi-paragraph-direction is set to nil:

  abc http://אבג.דהוזחט.קום

  אבג http://foo.bar.com

  אבג http://אבג.דהוזחט.קום

The leading 3 letters (1 would be enough) cause Emacs to decide that
the paragraph has LTR base direction in the 1st example and RTL base
direction in the last 2 examples.

Now move the cursor with C-f from the beginning of each of these three
lines (you can get to the beginning of a line with C-a or Home, as
usual), and I hope you will see what's going on: cursor movement with
C-f follows the "reading order", i.e. the order in which a human is
supposed to read these URLs.

To summarize: Latin characters are displayed left to right, even in
RTL paragraphs, while right-to-left characters are always displayed
right to left.  Neutral characters (slash, period) take the direction
of the surrounding text.

> (If I did that correctly, the latter URL should have an RLO character
> preceding it so that it reads right to left.)

As you see above, there's no need to use any directional overrides to
see what users expect: Emacs does that automatically, by following the
Unicode Bidirectional Algorithm (UBA).  You just need to arrange for
the paragraph to have an RTL base direction, which is very easy, as
shown above.

RLO and LRO (and the other directional control characters) are needed
when you need to override the normal reordering for some reason,
typically because you want punctuation characters to take a different
directionality from their default.  This is rarely needed when
rendering URLs.

HTH

May I ask why you came up with the question?
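
For anyone who wants to check the three example lines outside Emacs,
here is a minimal sketch in Python of the simplified "first strong
directional character" rule described earlier in this message.  It is
not how Emacs implements it, and it ignores the more complex UBA rules
mentioned above.  The function name base_direction and the fallback to
LTR when a paragraph has no strong character are assumptions of the
sketch.  The Hebrew letters are written as \u escapes so that the
source reads the same in any viewer.

```python
import unicodedata

# Bidi classes treated as "strong" by this simplified rule.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def base_direction(paragraph, default="LTR"):
    """Guess the base direction from the first strong character.

    Hypothetical helper for illustration only; the `default` used when
    no strong character is found is an assumption of this sketch.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "LTR"
        if bidi_class in STRONG_RTL:
            return "RTL"
    return default


# The Hebrew host name from the examples, in logical (storage) order.
HEBREW_HOST = (
    "\u05d0\u05d1\u05d2."
    "\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8."
    "\u05e7\u05d5\u05dd"
)

examples = [
    "abc http://" + HEBREW_HOST,                 # 1st example
    "\u05d0\u05d1\u05d2 http://foo.bar.com",     # 2nd example
    "\u05d0\u05d1\u05d2 http://" + HEBREW_HOST,  # 3rd example
]

for text in examples:
    print(base_direction(text))
```

The three results should agree with the base directions stated above
for the examples: LTR for the first line, RTL for the other two.  The
sketch only picks the base direction; it does not reorder characters
for display.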