From: Filipe Moreira
Newsgroups: gmane.emacs.bugs
Subject: bug#22429: Force character to be recognized as LTR inside RTL paragraph
Date: Fri, 22 Jan 2016 11:54:45 +0000
References: <56a13f4844c61b5100000006@polymail.io> <83mvry6nq1.fsf@gnu.org>
In-Reply-To: <83mvry6nq1.fsf@gnu.org>
To: Eli Zaretskii
Cc: 22429@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:111851

Hi Eli,

Thanks for taking the time to look into this.

On Fri, Jan 22, 2016 at 8:08 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Thu, 21 Jan 2016 13:14:22 -0800
> > From: "Filipe Moreira" <famoreira@gmail.com>
> >
> > I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m
> > authoring is written in English with some paragraphs in Hebrew or Greek.
> >
> > The issue I have is with mixing some neutral characters that need to be LTR,
> > inside a paragraph which is RTL. An example of this is the slash (i.e. ‘\’)
> > character used by LaTeX to signal its commands. Inside a RTL paragraph I
> > ideally want to force Emacs to always interpret the slash character, as well as
> > the open and close brackets (i.e. {}) as LTR.
> >
> > This is not what happens at the moment. Here I have a visual representation of
> > the problem:
> > http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex.
> >
> > Is it possible to whitelist some characters that should always be interpreted
> > as LTR?

> The directionality of characters is determined by their bidirectional
> class property as defined by the Unicode Character Database.  Emacs
> uses those definitions in its implementation of the UBA, the Unicode
> Bidirectional Algorithm, when it lays out text for display.
> Punctuation characters, such as \, {, and } have "weak
> directionality": they take the directionality of the surrounding text,
> and if the directionality on either side is different, they default to
> the paragraph's base direction, which is RTL in your case.  So that is
> what you see.
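
As a rough illustration of the property lookup described above, the sketch below reads the bidirectional class of a few characters from Python's standard `unicodedata` module, which exposes the Unicode Character Database. Python is used only for illustration and is not how Emacs performs the lookup; the helper name `bidi_classes` is made up for this example. In the database the backslash and the braces carry the class ON (Other Neutral), which is why their displayed direction is resolved from context rather than from the characters themselves.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs taken from the Unicode Character Database."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Backslash, braces, a Latin letter, and HEBREW LETTER ALEF (U+05D0).
sample = "\\{}a\u05d0"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
# U+005C ON
# U+007B ON
# U+007D ON
# U+0061 L
# U+05D0 R
```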

> Emacs being Emacs, you can programmatically change the bidirectional
> class of every character, but that change has global effect: it will
> affect the directionality of that character everywhere in the Emacs
> session.  So this is not recommended.

Although this is not recommended, I would be willing to have the bidi class property of some characters set to left-to-right, for example the backslash character. Can you point me to some documentation on this? I saw the get-char-code-property function but could not find any way to actually change the setting.

> The correct solution to these problems is to wrap the footnote block
> in the LRE..PDF or LRI..PDI control characters, so that the footnote
> is rendered independently of the surrounding bidirectional context.
> See the example below.  Not sure if LaTeX will DTRT with directional
> control characters, but if it doesn't, that's a bug/misfeature in
> LaTeX.
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}
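
The wrapping suggested above can also be done mechanically. The following is a minimal sketch in Python, assuming footnotes are written as `\footnoteA{...}` without nested braces; the helper names `wrap_ltr` and `wrap_footnotes` are hypothetical, and whether LaTeX accepts the inserted control characters is not verified here.

```python
import re

# Directional formatting characters defined by Unicode.
LRE, PDF = "\u202a", "\u202c"  # LEFT-TO-RIGHT EMBEDDING / POP DIRECTIONAL FORMATTING
LRI, PDI = "\u2066", "\u2069"  # LEFT-TO-RIGHT ISOLATE / POP DIRECTIONAL ISOLATE

# Assumption: footnote commands look like \footnoteA{...} with no nested braces.
FOOTNOTE = re.compile(r"\\footnoteA\{[^{}]*\}")


def wrap_ltr(text, isolate=False):
    """Surround text with LRE..PDF, or with LRI..PDI when isolate is true."""
    opener, closer = (LRI, PDI) if isolate else (LRE, PDF)
    return f"{opener}{text}{closer}"


def wrap_footnotes(line, isolate=False):
    """Wrap every \\footnoteA{...} command found in line."""
    return FOOTNOTE.sub(lambda m: wrap_ltr(m.group(0), isolate), line)


line = "\u05d0\\footnoteA{This is a Hebrew related footnote} \u05d1"
# ascii() makes the otherwise invisible control characters visible as escapes.
print(ascii(wrap_footnotes(line)))
```

The isolate pair (LRI..PDI) is offered as an option because the original suggestion names both pairs; which one behaves better in a given buffer is left open here.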

In this example the direction of the surrounding Hebrew text has been changed. The word בְּרֵאשִׁ֖ית should come before (i.e. to the right of) the word בָּרָ֣א. So while the footnote command is correctly shown as LTR, the order of the Hebrew text has been changed. I don't think this is the expected behavior. See the updated image (http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex) that shows TextEdit's correct handling of this.

> Another possibility is to insert newlines between the footnote and the
> surrounding text, as shown below.  Not sure if LaTeX will be happy
> with that, and I think it's uglier anyway.
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית
>
> \footnoteA{This is a Hebrew related footnote}
>
> בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}

Unfortunately, this is not possible for my use case.

> I don't think there's a bug to fix here, so I'm going to close this
> bug report.  Any objections?

Is there any chance of having a way to set the Unicode bidirectional class of a character separately for each mode? Could this be considered as a feature request?
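
To make the difference between a session-wide change and a per-mode setting concrete, here is a toy model in Python. It is only a sketch of the idea being asked about, not a description of how Emacs stores or applies bidi classes; the class `BidiClassTable` is hypothetical, and `latex-mode` and `text-mode` are used here as plain string labels.

```python
import unicodedata


class BidiClassTable:
    """Toy lookup: Unicode Character Database defaults plus per-mode overrides."""

    def __init__(self):
        self._overrides = {}  # mode label -> {character: bidi class}

    def set_override(self, mode, char, bidi_class):
        """Record an override that applies only when looking up in the given mode."""
        self._overrides.setdefault(mode, {})[char] = bidi_class

    def lookup(self, char, mode=None):
        """Return the override for this mode if any, else the database default."""
        per_mode = self._overrides.get(mode, {})
        return per_mode.get(char, unicodedata.bidirectional(char))


table = BidiClassTable()
table.set_override("latex-mode", "\\", "L")
print(table.lookup("\\", mode="latex-mode"))  # L
print(table.lookup("\\", mode="text-mode"))   # ON
```

In this model an override stays confined to the mode it was registered for, whereas a change to the shared default would be seen by every lookup.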
