unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#22429: Force character to be recognized as LTR inside RTL paragraph
@ 2016-01-21 21:14 Filipe Moreira
  2016-01-22  8:08 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Filipe Moreira @ 2016-01-21 21:14 UTC (permalink / raw)
  To: 22429



Hi everyone,

I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m authoring is written in English with some paragraphs in Hebrew or Greek. 

The issue I have is with mixing some neutral characters that need to be LTR inside a paragraph which is RTL. An example of this is the backslash (i.e. ‘\’) character used by LaTeX to introduce its commands. Inside an RTL paragraph I ideally want to force Emacs to always interpret the backslash character, as well as the opening and closing braces (i.e. {}), as LTR.

This is not what happens at the moment. Here I have a visual representation of the problem: 
http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex
.

Is it possible to whitelist some characters that should always be interpreted as LTR?

Thanks

Filipe Moreira

-- 

Freelance Web Developer(Ruby & Javascript)

http://coderelax.com/


* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-21 21:14 bug#22429: Force character to be recognized as LTR inside RTL paragraph Filipe Moreira
@ 2016-01-22  8:08 ` Eli Zaretskii
  2016-01-22  8:24   ` Eli Zaretskii
  2016-01-22 11:54   ` Filipe Moreira
  0 siblings, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2016-01-22  8:08 UTC (permalink / raw)
  To: Filipe Moreira; +Cc: 22429

> Date: Thu, 21 Jan 2016 13:14:22 -0800
> From: "Filipe Moreira" <famoreira@gmail.com>
> 
> I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m
> authoring is written in English with some paragraphs in Hebrew or Greek. 
> 
> The issue I have is with mixing some neutral characters that need to be LTR,
> inside a paragraph which is RTL. An example of this is the slash (i.e. ‘\’)
> character used by LaTeX to signal its commands. Inside a RTL paragraph I
> ideally want to force Emacs to always interpret the slash character, as well as
> the open and close brackets (i.e. {}) as LTR. 
> 
> This is not what happens at the moment. Here I have a visual representation of
> the problem:
> http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex.
> 
> Is it possible to whitelist some characters that should always be interpreted
> as LTR?

The directionality of characters is determined by their bidirectional
class property as defined by the Unicode Character Database.  Emacs
uses those definitions in its implementation of the UBA, the Unicode
Bidirectional Algorithm, when it lays out text for display.
Punctuation characters such as \, {, and } are "neutral" (bidi class
ON): they take the directionality of the surrounding text, and if the
directionality on either side is different, they default to the
paragraph's base direction, which is RTL in your case.  So that is
what you see.
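
For readers who want to check these classes themselves: the Bidi_Class
property can also be inspected outside Emacs.  The following is a minimal
sketch using Python's standard `unicodedata` module rather than Emacs
itself; the helper name `bidi_class` and the sample list are made up for
illustration.

```python
import unicodedata

# Characters from this thread: LaTeX syntax, a Latin letter, a Hebrew letter.
# "\u05d1" is HEBREW LETTER BET (written as an escape to keep this file ASCII).
samples = ["\\", "{", "}", "f", "\u05d1"]


def bidi_class(ch: str) -> str:
    """Return the Unicode Bidi_Class abbreviation for a single character."""
    return unicodedata.bidirectional(ch)


for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi_class(ch)}")
# U+005C REVERSE SOLIDUS: ON
# U+007B LEFT CURLY BRACKET: ON
# U+007D RIGHT CURLY BRACKET: ON
# U+0066 LATIN SMALL LETTER F: L
# U+05D1 HEBREW LETTER BET: R
```

The backslash and the braces report ON (Other Neutral), so their displayed
direction depends entirely on the strong characters around them and on the
paragraph's base direction.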

Emacs being Emacs, you can programmatically change the bidirectional
class of any character, but that change has global effect: it will
affect the directionality of that character everywhere in the Emacs
session.  So this is not recommended.

The correct solution to these problems is to wrap the footnote block
in the LRE..PDF or LRI..PDI control characters, so that the footnote
is rendered independently of the surrounding bidirectional context.
See the example below.  Not sure if LaTeX will DTRT with directional
control characters, but if it doesn't, that's a bug/misfeature in
LaTeX.

\begin{hebrew}
  \pstart

בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

  \pend
\end{hebrew}
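
The directional control characters in the example above are invisible, so
it can help to see them spelled out as code points.  The following Python
sketch builds such a line programmatically; the function name `wrap_ltr` is
hypothetical, the Hebrew words are written without vowel points, and
nothing here is specific to Emacs or LaTeX.

```python
# Directional formatting characters defined by Unicode.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def wrap_ltr(text: str, isolate: bool = False) -> str:
    """Wrap TEXT in LRE..PDF, or in LRI..PDI when ISOLATE is true."""
    opener, closer = (LRI, PDI) if isolate else (LRE, PDF)
    return f"{opener}{text}{closer}"


footnote = "\\footnoteA{This is a Hebrew related footnote}"
hebrew_before = "\u05d1\u05e8\u05d0\u05e9\u05d9\u05ea"  # first Hebrew word
hebrew_after = "\u05d1\u05e8\u05d0"                      # next Hebrew word

# Logical order: Hebrew word, wrapped footnote command, space, Hebrew word.
line = hebrew_before + wrap_ltr(footnote) + " " + hebrew_after
print([f"U+{ord(c):04X}" for c in line if c in (LRE, PDF, LRI, PDI)])
```

Passing `isolate=True` produces the LRI..PDI form instead, which is the
other pair of controls mentioned above.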

Another possibility is to insert newlines between the footnote and the
surrounding text, as shown below.  Not sure if LaTeX will be happy
with that, and I think it's uglier anyway.

\begin{hebrew}
  \pstart

בְּרֵאשִׁ֖ית

\footnoteA{This is a Hebrew related footnote}

בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

  \pend
\end{hebrew}

I don't think there's a bug to fix here, so I'm going to close this
bug report.  Any objections?






* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-22  8:08 ` Eli Zaretskii
@ 2016-01-22  8:24   ` Eli Zaretskii
  2016-01-22  9:31     ` Andy Moreton
  2016-01-22 11:54   ` Filipe Moreira
  1 sibling, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2016-01-22  8:24 UTC (permalink / raw)
  To: famoreira; +Cc: 22429

> Date: Fri, 22 Jan 2016 10:08:06 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 22429@debbugs.gnu.org
> 
> The correct solution to these problems is to wrap the footnote block
> in the LRE..PDF or LRI..PDI control characters, so that the footnote
> is rendered independently of the surrounding bidirectional context.

Actually, LRM should also work, you just need to put it on both sides
of the footnote, like below:

\begin{hebrew}
  \pstart

בְּרֵאשִׁ֖ית‎\footnoteA{This is a Hebrew related footnote}‎ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

  \pend
\end{hebrew}
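
LRM works here because it is itself a strong left-to-right character, so a
neutral such as the backslash or the closing brace ends up with strong LTR
text on both sides.  Since the marks are invisible, a small checker can be
handy when debugging such lines.  This is only a sketch in Python; the
names `mark_ltr`, `BIDI_CONTROLS` and `find_bidi_controls` are invented for
the example.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Invisible directional marks and formatting characters.
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def mark_ltr(text: str) -> str:
    """Put an LRM on both sides of TEXT."""
    return f"{LRM}{text}{LRM}"


def find_bidi_controls(s: str) -> list:
    """Return (index, Unicode name) for every directional control in S."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(s) if ch in BIDI_CONTROLS]


marked = mark_ltr("\\footnoteA{note}")
print(unicodedata.bidirectional(LRM))  # L
print(find_bidi_controls(marked))
# [(0, 'LEFT-TO-RIGHT MARK'), (17, 'LEFT-TO-RIGHT MARK')]
```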






* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-22  8:24   ` Eli Zaretskii
@ 2016-01-22  9:31     ` Andy Moreton
  2016-01-22 14:03       ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Moreton @ 2016-01-22  9:31 UTC (permalink / raw)
  To: 22429

On Fri 22 Jan 2016, Eli Zaretskii wrote:

>> Date: Fri, 22 Jan 2016 10:08:06 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 22429@debbugs.gnu.org
>> 
>> The correct solution to these problems is to wrap the footnote block
>> in the LRE..PDF or LRI..PDI control characters, so that the footnote
>> is rendered independently of the surrounding bidirectional context.
>
> Actually, LRM should also work, you just need to put it on both sides
> of the footnote, like below:
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית‎\footnoteA{This is a Hebrew related footnote}‎ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}

While reading this message, I noticed odd behaviour of cursor motion
with <right> and <left> (i.e. right-char and left-char). 

I would expect repeated <right> to move in logical order until the end
of the buffer, but it gets stuck on the newline after "\pstart".
Likewise repeated <left> from the end gets stuck at the newline before
"\pend".

Saving this text in a file "foo.txt" showed the same behaviour (using the
latest emacs-25 branch with "emacs -Q"). Is this expected?

    AndyM







* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-22  8:08 ` Eli Zaretskii
  2016-01-22  8:24   ` Eli Zaretskii
@ 2016-01-22 11:54   ` Filipe Moreira
  2016-01-22 14:01     ` Eli Zaretskii
  1 sibling, 1 reply; 8+ messages in thread
From: Filipe Moreira @ 2016-01-22 11:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22429


Hi Eli,

Thanks for taking the time to look into this.

On Fri, Jan 22, 2016 at 8:08 AM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Thu, 21 Jan 2016 13:14:22 -0800
> > From: "Filipe Moreira" <famoreira@gmail.com>
> >
> > I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m
> > authoring is written in English with some paragraphs in Hebrew or Greek.
> >
> > The issue I have is with mixing some neutral characters that need to be LTR,
> > inside a paragraph which is RTL. An example of this is the slash (i.e. ‘\’)
> > character used by LaTeX to signal its commands. Inside a RTL paragraph I
> > ideally want to force Emacs to always interpret the slash character, as well as
> > the open and close brackets (i.e. {}) as LTR.
> >
> > This is not what happens at the moment. Here I have a visual representation of
> > the problem:
> > http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex.
> >
> > Is it possible to whitelist some characters that should always be interpreted
> > as LTR?
>
> The directionality of characters is determined by their bidirectional
> class property as defined by the Unicode Character Database.  Emacs
> uses those definitions in its implementation of the UBA, the Unicode
> Bidirectional Algorithm, when it lays out text for display.
> Punctuation characters, such as \, {, and } have "weak
> directionality": they take the directionality of the surrounding text,
> and if the directionality on either side is different, they default to
> the paragraph's base direction, which is RTL in your case.  So that is
> what you see.
>
> Emacs being Emacs, you can programmatically change the bidirectional
> class of every character, but that change has global effect: it will
> affect the directionality of that character everywhere in the Emacs
> session.  So this is not recommended.
>

Although this is not recommended, I would be willing to have the bidi class
property of some characters set to left-to-right, for example the
backslash character. Can you point me somewhere regarding this? I saw the
get-char-code-property function but could not find any way to actually
change the setting.


>
> The correct solution to these problems is to wrap the footnote block
> in the LRE..PDF or LRI..PDI control characters, so that the footnote
> is rendered independently of the surrounding bidirectional context.
> See the example below.  Not sure if LaTeX will DTRT with directional
> control characters, but if it doesn't, that's a bug/misfeature in
> LaTeX.
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א
> אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}
>

In this example the direction of the surrounding Hebrew text has been
changed. The word בְּרֵאשִׁ֖ית should come before (i.e. to the right of)
the word בָּרָ֣א. So while the footnote command is correctly shown as LTR,
the Hebrew text has been changed. I don't think this is expected. See the
updated image (
http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex)
that shows TextEdit's correct handling of this.


>
> Another possibility is to insert newlines between the footnote and the
> surrounding text, as shown below.  Not sure if LaTeX will be happy
> with that, and I think it's uglier anyway.
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית
>
> \footnoteA{This is a Hebrew related footnote}
>
> בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}
>

Unfortunately for my use case this is not possible.

>
> I don't think there's a bug to fix here, so I'm going to close this
> bug report.  Any objections?
>

Is there any chance of having a way to set the Unicode bidirectional class of
a character within each separate mode? Could this be considered a feature?


* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-22 11:54   ` Filipe Moreira
@ 2016-01-22 14:01     ` Eli Zaretskii
  2016-01-22 15:15       ` Filipe Moreira
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2016-01-22 14:01 UTC (permalink / raw)
  To: Filipe Moreira; +Cc: 22429

> From: Filipe Moreira <famoreira@gmail.com>
> Date: Fri, 22 Jan 2016 11:54:45 +0000
> Cc: 22429@debbugs.gnu.org
> 
>     Emacs being Emacs, you can programmatically change the bidirectional
>     class of every character, but that change has global effect: it will
>     affect the directionality of that character everywhere in the Emacs
>     session. So this is not recommended.
> 
> Also this is not recommended, I would be willing to have the bidi class
> property of some characters set to left-to-right, like the example of the slash
> character.

Can you tell why?  There are ways to produce the display you expect
without changing the character properties; I described 3 such ways.
If you change the properties, the text will only display correctly on
your system; any other user who displays your text, either in Emacs or
in another editor that supports bidirectional display, will see the text
in the same jumbled order you wanted to avoid.  So I see very little
sense in such changes.

> Can you point somewhere regarding this? I saw the
> get-char-code-property function but could not find anyway to
> actually change the setting.

You want put-char-code-property.  Again, I very much recommend not to
do that.

>     \begin{hebrew}
>     \pstart
>     
>     בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת
>     הָאָֽרֶץ׃
>     
>     \pend
>     \end{hebrew}
>     
> 
> In this example the direction of the surrounding Hebrew text has been changed.
> The word בְּרֵאשִׁ֖ית should come before (i.e. on the right) of the word בָּרָ֣א. So
> while the footnote command is correctly shown as LTR the Hebrew text has been
> changed. I don't think is is the expected. See the updated image
> (http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex)
> that shows TextEdit correct handling of this.

What version of Emacs do you have?  The above renders correctly for
me, both in Emacs 24.5 and in the development version.  The word
בְּרֵאשִׁ֖ית is shown to the right of the footnote, and all the rest is
shown to the left of it.  Maybe you have an older Emacs which somehow
has a bug?

> Is there any change of having a way to set the unicode bidirectionally of a
> character within each separate mode? Could this be considered a feature?

I think it would be a misfeature, for the reasons explained above.
It's the same as using a private font to display some character in a
different shape -- you are the only one who will enjoy that shape.

However, nothing prevents a mode from using put-char-code-property in
some ingenious ways to do what you want.






* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-22  9:31     ` Andy Moreton
@ 2016-01-22 14:03       ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2016-01-22 14:03 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 22429

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Fri, 22 Jan 2016 09:31:39 +0000
> 
> While reading this message, I noticed odd behaviour of cursor motion
> with <right> and <left> (i.e. right-char and left-char). 
> 
> I would expect repeated <right> to move in logical order until the end
> of the buffer, but it gets stuck on the newline after "\pstart".
> Likewise repeated <left> from the end gets stuck at the newline before
> "\pend".
> 
> Saving this text in a file "foo.txt" showed the same behaviour (using the
> latest emacs-25 branch with "emacs -Q"). Is this expected ?

Yes, expected.  The paragraph direction changes when you enter a
paragraph that has a different base direction, and the arrow keys are
sensitive to the paragraph base direction.
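
To illustrate how the base direction can differ from one paragraph to the
next, here is a simplified Python sketch of the first-strong-character rule
from the UBA (rules P2 and P3).  It is not how Emacs implements it: the
sketch assumes that paragraphs are separated by blank lines, does not skip
text inside directional isolates, and falls back to LTR when a paragraph
has no strong character, which may differ from what Emacs does.  The
function names are invented for the example.

```python
import re
import unicodedata


def base_direction(paragraph: str) -> str:
    """Return "LTR" or "RTL" based on the first strong character.

    Simplification: characters inside LRI/RLI/FSI..PDI are not skipped.
    """
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return "LTR"  # assumed fallback when no strong character is found


def paragraph_directions(text: str) -> list:
    """Split TEXT on blank lines and pair each paragraph with its direction."""
    paragraphs = [p for p in re.split(r"\n[ \t]*\n", text) if p.strip()]
    return [(p.strip(), base_direction(p)) for p in paragraphs]


# Hebrew is written with escapes (unpointed) to keep the source ASCII.
text = (
    "\\begin{hebrew}\n  \\pstart\n\n"
    "\u05d1\u05e8\u05d0\u05e9\u05d9\u05ea\n\n"
    "\\footnoteA{note}\n\n"
    "  \\pend\n"
)

for para, direction in paragraph_directions(text):
    print(direction, repr(para))
```

In this sample the paragraph containing `\pstart` comes out as LTR and the
Hebrew paragraph as RTL, which is the kind of boundary where the arrow keys
change their meaning.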






* bug#22429: Force character to be recognized as LTR inside RTL paragraph
  2016-01-22 14:01     ` Eli Zaretskii
@ 2016-01-22 15:15       ` Filipe Moreira
  0 siblings, 0 replies; 8+ messages in thread
From: Filipe Moreira @ 2016-01-22 15:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22429


On Fri, Jan 22, 2016 at 2:01 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Filipe Moreira <famoreira@gmail.com>
> > Date: Fri, 22 Jan 2016 11:54:45 +0000
> > Cc: 22429@debbugs.gnu.org
> >
> >     Emacs being Emacs, you can programmatically change the bidirectional
> >     class of every character, but that change has global effect: it will
> >     affect the directionality of that character everywhere in the Emacs
> >     session. So this is not recommended.
> >
> > Also this is not recommended, I would be willing to have the bidi class
> > property of some characters set to left-to-right, like the example of the slash
> > character.
>
> Can you tell why?  There are ways to produce the display you expect
> without changing the character properties; I described 3 such ways.
> If you change the properties, the text will only display correctly on
> your system, any other user who displays your text, either in Emacs or
> in other editor that supports bidirectional display, will see the text
> in the same jumbled order you wanted to avoid.  So I see very little
> sense in such changes.
>
> > Can you point somewhere regarding this? I saw the
> > get-char-code-property function but could not find anyway to
> > actually change the setting.
>
> You want put-char-code-property.  Again, I very much recommend not to
> do that.
>
> >     \begin{hebrew}
> >     \pstart
> >
> >     בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א
> אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת
> >     הָאָֽרֶץ׃
> >
> >     \pend
> >     \end{hebrew}
> >
> >
> > In this example the direction of the surrounding Hebrew text has been changed.
> > The word בְּרֵאשִׁ֖ית should come before (i.e. on the right) of the word בָּרָ֣א. So
> > while the footnote command is correctly shown as LTR the Hebrew text has been
> > changed. I don't think is is the expected. See the updated image
> > (http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex)
> > that shows TextEdit correct handling of this.
>
> What version of Emacs do you have?  The above renders correctly for
> me, both in Emacs 24.5 and in the development version.  The word
> בְּרֵאשִׁ֖ית is shown to the right of the footnote, and all the rest is
> shown to the left of it.  Maybe you have an older Emacs which somehow
> has a bug?
>

I have just tested wrapping the footnote command in LRM (on both ends)
in a clean Emacs 24.5.1 (started with -Q) and it worked! This wasn't
working in my normal environment, so I will need to investigate why that is.


>
> > Is there any change of having a way to set the unicode bidirectionally of a
> > character within each separate mode? Could this be considered a feature?
>
> I think it would be a misfeature, for the reasons explained above.
> It's the same as using a private font to display some character in a
> different shape -- you are the only one who will enjoy that shape.
>
> However, nothing prevents a mode from using put-char-code-property in
> some ingenious ways to do what you want.
>

I appreciate your help. This is all new to me and I've already learned a
lot from you and others regarding this. Thank you for making Emacs so
great.


end of thread, other threads:[~2016-01-22 15:15 UTC | newest]

Thread overview: 8+ messages
2016-01-21 21:14 bug#22429: Force character to be recognized as LTR inside RTL paragraph Filipe Moreira
2016-01-22  8:08 ` Eli Zaretskii
2016-01-22  8:24   ` Eli Zaretskii
2016-01-22  9:31     ` Andy Moreton
2016-01-22 14:03       ` Eli Zaretskii
2016-01-22 11:54   ` Filipe Moreira
2016-01-22 14:01     ` Eli Zaretskii
2016-01-22 15:15       ` Filipe Moreira
