unofficial mirror of emacs-devel@gnu.org 
* improving bidi documents display
@ 2011-02-24 12:32 Eli Osherovich
  2011-02-24 19:54 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Osherovich @ 2011-02-24 12:32 UTC (permalink / raw)
  To: emacs-devel


[-- Attachment #1.1: Type: text/plain, Size: 500 bytes --]

Hello,

First of all I would like to thank you all for the job you are doing.

At the moment (using rev. 103371) I can edit Hebrew/English LaTeX
documents; however, the way they are displayed in Emacs is not perfect.
Please look at the attached file.  As you can see, any English text that
appears inside a Hebrew paragraph requires certain decorations around it
(e.g., \L{some English text}), and these decorations are displayed in an
ugly fashion.  Is there anything that can be done about it?

Thank you.

[-- Attachment #1.2: Type: text/html, Size: 619 bytes --]

[-- Attachment #2: emacs-bidi.png --]
[-- Type: image/png, Size: 56413 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: improving bidi documents display
  2011-02-24 12:32 improving bidi documents display Eli Osherovich
@ 2011-02-24 19:54 ` Eli Zaretskii
  2011-02-27 10:01   ` Michael Welsh Duggan
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2011-02-24 19:54 UTC (permalink / raw)
  To: Eli Osherovich; +Cc: emacs-devel

> Date: Thu, 24 Feb 2011 14:32:35 +0200
> From: Eli Osherovich <eli.osherovich@gmail.com>
> 
> At the moment (using rev. 103371) I can edit Hebrew/English LaTeX
> documents; however, the way they are displayed in Emacs is not perfect.
> Please look at the attached file.  As you can see, any English text that
> appears inside a Hebrew paragraph requires certain decorations around it
> (e.g., \L{some English text}), and these decorations are displayed in an
> ugly fashion.

Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
(which is what Emacs implements for bidirectional display) does not
produce good results with LaTeX (and with other kinds of markup).

> Is there anything that can be done about it?

Something _should_ be done, for sure.  But for that, Someone™ should
figure out how this kind of problem could be solved using Emacs
display features.  Any solution will probably involve reordering only
parts of text, but a more detailed design suggestion is needed before
it can be implemented.  People are welcome to try to tackle this,
because I'm still busy with low-level bidi support of plain text.





* Re: improving bidi documents display
  2011-02-24 19:54 ` Eli Zaretskii
@ 2011-02-27 10:01   ` Michael Welsh Duggan
  2011-02-27 10:34     ` "Martin J. Dürst"
  2011-02-27 21:15     ` Eli Zaretskii
  0 siblings, 2 replies; 27+ messages in thread
From: Michael Welsh Duggan @ 2011-02-27 10:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Eli Osherovich, emacs-bidi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Thu, 24 Feb 2011 14:32:35 +0200
>> From: Eli Osherovich <eli.osherovich@gmail.com>
>> 
>> At the moment (using rev. 103371) I can edit Hebrew/English LaTeX
>> documents; however, the way they are displayed in Emacs is not perfect.
>> Please look at the attached file.  As you can see, any English text that
>> appears inside a Hebrew paragraph requires certain decorations around it
>> (e.g., \L{some English text}), and these decorations are displayed in an
>> ugly fashion.
>
> Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
> (which is what Emacs implements for bidirectional display) does not
> produce good results with LaTeX (and with other kinds of markup).
>
>> Is there anything that can be done about it?
>
> Something _should_ be done, for sure.  But for that, Someone™ should
> figure out how this kind of problem could be solved using Emacs
> display features.  Any solution will probably involve reordering only
> parts of text, but a more detailed design suggestion is needed before
> it can be implemented.  People are welcome to try to tackle this,
> because I'm still busy with low-level bidi support of plain text.

I'd like to talk about this problem a little, just to get a little
understanding of the problem space.  Please be warned that although I
have read through UAX#9 a few times, and have been following (as best I
can) Eli's bidi work, I am still very much a novice, and am apt to make
improper assumptions, or misunderstand how things are supposed to work.

In the examples below, I will use the convention in the UAX#9
document that a capital letter represents an R type character, and a
lower-case letter represents an L type character.  Formatting codes will
be typed as <RLE>, <PDF>, etc.

So, the example being used was:

Memory:  HEBREW \foo{english}
Levels:  11111111222222222221
Display: {foo{english\ WERBEH

Here the paragraph embedding level is 1 (odd, RtL) since the first
character is an R character.  The backslash, braces, and spaces are N
characters.  The N character sequence " \" takes on the current
embedding direction (1) based on rule N2.  The open brace gets level 2
based on rule N1, and the close brace gets level 1 again based on rule
N2.  Note that the close brace appears as its mirrored glyph due to rule
L4.

(Rule N1 states that runs of neutral characters between strong
characters of the same direction take on that direction.  Rule N2 states
that otherwise, they get the embedding direction.)

Here is another example:

Memory:  HEBREW \foo{HEBREW}
Levels:  1111111122211111111
Display: {WERBEH}foo\ WERBEH

In this case, note that both of the braces are mirrored in the display.
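
The two examples above can be reproduced with a small sketch.  This is
not the code in bidi.c and it is nowhere near full UAX#9: it assumes the
capital = R, lower-case = L convention used here, treats every other
character as neutral, and applies only rules N1, N2, I1/I2 and L2, plus
a tiny hand-written mirroring table standing in for L4.  The function
names are made up for illustration.

```python
# Toy model of the level resolution and reordering discussed above.
MIRROR = {"{": "}", "}": "{", "(": ")", ")": "(", "[": "]", "]": "["}


def classify(ch):
    """Return 'R', 'L' or 'N' using the capital/lower-case convention."""
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text, base_level=1):
    """Resolve embedding levels using only rules N1, N2, I1 and I2."""
    base_dir = "R" if base_level % 2 else "L"
    types = [classify(ch) for ch in text]
    resolved = list(types)
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        # Paragraph boundaries count as the base direction (sor/eor).
        before = types[i - 1] if i > 0 else base_dir
        after = types[j] if j < len(types) else base_dir
        fill = before if before == after else base_dir  # N1, else N2
        resolved[i:j] = [fill] * (j - i)
        i = j
    # I1/I2: characters opposing the base direction go up one level.
    return [base_level if d == base_dir else base_level + 1 for d in resolved]


def reorder(text, levels):
    """Apply rule L2 (run reversal) and mirror glyphs at odd levels."""
    chars = [MIRROR.get(c, c) if lv % 2 else c for c, lv in zip(text, levels)]
    lv = list(levels)
    highest = max(lv, default=0)
    lowest_odd = min((l for l in lv if l % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and lv[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return "".join(chars)


for memory in ("HEBREW \\foo{english}", "HEBREW \\foo{HEBREW}"):
    levels = resolve_levels(memory)
    print("Memory: ", memory)
    print("Levels: ", "".join(str(l) for l in levels))
    print("Display:", reorder(memory, levels))
```

Passing base_level=0 models a left-to-right paragraph instead.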

One simple, naive way of handling this for the various TeXs is to
consider all backslashes and brace characters as R characters.  This can
be simulated by surrounding each run of these characters by LRE PDF
pairs.  However, unless TeX ignores these characters completely, these
formatting characters would have to be removed before being processed by
TeX.
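
As a rough illustration of that round trip, here is a sketch that wraps
each run of backslashes and braces in an LRE ... PDF pair and strips the
formatting characters again before the text would be handed to TeX.  The
helper names are hypothetical, and whether TeX actually needs the
characters removed is exactly the open question.

```python
import re

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Explicit directional formatting characters from UAX#9:
# LRE, RLE, PDF, LRO, RLO (U+202A..U+202E) plus the LRM/RLM marks.
BIDI_CONTROLS = re.compile("[\u202a-\u202e\u200e\u200f]")

# Runs of the TeX syntax characters discussed above.
MARKUP_RUN = re.compile(r"[\\{}]+")


def wrap_markup_runs(text):
    """Surround each run of backslashes/braces with an LRE ... PDF pair."""
    return MARKUP_RUN.sub(lambda m: LRE + m.group(0) + PDF, text)


def strip_bidi_controls(text):
    """Remove the formatting characters before passing the text to TeX."""
    return BIDI_CONTROLS.sub("", text)


source = "HEBREW \\foo{english}"
wrapped = wrap_markup_runs(source)
print(strip_bidi_controls(wrapped) == source)
```

Note that stripping also removes any formatting characters the author
typed deliberately, which is one reason this approach is awkward.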

Another way of handling this would be to redefine the backslash and
brace characters as R characters, for purposes of the display engine.
Currently, I don't know if there is a way to do this in elisp.  bidi.c
seems to use a character table named bidi_type_table to hold this
information.  Currently this table is not exposed at the elisp layer, to
the best of my knowledge.  Maybe it would be possible to modify this
table in elisp, and possibly make it buffer local?

Another idea would be to allow a text property to override the character
type.  This feels like a very elegant, emacs-ish way to do things, but
an uneducated glance at the bidi code makes me feel like it would be
difficult to get information about text properties into this layer.
Another idea would be to use display strings including the LRE and PDF
characters to replace existing backslashes and braces.  However, display
strings do not affect the bidi algorithm at this point.

I'm really starting to ramble at this point, so I think I will send
these musings to see what Eli and others think.

-- 
Michael Welsh Duggan
(md5i@md5i.com)




* Re: Re: improving bidi documents display
  2011-02-27 10:01   ` Michael Welsh Duggan
@ 2011-02-27 10:34     ` "Martin J. Dürst"
  2011-02-27 21:19       ` Eli Zaretskii
  2011-02-27 21:15     ` Eli Zaretskii
  1 sibling, 1 reply; 27+ messages in thread
From: "Martin J. Dürst" @ 2011-02-27 10:34 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: Eli Osherovich, oshima, emacs-bidi, emacs-devel

Hello Michael,

I and my students have been working on this problem, in the context of 
XML/HTML, on and off for quite a few years. Please have a look at some 
of the following:

http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/
http://www.sw.it.aoyama.ac.jp/2008/pub/IUC32-bidi/

For the last year, Shunsuke Oshima, a student of mine, has been working 
on an implementation for Emacs in EmacsLisp. We hope to be able to 
publish the code in the next few weeks. It seems that the problems with 
LaTeX are very much similar to those with XML/HTML, and it should be 
possible to adapt our code to LaTeX.

Our implementation is currently actually two parallel implementations, 
one based on the insertion of additional control characters (it's a pain 
to get rid of them before all save/copy/cut and similar operations), and 
one based on overlays, which is what was originally suggested for this 
purpose by Ken'ichi Handa, but is currently not working because the 
characters in overlays don't participate in the bidi algorithm (Eli 
thinks that would make things too slow).

Regards,   Martin.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp


* Re: improving bidi documents display
  2011-02-27 10:01   ` Michael Welsh Duggan
  2011-02-27 10:34     ` "Martin J. Dürst"
@ 2011-02-27 21:15     ` Eli Zaretskii
  2011-02-28  1:10       ` Miles Bader
                         ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Eli Zaretskii @ 2011-02-27 21:15 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: eli.osherovich, emacs-bidi, emacs-devel

> From: Michael Welsh Duggan <md5i@md5i.com>
> Cc: Eli Osherovich <eli.osherovich@gmail.com>,  emacs-devel@gnu.org, emacs-bidi@gnu.org
> Date: Sun, 27 Feb 2011 05:01:25 -0500
> 
> Memory:  HEBREW \foo{english}
> Levels:  11111111222222222221
> Display: {foo{english\ WERBEH

The key to a useful discussion of these matters is to decide up front
what we want to support and what we want the text to look like.

In this case, someone who knows about (La)TeX much more than I do
should first describe what TeX features would be useful when
typesetting bidirectional text.

With that knowledge in hand, we could then think whether the example
above is at all practical.  For example, most of the problems go away
if paragraphs have left-to-right direction; in that case the display
will be

   WERBEH \foo{english}

Maybe this is already good enough.

> One simple, naive way of handling this for the various TeXs is to
> consider all backslashes and brace characters as R characters.  This can
> be simulated by surrounding each run of these characters by LRE PDF
> pairs.  However, unless TeX ignores these characters completely, these
> formatting characters would have to be removed before being processed by
> TeX.

Again, someone who knows should tell if the bidi formatting codes need
to be removed before TeX'ing the file.

> Another way of handling this would be to redefine the backslash and
> brace characters as R characters, for purposes of the display engine.
> Currently, I don't know if there is a way to do this in elisp.  bidi.c
> seems to use a character table named bidi_type_table to hold this
> information.  Currently this table is not exposed at the elisp layer, to
> the best of my knowledge.  Maybe it would be possible to modify this
> table in elisp, and possibly make it buffer local?

I didn't expose the table to Lisp on purpose: messing with
bidirectional properties of characters is asking for trouble.  At
best, you will get text that will look different in any other editor;
at worst, you could easily crash Emacs.

> Another idea would be to allow a text property to override the character
> type.

Overlay, not text property.  The latter modifies the buffer, which is
not what you want in this case.

> This feels like a very elegant, emacs-ish way to do things, but
> an uneducated glance at the bidi code makes me feel like it would be
> difficult to get information about text properties into this layer.

You are looking at this from a wrong perspective.  The bidi reordering
engine doesn't need to access text properties or overlays; rather, the
display code should tell the reordering engine what to reorder.  The
reordering code already honors point-min and point-max, so all it
takes to do what you want is narrow the buffer to the portion of text
we want to reorder.  These portions could be marked by an overlay; the
display code already examines overlays as it goes about its job.

> Another idea would be to use display strings including the LRE and PDF
> characters to replace existing backslashes and braces.  However, display
> strings do not affect the bidi algorithm at this point.

I need a few rainy days to implement support for display strings.
However, it would be a mistake to base large portions of buffer
display on display strings, because they make redisplay too expensive.


* Re: Re: improving bidi documents display
  2011-02-27 10:34     ` "Martin J. Dürst"
@ 2011-02-27 21:19       ` Eli Zaretskii
  2011-03-02  1:50         ` "Martin J. Dürst"
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2011-02-27 21:19 UTC (permalink / raw)
  To: "Martin J. Dürst"
  Cc: eli.osherovich, md5i, oshima, emacs-bidi, emacs-devel

> Date: Sun, 27 Feb 2011 19:34:22 +0900
> From: "Martin J. Dürst"
>  <duerst@it.aoyama.ac.jp>
> CC: Eli Zaretskii <eliz@gnu.org>, Eli Osherovich <eli.osherovich@gmail.com>,
>         emacs-bidi@gnu.org, emacs-devel@gnu.org, oshima@sw.it.aoyama.ac.jp
> 
> one based on overlays, which is what was originally suggested for this 
> purpose by Ken'ichi Handa, but is currently not working because the 
> characters in overlays don't participate in the bidi algorithm (Eli 
> thinks that would make things too slow).

If by "characters in overlays" you mean display strings, then they are
not yet reordered because I didn't have enough time to make that
happen yet, not because I think they would be slow.  If you mean
something else, you should know that overlays that specify other
properties _are_ supported.


* Re: improving bidi documents display
  2011-02-27 21:15     ` Eli Zaretskii
@ 2011-02-28  1:10       ` Miles Bader
  2011-02-28  4:02         ` Eli Zaretskii
  2011-03-02  0:58       ` James Cloos
  2011-03-02  2:09       ` [emacs-bidi] " "Martin J. Dürst"
  2 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2011-02-28  1:10 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: eli.osherovich, Michael Welsh Duggan, emacs-bidi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
>> Another idea would be to allow a text property to override the character
>> type.
>
> Overlay, not text property.  The latter modifies the buffer, which is
> not what you want in this case.

Wouldn't this sort of property potentially affect lots of text in the
buffer?  Overlays are not a good choice for such a thing, and the
overlay interface is generally worse for things which really are text
properties, which these seem to be...

Text properties could be used with `with-silent-modifications' or some
such to avoid buffer modification, as is usually done with fontification.

-Miles

-- 
Logic, n. The art of thinking and reasoning in strict accordance with the
limitations and incapacities of the human misunderstanding.




* Re: improving bidi documents display
  2011-02-28  1:10       ` Miles Bader
@ 2011-02-28  4:02         ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2011-02-28  4:02 UTC (permalink / raw)
  To: Miles Bader; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

> From: Miles Bader <miles@gnu.org>
> Cc: Michael Welsh Duggan <md5i@md5i.com>,  eli.osherovich@gmail.com,  emacs-bidi@gnu.org,  emacs-devel@gnu.org
> Date: Mon, 28 Feb 2011 10:10:07 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >> Another idea would be to allow a text property to override the character
> >> type.
> >
> > Overlay, not text property.  The latter modifies the buffer, which is
> > not what you want in this case.
> 
> Wouldn't this sort of property potentially affect lots of text in the
> buffer?

Not necessarily.  For example, in a buffer with TeX stuff, the only
portions of text that would need such an overlay are the TeX
directives (which should always be rendered L2R).
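
To make "the TeX directives" concrete, here is a sketch that finds the
spans such an overlay might cover.  The pattern is an assumption, not
anything Emacs or AUCTeX defines: it counts control words, control
symbols and braces as directives and merges adjacent ones into a single
span.

```python
import re

# Hypothetical pattern: control words (\foo), control symbols (\%),
# and braces, with adjacent pieces merged into one span.
TEX_DIRECTIVE = re.compile(r"(?:\\[A-Za-z]+|\\.|[{}])+")


def directive_spans(text):
    """Return (start, end) offsets of the runs an overlay would cover."""
    return [m.span() for m in TEX_DIRECTIVE.finditer(text)]


print(directive_spans("HEBREW \\foo{english}"))
# [(7, 12), (19, 20)]
```

Only these short spans would carry the property; the Hebrew and English
text between them would be left to the normal reordering.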

> Overlays are not a good choice for such a thing, and the
> overlay interface is generally worse for things which really are text
> properties, which these seem to be...
> 
> Text properties could be used with `with-silent-modifications' or some
> such to avoid buffer modification, as is usually done with fontification.

Text properties have advantages, and they also have disadvantages.

Anyway, no design for supporting these features has been decided upon
yet.  When that design decision is made, whoever makes it will have to
consider this issue, of course.


* Re: Re: improving bidi documents display
  2011-02-27 21:15     ` Eli Zaretskii
  2011-02-28  1:10       ` Miles Bader
@ 2011-03-02  0:58       ` James Cloos
  2011-03-02 18:59         ` Eli Zaretskii
  2011-03-02  2:09       ` [emacs-bidi] " "Martin J. Dürst"
  2 siblings, 1 reply; 27+ messages in thread
From: James Cloos @ 2011-03-02  0:58 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: eli.osherovich, Michael Welsh Duggan, emacs-bidi, emacs-devel

>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:

EZ> The key to a useful discussion of these matters is to decide up
EZ> front what we want to support and what we want the text to
EZ> look like.

EZ> In this case, someone who knows about (La)TeX much more than I do
EZ> should first describe what TeX features would be useful when
EZ> typesetting bidirectional text.

Given that:

  the UAX is specific to plain text

  emacs' modes use faces to differentiate syntactically different
  text runs

  the latter is essentially the same as (invisible) markup

then one could conclude that bidi runs always should be intra-face and
never inter-face, yes?

How well does >>back to default on face changes<< translate to the level
where emacs' bidi works?

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6


* Re: Re: improving bidi documents display
  2011-02-27 21:19       ` Eli Zaretskii
@ 2011-03-02  1:50         ` "Martin J. Dürst"
  2011-03-02  4:02           ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: "Martin J. Dürst" @ 2011-03-02  1:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, oshima, emacs-bidi, emacs-devel

Hello Eli,

Sorry to be late with my answer.

On 2011/02/28 6:19, Eli Zaretskii wrote:
>> Date: Sun, 27 Feb 2011 19:34:22 +0900
>> From: "Martin J. Dürst"
>>   <duerst@it.aoyama.ac.jp>
>> CC: Eli Zaretskii<eliz@gnu.org>, Eli Osherovich<eli.osherovich@gmail.com>,
>>          emacs-bidi@gnu.org, emacs-devel@gnu.org, oshima@sw.it.aoyama.ac.jp
>>
>> one based on overlays, which is what was originally suggested for this
>> purpose by Ken'ichi Handa, but is currently not working because the
>> characters in overlays don't participate in the bidi algorithm (Eli
>> thinks that would make things too slow).
>
> If by "characters in overlays" you mean display strings,

I guess yes. To be very specific, it's the before-string and 
after-string properties mentioned on 
http://www.gnu.org/s/emacs/manual/html_node/elisp/Overlay-Properties.html.

> then they are
> not yet reordered because I didn't have enough time to make that
> happen yet,

Okay, we'll wait. But please note that we are interested not in the 
reordering of the text inside each property, but in the text inside the 
properties participating in the overall bidi algorithm and reordering of 
the underlying text.

> not because I think they would be slow.

If speed is no longer an issue, that's great.

Regards,   Martin.

> If you mean
> something else, you should know that overlays that specify other
> properties _are_ supported.
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp


* Re: [emacs-bidi] Re: improving bidi documents display
  2011-02-27 21:15     ` Eli Zaretskii
  2011-02-28  1:10       ` Miles Bader
  2011-03-02  0:58       ` James Cloos
@ 2011-03-02  2:09       ` "Martin J. Dürst"
  2011-03-02  2:39         ` Miles Bader
  2 siblings, 1 reply; 27+ messages in thread
From: "Martin J. Dürst" @ 2011-03-02  2:09 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: eli.osherovich, Michael Welsh Duggan, emacs-bidi, emacs-devel

Hello Eli,

On 2011/02/28 6:15, Eli Zaretskii wrote:
>> From: Michael Welsh Duggan<md5i@md5i.com>
>> Cc: Eli Osherovich<eli.osherovich@gmail.com>,  emacs-devel@gnu.org, emacs-bidi@gnu.org
>> Date: Sun, 27 Feb 2011 05:01:25 -0500
>>
>> Memory:  HEBREW \foo{english}
>> Levels:  11111111222222222221
>> Display: {foo{english\ WERBEH
>
> The key to a useful discussion of these matters is to decide up front
> what we want to support and what we want the text to look like.
>
> In this case, someone who knows about (La)TeX much more than I do
> should first describe what TeX features would be useful when
> typesetting bidirectional text.
>
> With that knowledge in hand, we could then think whether the example
> above is at all practical.  For example, most of the problems go away
> if paragraphs have left-to-right direction; in that case the display
> will be
>
>     WERBEH \foo{english}
>
> Maybe this is already good enough.

In some cases, it will be good enough. But if this is a word or two in a 
Hebrew paragraph, it will probably be awkward to read.

>> One simple, naive way of handling this for the various TeXs is to
>> consider all backslashes and brace characters as R characters.  This can
>> be simulated by surrounding each run of these characters by LRE PDF
>> pairs.  However, unless TeX ignores these characters completely, these
>> formatting characters would have to be removed before being processed by
>> TeX.
>
> Again, someone who knows should tell if the bidi formatting codes need
> to be removed before TeX'ing the file.
>
>> Another way of handling this would be to redefine the backslash and
>> brace characters as R characters, for purposes of the display engine.
>> Currently, I don't know if there is a way to do this in elisp.  bidi.c
>> seems to use a character table named bidi_type_table to hold this
>> information.  Currently this table is not exposed at the elisp layer, to
>> the best of my knowledge.  Maybe it would be possible to modify this
>> table in elisp, and possibly make it buffer local?
>
> I didn't expose the table to Lisp on purpose: messing with
> bidirectional properties of characters is asking for trouble.  At
> best, you will get text that will look different in any other editor;
> at worst, you could easily crash Emacs.

Getting text to look better than in another editor would be a good idea.
Crashing Emacs would be bad, but that would reveal a bug, wouldn't it?
Anyway, if it is done at all, setting bidi properties of characters would
have to be done on a buffer-by-buffer (or mode-by-mode) level, not once
and for all for a running instance. Even then, it would only take care
of very local phenomena (which may not work for multiple-level
embeddings), and it would only work one way for the whole buffer (which
may not work if there are paragraphs of varying directionality).

>> Another idea would be to allow a text property to override the character
>> type.
>
> Overlay, not text property.  The latter modifies the buffer, which is
> not what you want in this case.

Just a factual question: What does it mean when you say that properties
modify the buffer? For example, I'd expect that "modifies the buffer"
means that these modifications get saved when the buffer gets saved, but
there are lots of properties for which I have no idea how they would be
saved when the text is saved as plain text (as is usual for Emacs).


>> This feels like a very elegant, emacs-ish way to do things, but
>> an uneducated glance at the bidi code makes me feel like it would be
>> difficult to get information about text properties into this layer.
>
> You are looking at this from a wrong perspective.  The bidi reordering
> engine doesn't need to access text properties or overlays; rather, the
> display code should tell the reordering engine what to reorder.  The
> reordering code already honors point-min and point-max, so all it
> takes to do what you want is narrow the buffer to the portion of text
> we want to reorder.  These portions could be marked by an overlay; the
> display code already examines overlays as it goes about its job.

I think this would work for simple cases, but for more complex cases 
(e.g., several hierarchical levels of embeddings), it's impossible to 
set three different levels of point-min and point-max.


>> Another idea would be to use display strings including the LRE and PDF
>> characters to replace existing backslashes and braces.

This is similar to what we are doing, although we leave the syntactic 
characters (backslashes and braces for LaTeX) displayed as is, and 
insert display strings with Bidi control characters before and after.

>> However, display
>> strings do not affect the bidi algorithm at this point.
>
> I need a few rainy days to implement support for display strings.

Let's hope for some rain in your area :-).

> However, it would be a mistake to base large portions of buffer
> display on display strings, because they make redisplay too expensive.

What are 'large portions of buffer'? Our current implementation 
restricts its work to the portion of the buffer that is currently 
actually displayed. This means that if you have a 1MB file and a 100x100 
character display, only 1% of the buffer actually has overlays (but it 
may have quite a few).
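
The 1% figure is simple arithmetic, sketched below.  It assumes one
character per byte, so that a 1MB file holds about 1,000,000 characters;
for Hebrew text in UTF-8 the character count would be lower and the
fraction correspondingly higher.  The function name is made up for
illustration.

```python
def visible_fraction(buffer_chars, columns, rows):
    """Fraction of the buffer covered by one full window of text."""
    return min(1.0, (columns * rows) / buffer_chars)


fraction = visible_fraction(buffer_chars=1_000_000, columns=100, rows=100)
print(f"{fraction:.0%}")
```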

Regards,    Martin.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp




* Re: [emacs-bidi] Re: improving bidi documents display
  2011-03-02  2:09       ` [emacs-bidi] " "Martin J. Dürst"
@ 2011-03-02  2:39         ` Miles Bader
  2011-03-02  4:03           ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2011-03-02  2:39 UTC (permalink / raw)
  To: Martin J. Dürst
  Cc: eli.osherovich, Eli Zaretskii, emacs-devel, Michael Welsh Duggan,
	emacs-bidi

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> writes:
>> Overlay, not text property.  The latter modifies the buffer, which is
>> not what you want in this case.
>
> Just a factual question: What does it mean when you say that
> properties modify the buffer?

Adding/changing text properties marks the buffer as "modified."

One can avoid that effect by using the `with-silent-modifications' macro.

-Miles

-- 
Christian, n. One who follows the teachings of Christ so long as they are not
inconsistent with a life of sin.




* Re: Re: improving bidi documents display
  2011-03-02  1:50         ` "Martin J. Dürst"
@ 2011-03-02  4:02           ` Eli Zaretskii
  2011-03-04 10:34             ` [emacs-bidi] " "Martin J. Dürst"
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-02  4:02 UTC (permalink / raw)
  To: Martin J. Dürst
  Cc: eli.osherovich, md5i, oshima, emacs-bidi, emacs-devel

> Date: Wed, 02 Mar 2011 10:50:19 +0900
> From: "Martin J. Dürst"
>  <duerst@it.aoyama.ac.jp>
> CC: md5i@md5i.com, eli.osherovich@gmail.com, emacs-bidi@gnu.org,
>         emacs-devel@gnu.org, oshima@sw.it.aoyama.ac.jp
> 
> Okay, we'll wait. But please note that we are interested not in the 
> reordering of the text inside each property, but in the text inside the 
> properties participating in the overall bidi algorithm and reordering of 
> the underlying text.

For that, the display strings don't need to have text in them.  You
can just have an overlay with a special property, no?


* Re: Re: improving bidi documents display
  2011-03-02  2:39         ` Miles Bader
@ 2011-03-02  4:03           ` Eli Zaretskii
  2011-03-02  7:06             ` Miles Bader
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-02  4:03 UTC (permalink / raw)
  To: Miles Bader; +Cc: eli.osherovich, emacs-devel, md5i, emacs-bidi

> From: Miles Bader <miles@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  eli.osherovich@gmail.com,  Michael Welsh Duggan <md5i@md5i.com>,  emacs-bidi@gnu.org,  emacs-devel@gnu.org
> Date: Wed, 02 Mar 2011 11:39:22 +0900
> 
> "Martin J. Dürst" <duerst@it.aoyama.ac.jp> writes:
> >> Overlay, not text property.  The latter modifies the buffer, which is
> >> not what you want in this case.
> >
> > Just a factual question: What does it mean when you say that
> > properties modify the buffer?
> 
> Adding/changing text properties marks the buffer as "modified."
> 
> One can avoid that effect by using the `with-silent-modifications' macro.

Yes, but then copying the text copies the properties with it, which is
not what you want in this case, in general.


* Re: improving bidi documents display
  2011-03-02  4:03           ` Eli Zaretskii
@ 2011-03-02  7:06             ` Miles Bader
  2011-03-02 18:39               ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2011-03-02  7:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
>> Adding/changing text properties marks the buffer as "modified."
>> 
>> One can avoid that effect by using the `with-silent-modifications' macro.
>
> Yes, but then copying the text copies the properties with it, which is
> not what you want in this case, in general.

Sure, but that's a general issue in Emacs, so there are already
mechanisms in place to help deal with it
(e.g. yank-excluded-properties).

-Miles

-- 
Back, n. That part of your friend which it is your privilege to contemplate in
your adversity.




* Re: improving bidi documents display
  2011-03-02  7:06             ` Miles Bader
@ 2011-03-02 18:39               ` Eli Zaretskii
  2011-03-03  1:32                 ` Miles Bader
                                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-02 18:39 UTC (permalink / raw)
  To: Miles Bader; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

> From: Miles Bader <miles@gnu.org>
> Cc: eli.osherovich@gmail.com, emacs-devel@gnu.org, md5i@md5i.com,
>  emacs-bidi@gnu.org
> Date: Wed, 02 Mar 2011 16:06:10 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >> Adding/changing text properties marks the buffer as "modified."
> >> 
> >> One can avoid that effect by using the `with-silent-modifications' macro.
> >
> > Yes, but then copying the text copies the properties with it, which is
> > not what you want in this case, in general.
> 
> Sure, but that's a general issue in Emacs, so there are already
> mechanisms in place to help deal with it
> (e.g. yank-excluded-properties).

This one's different, believe me: no other text property changes the
_order_ of characters on display in creative ways.  It could easily
render the text illegible, under just the right circumstances.  Other
text properties are either non-intrusive, or are almost immediately
fixed by JIT Lock, or are simply rare enough to not get in our way.
This one, if used to implement partial reordering of buffer text, will
be ubiquitous in any buffer with program sources that include bidi
comments and strings, i.e. we will see it _a_lot_.




* Re: Re: improving bidi documents display
  2011-03-02  0:58       ` James Cloos
@ 2011-03-02 18:59         ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-02 18:59 UTC (permalink / raw)
  To: James Cloos; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

> From: James Cloos <cloos@jhcloos.com>
> Cc: Michael Welsh Duggan <md5i@md5i.com>,  eli.osherovich@gmail.com,  emacs-bidi@gnu.org,  emacs-devel@gnu.org
> Copyright: Copyright 2011 James Cloos
> OpenPGP-Fingerprint: E9E9 F828 61A4 6EA9 0F2B  63E7 997A 9F17 ED7D AEA6
> Date: Tue, 01 Mar 2011 19:58:44 -0500
> 
>   the UAX is specific to plain text

Not entirely true.  It's just that in most structured text, the
reordering should affect only certain portions of the text.  But where
we do reorder, UAX#9 should still be in effect.

>   emacs' modes uses faces to differentiate syntactically different
>   text runs

Not true at all.  You are probably thinking about font lock, but faces
are also used for other purposes, like region highlight, mouse
highlight, hl-line mode, etc.  These other uses do cross bidi runs
quite easily and frequently.

>   the latter is essentially the same as (invisible) markup

Who is?

> then one could conclude that bidi runs always should be intra-face and
> never inter-face, yes?

No, see above.

> How well does >>back to default on face changes<< translate to the level
> where emacs' bidi works?

Not at all.  The reordering engine works on a lower level than the
face-sensitive code.  In a nutshell, reordering is implemented as an
abstract "get to next character" operation that is what the display
iterator does when it walks the visible portion of the buffer and
prepares it for display.  Faces and the rest are considered only
_after_ getting to the next character.  By the time the iterator
notices that the next character has a different face, the bidi
reordering has effectively already happened.
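
To make the ordering of these two steps concrete, here is a toy Python model.  It is not Emacs code and not a UAX#9 implementation: it assumes a left-to-right paragraph and simply reverses each run of strong right-to-left characters.  The names `visual_order`, `iterate_for_display` and `face_at` are invented for the illustration.

```python
import unicodedata


def visual_order(text):
    """Buffer indices in display order for an LTR paragraph.

    Toy rule: each maximal run of strong RTL characters (bidi class
    R or AL) is reversed; everything else keeps its position.
    """
    order, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and unicodedata.bidirectional(text[j]) in ("R", "AL"):
            j += 1
        if j > i:
            order.extend(range(j - 1, i - 1, -1))  # reversed RTL run
            i = j
        else:
            order.append(i)
            i += 1
    return order


def iterate_for_display(text, face_at):
    """Yield (char, face); the face is looked up only after reordering."""
    for idx in visual_order(text):
        yield text[idx], face_at(idx)


text = "ab\u05d0\u05d1\u05d2cd"          # two Latin, three Hebrew, two Latin
region = lambda i: "region" if 3 <= i < 5 else "default"
for ch, face in iterate_for_display(text, region):
    print(repr(ch), face)
```

In this model a face covering buffer positions 3 and 4 ends up on the first two Hebrew letters shown, because the face is resolved per character after the iterator has already picked the next character in visual order.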


* Re: improving bidi documents display
  2011-03-02 18:39               ` Eli Zaretskii
@ 2011-03-03  1:32                 ` Miles Bader
  2011-03-03  4:07                   ` Eli Zaretskii
  2011-03-04  3:58                 ` Stefan Monnier
  2011-03-04  4:25                 ` Miles Bader
  2 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2011-03-03  1:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
>> Sure, but that's a general issue in Emacs, so there are already
>> mechanisms in place to help deal with it
>> (e.g. yank-excluded-properties).
>
> This one's different, believe me: no other text property changes the
> _order_ of characters on display in creative ways.  It could easily
> render the text illegible, under just the right circumstances.

But isn't the "changed order" natural for the characters it's attached
to?

-miles

-- 
XML is like violence.  If it doesn't solve your problem, you're not
using enough of it.




* Re: improving bidi documents display
  2011-03-03  1:32                 ` Miles Bader
@ 2011-03-03  4:07                   ` Eli Zaretskii
  2011-03-03  6:11                     ` "Martin J. Dürst"
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-03  4:07 UTC (permalink / raw)
  To: Miles Bader; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

> From: Miles Bader <miles@gnu.org>
> Cc: eli.osherovich@gmail.com,  md5i@md5i.com,  emacs-bidi@gnu.org,  emacs-devel@gnu.org
> Date: Thu, 03 Mar 2011 10:32:49 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >> Sure, but that's a general issue in Emacs, so there are already
> >> mechanisms in place to help deal with it
> >> (e.g. yank-excluded-properties).
> >
> > This one's different, believe me: no other text property changes the
> > _order_ of characters on display in creative ways.  It could easily
> > render the text illegible, under just the right circumstances.
> 
> But isn't the "changed order" natural for the characters it's attached
> to?

Only in the context of the kind of text (e.g., TeX) it was copied
from.


* Re: Re: improving bidi documents display
  2011-03-03  4:07                   ` Eli Zaretskii
@ 2011-03-03  6:11                     ` "Martin J. Dürst"
  2011-03-03 10:40                       ` [emacs-bidi] " Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: "Martin J. Dürst" @ 2011-03-03  6:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, emacs-devel, emacs-bidi, Miles Bader

On 2011/03/03 13:07, Eli Zaretskii wrote:
>> From: Miles Bader<miles@gnu.org>
>> Cc: eli.osherovich@gmail.com,  md5i@md5i.com,  emacs-bidi@gnu.org,  emacs-devel@gnu.org
>> Date: Thu, 03 Mar 2011 10:32:49 +0900
>>
>> Eli Zaretskii<eliz@gnu.org>  writes:
>>>> Sure, but that's a general issue in Emacs, so there are already
>>>> mechanisms in place to help deal with it
>>>> (e.g. yank-excluded-properties).
>>>
>>> This one's different, believe me: no other text property changes the
>>> _order_ of characters on display in creative ways.  It could easily
>>> render the text illegible, under just the right circumstances.
>>
>> But isn't the "changed order" natural for the characters it's attached
>> to?
>
> Only in the context of the kind of text (e.g., TeX) it was copied
> from.

The copying may work if the feature is switched on for all buffers. The 
reason for this is that things have to be reevaluated/fixed anyway every 
time some buffer change (e.g. insertion or deletion of a character) 
happens. So if the text is copied to a buffer that doesn't do any 
explicit reordering on top of the bidi algorithm, the special 
overlays/properties/whatever will just be purged out.

On the other hand, it may also make sense to copy a piece of TeX *with* 
the explicit TeX-specific reordering, because otherwise it may look like 
garbage. Please remember that the goal of all these activities is to 
make TeX or HTML/XML *legible*, because it's often illegible just with 
the basic bidi algorithm.


Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp


* Re: [emacs-bidi] Re: improving bidi documents display
  2011-03-03  6:11                     ` "Martin J. Dürst"
@ 2011-03-03 10:40                       ` Eli Zaretskii
  2011-03-04 10:34                         ` "Martin J. Dürst"
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-03 10:40 UTC (permalink / raw)
  To: Martin J. Dürst; +Cc: eli.osherovich, md5i, emacs-devel, emacs-bidi, miles

> Date: Thu, 03 Mar 2011 15:11:06 +0900
> From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
> CC: Miles Bader <miles@gnu.org>, eli.osherovich@gmail.com, md5i@md5i.com,
>         emacs-bidi@gnu.org, emacs-devel@gnu.org
> 
> >> But isn't the "changed order" natural for the characters it's attached
> >> to?
> >
> > Only in the context of the kind of text (e.g., TeX) it was copied
> > from.
> 
> The copying may work if the feature is switched on for all buffers.

What feature is that?

> The 
> reason for this is that things have to be reevaluated/fixed anyway every 
> time some buffer change (e.g. insertion or deletion of a character) 
> happens. So if the text is copied to a buffer that doesn't do any 
> explicit reordering on top of the bidi algorithm, the special 
> overlays/properties/whatever will just be purged out.

This is probably a misunderstanding, prompted by the fact that we are
discussing an imaginary feature.

I was talking about a special text property which tells the display
engine to reorder the characters covered by the property.  By default,
text properties are yanked together with the text.  And since yanking
text (as any other insertion) triggers redisplay, the display engine
_will_ notice this special property and _will_ reorder the text it
covers.

As for a buffer that "doesn't do any explicit reordering", I'm not
sure what you mean.  The current plans are to turn on the bidi
reordering in all buffers by default.  Whether this constitutes
"explicit reordering", I'm not sure, so I don't understand what you
mean, and I also don't see how that is relevant to the effect of
copying the text properties.

If you suggest that this specific property should be stripped off by
yanking, i.e. to add them to yank-excluded-properties, then it's
possible, but not necessarily DWIM-ish, because yanking in the same
buffer would need to leave these properties intact.




* Re: improving bidi documents display
  2011-03-02 18:39               ` Eli Zaretskii
  2011-03-03  1:32                 ` Miles Bader
@ 2011-03-04  3:58                 ` Stefan Monnier
  2011-03-04  8:04                   ` Eli Zaretskii
  2011-03-04  4:25                 ` Miles Bader
  2 siblings, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2011-03-04  3:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, emacs-devel, emacs-bidi, Miles Bader

> This one's different, believe me: no other text property changes the
> _order_ of characters on display in creative ways.  It could easily
> render the text illegible, under just the right circumstances.  Other
> text properties are either non-intrusive, or are almost immediately
> fixed by JIT Lock, or are simply rare enough to not get in our way.

But isn't it the case that the properties we'd add in this case would
also be added via jit-lock?  So they'd very naturally belong in
yank-excluded-properties.


        Stefan





* Re: improving bidi documents display
  2011-03-02 18:39               ` Eli Zaretskii
  2011-03-03  1:32                 ` Miles Bader
  2011-03-04  3:58                 ` Stefan Monnier
@ 2011-03-04  4:25                 ` Miles Bader
  2011-03-04  9:52                   ` Eli Zaretskii
  2 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2011-03-04  4:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
> This one's different, believe me: no other text property changes the
> _order_ of characters on display in creative ways.  It could easily
> render the text illegible, under just the right circumstances.  Other
> text properties are either non-intrusive, or are almost immediately
> fixed by JIT Lock, or are simply rare enough to not get in our way.
> This one, if used to implement partial reordering of buffer text, will
> be ubiquitous in any buffer with program sources that include bidi
> comments and strings, i.e. we will see it _a_lot_.

I'm a bit confused by how an ordering that's "good" in (e.g.) a TeX
document, would be "illegible" in another context...

Can you give an example of such a case?

Thanks,

-Miles

-- 
`There are more things in heaven and earth, Horatio,
 Than are dreamt of in your philosophy.'




* Re: improving bidi documents display
  2011-03-04  3:58                 ` Stefan Monnier
@ 2011-03-04  8:04                   ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-04  8:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eli.osherovich, md5i, emacs-devel, emacs-bidi, miles

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Miles Bader <miles@gnu.org>,  eli.osherovich@gmail.com,  md5i@md5i.com,  emacs-bidi@gnu.org,  emacs-devel@gnu.org
> Date: Thu, 03 Mar 2011 22:58:32 -0500
> 
> > This one's different, believe me: no other text property changes the
> > _order_ of characters on display in creative ways.  It could easily
> > render the text illegible, under just the right circumstances.  Other
> > text properties are either non-intrusive, or are almost immediately
> > fixed by JIT Lock, or are simply rare enough to not get in our way.
> 
> But isn't it the case that the properties we'd add in this case would
> also be added via jit-lock?

Why?  What's jit-lock got to do with this?


* Re: improving bidi documents display
  2011-03-04  4:25                 ` Miles Bader
@ 2011-03-04  9:52                   ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2011-03-04  9:52 UTC (permalink / raw)
  To: Miles Bader; +Cc: eli.osherovich, md5i, emacs-bidi, emacs-devel

> From: Miles Bader <miles@gnu.org>
> Cc: eli.osherovich@gmail.com, md5i@md5i.com, emacs-bidi@gnu.org,
>  emacs-devel@gnu.org
> Date: Fri, 04 Mar 2011 13:25:17 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> > This one's different, believe me: no other text property changes the
> > _order_ of characters on display in creative ways.  It could easily
> > render the text illegible, under just the right circumstances.  Other
> > text properties are either non-intrusive, or are almost immediately
> > fixed by JIT Lock, or are simply rare enough to not get in our way.
> > This one, if used to implement partial reordering of buffer text, will
> > be ubiquitous in any buffer with program sources that include bidi
> > comments and strings, i.e. we will see it _a_lot_.
> 
> I'm a bit confused by how an ordering that's "good" in (e.g.) a TeX
> document, would be "illegible" in another context...
> 
> Can you give an example of such a case?

Not really, since we are talking about an idea, not about a feature
that has a detailed design which we could reason about.  Any example I
give can be countered by saying something like "but if we
implement the idea such and such, then this particular problem won't
happen".

I can, however, show you why I'm afraid of leaving the "reordering"
properties on the text.  The reason is that the visual results of
reordering are highly context-dependent.  Changing a single character
very far away from the locus of reordering can completely change the
visual appearance of an unrelated portion of text.

You can try the below in Emacs 24, after setting
bidi-display-reordering non-nil (but be sure to do that in a buffer
other than *scratch* or any other buffer whose major mode is
programming-language oriented, because these are forced to have a
left-to-right paragraph direction).

Here's an example.  Type this text, left to right:

 abcdef \foo{ABCDEF} xyz

(As usual, lower-case letters are Latin, upper-case are in some R2L
script, such as Arabic or Hebrew.)

The above will be displayed like this:

 abcdef \foo{FEDCBA} xyz

So far so good, right?  And looks "good" in a TeX document, right?

Now go to the beginning of the line (C-a) and type a single R2L
letter (W).  The resulting display will change dramatically:

                      xyz {FEDCBA}abcdef \fooW

The reason, in this case, for such a significant change is that
directionality of the first character of a paragraph determines the
"base direction" of that paragraph.  (Actually, only "strong
directional characters" count, those which have "L" or "R" bidi
category in the Unicode data base.)
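
The first-strong-character rule can be sketched in a few lines of Python using the standard `unicodedata` module.  This is a simplified illustration, not what Emacs runs: it ignores explicit directional controls, and it also treats the "AL" (Arabic letter) class as right-to-left, as UAX#9 does.  The function name is made up for the example.

```python
import unicodedata


def base_direction(paragraph: str, default: str = "L") -> str:
    """Return "L" or "R" from the first strong directional character.

    Characters with any other bidi class (digits, spaces, punctuation)
    are skipped; if no strong character is found, `default` is used.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default


line = "abcdef \\foo{\u05d0\u05d1\u05d2} xyz"   # Hebrew letters inside \foo{}
print(base_direction(line))               # L
print(base_direction("\u05d5" + line))    # R: one RTL letter typed at the start
```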

There are other examples, but the underlying reason is the same: bidi
reordering is highly dependent on the surrounding context.  Copy a
region of text to another context, and you might have undesirable
results.  If these results are due to characters that are part of the
buffer text (as in the above example), one can cope with that by
editing after the yank.  But if the results are due to some invisible
factor like text properties, this is tougher on users.

That is why I prefer that these invisible factors not be copied with
the text.  If needed, they will be re-applied according to the rules
of the target buffer.


* Re: [emacs-bidi] Re: improving bidi documents display
  2011-03-02  4:02           ` Eli Zaretskii
@ 2011-03-04 10:34             ` "Martin J. Dürst"
  0 siblings, 0 replies; 27+ messages in thread
From: "Martin J. Dürst" @ 2011-03-04 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, oshima, emacs-bidi, emacs-devel

Hello Eli,

On 2011/03/02 13:02, Eli Zaretskii wrote:
>> Date: Wed, 02 Mar 2011 10:50:19 +0900
>> From: "Martin J. Dürst"
>>   <duerst@it.aoyama.ac.jp>
>> CC: md5i@md5i.com, eli.osherovich@gmail.com, emacs-bidi@gnu.org,
>>          emacs-devel@gnu.org, oshima@sw.it.aoyama.ac.jp
>>
>> Okay, we'll wait. But please note that we are interested not in the
>> reordering of the text inside each property, but in the text inside the
>> properties participating in the overall bidi algorithm and reordering of
>> the underlying text.
>
> For that, the display strings don't need to have text in them.  You
> can just have an overlay with a special property, no?

That would be possible. It would essentially mean simulating Bidi 
control characters/structure with overlays. As an example, to simulate 
embeddings and override (I'll call these bidi ranges from now on), we 
would need an overlay that covers the bidi range, with a new property 
(let's name it bidi-range just for the moment) that takes four different 
values corresponding to LRE, RLE, LRO, and RLO. Because the overlay 
indicates the extent of the bidi range, there is no need for something 
like the PDF character.
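
To show what such a range would be simulating, here is a small Python sketch that produces the equivalent control-character form: the opening control for the chosen value, the covered text, then PDF.  The function name and the dictionary are invented for the illustration; only the code points come from Unicode.

```python
# Explicit embedding/override controls and their terminator.
OPENERS = {
    "LRE": "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "RLE": "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "LRO": "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "RLO": "\u202e",  # RIGHT-TO-LEFT OVERRIDE
}
PDF = "\u202c"        # POP DIRECTIONAL FORMATTING


def wrap_range(text: str, start: int, end: int, kind: str) -> str:
    """Return text with [start, end) wrapped as a bidi range of `kind`."""
    if kind not in OPENERS:
        raise ValueError(f"unknown bidi range kind: {kind!r}")
    if not 0 <= start <= end <= len(text):
        raise ValueError("range outside text")
    return text[:start] + OPENERS[kind] + text[start:end] + PDF + text[end:]


# Force the argument of \L{...} to be laid out as a left-to-right embedding.
print(ascii(wrap_range("\\L{some text}", 3, 12, "LRE")))
```

An overlay carries its own start and end, which is why the PDF inserted here has no counterpart in the overlay-based scheme.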

As an equivalent for LRM and RLM, the most straightforward way to 
implement them would be a property (maybe the same as above) that can 
take the values LRM-before, LRM-after, RLM-before, and RLM-after.
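
In control-character terms, the mark values would correspond to something like the following Python sketch.  The function and parameter names are hypothetical; the code points for LRM and RLM are the standard ones.

```python
MARKS = {
    "LRM": "\u200e",  # LEFT-TO-RIGHT MARK
    "RLM": "\u200f",  # RIGHT-TO-LEFT MARK
}


def add_marks(text, start, end, before=None, after=None):
    """Return text with optional marks placed before and/or after [start, end)."""
    pre = MARKS[before] if before else ""
    post = MARKS[after] if after else ""
    return text[:start] + pre + text[start:end] + post + text[end:]


# Make a neutral "-" behave like a strong left-to-right character
# by placing an LRM on each side of it.
print(ascii(add_marks("a-b", 1, 2, before="LRM", after="LRM")))
```

Taking `before` and `after` as separate arguments lets one call place a mark on both sides of the same range.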

The above is a very direct simulation; I don't know how much easier it 
would be to implement than just Bidi processing the before and after 
text in an overlay.

There are some issues with the proposal above. First, if two overlays 
indicating bidi ranges overlap without one of them completely being 
contained in the other, then it has to be defined exactly what that 
means. (It's not possible to have such a case when using bidi control 
characters.) Second, there may be a case where one wants a Bidi mark 
(LRM or RLM) both before and after a character (or range of characters). 
Actually, that's quite a frequent case. That could be addressed by 
having bidi-mark-before and bidi-mark-after properties (or two separate 
overlays).
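
The first issue, ranges that overlap without nesting, is easy to detect mechanically.  The Python sketch below models each bidi range as a half-open (start, end) pair; the function name is made up, and what should happen to the pairs it reports is exactly the open question.

```python
def crossing_pairs(ranges):
    """Return pairs of ranges that overlap without one containing the other.

    Ranges that merely touch, or share a start or end point, count as
    properly nested or disjoint and are not reported.
    """
    bad = []
    for i, first in enumerate(ranges):
        for second in ranges[i + 1:]:
            lo, hi = sorted([first, second])
            if lo[0] < hi[0] < lo[1] < hi[1]:
                bad.append((lo, hi))
    return bad


print(crossing_pairs([(0, 10), (2, 4)]))   # []  (nested)
print(crossing_pairs([(0, 5), (3, 8)]))    # [((0, 5), (3, 8))]
```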

It may also be possible to have some more elaborate bidi properties on 
overlays. One example would be to have a property to set the bidi class 
of a character. Strong LTR and strong RTL in particular would be quite 
helpful. Currently, we simulate them by having LRMs or RLMs before and 
after a character. Another idea is to have something equivalent to bidi 
isolation as currently being discussed for HTML5/CSS3. See
http://www.w3.org/International/docs/html-bidi-requirements/#bidi-isolation 
for a problem description, 
http://dev.w3.org/html5/spec/Overview.html#the-bdi-element for the HTML5 
solution, and
http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi for the 
corresponding piece of technology in CSS. Bidi isolation can be 
simulated with the existing bidi control characters, but it's necessary 
to look at the bidi classes of surrounding characters.

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp




* Re: Re: improving bidi documents display
  2011-03-03 10:40                       ` [emacs-bidi] " Eli Zaretskii
@ 2011-03-04 10:34                         ` "Martin J. Dürst"
  0 siblings, 0 replies; 27+ messages in thread
From: "Martin J. Dürst" @ 2011-03-04 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eli.osherovich, md5i, emacs-devel, emacs-bidi, miles

Hello Eli,

On 2011/03/03 19:40, Eli Zaretskii wrote:
>> Date: Thu, 03 Mar 2011 15:11:06 +0900
>> From: "Martin J. Dürst"<duerst@it.aoyama.ac.jp>
>> CC: Miles Bader<miles@gnu.org>, eli.osherovich@gmail.com, md5i@md5i.com,
>>          emacs-bidi@gnu.org, emacs-devel@gnu.org
>>
>>>> But isn't the "changed order" natural for the characters it's attached
>>>> to?
>>>
>>> Only in the context of the kind of text (e.g., TeX) it was copied
>>> from.
>>
>> The copying may work if the feature is switched on for all buffers.
>
> What feature is that?

Sorry to be unclear. The feature that makes bidi-containing LaTeX or 
XML/HTML readable. It is a feature similar to syntax coloring.

>> The
>> reason for this is that things have to be reevaluated/fixed anyway every
>> time some buffer change (e.g. insertion or deletion of a character)
>> happens. So if the text is copied to a buffer that doesn't do any
>> explicit reordering on top of the bidi algorithm, the special
>> overlays/properties/whatever will just be purged out.
>
> This is probably a misunderstanding, prompted by the fact that we are
> discussing an imaginary feature.

Well, if you paste (yank in emacs terminology) text into a buffer, 
syntax coloring gets updated as necessary. The same would happen (or 
currently happens with our implementation) for the code that tweaks the 
bidi display.

> I was talking about a special text property which tells the display
> engine to reorder the characters covered by the property.  By default,
> text properties are yanked together with the text.  And since yanking
> text (as any other insertion) triggers redisplay, the display engine
> _will_ notice this special property and _will_ reorder the text it
> covers.

Yes, but before that, there is some hook triggered that goes and tweaks 
these properties to fix the context where the text was yanked.

Let's take an example. Let's say I copy some text from a buffer with XML 
mode to a buffer with LaTeX mode. Then there will be some bidi tweaking 
properties in the text that's copied, and they will be copied with the 
text, but when they get yanked, the hook code will throw them out and 
put in others to tweak the text according to what's thought best for LaTeX.

> As for a buffer that "doesn't do any explicit reordering", I'm not
> sure what you mean.

I mean e.g. a very simple plain text buffer that has no bidi fixing hook 
activated (not even a simple one for cleaning out properties yanked in 
from elsewhere).


> The current plans are to turn on the bidi
> reordering in all buffers by default.  Whether this constitutes
> "explicit reordering", I'm not sure, so I don't understand what you
> mean, and I also don't see how that is relevant to the effect of
> copying the text properties.

Turning the UAX #9 bidi reordering on in all buffers is just great. But 
it's not what I mean. What I mean is a layer on top of it (and we are 
currently discussing what exactly to put on top, and how).

To go back to the syntax coloring example (which may not be perfect, but 
I hope it gets the point across), the equivalent would be:
- Color display is switched on in all buffers
   <=> bidi reordering is switched on in all buffers
- Syntax coloring is switched on only in some buffers
   (e.g. not in plain text buffers)
   <=> Bidi fixup logic is switched on only in some buffers
- Copying text from a buffer with syntax coloring into a buffer without
   syntax coloring leaves leftovers of syntax coloring in the target
   buffer (*)
   <=> Copying text with bidi fixup properties from a buffer with bidi
   fixup enabled (e.g. LaTeX mode buffer or XML/HTML mode buffer) into
   a buffer without bidi fixup enabled will leave some leftovers of bidi
   fixup in the target buffer (#)

What (I'm thinking) you wrote in an earlier mail is that if we use 
properties, we get the problem indicated at (#). What I tried to explain 
in my previous mail was that you won't, as long as the conditions are 
similar to those for syntax coloring. [I'm not at all sure syntax 
coloring uses properties, because I don't know the internals of emacs 
well enough, but I hope that it can provide an example and serve as a 
parallel.]

> If you suggest that this specific property should be stripped off by
> yanking, i.e. to add them to yank-excluded-properties, then it's
> possible, but not necessarily DWIM-ish, because yanking in the same
> buffer would need to leave these properties intact.

No, actually it wouldn't (necessarily) need to leave them intact, 
because in a new context, it's easily possible that they need to be 
fixed. Again here the parallel to syntax coloring should work.

Regards,   Martin.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp


Thread overview: 27+ messages
2011-02-24 12:32 improving bidi documents display Eli Osherovich
2011-02-24 19:54 ` Eli Zaretskii
2011-02-27 10:01   ` Michael Welsh Duggan
2011-02-27 10:34     ` "Martin J. Dürst"
2011-02-27 21:19       ` Eli Zaretskii
2011-03-02  1:50         ` "Martin J. Dürst"
2011-03-02  4:02           ` Eli Zaretskii
2011-03-04 10:34             ` [emacs-bidi] " "Martin J. Dürst"
2011-02-27 21:15     ` Eli Zaretskii
2011-02-28  1:10       ` Miles Bader
2011-02-28  4:02         ` Eli Zaretskii
2011-03-02  0:58       ` James Cloos
2011-03-02 18:59         ` Eli Zaretskii
2011-03-02  2:09       ` [emacs-bidi] " "Martin J. Dürst"
2011-03-02  2:39         ` Miles Bader
2011-03-02  4:03           ` Eli Zaretskii
2011-03-02  7:06             ` Miles Bader
2011-03-02 18:39               ` Eli Zaretskii
2011-03-03  1:32                 ` Miles Bader
2011-03-03  4:07                   ` Eli Zaretskii
2011-03-03  6:11                     ` "Martin J. Dürst"
2011-03-03 10:40                       ` [emacs-bidi] " Eli Zaretskii
2011-03-04 10:34                         ` "Martin J. Dürst"
2011-03-04  3:58                 ` Stefan Monnier
2011-03-04  8:04                   ` Eli Zaretskii
2011-03-04  4:25                 ` Miles Bader
2011-03-04  9:52                   ` Eli Zaretskii
