From: Michael Welsh Duggan <md5i@md5i.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Eli Osherovich <eli.osherovich@gmail.com>,
	emacs-bidi@gnu.org, emacs-devel@gnu.org
Subject: Re: improving bidi documents display
Date: Sun, 27 Feb 2011 05:01:25 -0500
Message-ID: <87wrklpzii.fsf@maru.md5i.com>
In-Reply-To: <837hcpryxr.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 24 Feb 2011 21:54:08 +0200")

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Thu, 24 Feb 2011 14:32:35 +0200
>> From: Eli Osherovich <eli.osherovich@gmail.com>
>> 
>> At the moment (using rev. 103371)  I can edit Hebrew/English LaTeX
>> documents, however, the way they are displayed in Emacs is not perfect.
>> Please look at the file attached as you can see any English text that
>> appears inside a Hebrew paragraph requires certain decorations around it
>> (e.g., \L{some English text}) these decorations are displayed in an ugly
>> fashion.
>
> Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
> (which is what Emacs implements for bidirectional display) does not
> produce good results with LaTeX (and with other kinds of markup).
>
>> Is there anything that can be done about it?
>
> Something _should_ be done, for sure.  But for that, Someone™ should
> figure out how this kind of problems could be solved using Emacs
> display features.  Any solution will probably involve reordering only
> parts of text, but a more detailed design suggestion is needed before
> it can be implemented.  People are welcome to try to tackle this,
> because I'm still busy with low-level bidi support of plain text.

I'd like to talk about this problem a little, just to get a little
understanding of the problem space.  Please be warned that although I
have read through UAX#9 a few times, and have been following (as best I
can) Eli's bidi work, I am still very much a novice, and am apt to make
improper assumptions, or misunderstand how things are supposed to work.

In the examples below, I will use the convention from the UAX#9
document: a capital letter represents an R type character, and a
lower-case letter represents an L type character.  Formatting codes will
be typed as <RLE>, <PDF>, etc.

So, the example being used was:

Memory:  HEBREW \foo{english}
Levels:  11111111222222222221
Display: {foo{english\ WERBEH

Here the paragraph embedding level is 1 (odd, RtL) since the first
character is an R character.  The backslash, braces, and spaces are N
characters.  The N character sequence " \" takes on the current
embedding direction (1) based on rule N2.  The open brace gets level 2
based on rule N1, and the close brace gets level 1 again based on rule
N2.  Note that the close brace appears as its mirrored glyph due to rule
L4.

(Rule N1 states that runs of neutral characters between strong
characters of the same direction take on that direction.  Rule N2 states
that otherwise, they get the embedding direction.)

Here is another example:

Memory:  HEBREW \foo{HEBREW}
Levels:  1111111122211111111
Display: {WERBEH}foo\ WERBEH

In this case, note that both of the braces are mirrored in the display.
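
As a sanity check, the levels and display strings in both examples can
be reproduced with a toy resolver.  This is only a sketch under the
convention above (upper case = R, lower case = L, everything else
neutral).  It covers just rules P2/P3, N1/N2, I1/I2, L2 and L4, and it
ignores numbers, weak types, and explicit formatting codes.  The
function names are made up for illustration and have nothing to do with
the code in bidi.c.

```python
# Toy model only: upper case = R, lower case = L, anything else = neutral.
MIRROR = {"{": "}", "}": "{", "(": ")", ")": "(", "[": "]", "]": "["}


def char_type(ch):
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text):
    """Return one embedding level per character of TEXT."""
    types = [char_type(c) for c in text]
    # P2/P3: the first strong character sets the paragraph level.
    first = next((t for t in types if t != "N"), "L")
    para = 1 if first == "R" else 0
    embed = "R" if para % 2 else "L"
    resolved = list(types)
    i, n = 0, len(types)
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        # Paragraph edges count as the embedding direction (sor/eor).
        before = types[i - 1] if i > 0 else embed
        after = types[j] if j < n else embed
        # N1: same strong type on both sides; N2: otherwise embedding.
        fill = before if before == after else embed
        resolved[i:j] = [fill] * (j - i)
        i = j
    # I1/I2: text running against the embedding direction goes up a level.
    return [para if t == embed else para + 1 for t in resolved]


def display(text):
    """Return TEXT in visual order, with mirrored glyphs substituted."""
    levels = resolve_levels(text)
    cells = list(zip(text, levels))
    if not cells:
        return ""
    highest = max(levels)
    lowest_odd = min((l for l in levels if l % 2), default=highest + 1)
    # L2: from the highest level down to the lowest odd level, reverse
    # every contiguous run of characters at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(cells):
            if cells[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(cells) and cells[j][1] >= level:
                j += 1
            cells[i:j] = cells[i:j][::-1]
            i = j
    # L4: characters at an odd (RtL) level show their mirrored glyph.
    return "".join(MIRROR.get(c, c) if l % 2 else c for c, l in cells)
```

Calling resolve_levels and display on the two Memory lines above gives
the Levels and Display lines shown for each example.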

One simple, naive way of handling this for the various TeXs is to
consider all backslashes and brace characters as L characters.  This can
be simulated by surrounding each run of these characters by LRE PDF
pairs.  However, unless TeX ignores these characters completely, these
formatting characters would have to be removed before being processed by
TeX.
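
To make the wrapping idea concrete, here is a rough sketch of both
halves: inserting the LRE PDF pairs around each run of backslashes and
braces, and stripping them again before the text goes to TeX.  The
function names are hypothetical, and the stripping step assumes the
document contains no LRE or PDF characters of its own, since it removes
all of them.

```python
import re

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# A "run" is one or more consecutive backslashes or braces.
MARKUP_RUN = re.compile(r"[\\{}]+")


def wrap_markup(text):
    """Surround each run of backslashes and braces with LRE ... PDF."""
    return MARKUP_RUN.sub(lambda m: LRE + m.group(0) + PDF, text)


def strip_formatting(text):
    """Remove every LRE and PDF, including any the author typed by hand."""
    return text.replace(LRE, "").replace(PDF, "")
```

For a source that had no formatting characters to begin with,
strip_formatting(wrap_markup(text)) returns the original text.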

Another way of handling this would be to redefine the backslash and
brace characters as L characters, for purposes of the display engine.
I don't know if there is currently a way to do this in elisp.  bidi.c
seems to use a character table named bidi_type_table to hold this
information.  To the best of my knowledge, this table is not exposed at
the elisp layer.  Maybe it would be possible to modify this table in
elisp, and possibly make it buffer local?
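
What I have in mind is roughly a lookup with an overlay on top of the
default types.  The sketch below uses Python's unicodedata module as a
stand-in for the default table, and a plain dict as a hypothetical
buffer-local override.  It gives the markup characters the strong type
L, which is the same effect the LRE PDF wrapping has.  None of these
names correspond to anything in bidi.c or in elisp.

```python
import unicodedata

# Hypothetical buffer-local override for a TeX buffer: treat the markup
# characters as strong left-to-right characters.
TEX_OVERRIDES = {"\\": "L", "{": "L", "}": "L"}


def bidi_type(ch, overrides=None):
    """Return the bidi class of CH, consulting OVERRIDES first."""
    if overrides and ch in overrides:
        return overrides[ch]
    # Fall back to the Unicode default; backslash and the braces are
    # "ON" (other neutral) there.
    return unicodedata.bidirectional(ch)
```

Without the override, bidi_type returns "ON" for the backslash and the
braces; with TEX_OVERRIDES it returns "L" for them and leaves every
other character alone.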

Another idea would be to allow a text property to override the character
type.  This feels like a very elegant, emacs-ish way to do things, but
an uneducated glance at the bidi code makes me feel like it would be
difficult to get information about text properties into this layer.

Yet another idea would be to use display strings including the LRE and
PDF characters to replace existing backslashes and braces.  However,
display strings do not affect the bidi algorithm at this point.

I'm really starting to ramble at this point, so I think I will send
these musings to see what Eli and others think.

-- 
Michael Welsh Duggan
(md5i@md5i.com)


