From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: Michael Welsh Duggan <md5i@md5i.com>
Cc: Eli Osherovich <eli.osherovich@gmail.com>,
	oshima@sw.it.aoyama.ac.jp, emacs-bidi@gnu.org,
	emacs-devel@gnu.org
Subject: Re: Re: improving bidi documents display
Date: Sun, 27 Feb 2011 19:34:22 +0900
Message-ID: <4D6A28AE.8010607@it.aoyama.ac.jp>
In-Reply-To: <87wrklpzii.fsf@maru.md5i.com>

Hello Michael,

My students and I have been working on this problem, in the context of 
XML/HTML, on and off for quite a few years. Please have a look at some 
of the following:

http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/
http://www.sw.it.aoyama.ac.jp/2008/pub/IUC32-bidi/

For the last year, Shunsuke Oshima, a student of mine, has been working 
on an implementation for Emacs in Emacs Lisp. We hope to be able to 
publish the code in the next few weeks. It seems that the problems with 
LaTeX are very similar to those with XML/HTML, and it should be 
possible to adapt our code to LaTeX.

Our implementation currently consists of two parallel implementations: 
one based on inserting additional control characters (it's a pain to 
get rid of them before every save, copy, cut, and similar operation), 
and one based on overlays. The overlay approach is what Ken'ichi Handa 
originally suggested for this purpose, but it does not currently work 
because the characters in overlays don't participate in the bidi 
algorithm (Eli thinks that would make things too slow).
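
To illustrate the clean-up step that the control-character approach 
needs, here is a minimal sketch in Python (not Emacs Lisp, and not our 
actual code). It assumes that the only characters inserted for display 
are the explicit embedding and override controls U+202A to U+202E; the 
name strip_bidi_controls is made up for this example.

```python
# Explicit bidi embedding/override controls.
# Assumption: these are the only characters inserted for display purposes.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
}

# Translation table that maps each control character to None (delete it).
_STRIP_TABLE = {ord(ch): None for ch in BIDI_CONTROLS}


def strip_bidi_controls(text: str) -> str:
    """Return TEXT with all explicit bidi control characters removed."""
    return text.translate(_STRIP_TABLE)
```

Note that a filter like this also removes control characters that the 
author typed deliberately, so a real implementation would have to keep 
track of which ones it inserted itself.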

Regards,   Martin.

On 2011/02/27 19:01, Michael Welsh Duggan wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> Date: Thu, 24 Feb 2011 14:32:35 +0200
>>> From: Eli Osherovich <eli.osherovich@gmail.com>
>>>
>>> At the moment (using rev. 103371) I can edit Hebrew/English LaTeX
>>> documents; however, the way they are displayed in Emacs is not perfect.
>>> Please look at the attached file.  As you can see, any English text
>>> that appears inside a Hebrew paragraph requires certain decorations
>>> around it (e.g., \L{some English text}), and these decorations are
>>> displayed in an ugly fashion.
>>
>> Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
>> (which is what Emacs implements for bidirectional display) does not
>> produce good results with LaTeX (and with other kinds of markup).
>>
>>> Is there anything that can be done about it?
>>
>> Something _should_ be done, for sure.  But for that, Someone™ should
>> figure out how this kind of problem could be solved using Emacs
>> display features.  Any solution will probably involve reordering only
>> parts of text, but a more detailed design suggestion is needed before
>> it can be implemented.  People are welcome to try to tackle this,
>> because I'm still busy with low-level bidi support of plain text.
>
> I'd like to talk about this problem a little, just to get a little
> understanding of the problem space.  Please be warned that although I
> have read through UAX#9 a few times, and have been following (as best I
> can) Eli's bidi work, I am still very much a novice, and am apt to make
> improper assumptions, or misunderstand how things are supposed to work.
>
> In the examples, below, I will use the convention in the UAX#9
> document that a capital letter represents an R type character, and a
> lower-case letter represents an L type character.  Formatting codes will
> be typed as <RLE>, <PDF>, etc.
>
> So, the example being used was:
>
> Memory:  HEBREW \foo{english}
> Levels:  11111111222222222221
> Display: {foo{english\ WERBEH
>
> Here the paragraph embedding level is 1 (odd, RtL) since the first
> character is an R character.  The backslash, braces, and spaces are N
> characters.  The N character sequence " \" takes on the current
> embedding direction (1) based on rule N2.  The open brace gets level 2
> based on rule N1, and the close brace gets level 1 again based on rule
> N2.  Note that the close brace appears as its mirrored glyph due to
> rule L4.
>
> (Rule N1 states that runs of neutral characters between strong
> characters of the same direction take on that direction.  Rule N2 states
> that otherwise, they get the embedding direction.)
>
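
A small Python sketch may make the two rules concrete. It is a 
simplification under stated assumptions: it follows the convention 
above (capitals are R, lower-case letters are L, everything else is N), 
ignores weak types and explicit formatting codes, and is not how bidi.c 
works. The names char_type, resolve_levels and display are made up for 
this illustration.

```python
# Glyph pairs swapped at odd (right-to-left) levels, as in rule L4.
MIRROR = {"{": "}", "}": "{", "(": ")", ")": "(", "[": "]", "]": "["}


def char_type(ch):
    """Toy classification: upper case = R, lower case = L, other = N."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return "N"


def resolve_levels(text):
    """Return one embedding level per character of TEXT."""
    types = [char_type(c) for c in text]
    # Paragraph level comes from the first strong character.
    first_strong = next((t for t in types if t != "N"), "L")
    para_level = 1 if first_strong == "R" else 0
    embed_dir = "R" if para_level == 1 else "L"
    resolved = list(types)
    i, n = 0, len(types)
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        # Paragraph boundaries count as the embedding direction.
        before = types[i - 1] if i > 0 else embed_dir
        after = types[j] if j < n else embed_dir
        # N1: same strong type on both sides; N2: embedding direction.
        direction = before if before == after else embed_dir
        resolved[i:j] = [direction] * (j - i)
        i = j
    # Characters going against the embedding direction sit one level up.
    return [para_level if t == embed_dir else para_level + 1
            for t in resolved]


def display(text):
    """Return TEXT in visual order, with mirroring at odd levels."""
    items = [(lvl, MIRROR.get(ch, ch) if lvl % 2 else ch)
             for lvl, ch in zip(resolve_levels(text), text)]
    odd = [lvl for lvl, _ in items if lvl % 2]
    if odd:
        highest = max(lvl for lvl, _ in items)
        # Reverse runs from the highest level down to the lowest odd one.
        for level in range(highest, min(odd) - 1, -1):
            i = 0
            while i < len(items):
                if items[i][0] < level:
                    i += 1
                    continue
                j = i
                while j < len(items) and items[j][0] >= level:
                    j += 1
                items[i:j] = items[i:j][::-1]
                i = j
    return "".join(ch for _, ch in items)
```

Called on the two example strings in the quoted message, 
resolve_levels and display give the Levels and Display lines shown 
there.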
> Here is another example:
>
> Memory:  HEBREW \foo{HEBREW}
> Levels:  1111111122211111111
> Display: {WERBEH}foo\ WERBEH
>
> In this case, note that both of the braces are mirrored in the display.
>
> One simple, naive way of handling this for the various TeXs is to
> consider all backslashes and brace characters as L characters.  This can
> be simulated by surrounding each run of these characters by LRE PDF
> pairs.  However, unless TeX ignores these characters completely, these
> formatting characters would have to be removed before being processed by
> TeX.
>
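
The wrapping step described in the paragraph above could look roughly 
like this in Python. It is only a sketch: the name wrap_markup is made 
up, and the choice of exactly backslash and the two braces as "markup" 
characters is an assumption taken from the examples in this thread.

```python
import re

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# A run of one or more backslashes or braces.
MARKUP_RUN = re.compile(r"[\\{}]+")


def wrap_markup(text):
    """Surround each run of backslashes/braces with an LRE ... PDF pair."""
    return MARKUP_RUN.sub(lambda m: LRE + m.group(0) + PDF, text)
```

Deleting every LRE and PDF from the result gives back the original 
text, which is the clean-up that would be needed before handing the 
file to TeX.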
> Another way of handling this would be to redefine the backslash and
> brace characters as L characters, for purposes of the display engine.
> Currently, I don't know if there is a way to do this in elisp.  bidi.c
> seems to use a character table named bidi_type_table to hold this
> information.  Currently this table is not exposed at the elisp layer, to
> the best of my knowledge.  Maybe it would be possible to modify this
> table in elisp, and possibly make it buffer local?
>
> Another idea would be to allow a text property to override the character
> type.  This feels like a very elegant, emacs-ish way to do things, but
> an uneducated glance at the bidi code makes me feel like it would be
> difficult to get information about text properties into this layer.
> Another idea would be to use display strings including the LRE and PDF
> characters to replace existing backslashes and braces.  However, display
> strings do not affect the bidi algorithm at this point.
>
> I'm really starting to ramble at this point, so I think I will send
> these musings to see what Eli and others think.
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
