From: "Martin J. Dürst"
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Re: improving bidi documents display
Date: Sun, 27 Feb 2011 19:34:22 +0900
Organization: Aoyama Gakuin University
To: Michael Welsh Duggan
Cc: Eli Osherovich, oshima@sw.it.aoyama.ac.jp, emacs-bidi@gnu.org, emacs-devel@gnu.org

Hello Michael,

I and my students have been working on this problem, in the context of
XML/HTML, on and off for quite a few years. Please have a look at some
of the following:

http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/
http://www.sw.it.aoyama.ac.jp/2008/pub/IUC32-bidi/

For the last year, Shunsuke Oshima, a student of mine, has been working
on an implementation for Emacs in EmacsLisp. We hope to be able to
publish the code in the next few weeks. It seems that the problems with
LaTeX are very much similar to those with XML/HTML, and it should be
possible to adapt our code to LaTeX.

Our implementation is currently actually two parallel implementations,
one based on the insertion of additional control characters (it's a pain
to get rid of them before all save/copy/cut and similar operations), and
one based on overlays, which is what was originally suggested for this
purpose by Ken'ichi Handa, but is currently not working because the
characters in overlays don't participate in the bidi algorithm (Eli
thinks that would make things too slow).

Regards,   Martin.

On 2011/02/27 19:01, Michael Welsh Duggan wrote:
> Eli Zaretskii writes:
>
>>> Date: Thu, 24 Feb 2011 14:32:35 +0200
>>> From: Eli Osherovich
>>>
>>> At the moment (using rev. 103371) I can edit Hebrew/English LaTeX
>>> documents, however, the way they are displayed in Emacs is not perfect.
>>> Please look at the file attached as you can see any English text that
>>> appears inside a Hebrew paragraph requires certain decorations around it
>>> (e.g., \L{some English text}) these decorations are displayed in an ugly
>>> fashion.
>>
>> Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
>> (which is what Emacs implements for bidirectional display) does not
>> produce good results with LaTeX (and with other kinds of markup).
>>
>>> Is there anything that can be done about it?
>>
>> Something _should_ be done, for sure.  But for that, Someone™ should
>> figure out how this kind of problems could be solved using Emacs
>> display features.  Any solution will probably involve reordering only
>> parts of text, but a more detailed design suggestion is needed before
>> it can be implemented.  People are welcome to try to tackle this,
>> because I'm still busy with low-level bidi support of plain text.
>
> I'd like to talk about this problem a little, just to get a little
> understanding of the problem space.
> Please be warned that although I
> have read through UAX#9 a few times, and have been following (as best I
> can) Eli's bidi work, I am still very much a novice, and am apt to make
> improper assumptions, or misunderstand how things are supposed to work.
>
> In the examples, below, I will use the convention in the UAX#9
> document that a capital letter represents an R type character, and a
> lower-case letter represents an L type character.  Formatting codes will
> be typed as,, etc.
>
> So, the example being used was:
>
>   Memory:  HEBREW \foo{english}
>   Levels:  11111111222222222221
>   Display: {foo{english\ WERBEH
>
> Here the paragraph embedding level is 1 (odd, RtL) since the first
> character is an R character.  The backslash, braces, and spaces are N
> characters.  The N character sequence " \" takes on the current
> embedding direction (1) based on rule N2.  The open brace gets level 2
> based on rule N1, and the close brace gets level 1 again based on rule
> N2.  Note that the close brace appears as its mirrored glyph due to rule
> L4.
>
> (Rule N1 states that runs of neutral characters between strong
> characters of the same direction take on that direction.  Rule N2 states
> that otherwise, they get the embedding direction.)
>
> Here is another example:
>
>   Memory:  HEBREW \foo{HEBREW}
>   Levels:  1111111122211111111
>   Display: {WERBEH}foo\ WERBEH
>
> In this case, note that both of the braces are mirrored in the display.
>
> One simple, naive way of handling this for the various TeXs is to
> consider all backslashes and brace characters as R characters.  This can
> be simulated by surrounding each run of these characters by LRE PDF
> pairs.  However, unless TeX ignores these characters completely, these
> formatting characters would have to be removed before being processed by
> TeX.
>
> Another way of handling this would be to redefine the backslash and
> brace characters as R characters, for purposes of the display engine.
> Currently, I don't know if there is a way to do this in elisp.  bidi.c
> seems to use a character table named bidi_type_table to hold this
> information.  Currently this table is not exposed at the elisp layer, to
> the best of my knowledge.  Maybe it would be possible to modify this
> table in elisp, and possibly make it buffer local?
>
> Another idea would be to allow a text property to override the character
> type.  This feels like a very elegant, emacs-ish way to do things, but
> an uneducated glance at the bidi code makes me feel like it would be
> difficult to get information about text properties into this layer.
> Another idea would be to use display strings including the LRE and PDF
> characters to replace existing backslashes and braces.  However, display
> strings do not affect the bidi algorithm at this point.
>
> I'm really starting to ramble at this point, so I think I will send
> these musings to see what Eli and others think.
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
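
The level assignments in the two quoted examples can be checked with a
small sketch. It is neither the code in Emacs's bidi.c nor the EmacsLisp
implementation mentioned above. It is a minimal Python illustration that
covers only the paragraph level (P2/P3), rules N1 and N2, the implicit
levels (I1/I2), reordering (L2) and mirroring (L4), using the
capital = R, lower-case = L convention. The names resolve_levels and
display, and the overrides argument for giving a character a different
type, are hypothetical and exist only for this illustration.

```python
# Simplified sketch of UAX#9 for a single line of text. Assumptions:
# capitals are R, lower-case letters are L, everything else is neutral;
# no explicit embedding codes and no number handling.

MIRROR = {"{": "}", "}": "{", "(": ")", ")": "(", "[": "]", "]": "["}


def char_type(ch, overrides=None):
    """Return 'R', 'L' or 'N' (neutral) for one character."""
    if overrides and ch in overrides:
        return overrides[ch]  # hypothetical per-character type override
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return "N"


def resolve_levels(text, overrides=None):
    """Return the list of embedding levels, one per character."""
    types = [char_type(c, overrides) for c in text]
    # P2/P3: the first strong character decides the paragraph level.
    strong = [t for t in types if t != "N"]
    base = 1 if strong and strong[0] == "R" else 0
    embed = "R" if base == 1 else "L"
    resolved = list(types)
    n = len(types)
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        # Start and end of text count as the embedding direction.
        before = types[i - 1] if i > 0 else embed
        after = types[j] if j < n else embed
        # N1: same strong type on both sides; N2: embedding direction.
        fill = before if before == after else embed
        resolved[i:j] = [fill] * (j - i)
        i = j
    # I1/I2: characters opposite to the base direction go one level up.
    opposite = "L" if base == 1 else "R"
    return [base + (1 if t == opposite else 0) for t in resolved]


def display(text, overrides=None):
    """Return the visual order of text, left to right."""
    levels = resolve_levels(text, overrides)
    # L4: mirror paired punctuation that ends up at an odd (RtL) level.
    chars = [MIRROR.get(c, c) if lv % 2 else c for c, lv in zip(text, levels)]
    odd = [lv for lv in levels if lv % 2]
    if not odd:
        return "".join(chars)
    # L2: from the highest level down to the lowest odd level, reverse
    # every contiguous run of characters at that level or higher.
    for level in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = j
    return "".join(chars)


for line in (r"HEBREW \foo{english}", r"HEBREW \foo{HEBREW}"):
    print("".join(map(str, resolve_levels(line))), display(line))

# Assumption of this sketch: backslash and braces treated as strong L.
strong_l = {"\\": "L", "{": "L", "}": "L"}
print(display(r"HEBREW \foo{english}", strong_l))
```

The first two printed lines reproduce the Levels and Display rows of the
quoted examples. The last line tries the idea of giving backslash and
braces a strong type. Picking L there is an assumption of this sketch,
and with it the first example comes out as \foo{english} WERBEH, with the
markup intact and no mirrored brace.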