From: Eli Zaretskii
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Bidirectional editing in Emacs -- main design decisions
Date: Sat, 10 Oct 2009 18:38:18 +0200
Message-ID: <83ljjjjjb9.fsf@gnu.org>
References: <83bpkgl113.fsf@gnu.org> <200910101457.n9AEvxrW000735@beta.mvs.co.il>
To: ehud@unix.mvs.co.il
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org

> Date: Sat, 10 Oct 2009 16:57:59 +0200
> From: "Ehud Karni"
> Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org
>
> On Fri, 09 Oct 2009 23:18:00 Eli Zaretskii wrote:
> >
> > Here's what I can tell about the subject (bidi display) at this point
>
> In general I agree with your decisions.

Well, you brought up many of them (thanks!), so it isn't surprising ;-)

> The search has many problems but this should not influence your bidi
> reordering. The changes to various search functions can be done later.

Agreed.

> The user ALWAYS searches for the visual text s/he sees (S/he never knows
> the logical order unless she visits the file literally).

She will look for visual text, but she will type the text she looks
for in the logical (reading) order, not in the visual order, where
characters are reversed and/or reshuffled.

> The problems are caused by many reasons:
> 1. Different logical inputs, even without formatting characters, can
>    result in the same visual output.
>    e.g.
>    Logical Hebrew text + a number in LTR reading order, the
>    number may be before or after the Hebrew text, but in the visual
>    output the number will always be after (to the left of) the text.
>    Logical "123 HEBREW 456" appears as "123 456 WERBEH".
> 2. Formatting characters are not seen and should not be searched.
> 3. The visual appearance of the searched string may be different from
>    what it will match. e.g. The search for logical "HEBREW 3." in
>    RTL reading order will appear as ".3 WERBEH" but will match
>    also something like logical "HEBREW 3.14159" whose visual
>    appearance is "3.14159 WERBEH". This may be what the user wants
>    but it may also disturb her because she really wants to find only
>    (visual) ".3 WERBEH".

All of these are valid and important considerations, and the search
commands and primitives will have to deal with them, of course.
There's also the issue of ``final'' letters in Hebrew and much more
complex similar issues in Arabic, etc. I hope enough
application-level Emacs programmers will come aboard and handle all
this, because otherwise these scripts will never be supported well
enough in Emacs.

However, taking care of this is still quite far in the future. My
main difficulty in making these decisions was to convince myself that,
while none of these problems is trivial to solve and their solutions
are not even known yet in detail, at least not to me, they are all
_solvable_in_principle_ using just the logical-order text and the
reordering engine. (The engine was designed so that it can be used by
code other than the redisplay iterator, which makes it easy to write a
Lisp primitive that takes a logical-order string and returns its
visual-order variant.) Comments that question or contradict this
conclusion are what I'm seeking now, because changing these decisions
further down the road may be very difficult, to say nothing of the
wasted effort.
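To make points 2 and 3 above concrete, here is a minimal sketch in Python (not Emacs code, and not the planned implementation) of a search that works purely on logical-order text and skips bidi formatting characters by default. The function name `logical_search` and its `ignore_controls` flag are hypothetical; the set of controls is the one UAX#9 defined at the time of this message. Uppercase Latin letters stand in for Hebrew, as in the examples above.

```python
# Bidi formatting characters (hypothetical constant name; the code points
# are the standard Unicode ones).
BIDI_CONTROLS = {
    "\u200e",  # LRM, left-to-right mark
    "\u200f",  # RLM, right-to-left mark
    "\u202a",  # LRE, left-to-right embedding
    "\u202b",  # RLE, right-to-left embedding
    "\u202c",  # PDF, pop directional formatting
    "\u202d",  # LRO, left-to-right override
    "\u202e",  # RLO, right-to-left override
}


def logical_search(haystack, needle, ignore_controls=True):
    """Return (start, end) indices into `haystack` for the first match of
    `needle`, or None.  Both strings are assumed to be in logical order."""
    if not ignore_controls:
        pos = haystack.find(needle)
        return None if pos < 0 else (pos, pos + len(needle))

    # Remember the original index of every character that survives stripping,
    # so the match can be reported in terms of the unstripped text.
    kept = [(i, ch) for i, ch in enumerate(haystack) if ch not in BIDI_CONTROLS]
    stripped = "".join(ch for _, ch in kept)
    needle = "".join(ch for ch in needle if ch not in BIDI_CONTROLS)
    if not needle:
        return None

    pos = stripped.find(needle)
    if pos < 0:
        return None
    start = kept[pos][0]
    end = kept[pos + len(needle) - 1][0] + 1
    return (start, end)


# Logical text with an RLM in front and an LRM after the "Hebrew" word.
text = "\u200fHEBREW\u200e 3.14159"
match = logical_search(text, "HEBREW 3.")
literal = logical_search(text, "HEBREW 3.", ignore_controls=False)
```

Here `match` covers the logical run "HEBREW 3." inside "HEBREW 3.14159" even though invisible marks are interleaved, which is exactly the case described in point 3, while `literal` finds nothing because the LRM breaks the literal substring.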

> There is also a technical question, how Emacs will show the found
> string which is not connected as in the "3.14159 WERBEH" above.

I haven't yet adapted support for faces to bidi display. However, my
plan is to make it so that each character produced by the bidi
iterator gets the correct face, like it does today, and faces are (and
will be in the future) set in the logical order of buffer positions.
So, in your example, the characters underlined below will have the
`isearch' face:

     3.14159 WERBEH
     --      ------

(you were saying that the search string is ".3 WERBEH"). Yes, this
shows as disconnected. But other GUI applications do it that way, so
I think the user will expect this behavior.

> As a minimum adjustment, I think the search must ignore the formatting
> characters.

Yes, of course. At least by default, with an option to not ignore
them.

> Do you intend to support all the explicit formatting characters (LRO is
> specially important as it allows to store visual strings as is) or just
> the implicit (and more used) LRM and RLM ?

All of them. They are already supported in the code that I'm using
now. Like I said, Emacs will support the full set of features
described by UAX#9.

> > This design kills two birds: (a) it produces text that is compliant
> > with other applications, and will display the same as in Emacs, and
> > (b) it avoids the need to invent yet another Emacs infrastructure
> > feature to keep information such as paragraph direction outside of
> > the text itself.
>
> While you can store the LRM and RLM in ISO-8859-8 encoding, there is no
> way to store the other formatting characters.

UAX#9 recommends using LRM and RLM, in preference to the other codes,
for this very reason. Users who want to use the other codes (in the
rare cases where they are necessary) will have to encode text in
UTF-8.
I don't see this as a serious problem, though: unlike several years
ago, when this issue was discussed at length on emacs-bidi, the number
of applications supporting UTF-8 is very large today. Heck, even
Notepad groks it nowadays!

> I found an editor that supports all the formatting characters, Yudit
> (http://www.yudit.org/); it is GPLed, maybe you can use it.

Thanks, I had it installed already.

> The W3C recommends not to use explicit formatting characters (i.e.
> RLO/LRO/RLE/LRE/PDF) and instead to use markup (see
> http://www.w3.org/International/questions/qa-bidi-controls ,
> specially the "reasons" section).

Yes, I know. The obsession of W3C with markup is well known ;-) But
Emacs is first and foremost a _text_editor_, so it doesn't make sense
to me to force users to use markup just to be able to read or write
bidirectional text. I also believe that converting text that uses
Unicode formatting codes into markup is not such a hard job, and
someone will surely come up soon enough with an Emacs function to do
that.
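For illustration only, the conversion mentioned in the last paragraph might look roughly like the Python sketch below (it is not an Emacs function and not anything from the thread). It assumes HTML as the target markup, maps embeddings to `<span dir=...>` and overrides to `<bdo dir=...>`, writes LRM/RLM as the `&lrm;`/`&rlm;` entities, and treats its whole input as a single paragraph. The name `bidi_controls_to_html` is hypothetical.

```python
import html

# Opening controls and the HTML element assumed to replace each of them.
OPENERS = {
    "\u202a": ("span", "ltr"),  # LRE
    "\u202b": ("span", "rtl"),  # RLE
    "\u202d": ("bdo", "ltr"),   # LRO
    "\u202e": ("bdo", "rtl"),   # RLO
}
PDF = "\u202c"
MARKS = {"\u200e": "&lrm;", "\u200f": "&rlm;"}


def bidi_controls_to_html(text):
    """Replace Unicode bidi formatting codes in one paragraph with markup."""
    out = []
    open_tags = []  # stack of element names still waiting for a PDF
    for ch in text:
        if ch in OPENERS:
            tag, direction = OPENERS[ch]
            out.append(f'<{tag} dir="{direction}">')
            open_tags.append(tag)
        elif ch == PDF:
            # A PDF with nothing left to close is simply dropped.
            if open_tags:
                out.append(f"</{open_tags.pop()}>")
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(html.escape(ch))
    # Close whatever is still open when the paragraph ends.
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


override = bidi_controls_to_html("a \u202eXYZ\u202c b")
nested = bidi_controls_to_html("\u202bA\u202dB\u202cC")
```

With these inputs, `override` becomes `a <bdo dir="rtl">XYZ</bdo> b`, and `nested` shows that an embedding left unterminated is closed at the end of the input. A real converter would also have to split the text into paragraphs first, which this sketch leaves out.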