From: "Ehud Karni"
Subject: Re: Bidirectional editing in Emacs -- main design decisions
Date: Sat, 10 Oct 2009 16:57:59 +0200
Organization: Mivtach-Simon Insurance agencies
Message-ID: <200910101457.n9AEvxrW000735@beta.mvs.co.il>
In-reply-to: <83bpkgl113.fsf@gnu.org> (message from Eli Zaretskii on Fri, 09 Oct 2009 23:18:00 +0200)
Reply-To: ehud@unix.mvs.co.il
To: eliz@gnu.org
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel

On Fri, 09 Oct 2009 23:18:00 Eli Zaretskii wrote:
>
> Here's what I can tell about the subject (bidi display) at this point

In general I agree with your decisions.

> 1. Text storage
>
>    Bidirectional text in Emacs buffers and strings is stored in strict
>    logical order (a.k.a. "reading order").  This is how most (if not
>    all) other implementations handle bidirectional text.  The
>    advantage of this is that file and process I/O is trivial, as well
>    as text search.

[snip]

Search has many problems, but these should not influence your bidi
reordering. The changes to the various search functions can be made
later.

The user ALWAYS searches for the visual text s/he sees (s/he never
knows the logical order unless s/he visits the file literally).
The problems have several causes:

1. Different logical inputs, even without formatting characters, can
   result in the same visual output. E.g. for logical Hebrew text plus
   a number in LTR reading order, the number may come before or after
   the Hebrew text, but in the visual output the number will always be
   after (to the left of) the text.
   Logical "123 HEBREW 456" appears as "123 456 WERBEH".

2. Formatting characters are not seen and should not be searched.

3. The visual appearance of the searched string may be different from
   what it will match. E.g. the search for logical "HEBREW 3." in RTL
   reading order will appear as ".3 WERBEH" but will also match
   something like logical "HEBREW 3.14159", whose visual appearance is
   "3.14159 WERBEH". This may be what the user wants, but it may also
   disturb her because she really wants to find only (visual)
   ".3 WERBEH".
   There is also a technical question: how will Emacs show a found
   string that is not contiguous on screen, as in the "3.14159 WERBEH"
   example above?

As a minimum adjustment, I think the search must ignore the formatting
characters. An option to show (or, in search & replace, operate on)
only those matches that are also visually the same is recommended.

> 3. Bidi formatting codes are retained

Agreed, but see my comment on search.

> 7. Paragraph base direction
>
>    There is a buffer-specific variable `paragraph-direction' that
>    allows to override this dynamic detection of the direction of each
>    paragraph, and force a certain base direction on all paragraphs in
>    the buffer.  I expect, for example, each major mode for a
>    programming language to force the left-to-right paragraph
>    direction, because programming languages are written left to right,
>    and right-to-left scripts appear in such buffers only in strings
>    embedded in the program or in comments.

I think a better name is `bidi-paragraphs-direction' or even
`bidi-paragraphs-reading-direction'. Note the `s' in paragraphs: the
variable influences all the paragraphs in the buffer.

There should be a key to toggle this variable. It will be very useful
for the minibuffer.

> 8. User control of visual order

Do you intend to support all the explicit formatting characters (LRO
is especially important, as it allows storing visual strings as is) or
just the implicit (and more commonly used) LRM and RLM?

>    This design kills two birds: (a) it produces text that is compliant
>    with other applications, and will display the same as in Emacs, and
>    (b) it avoids the need to invent yet another Emacs infrastructure
>    feature to keep information such as paragraph direction outside of
>    the text itself.

While you can store LRM and RLM in the ISO-8859-8 encoding, there is
no way to store the other formatting characters in it.

> That is all for now.  If you have comments or questions, you are
> welcome to voice them.

I found an editor that supports all the formatting characters, Yudit
(http://www.yudit.org/). It is GPLed, so maybe you can use it.

The W3C recommends not using explicit formatting characters (i.e.
RLO/LRO/RLE/LRE/PDF) and using markup instead (see
http://www.w3.org/International/questions/qa-bidi-controls ,
especially the "reasons" section).

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D                    Better Safe Than Sorry
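
P.S. To make the "minimum adjustment" for search concrete (ignore the
formatting characters), here is a rough sketch in Python rather than
Emacs Lisp. It is only an illustration under my own assumptions: the
function names are made up and are not existing Emacs functions, the
set of characters is limited to LRM, RLM, LRE, RLE, PDF, LRO and RLO,
and uppercase Latin letters stand in for Hebrew as in the examples
above. It does not address the harder question of whether a match
also looks the same visually.

```python
# Hypothetical sketch: search that ignores bidi formatting characters.
# None of these names exist in Emacs; they are for illustration only.

# LRM, RLM, LRE, RLE, PDF, LRO, RLO
BIDI_FORMATTING = frozenset("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")


def strip_formatting(text):
    """Return (plain, index_map).

    plain is text without bidi formatting characters; index_map[i] is
    the position in the original text of the i-th character of plain.
    """
    kept = []
    index_map = []
    for i, ch in enumerate(text):
        if ch not in BIDI_FORMATTING:
            kept.append(ch)
            index_map.append(i)
    return "".join(kept), index_map


def search_ignoring_formatting(haystack, needle):
    """Find needle in haystack, ignoring formatting characters in both.

    Returns (begin, end) as positions in the ORIGINAL haystack, so the
    match can be highlighted or replaced there, or None if not found.
    """
    plain_hay, index_map = strip_formatting(haystack)
    plain_needle, _ = strip_formatting(needle)
    if not plain_needle:
        return None
    pos = plain_hay.find(plain_needle)
    if pos < 0:
        return None
    begin = index_map[pos]
    end = index_map[pos + len(plain_needle) - 1] + 1
    return begin, end


if __name__ == "__main__":
    # Logical text with an RLM before the word and LRO/PDF inside it.
    text = "abc \u200fHEB\u202dREW\u202c 3.14"
    print(search_ignoring_formatting(text, "HEBREW 3."))  # (5, 16)
```

The returned range covers the formatting characters that fall inside
the match, so a replace operation would have to decide whether to keep
or drop them. That decision is left open here.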