From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: bidi-display-reordering is now non-nil by default
Date: Thu, 04 Aug 2011 01:16:15 -0400
To: "Stephen J. Turnbull"
Cc: larsi@gnus.org, list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org

> From: "Stephen J. Turnbull"
> Cc: Lars Magne Ingebrigtsen ,
>     list-general@mohsen.1.banan.byname.net,
>     emacs-devel@gnu.org
> Date: Thu, 04 Aug 2011 12:23:28 +0900
>
> Eli, you seem to forget that *Unicode is a wire protocol*, an inter-
> application communication tool.  It is not intended to be a
> specification, or even recommendation, of how applications handle text
> internally.

Yes, but we are not talking about the internal handling.  We are
talking about display, which is an external, user-visible part of the
issue.

The Unicode Bidirectional Algorithm is a specification for converting
a stream of text into an array of character glyphs on the screen.  It
is not a wire protocol.  Nor is it specific to text external to Emacs:
after all, the internal storage of text in Emacs, as in many other
applications, is just a linear byte stream.

Let's go back to the issue at hand: the directional control
characters.  A quote from UAX#9:

  [...] there are circumstances where an implicit bidirectional
  ordering is not sufficient to produce comprehensible text.
  To deal with these cases, a minimal set of directional formatting
  codes is defined to control the ordering of characters when
  rendered.  This allows exact control of the display ordering for
  legible interchange and ensures that plain text used for simple
  items like filenames or labels can always be correctly ordered for
  display.

  The directional formatting codes are used only to influence the
  display ordering of text. [...]

> Of course on writing a stream to the outside world, Emacs will need to
> use directional marks.  Surely Lars does not deny that!  However,
> internally, text properties could in theory suffice, just as they do
> for ANSI coloring.

This option (converting directional marks in the external stream to
some Emacs feature on I/O) was also discussed at the time (nearly 10
years ago).  It is possible to implement it, but it is unnecessarily
complicated, and it even has some hard-to-resolve issues.  For
example, what if the user inserts these characters manually?  We would
then face a very real risk of introducing subtle bugs whereby saving
the text to a disk file, then visiting that file, could produce a
buffer whose contents are different.  Such unexpected conversions
behind the user's back proved to be an annoyance, as the experience of
MULE shows.

> > Because (a) text properties are specific to Emacs, and (b) they cannot
> > overlap (for the same property).  By contrast, to force certain visual
> > order, one must sometimes force some direction on a portion of text
> > and then the opposite direction on an inner substring of that very
> > text.  Text properties won't grok that.
>
> Huh?  Of course text properties nest.

A single character can have only one value of any given property.
Let's say we call this property `direction' and give it 2 values: L2R
and R2L.  Then imagine the following scenario:

  . you mark a portion of text with the L2R direction property

  . you then want to mark part of that portion with the R2L direction
    property (there are situations where this is necessary; I can show
    examples if this matters, but for now please believe me)

  . since each character can have only one value of the direction
    property, you cannot do this in any simple way; you'd need to split
    the original region into 3 parts, which is ugly and complicates what
    needs to be done when text in this region is deleted (keep in mind
    that the UBA mandates support of up to 60 levels of such embedded
    direction reversals, don't ask me why, and Emacs is in full
    compliance)

> If for some subtle reason, they don't quite nest correctly for this
> purpose, overlays most likely will.

Overlays don't get copied with the text, so if you copy/paste text
into another area of the same buffer or into another buffer, the nice
display will be lost.  We could complicate the heck out of yanking so
it reinserts the overlays, of course, but why complicate things when a
simpler, straightforward way is available?

> > What is the difference between aligning HELLO and aligning a summary
> > buffer?  They are both plain text, and they both are arranged to align
> > nicely.
>
> HELLO arrives as an external plain text stream, and therefore is
> governed by the Unicode standard.  The summary buffer is constructed
> by Emacs and it is not plain text

But it should be possible to copy portions of that buffer elsewhere,
and such a copy should keep its visual appearance on the screen with
minimal fuss, or else users will be annoyed.  Right?  The question
that bugged us during the early stages of the design was how to ensure
this without asking Lisp application programmers to jump through hoops
every time text is copied or saved or read.  It turns out that using
the directional control characters is the easiest way.

> (it has a *lot* of structure, being mousable etc), and therefore is
> not governed by the standard for its *implementation*.

It's not governed by the standard, but following the standard is the
easiest way to achieve the goal with the fewest complications.

> How many directional marks are needed in the Hebrew TUTORIAL, given
> the full BIDI algorithm implementation

Not many, but some.  About 120, if my count is correct.

> and how many are redundant?

None.  I used them only where the normal implicit reordering didn't
yield the correct display.

> Have you copied portions of the TUTORIAL with embedded marks into
> email headers and gotten appropriate results?

Yes.  It works, and works seamlessly.  That's the whole point of using
these control characters.

> I bet that, as Lars implies, Emacs is going to need
> `yank-dropping-directional-marks' in some applications.

If we drop the marks on yanking, text will look different when
yanked, sometimes completely different, to the point of being
incomprehensible.  I think that way lies madness, if we want decent
support for bidi scripts.  So such a feature would be ill-advised, and
I will do my best to talk people out of it.
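
A minimal sketch of why the nesting argument above favors the control
characters themselves.  It is written in Python rather than Emacs Lisp
and is not taken from the Emacs bidi implementation; the names
explicit_levels and drop_directional_marks are hypothetical.  It
assigns explicit embedding levels from LRE/RLE/PDF with a stack, in
the spirit of the explicit rules of UAX#9, and shows that a
hypothetical mark-dropping yank filter collapses every character to
the paragraph level.  Assumptions: only LRE, RLE and PDF change
levels (overrides, the implicit rules and reordering are left out),
and MAX_DEPTH = 61 is taken to be the explicit-level limit of UAX#9
revisions before Unicode 6.3.

```python
# Simplified sketch of the explicit-embedding step of UAX#9.
# Only LRE, RLE and PDF change levels; LRO/RLO overrides, the implicit
# rules and the final reordering are deliberately left out.

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE (ignored like a mark in this sketch)
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE (ignored like a mark in this sketch)
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK

BIDI_CONTROLS = frozenset({LRE, RLE, PDF, LRO, RLO, LRM, RLM})
MAX_DEPTH = 61  # assumption: explicit-level limit of pre-6.3 UAX#9


def explicit_levels(text, paragraph_level=0):
    """Return (char, embedding_level) pairs for the non-control chars."""
    stack = [paragraph_level]  # stack[-1] is the innermost embedding level
    overflow = 0               # embeddings rejected for exceeding MAX_DEPTH
    result = []
    for ch in text:
        if ch in (LRE, RLE):
            level = stack[-1]
            # LRE: next higher even level; RLE: next higher odd level.
            new = (level + 2) & ~1 if ch == LRE else (level + 1) | 1
            if overflow == 0 and new <= MAX_DEPTH:
                stack.append(new)
            else:
                overflow += 1
        elif ch == PDF:
            # In this sketch a PDF first cancels a rejected embedding,
            # then closes the innermost real one (never the paragraph).
            if overflow:
                overflow -= 1
            elif len(stack) > 1:
                stack.pop()
        elif ch in BIDI_CONTROLS:
            continue  # other marks get no level in this simplified model
        else:
            result.append((ch, stack[-1]))
    return result


def drop_directional_marks(text):
    """Hypothetical yank filter that strips every bidi control character."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


if __name__ == "__main__":
    # An L2R embedding with an R2L embedding nested inside it.
    nested = LRE + "ab" + RLE + "cd" + PDF + "ef" + PDF
    print(explicit_levels(nested))
    print(explicit_levels(drop_directional_marks(nested)))
```

For the nested sample, the outer characters get level 2 and the inner
"cd" gets level 3; after the marks are dropped, everything is at level
0.  A single per-character `direction' property could record only L2R
or R2L, not the level-3-inside-level-2 nesting that the stack
produces, and since reordering is driven by these levels, stripping
the marks can change how yanked text is displayed.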