From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: bidi-display-reordering is now non-nil by default
Date: Fri, 05 Aug 2011 09:40:41 +0300
Message-ID: <83liv8nyxi.fsf@gnu.org>
In-reply-to: <87bow4h6j6.fsf@uwakimon.sk.tsukuba.ac.jp>
To: "Stephen J. Turnbull"
Cc: larsi@gnus.org, list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org

> From: "Stephen J. Turnbull"
> Cc: larsi@gnus.org,
>     list-general@mohsen.1.banan.byname.net,
>     emacs-devel@gnu.org
> Date: Fri, 05 Aug 2011 12:38:21 +0900
>
> Eli Zaretskii writes:
>
> > They are not irrelevant. What you suggest runs the risk of adding or
> > removing LRM/RLM characters to/from a file against user
> > expectations.
>
> Sure, but byte-level equality is not part of that; character-level
> equality is.

LRM is also a character, for this purpose, yes?

> > Again, what if the user inserts another LRM?
>
> Insert another non-character "marker" in the buffer, using whichever
> non-character strategy we're using.

And now what happens if the user wants to search for that LRM
character she just inserted?

> > > What's wrong with reparsing the buffer from the beginning, treating
> > > each change of value of the direction property as insertion of the
> > > appropriate direction mark?
> >
> > Reparsing the whole buffer upon each insertion? Is that the way to
> > make redisplay fast and efficient?
>
> No, that's a proof that it's *possible*, where your words claim it's
> *im*possible.

Impossible or unacceptable -- is there really a difference in
practice?

> Making it fast is a SMOP. You say it's beyond you, and
> that probably means it's beyond anybody competent enough in bidi to do
> the implementation. But let's not discourage anyone from trying. ;-)

There's a saying that smart people learn from their experience, but
wise people learn from that of others. If someone is wise and wants
to learn from my experience, please read the history and the diffs of
bug#9218.
It had to do with a flawed design of certain aspects of bidi
iteration whereby sometimes the display engine had to look from some
point in the buffer to its very end. The result was a completely
unusable Emacs in the buffers that were hitting this design flaw
(e.g., Org Mode buffers of a few MB size).

> > How do you indicate them, exactly? Emacs has no features, except
> > again text properties, to indicate something like that. In any case,
> > isn't it beginning to sound more and more complicated?
>
> Sure. And the presence of non-graphic characters in the buffer is
> going to make other code more complicated.

Again, LRM is just a character, like ZWNJ and friends. We need to
support such characters in files anyway. And we already started, with
the glyphless-char-display feature.

> > > But if that doesn't work, I don't see how having explicit mark
> > > characters in the buffer can work either.
> >
> > Explicit marks work because the reordering algorithm does TRT with
> > them, whether they are redundant or not. It doesn't care. By not
> > caring it makes it very easy to preserve the byte stream and not risk
> > changing it behind user's back.
>
> The algorithm will be the same, except that it needs to work with a
> "virtual" stream where some characters are not present in the buffer.
> This is no different from handling faces, which *could* be represented
> as characters in the buffer (and *are* in HTML, for example -- which
> of course has been deprecated in favor of CSS! Hmm... :-).

Actually, I think dealing with "virtual" characters means at least
lots of changes in Emacs if not larger trouble. Up to v24, Emacs
assumed that the correspondence between buffer text and text on
display is mostly 1:1. Sure, display strings, invisible text,
variable fonts, and other display features break that to some extent,
but by and large, this was true. Emacs 24 changes that some more due
to support of bidi.
But bidi support is _a_display_only_feature_, and the current design
sticks to that almost religiously. Again, the need to insert LRM/RLM
etc. here and there violates the "display-only" thing, but one could
claim that this is unrelated to bidi display per se: if we don't care
about good looks in specially formatted buffers, we can disregard this
issue; the display will still be "correct" per the UBA.

This assumption, of the basic 1:1 correspondence between buffer text
and the display, is very fundamental and affects many Emacs features
not directly related to display. One such set of features is column
counting and the vertical scrolling and indentation features that are
based on it. If you look under the hood, you will see that some/many
of the functions involved in the implementation of this walk buffer
text, not the display structures. (Being dependent on display
structures means that the related features cannot work if the display
is not up to date, which is unacceptable.)

Any idea whose result is "virtual" characters not in the buffer means
a tremendous complication in these features, for reasons that I hope
are obvious. In a nutshell, a display-only feature will leak all over
the code that works with buffer text. I won't argue whether this is
impractical or impossible, but I hope you will at least agree that
it's undesirable.

> > The _value_ doesn't matter. It's the property symbol that cannot be
> > the same in overlapping regions, unless the values are identical.
>
> Of course the value matters. A 'direction property with a sequence
> value can encode the whole stack, up to 61 levels.

Then you'd need to change this value on every edit of the related
text.

> Again, I wouldn't want to maintain that design (space-inefficiency
> and the question of consistency of neighboring regions are killers,
> I think), but there are surely lighter-weight, more efficient
> designs.

I doubt the "surely" part.
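
To make the "leak" concern about "virtual" characters concrete, here is
a toy sketch in Python. It is not Emacs code: plain strings stand in
for buffer text, and the side table of marks, `materialize`, and
`insert_text` are hypothetical names invented for this illustration.
It contrasts an LRM stored as an ordinary character with an LRM kept
outside the text.

```python
# Toy model only: plain Python strings stand in for buffer text.
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def materialize(text, marks):
    """Merge out-of-band marks ({position: char}) back into text.

    The result is the "virtual" stream that reordering would need to see.
    """
    out = []
    for i, ch in enumerate(text):
        if i in marks:
            out.append(marks[i])
        out.append(ch)
    if len(text) in marks:  # a mark sitting at the very end of the text
        out.append(marks[len(text)])
    return "".join(out)


def insert_text(text, marks, pos, new):
    """Insert `new` at `pos`, shifting every mark at or after `pos`."""
    shifted = {(p + len(new) if p >= pos else p): c for p, c in marks.items()}
    return text[:pos] + new + text[pos:], shifted


# Design 1: the mark is an ordinary character in the text.
real = "abc" + LRM + "def"

# Design 2 (hypothetical): the mark lives outside the text.
virt_text, virt_marks = "abcdef", {3: LRM}

found_real = real.find(LRM)       # an ordinary search finds the mark
found_virt = virt_text.find(LRM)  # an ordinary search sees nothing

# Every edit must also go through code that knows about the side table.
virt_text2, virt_marks2 = insert_text(virt_text, virt_marks, 1, "XY")
```

In this sketch, search, insertion, and anything else that walks the
text must either call `materialize` or update the side table, which is
the kind of spreading complication described above.
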

> IIUC, in XEmacs, this could easily be implemented with a zero-length
> extent with appropriate stickiness attributes.

I only know about Emacs stickiness. With that, this idea will lead to
proliferation of characters with the "mark" value, as text around it
is added/deleted. You will need to work hard to maintain that so that
there's only one place with that value.

> Thank you very much for taking the time out to explain your reasons
> for your design choices. I have a much better grasp of the practical
> issues involved in implementing bidi in Emacsen now.

You are welcome.
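
The proliferation problem with stickiness can be illustrated with a
small Python model. It is only a sketch under stated assumptions: the
buffer is a list of (character, properties) pairs, insertion inherits
properties from the preceding character (modeled on the rear-sticky
inheritance Emacs applies to self-inserted text by default), and the
`direction` property with value `"mark"` is a hypothetical encoding
taken from this thread, not an existing Emacs feature.

```python
def insert_inheriting(buf, pos, text):
    """Insert `text` at `pos`; new characters copy the properties of the
    character just before `pos` (a model of rear-sticky inheritance)."""
    inherited = dict(buf[pos - 1][1]) if pos > 0 else {}
    new = [(ch, dict(inherited)) for ch in text]
    return buf[:pos] + new + buf[pos:]


def marked_positions(buf):
    """Return indices of characters carrying the hypothetical mark value."""
    return [i for i, (_, props) in enumerate(buf)
            if props.get("direction") == "mark"]


# Start with "abc", where only "b" carries the mark.
buf = [(c, {}) for c in "abc"]
buf[1] = ("b", {"direction": "mark"})
before = marked_positions(buf)

# Typing "xy" right after "b" makes the new characters inherit the mark.
buf = insert_inheriting(buf, 2, "xy")
after = marked_positions(buf)
```

After the insertion, three characters carry the value instead of one,
so extra bookkeeping would be needed on every edit to bring it back to
a single place.
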