From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Buffer names with R2L characters
Date: Thu, 23 Jun 2011 12:16:01 +0300
Message-ID: <83hb7gvrfy.fsf@gnu.org>
References: <838vswwk2b.fsf@gnu.org> <201106211652.p5LGqIGr016636@beta.mvs.co.il> <83wrgfumh7.fsf@gnu.org> <201106211759.p5LHxpiT008325@beta.mvs.co.il>
To: Stefan Monnier
Cc: ehud@unix.mvs.co.il, miles@gnu.org, stephen@xemacs.org, cloos@jhcloos.com, emacs-devel@gnu.org

> From: Stefan Monnier
> Cc: eliz@gnu.org, emacs-devel@gnu.org, cloos@jhcloos.com, stephen@xemacs.org, miles@gnu.org
> Date: Wed, 22 Jun 2011 18:27:14 -0400
>
> Maybe another way to attack the problem is to say that the < and the >
> in that string are not neutral but "weak L2R" or something like that.

There's no "weak L2R" bidi type or category in UAX#9. Weak types include numbers (i.e. digits) and "number separators" (plus and minus). Changing the type of '<' and '>' to number separator will not gain us anything, because these separators are treated the same as neutrals, except when they are between two numbers. Changing the type to numbers could probably solve the problem, but runs the risk of getting us in more trouble, since the treatment of numbers makes sense only for numbers.

> Maybe this would also work for XML markup.

It won't. In fact, it could make things worse.
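For reference, the bidi types mentioned in the reply above come from the Unicode Character Database. The following Python sketch (Python is used purely for illustration; it is not how bidi.c consults its char-table) looks them up with the standard unicodedata.bidirectional() function for the characters that come up in this thread: '<', '>', '~' and '|' are ON (Other Neutral), '-' and '+' are ES (European Number Separator), '/' is CS (Common Number Separator), digits are EN (European Number), and Hebrew and Arabic letters are R and AL respectively.

```python
import unicodedata

# Characters that come up in this thread, plus one Hebrew and one Arabic
# letter for comparison.
SAMPLES = ["<", ">", "~", "|", "-", "+", "/", "7", "\u05d0", "\u0627"]


def bidi_classes(chars):
    """Return a dict mapping each character to its Unicode bidi class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


if __name__ == "__main__":
    for ch, cls in bidi_classes(SAMPLES).items():
        # Print the code point, the Unicode character name and the bidi class.
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch):28}  {cls}")
```

Under UAX#9 rule W4, a single ES or CS becomes a number only when it sits between two numbers of the appropriate type; otherwise rule W6 turns it into ON, which is why giving '<' and '>' the type of '-' gains nothing.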
To see why, take the first example in this article (the one that uses Arabic):

  http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html

copy/paste it into *scratch* in Emacs 24 with bidi-display-reordering turned on, and replace every '<' and '>' there with either '-' (a number separator) or a digit. The result is still unreadable gibberish, and in the case of digits it's even less readable.

> We could specify such a thing via some char-table overriding the
> default bidi properties of specified chars. We would either need to be
> able to set this as a text-property over the "", or to have one for
> the mode-line.

First, there's no need to invent another char-table. The bidi types used by bidi.c are already specified in a char-table, so all you'd need to do is modify it (probably a copy of it). That is, assuming we indeed want to modify the properties of '<' and '>', which I think is not a good idea. (Btw, these two characters are not the only ones that cause trouble in the display of buffer names. '~' is another one, and in fact all the punctuation characters behave the same way. Are we going to modify the properties of all of them?)

And second, using text properties to override bidi properties is not a good idea at all, because bidi.c works below the level that pays attention to text properties. Making it aware of text properties would slow it down considerably, or require a complete redesign of how bidi display works in general, i.e. giving up the total separation between reordering and the rest of the display engine. I don't think we want that for the sake of this relatively minor issue.

Bottom line: using the directional control characters is the best way of adapting the visual appearance to user expectations when displaying plain text.

XML and other non-plain text buffers are a different kind of problem. There, we would like to display correctly not just the text around '>', but also comments and strings.
The problems there are with all the punctuation characters near the end of comments and strings (they display at the wrong end of the last sentence) and with L2R text embedded in otherwise R2L text. IOW, we would like to have a way to display such comments and strings as if they were in an R2L paragraph. I don't yet know what would be a good solution to that. In fact, I don't think we have an exhaustive list of situations where the default reordering causes trouble and must be augmented by something else.

> >> > I think Eli is wrong here. An example will help, a file with the
> >> > (logical) name "/abc/def GHIK/LMNO qrst" when uniquified will appear
> >> > as: "def ONML|KIHG qrst" which is clearly wrong.
> >> > My way to solve it is as above, i.e. add zero width LRM on both sides
> >> > of the separator (/ or |) in addition to the enclosing LRMs.
> >> I think this is beginning to become gross.
> > But it is a general solution that is easily implemented.
>
> Indeed, for the buffer names it seems perfectly acceptable since we
> generate them ourselves and they don't go very far. I'm not sure why
> Eli doesn't like this solution.

I don't like the proliferation of directional marks that this will bring. I had hoped that we would need these directional control characters only very rarely. They have problems on TTYs, and even in GUI sessions they are visible by default (as thin spaces), so they will disrupt the visual appearance and cursor motion. We will need to have them everywhere, e.g. in the prompt displayed by read-buffer and in other places, if we want buffer names to look the same in all contexts. But since this is the best available solution, I'm willing to try; maybe I'm wrong and the results will not be that bad after all.
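To make the cost under discussion concrete, here is a minimal Python sketch of the scheme quoted above: an LRM (U+200E LEFT-TO-RIGHT MARK, bidi class L) on both sides of each separator, plus one at each end of the name. The helper name lrm_protected_name and the default '|' separator are illustrative assumptions, not uniquify's actual code, and the uppercase letters stand in for R2L text as in the quoted example. Without the marks, the neutral '|' sits between two R2L letters and is absorbed into the R2L run, as in "def ONML|KIHG qrst"; with strong L characters on both sides, UAX#9 rule N1 resolves it to L instead.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK: zero width, strong L


def lrm_protected_name(parts, sep="|"):
    """Hypothetical helper: join buffer-name PARTS with SEP, putting an
    LRM on both sides of every separator and at both ends of the name."""
    return LRM + (LRM + sep + LRM).join(parts) + LRM


if __name__ == "__main__":
    name = lrm_protected_name(["def GHIK", "LMNO qrst"])
    print(repr(name))
    # Two marks at the ends plus two per separator.
    print("directional marks:", name.count(LRM))
    print("bidi class of LRM:", unicodedata.bidirectional(LRM))
```

Each extra directory component costs two more invisible marks, which is the proliferation referred to above.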