From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: bidi and shaping problems in describe-input-method
Date: Sat, 10 Mar 2012 11:55:54 +0900
Message-ID: <87r4x1uptx.fsf@m17n.org>
References: <83pqclzrb5.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
To: Eli Zaretskii
In-Reply-To: <83pqclzrb5.fsf@gnu.org> (message from Eli Zaretskii on Fri, 09 Mar 2012 18:12:46 +0200)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:148956

In article <83pqclzrb5.fsf@gnu.org>, Eli Zaretskii writes:

> > In general, it's smarter to use LRM only where necessary.

> Testing whether they are necessary is a problem in itself.  You can
> easily avoid inserting the marks for strong L2R characters, but they
> are the minority.  Most of the characters are not in that category.
> And of course keyboard layouts include such characters.

> > > > (defun quail-help-require-LRM (char)
> > > >   (or (eq (get-char-code-property char 'bidi-class) 'L)
> > > >       ...))
> >
> > > It's possible, but why bother?  And with this function you will insert
> > > the LRM for many characters that don't need that, like punctuation,
> > > numbers, etc.
> >
> > ??? I want a function that returns t only for a character
> > that require preceding LRM in the keyboard layout.

> Yes, I understand that.  But the test you are suggesting, i.e. avoid
> the LRM only for characters whose bidi-class is L, will not catch
> numbers, punctuation, and other non-L characters.

The function body I wrote is just an idea, not a complete
solution, and of course checking against L is clearly a bug.
At least we must check against R (and AL).

> > > Also, `lower' and `upper' could be strings, in which case you need a
> > > more complex test.
> >
> > We can give (if (string lower) (aref lower 0) lower) to that
> > function.

> But that doesn't DTRT.  Here's an example where it will fail: ".A".

Why?  Keyboard cells in the keyboard layout typically have this
form (L is the lower key, U is the upper (shifted) key):

  ... | LU | LU | ...

What we want is to display the left LU to the left of the
right LU, and display each L (character or string) to the
right of the corresponding U.  Even if the L (of the left
LU) is ".A", we don't need LRM for it.  We have to insert
LRM only before a character that may reorder the previous
characters, and after a character that may reorder the
following character.  Isn't that right?

> AFAIK, the only reliable way of telling whether a given string will be
> reordered is to actually reorder it, and then compare with the
> logical-order original.  That's a nuisance, and also the results may
> well depend on the characters before and after the string in the
> buffer, so you need to know the context in advance, which you normally
> don't.

> I tried also a different solution: enclose each row of the keyboard
> layout in an L2R override embedding, LRO..PDF.  This inserts only 2
> control characters per row, and doesn't insert them inside the
> keyboard cells, so it is cleaner, I think.  But using this means that
> no key description in the layout can be a string that requires
> reordering individually.  (By contrast, inserting an LRM between the
> lower and the upper key still allows each description to be
> reordered.)  Can we live with such a restriction?  I don't know enough
> about Quail to tell.

As it's possible to assign a string to a key, there will be
cases where the characters in the string must be reordered.
In the above case, if L is the Hebrew string "שלום",
it must be reordered.
But, even if we surround that word with LRE and PDF, the
word itself is reordered correctly, right?

---
Kenichi Handa
handa@m17n.org
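
For illustration only, the two ideas discussed in this message might be sketched as below: a test against bidi classes R and AL (instead of L), and a helper that encloses a layout row in LRO..PDF. The names `my-key-has-strong-r2l-p' and `my-wrap-row-l2r' are hypothetical and are not part of Quail. The first function only detects strong R2L characters; as noted in the quoted text, neutral characters can still be reordered depending on context, so this is a rough approximation, not a complete solution.

```elisp
;; Hypothetical helpers; not part of Quail or Emacs.

(defun my-key-has-strong-r2l-p (key)
  "Return t if KEY contains a strong right-to-left character.
KEY is a character or a string, like the lower or upper entry of
a keyboard-layout cell.  A character counts as strong R2L when
its `bidi-class' property is R or AL."
  (let ((chars (if (stringp key) (string-to-list key) (list key)))
        found)
    ;; Stop at the first character whose bidi class is R or AL.
    (while (and chars (not found))
      (setq found (memq (get-char-code-property (car chars) 'bidi-class)
                        '(R AL))
            chars (cdr chars)))
    (and found t)))

(defun my-wrap-row-l2r (row)
  "Return the string ROW enclosed in LRO (U+202D) and PDF (U+202C)."
  (concat (string #x202d) row (string #x202c)))
```

With these definitions, (my-key-has-strong-r2l-p ".A") returns nil, while the Hebrew word built by (string #x5e9 #x5dc #x5d5 #x5dd) yields t. Whether an LRM is actually required around a given cell still depends on its neighbors, which this sketch does not examine.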