From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: bidi and shaping problems in describe-input-method
Date: Sat, 10 Mar 2012 11:55:54 +0900
Message-ID: <87r4x1uptx.fsf@m17n.org>
References: <83pqclzrb5.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1331348200 28817 80.91.229.3 (10 Mar 2012 02:56:40 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Sat, 10 Mar 2012 02:56:40 +0000 (UTC)
Cc: list-general@mohsen.1.banan.byname.net, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Mar 10 03:56:39 2012
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1S6CU3-00058Q-2A
	for ged-emacs-devel@m.gmane.org; Sat, 10 Mar 2012 03:56:39 +0100
Original-Received: from localhost ([::1]:58716 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1S6CU2-00072W-Ag
	for ged-emacs-devel@m.gmane.org; Fri, 09 Mar 2012 21:56:38 -0500
Original-Received: from eggs.gnu.org ([208.118.235.92]:40697)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <handa@m17n.org>) id 1S6CTy-000723-Ro
	for emacs-devel@gnu.org; Fri, 09 Mar 2012 21:56:36 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <handa@m17n.org>) id 1S6CTw-0005WQ-NA
	for emacs-devel@gnu.org; Fri, 09 Mar 2012 21:56:34 -0500
Original-Received: from mx1.aist.go.jp ([150.29.246.133]:62237)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <handa@m17n.org>)
	id 1S6CTw-0005WK-6U; Fri, 09 Mar 2012 21:56:32 -0500
Original-Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123])
	by mx1.aist.go.jp  with ESMTP id q2A2uQY0027446;
	Sat, 10 Mar 2012 11:56:26 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp2.aist.go.jp
	by rqsmtp2.aist.go.jp  with ESMTP id q2A2uQcB011486;
	Sat, 10 Mar 2012 11:56:26 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp2.aist.go.jp  with ESMTP id q2A2uPB8015694;
	Sat, 10 Mar 2012 11:56:25 +0900 (JST) env-from (handa@m17n.org)
In-Reply-To: <83pqclzrb5.fsf@gnu.org> (message from Eli Zaretskii on Fri,
	09 Mar 2012 18:12:46 +0200)
X-detected-operating-system: by eggs.gnu.org: Solaris 9
X-Received-From: 150.29.246.133
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:148956
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/148956>

In article <83pqclzrb5.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In general, it's smarter to use LRM only where necessary.

> Testing whether they are necessary is a problem in itself.  You can
> easily avoid inserting the marks for strong L2R characters, but they
> are the minority.  Most of the characters are not in that category.
> And of course keyboard layouts include such characters.

> > > > (defun quail-help-require-LRM (char)
> > > >    (or (eq (get-char-code-property char 'bidi-class) 'L)
> > > >        ...))
> >
> > > It's possible, but why bother?  And with this function you will insert
> > > the LRM for many characters that don't need that, like punctuation,
> > > numbers, etc.
> >
> > ??? I want a function that returns t only for a character
> > that requires a preceding LRM in the keyboard layout.

> Yes, I understand that.  But the test you are suggesting, i.e. avoid
> the LRM only for characters whose bidi-class is L, will not catch
> numbers, punctuation, and other non-L characters.

The function body I wrote is just an idea, not a complete
solution, and of course checking against L is clearly
a bug.  At the very least we must check against R (and AL).
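A minimal sketch of the corrected test, written in Python purely for illustration: `unicodedata.bidirectional` reports the same Unicode Bidi_Class values that Emacs returns from `get-char-code-property` with `bidi-class`. The name `needs_lrm` is hypothetical, and, as Eli points out above, checking only the strong R/AL classes still ignores weak characters such as digits and punctuation.

```python
import unicodedata

# Strong right-to-left bidi classes: R (e.g. Hebrew) and AL (Arabic letters).
STRONG_RTL = {"R", "AL"}

def needs_lrm(ch):
    """Hypothetical predicate: True if CH has a strong R or AL bidi class.

    Weak and neutral classes (EN, CS, ON, ...) return False, so this is
    only the minimum check discussed here, not a complete test.
    """
    return unicodedata.bidirectional(ch) in STRONG_RTL
```

For example, a Hebrew letter or an Arabic letter yields True, while "A", "." and "1" yield False.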

> > > Also, `lower' and `upper' could be strings, in which case you need a
> > > more complex test.
> >
> > We can pass (if (stringp lower) (aref lower 0) lower) to that
> > function.

> But that doesn't DTRT.  Here's an example where it will fail: ".A".

Why?  Keyboard cells in the keyboard layout typically have
this form (L is for the lower key, U is for the upper (shifted) key):

... | LU | LU | ...

What we want is to display the left LU to the left of the
right LU, and display each L (character or string) to the
right of the corresponding U.

Even if the L (of the left LU) is ".A", we don't need an LRM
for it.  We have to insert an LRM only before a character that
may reorder the preceding characters, and after a character that
may reorder the following characters.  Isn't that right?
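The rule proposed above can be sketched as follows, in Python and only as an illustration. It assumes that "may reorder" is approximated by a strong R or AL bidi class on the first or last character of a key description; `may_reorder`, `guard` and `layout_row` are hypothetical names, not Quail functions. Because a single character is just a one-character string here, the separate character/string case disappears.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def may_reorder(ch):
    # Assumption: approximate "may reorder its neighbours" by a strong
    # right-to-left bidi class (R or AL); weak classes are ignored.
    return unicodedata.bidirectional(ch) in ("R", "AL")

def guard(desc):
    """Hypothetical helper: put an LRM before DESC if its first character
    may reorder the preceding text, and after it if its last character
    may reorder the following text."""
    if not desc:
        return desc
    before = LRM if may_reorder(desc[0]) else ""
    after = LRM if may_reorder(desc[-1]) else ""
    return before + desc + after

def layout_row(cells):
    """Hypothetical helper: render (lower, upper) pairs as '| LU | LU |'."""
    return "| " + " | ".join(guard(lo) + guard(up) for lo, up in cells) + " |"
```

With this sketch, `guard(".A")` returns ".A" unchanged ('.' is class CS and 'A' is L), matching the claim above, while a lone Hebrew letter gets an LRM on both sides.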

> AFAIK, the only reliable way of telling whether a given string will be
> reordered is to actually reorder it, and then compare with the
> logical-order original.  That's a nuisance, and also the results may
> well depend on the characters before and after the string in the
> buffer, so you need to know the context in advance, which you normally
> don't.

> I tried also a different solution: enclose each row of the keyboard
> layout in an L2R override embedding, LRO..PDF.  This inserts only 2
> control characters per row, and doesn't insert them inside the
> keyboard cells, so it is cleaner, I think.  But using this means that
> no key description in the layout can be a string that requires
> reordering individually.  (By contrast, inserting an LRM between the
> lower and the upper key still allows each description to be
> reordered.)  Can we live with such a restriction?  I don't know enough
> about Quail to tell.
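Eli's row-level alternative can be sketched in Python like this (illustration only; `override_row` is a hypothetical name). It adds exactly two control characters per row and none inside the cells; the trade-off is that inside the override every character is shown left to right, so a right-to-left string description is displayed in logical order rather than reordered.

```python
LRO = "\u202d"  # U+202D LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING

def override_row(cells):
    """Hypothetical helper: wrap one keyboard-layout row of
    (lower, upper) pairs in a single LRO ... PDF pair."""
    body = " | ".join(lower + upper for lower, upper in cells)
    return LRO + "| " + body + " |" + PDF
```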

As it's possible to assign a string to a key, there will be
cases where the characters in the string must be
reordered.  In the above case, if L is the Hebrew word "שלום", it
must be reordered.  But even if we surround that word with
LRE and PDF, the word itself is reordered correctly, right?
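The per-description embedding suggested here can be sketched in Python as follows (illustration only; `has_rtl` and `embed_desc` are hypothetical names). Unlike LRO, an LRE embedding does not override the direction of strong characters, so a Hebrew word inside LRE ... PDF is still reordered as a unit while the surrounding row keeps its left-to-right layout.

```python
import unicodedata

LRE = "\u202a"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING

def has_rtl(s):
    # True if any character has a strong right-to-left class (R or AL).
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in s)

def embed_desc(desc):
    """Hypothetical helper: wrap a key description in LRE ... PDF only
    when it contains right-to-left characters."""
    return LRE + desc + PDF if has_rtl(desc) else desc
```

For instance, "שלום" comes back wrapped in LRE ... PDF, while ".A" is returned unchanged.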

---
Kenichi Handa
handa@m17n.org