From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Amit Aronovitch
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Re: Arabic support
Date: Sat, 28 Aug 2010 13:15:44 +0300
Message-ID:
References: <83bp8oml9c.fsf@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0492199081=="
X-Trace: dough.gmane.org 1282991120 29647 80.91.229.12 (28 Aug 2010 10:25:20 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Sat, 28 Aug 2010 10:25:20 +0000 (UTC)
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, Kenichi Handa
To: Eli Zaretskii
Original-X-From: emacs-bidi-bounces+gnu-emacs-bidi=m.gmane.org@gnu.org Sat Aug 28 12:25:18 2010
Return-path:
Envelope-to: gnu-emacs-bidi@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1OpIb7-00086d-Sr for gnu-emacs-bidi@m.gmane.org; Sat, 28 Aug 2010 12:25:18 +0200
Original-Received: from localhost ([127.0.0.1]:59431 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OpIb7-0002kg-3y for gnu-emacs-bidi@m.gmane.org; Sat, 28 Aug 2010 06:25:17 -0400
Original-Received: from [140.186.70.92] (port=57735 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1OpIUK-0001Xe-Kf for emacs-bidi@gnu.org; Sat, 28 Aug 2010 06:18:31 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1OpIS5-0005zt-Vf for emacs-bidi@gnu.org; Sat, 28 Aug 2010 06:15:59 -0400
Original-Received: from mail-iw0-f169.google.com ([209.85.214.169]:59212) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1OpIS4-0005zT-1o; Sat, 28 Aug 2010 06:15:56 -0400
Original-Received: by iwn33 with SMTP id 33so4996561iwn.0 for ; Sat, 28 Aug 2010 03:15:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
        h=domainkey-signature:mime-version:received:received:in-reply-to
         :references:date:message-id:subject:from:to:cc:content-type;
        bh=UEQ1+4ySMNmL7C5Ysep5CgF6W7gJ3rxkNiMWdoGfzCU=;
        b=lDRepEHS1GX6omej9JnH1f9KmbOtzrvP+MkC+fhF/ClHotrShqScVpb0FkRgfysWCj
         /U3iG69CMHjt2A55bt6x3S9rUHpXrw1ej8ra6VeNjdZCFl56umqWSdUfySsSkKuNprOa
         wR0PfbbMt0PK6ESQKFM7BzUjwIi/iWlAc1R5E=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :cc:content-type;
        b=nQZUvq0gZ3EPEjjXliAIsbJFjND/lxK86+yRm4Ca3ti6ejhRqT3YxQ52rrYTkpFBig
         Y/6lTdND8clIg1x9NBI3LWeInkncm2DAMY9G1ZZGef/NnBgLw6CLzmqpcgArhf8yj+Iv
         ZZabNEahhAMUhCAYMHqfBR8+qcTZ/LQTqXwu8=
Original-Received: by 10.231.130.99 with SMTP id r35mr2411839ibs.171.1282990544968; Sat, 28 Aug 2010 03:15:44 -0700 (PDT)
Original-Received: by 10.231.168.70 with HTTP; Sat, 28 Aug 2010 03:15:44 -0700 (PDT)
In-Reply-To: <83bp8oml9c.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-BeenThere: emacs-bidi@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of Emacs support for multi-directional text."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-bidi-bounces+gnu-emacs-bidi=m.gmane.org@gnu.org
Errors-To: emacs-bidi-bounces+gnu-emacs-bidi=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.bidi:770 gmane.emacs.devel:129337
Archived-At:

--===============0492199081==
Content-Type: multipart/alternative; boundary=005045014302aee68b048edf83af

--005045014302aee68b048edf83af
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Fri, Aug 27, 2010 at 12:56 PM, Eli Zaretskii wrote:

> > From: Kenichi Handa
> > Date: Thu, 26 Aug 2010 10:10:05 +0900
> >
> > I've just committed changes to trunk for Arabic shaping.  If
> > there're any Arabic users in this list, please check the
> > displaying of Arabic text.  On GNU/Linux system, you must
> > compile Emacs with libotf and m17n-lib (configure script
> > should detect them automatically).
>
> Thanks.  However, today's build behaves very strangely in a GUI
> session on MS-Windows.  For starters, cursor motion seems to jump
> across many characters in the "Arabic" line of etc/HELLO.  For
> example, typing C-f in that line, I first move one character at a time
> across "Arabic", as expected, then the cursor jumps to the right paren
> of the leftmost parenthesized part, again as expected, and then I see
> the following strange behavior:
>
>  . C-f moves one character to the left, to buffer position 758, as
>    expected.
>
>  . the next C-f jumps across many characters on the screen and lands
>    on position 764.
>
>  . another C-f jumps to what is reported as position 765, but on the
>    screen those are several characters, maybe 5 or 6.
>
>  . another C-f moves to the left paren at position 766, as expected.
>
>  . yet another C-f moves to position 767, but on the screen the
>    cursor jumps back into one of the characters it jumped across when
>    it landed on position 765 two C-f keypresses earlier.
>
>  . if I type C-b 4 times from this point, I enter a "trap", whereby
>    typing C-b jumps between two characters, whose buffer positions
>    are 764 and 765.  The only way to get out of the trap is with C-a
>    or C-e or C-f.
>
> I don't read Arabic, so I cannot really say whether any of this is
> expected behavior.  (The "trap" with C-b is certainly not the expected
> behavior.)  Do you see anything similar on X?
>

1) I confirm that Arabic shaping seems to work fine on my build
(27/8/10 rev. 101200, on Linux+X (Debian unstable)).

2) Logical movement with C-f/C-b in the HELLO file seems fine (I do not
see the trap described above).

3) My Arabic is very basic, and I am not familiar with Arabic computing
(keyboards etc.) - I noticed the following points, but I am not sure
what the expected behavior is (I can only compare to other programs -
gedit in this case):

  a) Column numbers (column-number-mode) behave strangely (I suspect
  that m17n-lib's invisible markup consumes column numbers). For
  example, as you move using C-f in the word "هذا", column numbers go
  through "0,1,4,5" (i.e. the second character takes up 3 columns). If
  I change that to "بهذا", the column positions are "0,1,4,6,7" (the
  second and third chars take up 3 and 2 columns resp.?).
  In gedit, column positions are 1 character per column and do not
  depend on the shaping.

  b) The Arabic keyboard has the ligature "Lam-Alef" (U+FEFB) on the key
  marked "B" on QWERTY keyboards. When I type this in Emacs, I get Lam
  and Alef (which are auto-shaped correctly as the proper ligature).
  C-d when the cursor is on the ligature erases the Alef, and another
  C-d erases the Lam. This seems like proper behavior to me. However,
  in gedit, the "B" key produces a single U+FEFB character, which is
  always displayed as a ligature, deleted in a single Del press, and
  never connected to the previous character. Cutting and pasting this
  into Emacs, I get similar behavior there.
  The question is: do Arabic users expect to be able to produce this
  "stiff" ligature? Is the behavior of gedit a bug? Should the Emacs
  "Lam-Alef" key behave as it does (i.e. produce two characters)?

   thanks,
       Amit Aronovitch

--005045014302aee68b048edf83af
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--005045014302aee68b048edf83af--
--===============0492199081==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
emacs-bidi mailing list
emacs-bidi@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-bidi

--===============0492199081==--