From: Amit Aronovitch <aronovitch@gmail.com>
Newsgroups: gmane.emacs.devel,gmane.emacs.bidi
Subject: Re: [emacs-bidi] Re: Arabic support
Date: Mon, 30 Aug 2010 17:11:06 +0300
References: <83bp8oml9c.fsf@gnu.org>
To: Kenichi Handa <handa@m17n.org>
Cc: eliz@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org

On Mon, Aug 30, 2010 at 4:42 PM, Amit Aronovitch <aronovitch@gmail.com> wrote:

> On Mon, Aug 30, 2010 at 5:07 AM, Kenichi Handa <handa@m17n.org> wrote:
>
>> In article <AANLkTinFrEnuW=oPeBqg6=wYegbrR+Lani2WcmDYstVO@mail.gmail.com>,
>> Amit Aronovitch <aronovitch@gmail.com> writes:
>>
>> > 1) I confirm that Arabic shaping seems to work fine on my build
>> > (27/8/10 rev. 101200, on Linux+X (Debian unstable)).
>>
>> > 2) Logical movement with C-f/C-b in the hello file seems fine (I do
>> > not see the trap described above).
>>
>> Thank you for testing them.
>>
>> > 3) My Arabic is very basic, and I am not familiar with Arabic
>> > computing (keyboards etc.) - I noticed the following points, but I
>> > am not sure what is the expected behavior (I can only compare to
>> > other programs - gedit in this case):
>>
>> >   a) Column numbers (column-number-mode) behave strangely (I suspect
>> > that m17n-lib's invisible markup consumes column numbers). For
>> > example, as you move using C-f in the word "هذا", column numbers go
>> > through "0,1,4,5" (i.e. the second character takes up 3 columns).
>> > If I change that to "بهذا", the column positions are "0,1,4,6,7"
>> > (the second and third chars take up 3 and 2 columns resp.?).
>> >   In gedit column positions are 1 character per column and do not
>> > depend on the shaping.
>>
>> I've just committed a fix for this bug.  It's not related to
>> m17n-lib.
>
> Thanks. Much better now :-)
>
> I also checked the diacritics (tashkil): it seems that they do not
> take up a column number in Emacs.
>
> In gedit, cursor movement is similar (as in Emacs, forward/backward
> movement skips the vowels, while 'delete' handles them separately),
> but there the vowels do take up a column number. I find this behavior
> more consistent with the way both programs handle the lam-alef
> ligature (one cursor-movement space, but two column numbers).
> However, as I said, I do not know which behavior is the most natural
> for Arabic users.

Checking the *Hebrew* diacritics (nikkud), I noticed a problem:
In some cases the diacritics are displayed in the wrong position (their
"real" cursor position is correct, which makes the UI *very* confusing).
E.g., if you type "‫ עָלֵינוּ‬", the Qamatz (first vowel) appears under
the space instead of under the Ain (first letter). If you remove the
space, the Qamatz does not appear at all. The Zeire (second vowel)
appears under the Ain (first letter) instead of the Lamed (second
letter). However, the Shuruk sticks to the Vav (last letter) as it
should (though the positioning is too close and too high IMHO).

I do not know if this issue is specific to my build.
My complete config.log is available here:
http://dl.dropbox.com/u/6960989/dumps/config.log

  AA
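
For readers who want to reproduce the comparison in this message, here is a rough Python sketch of the two column-counting conventions discussed (one column per code point, as reported for gedit, versus no column for diacritics, as reported for Emacs). It also shows which base letter each nikkud mark in the test word logically belongs to. The function names are hypothetical and this is only an illustration built on Python's standard `unicodedata` module; it is not how Emacs or gedit actually compute columns.

```python
import unicodedata

def is_zero_width_mark(ch):
    # Assumption for this sketch: nonspacing (Mn) and enclosing (Me) marks
    # are the characters that occupy no column of their own.
    return unicodedata.category(ch) in ("Mn", "Me")

def columns_per_code_point(text):
    # gedit-like convention from the report: every code point is one column.
    return len(text)

def columns_skipping_marks(text):
    # Emacs-like convention from the report: diacritics take no column.
    return sum(1 for ch in text if not is_zero_width_mark(ch))

def attach_marks(text):
    # Group each base character with the marks that logically follow it.
    clusters = []
    for ch in text:
        if is_zero_width_mark(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# The Hebrew test word from the report, without the leading space and
# without the directional embedding characters around it:
# Ain + Qamatz, Lamed + Zeire, Yod, Nun, Vav + dagesh (Shuruk).
word = "\u05e2\u05b8\u05dc\u05b5\u05d9\u05e0\u05d5\u05bc"

if __name__ == "__main__":
    print(columns_per_code_point(word))
    print(columns_skipping_marks(word))
    for cluster in attach_marks(word):
        print([unicodedata.name(ch) for ch in cluster])
```

In the logical order, the Qamatz follows the Ain and the Zeire follows the Lamed, so a display that draws them under the preceding character (the space, or the Ain) is off by one base character relative to this grouping.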