From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Arabic support
Date: Fri, 03 Sep 2010 10:00:02 +0900
References: <83bp8oml9c.fsf@gnu.org>
In-Reply-To: (message from Eli Zaretskii on Thu, 02 Sep 2010 10:04:45 -0400)
To: Eli Zaretskii
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, jasonr@gnu.org
List-Id: "Discussion of Emacs support for multi-directional text."
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8

In article , Eli Zaretskii writes:

> > A not-yet-shaped LGSTRING is created by autocmp_chars
> > (composite.c) from a character sequence matching with a
> > regular expression PATTERN stored in a
> > composition-function-table.  This pattern is
> > "[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
> > and a more complicated regex for Hebrew
> > (lisp/language/hebrew.el).

> Thanks.  So character compositions are used not only to compose
> several characters into one glyph, but also to break text into
> individually shaped chunks, is that right?

Yes.

> If so, auto-composition-mode cannot be turned off for scripts that
> need this kind of "grouped shaping" without degrading the presentation
> of these scripts to the point of illegibility?

Yes.  And auto-composition-mode cannot be turned off for any
script for which it is not enough to display the glyphs
corresponding to individual characters; that covers all Indic
scripts, some East Asian scripts, Arabic, Hebrew, etc.  In this
respect, Arabic is not special.  Even for some Indic scripts, an
LGSTRING may contain multiple grapheme clusters.
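As a rough illustration of the chunking described above (a Python
stand-in, not Emacs code): the character class below is the Arabic
pattern quoted from lisp/language/misc-lang.el, while the name
shaping_chunks and the sample string are assumptions made only for
this sketch.  It shows how a pattern of this form cuts text into runs
that would each be shaped as a unit; in Emacs the actual matching is
done in C by autocmp_chars.

```python
import re

# Same character class as the Arabic pattern quoted above.  The real
# entry lives in composition-function-table; this is only a stand-in.
ARABIC_RUN = re.compile("[\u0600-\u06FF]+")


def shaping_chunks(text):
    """Return (start, end, code points) for each run the pattern matches.

    Hypothetical helper: each tuple stands for one character sequence
    that would be handed to the shaper as a single chunk.
    """
    return [
        (m.start(), m.end(), [ord(c) for c in m.group()])
        for m in ARABIC_RUN.finditer(text)
    ]


# Assumed sample: an ASCII word followed by one Arabic word.
sample = "Arabic \u0627\u0644\u0633\u0651\u0644\u0627\u0645"

for start, end, cps in shaping_chunks(sample):
    print(start, end, [hex(cp) for cp in cps])
```

For this sample the whole Arabic word comes out as a single run of
seven code points, and the ASCII text before it is left alone.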

> > > I'm asking because it's possible that we will need to modify
> > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > order it expects them.  That's if it turns out that Uniscribe cannot
> > > otherwise shape them correctly.
> >
> > ???  Currently characters and glyphs in LGSTRING are always
> > in logical order.

> See my mail from yesterday, where I describe that I see in GDB that
> Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> order:
>   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html

In this mail, you wrote:

> Also, it looks like uniscribe_shape is repeatedly called from
> font-shape-gstring to shape the same text that is progressively
> shortened.  For example, the first call will be with a 7-character
> string whose contents is
>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}

and this character sequence is surely in logical order.  So
I don't know why you think uniscribe_shape is given an
LGSTRING in visual order.

> The next call is with a 6-character string whose contents is
>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}
> then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.
> Note that the first 7-character string is the first word of the Arabic
> greeting, properly bidi-reordered for display.

> Are these series of calls expected?

No.  I don't know why that happens on Windows.  On Ubuntu,
when I visit a file that contains only these lines:

------------------------------------------------------------
Arabic السّلام

;;; Local Variables:
;;; bidi-display-reordering: t
;;; End:
------------------------------------------------------------

font-shape-gstring is called just once.

As the LGSTRING is getting shorter each time, it seems that
composition fails each time.  autocmp_chars is mainly called from
composition_reseat_it.
Could you please trace the code after the first call of
autocmp_chars, and find out why Emacs decides that the
composition fails?

---
Kenichi Handa
handa@m17n.org