From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Arabic support
Date: Thu, 02 Sep 2010 22:01:07 +0900
References: <83bp8oml9c.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, jasonr@gnu.org
In-Reply-To: (message from Eli Zaretskii on Thu, 02 Sep 2010 07:53:15 -0400)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Discussion of Emacs support for multi-directional text."
Xref: news.gmane.org gmane.emacs.bidi:799 gmane.emacs.devel:129588

Eli Zaretskii writes:

> Where can I find the code which decides how to break text into
> LGSTRINGs? I'd like to see such code for both Arabic and Hebrew,
> unless it's the same code.

A not-yet-shaped LGSTRING is created by autocmp_chars (composite.c)
from a character sequence matching a regular expression PATTERN
stored in composition-function-table. This pattern is
"[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el), and a
more complicated regex for Hebrew (lisp/language/hebrew.el).

> For example, can characters like digits or other neutrals be included
> in the same LGSTRING with Arabic and Hebrew? Or will an LGSTRING
> always include characters from one script only?

An LGSTRING always includes only characters that use the same font.
So, even if you wrote PATTERN to include the other neutrals, if a
user's font setting (or environment) decides to use a different font
for those neutrals, they are not included in the LGSTRING. By
default, Emacs tries to use the same font for characters in the same
script.

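To make the two constraints above concrete, here is a minimal sketch in
Python, not the Emacs implementation. It assumes the Arabic PATTERN can
be written as an equivalent Python regex, and it uses a hypothetical
font_for_char callback to stand in for Emacs's font selection. The
names candidate_runs and split_by_font are invented for illustration.
The real autocmp_chars works from a position in the buffer rather than
scanning a whole string.

```python
import re

# The Arabic pattern quoted above, rewritten as a Python regex.
ARABIC_PATTERN = re.compile("[\u0600-\u06FF]+")


def candidate_runs(text, pattern=ARABIC_PATTERN):
    """Return (start, end) spans of the maximal runs matching the pattern."""
    return [m.span() for m in pattern.finditer(text)]


def split_by_font(text, span, font_for_char):
    """Split one span wherever the font chosen for adjacent characters differs.

    font_for_char is a hypothetical callback (an assumption of this sketch)
    that maps a character to some font identifier.
    """
    start, end = span
    pieces = []
    piece_start = start
    for i in range(start + 1, end):
        if font_for_char(text[i]) != font_for_char(text[i - 1]):
            pieces.append((piece_start, i))
            piece_start = i
    if start < end:
        pieces.append((piece_start, end))
    return pieces


# Two Arabic letters followed by two Arabic-Indic digits: one pattern match.
sample = "\u0633\u0644\u0661\u0662"


def demo_font(ch):
    # Hypothetical font setup that assigns the digits to a different font.
    return "digit-font" if "\u0660" <= ch <= "\u0669" else "arabic-font"


runs = candidate_runs(sample)
pieces = [p for run in runs for p in split_by_font(sample, run, demo_font)]
```

With this hypothetical font setup, the single pattern match is cut in
two at the letter/digit boundary, which is the effect described above:
matching PATTERN is necessary, but characters that get a different font
do not end up in the same LGSTRING.
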

In addition, even if you set up fonts to use the same font for, for
instance, Hebrew and those neutrals, the "shape" method of a
font-backend may not support them. In that case, the composition
fails anyway.

> I'm asking because it's possible that we will need to modify
> w32uniscribe.c to reorder R2L characters before we pass them to the
> Uniscribe ScriptShape API, to let it see the characters in the logical
> order it expects them. That's if it turns out that Uniscribe cannot
> otherwise shape them correctly.

???

Currently characters and glyphs in an LGSTRING are always in logical
order. A "shape" method should also shape that LGSTRING in logical
order.

---
Kenichi Handa
handa@m17n.org