From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, jasonr@gnu.org
Subject: Re: Arabic support
Date: Fri, 03 Sep 2010 10:00:02 +0900
Message-ID: <tl7r5hbzlnx.fsf@m17n.org>
In-Reply-To: <E1OrAPF-0000Gn-K7@fencepost.gnu.org> (message from Eli Zaretskii on Thu, 02 Sep 2010 10:04:45 -0400)

In article <E1OrAPF-0000Gn-K7@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > A not-yet-shaped LGSTRING is created by autocmp_chars
> > (composite.c) from a character sequence matching with a
> > regular expression PATTERN stored in a
> > composition-function-table.  This pattern is
> > "[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
> > and a more complicated regex for Hebrew
> > (lisp/language/hebrew.el).

> Thanks.  So character compositions are used not only to compose
> several characters into one glyph, but also to break text into
> individually shaped chunks, is that right?

Yes.
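
To illustrate the chunking only (this is not the code in composite.c), here is a minimal Python sketch that assumes nothing more than the Arabic pattern quoted above.  The function name shaping_chunks and the sample string are made up for the example; the real lookup goes through composition-function-table, which the sketch does not model.

```python
import re

# The Arabic pattern quoted above.  Everything else in this sketch is
# hypothetical and only illustrates how a regex splits text into runs.
ARABIC_RUN = re.compile("[\u0600-\u06FF]+")


def shaping_chunks(text):
    """Return (start, end, run) for each run that would be shaped as one unit."""
    return [(m.start(), m.end(), m.group()) for m in ARABIC_RUN.finditer(text)]


# "Arabic " followed by two Arabic words separated by a space.
sample = ("Arabic \u0627\u0644\u0633\u0651\u0644\u0627\u0645"
          " \u0639\u0644\u064a\u0643\u0645")
chunks = shaping_chunks(sample)

for start, end, run in chunks:
    # Print the character range and the code points of each run.
    print(start, end, [hex(ord(c)) for c in run])
```

Each Arabic word becomes one run, and the ASCII text and the spaces fall outside every run, which is the sense in which the pattern breaks text into individually shaped chunks.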

> If so, auto-composition-mode cannot be turned off for scripts that
> need this kind of "grouped shaping" without degrading the presentation
> of these scripts to the point of illegibility?

Yes.  And auto-composition-mode cannot be turned off for any
script for which it is not enough to display the glyphs
corresponding to individual characters: all Indic scripts, some
East Asian scripts, Arabic, Hebrew, etc.  In this respect, Arabic is not
special.  Even for some Indics, an LGSTRING may contain
multiple grapheme clusters.

> > > I'm asking because it's possible that we will need to modify
> > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > order it expects them.  That's if it turns out that Uniscribe cannot
> > > otherwise shape them correctly.
> > 
> > ??? Currently characters and glyphs in LGSTRING are always
> > in logical order.

> See my mail from yesterday, where I describe that I see in GDB that
> Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> order:

>   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html

In this mail, you wrote:

> Also, it looks like uniscribe_shape is repeatedly called from
> font-shape-gstring to shape the same text that is progressively
> shortened.  For example, the first call will be with a 7-character
> string whose contents is

>    {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}

and this character sequence is surely in logical order.  So
I don't know why you think uniscribe_shape is given an
LGSTRING in visual order.
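
The comparison can be checked mechanically.  The following Python sketch assumes that the word in question is the first Arabic word of the greeting, written here with escapes in buffer (logical) order; the helper order_of is hypothetical.  Plain reversal is only a crude stand-in for visual order, since real bidi reordering is more involved.

```python
# Code points reported from GDB for the first call.
reported = [0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645]

# The word as stored in the buffer, i.e. in logical order
# (assumption: this is the word that was being displayed).
word = "\u0627\u0644\u0633\u0651\u0644\u0627\u0645"
logical = [ord(c) for c in word]


def order_of(seq, logical_seq):
    """Classify seq relative to the known logical sequence."""
    if seq == logical_seq:
        return "logical"
    if seq == logical_seq[::-1]:
        # Crude approximation of visual order for a pure R2L run.
        return "reversed"
    return "unknown"


print(order_of(reported, logical))
```

The reported sequence starts with ALEF, LAM (0x627, 0x644) and keeps SHADDA (0x651) right after SEEN (0x633), exactly as in the stored word.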

> The next call is with a 6-character string whose contents is

>    {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}

> then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.

> Note that the first 7-character string is the first word of the Arabic
> greeting, properly bidi-reordered for display.

> Are these series of calls expected?

No.  I don't know why that happens on Windows.  On Ubuntu,
when I visit a file that contains only these lines:
------------------------------------------------------------
Arabic السّلام
;;; Local Variables:
;;; bidi-display-reordering: t
;;; End:
------------------------------------------------------------
font-shape-gstring is called just once.

As the LGSTRING gets shorter on each call, it seems that
composition fails each time.
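
As a toy model of that guess (an assumption for illustration, not the actual logic in composite.c), the Python sketch below retries with one character fewer whenever a hypothetical shaper callback reports failure.  The names shape_attempts, always_fails and always_ok are invented for the example.

```python
def shape_attempts(chars, shaper):
    """Try the longest run first; on failure retry with one character fewer.

    Returns (lengths_tried, accepted_length), where accepted_length is
    None if every attempt failed.  `shaper` is a hypothetical callback
    that returns True when shaping the given run succeeds.
    """
    tried = []
    for n in range(len(chars), 0, -1):
        tried.append(n)
        if shaper(chars[:n]):
            return tried, n
    return tried, None


chars = [0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645]


def always_fails(run):
    # Models a shaper whose result is rejected every time.
    return False


def always_ok(run):
    # Models a shaper that succeeds on the first call.
    return True


print(shape_attempts(chars, always_fails))
print(shape_attempts(chars, always_ok))
```

With a shaper that always fails, the model produces calls of length 7, 6, 5 and so on, which matches the reported sequence of calls; with one that succeeds, there is a single call, as on Ubuntu.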

autocmp_chars is mainly called from composition_reseat_it.
Could you please trace the code after the first call of
autocmp_chars, and find out why Emacs decides that the
composition fails?

---
Kenichi Handa
handa@m17n.org
