From: Eli Zaretskii <eliz@gnu.org>
To: Dmitry Antipov <dmantipov@yandex.ru>
Cc: 16457@debbugs.gnu.org
Subject: bug#16457: 24.3.50; crash rendering Arabic Uthmani script
Date: Thu, 16 Jan 2014 19:33:22 +0200
Message-ID: <83a9ev3k7x.fsf@gnu.org>
In-Reply-To: <52D791C0.7000405@yandex.ru>

> Date: Thu, 16 Jan 2014 12:01:04 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: 16457@debbugs.gnu.org
> 
> I'm not familiar with composition sequences in detail

The composition code is under-documented.  I provide what I know
about it below.

> For the uthmani-test.txt, the following code in set_iterator_to_next:
> 
>     /* Composition created while scanning forward.  */
>     /* Update IT's char/byte positions to point to the first
>        character of the next grapheme cluster, or to the
>        character visually after the current composition.  */
>     for (i = 0; i < it->cmp_it.nchars; i++)
>       bidi_move_to_visually_next (&it->bidi_it);
>     IT_BYTEPOS (*it) = it->bidi_it.bytepos;
>     IT_CHARPOS (*it) = it->bidi_it.charpos;
> 
> advances IT from charpos:bytepos 11:21 to 13:25.  But the following fragment
> from scan_for_column:
> 
>     /* Check composition sequence.  */
>     if (cmp_it.id >= 0
>         || (scan == cmp_it.stop_pos
>             && composition_reseat_it (&cmp_it, scan, scan_byte, end,
>                                       w, NULL, Qnil)))
>       composition_update_it (&cmp_it, scan, scan_byte, Qnil);
>     if (cmp_it.id >= 0)
>       {
>         scan += cmp_it.nchars;
>         scan_byte += cmp_it.nbytes;
> 
> advances SCAN:SCAN_BYTE from 11:21 to 13:24.  So the byte position becomes invalid
> and FETCH_CHAR_ADVANCE decodes invalid byte sequence to invalid character C.
> Finally, CHAR_TABLE_REF (Vcomposition_function_table, C) goes out of bounds.

In effect, you are saying that cmp_it.nbytes above is incorrect.
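
To make the 13:24 vs 13:25 discrepancy concrete, here is a minimal
sketch (not code from Emacs) of what "landing in the middle of a
character" means in a UTF-8-style encoding.  The names char_head_p and
advance_chars are hypothetical helpers written for this illustration,
and the sketch uses 0-based byte offsets rather than Emacs's 1-based
buffer positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if BYTE starts a character in a UTF-8-style encoding, i.e. it
   is not a continuation byte of the form 10xxxxxx.  */
static bool
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Advance NCHARS characters from byte offset POS in BUF (LEN bytes)
   and return the new byte offset.  Assumes POS is on a character
   head; stops early at the end of the buffer.  */
static size_t
advance_chars (const unsigned char *buf, size_t len, size_t pos,
               size_t nchars)
{
  assert (pos <= len);
  while (nchars-- > 0 && pos < len)
    {
      pos++;                    /* step over the head byte */
      while (pos < len && !char_head_p (buf[pos]))
        pos++;                  /* step over continuation bytes */
    }
  return pos;
}
```

With a buffer holding two 2-byte characters followed by an ASCII
character, advancing two characters from offset 0 yields offset 4,
which is a character head; offset 3 is a continuation byte, so
decoding from there would start in the middle of a character.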

This is really strange.  First, I cannot reproduce the crash on
MS-Windows, so the problem might be related to the shaping engine
being used (I presume yours is libotf and libm17n).  (I tried on both
Windows XP and on Windows 7, which have very different versions of
Uniscribe, and they both work fine.)

Moreover, set_iterator_to_next uses the same code from composite.c
that scan_for_column does, so it is unclear to me how the former
works, while the latter doesn't.

Specifically, cmp_it.nbytes is computed in composition_update_it as
the sum of byte-widths of all the characters being composed:

      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
	{
	  c = XINT (LGSTRING_CHAR (gstring, cmp_it->from + i));
	  cmp_it->nbytes += CHAR_BYTES (c);
	  cmp_it->width += CHAR_WIDTH (c);
	}
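
As an illustration of that sum, here is a minimal standalone sketch.
It is not the Emacs code: utf8_char_bytes is a simplified stand-in for
CHAR_BYTES that assumes plain Unicode code points encoded as standard
UTF-8, and composition_nbytes is a hypothetical helper name.  The code
points used in the note after it are illustrative, not the ones from
the bug report.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for CHAR_BYTES: length of C in standard UTF-8.
   Assumption: C is a Unicode code point (0..0x10FFFF); values outside
   that range are not handled by this sketch.  */
static int
utf8_char_bytes (int c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum the byte lengths of the N code points in CHARS, the way nbytes
   is accumulated over the composed characters.  */
static int
composition_nbytes (const int *chars, size_t n)
{
  int nbytes = 0;
  for (size_t i = 0; i < n; i++)
    nbytes += utf8_char_bytes (chars[i]);
  return nbytes;
}
```

For two characters from the Arabic block, e.g. U+0644 and U+0627, this
gives 2 + 2 = 4 bytes, which is the size of the expected step from
byte position 21 to 25.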

And the characters in the LGSTRING object are simply copied from the
buffer in fill_gstring_header, when LGSTRING is created:

  for (i = 0; i < len; i++)
    {
      int c;

      if (NILP (string))
	FETCH_CHAR_ADVANCE_NO_CHECK (c, from, from_byte);
      else
	FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, string, from, from_byte);
      ASET (header, i + 1, make_number (c));
    }

Could you please trace through these fragments and see what goes wrong
there?  Specifically, what characters (which Unicode codepoints) are
being composed, and what are the contents of the cmp_it structure in
scan_for_column when it advances from 11:21 to 13:24?  (Granted, here
I see it advance from 11:21 to 13:25, as expected.)

Also, what does "C-u C-x =" report when you put the cursor in column
10?

Some more details:

The LGSTRING object is created when Emacs encounters for the first
time a group of characters that should be composed together.  The
structure of LGSTRING is described in the comments to
composition-get-gstring.  Emacs recognizes the character compositions
in composition_reseat_it, which calls autocmp_chars, which calls
composition-get-gstring, which collects the characters to be composed
by calling fill_gstring_header, as shown in the fragment above.

The LGSTRING object is then cached, such that later references to it
use the cached data, instead of computing it from scratch.  The cmp_it
structure holds an ID of the LGSTRING which can be used to look it up
in the cache.  When composition_update_it is called, it simply uses the
information already stored in LGSTRING to advance past the composed
characters.
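
The cache-then-look-up-by-ID idea can be sketched as follows.  This is
a much-simplified illustration and not how composite.c implements it:
struct cached_gstring, cache_put and cache_get are hypothetical names,
and a fixed-size array indexed by ID stands in for the real cache.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CACHED = 16, MAX_GSTRING_CHARS = 8 };

/* Hypothetical stand-in for a cached LGSTRING: just the code points
   collected when the composition was first encountered.  */
struct cached_gstring
{
  int chars[MAX_GSTRING_CHARS];
  size_t nchars;
};

static struct cached_gstring cache[MAX_CACHED];
static int cache_used;

/* Store the N code points in CHARS and return the new entry's ID,
   or -1 if the entry does not fit.  */
static int
cache_put (const int *chars, size_t n)
{
  if (cache_used >= MAX_CACHED || n > MAX_GSTRING_CHARS)
    return -1;
  for (size_t i = 0; i < n; i++)
    cache[cache_used].chars[i] = chars[i];
  cache[cache_used].nchars = n;
  return cache_used++;
}

/* Look up a cached entry by ID; later scans reuse it instead of
   collecting the characters again.  */
static const struct cached_gstring *
cache_get (int id)
{
  assert (id >= 0 && id < cache_used);
  return &cache[id];
}
```

The point of the sketch is that whatever was stored at creation time
is what every later lookup sees, so a wrong value there would affect
every scan that reuses the entry.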

So to understand why it crashes for you, we need to find out why the
nbytes value, which is derived from the characters stored by
fill_gstring_header, somehow became incorrect.

Btw, does the problem go away if you disable cache-long-scans?




