From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#16457: 24.3.50; crash rendering Arabic Uthmani script
Date: Thu, 16 Jan 2014 19:33:22 +0200
Message-ID: <83a9ev3k7x.fsf@gnu.org>
References: <52D6C466.9080909@yandex.ru> <838uuh3zx7.fsf@gnu.org> <7obnzcor73.fsf@fencepost.gnu.org> <52D791C0.7000405@yandex.ru>
In-reply-to: <52D791C0.7000405@yandex.ru>
To: Dmitry Antipov
Cc: 16457@debbugs.gnu.org

> Date: Thu, 16 Jan 2014 12:01:04 +0400
> From: Dmitry Antipov
> CC: 16457@debbugs.gnu.org
>
> I'm not familiar with composition sequences in detail

The compositions stuff is under-documented.  I provide some
information I know of below.

> For the uthmani-test.txt, the following code in set_iterator_to_next:
>
>     /* Composition created while scanning forward.  */
>     /* Update IT's char/byte positions to point to the first
>        character of the next grapheme cluster, or to the
>        character visually after the current composition.  */
>     for (i = 0; i < it->cmp_it.nchars; i++)
>       bidi_move_to_visually_next (&it->bidi_it);
>     IT_BYTEPOS (*it) = it->bidi_it.bytepos;
>     IT_CHARPOS (*it) = it->bidi_it.charpos;
>
> advances IT from charpos:bytepos 11:21 to 13:25. But the following fragment
> from scan_for_column:
>
>     /* Check composition sequence.  */
>     if (cmp_it.id >= 0
>         || (scan == cmp_it.stop_pos
>             && composition_reseat_it (&cmp_it, scan, scan_byte, end,
>                                       w, NULL, Qnil)))
>       composition_update_it (&cmp_it, scan, scan_byte, Qnil);
>     if (cmp_it.id >= 0)
>       {
>         scan += cmp_it.nchars;
>         scan_byte += cmp_it.nbytes;
>
> advances SCAN:SCAN_BYTE from 11:21 to 13:24. So the byte position becomes invalid
> and FETCH_CHAR_ADVANCE decodes invalid byte sequence to invalid character C.
> Finally, CHAR_TABLE_REF (Vcomposition_function_table, C) goes out of bounds.

In effect, you are saying that cmp_it.nbytes above is incorrect.  This
is really strange.  First, I cannot reproduce the crash on MS-Windows,
so the problem might be related to the shaping engine being used (I
presume yours is libotf and libm17n).  (I tried on both Windows XP and
on Windows 7, which have very different versions of Uniscribe, and
they both work fine.)

Moreover, set_iterator_to_next uses the same code from composite.c
that scan_for_column does, so it is unclear to me how the former
works, while the latter doesn't.

Specifically, cmp_it.nbytes is computed in composition_update_it as
the sum of byte-widths of all the characters being composed:

      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
        {
          c = XINT (LGSTRING_CHAR (gstring, cmp_it->from + i));
          cmp_it->nbytes += CHAR_BYTES (c);
          cmp_it->width += CHAR_WIDTH (c);
        }

And the characters in the LGSTRING object are simply copied from the
buffer in fill_gstring_header, when LGSTRING is created:

  for (i = 0; i < len; i++)
    {
      int c;

      if (NILP (string))
        FETCH_CHAR_ADVANCE_NO_CHECK (c, from, from_byte);
      else
        FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, string, from, from_byte);
      ASET (header, i + 1, make_number (c));
    }

Could you please trace through these fragments and see what goes wrong
there?  Specifically, what characters (which Unicode codepoints) are
being composed, and what are the contents of the cmp_it structure in
scan_for_column when it advances from 11:21 to 13:24?  (Granted, here
I see it advance from 11:21 to 13:25, as expected.)  Also, what does
"C-u C-x =" report when you put the cursor in column 10?

Some more details:

The LGSTRING object is created when Emacs encounters for the first
time a group of characters that should be composed together.  The
structure of LGSTRING is described in the comments to
composition-get-gstring.  Emacs recognizes the character compositions
in composition_reseat_it, which calls autocmp_chars, which calls
composition-get-gstring, which collects the characters to be composed
by calling fill_gstring_header, as shown in the fragment above.  The
LGSTRING object is then cached, such that later references to it use
the cached data, instead of computing it from scratch.  The cmp_it
structure holds an ID of the LGSTRING which can be used to look it up
in the cache.  When composition_update_it is called, it simply uses
the information already stored in LGSTRING to advance past the
composed characters.

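To make the byte arithmetic concrete, here is a minimal standalone
sketch of the same kind of summation.  It is not the code from
composite.c: utf8_char_bytes and cluster_nbytes are hypothetical names,
utf8_char_bytes only approximates CHAR_BYTES for code points up to
U+10FFFF, and the two code points mentioned below are an assumption,
since we don't yet know which characters are actually being composed.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes needed to encode code point C in UTF-8.
   Hypothetical stand-in for CHAR_BYTES; handles only 0..0x10FFFF.  */
int
utf8_char_bytes (int c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum of the byte lengths of the N code points in CHARS, analogous
   to accumulating cmp_it->nbytes over the composed characters.
   The total starts from zero on every call.  */
int
cluster_nbytes (const int *chars, size_t n)
{
  int nbytes = 0;
  for (size_t i = 0; i < n; i++)
    nbytes += utf8_char_bytes (chars[i]);
  return nbytes;
}
```

For two code points from the Arabic block, say U+0644 and U+0651 (an
assumed example), cluster_nbytes returns 4, which takes 11:21 to 13:25.
Landing on 13:24 needs 3 bytes for 2 characters, i.e. one 1-byte and
one 2-byte character, so either the characters recorded for the
composition are not the ones in the buffer at that position, or the
byte count was not derived from them alone.
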
So to understand why it crashes for you, we need to find out why the
nbytes value, which is computed from the characters stored by
fill_gstring_header, somehow became incorrect.

Btw, does the problem go away if you disable cache-long-scans?