From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Fri, 23 Mar 2012 17:58:25 +0200
Message-ID: <83fwczux5q.fsf@gnu.org>
References: <83sjgzvb6w.fsf@gnu.org> <83mx77v6jz.fsf@gnu.org>
To: Stefan Monnier
Cc: 11073@debbugs.gnu.org

> From: Stefan Monnier
> Cc:
>  11073@debbugs.gnu.org
> Date: Fri, 23 Mar 2012 10:27:39 -0400
>
> > (Repeat after me: FETCH_MULTIBYTE_CHAR followed by CHAR_BYTES is not
> > always equivalent to STRING_CHAR_AND_LENGTH.)
>
> Do we really absolutely have to have such a trap?
> I mean: is there a good reason why they're not always equivalent?

They are not equivalent when conversion of the multibyte form into a
character unifies a CJK character that is represented by a codepoint
from one of the private use areas.  This unification is done in
char_string, via a call to MAYBE_UNIFY_CHAR, which converts the
private codepoint into the equivalent codepoint in one of the "normal"
planes.  The UTF-8 encoding of the unified character can be shorter or
longer than the original multibyte sequence.

The problem with the code I had in bidi.c, viz.:

    character = FETCH_MULTIBYTE_CHAR (bytepos);
    char_len = CHAR_BYTES (character);

is that the value in `character' is not guaranteed to correspond to
the multibyte sequence consumed by FETCH_MULTIBYTE_CHAR, and therefore
that character's length as returned by CHAR_BYTES is not the right
instrument to advance to the next character.

So, I'd say that FETCH_MULTIBYTE_CHAR should only be used for fetching
a single character; if one wants to advance, one should either use
FETCH_CHAR_ADVANCE or (if they are paranoiac about speed, like I am)
use

    character = STRING_CHAR_AND_LENGTH (BYTE_POS_ADDR (bytepos), length);

which returns the length of the consumed sequence, and use that to
advance to the next character position.

And note the other gotcha: that the length returned by
STRING_CHAR_AND_LENGTH is not necessarily the length of the UTF-8
encoding of the character it returns, but rather the length of the
multibyte sequence which was converted to the character.
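
For illustration only, here is a small stand-alone C sketch of the
trap described above.  It does not use the real Emacs macros: every
name with a `toy_' prefix, the two `next_pos_*' helpers, and the
one-entry unification table (U+F0000 mapped to U+4E00) are
assumptions made up for this example, and the decoder assumes
well-formed UTF-8 input.  It is meant only to show why the length of
the returned character and the length of the consumed sequence can
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for CHAR_BYTES: length of the UTF-8 encoding of C.  */
int toy_char_bytes(uint32_t c)
{
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c < 0x10000) return 3;
    return 4;
}

/* Hypothetical unification table with a single entry: the private-use
   codepoint U+F0000 is unified with U+4E00.  */
uint32_t toy_unify(uint32_t c)
{
    return c == 0xF0000 ? 0x4E00 : c;
}

/* Toy stand-in for STRING_CHAR_AND_LENGTH: decode the (assumed
   well-formed) UTF-8 sequence at P, store the number of bytes consumed
   in *LEN, and return the character after unification.  */
uint32_t toy_string_char_and_length(const unsigned char *p, int *len)
{
    uint32_t c;

    assert(p != NULL && len != NULL);
    if (p[0] < 0x80) {
        c = p[0];
        *len = 1;
    } else if (p[0] < 0xE0) {
        c = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
        *len = 2;
    } else if (p[0] < 0xF0) {
        c = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
        *len = 3;
    } else {
        c = ((p[0] & 0x07u) << 18) | ((p[1] & 0x3Fu) << 12)
            | ((p[2] & 0x3Fu) << 6) | (p[3] & 0x3Fu);
        *len = 4;
    }
    return toy_unify(c);
}

/* Toy stand-in for FETCH_MULTIBYTE_CHAR: returns only the character
   and discards the number of bytes that were consumed.  */
uint32_t toy_fetch_char(const unsigned char *p)
{
    int ignored;
    return toy_string_char_and_length(p, &ignored);
}

/* The problematic pattern: advance by the encoded length of the
   returned (possibly unified) character.  */
size_t next_pos_buggy(const unsigned char *buf, size_t pos)
{
    uint32_t c = toy_fetch_char(buf + pos);
    return pos + (size_t) toy_char_bytes(c);
}

/* The safe pattern: advance by the length of the consumed sequence.  */
size_t next_pos_safe(const unsigned char *buf, size_t pos)
{
    int len;
    (void) toy_string_char_and_length(buf + pos, &len);
    return pos + (size_t) len;
}
```

With a buffer holding the bytes F3 B0 80 80 (the 4-byte encoding of
U+F0000) followed by `A', the toy decoder returns U+4E00, whose own
UTF-8 encoding is 3 bytes long.  next_pos_buggy (buf, 0) therefore
yields 3, which points at a trailing byte in the middle of the
sequence, while next_pos_safe (buf, 0) yields 4, the position of `A'.
The same buffer also shows the second gotcha: the consumed length (4)
is not the encoded length of the character that was returned (3).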