From: Stefan Monnier
Newsgroups: gmane.emacs.bugs
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Tue, 03 Apr 2012 00:22:32 -0400
To: Kenichi Handa
Cc: 11073@debbugs.gnu.org
In-Reply-To: (Kenichi Handa's message of "Tue, 03 Apr 2012 11:22:23 +0900")

>> > Usually, yes.  But as far as there is a code space in high
>> > area for a CJK charset, it is unavoidable to have a
>> > buffer/string that contains a character represented by a
>> > byte sequence in that high area as the test case of
>> > Bug#11073.  And, as "unification" means to treat such a
>> > character the same way as the unified character, I thought
>> > they both have the same character code.

>> Since there are two internal byte-sequence representations, I don't see
>> any good reason why we shouldn't have 2 internal int representations.
>> I.e. if unification failed for the byte-sequence (which might be the
>> result of a bug, for all I know), we may as well keep them non-unified
>> in the int representation.

> Please note that not all characters in the code-space of a
> CJK charset are unified.  For instance, Big5 has its own
> PUA (private use area), and characters in PUA are not
> unified by default.  So, if Emacs reads a Big5 file that
> contains PUA chars, those chars stay in high-area.  Then,
> one can provide his own unification map that also maps PUA
> chars to some Unicode chars as this:
>   (unify-charset 'big5 "MyBig5.map")
> After this, I thought that previously read PUA chars staying
> in the high-area should be treated as the corresponding
> Unicode chars (in displaying, search, etc).

But again, this unification takes place during decoding.  Whereas what
I'm talking about takes place when reading the internal utf-8
representation, which should be already unified.


        Stefan