From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Tue, 03 Apr 2012 11:22:23 +0900
References: <83sjgzvb6w.fsf@gnu.org>
In-Reply-To: (message from Stefan Monnier on Thu, 29 Mar 2012 12:04:22 -0400)
To: Stefan Monnier
Cc: 11073@debbugs.gnu.org

Stefan Monnier writes:

> > Usually, yes. But as far as there is a code space in high
> > area for a CJK charset, it is unavoidable to have a
> > buffer/string that contains a character represented by a
> > byte sequence in that high area as the test case of
> > Bug#11073. And, as "unification" means to treat such a
> > character the same way as the unified character, I thought
> > they both have the same character code.

> Since there are two internal byte-sequence representation, I don't see
> any good reason why we shouldn't have 2 internal int representations.
> I.e. if unification failed for the byte-sequence (which might be the
> result of a bug, for all I know), we may as well keep them non-unified
> in the int representation.

Please note that not all characters in the code space of a CJK
charset are unified. For instance, Big5 has its own PUA (private
use area), and characters in the PUA are not unified by default.
So, if Emacs reads a Big5 file that contains PUA chars, those chars
stay in the high area.

One can then provide one's own unification map that also maps the
PUA chars to some Unicode chars, like this:

    (unify-charset 'big5 "MyBig5.map")

After this, I thought that the previously read PUA chars still
sitting in the high area should be treated as the corresponding
Unicode chars (in display, search, etc.).

One may later find a bug in that map, or find that another map is
better. Then one can do this again:

    (unify-charset 'big5 "MyNewBig5.map")

The current design is meant to enable such a scenario.

Of course, one could argue that such functionality is too much for
Emacs, and that whoever changes a unification map should simply
re-read the affected files, process output, mail, etc.

---
Kenichi Handa
handa@m17n.org