From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: severe problems with composite characters
Date: Wed, 17 Sep 2003 15:49:00 +0900 (JST)
Message-ID: <200309170649.PAA10318@etlken.m17n.org>
References: <20030917.074537.51710930.wl@gnu.org>
To: wl@gnu.org
Cc: kazu@iijlab.net, d.love@dl.ac.uk, emacs-devel@gnu.org

In article <20030917.074537.51710930.wl@gnu.org>, Werner LEMBERG writes:

> ======================================================================
> string-width() returns a wrong number if its argument string
> has composite characters.

> Consider two bytes strings 0xcd 0xeb, whose width is one since they
> are composed.

> On Emacs 20.7 string-width() returns 1.
> On Emacs 21.3.50 string-width() returns 2.

???  I've just confirmed this result with 21.3.50.

  (string-width (decode-coding-string "\xcd\xeb" 'thai-tis620)) => 1

Please note that Emacs 21 no longer has composite characters.  For
instance, compose-region doesn't change the characters in a region
into a single composite character; instead it just puts the text
property `composition' on them.  The display routine checks this text
property and displays the sequence correctly.
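
To illustrate the point about the text property, here is a minimal
sketch.  The function name `my-composition-demo' and the sample text
"ab" are made up for illustration; it only assumes that compose-region
behaves as described above.

```elisp
;; Sketch only: `my-composition-demo' is a hypothetical helper name.
(defun my-composition-demo ()
  "Compose \"ab\" in a temporary buffer and return (TEXT HAS-PROPERTY)."
  (with-temp-buffer
    (insert "ab")
    ;; Buffer positions 1..3 cover both inserted characters.
    (compose-region 1 3)
    (list
     ;; The underlying characters are still the original two.
     (buffer-substring-no-properties 1 3)
     ;; The composition is recorded as a text property on them.
     (and (get-text-property 1 'composition) t))))
```

Evaluating (my-composition-demo) should show that the buffer text is
unchanged and that only the `composition' property has been added.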

I suspect that you evaluated something like this:

  (string-width "__some_composed_text__")

in the *scratch* buffer.  As the Lisp reader ignores any text
properties when reading a string expression in the *scratch* buffer,
the string given to string-width doesn't have the `composition'
property.

> ======================================================================
> Suppose that composite characters are stored to a file with a
> multi-lingual coding-system. An example is TIS-620 characters with
> UTF-8 (or ctext).

> When Emacs reads the file, the composite characters are not composed
> since there is no post-conv function associated to the multi-lingual
> coding-system.

> Is this a bug?

As such a post-conv function is rather heavy, it is turned off by
default.  If you customize the variable utf-8-compose-scripts to t,
Thai characters should be composed on decoding.

But I've just found a bug in this facility, and installed a fix.
Please update your working directory, and try again.  Don't forget to
do "make autoloads" in the "lisp" subdirectory.

---
Ken'ichi HANDA
handa@m17n.org