From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@ni.aist.go.jp>
Newsgroups: gmane.emacs.devel
Subject: Re: [w32] display international HELLO
Date: Fri, 09 Nov 2007 21:40:47 +0900
Message-ID: <E1IqTAF-0008GM-Rl@etlken.m17n.org>
References: <001501c822ab$ccfec5e0$d5101252@JRWXP1>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: ger.gmane.org 1194612074 8691 80.91.229.12 (9 Nov 2007 12:41:14 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 9 Nov 2007 12:41:14 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: "Richard Wordingham" <richard.wordingham@ntlworld.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Nov 09 13:41:15 2007
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1IqTAg-00033O-9w
	for ged-emacs-devel@m.gmane.org; Fri, 09 Nov 2007 13:41:14 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1IqTAU-0003DS-Gg
	for ged-emacs-devel@m.gmane.org; Fri, 09 Nov 2007 07:41:02 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1IqTAQ-0003Bo-25
	for emacs-devel@gnu.org; Fri, 09 Nov 2007 07:40:58 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1IqTAL-00036t-OW
	for emacs-devel@gnu.org; Fri, 09 Nov 2007 07:40:57 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1IqTAL-00036e-JX
	for emacs-devel@gnu.org; Fri, 09 Nov 2007 07:40:53 -0500
Original-Received: from mx1.aist.go.jp ([150.29.246.133])
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <handa@m17n.org>) id 1IqTAL-000801-9f
	for emacs-devel@gnu.org; Fri, 09 Nov 2007 07:40:53 -0500
Original-Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115])
	by mx1.aist.go.jp  with ESMTP id lA9CenSY022151;
	Fri, 9 Nov 2007 21:40:49 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp3.aist.go.jp
	by rqsmtp1.aist.go.jp  with ESMTP id lA9Cemm8022355;
	Fri, 9 Nov 2007 21:40:48 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp3.aist.go.jp  with ESMTP id lA9CelaA011945;
	Fri, 9 Nov 2007 21:40:47 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken.m17n.org with local (Exim 4.67)
	(envelope-from <handa@m17n.org>)
	id 1IqTAF-0008GM-Rl; Fri, 09 Nov 2007 21:40:47 +0900
In-reply-to: <001501c822ab$ccfec5e0$d5101252@JRWXP1>
	(richard.wordingham@ntlworld.com)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
X-detected-kernel: by monty-python.gnu.org: Solaris 8 (1)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:82866
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/82866>

In article <001501c822ab$ccfec5e0$d5101252@JRWXP1>, "Richard Wordingham" <richard.wordingham@ntlworld.com> writes:

> Hindi and Malayalam are a tougher problem.  Although the basic text is 
> encoded in mule-unicode-0100-24ff, 'composition' properties are actually 
> specified in the file.  The composition property should provide renderable 
> text and mark-up which replace the basic text in the display, which ideally 
> should be totally unnecessary in an MS Windows system.  (Realising this 
> ideal requires the ability to upgrade the Uniscribe library to cover extra 
> scripts and even newly admitted characters in supported scripts.)  These 
> compositions are defined by elements for the charset indian-glyph, and its 
> characters have no specified Unicode equivalent.  You need a non-Unicode 
> font to display these characters.  Arial Unicode MS does not contain much in 
> the way of shaping tables, so it will not work properly for any of the truly 
> 'complex' Indic scripts.  (This may be why Microsoft seems to have abandoned 
> this font.)

In the emacs-unicode-2 branch, I'm working on supporting Indic
(and any other scripts that require CTL (Complex Text
Layout)) using OpenType fonts, though progress is slow.

> 2. My first discovery with Lao was that just selecting a font (Code2000) 
> that supported Lao was not enough.  It would not normally display Lao 
> characters (in the Lao charset), until I discovered that a trick such as

> (set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))

> suddenly made the Lao text displayable.  How does this work?  I have studied 
> the code of xdisp.c and its supporting functions, but I cannot find where 
> Emacs character codes are converted to Unicode.  I did notice that if I 
> pasted Lao in from an MS application, Emacs would accept them as Unicode 
> characters and they would be displayed properly if I selected an appropriate 
> font.

Emacs 22 still doesn't unify characters in legacy charsets
and Unicode by default.  And, as Lao doesn't have an official
national charset, Emacs invented one long ago (before
Unicode).  The Lao characters in the HELLO file use that
charset.  The above set-fontset-font call tells Emacs to use
an iso10646-1 font (i.e. a Unicode-encoded font) for that Lao
charset.  Then the CCL program ccl-encode-unicode-font
(defined in lisp/international/fontset.el) converts Lao
codes to the corresponding Unicode code points at display
time.  This is done in x_encode_char of xterm.c (or in
w32_encode_char of w32term.c).

This ugly mechanism is not used in emacs-unicode-2.

> 3. Compositions of Lao characters, (i.e. with the 'composition' string 
> property) using the Code2000 font (the only fully working Lao font I have), 
> do not display properly, whether they are in the Lao or 
> mule-unicode-0100-24ff charset.  With the latter I have seen left-hand parts 
> of Hangul syllables displayed instead of Lao!  Perhaps when I understand how 
> uncomposed display does work, I will be able to understand this problem. At 
> present I need to defeat the composition logic by typing consonant + vowel 
> as <consonant, space, delete, vowel>!  The text entered thus then displays 
> properly, mocking the hard work that has gone into carefully composing 
> grapheme clusters.

I think it's a waste of time to learn the composition mechanism
of Emacs 22.  It will be much improved in emacs-unicode-2.

> 4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one 
> of its variants), the Lao input method suddenly switches from generating Lao 
> characters in the Lao charset to generating Lao characters in the 
> mule-unicode-0100-24ff charset.  How is this effect
> achieved?

When a buffer is created, Emacs 22 sets up the char table
translation-table-for-input to suit the buffer's file
coding system.  That table converts the Lao charset to the
mule-unicode-0100-24ff charset within an input method.
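
If you want to see what that table does to one character, something
like the following may help.  It is only a rough sketch for Emacs 22:
put point on a character in the Lao charset and evaluate the form
(e.g. with M-:).  It assumes the table is set in the current buffer,
and the result depends on your setup.

```elisp
;; Look up the character at point in translation-table-for-input.
;; Returns the charset of the translated character, or nil when the
;; table is not set or has no entry for this character.
(let* ((ch (following-char))
       (to (and (char-table-p translation-table-for-input)
                (aref translation-table-for-input ch))))
  (and to (char-charset to)))
```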

>  I can't work 
> it out.  Characters already stored in the Lao charset remain in the Lao 
> charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).

> Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not 
> change the charset used by the Lao charset.

Yes.  Saving by itself doesn't change the buffer contents.
Re-read the file.
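
One way to check this in Emacs 22 is to look at the charset of a
character before and after re-reading.  The forms below are a sketch
only, to be evaluated one at a time with point on a Lao character;
the charset names in the comments are examples, not guaranteed
results.

```elisp
;; Charset of the character at point, e.g. `lao' or
;; `mule-unicode-0100-24ff'.
(char-charset (following-char))

;; The same character as a list of charset and position codes.
(split-char (following-char))

;; Re-read the visited file from disk without asking for confirmation.
(revert-buffer t t)
```

C-u C-x = on the character shows similar information interactively.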

Anyway, emacs-unicode-2 doesn't have those bizarre problems.

---
Kenichi Handa
handa@ni.aist.go.jp