From: Kenichi Handa
To: Adrian Robert
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Probably dumb question: glyph rendering on unicode-2 branch
Date: Tue, 25 Oct 2005 10:33:01 +0900
References: <09B15CC4-37F2-4B0F-8487-2037B482D1CC@cogsci.ucsd.edu>
Xref: news.gmane.org gmane.emacs.devel:44785

Adrian Robert writes:

> I didn't get any response to the below, let me try asking it in a
> different way:

Sorry for not responding on this matter.  It seems that I missed your
original mail.

> unicode-2 branch:
> dispextern.h:
>   struct glyph {
>     ...
>     /* Character code for character glyphs (type == CHAR_GLYPH).  */
>     unsigned ch;
>     ...
>   }
>   ...
>   struct glyph_string {
>     ...
>     /* Characters to be drawn, and number of characters.  */
>     XChar2b *char2b;
>     int nchars;
>     ...
>   }
> {x,mac,w32}term.c:
>   x_encode_char(int c, XChar2b *char2b, ...)
>   {
>     ...
>   }
>   x_draw_glyph_string(struct glyph_string *s)
>   {
>     ...
>   }

> Questions:

> 1) Is 'int c' passed to x_encode_char() the same as 'unsigned ch' in
> struct glyph?

Mostly yes.  The exception is when x_encode_char is called on an
element of a composition glyph.  In that case, x_encode_char is called
from get_char_face_and_encoding, which the BUILD_COMPOSITE_GLYPH_STRING
macro calls on each element of the composition glyph.

> 2) In either case, what are they -- UCS-2?  UTF-16?  MULE?  UCS-4?
> UTF-32?  What is the byte ordering?

It is the character code used in Emacs.  The value range is
0x0..0x3FFFFF.  Within that range, 0x0..0x10FFFF are exactly the same
as Unicode characters.

I think it's nonsense to ask about the "byte ordering" of an (int).
That depends on your hardware architecture.

> I'll be happy to RTFM if this is documented anywhere..

The file src/character.h contains some documentation about character
codes.

>> I apologize if this is a dumb question, but I've been looking
>> through the code and can't figure this one out: on the unicode-2
>> branch, if a font specifies "iso-10646-1" for XLFD registry/
>> encoding (and then fontset.c sets 'charset' accordingly), what
>> exactly is getting passed in struct glyph_string.char2b to
>> x_draw_glyph_string()?

If a font has CHARSET_REGISTRY "iso10646" and CHARSET_ENCODING "1",
the font contains only BMP characters.  Emacs-unicode uses such a font
only for BMP characters.

>> Not UTF-8, since it's just 2 bytes.
>> UCS-2?  UTF-16?  Don't these exclude a lot of unicode characters?

Yes.  But, as far as I know, there's no consensus about what to
specify in the CHARSET_REGISTRY and CHARSET_ENCODING fields of a font
supporting the SMP or SIP.

>> Does emacs provide any internal facility to get UTF-8?

Do you mean a way to convert a character code to a UTF-8 byte sequence
at the C level?  Then you can use the macro CHAR_STRING (defined in
character.h), because Emacs-unicode's internal string/buffer
representation is a UTF-8 byte sequence.

---
Kenichi Handa
handa@m17n.org
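
[Illustration, not part of Emacs.]  The answers above can be sketched
in standalone C.  This is only a rough sketch under stated
assumptions: it takes the value ranges from the message
(0x0..0x3FFFFF for Emacs character codes, 0x0..0x10FFFF identical to
Unicode), assumes that a BMP character is handed to a two-byte font as
its code split into a high and a low byte, and uses plain standard
UTF-8 for the byte-sequence conversion.  The names char2b_sketch,
encode_bmp_char2b and utf8_encode_sketch are hypothetical; they are
not the real XChar2b, x_encode_char or CHAR_STRING, whose actual
definitions should be read in the X headers and src/character.h.

```c
#include <stddef.h>

/* Ranges quoted in the message.  */
#define MAX_EMACS_CHAR   0x3FFFFF  /* upper bound of Emacs character codes */
#define MAX_UNICODE_CHAR 0x10FFFF  /* codes up to here equal Unicode */

/* Hypothetical stand-in for a two-byte glyph code such as XChar2b.  */
struct char2b_sketch
{
  unsigned char byte1;  /* high byte of the code */
  unsigned char byte2;  /* low byte of the code */
};

/* Return 1 if C lies in the Basic Multilingual Plane, else 0.  */
int
is_bmp_char (int c)
{
  return c >= 0 && c <= 0xFFFF;
}

/* Split a BMP character code into two bytes (assumed layout: high
   byte first).  Return 0 on success, -1 if C is outside the BMP and
   so cannot be represented in two bytes.  */
int
encode_bmp_char2b (int c, struct char2b_sketch *out)
{
  if (!is_bmp_char (c))
    return -1;
  out->byte1 = (unsigned char) ((c >> 8) & 0xFF);
  out->byte2 = (unsigned char) (c & 0xFF);
  return 0;
}

/* Store the standard UTF-8 encoding of C in BUF (at least 4 bytes)
   and return its length.  Return 0 if C is outside 0x0..0x10FFFF.
   Surrogate code points get no special treatment here, and codes
   above 0x10FFFF (which Emacs also allows) are deliberately left
   out, since the message does not say how they are stored.  */
size_t
utf8_encode_sketch (int c, unsigned char *buf)
{
  if (c < 0 || c > MAX_UNICODE_CHAR)
    return 0;
  if (c <= 0x7F)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c <= 0x7FF)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c <= 0xFFFF)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  buf[0] = (unsigned char) (0xF0 | (c >> 18));
  buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  buf[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}
```

With these definitions, encode_bmp_char2b (0x3042, &g) fills g with
the bytes 0x30 and 0x42, while the same call with 0x1F600 fails
because that character is outside the BMP.  utf8_encode_sketch still
handles 0x1F600 and produces a four-byte sequence, which is why a
two-byte char2b cannot carry every character that the internal
representation can.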