From: Adrian Robert
Newsgroups: gmane.emacs.devel
Subject: Re: How to convert char from Emacs-20 internal to UTF-8?
Date: Tue, 22 Mar 2005 12:30:26 -0500
Message-ID: <619606982fe3815e3343bfb65bb17906@cogsci.ucsd.edu>
References: <64cb038a0c479b993368280a127af987@cogsci.ucsd.edu> <87acp3jyqi.fsf-monnier+emacs@gnu.org>
In-reply-to: <87acp3jyqi.fsf-monnier+emacs@gnu.org>
Cc: emacs-devel@gnu.org

On Mar 16, 2005, at 12:19 PM, Stefan Monnier wrote:

>> I apologize for the "retro" question, but I was wondering if there was an
>> easy way to convert a character in the Emacs-20 internal 19-bit encoding
>> (from FAST_GLYPH_CHAR(glyph)) to UTF-8 (preferable) or straight Unicode.
>> I'd like to do it fully within C if possible, and it needs to be
>> efficient.

I found a way to do this using parts of the C program available at:

http://tclab.kaist.ac.kr/~otfried/Mule/

Basically it uses a large table to convert from charset/byte1/byte2 to unicode then UTF-8. I call SPLIT_NON_ASCII_CHAR() to get that info out of the 19-bit internal representation stored in the glyph. CCL was not needed, though maybe it would have provided a more compact way to solve the problem than a 250K table.
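For anyone following along, here is a minimal sketch of the two steps described above (table lookup, then UTF-8 encoding). It is not the code from the Mule package linked above. The charset id, the table layout, and the names demo_lookup_unicode and utf8_encode are all made up for illustration, and the two table entries assume byte1/byte2 are the 7-bit JIS X 0208 position codes. A real table would be far larger and indexed directly rather than searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical charset id for this sketch; NOT the value Emacs uses. */
enum { DEMO_CHARSET_JISX0208 = 1 };

struct demo_entry {
    int charset;
    unsigned char b1, b2;   /* assumed 7-bit position codes */
    uint32_t ucs;           /* Unicode code point */
};

/* Two illustrative entries only. */
const struct demo_entry demo_table[] = {
    { DEMO_CHARSET_JISX0208, 0x24, 0x22, 0x3042 },  /* HIRAGANA LETTER A */
    { DEMO_CHARSET_JISX0208, 0x30, 0x21, 0x4E9C },  /* first kanji, row 16 */
};

/* Return the Unicode code point, or 0 if the triple is not in the table. */
uint32_t demo_lookup_unicode(int charset, unsigned char b1, unsigned char b2)
{
    size_t i;
    for (i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++) {
        const struct demo_entry *e = &demo_table[i];
        if (e->charset == charset && e->b1 == b1 && e->b2 == b2)
            return e->ucs;
    }
    return 0;
}

/* Encode one code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and
   values above U+10FFFF. */
int utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

In this sketch the charset and the two bytes would come from SPLIT_NON_ASCII_CHAR(), and the result of demo_lookup_unicode() would be passed straight to utf8_encode(). The linear search is only there to keep the example short.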

However, I still have an issue: for 2-byte characters, such as Big5 or JIS Chinese characters, Emacs 20 is giving me two glyphs for each character, with identical values. Does this have something to do with Emacs thinking the font needs a double-wide horizontal space to render the character?

thanks,
Adrian