From: Miles Bader
Newsgroups: gmane.emacs.help
Subject: Re: making "玄奘" say "Xuanzang" in chinese
Date: Fri, 25 Mar 2005 05:07:40 +0900
Message-ID: <87r7i46c4z.fsf@tc-1-100.kawasaki.gol.ne.jp>
To: Joe Corneli, help-gnu-emacs@gnu.org
In-Reply-To: (Joe Corneli's message of "Wed, 23 Mar 2005 10:29:38 -0600")

Joe Corneli writes:
> Adapted from w3m-filter.el:
>
>   (while (re-search-forward "&#\\([0-9]+\\);" nil t)
>     (setq ucs (string-to-number (match-string 1)))
>     (delete-region (match-beginning 0) (match-end 0))
>     (insert-char ucs 1))
>
> This would appear to work if the characters themselves were recognized...
>
> But when I run this expression on a buffer containing the string
> "玄奘" what I get is an error, like this:

Is that really what w3m does?  I'm not sure how the above could
possibly work in any normal version of Emacs -- the argument to
`insert-char' is an Emacs character, not a unicode code-point.

So, you need to translate from the unicode code-point to the Emacs
character encoding.
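
As a rough sketch of that translation step (this is not taken from w3m or
from the code quoted above): Emacs has a function `decode-char', which maps
a code point in a given charset to an Emacs character. Assuming your Emacs
accepts the `ucs' charset there, something along these lines might do. The
name `my-decode-numeric-entities' is made up for illustration.

```elisp
;; Sketch only: `my-decode-numeric-entities' is a hypothetical helper name.
;; Assumption: `decode-char' understands the `ucs' charset in this Emacs.
(defun my-decode-numeric-entities ()
  "Replace decimal references like &#NNNN; after point with characters."
  (interactive)
  (save-excursion
    (while (re-search-forward "&#\\([0-9]+\\);" nil t)
      (let ((char (decode-char 'ucs
                               (string-to-number (match-string 1)))))
        ;; `decode-char' gives nil when the code point is not valid in
        ;; the charset; in that case the reference is left as it is.
        (when char
          ;; Literal, fixed-case replacement of the whole "&#NNNN;" match.
          (replace-match (string char) t t))))))
```

To try it, put point at the start of a buffer containing the references and
run `M-x my-decode-numeric-entities'.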

One method might be to translate the unicode code-point into a utf-16
string (should be trivial I guess), and then use `decode-coding-string'
to translate that into Emacs' internal encoding; e.g.:

   (while (re-search-forward "&#\\([0-9]+\\);" nil t)
     (let* ((ucs (string-to-number (match-string 1)))
            (ucs-string
             (string (logand ucs #xFF) (logand (ash ucs -8) #xFF)))
            (decoded-string
             (decode-coding-string ucs-string 'mule-utf-16le)))
       (delete-region (match-beginning 0) (match-end 0))
       (insert decoded-string)))

For me, this does the right thing on your example, and on the text of
that wikipedia page:

   The fictional character Xuanzang ($B8