From: Joe Corneli
Newsgroups: gmane.emacs.help
Subject: Re: making "玄奘" say "Xuanzang" in chinese
Date: Thu, 24 Mar 2005 19:07:40 -0600
In-reply-to: <87r7i46c4z.fsf@tc-1-100.kawasaki.gol.ne.jp> (message from Miles Bader on Fri, 25 Mar 2005 05:07:40 +0900)

    Joe Corneli writes:
    > Adapted from w3m-filter.el:
    >
    >   (while (re-search-forward "&#\\([0-9]+\\);" nil t)
    >     (setq ucs (string-to-number (match-string 1)))
    >     (delete-region (match-beginning 0) (match-end 0))
    >     (insert-char ucs 1))
    >
    > This would appear to work if the characters themselves were recognized...
    >
    > But when I run this expression on a buffer containing the string
    > "玄奘" what I get is an error, like this:

    Is that really what w3m does?

Hm... well, I did doctor it up a bit. In particular, I took out some
code that wrapped `ucs' in the last line with the function defined by:

  (defun w3m-ucs-to-char (codepoint)
    (or (decode-char 'ucs codepoint)
        ?~))

But keeping the function around wasn't helping either. Except that
when I tried it again, it worked, so I must have gotten something
wrong. This code seems a little more readable than the code you
supplied... but they seem to have the same effect. Anyway, your
advice got me past whatever I was stumbling over.

Can you suggest something that will work on this content from the
gnu.org homepage? Neither the w3m code nor your code seems to produce
human-readable output on this stuff (maybe I'm missing some fonts or
something?). I get a bunch of control-at characters... (Oh yeah, that
was after modifying the "[0-9]" to be ".....".)

[ Azərbaycanca | Bahasa Indonesia | Bosanski | Català | 简体中文 | 繁體中文 | Cesky | Dansk | Deutsch | English | Ελληνικά | Español | Français | Hrvatski | Italiano | עברית | 日本語 | 한국어 | Magyar | Nederlands | Norsk | Polski | Português | Româna | Russkij | Srpski | Shqip | Suomi | Svenska | Tagalog | ภาษาไทย | Türkçe | Tiếng Việt | Ukrayins'ka ]
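
For what it's worth, here is a rough sketch of how the loop quoted above might be extended to cover both decimal references (such as &#29572;) and hexadecimal ones (such as &#x7384;), rather than loosening "[0-9]" to "....." and still parsing the match as a decimal number. The function name `my-decode-char-refs' is made up for illustration and is not part of w3m. The sketch assumes a Unicode-based Emacs (23 or later); whether the resulting characters display properly still depends on the fonts installed.

```elisp
;; Sketch only: `my-decode-char-refs' is a hypothetical name, not w3m code.
(defun my-decode-char-refs ()
  "Replace numeric character references after point with their characters.
Handles decimal (&#29572;) and hexadecimal (&#x7384;) forms."
  (let ((case-fold-search t))          ; accept both &#x...; and &#X...;
    (while (re-search-forward
            "&#\\(?:x\\([0-9a-f]+\\)\\|\\([0-9]+\\)\\);" nil t)
      (let* ((hex (match-string 1))    ; non-nil only for the &#x...; form
             (code (if hex
                       (string-to-number hex 16)
                     (string-to-number (match-string 2))))
             ;; `decode-char' returns nil when it cannot map the code
             ;; point; fall back to ?~ as the quoted w3m helper does.
             (char (or (decode-char 'ucs code) ?~)))
        ;; Replace the whole reference literally with the one character.
        (replace-match (string char) t t)))))
```

A quick way to try it without touching a real buffer:

```elisp
(with-temp-buffer
  (insert "&#29572;&#22872; &#x7384;")
  (goto-char (point-min))
  (my-decode-char-refs)
  (buffer-string))
;; => "玄奘 玄"
```

Parsing the hexadecimal digits with base 16 is the point of the second branch: a hex string run through a plain `string-to-number' will generally not yield the intended code point.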