From: Miles Bader
Newsgroups: gmane.emacs.help
Subject: Re: making "玄奘" say "Xuanzang" in chinese
Date: Fri, 25 Mar 2005 02:16:51 +0000 (UTC)

Joe Corneli writes:
>    (defun w3m-ucs-to-char (codepoint)
>      (or (decode-char 'ucs codepoint) ?~))
>
> But keeping the function around wasn't helping either.  Except, when I
> tried it again, it worked, so I must have gotten something wrong.
>
> This code seems a little more readable than the code you
> supplied... but they seem to have the same effect.

Hmmm, I missed that; yeah, `decode-char' does look much nicer ... :-)

> Can you suggest something that will work on this content from the
> gnu.org homepage?  Neither the w3m code nor your code seems to produce
> human readable output on this stuff (maybe I'm missing some fonts or
> something?).  I get a bunch of control-at characters... (oh yeah,
> after modifying the "[0-9]" to be ".....".
>
> [ Azərbaycanca | Bahasa Indonesia | Bosanski | Catal`
> | 简体中文 |
> 繁體中文 | Cesky | Dansk |

Presumably the "x" following &# means "hex", so you should use the BASE
argument to string-to-number if you see it.

The following tweak to your original code seems to generate reasonable
output:

   (while (re-search-forward "&#\\(x\\)?\\([0-9a-f]+\\);" nil t)
     (let ((ucs (string-to-number (match-string 2)
                                  (if (match-beginning 1) 16 10))))
       (delete-region (match-beginning 0) (match-end 0))
       (insert-char (decode-char 'ucs ucs) 1)))

[The trick to select decimal or hex works because `match-beginning'
returns nil for optional parenthesized expressions which didn't match.]

-Miles
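
For anyone who wants the loop above as a reusable command, here is a
minimal sketch. The function name `my-decode-numeric-entities' and its
region arguments are made up for illustration and are not from the
thread. It keeps the same `match-beginning' trick for choosing the
base, and borrows the `?~' fallback from the quoted w3m function for
code points that `decode-char' rejects. Binding `case-fold-search' is
an extra assumption so that uppercase hex digits match too. It is
probably also worth knowing that (string-to-number "x7b80") returns 0,
and character 0 displays as ^@, which would explain the control-at
characters.

```elisp
(defun my-decode-numeric-entities (start end)
  "Replace numeric character references between START and END.
Handles decimal (&#NNNN;) and hexadecimal (&#xHHHH;) forms.
Code points that `decode-char' cannot decode become ?~.
Hypothetical helper, written as an illustration only."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      ;; Case-insensitive search, so "&#X4E2D;" is accepted as well.
      (let ((case-fold-search t))
        (while (re-search-forward "&#\\(x\\)?\\([0-9a-f]+\\);" nil t)
          ;; Group 1 is optional: `match-beginning' returns nil when
          ;; the "x" is absent, which selects base 10.
          (let* ((base (if (match-beginning 1) 16 10))
                 (code (string-to-number (match-string 2) base))
                 (char (or (decode-char 'ucs code) ?~)))
            ;; Literal, fixed-case replacement of the whole reference.
            (replace-match (string char) t t)))))))
```

With a region selected, M-x my-decode-numeric-entities should rewrite
the references in place. One known limit of this sketch: a decimal
reference that contains hex letters still matches the regexp, and
`string-to-number' then reads only the leading decimal digits.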