From: Miles Bader <miles@gnu.org>
Subject: Re: making "&#29572;&#22872;" say "Xuanzang" in chinese
Date: Fri, 25 Mar 2005 02:16:51 +0000 (UTC)
Message-ID: <loom.20050325T030705-183@post.gmane.org>
In-Reply-To: E1DEdIa-0002j7-00@lab45.ma.utexas.edu

Joe Corneli <jcorneli <at> math.utexas.edu> writes:
>  (defun w3m-ucs-to-char (codepoint)
>    (or (decode-char 'ucs codepoint) ?~))
> 
> But keeping the function around wasn't helping either.  Except, when I
> tried it again, it worked, so I must have gotten something wrong.
> 
> This code seems a little more readable than the code you
> supplied...  but they seem to have the same effect.

Hmmm, I missed that; yeah, `decode-char' does look much nicer ... :-)

> Can you suggest something that will work on this content from the
> gnu.org homepage?  Neither the w3m code nor your code seems to produce
> human readable output on this stuff (maybe I'm missing some fonts or
> something?).  I get a bunch of control-at characters... (oh yeah,
> after modifying the "[0-9]" to be ".....".
> 
>   [ Az <at> rbaycanca | Bahasa Indonesia | Bosanski | Catal`
>   | &#x7b80;&#x4f53;&#x4e2d;&#x6587; |
>   &#x7e41;&#x9ad4;&#x4e2d;&#x6587; | Cesky | Dansk |

Presumably the "x" following &# means "hex", so you should use the BASE argument
to string-to-number if you see it.
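
As a small illustration of the BASE argument, here is a sketch using digit
strings taken from the entities quoted in this thread. The helper name
`my-entity-code' and its HEX-P parameter are hypothetical, introduced only
for this example:

```elisp
;; Hypothetical helper for illustration; not part of the quoted code.
(defun my-entity-code (digits hex-p)
  "Convert the string DIGITS to a number, reading it as hex when HEX-P is non-nil."
  ;; `string-to-number' takes an optional BASE argument (default 10).
  (string-to-number digits (if hex-p 16 10)))

(my-entity-code "7b80" t)     ; digits from "&#x7b80;", read in base 16
(my-entity-code "29572" nil)  ; digits from "&#29572;", read in base 10
```

Without the BASE argument, "7b80" would be read as decimal and parsing would
stop at the "b", which is why the hex case needs separate handling.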

The following tweak to your original code seems to generate reasonable output:

  (while (re-search-forward "&#\\(x\\)?\\([0-9a-f]+\\);" nil t)
    (let ((ucs (string-to-number (match-string 2)
                                 (if (match-beginning 1) 16 10))))
      (delete-region (match-beginning 0) (match-end 0))
      (insert-char (decode-char 'ucs ucs) 1)))

[The trick to select decimal or hex works because `match-beginning' returns nil
for optional parenthesized expressions which didn't match.]
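
The same idea can be sketched as a self-contained function that works on a
string instead of the current buffer. The name `my-decode-numeric-entities'
is hypothetical, and the ?~ fallback for undecodable code points is an
assumption borrowed from the `w3m-ucs-to-char' definition quoted earlier,
not part of the loop above:

```elisp
;; Hypothetical helper, not from the original message: decode numeric
;; character references in STRING and return the resulting string.
(defun my-decode-numeric-entities (string)
  "Return STRING with &#NNN; and &#xHHH; references replaced by characters."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((case-fold-search t))         ; accept &#X7B80; as well as &#x7b80;
      (while (re-search-forward "&#\\(x\\)?\\([0-9a-f]+\\);" nil t)
        ;; Group 1 is optional: `match-beginning' returns nil when no "x"
        ;; was matched, which selects base 10 instead of base 16.
        (let* ((base (if (match-beginning 1) 16 10))
               (code (string-to-number (match-string 2) base))
               ;; Assumed fallback: use ?~ when the code point cannot be
               ;; decoded (nil result or out-of-range error).
               (char (or (ignore-errors (decode-char 'ucs code)) ?~)))
          (replace-match (string char) t t))))
    (buffer-string)))
```

Called on the subject line's "&#29572;&#22872;", this should return a
two-character string holding U+7384 and U+5958; text without any references
comes back unchanged.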

-Miles
