From: Miles Bader <miles@gnu.org>
Subject: Re: making "&#29572;&#22872;" say "Xuanzang" in chinese
Date: Fri, 25 Mar 2005 05:07:40 +0900
Message-ID: <87r7i46c4z.fsf@tc-1-100.kawasaki.gol.ne.jp>
In-Reply-To: <mailman.20.1111596532.28103.help-gnu-emacs@gnu.org> (Joe Corneli's message of "Wed, 23 Mar 2005 10:29:38 -0600")

Joe Corneli <jcorneli@math.utexas.edu> writes:
> Adapted from w3m-filter.el:
>
> (while (re-search-forward "&#\\([0-9]+\\);" nil t)
>   (setq ucs (string-to-number (match-string 1)))
>   (delete-region (match-beginning 0) (match-end 0))
>   (insert-char ucs 1))
>
> This would appear to work if the characters themselves were recognized...
>
> But when I run this expression on a buffer containing the string
> "&#29572;&#22872;" what I get is an error, like this:

Is that really what w3m does?  I'm not sure how the above could possibly
work in any normal version of Emacs -- the argument to `insert-char' is
an Emacs character, not a unicode code-point.  So, you need to
translate from the unicode code-point to the Emacs character encoding.

One method might be to translate the unicode code-point into a utf-16
string (should be trivial I guess), and then use `decode-coding-string'
to translate that into Emacs' internal encoding; e.g.:


 (while (re-search-forward "&#\\([0-9]+\\);" nil t)
   (let* ((ucs (string-to-number (match-string 1)))
          (ucs-string (string (logand ucs #xFF) (logand (ash ucs -8) #xFF)))
          (decoded-string (decode-coding-string ucs-string 'mule-utf-16le)))
     (delete-region (match-beginning 0) (match-end 0))
     (insert decoded-string)))


For me, this does the right thing on your example, and on the text of
that wikipedia page:

   The fictional character Xuanzang (玄奘, WG:  Hsüan-tsang), a central
   character of the classic Chinese novel Journey to the West ...


It probably will only work well in recent CVS versions of Emacs
that have `utf-translate-cjk-mode' turned on by default though. [*]
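
The byte-splitting step in the loop above can be pulled out into a small
helper, which makes it easier to try on a single code point.  What
follows is only a sketch, not part of the original suggestion: the name
`my-ucs-to-string' is hypothetical, and it assumes Emacs 23 or later,
where `unibyte-string' and the `utf-16le' coding system are available
(the loop above uses `string' and `mule-utf-16le' instead).  Like the
loop, it emits just two bytes, so it only covers code points up to
#xFFFF; anything larger would need a UTF-16 surrogate pair.

```elisp
;; Sketch only; assumes Emacs 23 or later, where `unibyte-string' and
;; the `utf-16le' coding system exist.  The function name is hypothetical.
(defun my-ucs-to-string (ucs)
  "Return a one-character string for Unicode code point UCS.
UCS must be at most #xFFFF, since only two bytes are produced."
  (decode-coding-string
   ;; UTF-16LE stores the low byte first, then the high byte.
   (unibyte-string (logand ucs #xFF) (logand (ash ucs -8) #xFF))
   'utf-16le))

;; 29572 is #x7384, so the two bytes passed to the decoder are
;; #x84 (low) and #x73 (high).
(my-ucs-to-string 29572)
```

Under those assumptions the last form should return the first of the two
characters from the subject line, and the loop body could then call
(insert (my-ucs-to-string ucs)) in place of building the byte string
inline.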

-Miles


[*] In the current CVS Emacs, there seems to be a function that does
    this translation directly too, `utf-lookup-subst-table-for-decode'
    but given the odd name, it's probably not intended for general
    use...

-- 
Love is a snowmobile racing across the tundra.  Suddenly it flips over,
pinning you underneath.  At night the ice weasels come.  --Nietzsche

Thread overview: 7+ messages
2005-03-23  3:14 making "&#29572;&#22872;" say "Xuanzang" in chinese Joe Corneli
2005-03-23 13:08 ` Mark Plaksin
2005-03-23 15:56   ` Joe Corneli
2005-03-23 16:29   ` Joe Corneli
     [not found]   ` <mailman.20.1111596532.28103.help-gnu-emacs@gnu.org>
2005-03-24 20:07     ` Miles Bader [this message]
2005-03-25  1:07       ` Joe Corneli
2005-03-25  2:16         ` Miles Bader
