From: Benjamin Riefenstahl
Newsgroups: gmane.emacs.help
Subject: Re: Turning HTML character references into something readable?
Date: 27 Apr 2003 17:54:34 +0200
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Hi Karl,

Karl
Eichwalder writes:

> Is there already a function to turn these HTML character references
> into proper characters?  "&#1071;" must come out as the Cyrillic
> "??" etc.

Actually that literal seems to be in some JIS encoding on my side,
while &#1071; indicates Unicode.

If you want an ELisp function, M-x apropos RET char RET turns up
functions that can be used to convert a Unicode codepoint to a string
that Emacs shows fine here as Cyrillic, e.g.:

  (char-to-string (decode-char 'ucs 1071))

If you want to get this into an interactive command, you'd need some
more coding.  Or maybe PSGML or some other SGML/HTML/XML mode has
that functionality already.

> On the command line recode can do the trick:
>
>   echo "&#1071;" | recode html..utf-8

You can use shell-command-on-region (M-|) to run "recode html..utf-8"
directly on the region.  Note that M-| is not recognized by some
German keyboard drivers.  But then for everyday use you may want to
encapsulate that into your own command where the "recode" command is
hardcoded anyway.

so long, benny
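
P.S. A minimal sketch of the interactive command mentioned above,
built on the same decode-char call.  The name
my-decode-html-char-refs is made up for illustration, and it only
handles decimal references of the form &#NNNN;, not hexadecimal
references or named entities:

```elisp
;; Hypothetical helper, not part of Emacs: the function name is an
;; assumption chosen for this example.
(defun my-decode-html-char-refs (start end)
  "Replace decimal HTML character references between START and END.
Only references of the form &#NNNN; are handled."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Limit the search to the region so text outside stays untouched.
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (re-search-forward "&#\\([0-9]+\\);" nil t)
        ;; decode-char returns nil for code points it cannot map;
        ;; in that case the reference is left as it is.
        (let ((char (decode-char 'ucs (string-to-number (match-string 1)))))
          (when char
            ;; FIXEDCASE and LITERAL are both t so the replacement
            ;; is inserted verbatim.
            (replace-match (char-to-string char) t t)))))))
```

Mark the text and run M-x my-decode-html-char-refs; whether the
result displays properly still depends on having a font for the
characters in question.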