From: Karl Eichwalder
Newsgroups: gmane.emacs.help
Subject: Re: Turning HTML character references into something readable?
Date: Sun, 27 Apr 2003 21:09:39 +0200
Organization: The Moon On Earth

Benjamin Riefenstahl writes:

> Actually that literal seems to be in some JIS encoding on my side,
> while &#1071; indicates Unicode.

Gnus decided to turn it into JIS; initially it was Unicode/UTF-8.

>   (char-to-string (decode-char 'ucs 1071))

Yes, this is a good hint!

> If you want to get this into an interactive command, you'd need some
> more coding.  Or maybe PSGML or some other SGML/HTML/XML mode may have
> that functionality already.

In this case I cannot use PSGML because de.wikipedia.org is based on a
free-style markup language...

>> On the command line recode can do the trick:
>>
>>   echo "&#1071;" | recode html..utf-8
>
> You can use shell-command-on-region (M-|) to use "recode html..utf-8"
> directly.

I completely forgot about this possibility.  But now it turns out that
"recode html..utf-8" is too ambitious; if the file already contains
umlaut characters, they will be encoded twice:

    echo "Danke schön &#1070;&#1071;" | recode html..utf-8
    Danke schön ��

I must find a way to tell recode to leave "Danke schön" untouched.

-- 
                                            |      ,__o
http://www.gnu.franken.de/ke/               |    _-\_<,
ke@suse.de (work) / keichwa@gmx.net (home)  |   (*)/'(*)
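
The goal stated above, decoding the character references while leaving
text that is already UTF-8 untouched, can be sketched as follows.  This
is only a minimal illustration in Python, not a recode or Emacs Lisp
solution from the thread; the function name decode_numeric_refs is
hypothetical.  It assumes the input has already been read as UTF-8 text,
and it handles only numeric references (decimal and hexadecimal), not
named entities such as &amp;.

```python
import re

# Matches decimal (&#1071;) and hexadecimal (&#x42F;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace numeric character references; leave all other text untouched.

    Hypothetical helper for illustration: `text` is assumed to be a str that
    was already decoded from UTF-8, so existing umlauts are ordinary characters.
    """

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep references that are not valid Unicode scalar values as they are
        # (out of range, or in the surrogate block) rather than guessing.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(_replace, text)
```

For example, decode_numeric_refs("Danke schön &#1070;&#1071;") returns
"Danke schön ЮЯ".  Because the function works on decoded text rather
than on bytes, the umlaut is never re-encoded.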