From: david.madore@ens.fr (David Madore)
Newsgroups: gmane.emacs.help
Subject: getting out of raw-text encoding
Date: 10 May 2003 15:21:08 GMT
Organization: Ecole normale superieure, Paris, France
Original-Newsgroups: gnu.emacs.help
Original-To: help-gnu-emacs@gnu.org

Hi!

This is with (FSF) Emacs 21.2, compiled with LEIM. (Though I suppose the actual Emacs version makes little difference.)

Suppose I have a file which is *mostly* encoded in the iso-8859-1 character set but contains a few occasional characters that are not iso-8859-1. For example, it might actually be in the windows-1252 character set, which closely resembles iso-8859-1 but assigns extra printable characters to most of the codes from 128 through 159. A sample could be produced with

  perl -e 'for ($i=32;$i<256;$i++) { print chr($i); } print "\n";' >file

for instance.

When I open the file with Emacs, it quite rightly detects that it is not iso-8859-1 and switches to the raw-text encoding (non-ASCII characters are represented by escape sequences).

Now suppose I don't like that. I can force the file to be reinterpreted as iso-8859-1 by killing the buffer and reopening it with C-x RET c iso-8859-1 as a prefix. Basically this does what I want, but suppose I wish to do it without closing and reopening the file (there may be reasons for this, e.g., if the buffer is not a file I opened but something produced by an Emacs submodule, like Gnus attempting to reply to a news post with a bad encoding).

So what I wish is to replace every raw-text octet value that makes sense as an iso-8859-1 character with the corresponding character. I hope I'm using the correct terminology here.

First of all, doing

  C-x RET c iso-8859-1 M-x revert-buffer

does not work (in fact, it apparently does nothing), and I do not understand why. But even if it did, it still wouldn't be what I want, because it would require the buffer to have an associated file.
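
To make the sample file described above concrete, here is a minimal Python sketch that builds the same octets as the perl one-liner and looks at the codes 128 through 159 under iso-8859-1. Python is not part of the original post, and the variable names are purely illustrative; this only shows the byte-level picture, not anything Emacs does internally.

```python
import unicodedata

# Same octets as the perl one-liner: every value from 32 through 255,
# followed by a newline.
sample = bytes(range(32, 256)) + b"\n"

# The octets on which windows-1252 and iso-8859-1 disagree.
high_octets = [b for b in sample if 128 <= b <= 159]

# Python's "latin-1" (iso-8859-1) codec maps each octet to the code point
# with the same number, so this decode cannot fail.
as_latin1 = sample.decode("latin-1")

# Under that mapping, codes 128..159 become control characters
# (Unicode general category "Cc"), not printable text.
high_chars = [c for c in as_latin1 if 128 <= ord(c) <= 159]
all_controls = all(unicodedata.category(c) == "Cc" for c in high_chars)

print(len(sample), len(high_octets), all_controls)
```

The point of the sketch is that a strict iso-8859-1 reading of such a file is always possible, but the 32 codes in question carry no printable characters there, which is presumably why an editor may decline to treat the file as plain iso-8859-1 text.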
Usually, when I need to reinterpret in the right encoding a file that was read in the wrong encoding by mistake (a typical example would be a utf-8 file read as iso-8859-1), I can do this:

  C-x h
  M-x encode-coding-region  (giving the wrong encoding)
  M-x decode-coding-region  (giving the right encoding)

which works by first turning the buffer back into the octets that encode it in the wrong encoding, and then decoding those octets using the right one. However, when the wrong encoding is raw-text, this does not work as I expect (maybe I'm being entirely naive there?): the encode-coding-region command does nothing, and decode-coding-region merely adds more junk in front of my escaped characters.

So, basically, how do I get out of raw-text without killing the buffer?

Incidentally, is there an implementation of the windows-1252 character encoding in Emacs somewhere (one that would map iso-8859-1 characters to iso-8859-1 characters, and all others to the corresponding Unicode codepoints)? That would be useful in interacting with those annoying systems that insist on pretending that windows-1252 "is" iso-8859-1.

Many thanks to those who can Enlighten me in this matter!

-- 
David A. Madore
(david.madore@ens.fr,
 http://www.eleves.ens.fr:8080/home/madore/ )
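
The encode-then-decode trick and the windows-1252 mapping asked about above can be illustrated outside Emacs. The following Python sketch is only an analogy and not an Emacs solution: `reinterpret` and `decode_cp1252_lenient` are hypothetical helper names, and falling back to the iso-8859-1 code point for the octets that windows-1252 leaves unassigned is an assumption about what a convenient mapping would do.

```python
def reinterpret(text: str, wrong: str, right: str) -> str:
    """Undo a decode done with `wrong`, then redo it with `right`."""
    # Step 1: recover the original octets by encoding with the wrong codec.
    # Step 2: decode those octets with the codec that should have been used.
    return text.encode(wrong).decode(right)


def decode_cp1252_lenient(data: bytes) -> str:
    """Decode octets as windows-1252, one octet at a time.

    Assumption: octets that Python's cp1252 codec treats as unassigned
    keep their own number as code point, as iso-8859-1 would give.
    """
    chars = []
    for octet in data:
        try:
            chars.append(bytes([octet]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(octet))
    return "".join(chars)


if __name__ == "__main__":
    # A utf-8 "e acute" that was mistakenly read as iso-8859-1.
    misread = "\u00e9".encode("utf-8").decode("iso-8859-1")
    print(reinterpret(misread, "iso-8859-1", "utf-8"))

    # Curly quotes (0x93, 0x94) exist only in windows-1252; 0xe9 is shared.
    print(decode_cp1252_lenient(b"\x93caf\xe9\x94"))
```

The round trip only works when the wrong codec can reproduce the original octets exactly, which is the property that seems to be missing in the raw-text case described in the post.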