From: Martin Monsorno
Newsgroups: gmane.emacs.help
Subject: Re: how to change file coding system
Date: Wed, 17 Aug 2005 11:20:00 +0200
Organization: Schlund + Partner AG
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)
List-Id: Users list for the GNU Emacs text editor

Peter Dyballa writes:

> On 16.08.2005 at 11:22, Martin Monsorno wrote:
>
>> ,----
>> | monsorno@mmdev ~/work/workspace.c/gmx $ file bla*
>> | bla.eclipse: UTF-8 Unicode text
>> | bla.emacs:   ISO-8859 text
>> `----
>>
>> Opening "bla.eclipse" with emacs, shows me the string
>> "�berfall".  Changing the file encoding with "C-x RET f
>> iso-latin-1-unix" and saving leads to:
>
> The correct way would have been, once you've opened the file
> bla.eclipse and Emacs came up showing `-0:´ as start of the mode-line
> (stating ISO Latin-1 or ISO Latin-15 encoding), C-x RET r utf-8-unix
> RET: re-open the file in UTF-8 encoding, to view it in its natural
> mood.

Hmm, I cannot get anything like that to show up in the mode line,
regardless of how I open the files (I tried opening bla.eclipse with
both iso-8859-1 and utf-8 specified).

(describe-variable 'buffer-file-coding-system) says:

    buffer-file-coding-system's value is raw-text-unix
    Local in buffer bla.eclipse; global value is mule-utf-8

And this output is the same for the file "bla.emacs", which is an
ISO-8859-1 (Latin-1) file. :-?

> When you now save the file in ISO Latin-1 encoding, having applied C-x
> RET f (set-buffer-file-coding-system), GNU Emacs does the conversion.
> Instead of C3 BC it writes only FC. The file size will be reduced by
> one byte.

To make it a bit more exciting, I tried something else:

1) Created a file called "bla.created-by-emacs" containing the string
   "überfall" with emacs.
2) Copied this file to "bla.changed-by-eclipse".
3) Opened this file with eclipse.
4) Saved this file with eclipse.
5) Created a new file with eclipse, "bla.created-by-eclipse",
   containing the same string.
6) ls -l bla*
   -rw-r--r-- 1 monsorno users 11 17. Aug 11:00 bla.changed-by-eclipse
   -rw-r--r-- 1 monsorno users 10 17. Aug 10:35 bla.created-by-eclipse
   -rw-r--r-- 1 monsorno users  9 17. Aug 10:58 bla.created-by-emacs
7) file bla*
   bla.changed-by-eclipse: UTF-8 Unicode text
   bla.created-by-eclipse: UTF-8 Unicode text
   bla.created-by-emacs:   ISO-8859 text
8) Visiting bla.changed-by-eclipse with emacs shows "�berfall"
9) Visiting bla.created-by-eclipse with emacs shows "überfall"

So we now have 3 files containing the "same" string. Two of them claim
to be UTF-8, but they encode the "ü" differently (2 bytes in one file,
3 bytes in the other). For all 3 files, when opening them in emacs,
buffer-file-coding-system's value is raw-text-unix. Emacs can only
display "bla.created-by-emacs" correctly; eclipse can only display
"bla.created-by-eclipse" correctly.

> The C-x RET commands *do not* change a buffer's (or a file's) contents,
> they just put some new skin on the buffer so that your view on the
> buffer's (i.e. file's) contents is adapted in a certain way: you can
> see a buffer's (or file's) whatever contents in green, blue, red,
> yellow, cyan ... utf-8, Mac-Roman, NeXT, koi-r8, euc-jp-unix ...
> encoding/view.

I think I understood this. But this means that I can change the
encoding of a file with emacs, doesn't it?

> Eclipse might be fooling you. The character `ü´ is encoded in UTF-8 as
> C3 BC or, translating the two hex codes into ISO Latin-1 (or -15)
> characters, as: Ã¼. What you cite in your eMail, � or in HTML
> �, is *not* UTF-8.

Yes, or at least it does not look like an 'ü' ;-) What I cite in my
mails are the strings as emacs shows them to me when loading one of
the files. So the question is, /why/ are they not UTF-8? Does eclipse
do a wrong latin-1 to utf-8 conversion?

-- 
Martin
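
The three file sizes from step 6) can be reproduced outside both
editors. The Python sketch below is only an illustration: it assumes
each file holds "überfall" plus one trailing newline, and it assumes
(a guess, not something confirmed in this thread) that the 3-byte
variant appears when the Latin-1 byte FC is decoded as UTF-8, found
invalid, and replaced by U+FFFD before being written back. The names
TEXT, mangled_bytes and report are made up for the example.

```python
# Byte-level sketch of the three files from steps 1) to 7).
# Assumption: each file holds "überfall" plus one trailing newline.
TEXT = "überfall\n"

latin1_bytes = TEXT.encode("iso-8859-1")  # ü becomes FC (one byte)
utf8_bytes = TEXT.encode("utf-8")         # ü becomes C3 BC (two bytes)

# Hypothetical reconstruction of "bla.changed-by-eclipse": the Latin-1
# bytes are decoded as UTF-8. The lone FC byte is invalid there, so it is
# replaced by U+FFFD, which is written back as EF BF BD (three bytes).
mangled_bytes = latin1_bytes.decode("utf-8", errors="replace").encode("utf-8")


def report():
    """Return the byte length of each variant, keyed by file name."""
    return {
        "bla.created-by-emacs": len(latin1_bytes),
        "bla.created-by-eclipse": len(utf8_bytes),
        "bla.changed-by-eclipse (assumed)": len(mangled_bytes),
    }


if __name__ == "__main__":
    for name, size in report().items():
        print(f"{size:3d}  {name}")
    # Show the raw bytes of the assumed 3-byte variant.
    print(mangled_bytes.hex(" "))
```

Under these assumptions the sizes come out as 9, 10 and 11 bytes,
matching the ls output. If that is what happened, the original FC byte
was already lost when eclipse saved the file, so no later change of
coding system in emacs could bring the "ü" back.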