From: Giorgos Keramidas
Newsgroups: gmane.emacs.help
Subject: Re: Convert UTF-8
Date: Wed, 17 Dec 2008 13:17:04 +0200
Organization: SunSITE.dk - Supporting Open source
Message-ID: <87ej07hwzz.fsf@kobe.laptop>
References: <34c3af09-10d9-4b86-9683-08b37ccd4237@b41g2000pra.googlegroups.com> <1229480920.448497@arno.fh-trier.de>
To: help-gnu-emacs@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (berkeley-unix)

On Wed, 17 Dec 2008 00:41:47 -0800 (PST), YOUNG wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with
> M-x set-buffer-file-coding-system for writing utf-8 encoding. And,
> write (or save) it. After that, exit the emacs and re-run it again,
> and try to read the saved file to be expected UTF-8 encoding, but it
> reads again in ASCII. It does not mean emacs can't read utf-8, but
> the file itself is not encoded UTF-8. I check the file's encoding
> system with other application like NotePAD++ or other editors, and
> all say the file is still ASCII mode even though I write it as utf-8
> in emacs.

ASCII contains only 7-bit characters. All the characters of the 7-bit
ASCII character set map to themselves in the UTF-8 coding system. This
means that when a file contains only characters from the ASCII
character set, no conversion at all is needed from UTF-8 to ASCII or
vice versa. The bytes on disk are identical in both cases, so other
programs have no way to tell the two apart and will report the file as
ASCII.

If you set buffer-file-coding-system to UTF-8 *and* type some text
that contains at least one non-ASCII character (one that needs two or
more bytes in UTF-8), then the saved file will contain multi-byte
UTF-8 sequences and other tools will detect it as UTF-8.

> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding-system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.

CP437 is very different from plain ASCII. It contains 8-bit characters
and there are other differences in the 0x00-0x1F code range. If you
ignore the 0x00-0x1F character differences you might be able to say
that CP437 is a 'superset' of ASCII, but they are not the same thing.
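
To make the byte-level point concrete, here is a small sketch in Python
(not Emacs Lisp; the helper name describe and the sample strings are
made up for illustration). It only shows that ASCII-only text encodes
to the same bytes in ASCII and UTF-8, while a non-ASCII character such
as U+00E9 gets different bytes in CP437 and in UTF-8.

```python
# Illustrative sketch: compare the bytes produced by different encodings.
# The helper name and the sample strings are hypothetical examples.

def describe(text: str) -> dict:
    """Encode `text` as UTF-8 and, if possible, as ASCII, then compare."""
    utf8 = text.encode("utf-8")
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        ascii_bytes = None  # the text contains non-ASCII characters
    return {
        "utf8": utf8,
        "pure_ascii": ascii_bytes is not None,
        "same_bytes_as_ascii": ascii_bytes == utf8,
    }

# ASCII-only text: the UTF-8 bytes are exactly the ASCII bytes.
plain = describe("hello, world")

# Text with U+00E9 (e with acute accent): UTF-8 needs two bytes for it.
accented = describe("caf\u00e9")

# The same character is a single byte in CP437 but two bytes in UTF-8.
cp437_bytes = "\u00e9".encode("cp437")
utf8_bytes = "\u00e9".encode("utf-8")

# Converting CP437 data to UTF-8 means decoding first, then re-encoding.
converted = cp437_bytes.decode("cp437").encode("utf-8")

print(plain["same_bytes_as_ascii"], accented["same_bytes_as_ascii"])
print(cp437_bytes.hex(), utf8_bytes.hex(), converted.hex())
```

A file made only of bytes like those in the first case looks the same
to any encoding detector whether it was "saved as" ASCII or UTF-8,
which matches what the other editors report.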