From: YOUNG
Newsgroups: gmane.emacs.help
Subject: Re: Convert UTF-8
Date: Thu, 18 Dec 2008 00:35:18 -0800 (PST)
Message-ID: <8611f412-9a43-4325-8c60-0cb620e8a206@y1g2000pra.googlegroups.com>
References: <34c3af09-10d9-4b86-9683-08b37ccd4237@b41g2000pra.googlegroups.com> <1229480920.448497@arno.fh-trier.de> <647e8cdc-381d-48a4-ab4c-25a7a52596d7@r15g2000prh.googlegroups.com>
To: help-gnu-emacs@gnu.org
List-Id: Users list for the GNU Emacs text editor

On Dec 17, 4:04 am, Xah Lee wrote:
> On Dec 17, 12:41 am, YOUNG wrote:
>
> > Well, I have no problem to load UTF-8 file with emacs at all.
>
> > The problem is that emacs is not able to write UTF-8 at all.
>
> > For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> > Latin 1 to 9; there are various aliases to indicating of it, but you
> > already know what it means.), I set it up with
> > M-x set-buffer-file-coding-system for writing utf-8 encoding. And,
> > write (or save) it.
> > After that, exit the emacs and re-run it again, and try to read the
> > saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> > It does not mean emacs can't read utf-8, but the file itself is not
> > encoded UTF-8. I check the file's encoding system with other
> > application like NotePAD++ or other editors, and all say the file is
> > still ASCII mode even though I write it as utf-8 in emacs.
>
> > Again, there is no problem in reading utf-8. When a file is encoded
> > utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> > emacs is not able to write utf-8 if the file is encoded in ASCII.
> > It only writes in ASCII encode no matter how I do
> > "set-buffer-file-coding-system"
>
> > So, if somebody knows this issue and how to write utf-8 correctly when
> > a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> > information, it would be appreciated.
>
> > Thanks,
>
> as other have mentioned, utf-8 is just a super set of ascii, so files
> encoded in either are identical.
>
> You mentioned ISO8859, which is not ascii. I read your 2 posts, but
> don't quite understand what you wanted.
>
> For some unicode with emacs tips, you might checkout:
>
> • Emacs and Unicode Tips
>   http://xahlee.org/emacs/emacs_n_unicode.html
>
> You might also beefup understanding of char encoding:
>
> http://en.wikipedia.org/wiki/ISO8859
> http://en.wikipedia.org/wiki/ASCII
> http://en.wikipedia.org/wiki/UTF-8
>
>   Xah
> ∑ http://xahlee.org/
>
> ☄

Hi,

Finally, I know what the problem is. Thank you all for helping with
this issue. I am not an expert on encoding systems, so I appreciate
the opportunity to learn about them.

The problem is the BOM (Byte Order Mark). For utf-8 the BOM is usually
left out, because a BOM at the start of a file can conflict with
content that has to begin at the very first byte, such as the '#!'
line of a Unix shell script. Without a BOM, a file that contains only
7-bit characters is byte-for-byte the same whether it is saved as
ASCII or as utf-8, so nothing in the file marks it as utf-8 and it is
reported as ASCII. (I am not sure "ASCII" is exactly the right term
here, but let's use it for convenience.)

As far as I can tell, the utf-8 coding system in my emacs does not
write a BOM; it only writes utf-8 without BOM. That explains why the
file was not recognized as utf-8 when the text contained no non-ASCII
characters. If the text contains at least one non-ASCII character and
is saved as utf-8, it is written as utf-8 without BOM and there is no
problem.

Detailed information about Unicode and the BOM can be found at
http://unicode.org/faq/utf_bom.html

Thank you,
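
P.S. For anyone who wants to check the BOM point outside emacs, here is
a small Python sketch. It is only my own illustration of the byte-level
behavior described above, not anything emacs does internally; the sample
strings are made up, and "utf-8-sig" is Python's name for utf-8 with a
BOM.

```python
import codecs

# Hypothetical sample: a shell script containing only 7-bit characters.
text = "#!/bin/sh\necho hello\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # utf-8 without BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # utf-8 with BOM prepended

# Pure-ASCII text gives identical bytes in ASCII and BOM-less utf-8,
# so a tool inspecting the file cannot tell the two apart.
print(ascii_bytes == utf8_bytes)                # True

# With a BOM, the file starts with EF BB BF instead of '#!'.
print(utf8_bom_bytes[:3] == codecs.BOM_UTF8)    # True
print(utf8_bom_bytes.startswith(b"#!"))         # False

# One non-ASCII character is enough to make the utf-8 bytes distinctive.
accented_bytes = "café\n".encode("utf-8")
print(accented_bytes)                           # b'caf\xc3\xa9\n'
```

The first comparison is why the saved file still looked like ASCII, and
the third one is why a BOM breaks a '#!' line.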