From: rusi
Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Re: those funny non-ASCII characters
Date: Fri, 1 Jun 2012 09:26:08 -0700 (PDT)

On Jun 1, 12:03 pm, Xah Lee wrote:
> On May 31, 10:43 pm, rusi wrote:
>
> > On Jun 1, 9:23 am, Jason Rumney wrote:
>
> > > On Thursday, 31 May 2012 01:15:11 UTC+8, Buchs, Kevin wrote:
> > > > Xah suggested I embrace Unicode. So I could use (prefer-coding-system
> > > > 'utf-8) or the file variable: -*- coding: utf-8 -*-. Are there drawbacks
> > > > to the former? What about opening an ASCII coded file? Can emacs
> > > > properly detect it or does it come up as UTF-8?
>
> > > ASCII is a subset of UTF-8, so the problem you are imagining does not exist.
>
> > This does not exactly work that way on windows.
> > eg recently saw a description of how notepad put a BOM mark in a
> > haskell-script which made the haskell scripts unrunnable
>
> haskell compiler probably should bear the blame. Last i read (~4 years
> ago), the lang spec says source code should be unicode (i forgot if it
> specified a encoding), however, no haskell compiler at the time
> supports it. If your lang spec says unicode, you have to support BOM
> mark.
>
> 〈Unicode BOM Byte Order Mark Hack〉http://xahlee.org/comp/unicode_BOM_byte_orde_mark.html
>
> http://www.unicode.org/faq/utf_bom.html#bom1
>
> Xah

See http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf (pg 36)

"Use of a BOM is neither required nor recommended for UTF-8, but may
be encountered in contexts where UTF-8 data is converted from other
encoding forms..."

More specifically, on the non-recommendation of the BOM:
http://www.unicode.org/faq/utf_bom.html

"Note that some recipients of UTF-8 encoded data do not expect a BOM.
Where UTF-8 is used transparently in 8-bit environments, the use of a
BOM will interfere with any protocol or file format that expects
specific ASCII characters at the beginning, such as the use of "#!" of
at the beginning of Unix shell scripts."
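
To make the two points above concrete (ASCII bytes are already valid UTF-8, and a UTF-8 BOM pushes "#!" away from the start of a file), here is a rough sketch in Python. It is only an illustration, not anything from the Unicode FAQ or from Emacs: the helper name strip_utf8_bom and the sample script text are made up for the example. It relies on Python's standard "utf-8-sig" codec, which writes the BOM when encoding.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper, named here only for illustration.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Pure ASCII bytes decode to the same text under ASCII and under UTF-8.
ascii_bytes = b'main = putStrLn "hi"\n'
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# A made-up script, saved once with a BOM and once without.
script = "#!/bin/sh\necho hello\n"
with_bom = script.encode("utf-8-sig")   # "utf-8-sig" prepends the BOM
without_bom = script.encode("utf-8")    # plain UTF-8, no BOM

# With the BOM, the file no longer begins with the two bytes "#!".
print(with_bom.startswith(b"#!"))
print(without_bom.startswith(b"#!"))

# Removing the BOM gives back exactly the BOM-less bytes.
print(strip_utf8_bom(with_bom) == without_bom)
```

Anything that checks the first two bytes of the file for "#!" would see EF BB instead, which is the kind of interference the FAQ passage describes.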