all messages for Emacs-related lists mirrored at yhetil.org
From: Xah Lee <xahlee@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: those funny non-ASCII characters
Date: Sat, 2 Jun 2012 04:54:34 -0700 (PDT)
Message-ID: <878c6c73-4646-42fa-b5c5-5535803457f1@ri8g2000pbc.googlegroups.com>
In-Reply-To: 202f4594-9462-48dc-954d-8cf9ac6a581e@s6g2000pbi.googlegroups.com

On Jun 1, 8:17 pm, rusi <rustompm...@gmail.com> wrote:
> On Jun 2, 2:06 am, Xah Lee <xah...@gmail.com> wrote:
>
>
> > Xah wrote
>
> > > > 〈Unicode BOM Byte Order Mark Hack〉 http://xahlee.org/comp/unicode_BOM_byte_orde_mark.html
>
> > > > http://www.unicode.org/faq/utf_bom.html#bom1
>
> > On Jun 1, 9:26 am, rusi <rustompm...@gmail.com> wrote:
>
> > > See http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
> > > (pg 36) "Use of a BOM is neither required nor recommended for UTF-8,
> > > but may
> > > be encountered in contexts where UTF-8 data is converted from other
> > > encoding forms..."
>
> > > More specifically the non-recommendation of bom: http://www.unicode.org/faq/utf_bom.html
> > > "Note that some recipients of UTF-8 encoded data do not expect a BOM.
> > > Where UTF-8 is used transparently in 8-bit environments, the use of a
> > > BOM will interfere with any protocol or file format that expects
> > > specific ASCII characters at the beginning, such as the use of "#!" of
> > > at the beginning of Unix shell scripts. "
>
> > didn't i mention these 2 points exactly in the link i gave??
>
> Yeah your own link says this: (as you know I often use and quote your
> unicode pages :-) )
>
> - In unix-like OSes, BOM for utf-8 conflicts with the Shebang (Unix)
> hack.
> - Many Window software add BOM to utf-8 files, e.g. Notepad.
>
> But you also say
>
> > If your lang spec says unicode, you have to support BOM mark
>
> So I am not clear whats ur stand...
>
> Let me make my own position clear:
> The de jure unicode standard is set by the unicode consortium (or
> whatever its called)
> The de facto standard is set by microsoft and java
> The two conflict

BOM mark is part of the unicode standard. If a tech declares full
support for unicode, support for BOM mark is necessary.
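
In practice, "supporting" the BOM mostly means a reader that tolerates it. A minimal Python sketch, assuming input is UTF-8 unless a BOM says otherwise; the helper name `decode_with_bom` is hypothetical (not from this thread), and UTF-32 BOMs are deliberately ignored to keep it short:

```python
import codecs

# Hypothetical helper (not from the thread): sniff a leading byte order mark.
# Only the UTF-8 and UTF-16 BOMs are checked. UTF-32 is skipped for brevity;
# note the UTF-32-LE BOM (FF FE 00 00) would be misread as UTF-16-LE here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> tuple[str, str]:
    """Return (encoding, text), stripping a leading BOM if one is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):].decode(encoding)
    # No BOM found: fall back to the caller's default encoding.
    return default, data.decode(default)


print(decode_with_bom(b"\xef\xbb\xbfhello"))  # ('utf-8', 'hello')
print(decode_with_bom(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))  # ('utf-16-be', 'hi')
print(decode_with_bom(b"plain ascii"))  # ('utf-8', 'plain ascii')
```

Python's built-in `utf-8-sig` codec already does the UTF-8 half of this: `b"\xef\xbb\xbfhi".decode("utf-8-sig")` gives `"hi"`, and it also decodes BOM-less UTF-8 unchanged.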

BOM mark is a hack, but so is the unix shebang mark. Taking the BOM
mark as a given, it wouldn't cause any problem if utf-8 hadn't been
invented. utf-8 was invented by unix fanatics Ken Thompson and Rob
Pike, largely to help the unix world move forward to unicode. As it
is, the BOM mark conflicts with the spirit of utf-8 (because utf-8 is
meant to be ASCII-compatible as is, yet the BOM mark's byte sequence
isn't in ASCII).
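
The shebang conflict is easy to see at the byte level. A small Python sketch; the script text is made up for illustration, and Python's `utf-8-sig` codec is used here simply as a convenient way to prepend the UTF-8 BOM:

```python
import codecs

script = "#!/bin/sh\necho hi\n"  # illustrative script text

plain = script.encode("utf-8")         # UTF-8, no BOM
with_bom = script.encode("utf-8-sig")  # same text, prefixed with the UTF-8 BOM

# For pure-ASCII text, UTF-8 is byte-for-byte identical to ASCII...
assert plain == script.encode("ascii")
# ...but the BOM prepends three bytes that are not ASCII at all (all >= 0x80).
assert with_bom == codecs.BOM_UTF8 + plain
print(with_bom[:3].hex())  # efbbbf

# The shebang convention looks for the bytes "#!" at the very start of the
# file; the BOM pushes them to offset 3.
print(plain.startswith(b"#!"), with_bom.startswith(b"#!"))  # True False
```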

i read the link Thien-Thi Nguyen posted 〔http://www.utf8everywhere.org/〕.
At first i found it very informative, but in the end i wasn't
convinced by its opinion that we should all adopt utf-8 instead of
utf-16. I think if one switches attitude and sees utf-8 as the hack
that introduced all these problems, then many of their arguments for
utf-8 don't stand.
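
Part of the utf-8 vs utf-16 argument is about size, and the answer depends on the text. A quick Python comparison; the sample strings are arbitrary, and `utf-16-le` is used so that no BOM is counted:

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, excluding any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ascii": "hello, world",  # 12 ASCII characters
    "chinese": "中文字符",     # 4 CJK characters in the BMP
    "emoji": "😀",            # 1 character outside the BMP (U+1F600)
}

for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name:8} chars={len(text):2}  utf-8={u8:2} bytes  utf-16={u16:2} bytes")
```

For ASCII-heavy text (markup, source code) utf-8 is half the size of utf-16; for BMP Chinese text utf-16 takes 2 bytes per character against 3 for utf-8; characters outside the BMP take 4 bytes in both.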

side note... about that site: it's Windows oriented. As such, they
don't explain many of the terms and Windows technologies they use,
e.g. i have little idea what they mean by narrowchar or widechar, nor
about the many Windows libraries they mention.

also, the site is decidedly western-minded. They forgot that in
china, the encoding used is GB 18030, which has the same char set as
unicode but a different encoding, and is also compatible with ascii.
No utf-8 nor utf-anything whatsoever. Chinese web traffic is like half
of the world's, or something.
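
Python ships a `gb18030` codec, which makes the two properties mentioned above easy to check: ASCII bytes pass through unchanged, and characters outside the older GBK set (here an emoji) still round-trip. A minimal sketch; the sample characters are arbitrary:

```python
samples = ["A", "中", "😀"]  # ASCII, a common CJK character, a non-BMP character

for ch in samples:
    gb = ch.encode("gb18030")
    u8 = ch.encode("utf-8")
    # Every Unicode character should survive a GB 18030 round trip.
    assert gb.decode("gb18030") == ch
    print(f"U+{ord(ch):04X}  gb18030={len(gb)} bytes  utf-8={len(u8)} bytes")

# Pure ASCII text encodes to exactly the same bytes as in ASCII.
assert "hello".encode("gb18030") == b"hello"
```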

the site wishes utf-16 would go away. But Windows, Mac, and the NTFS
and HFS+ file systems are all utf-16, plus java, C#, etc. Meanwhile,
the web (html, xml, css) is all utf-8. Neither is likely to go away.
If Java and C# and NTFS disappeared from the face of this earth, then
maybe. lol. :D
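
Since Java and C# strings are sequences of UTF-16 code units, their length counts differ from a count of characters once text leaves the BMP. A small Python sketch that emulates that count; the helper name `utf16_code_units` is hypothetical:

```python
def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, which is what Java's String.length() and
    C#'s string.Length report for the same text."""
    return len(text.encode("utf-16-le")) // 2


for text in ["abc", "中文", "😀"]:
    # Python's len() counts code points; a non-BMP character needs a surrogate pair.
    print(repr(text), "code points:", len(text), "utf-16 units:", utf16_code_units(text))

# U+1F600 is stored as the surrogate pair D83D DE00.
print("😀".encode("utf-16-be").hex())  # d83dde00
```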

 Xah

