From: "Stephen J. Turnbull"
To: Alan Mackenzie
Cc: Per Starbäck, Stefan Monnier, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: EOL: unix/dos/mac
Date: Wed, 27 Mar 2013 03:34:36 +0900
Message-ID: <87vc8eau2r.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ip4fc4xd.fsf@uwakimon.sk.tsukuba.ac.jp> <20130326140247.GB4179@acm.acm>
In-Reply-To: <20130326140247.GB4179@acm.acm>

Alan Mackenzie writes:

 > This is a little confusing to poor old me.  ASCII doesn't care
 > about line breaks either; only particular use cases care.

True.  ASCII is a coded character set.  It does not have a way to
represent an abstract line break in a single character; whatever you
do, then, is outside of the ASCII standard.

 > If you write a script (whether bash, sed, ....) on a *nix system
 > and it has CRLF line ends, it will fail (with an obscure error
 > message) regardless of whether that script is nominally in UTF-8 or
 > ASCII or whatever.

Python, at least, is not in your ellipsis.  Not by default, and not on
any supported platform.  I wouldn't be surprised if Perl and Ruby have
adopted "universal newlines", too.

 > In what sense does Unicode "not care"?

In the sense that Unicode is more than a character set; it prescribes
all kinds of algorithms for text processing as well.  Here, section
5.8 of the Unicode Standard v6.2 prescribes that any of LF, CR, CRLF,
and ISO 6429 NEXT LINE (U+0085) should be considered to be a single
line (or paragraph) break in legacy text.  It says nothing about how
they should be represented internally, though.
Unusually for the Unicode Standard, it allows you to guess what the
user wants, and in some cases even alter the input stream before
outputting it.

"Legacy" text means it uses ASCII (or C1) control characters to
represent line and/or paragraph breaks, rather than the characters
prescribed by Unicode (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH
SEPARATOR).
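
To make the "single line break" reading concrete, here is a minimal
Python sketch of treating LF, CR, CRLF, and NEL (U+0085) as one break
each in legacy text.  The names split_legacy_lines and
normalize_legacy_breaks are made up for this illustration; this is one
possible reading of the guideline, not code taken from the standard or
from any Emacs or Python source.

```python
import re

# The four legacy break sequences named above.  CRLF comes first in the
# alternation so that the pair is consumed as ONE break rather than two.
# (Hypothetical helper names; this is an illustrative sketch only.)
_LEGACY_BREAK = re.compile("\r\n|[\r\n\u0085]")


def split_legacy_lines(text):
    """Split text at each legacy break; the break characters are dropped."""
    return _LEGACY_BREAK.split(text)


def normalize_legacy_breaks(text, replacement="\n"):
    """Rewrite every legacy break as a single replacement string."""
    return _LEGACY_BREAK.sub(replacement, text)


if __name__ == "__main__":
    sample = "unix\ndos\r\nmac\rnel\u0085end"
    print(split_legacy_lines(sample))
    print(repr(normalize_legacy_breaks(sample)))
```

Whether to normalize on input, on output, or not at all is left to the
application, which matches the point that nothing is said about the
internal representation.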