From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: EOL: unix/dos/mac
Date: Tue, 26 Mar 2013 16:45:30 +0900
Message-ID: <87620ed2p1.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CADkQgvvX1hZV5QbMZ4UfzG5i9oyFJVQS6LirozHg6xayQdMc1g@mail.gmail.com>
	<jwv620fl2p8.fsf-monnier+emacs@gnu.org>
	<87ip4fc4xd.fsf@uwakimon.sk.tsukuba.ac.jp> <831ub21xpn.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Trace: ger.gmane.org 1364283942 1786 80.91.229.3 (26 Mar 2013 07:45:42 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 26 Mar 2013 07:45:42 +0000 (UTC)
Cc: per.starback@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Mar 26 08:46:07 2013
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1UKOa4-0001kv-NT
	for ged-emacs-devel@m.gmane.org; Tue, 26 Mar 2013 08:46:04 +0100
Original-Received: from localhost ([::1]:48031 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1UKOZg-0000hC-OT
	for ged-emacs-devel@m.gmane.org; Tue, 26 Mar 2013 03:45:40 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:41641)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen@xemacs.org>) id 1UKOZd-0000h6-BE
	for emacs-devel@gnu.org; Tue, 26 Mar 2013 03:45:38 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stephen@xemacs.org>) id 1UKOZc-0004YJ-8s
	for emacs-devel@gnu.org; Tue, 26 Mar 2013 03:45:37 -0400
Original-Received: from mgmt2.sk.tsukuba.ac.jp ([130.158.97.224]:59543)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen@xemacs.org>)
	id 1UKOZa-0004XW-2o; Tue, 26 Mar 2013 03:45:34 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mgmt2.sk.tsukuba.ac.jp (Postfix) with ESMTP id 472539708E6;
	Tue, 26 Mar 2013 16:45:31 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id 0C7921A3D97; Tue, 26 Mar 2013 16:45:31 +0900 (JST)
In-Reply-To: <831ub21xpn.fsf@gnu.org>
X-Mailer: VM undefined under 21.5  (beta32) "habanero" b0d40183ac79 XEmacs
	Lucid (x86_64-unknown-linux)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 130.158.97.224
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:158190
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/158190>

Eli Zaretskii writes:
 > > From: "Stephen J. Turnbull" <stephen@xemacs.org>

 > > [Unicode] just says "all of these sequences when encountered in
 > > text purporting to conform to this standard should be treated in
 > > the same way."  Emacsen should do the same.
 > 
 > That would require Emacs to store all the possible EOL sequences in
 > the buffer, and treat them all identically.  That's doable, but is a
 > non-trivial job; volunteers are welcome.

I don't know what you mean by "all the possible EOL sequences".  It's
well-defined (in Unicode TR#13 or section 5.8 of Unicode 6.2) what an
NLF (newline function) is: it's the first of CRLF, LF, CR, or NEL
(U+0085) that matches when parsing a line.  In the buffer, they would
all be converted to Emacs' representation (i.e., LF).  Ensuring that
C-x C-f file RET C-x C-w file RET is the identity requires marking
non-default EOL sequences somehow; that's all.
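
To make that concrete, here is a rough sketch in Python (not Emacs
code, and not a proposal for the actual implementation).  The names
decode and encode, and the use of a position table in place of a text
property, are my own assumptions for illustration, as is taking the
most frequent sequence as the file's default EOL.

```python
import re
from collections import Counter

# The regex tries alternatives left to right, so CRLF is listed first
# and is never split into a separate CR and LF.
NLF = re.compile("\r\n|\n|\r|\x85")


def decode(raw):
    """Convert every NLF in raw to LF.

    Returns (text, default_eol, exceptions), where exceptions maps the
    index of a newline in text to its original sequence whenever that
    sequence differs from the default (here: the most frequent one).
    """
    eols = NLF.findall(raw)
    default = Counter(eols).most_common(1)[0][0] if eols else "\n"
    pieces, exceptions = [], {}
    pos = 0   # current index in the converted text
    last = 0  # end of the previous match in raw
    for m in NLF.finditer(raw):
        pieces.append(raw[last:m.start()])
        pos += m.start() - last
        if m.group() != default:
            exceptions[pos] = m.group()  # stand-in for a text property
        pieces.append("\n")
        pos += 1
        last = m.end()
    pieces.append(raw[last:])
    return "".join(pieces), default, exceptions


def encode(text, default, exceptions):
    """Write text back out, restoring each marked non-default EOL."""
    return "".join(
        exceptions.get(i, default) if ch == "\n" else ch
        for i, ch in enumerate(text)
    )


raw = "a\r\nb\nc\r\nd\re\x85f\r\n"
text, default, exceptions = decode(raw)
roundtrip = encode(text, default, exceptions)
```

With an unedited buffer, roundtrip equals raw, which is the identity
property above.  A real editor would have to keep the marks attached
to the newline characters as the text is edited, which is what a text
property does and a plain index table does not.  (If you try this on
a file, open it with newline="" so Python does not translate line
endings itself.)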

 > > The question then is how to deal with file comparison.  We'd like to
 > > avoid creating spurious diffs based on "fixing" random different line
 > > endings
 > 
 > If Emacs is to support different EOL formats in the same file, it
 > should not convert them at all.

Of course it should convert them.

Trying to support multiple EOL codings in the buffer is craziness.
Two decades ago, I had to live that madness at the coding system
level; it was called "Nihongo Emacs" (or "The Japanese Patch" in other
programs).  Richard (and every other upstream maintainer) rightly
(with all due respect to the developers of those patches) rejected
that patch for application to the mainstream project.  Doing it only
for EOLs would be much less painful, but it's not worth it.

 > Anything else _will_ introduce spurious modifications, and could
 > even corrupt some files, if the exact EOL sequence here or there
 > matters.

No, it need not, any more than any ambiguous encoding need do so.  Of
course it will be fragile if (for example) Emacs crashes and you have
to recover an autosave file.

 > > I guess one could attach a text property to newlines differing from
 > > the file's autodetected EOL convention.
 > 
 > Not sure how a text property should help here.

It would mark non-default EOL sequences for correct output.

 > > I've also considered switching the internal representation of newline
 > > to U+2028 LINE SEPARATOR
 > 
 > What good would that be?

Unicode correctness; no confusion between Emacs internal
representation and the actual encoding of EOL on any given platform;
no long-lines ambiguity (LS would be considered a "soft newline" in
applications that automatically rewrap, and U+2029 PARAGRAPH SEPARATOR
would unambiguously demarcate paragraphs).
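
As an illustration of the last point only, here is a small Python
sketch (again, not Emacs code; the function name rewrap is
hypothetical).  It assumes a text in which U+2028 marks soft line
breaks and U+2029 marks paragraph boundaries, so refilling needs no
guesswork about which line breaks are significant.

```python
import textwrap

LS = "\u2028"  # LINE SEPARATOR: soft break, free to move when refilling
PS = "\u2029"  # PARAGRAPH SEPARATOR: hard boundary, never moved


def rewrap(text, width):
    """Refill each paragraph to width, leaving paragraph breaks alone."""
    refilled = []
    for paragraph in text.split(PS):
        # Soft breaks carry no meaning of their own: turn them back
        # into spaces, then let textwrap choose new break points.
        lines = textwrap.wrap(paragraph.replace(LS, " "), width)
        refilled.append(LS.join(lines))
    return PS.join(refilled)


sample = "aaa bbb" + LS + "ccc" + PS + "ddd eee fff"
narrow = rewrap(sample, 7)
wide = rewrap(sample, 20)
```

Here narrow keeps two paragraphs with one soft break each, and wide
joins each paragraph onto a single line; in both cases the single
U+2029 stays where it was.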

As I wrote, it's not urgent.