From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: EOL: unix/dos/mac
Date: Tue, 26 Mar 2013 16:45:30 +0900
Message-ID: <87620ed2p1.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ip4fc4xd.fsf@uwakimon.sk.tsukuba.ac.jp> <831ub21xpn.fsf@gnu.org>
In-Reply-To: <831ub21xpn.fsf@gnu.org>
To: Eli Zaretskii
Cc: per.starback@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Eli Zaretskii writes:

> > From: "Stephen J. Turnbull"
>
> > [Unicode] just says "all of these sequences when encountered in
> > text purporting to conform to this standard should be treated in
> > the same way."  Emacsen should do the same.
>
> That would require Emacs to store all the possible EOL sequences in
> the buffer, and treat them all identically.  That's doable, but is a
> non-trivial job; volunteers are welcome.

I don't know what you mean by "all the possible EOL sequences".  It's
well-defined (in Unicode TR#13 or section 5.8 of Unicode 6.2) what an
NLF is: it's the first of CRLF, LF, CR, or NL (U+0085) that matches
when parsing a line.  In the buffer, they would all be converted to
Emacs' representation (ie, LF).  Ensuring that

    C-x C-f file RET C-x C-w file RET

is the identity requires marking non-default EOL sequences somehow,
that's all.

> > The question then is how to deal with file comparison.  We'd like to
> > avoid creating spurious diffs based on "fixing" random different line
> > endings
>
> If Emacs is to support different EOL formats in the same file, it
> should not convert them at all.

Of course it should convert them.  Trying to support multiple EOL
codings in the buffer is craziness.
Two decades ago, I had to live that madness at the coding system
level; it was called "Nihongo Emacs" (or "The Japanese Patch" in other
programs).  Richard (and every other upstream maintainer) rightly
(with all due respect to the developers of those patches) rejected
that patch for application to the mainstream project.  Doing it only
for EOLs would be much less painful, but it's not worth it.

> Anything else _will_ introduce spurious modifications, and could
> even corrupt some files, if the exact EOL sequence here or there
> matters.

No, it need not, any more than any ambiguous encoding need do so.  Of
course it will be fragile if (for example) Emacs crashes and you have
to recover an autosave file.

> > I guess one could attach a text property to newlines differing from
> > the file's autodetected EOL convention.
>
> Not sure how a text property should help here.

It would mark non-default EOL sequences for correct output.

> > I've also considered switching the internal representation of newline
> > to U+2028 LINE SEPARATOR
>
> What good would that be?

Unicode correctness; no confusion between Emacs internal
representation and the actual encoding of EOL on any given platform;
no long-lines ambiguity (LS would be considered a "soft newline" in
applications that automatically rewrap, and U+2029 PARAGRAPH SEPARATOR
would unambiguously demark paragraphs).

As I wrote, it's not urgent.
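
To make the "convert on input, mark the exceptions, restore on output"
idea above concrete, here is a minimal sketch in Python rather than
Emacs Lisp.  It is an illustration under stated assumptions, not how
Emacs or XEmacs implements anything: the names decode, encode and
marks are hypothetical, the "autodetected" default is simply assumed
to be the most frequent sequence in the file, and a plain dict keyed
by buffer offset stands in for a text property (a real text property
would follow its newline as the text is edited; this dict would not).

```python
import re
from collections import Counter

# NLF alternatives, tried in order at each position, so CRLF wins
# over a bare CR.  Operates on decoded text, so "\x85" is U+0085.
NLF = re.compile("\r\n|\n|\r|\x85")

def decode(raw):
    """Convert every NLF to LF; remember the non-default ones.

    Returns (buffer_text, default_eol, marks), where marks maps the
    buffer offset of an LF to the original sequence it replaced.
    """
    seqs = NLF.findall(raw)
    # Assumption: the file's convention is its most common sequence.
    default = Counter(seqs).most_common(1)[0][0] if seqs else "\n"
    parts, marks = [], {}
    pos = 0       # read position in raw
    out_len = 0   # length of the buffer text built so far
    for m in NLF.finditer(raw):
        chunk = raw[pos:m.start()]
        parts.append(chunk)
        out_len += len(chunk)
        if m.group() != default:
            marks[out_len] = m.group()   # the stand-in "text property"
        parts.append("\n")
        out_len += 1
        pos = m.end()
    parts.append(raw[pos:])
    return "".join(parts), default, marks

def encode(buf, default, marks):
    """Write each LF back as its marked sequence, else the default."""
    return "".join(
        marks.get(i, default) if ch == "\n" else ch
        for i, ch in enumerate(buf)
    )
```

With this, reading a file of mixed line endings and writing it back
unmodified is the identity, while the buffer itself only ever holds
LF:

```python
raw = "a\r\nb\nc\rd\x85e\r\n"
buf, default, marks = decode(raw)
assert encode(buf, default, marks) == raw
```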