From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: EOL: unix/dos/mac
Date: Tue, 26 Mar 2013 15:07:21 +0200
Message-ID: <83obe6z4vq.fsf@gnu.org>
References: <87ip4fc4xd.fsf@uwakimon.sk.tsukuba.ac.jp> <831ub21xpn.fsf@gnu.org> <87620ed2p1.fsf@uwakimon.sk.tsukuba.ac.jp> <83vc8ezh52.fsf@gnu.org> <871ub2crhm.fsf@uwakimon.sk.tsukuba.ac.jp>
In-reply-to: <871ub2crhm.fsf@uwakimon.sk.tsukuba.ac.jp>
To: "Stephen J. Turnbull"
Cc: per.starback@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org

> From: "Stephen J. Turnbull"
> Cc: per.starback@gmail.com,
>     monnier@iro.umontreal.ca,
>     emacs-devel@gnu.org
> Date: Tue, 26 Mar 2013 20:47:33 +0900
>
> > > Trying to support multiple EOL codings in the buffer is craziness.
> >
> > But it's the only way to be 100% sure you don't introduce spurious
> > changes into files. And since newlines, unlike characters, are not
> > displayed, there's no issues with fonts etc. here.
>
> Currently NLFs *are* displayed, if they don't match the default for
> the buffer.

No, they are displayed because nothing other than a single LF is
treated as an NLF by the Emacs internals. EOL conversion is a layer
on top of that; the buffer maintenance and the display engine know
absolutely nothing about it. Once these byte sequences are recognized
as NLFs, they will not be displayed, because that's how the Emacs
display works.

> > > Doing it only for EOLs would be much less painful, but it's not
> > > worth it.
> >
> > Please explain why do you think it isn't worth it.
>
> Because you have to fix pretty much everything

I'm probably missing something important, because the things I think
will need fixing are nowhere near "pretty much everything". How about
posting a list of things to fix that is long enough to convince me
that "pretty much everything" is close to the truth?

> new syntax will be required for stuff like zap-to-char

Why?

> and nearly required for regexps.

For $ we will need to get regex.c to support the additional NLFs, and
that's all. If you mean a literal \n in regexps, then yes, something
will have to be done with that. But it would be a good thing in its
own right, because Emacs will come closer to supporting the Unicode
standard annexes.

> Code will be massively uglified with tests for variable-length
> sequences instead of single characters

The code is already replete with that, ever since Emacs started using
a multi-byte representation for characters in buffers. We have a set
of macros to fetch and examine multi-byte sequences for that reason.
I see nothing hard or "ugly" here, sorry.

> everything from motion to insdel will have to be modified

Why?

> Any code handling old-style hidden lines (with CR marking
> "invisible" lines) will have to be changed.

First, we want to deprecate and remove this feature anyway (there's
already an implemented alternative). And second, we already handle
this today so that we don't display ^M there; the same method can be
used for the other NLFs.

> It's not obvious to me that there are no counterintuitive
> implications. Opposed to that, there are very few text files with
> mixed line endings, and in many cases the user would actually like to
> have them regularized (at a time of their choosing, so they can have a
> commit with only whitespace changes, for example).

We should be consistent: either there is a problem with mixed line
endings and with Unicode NLFs that aren't treated as EOL at all, or
there isn't. If the problem is insignificant, perhaps nothing should
be changed at all.
If the problem _is_ significant, we might as well solve it The Right
Way, instead of applying more and more band-aids. Conversion of NLFs
to a single LF is a kludge, much like emptying a full kettle just
because you already have a procedure for boiling water that starts
with an empty one. You cannot do such conversion efficiently if you
need to discover the EOL format for every line. Dispensing with the
conversion altogether solves both problems in one go. What it adds
doesn't seem so frightening to me, certainly less so than, say,
adding bidi support ;-)

> > Surely, going again through the pain of inadvertent changes to user
> > files is a movie we don't want to be part of again.
>
> What pain of inadvertant changes? Sure, there will likely be bugs in
> the first draft of such code, what else is new? If you're talking
> specifically about the \201 regression, that's a completely different
> issue AFAICT -- that was about buffer-as-unibyte exposing the
> *internal* representation to Lisp, which was a "Mr. Foot, may I
> introduce to you Mr. Bullet" kind of idea from Day 1.

The internal representation is still exposed, so nothing has changed
in that department.

> > > > Anything else _will_ introduce spurious modifications, and could
> > > > even corrupt some files, if the exact EOL sequence here or there
> > > > matters.
> > >
> > > No, it need not, any more than any ambiguous encoding need do so. Of
> > > course it will be fragile if (for example) Emacs crashes and you have
> > > to recover an autosave file.
> >
> > It will be fragile, and subtle bugs will tend to break quite a bit.
>
> I don't think so.

Well, then we will have to agree to disagree.

> I think you're hearing monsters in the closet.

And I think _you_ are hearing them. Or maybe you will show me a list
of things that keeping NLFs would break that is large enough to make
me change my mind.
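The following is a minimal Python sketch of the approach argued for in this message. The idea is to keep every NLF in the text exactly as it was read, and to recognize all of them wherever a line end matters, instead of converting them to LF.

This is not Emacs code, and all function names are hypothetical. The sketch assumes the NLF set is CRLF, LF, CR and NEL (U+0085), following the Unicode newline guidelines. It demonstrates three things:

* Splitting on NLFs without conversion round-trips a mixed-EOL text character for character.
* A `$` test has to check for any NLF, and must not match between the CR and the LF of a CRLF pair.
* Normalizing everything to a single LF loses information.

```python
# Unicode "new line functions" (NLFs) assumed by this sketch: CRLF, LF, CR
# and NEL (U+0085). CRLF is listed first so that it wins over a lone CR.
NLF_SEQUENCES = ("\r\n", "\n", "\r", "\u0085")


def nlf_length_at(text: str, pos: int) -> int:
    """Return the length of the NLF starting at POS, or 0 if there is none."""
    for seq in NLF_SEQUENCES:
        if text.startswith(seq, pos):
            return len(seq)
    return 0


def split_lines_keep_eol(text: str) -> list[tuple[str, str]]:
    """Split TEXT into (line, terminator) pairs without converting anything.

    The last pair has an empty terminator if TEXT does not end in an NLF.
    """
    lines = []
    start = pos = 0
    while pos < len(text):
        n = nlf_length_at(text, pos)
        if n:
            lines.append((text[start:pos], text[pos:pos + n]))
            pos += n
            start = pos
        else:
            pos += 1
    if start < len(text):
        lines.append((text[start:], ""))
    return lines


def at_line_end(text: str, pos: int) -> bool:
    """What a `$' anchor would test: end of text or the start of any NLF."""
    if pos == len(text):
        return True
    # Between the CR and the LF of a CRLF pair is not a line end.
    if pos > 0 and text[pos - 1:pos + 1] == "\r\n":
        return False
    return nlf_length_at(text, pos) > 0


def normalize_to_lf(text: str) -> str:
    """The conversion approach: map every NLF to a single LF."""
    return "".join(line + ("\n" if eol else "")
                   for line, eol in split_lines_keep_eol(text))
```

With this design, the only extra cost is that line-end tests examine a short variable-length sequence instead of a single character. That is the same kind of work the multi-byte macros mentioned above already do. By contrast, `normalize_to_lf` maps both `"x\r\ny"` and `"x\ny"` to the same string, so the original EOLs cannot be restored when the file is saved.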