EOL: unix/dos/mac

* EOL: unix/dos/mac
@ 2013-03-25 13:34 Per Starbäck
  2013-03-25 13:56 ` Xue Fuqiao
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Per Starbäck @ 2013-03-25 13:34 UTC (permalink / raw)
  To: emacs-devel

The end-of-line indicators for coding systems are unix, dos, and mac.
I suggest they are replaced with lf, crlf, and cr.

I think the current indicators are misleading for those who don't know
about this.

Mac OS X has been preloaded on all Macs since 2002.
Mac owners who find how to click on the colon on the mode line might
think that they should
have "Mac" there. (And other users might think they are doing their
Mac user friends a favor if they
convert a text file to "Mac" before sending them a file.)

"DOS" might also lead to confusion, since Microsoft Windows isn't
really DOS (anymore).
Wouldn't users who see that think that this must be something old in
some obsolete DOS encoding?

It's not a big problem, but it will (very) slowly get bigger and
bigger with time, so that even people
who do know about this stuff can be confused.

So I think it would be best if Emacs used designations that aren't
about what systems (used to) use
the different codings, but instead just "(CR)", "(LF)", "(CRLF)".

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-25 13:34 EOL: unix/dos/mac Per Starbäck
@ 2013-03-25 13:56 ` Xue Fuqiao
  2013-03-25 22:41   ` Richard Stallman
  2013-03-25 14:21 ` Eli Zaretskii
  2013-03-25 19:17 ` Stefan Monnier
  2 siblings, 1 reply; 27+ messages in thread
From: Xue Fuqiao @ 2013-03-25 13:56 UTC (permalink / raw)
  To: Per Starbäck; +Cc: emacs-devel

On Mon, 25 Mar 2013 14:34:04 +0100
Per Starbäck <per.starback@gmail.com> wrote:

> The end-of-line indicators for coding systems are unix, dos, and mac.
> I suggest they are replaced with lf, crlf, and cr.

+1

-- 
Xue Fuqiao
http://www.gnu.org/software/emacs/



* Re: EOL: unix/dos/mac
  2013-03-25 13:56 ` Xue Fuqiao
@ 2013-03-25 22:41   ` Richard Stallman
  2013-03-26  2:11     ` Stephen J. Turnbull
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Stallman @ 2013-03-25 22:41 UTC (permalink / raw)
  To: Xue Fuqiao; +Cc: per.starback, emacs-devel

    > The end-of-line indicators for coding systems are unix, dos, and mac.
    > I suggest they are replaced with lf, crlf, and cr.

Someone needs to check how this would affect non-wizard users.
It might be confusing for them.  It might also be good for them.
The point is, we are not like them and we can't tell
how this would affect them.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call




* Re: EOL: unix/dos/mac
  2013-03-25 22:41   ` Richard Stallman
@ 2013-03-26  2:11     ` Stephen J. Turnbull
  0 siblings, 0 replies; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26  2:11 UTC (permalink / raw)
  To: rms; +Cc: Xue Fuqiao, per.starback, emacs-devel

Richard Stallman writes:

 > > The end-of-line indicators for coding systems are unix, dos, and
 > > mac.  I suggest they are replaced with lf, crlf, and cr.
 > 
 > Someone needs to check how this would affect non-wizard users.

I don't see why it would.  Non-wizards rarely want to see it at all,
and usually have a very incomplete understanding of what it means.
IME that's what it means to be a non-wizard.  Even in Japan, where
users encounter 4 or 5 (!!) encodings *every* *day* (ISO-2022-JP in
mail headers, EUC-JP and Shift JIS in text files from older *nix and
Micros*ft environments, UTF-8 in text files from modern environments,
and UTF-16 in file names on NT file systems), younger users don't even
realize that they're there.  They just call coding problems "mojibake"
and ask for corrected data.

I think a better way to present this information would be to put it in
a separate "troubleshoot this buffer" function.  Perhaps adding it to
C-u C-x =, or a separate function on C-h = (both with the nuance
"troubleshoot around point").  Caveat: I have no empirical evidence
for the feeling that this would be better, just introspection and
experience with helping users who are not much helped by the current
UI.

The idea is that ordinarily, Emacs just Does The Right Thing, so
there's no need to know what the EOL suffix (or for that matter the
EOL modeline indicator) means, and many users forget or never learn.
If they *do* run into trouble like "stair-stepping" or "^M" in a
buffer, they can use C-u C-x = to find out "what's different about
this linebreak".

* Re: EOL: unix/dos/mac
  2013-03-25 13:34 EOL: unix/dos/mac Per Starbäck
  2013-03-25 13:56 ` Xue Fuqiao
@ 2013-03-25 14:21 ` Eli Zaretskii
  2013-03-25 17:28   ` Dani Moncayo
  2013-03-25 19:17 ` Stefan Monnier
  2 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-25 14:21 UTC (permalink / raw)
  To: Per Starbäck; +Cc: emacs-devel

> Date: Mon, 25 Mar 2013 14:34:04 +0100
> From: Per Starbäck <per.starback@gmail.com>
> 
> The end-of-line indicators for coding systems are unix, dos, and mac.
> I suggest they are replaced with lf, crlf, and cr.

I have customized my Emacsen long ago to show /, \, and : instead.
Never looked back, and I will certainly keep those customizations if
your suggestion is accepted as the default.

The current indicators are shown only if the EOL format is _not_ the
native one on the underlying platform.  That was done a long time ago,
to draw users' attention to the fact that the file has unusual line
endings.  I think the need to draw attention to that has passed.  But
that's me.

* Re: EOL: unix/dos/mac
  2013-03-25 14:21 ` Eli Zaretskii
@ 2013-03-25 17:28   ` Dani Moncayo
  0 siblings, 0 replies; 27+ messages in thread
From: Dani Moncayo @ 2013-03-25 17:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Per Starbäck, emacs-devel

>> The end-of-line indicators for coding systems are unix, dos, and mac.
>> I suggest they are replaced with lf, crlf, and cr.
>
> I have customized my Emacsen long ago to show /, \, and : instead.
> Never looked back, and I will certainly keep those customizations if
> your suggestion is accepted as the default.

Me too.  I don't want these strings to take more than one character in
the modeline, which is sometimes too short.

> The current indicators are shown only if the EOL format is _not_ the
> native one on the underlying platform.  That was done a long time ago,
> to draw users' attention to the fact that the file has unusual line
> endings.  I think the need to draw attention to that has passed.  But
> that's me.

I also prefer a consistent notation across all platforms.  I don't
think that this information (EOL-style) deserves so much attention.

-- 
Dani Moncayo



* Re: EOL: unix/dos/mac
  2013-03-25 13:34 EOL: unix/dos/mac Per Starbäck
  2013-03-25 13:56 ` Xue Fuqiao
  2013-03-25 14:21 ` Eli Zaretskii
@ 2013-03-25 19:17 ` Stefan Monnier
  2013-03-26  1:42   ` Stephen J. Turnbull
  2013-03-26  7:53   ` Ulrich Mueller
  2 siblings, 2 replies; 27+ messages in thread
From: Stefan Monnier @ 2013-03-25 19:17 UTC (permalink / raw)
  To: Per Starbäck; +Cc: emacs-devel

> The end-of-line indicators for coding systems are unix, dos, and mac.
> I suggest they are replaced with lf, crlf, and cr.

I do not like cr/lf/crlf as I expect many users will have no idea what
they mean.

> Mac OS X has been preloaded on all Macs since 2002.

The "Mac" indicator is indeed a very poor choice nowadays.

> "DOS" might also lead to confusion, since Microsoft Windows isn't
> really DOS (anymore).

"DOS" is not a great choice either, indeed, tho it's definitely not as
bad as "Mac" since the heir of DOS still uses the same system.

> I have customized my Emacsen long ago to show /, \, and : instead.

I also like this representation, since it happens to correlate rather
well (although most Mac OS X users never see the `/', just like most Mac
OS users never saw the `:' separator).

> The current indicators are shown only if the EOL format is _not_ the
> native one on the underlying platform.  That was done a long time ago,
> to draw users' attention to the fact that the file has unusual line
> endings.  I think the need to draw attention to that has passed.  But
> that's me.

I actually disagree that this need has passed.  For that reason,
I actually like to see "(DOS)" in the modeline, since a simple change
from "/" to "\" would definitely go unnoticed (in my case at least).

So I'm OK with "updating" the indicators, tho I'm not sure what we
should use instead.  To replace "Mac", maybe we could use "MacOS9",
which is longish but hopefully such files are rare nowadays.  But DOS
files are not rare, so we need something sufficiently concise.

BTW, in this same area, it would be good to detect and indicate
prominently "Unix with some CRLFs", also known as "mixed-line-ending",
which is often misunderstood as "my Emacs fails to recognize my CRLF
file".
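
A minimal sketch of the kind of detection suggested above, written in
Python for illustration rather than as Emacs code.  The function name
`classify_eol` and the "mixed" label are hypothetical; this is not how
Emacs detects EOL conventions today.

```python
import re
from collections import Counter

# Longest alternative first, so CRLF is not counted as a CR plus an LF.
_EOL = re.compile(rb"\r\n|\r|\n")

# Map each byte sequence to the traditional Emacs name for it.
_NAMES = {b"\r\n": "dos", b"\n": "unix", b"\r": "mac"}


def classify_eol(data):
    """Return (label, counts) for the line endings found in `data`.

    label is 'unix', 'dos' or 'mac' when only one convention occurs,
    'mixed' when more than one occurs, and 'none' when there are no
    line endings at all.
    """
    counts = Counter(_NAMES[m.group()] for m in _EOL.finditer(data))
    if not counts:
        return "none", counts
    if len(counts) == 1:
        return next(iter(counts)), counts
    return "mixed", counts
```

A file that is mostly LF with a few stray CRLFs comes back as "mixed",
with per-convention counts that an indicator could report.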

        Stefan

* Re: EOL: unix/dos/mac
  2013-03-25 19:17 ` Stefan Monnier
@ 2013-03-26  1:42   ` Stephen J. Turnbull
  2013-03-26  6:28     ` Eli Zaretskii
                       ` (2 more replies)
  2013-03-26  7:53   ` Ulrich Mueller
  1 sibling, 3 replies; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26  1:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Per Starbäck, emacs-devel

Stefan Monnier writes:

 > BTW, in this same area, it would be good to detect and indicate
 > prominently "Unix with some CRLFs", also known as "mixed-line-ending",
 > which is often misunderstood as "my Emacs fails to recognize my CRLF
 > file".

Unicode doesn't care, you know: it considers all ASCII line breaks and
terminators to be the same thing (NEW LINE FUNCTION).  I haven't read
that part of the standard in a long time, but IIRC, although many
people interpolate "according to platform", Unicode doesn't care about
that, it just says "all of these sequences when encountered in text
purporting to conform to this standard should be treated in the same
way."  Emacsen should do the same.

The question then is how to deal with file comparison.  We'd like to
avoid creating spurious diffs based on "fixing" random different line
endings, so if the user doesn't edit those positions (lines?), the
line ending should be written as read.  I guess one could attach a
text property to newlines differing from the file's autodetected EOL
convention.

I've also considered switching the internal representation of newline
to U+2028 LINE SEPARATOR, but that's not at all pressing.

Steve

* Re: EOL: unix/dos/mac
  2013-03-26  1:42   ` Stephen J. Turnbull
@ 2013-03-26  6:28     ` Eli Zaretskii
  2013-03-26  7:45       ` Stephen J. Turnbull
  2013-03-26 12:51     ` Stefan Monnier
  2013-03-26 14:02     ` Alan Mackenzie
  2 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26  6:28 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: per.starback, monnier, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Date: Tue, 26 Mar 2013 10:42:38 +0900
> Cc: Per Starbäck <per.starback@gmail.com>, emacs-devel@gnu.org
> 
> Stefan Monnier writes:
> 
>  > BTW, in this same area, it would be good to detect and indicate
>  > prominently "Unix with some CRLFs", also known as "mixed-line-ending",
>  > which is often misunderstood as "my Emacs fails to recognize my CRLF
>  > file".
> 
> Unicode doesn't care, you know: it considers all ASCII line breaks and
> terminators to be the same thing (NEW LINE FUNCTION).  I haven't read
> that part of the standard in a long time, but IIRC, although many
> people interpolate "according to platform", Unicode doesn't care about
> that, it just says "all of these sequences when encountered in text
> purporting to conform to this standard should be treated in the same
> way."  Emacsen should do the same.

That would require Emacs to store all the possible EOL sequences in
the buffer, and treat them all identically.  That's doable, but is a
non-trivial job; volunteers are welcome.

> The question then is how to deal with file comparison.  We'd like to
> avoid creating spurious diffs based on "fixing" random different line
> endings

If Emacs is to support different EOL formats in the same file, it
should not convert them at all.  Anything else _will_ introduce
spurious modifications, and could even corrupt some files, if the
exact EOL sequence here or there matters.

> I guess one could attach a text property to newlines differing from
> the file's autodetected EOL convention.

Not sure how a text property should help here.

> I've also considered switching the internal representation of newline
> to U+2028 LINE SEPARATOR

What good would that be?




* Re: EOL: unix/dos/mac
  2013-03-26  6:28     ` Eli Zaretskii
@ 2013-03-26  7:45       ` Stephen J. Turnbull
  2013-03-26  8:42         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26  7:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: per.starback, monnier, emacs-devel

Eli Zaretskii writes:
 > > From: "Stephen J. Turnbull" <stephen@xemacs.org>

 > > [Unicode] just says "all of these sequences when encountered in
 > > text purporting to conform to this standard should be treated in
 > > the same way."  Emacsen should do the same.
 > 
 > That would require Emacs to store all the possible EOL sequences in
 > the buffer, and treat them all identically.  That's doable, but is a
 > non-trivial job; volunteers are welcome.

I don't know what you mean by "all the possible EOL sequences".  It's
well-defined (in Unicode TR#13 or section 5.8 of Unicode 6.2) what an
NLF is: it's the first of CRLF, LF, CR, or NL (U+0085) that matches
when parsing a line.  In the buffer, they would all be converted to
Emacs' representation (ie, LF).  Ensuring that C-x C-f file RET C-x
C-w file RET is the identity requires marking non-default EOL
sequences somehow, that's all.
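
The NLF rule described above can be sketched as follows.  This is an
illustration in Python, not Emacs code, and the names `NLF` and
`normalize_nlf` are hypothetical.  It only shows the "first of CRLF,
LF, CR, or NL that matches" parse and the conversion to LF; marking
the non-default sequences is left out.

```python
import re

# Alternatives are tried in order at each position, so CRLF wins over
# a bare CR.  U+0085 is NEXT LINE (NL).
NLF = re.compile("\r\n|\n|\r|\u0085")


def normalize_nlf(text):
    """Replace every NLF in `text` with a single LF."""
    return NLF.sub("\n", text)
```

For example, `normalize_nlf("a\r\nb\rc")` yields three lines separated
by LF, and a CR immediately followed by CRLF counts as two line breaks.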

 > > The question then is how to deal with file comparison.  We'd like to
 > > avoid creating spurious diffs based on "fixing" random different line
 > > endings
 > 
 > If Emacs is to support different EOL formats in the same file, it
 > should not convert them at all.

Of course it should convert them.

Trying to support multiple EOL codings in the buffer is craziness.
Two decades ago, I had to live that madness at the coding system
level, it was called "Nihongo Emacs" (or "The Japanese Patch" in other
programs).  Richard (and every other upstream maintainer) rightly
(with all due respect to the developers of those patches) rejected
that patch for application to the mainstream project.  Doing it only
for EOLs would be much less painful, but it's not worth it.

 > Anything else _will_ introduce spurious modifications, and could
 > even corrupt some files, if the exact EOL sequence here or there
 > matters.

No, it need not, any more than any ambiguous encoding need do so.  Of
course it will be fragile if (for example) Emacs crashes and you have
to recover an autosave file.

 > > I guess one could attach a text property to newlines differing from
 > > the file's autodetected EOL convention.
 > 
 > Not sure how a text property should help here.

It would mark non-default EOL sequences for correct output.

 > > I've also considered switching the internal representation of newline
 > > to U+2028 LINE SEPARATOR
 > 
 > What good would that be?

Unicode correctness; no confusion between Emacs internal
representation and the actual encoding of EOL on any given platform;
no long-lines ambiguity (LS would be considered a "soft newline" in
applications that automatically rewrap, and U+2029 PARAGRAPH SEPARATOR
would unambiguously demark paragraphs).

As I wrote, it's not urgent.

* Re: EOL: unix/dos/mac
  2013-03-26  7:45       ` Stephen J. Turnbull
@ 2013-03-26  8:42         ` Eli Zaretskii
  2013-03-26 11:47           ` Stephen J. Turnbull
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26  8:42 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: per.starback, monnier, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: per.starback@gmail.com,
>     monnier@iro.umontreal.ca,
>     emacs-devel@gnu.org
> Date: Tue, 26 Mar 2013 16:45:30 +0900
> 
> Eli Zaretskii writes:
>  > > From: "Stephen J. Turnbull" <stephen@xemacs.org>
> 
>  > > [Unicode] just says "all of these sequences when encountered in
>  > > text purporting to conform to this standard should be treated in
>  > > the same way."  Emacsen should do the same.
>  > 
>  > That would require Emacs to store all the possible EOL sequences in
>  > the buffer, and treat them all identically.  That's doable, but is a
>  > non-trivial job; volunteers are welcome.
> 
> I don't know what you mean by "all the possible EOL sequences".  It's
> well-defined (in Unicode TR#13 or section 5.8 of Unicode 6.2) what an
> NLF is: it's the first of CRLF, LF, CR, or NL (U+0085) that matches
> when parsing a line.

That's what I meant: any of the possible NLF.

>  > > The question then is how to deal with file comparison.  We'd like to
>  > > avoid creating spurious diffs based on "fixing" random different line
>  > > endings
>  > 
>  > If Emacs is to support different EOL formats in the same file, it
>  > should not convert them at all.
> 
> Of course it should convert them.
> 
> Trying to support multiple EOL codings in the buffer is craziness.

But it's the only way to be 100% sure you don't introduce spurious
changes into files.  And since newlines, unlike characters, are not
displayed, there's no issues with fonts etc. here.  So "craziness"
sounds like exaggeration to me, although I do agree that making this
happen is not a trivial job.

> Doing it only for EOLs would be much less painful, but it's not
> worth it.

Please explain why you think it isn't worth it.  Surely, going
again through the pain of inadvertent changes to user files is a movie
we don't want to be part of again.

>  > Anything else _will_ introduce spurious modifications, and could
>  > even corrupt some files, if the exact EOL sequence here or there
>  > matters.
> 
> No, it need not, any more than any ambiguous encoding need do so.  Of
> course it will be fragile if (for example) Emacs crashes and you have
> to recover an autosave file.

It will be fragile, and subtle bugs will tend to break quite a bit.

>  > > I guess one could attach a text property to newlines differing from
>  > > the file's autodetected EOL convention.
>  > 
>  > Not sure how a text property should help here.
> 
> It would mark non-default EOL sequences for correct output.

And when text properties are removed by some operation on a buffer,
what then?  I don't think this is reliable enough to ensure we don't
change user files where the user didn't edit them.



* Re: EOL: unix/dos/mac
  2013-03-26  8:42         ` Eli Zaretskii
@ 2013-03-26 11:47           ` Stephen J. Turnbull
  2013-03-26 13:07             ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26 11:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: per.starback, monnier, emacs-devel

Eli Zaretskii writes:
 > > From: "Stephen J. Turnbull" <stephen@xemacs.org>

 > > Of course it should convert them.
 > > 
 > > Trying to support multiple EOL codings in the buffer is craziness.
 > 
 > But it's the only way to be 100% sure you don't introduce spurious
 > changes into files.  And since newlines, unlike characters, are not
 > displayed, there's no issues with fonts etc. here.

Currently NLFs *are* displayed, if they don't match the default for
the buffer.  Some even appear as glyphs (^M in -unix buffers).  Sure,
there's no issue with fonts.  There are worse things than getting the
wrong font, though.

 > So "craziness" sounds like exaggeration to me, although I do agree
 > that making this happen is not a trivial job.
 > 
 > > Doing it only for EOLs would be much less painful, but it's not
 > > worth it.
 > 
 > Please explain why you think it isn't worth it.

Because you have to fix pretty much everything, and new syntax will be
required for stuff like zap-to-char and nearly required for regexps.
Code will be massively uglified with tests for variable-length
sequences instead of single characters, everything from motion to
insdel will have to be modified.  Any code handling old-style hidden
lines (with CR marking "invisible" lines) will have to be changed.
It's not obvious to me that there are no counterintuitive
implications.  Opposed to that, there are very few text files with
mixed line endings, and in many cases the user would actually like to
have them regularized (at a time of their choosing, so they can have a
commit with only whitespace changes, for example).

 > Surely, going again through the pain of inadvertent changes to user
 > files is a movie we don't want to be part of again.

What pain of inadvertent changes?  Sure, there will likely be bugs in
the first draft of such code, what else is new?  If you're talking
specifically about the \201 regression, that's a completely different
issue AFAICT -- that was about buffer-as-unibyte exposing the
*internal* representation to Lisp, which was a "Mr. Foot, may I
introduce to you Mr. Bullet" kind of idea from Day 1.

 > >  > Anything else _will_ introduce spurious modifications, and could
 > >  > even corrupt some files, if the exact EOL sequence here or there
 > >  > matters.
 > > 
 > > No, it need not, any more than any ambiguous encoding need do so.  Of
 > > course it will be fragile if (for example) Emacs crashes and you have
 > > to recover an autosave file.
 > 
 > It will be fragile, and subtle bugs will tend to break quite a bit.

I don't think so.  It can be implemented as two functions, one run
just after decoding text from an external encoding, and one run just
before encoding text to an external encoding.  Done efficiently it can
probably be applied to saving autosave files as well, removing the
fragility.  For maximum safety the information about non-default NLFs
could be kept in "no-see-'um" properties accessed by separate APIs so
that users and programs don't accidentally delete the information.
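
A rough sketch of the two-function idea, in Python for illustration
only.  The names `decode_eol` and `encode_eol` are hypothetical, and a
plain dictionary keyed by buffer position stands in for the text
properties discussed here; unlike a text property, it does not follow
the text when the buffer is edited.

```python
import re

# CRLF first so it is not split into CR and LF; U+0085 is NEXT LINE.
NLF = re.compile("\r\n|\n|\r|\u0085")


def decode_eol(raw, default="\n"):
    """Convert every NLF to LF and remember the non-default ones.

    Returns (text, marks).  marks maps the index of an LF in `text`
    to the original sequence, for sequences that differ from `default`.
    """
    out = []
    marks = {}
    pos = 0      # read position in raw
    length = 0   # length of the text built so far
    for m in NLF.finditer(raw):
        chunk = raw[pos:m.start()]
        out.append(chunk)
        length += len(chunk)
        if m.group() != default:
            marks[length] = m.group()
        out.append("\n")
        length += 1
        pos = m.end()
    out.append(raw[pos:])
    return "".join(out), marks


def encode_eol(text, marks, default="\n"):
    """Write marked newlines as they were read, all others as `default`."""
    parts = []
    for i, ch in enumerate(text):
        if ch == "\n":
            parts.append(marks.get(i, default))
        else:
            parts.append(ch)
    return "".join(parts)
```

With an unedited buffer, `encode_eol(*decode_eol(raw))` reproduces
`raw` exactly, which is the "read then write is the identity" property
asked for earlier in the thread.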

 > >  > > I guess one could attach a text property to newlines differing from
 > >  > > the file's autodetected EOL convention.
 > >  > 
 > >  > Not sure how a text property should help here.
 > > 
 > > It would mark non-default EOL sequences for correct output.
 > 
 > And when text properties are removed by some operation on a buffer,
 > what then?  I don't think this is reliable enough to ensure we don't
 > change user files where the user didn't edit them.

I think you're hearing monsters in the closet.  Sure, that *could*
happen but code that does so is buggy IMO.  If that's not a good
enough answer, "no-see-'um" properties as described above would do the
trick.  I suspect that operations that change properties are rare
enough that putting a check for a "don't change me" flag into the
normal text property operations would not be an efficiency hit.

* Re: EOL: unix/dos/mac
  2013-03-26 11:47           ` Stephen J. Turnbull
@ 2013-03-26 13:07             ` Eli Zaretskii
  2013-03-26 18:12               ` Stephen J. Turnbull
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26 13:07 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: per.starback, monnier, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: per.starback@gmail.com,
>     monnier@iro.umontreal.ca,
>     emacs-devel@gnu.org
> Date: Tue, 26 Mar 2013 20:47:33 +0900
> 
>  > > Trying to support multiple EOL codings in the buffer is craziness.
>  > 
>  > But it's the only way to be 100% sure you don't introduce spurious
>  > changes into files.  And since newlines, unlike characters, are not
>  > displayed, there's no issues with fonts etc. here.
> 
> Currently NLFs *are* displayed, if they don't match the default for
> the buffer.

No, they are displayed because nothing other than a single LF is
treated like NLF by the Emacs internals.  EOL conversion is a layer on
top of that; the buffer maintenance and the display engine know
absolutely nothing about it.

Once these byte sequences are recognized as NLFs, they will not be
displayed, because that's how the Emacs display works.

>  > > Doing it only for EOLs would be much less painful, but it's not
>  > > worth it.
>  > 
>  > Please explain why you think it isn't worth it.
> 
> Because you have to fix pretty much everything

I'm probably missing something important, because things I think will
need fixing are nowhere near "pretty much everything".  How about
posting a long enough list of things to fix to convince me that
"pretty much everything" is close to the truth?

> new syntax will be required for stuff like zap-to-char

Why?

> and nearly required for regexps.

For $ we will need to get regex.c to support the additional NLFs, and
that's all.  If you mean a literal \n in regexps, then yes, something
will have to be done with that.  But it would be a good thing in its
own right, because Emacs will come closer to supporting Unicode
standard annexes.

> Code will be massively uglified with tests for variable-length
> sequences instead of single characters

The code is already replete with that, ever since Emacs started using
a multi-byte representation for characters in buffers.  We have a set
of macros to fetch and examine multi-byte sequences, for that reason.
I see nothing hard or "ugly" here, sorry.

> everything from motion to insdel will have to be modified

Why?

> Any code handling old-style hidden lines (with CR marking
> "invisible" lines) will have to be changed.

First, we want to deprecate and remove this feature anyway (there's
already an implemented alternative).  And second, we already handle
this today so that we don't display ^M there; the same method can be
used for the other NLFs.

> It's not obvious to me that there are no counterintuitive
> implications.  Opposed to that, there are very few text files with
> mixed line endings, and in many cases the user would actually like to
> have them regularized (at a time of their choosing, so they can have a
> commit with only whitespace changes, for example).

We should be consistent: either there is a problem with mixed line
endings and with Unicode NLFs that aren't treated as EOL at all, or
there isn't.  If the problem is insignificant, perhaps nothing should
be changed at all.  If the problem _is_ significant, we might as well
solve it The Right Way, instead of applying more and more band-aid.
Conversion of NLFs to a single LF is a kludge, same as emptying the
kettle when you already have a procedure for preparing a kettle of
boiled water starting with an empty one.  You cannot do such
conversion efficiently if you need to discover the EOL format for
every line.  Dispensing with the conversion altogether solves both
problems in one go.  What it adds doesn't seem so frightening to me,
certainly less so than, say, adding bidi support ;-)

>  > Surely, going again through the pain of inadvertent changes to user
>  > files is a movie we don't want to be part of again.
> 
> What pain of inadvertent changes?  Sure, there will likely be bugs in
> the first draft of such code, what else is new?  If you're talking
> specifically about the \201 regression, that's a completely different
> issue AFAICT -- that was about buffer-as-unibyte exposing the
> *internal* representation to Lisp, which was a "Mr. Foot, may I
> introduce to you Mr. Bullet" kind of idea from Day 1.

The internal representation is still exposed, so nothing's changed in
that department.

>  > >  > Anything else _will_ introduce spurious modifications, and could
>  > >  > even corrupt some files, if the exact EOL sequence here or there
>  > >  > matters.
>  > > 
>  > > No, it need not, any more than any ambiguous encoding need do so.  Of
>  > > course it will be fragile if (for example) Emacs crashes and you have
>  > > to recover an autosave file.
>  > 
>  > It will be fragile, and subtle bugs will tend to break quite a bit.
> 
> I don't think so.

Well, then we will have to agree to disagree.

> I think you're hearing monsters in the closet.

And I think _you_ are hearing them.  Or maybe you will show me such a
large list of things that will become broken by keeping NLFs that I
will change my mind.

* Re: EOL: unix/dos/mac
  2013-03-26 13:07             ` Eli Zaretskii
@ 2013-03-26 18:12               ` Stephen J. Turnbull
  2013-03-26 18:44                 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26 18:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: per.starback, monnier, emacs-devel

Eli Zaretskii writes:
 > > From: "Stephen J. Turnbull" <stephen@xemacs.org>

 > > Currently NLFs *are* displayed, if they don't match the default for
 > > the buffer.
 > 
 > No, they are displayed because nothing other than a single LF is
 > treated like NLF by the Emacs internals.

Emacs doesn't get to define NLF; it's a Unicode concept.  You'll get
in trouble if you get confused about that.  Those *are* NLFs, and in
the "CR in *-unix buffer" form they *are* displayed as "^M"s, while in
the "bare LF in *-dos buffer" form they *do* appear as stair-stepping
lines.  That does bother some users, including some who understand why
it happens.

 > > Because you have to fix pretty much everything
 > 
 > I'm probably missing something important, because things I think will
 > need fixing are nowhere near "pretty much everything".  How about
 > posting a long enough list of things to fix to convince me that
 > "pretty much everything" is close to the truth?

"Everything" is of course an exaggeration.  At a minimum, you need to
change delete and motion commands to handle the fact that EOL doesn't
have a constant width in characters.  Should users be able to move
*into* a CRLF in -unix buffer?  How about a -dos buffer?  Should
forward-char-command move into or *over* a CRLF?  Does it matter what
the EOL convention is for that buffer?  What are we going to do for
the occasional user who wants the less usual behavior for some reason?
You need to decide what (insert "\015") means in a -dos buffer, and
you can be pretty sure that some users will be confused whichever you
choose.  Ditto (insert "\012") in a -mac buffer.  You may very well
want those to mean something different from the commands that
self-insert either or both of those characters.  Until now,
skip-chars-forward and regexps would find EOL if the string defining
the target contained "\n".  Is that going to continue to be true?  How
do you propose to find a bare LF -- are we going to make users use
octal or hex escapes, or do we define new string syntax?

 > > Code will be massively uglified with tests for variable-length
 > > sequences instead of single characters
 > 
 > The code is already replete with that, ever since Emacs started using
 > a multi-byte representation for characters in buffers.  We have a set
 > of macros to fetch and examine multi-byte sequences, for that reason.
 > I see nothing hard or "ugly" here, sorry.

Ah, but this is a completely different story.  Those are C
macros, and not visible to Lisp programs, which know that a line break
is represented by a single character, U+000A.  That's no longer true
for NLF, which by definition is composed of one or more *characters*,
not code units.  It's *Lisp* code that has to deal with this.

 > > Any code handling old-style hidden lines (with CR marking
 > > "invisible" lines) will have to be changed.
 > 
 > First, we want to deprecate and remove this feature anyway (there's
 > already an implemented alternative).  And second, we already handle
 > this today so that we don't display ^M there; the same method can be
 > used for the other NLFs.

Sorry, that breaks immediately.  That ^M is now an NLF, and you either
treat it that way and not as an invisibility marker, or the meaning of
the buffer changes when you switch that mode on and off in a very
delicate way.  I'm pretty sure it will corrupt the buffer unless you
mark preexisting ^Ms as NLFs or convert them to something else.  Which
is what I'm proposing, of course.

So you can fall back on deprecation.  Has the feature actually been
scheduled for deprecation and eventual removal?  If not, you're
looking at 5-10 years before it gets removed.

 > If the problem _is_ significant, we might as well solve it The
 > Right Way, instead of applying more and more band-aid.  Conversion
 > of NLFs to a single LF is a kludge,

Not to mention a close approximation to the right way to handle them
according to the Unicode standard under many circumstances.  (The
truly correct way to handle them is to substitute LINE SEPARATOR, as I
mentioned earlier.)

 > You cannot do such conversion efficiently if you need to discover
 > the EOL format for every line.

Of course you can.  You don't need to "discover" the EOL format; you
know that an EOL is anything that matches "\r\n\|\r\|\n\|\205" as you
move forward through the buffer.  It's only a tiny bit more expensive
than current conversion for -dos or -mac, and those are hardly
prohibitive, especially when compared to I/O itself.
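
As a rough illustration of the single-pass idea, here is a minimal
Python sketch (an assumption for illustration, not the Emacs decoder).
It uses the same alternation as the regexp above, with "\x85" standing
in for the octal escape "\205"; the function name normalize_eol is
hypothetical.

```python
import re

# Same alternation as "\r\n\|\r\|\n\|\205": CRLF is tried first so that
# it is replaced as one unit rather than as two separate breaks.
NLF = re.compile(r"\r\n|\r|\n|\x85")


def normalize_eol(text: str) -> str:
    """Replace every NLF with a single LF in one left-to-right pass."""
    return NLF.sub("\n", text)


converted = normalize_eol("a\r\nb\rc\nd\x85e")
```

No per-line "discovery" step is needed here: the matcher decides at
each break which spelling it is looking at.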

 > What it adds doesn't seem so frightening to me, certainly less so
 > than, say, adding bidi support ;-)

Agreed, but irrelevant.  bidi is a new feature necessary to support
some languages currently used by millions of people, and the hairiness
is mandated by UAX #9 -- an alternative implementation is not going to
make conformance much easier.  What we're talking about here are
alternative implementations of a much smaller feature, NLF, and which
one is going to be more efficient and more natural for Emacs.

 > The internal representation is still exposed, so nothing's changed in
 > that department.

I know, and taking advantage of that exposure still falls in the class
of "Kids, these stunts are performed by trained professionals.  Don't
try this at home!"  Can you deny that?

 > > I think you're hearing monsters in the closet.
 > 
 > And I think _you_ are hearing them.

Well, yes, I am.  But I've worked with implementations of coding
systems in both XEmacs and Python, and I know that what I'm talking
about will work and be efficient, and buffers and strings will
continue to conform to the Emacs model.  I know that what you're
talking about will break some invariants for character motion and
editing at line end, and that worries me.  Proof?  You're right, I
have none.  By the same token, you don't either.  What worries me is
that while I can prove (or perhaps disprove) my point with a small set
of unit tests and benchmarks, you will have to hand that version of
Emacs to real users for a year or three to find out if anybody really
cares that the model broke.

 > Or maybe you will show me such a large list of things that will
 > become broken by keeping NLFs that I will change my mind.

I can't; I gave you my list already, and I grant that it's not all
that long and several of the potential problems can't be confirmed at
this point.  But if you decide to keep NLFs in the buffer rather than
conforming to the tried and true Emacs/Mule model of converting them
to a one-character representation, I predict you will find plenty of
breakage over the years, just as the \201 bug regressed multiple times
over something like a decade.

It's true that keeping NLFs in the buffer will bring Emacs's internal
representation into closer conformance with the Unicode Standard, but
both the benefits and the costs of that are unclear to me.  Sure, it
makes it conceptually straightforward to support Unicode handling of
NLF in regexps, but you can already do that by simply avoiding EOL
conversion when you need highly accurate Unicode conformance.  On the
other hand, when you are treating NLFs as NLFs, you will be breaking
the 40-year-old Emacs model of a linebreak marked by a single
character.  I don't know what trouble that will cause, but there's no
easy workaround for it that preserves those NLFs.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 18:12               ` Stephen J. Turnbull
@ 2013-03-26 18:44                 ` Eli Zaretskii
  2013-03-27  5:10                   ` Stephen J. Turnbull
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26 18:44 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: per.starback, monnier, emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: per.starback@gmail.com,
>     monnier@iro.umontreal.ca,
>     emacs-devel@gnu.org
> Date: Wed, 27 Mar 2013 03:12:11 +0900
> 
> Eli Zaretskii writes:
>  > > From: "Stephen J. Turnbull" <stephen@xemacs.org>
> 
>  > > Currently NLFs *are* displayed, if they don't match the default for
>  > > the buffer.
>  > 
>  > No, they are displayed because nothing other than a single LF is
>  > treated like NLF by the Emacs internals.
> 
> Emacs doesn't get to define NLF; it's a Unicode concept.

Can we be less pedantic, please, just to have the water less muddy?

OK, let me rephrase: they are displayed because nothing other than a
single LF character is currently treated by Emacs as an end of line.
An end of line is never displayed by Emacs or sent to the screen, not
even on a TTY; it is acted upon by moving the display to the next line
(a.k.a. "new-line function").

> Those *are* NLFs, and in
> the "CR in *-unix buffer" form they *are* displayed as "^M"s, while in
> the "bare LF in *-doc buffer" form they *do* appear as stair-stepping
> lines.

I guess you meant "-dos", not "-doc".  Anyway, there are no
stair-stepping lines in Emacs because of this, because Emacs never
outputs the EOL sequences to the screen.  That is why the -unix or
-dos variants are meaningless in terminal-coding-system.

> "Everything" is of course an exaggeration.  At a minimum, you need to
> change delete and motion commands to handle the fact that EOL doesn't
> have a constant width in characters.  Should users be able to move
> *into* a CRLF in -unix buffer?  How about a -dos buffer?

No and no (and there won't be any -unix and -dos buffers under this
mode of operation).

> Should forward-char-command move into or *over* a CRLF?

No.

> Does it matter what the EOL convention is for that buffer?

No.

> What are we going to do for the occasional user who wants the less
> usual behavior for some reason?

What "less usual behavior"?

> You need to decide what (insert "\015") means in a -dos buffer

No decision required: it will insert a CR, like it does today.  If
that CR happens to precede a newline, it will become invisible when
inserted.

> Until now, skip-chars-forward and regexps would find EOL if the
> string defining the target contained "\n".  Is that going to
> continue to be true?  How do you propose to find a bare LF -- are we
> going to make users use octal or hex escapes, or do we define new
> string syntax?

I see no serious problems with this, sorry to disappoint you.

> Ah, but this is a completely different story.  Those are C
> macros, and not visible to Lisp programs, which know that a line break
> is represented by a single character, U+000A.  That's no longer true
> for NLF, which by definition is composed of one or more *characters*,
> not code units.  It's *Lisp* code that has to deal with this.

Lisp code already needs to deal with similar complications, e.g. when
it moves across invisible text or text covered by a 'display' property
or overlay string.

>  > > Any code handling old-style hidden lines (with CR marking
>  > > "invisible" lines) will have to be changed.
>  > 
>  > First, we want to deprecate and remove this feature anyway (there's
>  > already an implemented alternative).  And second, we already handle
>  > this today so that we don't display ^M there; the same method can be
>  > used for the other NLFs.
> 
> Sorry, that breaks immediately.  That ^M is now an NLF, and you either
> treat it that way and not as an invisibility marker, or the meaning of
> the buffer changes when you switch that mode on and off in a very
> delicate way.

No, it doesn't break, like it doesn't today.  When selective display
is in effect, a buffer-local variable says that, so you can treat ^M
accordingly.

> So you can fall back on deprecation.  Has the feature actually been
> scheduled for deprecation and eventual removal?

Yes, long ago.

>  > What it adds doesn't seem so frightening to me, certainly less so
>  > than, say, adding bidi support ;-)
> 
> Agreed, but irrelevant.  bidi is a new feature necessary to support
> some languages currently used by millions of people, and the hairiness
> is mandated by UAX #9 -- an alternative implementation is not going to
> make conformance much easier.

You are missing my point, which was about implications _on_Emacs_ of
adding bidi support.  UAX#9 cannot (and didn't) help making design
decisions in that regard.

>  > The internal representation is still exposed, so nothing's changed in
>  > that department.
> 
> I know, and taking advantage of that exposure still falls in the class
> of "Kids, these stunts are performed by trained professionals.  Don't
> try this at home!"  Can you deny that?

No.  But I'm saying that given that exposure, the abstraction _will_
leak, and when it does, users will be unhappy again.

> I know that what you're talking about will break some invariants for
> character motion and editing at line end, and that worries me.
> Proof?  You're right, I have none.

You don't need a proof, because I agree.  But we already have quite a
few features that introduce peculiar effects into character motion,
and they didn't cause any catastrophes.  I don't see why this one is
any different.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 18:44                 ` Eli Zaretskii
@ 2013-03-27  5:10                   ` Stephen J. Turnbull
  0 siblings, 0 replies; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-27  5:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: per.starback, monnier, emacs-devel

Eli Zaretskii writes:

 > You don't need a proof, because I agree.  But we already have quite a
 > few features that introduce peculiar effects into character motion,
 > and they didn't cause any catastrophes.  I don't see why this one is
 > any different.

If your standard is "catastrophes", then (a) no, this one is no
different, and (b) I have no contribution to make, because the
contribution I want to make requires concern with problems that are
less than catastrophic.






^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26  1:42   ` Stephen J. Turnbull
  2013-03-26  6:28     ` Eli Zaretskii
@ 2013-03-26 12:51     ` Stefan Monnier
  2013-03-26 13:10       ` Eli Zaretskii
  2013-03-26 16:16       ` Stephen J. Turnbull
  2013-03-26 14:02     ` Alan Mackenzie
  2 siblings, 2 replies; 27+ messages in thread
From: Stefan Monnier @ 2013-03-26 12:51 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Per Starbäck, emacs-devel

> Unicode doesn't care, you know: it considers all ASCII line breaks and
> terminators to be the same thing (NEW LINE FUNCTION).

But when saving the file, which line ends would we use?
For pre-existing line-ends, we could reproduce what was there before,
but what about new lines?


        Stefan



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 12:51     ` Stefan Monnier
@ 2013-03-26 13:10       ` Eli Zaretskii
  2013-03-26 17:16         ` Stefan Monnier
  2013-03-26 16:16       ` Stephen J. Turnbull
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26 13:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: per.starback, stephen, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 26 Mar 2013 08:51:45 -0400
> Cc: Per Starbäck <per.starback@gmail.com>,
> 	emacs-devel@gnu.org
> 
> But when saving the file, which line ends would we use?
> For pre-existing line-ends, we could reproduce what was there before,
> but what about new lines?

User preference and some heuristics, I guess, as always.  E.g., if all
the lines used the same NLF, use that for new lines; otherwise look at
some user option for guidance.
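
A minimal Python sketch of that heuristic, assuming the NLFs are
available verbatim in the text.  The function name eol_for_new_lines
and the user_default parameter are hypothetical stand-ins for
whatever user option would be consulted.

```python
import re
from collections import Counter

# Legacy NLFs; CRLF first so it is counted as one line end.
NLF = re.compile(r"\r\n|\r|\n|\x85")


def eol_for_new_lines(text: str, user_default: str = "\n") -> str:
    """Pick the line end to use for newly inserted lines.

    If every existing line end uses the same NLF, reuse it.  Otherwise
    (mixed styles, or no line ends at all) fall back to the user's choice.
    """
    seen = Counter(NLF.findall(text))
    if len(seen) == 1:
        return next(iter(seen))
    return user_default
```

For example, a file whose lines all end in CRLF would keep getting
CRLF for new lines, while a mixed file would get the user's default.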





^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 13:10       ` Eli Zaretskii
@ 2013-03-26 17:16         ` Stefan Monnier
  2013-03-26 17:47           ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2013-03-26 17:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: per.starback, stephen, emacs-devel

>> But when saving the file, which line ends would we use?
>> For pre-existing line-ends, we could reproduce what was there before,
>> but what about new lines?
> User preference and some heuristics, I guess, as always.  E.g., if all
> the lines used the same NLF, use that for new lines; otherwise look at
> some user option for guidance.

So for files that use a consistent style, that means same behavior as
what we now have.  The only difference is for mixed-style files, and
AFAIK the only mixed-style files that occur often enough to care are of
the LF-vs-CRLF kind, where I think the most important thing is to make
it clear that the extra CRs displayed are due to the presence of this
mixed-style (so maybe we should check which style is more prominent and
either highlight the few extra CRs or on the contrary hide the CRs and
highlight the few missing CRs).

        Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 17:16         ` Stefan Monnier
@ 2013-03-26 17:47           ` Eli Zaretskii
  2013-03-26 18:41             ` Stephen J. Turnbull
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26 17:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: per.starback, stephen, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: per.starback@gmail.com, stephen@xemacs.org, emacs-devel@gnu.org
> Date: Tue, 26 Mar 2013 13:16:00 -0400
> 
> >> But when saving the file, which line ends would we use?
> >> For pre-existing line-ends, we could reproduce what was there before,
> >> but what about new lines?
> > User preference and some heuristics, I guess, as always.  E.g., if all
> > the lines used the same NLF, use that for new lines; otherwise look at
> > some user option for guidance.
> 
> So for files that use a consistent style, that means same behavior as
> what we now have.

The suggestion was to support _all_ Unicode NLFs, which are more than
the 3 EOL formats we support now.  Other than that, yes, for a
consistent style the behavior visible to the user will be the same.

Note that my take on this is that if we extend EOL format to all the
Unicode NLFs, we should not convert them to newline and back on I/O,
but rather keep them verbatim in the buffers and strings (Stephen
disagrees).  If we go that way, there will be another user-visible
change: the character position could jump by more than one when you
move into the next line.
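
To illustrate the position jump, here is a small Python sketch (an
assumption for illustration only, not how Emacs implements motion).
The hypothetical forward_char below steps over a CRLF pair as one
unit, so the position advances by two at such a line end.

```python
def forward_char(text: str, pos: int) -> int:
    """Return the position one 'character' forward from pos.

    A CRLF pair kept verbatim in the text is stepped over as a unit,
    so the caller never lands between the CR and the LF.
    """
    if pos >= len(text):
        return pos  # already at the end of the text
    if text.startswith("\r\n", pos):
        return pos + 2  # jump over the two-character line end
    return pos + 1


text = "ab\r\ncd"
positions = [0]
while positions[-1] < len(text):
    positions.append(forward_char(text, positions[-1]))
```

Walking through "ab\r\ncd" this way visits positions 0, 1, 2, 4, 5, 6:
position 3, between the CR and the LF, is never reached.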

> The only difference is for mixed-style files, and
> AFAIK the only mixed-style files that occur often enough to care are of
> the LF-vs-CRLF kind, where I think the most important thing is to make
> it clear that the extra CRs displayed are due to the presence of this
> mixed-style (so maybe we should check which style is more prominent and
> either highlight the few extra CRs or on the contrary hide the CRs and
> highlight the few missing CRs).

If we want to continue with a clear indication of mixed style, then
perhaps no changes are needed at all, as we do that now.  The only
change in that case might be a mode-line indication of the mixed
style, since the offending CR characters might not be visible in the
displayed portion of the file.

I rather thought the suggestion was to stop paying attention to what
exactly is used as EOL, including if they are mixed-style.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 17:47           ` Eli Zaretskii
@ 2013-03-26 18:41             ` Stephen J. Turnbull
  0 siblings, 0 replies; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26 18:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: per.starback, Stefan Monnier, emacs-devel

Eli Zaretskii writes:

 > If we want to continue with a clear indication of mixed style, then
 > perhaps no changes are needed at all, as we do that now.  The only
 > change in that case might be a mode-line indication of the mixed
 > style, since the offending CR characters might not be visible in the
 > displayed portion of the file.
 > 
 > I rather thought the suggestion was to stop paying attention to what
 > exactly is used as EOL, including if they are mixed-style.

That's what I have in mind.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 12:51     ` Stefan Monnier
  2013-03-26 13:10       ` Eli Zaretskii
@ 2013-03-26 16:16       ` Stephen J. Turnbull
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26 16:16 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Per Starbäck, emacs-devel

Stefan Monnier writes:
 > > Unicode doesn't care, you know: it considers all ASCII line breaks and
 > > terminators to be the same thing (NEW LINE FUNCTION).
 > 
 > But when saving the file, which line ends would we use?
 > For pre-existing line-ends, we could reproduce what was there before,
 > but what about new lines?

Basically, what Eli said.  To remind you how flexible this is: The
file coding system including EOL convention would be determined as it
is currently: a specific argument to write-file or the binding of
buffer-file-coding-system, in that order.  The last would be
determined as currently: user's explicit setting, various settings
based on alists, and finally heuristic autodetection based on file
contents and platform convention for new/empty files.

We'd need an additional control variable: whether to automatically
convert variant NLFs to the EOL convention for writing.  Or perhaps
this should be done on reading.  And a command to do it at the user's
convenience.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26  1:42   ` Stephen J. Turnbull
  2013-03-26  6:28     ` Eli Zaretskii
  2013-03-26 12:51     ` Stefan Monnier
@ 2013-03-26 14:02     ` Alan Mackenzie
  2013-03-26 14:19       ` Eli Zaretskii
  2013-03-26 18:34       ` Stephen J. Turnbull
  2 siblings, 2 replies; 27+ messages in thread
From: Alan Mackenzie @ 2013-03-26 14:02 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Per Starbäck, Stefan Monnier, emacs-devel

Hi, Stephen.

On Tue, Mar 26, 2013 at 10:42:38AM +0900, Stephen J. Turnbull wrote:
> Stefan Monnier writes:

>  > BTW, in this same area, it would be good to detect and indicate
>  > prominently "Unix with some CRLFs", also known as "mixed-line-ending",
>  > which is often misunderstood as "my Emacs fails to recognize my CRLF
>  > file".

> Unicode doesn't care, you know: it considers all ASCII line breaks and
> terminators to be the same thing (NEW LINE FUNCTION).  I haven't read
> that part of the standard in a long time, but IIRC, although many
> people interpolate "according to platform", Unicode doesn't care about
> that, it just says "all of these sequences when encountered in text
> purporting to conform to this standard should be treated in the same
> way."  Emacsen should do the same.

This is a little confusing to poor old me.  ASCII doesn't care about line
breaks either; only particular use cases care.  If you write a script
(whether bash, sed, ....) on a *nix system and it has CRLF line ends, it
will fail (with an obscure error message) regardless of whether that
script is nominally in UTF-8 or ASCII or whatever.

In what sense does Unicode "not care"?

> Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 14:02     ` Alan Mackenzie
@ 2013-03-26 14:19       ` Eli Zaretskii
  2013-03-26 18:34       ` Stephen J. Turnbull
  1 sibling, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2013-03-26 14:19 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: per.starback, stephen, monnier, emacs-devel

> Date: Tue, 26 Mar 2013 14:02:47 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: Per Starbäck <per.starback@gmail.com>,
> 	Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> This is a little confusing to poor old me.  ASCII doesn't care about line
> breaks either; only particular use cases care.  If you write a script
> (whether bash, sed, ....) on a *nix system and it has CRLF line ends, it
> will fail (with an obscure error message) regardless of whether that
> script is nominally in UTF-8 or ASCII or whatever.
> 
> In what sense does Unicode "not care"?

In the sense that the shell script with CR-LF EOLs should not have
failed, if Bash supported Unicode line-breaking features.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26 14:02     ` Alan Mackenzie
  2013-03-26 14:19       ` Eli Zaretskii
@ 2013-03-26 18:34       ` Stephen J. Turnbull
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen J. Turnbull @ 2013-03-26 18:34 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Per Starbäck, Stefan Monnier, emacs-devel

Alan Mackenzie writes:

 > This is a little confusing to poor old me.  ASCII doesn't care about line
 > breaks either; only particular use cases care.

True.  ASCII is a coded character set.  It does not have a way to
represent an abstract line break in a single character; whatever you
do, then, is outside of the ASCII standard.

 > If you write a script (whether bash, sed, ....) on a *nix system
 > and it has CRLF line ends, it will fail (with an obscure error
 > message) regardless of whether that script is nominally in UTF-8 or
 > ASCII or whatever.

Python, at least, is not in your ellipsis.  Not by default, and not on
any supported platform.  I wouldn't be surprised if Perl and Ruby have
adopted "universal newlines", too.
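
A minimal sketch of the Python case, assuming Python 3.2 or later:
source text with CRLF line ends is accepted by the built-in compile().
This only exercises the interpreter's parser; it says nothing about
how the operating system treats a "#!" line that ends in CR.

```python
# Two statements separated by CRLF line ends.
source = "x = 1\r\ny = x + 1\r\n"

namespace = {}
# compile() accepts the CRLF line ends; exec() then runs the code.
exec(compile(source, "<crlf-example>", "exec"), namespace)

result = namespace["y"]
```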

 > In what sense does Unicode "not care"?

In the sense that Unicode is more than a character set; it prescribes
all kinds of algorithms for text processing as well.  Here, section
5.8 of the Unicode Standard v6.2 prescribes that any of LF, CR, CRLF,
and ISO 6429 NEXT LINE (U+0085) should be considered to be a single
line (or paragraph) break in legacy text.  It says nothing about how
they should be represented internally, though.  Unusually for the
Unicode Standard, it allows you to guess what the user wants, and in
some cases even alter the input stream before outputting it.

"Legacy" text means it uses ASCII (or C1) control characters to
represent line and/or paragraph breaks, rather than the characters
prescribed by Unicode (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH
SEPARATOR).
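
For a concrete picture of "treat them all the same way", Python's
str.splitlines() is one existing implementation of this behavior: it
breaks on LF, CR, CRLF, U+0085, U+2028 and U+2029, among a few other
characters.  The sketch below only illustrates the concept and makes
no claim about what Emacs should do internally.

```python
# One string containing six different line-break spellings.
sample = "a\nb\rc\r\nd\x85e\u2028f\u2029g"

# Each break, whatever its spelling, ends exactly one line.
lines = sample.splitlines()

# keepends=True preserves the original break on each line, which is
# what a "keep what was there" saving strategy would need.
lines_with_ends = sample.splitlines(keepends=True)
```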

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-25 19:17 ` Stefan Monnier
  2013-03-26  1:42   ` Stephen J. Turnbull
@ 2013-03-26  7:53   ` Ulrich Mueller
  2013-03-26 12:53     ` Stefan Monnier
  1 sibling, 1 reply; 27+ messages in thread
From: Ulrich Mueller @ 2013-03-26  7:53 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Per Starbäck, emacs-devel

>>>>> On Mon, 25 Mar 2013, Stefan Monnier wrote:

> So I'm OK with "updating" the indicators, tho I'm not sure what we
> should use instead.

Currently, the indicators coincide with the naming of coding systems,
like utf-8-{unix,dos,mac}. Wouldn't it be confusing to use different
notations? Or are the coding systems to be changed too?

> To replace "Mac", maybe we could use "MacOS9", which is longish but
> hopefully such files are rare nowadays.

You could use "OS9". There are both OS-9 (by Microware) and Mac OS 9,
but they agree on using CR as a line ending.

Ulrich

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: EOL: unix/dos/mac
  2013-03-26  7:53   ` Ulrich Mueller
@ 2013-03-26 12:53     ` Stefan Monnier
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Monnier @ 2013-03-26 12:53 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Per Starbäck, emacs-devel

> You could use "OS9". There are both OS-9 (by Microware) and Mac OS 9,
> but they agree on using CR as a line ending.

Ah, I didn't know OS-9 also used CR as line-ending, so indeed "OS9"
sounds like an attractive replacement for "Mac".


        Stefan



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-03-27  5:10 UTC | newest]

Thread overview: 27+ messages
2013-03-25 13:34 EOL: unix/dos/mac Per Starbäck
2013-03-25 13:56 ` Xue Fuqiao
2013-03-25 22:41   ` Richard Stallman
2013-03-26  2:11     ` Stephen J. Turnbull
2013-03-25 14:21 ` Eli Zaretskii
2013-03-25 17:28   ` Dani Moncayo
2013-03-25 19:17 ` Stefan Monnier
2013-03-26  1:42   ` Stephen J. Turnbull
2013-03-26  6:28     ` Eli Zaretskii
2013-03-26  7:45       ` Stephen J. Turnbull
2013-03-26  8:42         ` Eli Zaretskii
2013-03-26 11:47           ` Stephen J. Turnbull
2013-03-26 13:07             ` Eli Zaretskii
2013-03-26 18:12               ` Stephen J. Turnbull
2013-03-26 18:44                 ` Eli Zaretskii
2013-03-27  5:10                   ` Stephen J. Turnbull
2013-03-26 12:51     ` Stefan Monnier
2013-03-26 13:10       ` Eli Zaretskii
2013-03-26 17:16         ` Stefan Monnier
2013-03-26 17:47           ` Eli Zaretskii
2013-03-26 18:41             ` Stephen J. Turnbull
2013-03-26 16:16       ` Stephen J. Turnbull
2013-03-26 14:02     ` Alan Mackenzie
2013-03-26 14:19       ` Eli Zaretskii
2013-03-26 18:34       ` Stephen J. Turnbull
2013-03-26  7:53   ` Ulrich Mueller
2013-03-26 12:53     ` Stefan Monnier
