unofficial mirror of emacs-devel@gnu.org 
* mode line eol char indication
From: Drew Adams @ 2008-12-31 22:50 UTC
  To: emacs-devel

The Emacs manual, node Mode Line, explains the eol character
indication this way:

 The character after CS is usually a colon.  However, under some
 circumstances a different string is displayed, which indicates a
 nontrivial end-of-line convention.  Usually, lines of text are
 separated by "newline characters", but two other conventions are
 sometimes used.  The MS-DOS convention is to use a
 "carriage-return" character followed by a "linefeed" character;
 when editing such files, the colon changes to either a backslash
 (`\') or `(DOS)', depending on the operating system.  The
 Macintosh end-of-line convention is to use a "carriage-return"
 character instead of a newline; when editing such files, the
 colon indicator changes to either a forward slash (`/') or
 `(Mac)'.  On some systems, Emacs displays `(Unix)' instead of
 the colon for files that use newline as the line separator.

That's quite a mouthful.  I wonder now about this convention,
which I've lived with for decades without wondering ;-).

* The non-"nontrivial" eol convention, represented by `:', is
  presumably what is meant by "usually", that is, a newline char.
  But a newline eol is also sometimes represented by `(Unix)'.
  Why?  And why is this called "nontrivial" - why is it more
  nontrivial and more usual than the other possibilities?

* `\' is used sometimes to represent carriage return (C-m)
  followed by newline (C-j), but sometimes `(DOS)' is used to
  represent the same eol chars.

* `/' is used sometimes to represent C-m, but sometimes `(Mac)'
  is used to represent the same eol char.

Why `:'?  Why `\' (is there some relation to the DOS directory
separator?)?  Why `/'?

Why so many variations - both `:' and `(Unix)'; both `\' and
`(DOS)'; both `/' and `(Mac)'?

None of those labels are particularly helpful, IMO. And there's
no telling when one or the other of the equivalent alternatives
will be used, apparently.

Why not (always) use the Emacs standard representation of the
actual eol chars?  IOW:

* \n instead of : and (Unix)

* \r instead of / and (Mac)

* \r\n instead of \ and (DOS)

That's 4 chars max instead of 6 chars max, and it's more
explicit.

We might even want to move this end-of-line indication to, well,
the end of the mode line (far right).  That would be a little
mnemonic: what you see at the end of the line is what is used at
the buffer's line endings.

Unless I'm missing something, the current system is not too
systematic and not too obvious.  \n, \r, or \r\n is clear.  It
even lets you know, for MS DOS/Windows, that the carriage return
comes before the newline, not the reverse (though you probably
don't care).
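
To make the conventions in question concrete, here is a minimal Python
sketch that classifies a byte string by the line terminator it contains
and maps the result to an escape-style label of the kind proposed above.
It is only an illustration, not how Emacs detects or displays line
endings; `detect_eol`, `proposed_indicator`, and the label table are
hypothetical names.  Note that DOS files store carriage return followed
by line feed, so the sketch writes that case as \r\n.

```python
def detect_eol(data: bytes) -> str:
    """Return 'crlf', 'cr', 'lf', 'mixed' or 'none' for a byte string."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # carriage returns not followed by LF
    lf = data.count(b"\n") - crlf  # line feeds not preceded by CR
    kinds = [name for name, n in (("crlf", crlf), ("cr", cr), ("lf", lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


# Hypothetical mode-line labels (raw strings, so the backslash is literal).
PROPOSED_LABEL = {"lf": r"\n", "cr": r"\r", "crlf": r"\r\n"}


def proposed_indicator(data: bytes) -> str:
    """Label for the detected convention, or '?' if none or mixed."""
    return PROPOSED_LABEL.get(detect_eol(data), "?")


if __name__ == "__main__":
    for sample in (b"a\nb\n", b"a\rb\r", b"a\r\nb\r\n", b"a\nb\r\n"):
        print(repr(sample), "->", proposed_indicator(sample))
```

A file with inconsistent endings falls into the "mixed" case, which none
of the three labels describes.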





* Re: mode line eol char indication
From: Jason Rumney @ 2009-01-01  1:20 UTC
  To: Drew Adams; +Cc: emacs-devel

Drew Adams wrote:
> * The non-"nontrivial" eol convention, represented by `:', is
>   presumably what is meant by "usually", that is, a newline char.
>   But a newline eol is also sometimes represented by `(Unix)'.
>   Why?  And why is this called "nontrivial" - why is it more
>   nontrivial and more usual than the other possibilities?
>   

In Emacs 20, only the single character indications were used, but people 
found them confusing. But the full word indications are too long for 
many people, so now we use the single character when the newline format 
is native for the platform Emacs is running on, and the full word when 
it is non-native - this change occurred in 21.1 IIRC. Unix line ends are 
non-trivial because they are what Emacs uses internally - no conversion 
is required. They are more usual for users of GNU based platforms 
because GNU is based on unix conventions.

> Why `:'?  Why `\' (is there some relation to the DOS directory
> separator?)?  Why `/'?
>   
Originally : was based on the unix PATH separator, and \ on the DOS
directory separator. / was made the Mac indicator because like the DOS 
separator, it is not straight up and down, but it leans a different 
direction than DOS. I think at some point during 20.1 pretest, we had / 
for Unix and : for Mac, until someone pointed out that : was less 
noticeable, so that should indicate the trivial Unix line-end.
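
The selection rule described above (single character when the file's
convention is native to the platform, full word otherwise) could be
sketched roughly as follows in Python.  This is an illustration of the
rule as stated in this message, not Emacs code; the function name
`eol_indicator` and the "unix"/"dos"/"mac" keys are made up for the
example.

```python
# Indicator strings as quoted from the manual in this thread.
SHORT = {"unix": ":", "dos": "\\", "mac": "/"}
LONG = {"unix": "(Unix)", "dos": "(DOS)", "mac": "(Mac)"}


def eol_indicator(file_eol: str, native_eol: str) -> str:
    """Short form if the file matches the platform's convention, else long."""
    if file_eol == native_eol:
        return SHORT[file_eol]
    return LONG[file_eol]


if __name__ == "__main__":
    for native in ("unix", "dos", "mac"):
        row = [eol_indicator(f, native) for f in ("unix", "dos", "mac")]
        print(f"native={native:5}", row)
```

On a platform whose native convention is "unix", a DOS file shows
`(DOS)` and a Unix file shows `:`; with "dos" as the native convention
the same two files show `\` and `(Unix)`.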





* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01  5:44 UTC
  To: 'Jason Rumney'; +Cc: emacs-devel

> > * The non-"nontrivial" eol convention, represented by `:', is
> >   presumably what is meant by "usually", that is, a newline char.
> >   But a newline eol is also sometimes represented by `(Unix)'.
> >   Why?  And why is this called "nontrivial" - why is it more
> >   nontrivial and more usual than the other possibilities?

I meant "trivial", sorry. The doc claims that line endings other than newline
are nontrivial.

> In Emacs 20, only the single character indications were used, 
> but people found them confusing. But the full word indications
> are too long for many people, so now we use the single
> character when the newline format is native for the platform
> Emacs is running on, and the full word when it is non-native -
> this change occurred in 21.1 IIRC. Unix line ends are
> non-trivial

(I think you too meant "trivial" here, for UNIX.)

> because they are what Emacs uses internally - no conversion 
> is required.

What's trivial for the implementation shouldn't be behind characterizing this
line ending to the user as more trivial. Why would a user care which is easier
to implement?

> They are more usual for users of GNU based platforms 
> because GNU is based on unix conventions.

Yes, and less usual for users of other platforms. But who cares?

My argument is not that one or the other is more trivial or more usual. It's
that:

* Neither is more trivial (for the user) or more usual (for the user).

* It's unimportant whether one is in fact more trivial or more usual. Such a
characterization is not explained in the doc anyway, and it just makes the doc
less understandable.
 
> > Why `:'?  Why `\' (is there some relation to the DOS directory
> > separator?)?  Why `/'?
>
> Originally : was based on the unix PATH separator, and \ 
> on the DOS directory separator. / was made the Mac indicator
> because like the DOS separator, it is not straight up and down,
> but it leans a different direction than DOS. I think at some
> point during 20.1 pretest, we had / for Unix and : for Mac,
> until someone pointed out that : was less noticeable, so that
> should indicate the trivial Unix line-end.

Well, all of that is a historical explanation, and it gives a bit of the
rationale accepted at the time, but it's not very convincing as to why it's the
best choice or why we should have two different representations for each line
ending. Not to mention why the doc should be so convoluted trying to explain it.

To me:

1. There is no logical connection between the path separator or directory
separator that is used for a given platform and the line ending used for that
platform.

That's an artificial connection that is too cute by half. We're asking users to
guess the line ending based on the platform, and guess the platform based on
either a path separator (for UNIX) or a directory separator (for Mac/DOS).
(Guess what this means: `:'? It's the UNIX path separator, so this buffer has
UNIX line endings.)

If we're trying to indicate the _line ending characters_, then let's just say
what they are: C-j, C-m, or C-m C-j.

2. Unix, DOS, and Mac are preferable to :, \, and /. Much clearer.

3. \n, \r\n, and \r would also be preferable to :, \, and /.

4. C-j, C-m C-j, and C-m would also be preferable to :, \, and /.

5. We should pick just one label for each line-ending, not have two alternatives
for each. Either name the platform or name the line ending, consistently,
always.

6. If the aim is to indicate the platform, then use Unix, DOS, and Mac. If the
aim is to indicate the line ending, then use \n, \r\n, and \r.





* RE: mode line eol char indication
From: Stephen J. Turnbull @ 2009-01-01  8:33 UTC
  To: Drew Adams; +Cc: emacs-devel, 'Jason Rumney'

Drew Adams writes:

 > What's trivial for the implementation shouldn't be behind
 > characterizing this line ending to the user as more trivial. Why
 > would a user care which is easier to implement?

Because the trivial line endings never get screwed up.  Nontrivial
line endings cause no end of pain (eg, inappropriate conversion of
line endings causes 100% of the lines of a text file to differ from
its previous revision, and irrecoverable data corruption in binary
files, ie, where CR and LF have semantics other than "line ending").

 > If we're trying to indicate the _line ending characters_, then let's just say
 > what they are: C-j, C-m, or C-m C-j.

Those are commands.  Users almost *never* use those as self-inserting
characters.  ^J, LF, NL, \n, OK (my preference is LF), but not C-j,
please.
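
The two kinds of damage mentioned above can be reproduced in a few lines
of Python.  This is only a sketch with made-up sample data and a
hypothetical helper name (`lines_changed`); it does not model what any
particular editor or version-control tool does.

```python
def lines_changed(old: bytes, new: bytes) -> int:
    """Count positions where corresponding lines differ, terminators included."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    return sum(a != b for a, b in zip(old_lines, new_lines))


# 1. Text file: converting LF to CRLF touches every single line.
text = b"one\ntwo\nthree\n"
converted = text.replace(b"\n", b"\r\n")
print(lines_changed(text, converted), "of", len(text.splitlines()), "lines differ")

# 2. Binary data: bytes 0x0A and 0x0D are not line endings here.
blob = bytes([0x89, 0x0A, 0x00, 0x0D, 0x0A, 0xFF])
as_lf = blob.replace(b"\r\n", b"\n")       # "CRLF -> LF" applied blindly
round_trip = as_lf.replace(b"\n", b"\r\n")  # "LF -> CRLF" to undo it
print("binary survived round trip:", round_trip == blob)
```

The round trip fails because the blob contained both a lone 0x0A and a
0x0D 0x0A pair; after the first conversion the two are indistinguishable.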




* Re: mode line eol char indication
From: Jason Rumney @ 2009-01-01  8:39 UTC
  To: Stephen J. Turnbull; +Cc: Drew Adams, emacs-devel

Stephen J. Turnbull wrote:
> Those are commands.  Users almost *never* use those as self-inserting
> characters.  ^J, LF, NL, \n, OK (my preference is LF), but not C-j,
> please.
>   

In my experience users often don't know the difference between LF and 
CR. And they shouldn't have to care, all they need to know is that a 
text file has line endings that will work with other software on their 
system (single character indication), or if not, what type of system 
this text file has come from so they can make an intelligent decision on 
what to do about it.







* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 18:11 UTC
  To: 'Stephen J. Turnbull'; +Cc: emacs-devel, 'Jason Rumney'

>  > If we're trying to indicate the _line ending characters_, 
>  > then let's just say what they are: C-j, C-m, or C-m C-j.
> 
> Those are commands.  Users almost *never* use those as self-inserting
> characters.  ^J, LF, NL, \n, OK (my preference is LF), but not C-j,
> please.

I actually meant ^J and ^M, sorry (must've been tired).

I agree: ^J, LF, and \n are all fine. NL is not very common AFAIK. Emacs uses ^J
and \n conventionally, so I'd vote for one of those.





* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 18:11 UTC
  To: 'Jason Rumney', 'Stephen J. Turnbull'; +Cc: emacs-devel

> In my experience users often don't know the difference between LF and 
> CR. And they shouldn't have to care, all they need to know is that a 
> text file has line endings that will work with other software 
> on their system (single character indication), or if not, what type
> of system this text file has come from so they can make an intelligent 
> decision on what to do about it.

So you are arguing that it is the system/platform name that is more meaningful
to users, not the eol characters. I'm OK with that.

In that case, we should always use `Unix', `DOS', and `Mac' (or similar) -
definitely not `:', `\', and `/'.

One could argue though that users are sometimes concerned with the line endings
themselves, as, e.g., when they end up seeing extra ^M chars. Sooner or later,
it seems, people end up learning about the different line endings.

There are arguments supporting each: eol chars or platform name. What's
important is to pick meaningful indicators and be systematic - either always
platform or always eol chars. And not to use indicators (`:', `\', `/') that are
not very representative of what they stand for.





* Re: mode line eol char indication
From: Juanma Barranquero @ 2009-01-01 18:17 UTC
  To: Drew Adams; +Cc: Stephen J. Turnbull, emacs-devel, Jason Rumney

On Thu, Jan 1, 2009 at 19:11, Drew Adams <drew.adams@oracle.com> wrote:

> So you are arguing that it is the system/platform name that is more meaningful
> to users, not the eol characters. I'm OK with that.

I'm not. \n, \r and \r\n (or ^J, etc) are exact: what they say is what
the file contains. "Unix", "DOS" and "Mac" are just hints about the
likely origin. It's not like it is impossible to create CRLF files under
GNU/Linux, or LF files on Windows.

    Juanma




* Re: mode line eol char indication
From: David De La Harpe Golden @ 2009-01-01 19:14 UTC
  To: Juanma Barranquero
  Cc: Stephen J. Turnbull, Jason Rumney, Drew Adams, emacs-devel

Juanma Barranquero wrote:
> On Thu, Jan 1, 2009 at 19:11, Drew Adams <drew.adams@oracle.com> wrote:
> 
>> So you are arguing that it is the system/platform name that is more meaningful
>> to users, not the eol characters. I'm OK with that.
> 
> I'm not. \n, \r and \r\n (or ^J, etc) are exact: what they say is what
> the file contains. "Unix", "DOS" and "Mac" are just hints about the
> likely origin. It's not like it is impossible to create CRLF files under
> GNU/Linux, or LF files on Windows.
> 
>

I agree with Juanma on the exactness issue.  Emacs is still first and 
foremost a text editor, line endings are a pretty unavoidable aspect of 
editing text files. And LF-ending files are not particularly
uncommon on windows in my experience (which mostly involves windows
boxes being used as little more than access terminals for less sucky 
computers, mind).

I kinda dislike \n since in C \n doesn't mean LF in general since it's 
defined to translate to whatever the native newline sequence is in files 
opened in text mode IIRC.
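
Python's text layer shows a comparable effect and can serve as a rough
analogy (it is not C, and the details differ): the same "\n" in the
program ends up as different bytes depending on the stream's newline
setting.  The sketch uses `io.TextIOWrapper` with an explicit `newline`
argument so the result does not depend on the platform it runs on;
`written_bytes` is a made-up helper name.

```python
import io


def written_bytes(text: str, newline: str) -> bytes:
    """Write text through a text-mode wrapper and return the raw bytes."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave the underlying buffer open
    return data


if __name__ == "__main__":
    line = "hello\n"
    print(written_bytes(line, "\n"))    # "\n" written unchanged
    print(written_bytes(line, "\r\n"))  # "\n" translated on output
    print(written_bytes(line, "\r"))    # "\n" translated on output
```

So "\n" as written in source code names an abstract newline rather than
a fixed byte, which is the objection raised above.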

CR/LF/CRLF are nicely descriptive, though CRLF is a bit long.
CR/LF/CL are all the same length, and there is no ASCII nonprinting 
character called CL AFAIK.

^M/^J/^M^J are similarly descriptive, and have the possible advantage 
that one can directly concatenate them to other things and they will 
still stand out somewhat, like /\: do, thanks to the caret.

BTW,  Unicode specs printable representations for ascii control 
characters - see U+240A and U+240D.  On unicode text terminals and 
graphical displays, might be nice to just use them?

http://unicode.org/charts/PDF/U2400.pdf

␍/␊/␍␊

Possible disadvantage being that not all fonts might include them I 
guess, and some might supply peculiar glyphs for them (on my system
they're just quite nice little mini cr and lf signs, shrug).
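
For comparison, the caret notation and the Unicode control pictures can
both be derived mechanically from the control character's code.  A
minimal Python sketch, with hypothetical helper names (`caret`,
`control_picture`, `render`); whether the pictures display properly
still depends on the font, as noted above.

```python
def caret(ch: str) -> str:
    """Caret notation for a C0 control character, e.g. CR -> ^M."""
    if ord(ch) > 0x1F:
        raise ValueError("not a C0 control character")
    return "^" + chr(ord(ch) ^ 0x40)


def control_picture(ch: str) -> str:
    """Unicode Control Pictures symbol, e.g. LF -> U+240A."""
    if ord(ch) > 0x1F:
        raise ValueError("not a C0 control character")
    return chr(0x2400 + ord(ch))


def render(eol: str, style) -> str:
    """Render an end-of-line sequence with the given notation function."""
    return "".join(style(c) for c in eol)


if __name__ == "__main__":
    for eol in ("\r", "\n", "\r\n"):
        print(render(eol, caret), render(eol, control_picture))
```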
















* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 19:15 UTC
  To: 'Juanma Barranquero'
  Cc: 'Stephen J. Turnbull', emacs-devel,
	'Jason Rumney'

> > So you are arguing that it is the system/platform name that 
> > is more meaningful to users, not the eol characters. I'm OK with that.
> 
> I'm not. \n, \r and \r\n (or ^J, etc) are exact: what they say is what
> the file contains. "Unix", "DOS" and "Mac" are just hints about the
> likely origin. It's not like it is impossible to create CRLF files under
> GNU/Linux, or LF files on Windows.

I think I already said that my preference too is to show the eol chars, and I
agree with your reason. This is about the buffer content, after all, not
necessarily about a platform.

I'm OK however with either approach - whichever most people prefer. But we
should stick to one of them. It makes little sense to have sometimes `(DOS)' and
sometimes `\', which mean the same thing.





* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 19:25 UTC
  To: 'David De La Harpe Golden', 'Juanma Barranquero'
  Cc: 'Stephen J. Turnbull', 'Jason Rumney',
	emacs-devel

> I agree with Juanma on the exactness issue.  Emacs is still first and 
> foremost a text editor, line endings are a pretty unavoidable 
> aspect of editing text files. And LF-ending files are not particularly
> uncommon on windows in my experience (which mostly involves windows
> boxes being used as little more than access terminals for less sucky 
> computers, mind).
> 
> I kinda dislike \n since in C \n doesn't mean LF in general 
> since it's defined to translate to whatever the native newline
> sequence is in files opened in text mode IIRC.
> 
> CR/LF/CRLF are nicely descriptive, though CRLF is a bit long.

It's 2 chars shorter than `(Unix)'.

> CR/LF/CL are all the same length, and there is no ASCII nonprinting 
> character called CL AFAIK.

`CL' is not readily recognizable. `CRLF' is - it uses standard abbreviations.

> ^M/^J/^M^J are similarly descriptive, and have the possible advantage 
> that one can directly concatenate them to other things and they will 
> still stand out somewhat, like /\: do, thanks to the caret.

I agree. Emacs users will sooner or later come to recognize ^M as carriage
return and ^J as line feed (newline). Might as well use these in the UI. They
are succinct and clear.

My vote is for ^M, ^J, and ^M^J, unless a more convincing argument can be made
for using platform names.

> BTW,  Unicode specs printable representations for ascii control 
> characters - see U+240A and U+240D.  On unicode text terminals and 
> graphical displays, might be nice to just use them?
> 
> http://unicode.org/charts/PDF/U2400.pdf
> 
> ?/?/??
> 
> Possible disadvantage being that not all fonts might include them I 
> guess, and some might supply peculiar glyphs for them (on my system
> they're just quite nice little mini cr and lf signs, shrug).

The disadvantage (which you mention, and which your text quoted above as ?/?/??
shows) outweighs the advantage of having a cute (admittedly standard) symbol:
just CR is fine; no need for 

C
 R






* Re: mode line eol char indication
From: Stefan Monnier @ 2009-01-02  3:44 UTC
  To: Drew Adams
  Cc: 'Juanma Barranquero', 'Stephen J. Turnbull',
	'Jason Rumney', emacs-devel,
	'David De La Harpe Golden'

> I agree. Emacs users will sooner or later come to recognize ^M as carriage
> return and ^J as line feed (newline). Might as well use these in the UI. They
> are succinct and clear.

By the time they figure out that ^M is "carriage return" and ^J is
"newline" and that there's a difference between the two, and what is
that difference, and how it relates to the EOL-convention used under
various systems, they'll be just as unlikely to be confused by (Mac)
or (DOS) or (Unix).

The current behavior is pretty close to optimal, I think.  And things
like CR, LF, ^M, ^J are just non-starters AFAIC.


        Stefan



