* mode line eol char indication
From: Drew Adams @ 2008-12-31 22:50 UTC
To: emacs-devel

The Emacs manual, node Mode Line, explains the eol character indication this
way:

    The character after CS is usually a colon. However, under some
    circumstances a different string is displayed, which indicates a
    nontrivial end-of-line convention.

    Usually, lines of text are separated by "newline characters", but two
    other conventions are sometimes used. The MS-DOS convention is to use a
    "carriage-return" character followed by a "linefeed" character; when
    editing such files, the colon changes to either a backslash (`\') or
    `(DOS)', depending on the operating system. The Macintosh end-of-line
    convention is to use a "carriage-return" character instead of a newline;
    when editing such files, the colon indicator changes to either a forward
    slash (`/') or `(Mac)'. On some systems, Emacs displays `(Unix)' instead
    of the colon for files that use newline as the line separator.

That's quite a mouthful.

I wonder now about this convention, which I've lived with for decades without
wondering ;-).

* The non-"nontrivial" eol convention, represented by `:', is presumably what
  is meant by "usually", that is, a newline char. But a newline eol is also
  sometimes represented by `(Unix)'. Why? And why is this called "nontrivial" -
  why is it more nontrivial and more usual than the other possibilities?

* `\' is used sometimes to represent carriage return (C-m) followed by newline
  (C-j), but sometimes `(DOS)' is used to represent the same eol chars.

* `/' is used sometimes to represent C-m, but sometimes `(Mac)' is used to
  represent the same eol char.

Why `:'? Why `\' (is there some relation to the DOS directory separator?)?
Why `/'?

Why so many variations - both `:' and `(Unix)'; both `\' and `(DOS)'; both `/'
and `(Mac)'? None of those labels are particularly helpful, IMO. And there's
no telling when one or the other of the equivalent alternatives will be used,
apparently.

Why not (always) use the Emacs standard representation of the actual eol
chars? IOW:

* \n instead of : and (Unix)
* \r instead of / and (Mac)
* \r\n instead of \ and (DOS)

That's 4 chars max instead of 6 chars max, and it's more explicit.

We might even want to move this end-of-line indication to, well, the end of
the mode line (far right). That would be a little mnemonic: what you see at
the end of the line is what is used at the buffer's line endings.

Unless I'm missing something, the current system is not too systematic and not
too obvious. \n, \r, or \r\n is clear. It even lets you know, for MS
DOS/Windows, that the carriage return comes before the newline, not the
reverse (though you probably don't care).
* Re: mode line eol char indication
From: Jason Rumney @ 2009-01-01 1:20 UTC
To: Drew Adams
Cc: emacs-devel

Drew Adams wrote:
> * The non-"nontrivial" eol convention, represented by `:', is
> presumably what is meant by "usually", that is, a newline char.
> But a newline eol is also sometimes represented by `(Unix)'.
> Why? And why is this called "nontrivial" - why is it more
> nontrivial and more usual than the other possibilities?

In Emacs 20, only the single character indications were used, but people found
them confusing. But the full word indications are too long for many people, so
now we use the single character when the newline format is native for the
platform Emacs is running on, and the full word when it is non-native - this
change occurred in 21.1 IIRC.

Unix line ends are non-trivial because they are what Emacs uses internally -
no conversion is required. They are more usual for users of GNU based
platforms because GNU is based on unix conventions.

> Why `:'? Why `\' (is there some relation to the DOS directory
> separator?)? Why `/'?

Originally : was based on the unix PATH separator, and \ on the DOS directory
separator. / was made the Mac indicator because like the DOS separator, it is
not straight up and down, but it leans a different direction than DOS. I think
at some point during 20.1 pretest, we had / for Unix and : for Mac, until
someone pointed out that : was less noticeable, so that should indicate the
trivial Unix line-end.
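The rule described in the message above (single character when the buffer's
line-ending format is native to the platform, full word otherwise) can be
sketched in a few lines. This is only an illustration of the rule as stated in
the thread, not Emacs source code; the function name `eol_indicator` and the
platform keys "unix", "dos", and "mac" are hypothetical.

```python
# Illustrative sketch of the mode-line rule described in the thread.
# Not taken from Emacs; names and keys are assumptions for the example.

SHORT = {"unix": ":", "dos": "\\", "mac": "/"}
LONG = {"unix": "(Unix)", "dos": "(DOS)", "mac": "(Mac)"}


def eol_indicator(buffer_eol: str, native_eol: str) -> str:
    """Return the short mnemonic if the buffer's EOL format is the
    platform's native one, and the long mnemonic otherwise."""
    if buffer_eol not in SHORT or native_eol not in SHORT:
        raise ValueError("expected one of: unix, dos, mac")
    if buffer_eol == native_eol:
        return SHORT[buffer_eol]
    return LONG[buffer_eol]


if __name__ == "__main__":
    # A CRLF file viewed on a Unix-like system gets the long form,
    # while the same file viewed on DOS/Windows gets the short form.
    print(eol_indicator("dos", "unix"))
    print(eol_indicator("dos", "dos"))
```

Under this rule the same file is labelled differently depending on where Emacs
runs, which is the source of the "two labels per line ending" complaint in the
first message.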
* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 5:44 UTC
To: 'Jason Rumney'
Cc: emacs-devel

> > * The non-"nontrivial" eol convention, represented by `:', is
> > presumably what is meant by "usually", that is, a newline char.
> > But a newline eol is also sometimes represented by `(Unix)'.
> > Why? And why is this called "nontrivial" - why is it more
> > nontrivial and more usual than the other possibilities?

I meant "trivial", sorry. The doc claims that line endings other than newline
are nontrivial.

> In Emacs 20, only the single character indications were used,
> but people found them confusing. But the full word indications
> are too long for many people, so now we use the single
> character when the newline format is native for the platform
> Emacs is running on, and the full word when it is non-native -
> this change occurred in 21.1 IIRC. Unix line ends are
> non-trivial

(I think you too meant "trivial" here, for UNIX.)

> because they are what Emacs uses internally - no conversion
> is required.

What's trivial for the implementation shouldn't be behind characterizing this
line ending to the user as more trivial. Why would a user care which is easier
to implement?

> They are more usual for users of GNU based platforms
> because GNU is based on unix conventions.

Yes, and less usual for users of other platforms. But who cares?

My argument is not that one or the other is more trivial or more usual. It's
that:

* Neither is more trivial (for the user) or more usual (for the user).

* It's unimportant whether one is in fact more trivial or more usual. Such a
  characterization is not explained in the doc anyway, and it just makes the
  doc less understandable.

> > Why `:'? Why `\' (is there some relation to the DOS directory
> > separator?)? Why `/'?
>
> Originally : was based on the unix PATH separator, and \
> on the DOS directory separator. / was made the Mac indicator
> because like the DOS separator, it is not straight up and down,
> but it leans a different direction than DOS. I think at some
> point during 20.1 pretest, we had / for Unix and : for Mac,
> until someone pointed out that : was less noticeable, so that
> should indicate the trivial Unix line-end.

Well, all of that is a historical explanation, and it gives a bit of the
rationale accepted at the time, but it's not very convincing as to why it's
the best choice or why we should have two different representations for each
line ending. Not to mention why the doc should be so convoluted trying to
explain it.

To me:

1. There is no logical connection between the path separator or the directory
   separator that is used for a given platform and the line ending used for
   that platform. That's an artificial connection that is too cute by half.
   We're asking users to guess the line ending based on the platform, and
   guess the platform based on either a path separator (for UNIX) or a
   directory separator (for Mac/DOS). (Guess what this means: `:'? It's the
   UNIX path separator, so this buffer has UNIX line endings.) If we're trying
   to indicate the _line ending characters_, then let's just say what they
   are: C-j, C-m, or C-m C-j.

2. Unix, DOS, and Mac are preferable to :, \, and /. Much clearer.

3. \n, \r\n, and \r would also be preferable to :, \, and /.

4. C-j, C-m C-j, and C-m would also be preferable to :, \, and /.

5. We should pick just one label for each line-ending, not have two
   alternatives for each. Either name the platform or name the line ending,
   consistently, always.

6. If the aim is to indicate the platform, then use Unix, DOS, and Mac. If the
   aim is to indicate the line ending, then use \n, \r\n, and \r.
* RE: mode line eol char indication
From: Stephen J. Turnbull @ 2009-01-01 8:33 UTC
To: Drew Adams
Cc: emacs-devel, 'Jason Rumney'

Drew Adams writes:

> What's trivial for the implementation shouldn't be behind
> characterizing this line ending to the user as more trivial. Why
> would a user care which is easier to implement?

Because the trivial line endings never get screwed up. Nontrivial line endings
cause no end of pain (e.g., inappropriate conversion of line endings causes
100% of the lines of a text file to differ from its previous revision, and
irrecoverable data corruption in binary files (i.e., where CR and LF have
semantics other than "line ending")).

> If we're trying to indicate the _line ending characters_, then let's just say
> what they are: C-j, C-m, or C-m C-j.

Those are commands. Users almost *never* use those as self-inserting
characters. ^J, LF, NL, \n, OK (my preference is LF), but not C-j, please.
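The two failure modes mentioned above (every line of a text file differing
after a line-ending conversion, and binary data that cannot be restored) are
easy to reproduce. The following is a minimal sketch with made-up sample data;
the helpers `to_crlf` and `to_lf` are hypothetical stand-ins for whatever tool
performs the conversion.

```python
# Minimal demonstration of the damage an inappropriate EOL conversion
# can do. Sample data and helper names are invented for illustration.


def to_crlf(data: bytes) -> bytes:
    """Normalize to LF first, then rewrite every LF as CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_lf(data: bytes) -> bytes:
    """Rewrite every CRLF pair as a single LF."""
    return data.replace(b"\r\n", b"\n")


# 1. Text file: after conversion, every line differs from the old revision.
text = b"alpha\nbeta\ngamma\n"
old_lines = text.splitlines(keepends=True)
new_lines = to_crlf(text).splitlines(keepends=True)
changed = sum(1 for old, new in zip(old_lines, new_lines) if old != new)

# 2. Binary data: bytes 0x0D and 0x0A are payload, not line endings.
#    A round trip through CRLF and back does not restore the original,
#    because the CR that was part of the payload has been lost.
blob = b"\x00\r\n\x01\n\x02"
round_trip = to_lf(to_crlf(blob))

if __name__ == "__main__":
    print(changed, "of", len(old_lines), "lines changed")
    print("binary restored:", round_trip == blob)
```

The round trip fails because the conversion cannot tell a CR LF pair that was
already in the data from one it created itself.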
* Re: mode line eol char indication
From: Jason Rumney @ 2009-01-01 8:39 UTC
To: Stephen J. Turnbull
Cc: Drew Adams, emacs-devel

Stephen J. Turnbull wrote:
> Those are commands. Users almost *never* use those as self-inserting
> characters. ^J, LF, NL, \n, OK (my preference is LF), but not C-j,
> please.

In my experience users often don't know the difference between LF and CR. And
they shouldn't have to care; all they need to know is that a text file has
line endings that will work with other software on their system (single
character indication), or if not, what type of system this text file has come
from so they can make an intelligent decision on what to do about it.
* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 18:11 UTC
To: 'Jason Rumney', 'Stephen J. Turnbull'
Cc: emacs-devel

> In my experience users often don't know the difference between LF and
> CR. And they shouldn't have to care; all they need to know is that a
> text file has line endings that will work with other software
> on their system (single character indication), or if not, what type
> of system this text file has come from so they can make an intelligent
> decision on what to do about it.

So you are arguing that it is the system/platform name that is more meaningful
to users, not the eol characters. I'm OK with that. In that case, we should
always use `Unix', `DOS', and `Mac' (or similar) - definitely not `:', `\',
and `/'.

One could argue though that users are sometimes concerned with the line
endings themselves, as, e.g., when they end up seeing extra ^M chars. Sooner
or later, it seems, people end up learning about the different line endings.

There are arguments supporting each: eol chars or platform name. What's
important is to pick meaningful indicators and be systematic - either always
platform or always eol chars. And not to use indicators (`:', `\', `/') that
are not very representative of what they stand for.
* Re: mode line eol char indication
From: Juanma Barranquero @ 2009-01-01 18:17 UTC
To: Drew Adams
Cc: Stephen J. Turnbull, emacs-devel, Jason Rumney

On Thu, Jan 1, 2009 at 19:11, Drew Adams <drew.adams@oracle.com> wrote:

> So you are arguing that it is the system/platform name that is more meaningful
> to users, not the eol characters. I'm OK with that.

I'm not. \n, \r and \r\n (or ^J, etc) are exact: what they say is what the
file contains. "Unix", "DOS" and "Mac" are just hints about the likely origin.
It's not like it is impossible to create CRLF files under GNU/Linux, or LF
files on Windows.

    Juanma
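The "exactness" point above can be illustrated by classifying a file purely
from the bytes it contains, with no reference to the platform it came from.
This is a hedged sketch: the function names `count_eols` and `describe_eol`
and the labels "mixed" and "none" are assumptions made for the example.

```python
# Classify line endings from file content alone.
# Function names and the "mixed"/"none" labels are hypothetical.


def count_eols(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in DATA."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def describe_eol(data: bytes) -> str:
    """Return the single convention in use, "mixed" if several are
    present, or "none" if DATA has no line terminator at all."""
    present = [name for name, n in count_eols(data).items() if n > 0]
    if not present:
        return "none"
    if len(present) > 1:
        return "mixed"
    return present[0]


if __name__ == "__main__":
    # A CRLF file is reported as CRLF no matter which system wrote it.
    print(describe_eol(b"written on GNU/Linux\r\nwith CRLF endings\r\n"))
```

A label derived this way says what the file contains, whereas a platform name
only suggests where the file probably came from.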
* Re: mode line eol char indication
From: David De La Harpe Golden @ 2009-01-01 19:14 UTC
To: Juanma Barranquero
Cc: Stephen J. Turnbull, Jason Rumney, Drew Adams, emacs-devel

Juanma Barranquero wrote:
> On Thu, Jan 1, 2009 at 19:11, Drew Adams <drew.adams@oracle.com> wrote:
>
>> So you are arguing that it is the system/platform name that is more meaningful
>> to users, not the eol characters. I'm OK with that.
>
> I'm not. \n, \r and \r\n (or ^J, etc) are exact: what they say is what
> the file contains. "Unix", "DOS" and "Mac" are just hints about the
> likely origin. It's not like it is impossible to create CRLF files under
> GNU/Linux, or LF files on Windows.

I agree with Juanma on the exactness issue. Emacs is still first and foremost
a text editor, and line endings are a pretty unavoidable aspect of editing
text files. And LF-ending files are not particularly uncommon on windows in my
experience (which mostly involves windows boxes being used as little more than
access terminals for less sucky computers, mind).

I kinda dislike \n since in C \n doesn't mean LF in general since it's defined
to translate to whatever the native newline sequence is in files opened in
text mode IIRC.

CR/LF/CRLF are nicely descriptive, though CRLF is a bit long.

CR/LF/CL are all the same length, and there is no ASCII nonprinting character
called CL AFAIK.

^M/^J/^M^J are similarly descriptive, and have the possible advantage that one
can directly concatenate them to other things and they will still stand out
somewhat, like /\: do, thanks to the caret.

BTW, Unicode specs printable representations for ascii control characters -
see U+240A and U+240D. On unicode text terminals and graphical displays, might
be nice to just use them?

http://unicode.org/charts/PDF/U2400.pdf

␍/␊/␍␊

Possible disadvantage being that not all fonts might include them I guess, and
some might supply peculiar glyphs for them (on my system they're just quite
nice little mini cr and lf signs, shrug).
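The candidate label styles listed in the message above (abbreviations, caret
notation, and the Unicode control pictures) can all be derived mechanically
from the line-ending characters. The sketch below is illustrative only; the
function names are hypothetical, and it relies on two facts: caret notation
flips bit 0x40 of the control code, and the Control Pictures block places the
symbol for control code N at U+2400 + N.

```python
# Derive the label styles discussed in the thread from the EOL string.
# Function names are invented for this example.

ABBREVIATIONS = {"\r": "CR", "\n": "LF"}


def _check_control(ch: str) -> None:
    if ord(ch) >= 0x20:
        raise ValueError("expected an ASCII control character")


def abbreviation(eol: str) -> str:
    """"\\r\\n" -> "CRLF" (only CR and LF are supported here)."""
    return "".join(ABBREVIATIONS[ch] for ch in eol)


def caret(eol: str) -> str:
    """"\\r\\n" -> "^M^J": caret notation flips bit 0x40."""
    for ch in eol:
        _check_control(ch)
    return "".join("^" + chr(ord(ch) ^ 0x40) for ch in eol)


def control_pictures(eol: str) -> str:
    """Map each control code N to the symbol at U+2400 + N."""
    for ch in eol:
        _check_control(ch)
    return "".join(chr(0x2400 + ord(ch)) for ch in eol)


if __name__ == "__main__":
    for eol in ("\n", "\r", "\r\n"):
        print(abbreviation(eol), caret(eol), control_pictures(eol))
```

Whether the control pictures render legibly depends on the font, which is the
disadvantage raised in the message itself.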
* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 19:25 UTC
To: 'David De La Harpe Golden', 'Juanma Barranquero'
Cc: 'Stephen J. Turnbull', 'Jason Rumney', emacs-devel

> I agree with Juanma on the exactness issue. Emacs is still first and
> foremost a text editor, line endings are a pretty unavoidable
> aspect of editing text files. And LF-ending files are not particularly
> uncommon on windows in my experience (which mostly involves windows
> boxes being used as little more than access terminals for less sucky
> computers, mind).
>
> I kinda dislike \n since in C \n doesn't mean LF in general
> since it's defined to translate to whatever the native newline
> sequence is in files opened in text mode IIRC.
>
> CR/LF/CRLF are nicely descriptive, though CRLF is a bit long.

It's 2 chars shorter than `(Unix)'.

> CR/LF/CL are all the same length, and there is no ASCII nonprinting
> character called CL AFAIK.

`CL' is not readily recognizable. `CRLF' is - it uses standard abbreviations.

> ^M/^J/^M^J are similarly descriptive, and have the possible advantage
> that one can directly concatenate them to other things and they will
> still stand out somewhat, like /\: do, thanks to the caret.

I agree. Emacs users will sooner or later come to recognize ^M as carriage
return and ^J as line feed (newline). Might as well use these in the UI. They
are succinct and clear.

My vote is for ^M, ^J, and ^M^J, unless a more convincing argument can be made
for using platform names.

> BTW, Unicode specs printable representations for ascii control
> characters - see U+240A and U+240D. On unicode text terminals and
> graphical displays, might be nice to just use them?
>
> http://unicode.org/charts/PDF/U2400.pdf
>
> ?/?/??
>
> Possible disadvantage being that not all fonts might include them I
> guess, and some might supply peculiar glyphs for them (on my system
> they're just quite nice little mini cr and lf signs, shrug).

The disadvantage (which you mention, and which your text quoted above as
?/?/?? shows) outweighs the advantage of having a cute (admittedly standard)
symbol: just CR is fine; no need for C R
* Re: mode line eol char indication
From: Stefan Monnier @ 2009-01-02 3:44 UTC
To: Drew Adams
Cc: 'Juanma Barranquero', 'Stephen J. Turnbull', 'Jason Rumney',
    emacs-devel, 'David De La Harpe Golden'

> I agree. Emacs users will sooner or later come to recognize ^M as carriage
> return and ^J as line feed (newline). Might as well use these in the UI. They
> are succinct and clear.

By the time they figure out that ^M is "carriage return" and ^J is "newline",
that there's a difference between the two, what that difference is, and how it
relates to the EOL convention used under various systems, they'll be just as
unlikely to be confused by (Mac) or (DOS) or (Unix).

The current behavior is pretty close to optimal, I think. And things like CR,
LF, ^M, ^J are just non-starters AFAIC.

        Stefan
* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 19:15 UTC
To: 'Juanma Barranquero'
Cc: 'Stephen J. Turnbull', emacs-devel, 'Jason Rumney'

> > So you are arguing that it is the system/platform name that
> > is more meaningful to users, not the eol characters. I'm OK with that.
>
> I'm not. \n, \r and \r\n (or ^J, etc) are exact: what they say is what
> the file contains. "Unix", "DOS" and "Mac" are just hints about the
> likely origin. It's not like it is impossible to create CRLF files under
> GNU/Linux, or LF files on Windows.

I think I already said that my preference too is to show the eol chars, and I
agree with your reason. This is about the buffer content, after all, not
necessarily about a platform.

I'm OK however with either approach - whichever most people prefer. But we
should stick to one of them. It makes little sense to have sometimes `(DOS)'
and sometimes `\', which mean the same thing.
* RE: mode line eol char indication
From: Drew Adams @ 2009-01-01 18:11 UTC
To: 'Stephen J. Turnbull'
Cc: emacs-devel, 'Jason Rumney'

> > If we're trying to indicate the _line ending characters_,
> > then let's just say what they are: C-j, C-m, or C-m C-j.
>
> Those are commands. Users almost *never* use those as self-inserting
> characters. ^J, LF, NL, \n, OK (my preference is LF), but not C-j,
> please.

I actually meant ^J and ^M, sorry (must've been tired).

I agree: ^J, LF, and \n are all fine. NL is not very common AFAIK. Emacs uses
^J and \n conventionally, so I'd vote for one of those.