* ASCII-only startup message?
@ 2015-12-26 17:25 Paul Eggert
2015-12-26 18:16 ` Eli Zaretskii
0 siblings, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2015-12-26 17:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Emacs Development
[-- Attachment #1: Type: text/plain, Size: 237 bytes --]
What bug did the attached patch fix? Emacs is supposed to display single quotes
properly even in ASCII-only locales, and if there's a problem with that display
it'd be better to fix the underlying problem than paper over the symptoms.
[-- Attachment #2: 0001-Don-t-produce-non-ASCII-characters-in-scratch.patch --]
[-- Type: text/x-diff, Size: 1111 bytes --]
From 1490096652f405f6e2847a8b5e6842f80e2ec9f1 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii <eliz@gnu.org>
Date: Sat, 26 Dec 2015 18:58:04 +0200
Subject: [PATCH] Don't produce non-ASCII characters in *scratch*
* lisp/startup.el (initial-scratch-message): Quote apostrophes to
avoid producing non-ASCII characters in the *scratch* buffer's
commentary.
---
lisp/startup.el | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lisp/startup.el b/lisp/startup.el
index a31d355..0e36d35 100644
--- a/lisp/startup.el
+++ b/lisp/startup.el
@@ -1430,9 +1430,9 @@ x-apply-session-resources
(put 'cursor 'face-modified t))))
(defcustom initial-scratch-message (purecopy "\
-;; This buffer is for notes you don't want to save, and for Lisp evaluation.
+;; This buffer is for notes you don\\='t want to save, and for Lisp evaluation.
;; If you want to create a file, visit that file with \\[find-file],
-;; then enter the text in that file's own buffer.
+;; then enter the text in that file\\='s own buffer.
")
"Initial documentation displayed in *scratch* buffer at startup.
--
2.5.0
^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 17:25 ASCII-only startup message? Paul Eggert
@ 2015-12-26 18:16 ` Eli Zaretskii
2015-12-26 18:41 ` Random832
2015-12-26 18:45 ` Paul Eggert
0 siblings, 2 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-26 18:16 UTC (permalink / raw)
To: Paul Eggert; +Cc: Emacs-devel
> Cc: Emacs Development <Emacs-devel@gnu.org>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 26 Dec 2015 09:25:32 -0800
>
> What bug did the attached patch fix?
That text is a comment we insert into *scratch*; it's not a doc
string. It is run through substitute-command-keys because we want to
show the key bindings for visiting files, that's all. We shouldn't
produce non-ASCII characters in comments, it's a user prerogative.
The adverse effect of inserting non-ASCII characters shows when your
locale's codeset is not UTF-8: if you want to save that buffer or any
part of it that includes this comment, you get annoyed by the request
to specify a suitable encoding (because *scratch* correctly starts
with the locale's default encoding).
> Emacs is supposed to display single quotes properly even in
> ASCII-only locales, and if there's a problem with that display it'd
> be better to fix the underlying problem than paper over the
> symptoms.
No, the display is OK, that's not the problem.
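
To make the saving problem concrete, here is a minimal sketch in Python rather than Emacs Lisp. The sample strings and the choice of ISO-8859-1 as the "non-UTF-8 locale codeset" are assumptions made for illustration; the sketch only checks whether each character can be represented in a given encoding, which is the condition that leads Emacs to ask for a different coding system.

```python
# Illustration only: the sample text and the codecs are assumptions,
# not taken from Emacs itself.
CURVED = ";; This buffer is for notes you don\u2019t want to save."
PLAIN = ";; This buffer is for notes you don't want to save."


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of TEXT is representable in CODEC."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("ascii", "iso-8859-1", "utf-8"):
    # Columns: codec, plain apostrophe encodable?, curved quote encodable?
    print(codec, can_encode(PLAIN, codec), can_encode(CURVED, codec))
```

The ASCII apostrophe survives in all three encodings, while U+2019 is representable only in UTF-8 among these three.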
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 18:16 ` Eli Zaretskii
@ 2015-12-26 18:41 ` Random832
2015-12-26 18:50 ` Paul Eggert
2015-12-26 19:01 ` Eli Zaretskii
2015-12-26 18:45 ` Paul Eggert
1 sibling, 2 replies; 62+ messages in thread
From: Random832 @ 2015-12-26 18:41 UTC (permalink / raw)
To: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> The adverse effect of inserting non-ASCII characters shows when your
> locale's codeset is not UTF-8: if you want to save that buffer or any
> part of it that includes this comment, you get annoyed by the request
> to specify a suitable encoding (because *scratch* correctly starts
> with the locale's default encoding).
Under what circumstances does it insert non-ASCII characters in the
first place? Is this a new feature in Emacs 25? As far as I can tell it
only has ASCII apostrophes.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 18:16 ` Eli Zaretskii
2015-12-26 18:41 ` Random832
@ 2015-12-26 18:45 ` Paul Eggert
2015-12-26 19:10 ` Eli Zaretskii
1 sibling, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2015-12-26 18:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Emacs-devel
Eli Zaretskii wrote:
> That text is a comment we insert into *scratch*; it's not a doc
> string.
Sure, but the same considerations apply to it that would apply to any text in
commentary and documentation. The text should use good English style.
> We shouldn't
> produce non-ASCII characters in comments, it's a user prerogative.
This is Emacs's comment, one that Emacs creates and inserts, so user prerogative
does not apply here.
> The adverse effect of inserting non-ASCII characters shows when your
> locale's codeset is not UTF-8: if you want to save that buffer or any
> part of it that includes this comment, you get annoyed by the request
> to specify a suitable encoding (because *scratch* correctly starts
> with the locale's default encoding).
This sort of thing is far more likely to happen with *Help* and *Info* buffers
than with the *scratch* buffer. If it is a problem, it needs to be fixed in
general, so that any far-more-likely scenario is fixed. Changing the startup
string is likely to paper over any real problem, and so is counterproductive.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 18:41 ` Random832
@ 2015-12-26 18:50 ` Paul Eggert
2015-12-26 19:11 ` Eli Zaretskii
2015-12-26 19:01 ` Eli Zaretskii
1 sibling, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2015-12-26 18:50 UTC (permalink / raw)
To: Random832, emacs-devel
Random832 wrote:
> Under what circumstances does it insert non-ASCII characters in the
> first place?
If your locale can display single quotes, it uses them (or at least, it did
before Eli's recent change).
> Is this a new feature in Emacs 25?
Yes, and it's in the documentation and is noted in NEWS under
text-quoting-style, substitute-command-keys, format-message, "Documentation
strings", etc. We have had several lengthy discussions about it.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 18:41 ` Random832
2015-12-26 18:50 ` Paul Eggert
@ 2015-12-26 19:01 ` Eli Zaretskii
1 sibling, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-26 19:01 UTC (permalink / raw)
To: Random832; +Cc: emacs-devel
> From: Random832 <random832@fastmail.com>
> Date: Sat, 26 Dec 2015 13:41:12 -0500
>
> Eli Zaretskii <eliz@gnu.org> writes:
> > The adverse effect of inserting non-ASCII characters shows when your
> > locale's codeset is not UTF-8: if you want to save that buffer or any
> > part of it that includes this comment, you get annoyed by the request
> > to specify a suitable encoding (because *scratch* correctly starts
> > with the locale's default encoding).
>
> Under what circumstances does it insert non-ASCII characters in the
> first place? Is this a new feature in Emacs 25?
It is a new feature of Emacs 25 that substitute-command-keys by
default converts several quote characters, including the apostrophe,
into the corresponding Unicode "curved quote" characters.
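
A toy model of that conversion, written in Python purely for illustration, may help show why the `\\='` escape in the patch keeps the text ASCII. This is not Emacs's implementation: the function name `curve_quotes` is hypothetical, it ignores `\\[command]` key substitution, and it assumes the curved quoting style is in effect.

```python
def curve_quotes(text: str) -> str:
    """Toy model of quote conversion (hypothetical, not Emacs code).

    A backslash-equals prefix keeps the next character literal;
    a bare grave accent or apostrophe becomes a curved quote.
    """
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\=", i) and i + 2 < len(text):
            out.append(text[i + 2])  # escaped character is copied as-is
            i += 3
        elif text[i] == "`":
            out.append("\u2018")  # LEFT SINGLE QUOTATION MARK
            i += 1
        elif text[i] == "'":
            out.append("\u2019")  # RIGHT SINGLE QUOTATION MARK
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# The Python literal "don\\='t" holds a single backslash, like the Lisp string.
print(curve_quotes("notes you don't want to save").isascii())
print(curve_quotes("notes you don\\='t want to save").isascii())
```

Under these assumptions the unescaped string gains a non-ASCII character and the escaped one does not.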
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 18:45 ` Paul Eggert
@ 2015-12-26 19:10 ` Eli Zaretskii
2015-12-26 19:40 ` Paul Eggert
0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-26 19:10 UTC (permalink / raw)
To: Paul Eggert; +Cc: Emacs-devel
> Cc: Emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 26 Dec 2015 10:45:23 -0800
>
> Eli Zaretskii wrote:
> > That text is a comment we insert into *scratch*; it's not a doc
> > string.
>
> Sure, but the same considerations apply to it that would apply to any text in
> commentary and documentation. The text should use good English style.
We have agreed to make these changes in documentation and in
messages. I don't think we have agreed to do this in any other text
that just happens to be run through substitute-command-keys for
whatever reasons.
> > We shouldn't
> > produce non-ASCII characters in comments, it's a user prerogative.
>
> This is Emacs's comment, one that Emacs creates and inserts, so user prerogative
> does not apply here.
Exactly! So Emacs should do that.
> > The adverse effect of inserting non-ASCII characters shows when your
> > locale's codeset is not UTF-8: if you want to save that buffer or any
> > part of it that includes this comment, you get annoyed by the request
> > to specify a suitable encoding (because *scratch* correctly starts
> > with the locale's default encoding).
>
> This sort of thing is far more likely to happen with *Help* and *Info* buffers
> than with the *scratch* buffer. If it is a problem, it needs to be fixed in
> general, so that any far-more-likely scenario is fixed. Changing the startup
> string is likely to paper over any real problem, and so is counterproductive.
I don't want to restart old longish threads. What we do in *Help*
buffers was already decided, and I don't want to open that decision.
The *scratch* buffer is different. We explicitly say that it is for
notes. Users might want to save some of those notes. The *scratch*
buffer is correctly created in the user locale's encoding, so we
should not put there characters that cannot be encoded in that
encoding. Unlike with display, inserting locale-dependent text in
this case sounds like not a good idea (think uniformity). Changing
the encoding of that buffer doesn't sound correct, either: the user
should be able to type there anything in their locale's language, and
be able to save that in the locale's encoding. So I simply quoted
these apostrophes. I really don't see what's the big deal, we never
agreed to have these curved quotes in *scratch*, not even discussed
that, AFAIR. It's an unintended side effect, as far as I'm concerned.
So I fixed it. The change doesn't modify the startup string, on the
contrary: it returns it to what it has been for eons.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 18:50 ` Paul Eggert
@ 2015-12-26 19:11 ` Eli Zaretskii
0 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-26 19:11 UTC (permalink / raw)
To: Paul Eggert; +Cc: random832, emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 26 Dec 2015 10:50:58 -0800
>
> > Is this a new feature in Emacs 25?
>
> Yes, and it's in the documentation and is noted in NEWS under
> text-quoting-style, substitute-command-keys, format-message, "Documentation
> strings", etc. We have had several lengthy discussions about it.
Indeed. Interested individuals are invited to read those discussions.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 19:10 ` Eli Zaretskii
@ 2015-12-26 19:40 ` Paul Eggert
2015-12-26 20:50 ` Eli Zaretskii
0 siblings, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2015-12-26 19:40 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Emacs-devel
Eli Zaretskii wrote:
> I really don't see what's the big deal
Yes, of course this is a very minor point.
> we never agreed to have these curved quotes in *scratch*, not even discussed
> that, AFAIR.
No, we discussed it on September 24 after I made the change on September 2. You
asked whether the change was intended; I replied that it was. Please see:
http://lists.gnu.org/archive/html/emacs-diffs/2015-09/msg00031.html
http://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00967.html
http://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00968.html
As far as I know there was no further comment -- no reports of problems with the
startup buffer, for example -- so I would favor leaving the message as it was
before today, at least during the feature freeze.
Alternatively, I suppose we could rephrase the message to avoid apostrophes
entirely (this would prevent those *ugly* ASCII apostrophes from *ruining* the
otherwise-*beautiful* startup screen :-). However, I mildly prefer starting with
a couple of non-ASCII characters, as a useful test of Emacs's capabilities on
startup.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 19:40 ` Paul Eggert
@ 2015-12-26 20:50 ` Eli Zaretskii
2015-12-26 23:28 ` Paul Eggert
0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-26 20:50 UTC (permalink / raw)
To: Paul Eggert; +Cc: Emacs-devel
> Cc: Emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 26 Dec 2015 11:40:55 -0800
>
> > we never agreed to have these curved quotes in *scratch*, not even discussed
> > that, AFAIR.
>
> No, we discussed it on September 24 after I made the change on September 2. You
> asked whether the change was intended; I replied that it was. Please see:
>
> http://lists.gnu.org/archive/html/emacs-diffs/2015-09/msg00031.html
> http://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00967.html
> http://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00968.html
OK, so I changed my mind now. These characters shouldn't be there in
Lisp comments. Not in *scratch*.
> Alternatively, I suppose we could rephrase the message to avoid apostrophes
> entirely (this would prevent those *ugly* ASCII apostrophes from *ruining* the
> otherwise-*beautiful* startup screen :-). However, I mildly prefer starting with
> a couple of non-ASCII characters, as a useful test of Emacs's capabilities on
> startup.
The ugliness argument was about the grave accent, `, not about the
apostrophe. There's nothing ugly about it.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 20:50 ` Eli Zaretskii
@ 2015-12-26 23:28 ` Paul Eggert
2015-12-27 0:17 ` Drew Adams
2015-12-27 3:44 ` Eli Zaretskii
0 siblings, 2 replies; 62+ messages in thread
From: Paul Eggert @ 2015-12-26 23:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 613 bytes --]
Eli Zaretskii wrote:
> The ugliness argument was about the grave accent, `, not about the
> apostrophe.
Although the ugliness argument is primarily about the grave accent, it is also
about the apostrophe. With most fonts “There’s” looks nicer than “There's”, and
it’s better typography in English. Either character will do, but there seems
little point uglifying the former into the latter.
To try to avoid spending more of our time about whether to use straight or
curved apostrophes, I reworded the commentary to omit the apostrophes. I
tightened it up a bit while I was at it.
[-- Attachment #2: 0001-Reword-initial-scratch-for-brevity-appearance.patch --]
[-- Type: text/x-diff, Size: 1193 bytes --]
From c82f117c22d1f88f35166cdb7d58c2ca02b1127f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 26 Dec 2015 15:22:28 -0800
Subject: [PATCH] Reword initial *scratch* for brevity, appearance
* lisp/startup.el (initial-scratch-message):
Reword to avoid apostrophes, and to make it shorter.
See the thread starting in:
http://lists.gnu.org/archive/html/emacs-devel/2015-12/msg01241.html
---
lisp/startup.el | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/lisp/startup.el b/lisp/startup.el
index 0e36d35..20f25a8 100644
--- a/lisp/startup.el
+++ b/lisp/startup.el
@@ -1430,9 +1430,8 @@ x-apply-session-resources
(put 'cursor 'face-modified t))))
(defcustom initial-scratch-message (purecopy "\
-;; This buffer is for notes you don\\='t want to save, and for Lisp evaluation.
-;; If you want to create a file, visit that file with \\[find-file],
-;; then enter the text in that file\\='s own buffer.
+;; This buffer is for text that is not saved, and for Lisp evaluation.
+;; To create a file, visit it with \\[find-file] and enter text in its buffer.
")
"Initial documentation displayed in *scratch* buffer at startup.
--
2.5.0
^ permalink raw reply related [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-26 23:28 ` Paul Eggert
@ 2015-12-27 0:17 ` Drew Adams
2015-12-27 1:03 ` Clément Pit--Claudel
` (2 more replies)
2015-12-27 3:44 ` Eli Zaretskii
1 sibling, 3 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-27 0:17 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: Emacs-devel
> > The ugliness argument was about the grave accent, `, not about
> > the apostrophe.
>
> Although the ugliness argument is primarily about the grave
> accent, it is also about the apostrophe. With most fonts
> “There’s” looks nicer than “There's”, and it’s better typography
> in English.
This is completely wrong. Do you have a reference to back up
such a claim?
I have never seen any doc or typography guideline that favors
a quotation mark over an apostrophe for English contractions,
possessives, or non-word plurals. Quite the contrary. These
use cases are precisely the raison d'être for the apostrophe.
Start here: http://english.stackexchange.com/a/36048/51214
(And please consider fasting from kool-aid ASAP, would be my
recommendation.)
> Either character will do, but there seems little point uglifying
> the former into the latter.
What was the point in uglifying the latter (apostrophe) into the
former (right single quotation mark)?
> To try to avoid spending more of our time about whether to use
> straight or curved apostrophes, I reworded the commentary to omit
> the apostrophes. I tightened it up a bit while I was at it.
Hallelujah! We are saved! ASCII saves, and we can save as ASCII!
(But it is not about straight vs curved apostrophes. Any ol'
apostrophe will do. It's about apostrophes vs single quotation
marks.)
Anyway, it's the _wrong thing_. Or if it helps you understand
better: it’s the _wrong thing_.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 0:17 ` Drew Adams
@ 2015-12-27 1:03 ` Clément Pit--Claudel
2015-12-27 2:51 ` Drew Adams
2015-12-27 1:09 ` Paul Eggert
2015-12-27 6:58 ` Random832
2 siblings, 1 reply; 62+ messages in thread
From: Clément Pit--Claudel @ 2015-12-27 1:03 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3097 bytes --]
Hi Drew,
On 12/27/2015 01:17 AM, Drew Adams wrote:
> I have never seen any doc or typography guideline that favors
> a quotation mark over an apostrophe for English contractions,
> possessives, or non-word plurals. Quite the contrary. These
> use cases are precisely the raison d'être for the apostrophe.
I don't know much about this topic, so this may not be the type of documents you're looking for. Are you aware of the following passage, on page 274 of the latest Unicode standard (page 19 of http://www.unicode.org/versions/Unicode8.0.0/ch06.pdf)? I believe that David Kastrup already quoted it.
> Apostrophes
>
> U+0027 apostrophe is the most commonly used character for apostrophe. For
> historical reasons, U+0027 is a particularly overloaded character. In ASCII, it
> is used to represent a punctuation mark (such as right single quotation mark,
> left single quotation mark, apostrophe punctuation, vertical line, or prime) or
> a modifier letter (such as apostrophe modifier or acute accent). Punctuation
> marks generally break words; modifier letters generally are considered part of a
> word. When text is set, U+2019 right single quotation mark is preferred as
> apostrophe, but only U+0027 is present on most keyboards. Software commonly
> offers a facility for automatically converting the U+0027 apostrophe to a
> contextually selected curly quotation glyph. In these systems, a U+0027 in the
> data stream is always represented as a straight vertical line and can never
> represent a curly apostrophe or a right quotation mark.
>
> Letter Apostrophe.
>
> U+02BC modifier letter apostrophe is preferred where the apostrophe is to
> represent a modifier letter (for example, in transliterations to indicate a
> glottal stop). In the latter case, it is also referred to as a letter
> apostrophe.
>
> Punctuation Apostrophe.
>
> U+2019 right single quotation mark is preferred where the character is to
> represent a punctuation mark, as for contractions: “We’ve been here before.” In
> this latter case, U+2019 is also referred to as a punctuation apostrophe.
>
> An implementation cannot assume that users’ text always adheres to the
> distinction between these characters. The text may come from different sources,
> including mapping from other character sets that do not make this distinction
> between the letter apostrophe and the punctuation apostrophe/right single
> quotation mark. In that case, all of them will generally be represented by
> U+2019.
>
> The semantics of U+2019 are therefore context dependent. For example, if
> surrounded by letters or digits on both sides, it behaves as an in-text
> punctuation character and does not separate words or lines.
I understand this as an explicit endorsement of "There’s" over "There's", and of "it’s the _wrong thing_" over "it's the _wrong thing_". Mark Davis (the president of the Unicode consortium) clarified this in an email back in 1999: http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html
Cheers,
Clément.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 0:17 ` Drew Adams
2015-12-27 1:03 ` Clément Pit--Claudel
@ 2015-12-27 1:09 ` Paul Eggert
2015-12-27 15:56 ` Eli Zaretskii
2015-12-27 6:58 ` Random832
2 siblings, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2015-12-27 1:09 UTC (permalink / raw)
To: Drew Adams, Eli Zaretskii; +Cc: Emacs-devel
Drew Adams wrote:
> I have never seen any doc or typography guideline that favors
> a quotation mark over an apostrophe for English contractions,
> possessives, or non-word plurals.
Section 6.2 of the Unicode Standard states:
U+2019 right single quotation mark is preferred where the character is to
represent a punctuation mark, as for contractions: “We’ve been here before.” In
this latter case, U+2019 is also referred to as a punctuation apostrophe.
http://www.unicode.org/versions/Unicode8.0.0/ch06.pdf
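
For reference, the three characters named in that section can be inspected with a short Python sketch. It assumes Python's standard `unicodedata` module as the source of character names and general categories; the helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# ASCII apostrophe, punctuation apostrophe / right single quote, letter apostrophe
for ch in ("\u0027", "\u2019", "\u02bc"):
    print(*describe(ch))
```

The general categories differ: U+0027 and U+2019 are punctuation (Po and Pf), while U+02BC is a modifier letter (Lm), which matches the punctuation versus letter distinction drawn in the quoted passage.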
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-27 1:03 ` Clément Pit--Claudel
@ 2015-12-27 2:51 ` Drew Adams
0 siblings, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-27 2:51 UTC (permalink / raw)
To: Clément Pit--Claudel, emacs-devel
> > I have never seen any doc or typography guideline that favors
> > a quotation mark over an apostrophe for English contractions,
> > possessives, or non-word plurals. Quite the contrary. These
> > use cases are precisely the raison d'être for the apostrophe.
>
> I don't know much about this topic, so this may not be the type of documents
> you're looking for. Are you aware of the following passage, on page 274 of
> the latest Unicode standard (page 19 of
> http://www.unicode.org/versions/Unicode8.0.0/ch06.pdf)? I believe that David
> Kastrup already quoted it.
>
> > Apostrophes
> >
> > U+0027 apostrophe is the most commonly used character for apostrophe. For
> > historical reasons, U+0027 is a particularly overloaded character. In
> > ASCII, it is used to represent a punctuation mark (such as right single
> > quotation mark, left single quotation mark, apostrophe punctuation, vertical
> > line, or prime) or a modifier letter (such as apostrophe modifier or acute
> > accent).
> > Punctuation marks generally break words; modifier letters generally are
> > considered part of a word. When text is set, U+2019 right single quotation
> > mark is preferred as apostrophe, but only U+0027 is present on most keyboards.
> > Software commonly offers a facility for automatically converting the U+0027
> > apostrophe to a contextually selected curly quotation glyph. In these systems,
> > a U+0027 in the data stream is always represented as a straight vertical line
> > and can never represent a curly apostrophe or a right quotation mark.
> >
> > Letter Apostrophe.
> >
> > U+02BC modifier letter apostrophe is preferred where the apostrophe is to
> > represent a modifier letter (for example, in transliterations to indicate
> > a glottal stop). In the latter case, it is also referred to as a letter
> > apostrophe.
> >
> > Punctuation Apostrophe.
> >
> > U+2019 right single quotation mark is preferred where the character is to
> > represent a punctuation mark, as for contractions: “We’ve been here
> > before.” In this latter case, U+2019 is also referred to as a punctuation
> > apostrophe.
> >
> > An implementation cannot assume that users’ text always adheres to the
> > distinction between these characters. The text may come from different
> > sources, including mapping from other character sets that do not make this
> > distinction between the letter apostrophe and the punctuation apostrophe/right
> > single quotation mark. In that case, all of them will generally be represented
> > by U+2019.
> >
> > The semantics of U+2019 are therefore context dependent. For example, if
> > surrounded by letters or digits on both sides, it behaves as an in-text
> > punctuation character and does not separate words or lines.
>
> I understand this as an explicit endorsement of "There’s" over "There's",
> and of "it’s the _wrong thing_" over "it's the _wrong thing_". Mark Davis
> (the president of the Unicode consortium) clarified this in an email back in
> 1999: http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html
I see, and I thank both you and Paul for that information, of which I was unaware.
I should have checked first the Wikipedia article for Apostrophe, which covers
all of this (including the Unicode corner) quite well:
https://en.wikipedia.org/wiki/Apostrophe.
The topic seems still to be muddy water. Unicode still confuses quotation
mark with apostrophe. At least the _role_ of apostrophe (used within a word)
is recognized as what I said, as opposed to the role of quotation marks (used
around words).
But Unicode has apparently decided to consider the _same character_ as
appropriate for both uses. Odd that that choice is made over and against
the same kind of confusion wrt use of the ASCII apostrophe character, which
they rightfully point to as (even more) overloaded in its use ("U+0027 is a
particularly overloaded character").
I bend to Unicode's choice in this, of course. But it is too bad, IMO,
that it does not distinguish apostrophe and quotation marks in usage,
but recommends using the _same character for both uses_.
Recommending this, even when they could have done otherwise (ASCII's
existing-keyboards excuse does not apply to Unicode choices), seems like
a mistake to me. But what do I know? I'm no expert in these things.
And Unicode has wider concerns than English typography - similar-looking
glyphs apparently have different usages across languages. Perhaps this
confounding of apostrophe and quote mark was a compromise of some kind.
I do agree that one character for two uses (apostrophe and quotation) is
better than one character for several uses, which has been the case for
the ASCII apostrophe. I am surprised that two different characters were
not defined for these different uses, however, even if they might often
have similar or even identical appearances.
Oh well. I would still argue for _Emacs_ to use ASCII apostrophe (and
not a quote mark) for apostrophe uses, both on the basis of appearance -
at least in the default fonts and in the fonts I use - and (especially) on
the basis of simplicity of use for most keyboard users. Emacs is about
writing and editing more than it is about presentation-level typography.
In terms of appearance, I disagree with Paul's appreciation that "it’s"
is preferable to "it's" in the default Emacs fonts (and more generally in
the fonts I use, for Emacs and for technical doc applications). But in
any case, I think that ease of use is more important for Emacs than
appearance, for this.
Anyway, thanks to both of you, again, for teaching me about Unicode's
equivalence of apostrophe and quotation mark, and its preference of this
character for the uses of an apostrophe.
It's not my personal preference for uses of an apostrophe. It's not the
preference used (so far) by my company in its technical docs (FWIW).
And I don't think it should be the preference for Emacs - which _should_
generally use Unicode, and which should respect its recommendations when
appropriate, but which need not bend to using the Unicode right quotation
mark as apostrophe.
I think (so far) that Emacs should stick to using ASCII apostrophe as
apostrophe, in spite of the Unicode standard's recommendation here.
And this mainly for simplicity of use, not appearance (though I also
prefer the appearance, personally, in the fonts I know).
I have now heard Unicode's recommendation (thanks to you), but I don't
read the reason given for that recommendation as a strong reason.
Of course, even the weakest of reasons given by Unicode becomes a
strong reason (in general), just by virtue of being a Unicode
Consortium recommendation. Whether it is appropriate for Emacs is
another story.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-26 23:28 ` Paul Eggert
2015-12-27 0:17 ` Drew Adams
@ 2015-12-27 3:44 ` Eli Zaretskii
2015-12-27 8:12 ` Nikolai Weibull
2015-12-28 20:04 ` John Wiegley
1 sibling, 2 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-27 3:44 UTC (permalink / raw)
To: Paul Eggert; +Cc: Emacs-devel
> Cc: Emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 26 Dec 2015 15:28:49 -0800
>
> To try to avoid spending more of our time about whether to use straight or
> curved apostrophes, I reworded the commentary to omit the apostrophes. I
> tightened it up a bit while I was at it.
Feel free to commit this if you must, although it feels almost
ridiculous to me. An apostrophe is just a character, we shouldn't
develop a mania about characters like that.
Thanks.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 0:17 ` Drew Adams
2015-12-27 1:03 ` Clément Pit--Claudel
2015-12-27 1:09 ` Paul Eggert
@ 2015-12-27 6:58 ` Random832
2015-12-27 14:17 ` Per Starbäck
2 siblings, 1 reply; 62+ messages in thread
From: Random832 @ 2015-12-27 6:58 UTC (permalink / raw)
To: emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
> This is completely wrong. Do you have a reference to back up
> such a claim?
>
> I have never seen any doc or typography guideline that favors
> a quotation mark over an apostrophe for English contractions,
> possessives, or non-word plurals. Quite the contrary. These
> use cases are precisely the raison d'être for the apostrophe.
Er, the question isn't whether to use a quotation mark or an
apostrophe, it's whether to use a curved apostrophe or a
straight apostrophe. That Unicode happens to unify straight
apostrophe with straight single quote and curved apostrophe with
curved single quote isn't relevant.
> (But it is not about straight vs curved apostrophes. Any ol'
> apostrophe will do. It's about apostrophes vs single quotation
> marks.)
I have no idea why you think U+0027 is more an apostrophe, or
less a single quotation mark, than U+2019. The fact that it is
required to use a typographically suboptimal neutral/"straight"
glyph is precisely because of its historical use as a quotation
mark (and as a prime symbol, stress mark, etc).
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 3:44 ` Eli Zaretskii
@ 2015-12-27 8:12 ` Nikolai Weibull
2015-12-28 20:04 ` John Wiegley
1 sibling, 0 replies; 62+ messages in thread
From: Nikolai Weibull @ 2015-12-27 8:12 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Paul Eggert, Emacs Developers
On Sun, Dec 27, 2015 at 4:44 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Cc: Emacs-devel@gnu.org
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> Date: Sat, 26 Dec 2015 15:28:49 -0800
>>
>> To try to avoid spending more of our time about whether to use straight or
>> curved apostrophes, I reworded the commentary to omit the apostrophes. I
>> tightened it up a bit while I was at it.
> Feel free to commit this if you must, although it feels almost
> ridiculous to me. An apostrophe is just a character, we shouldn't
> develop a mania about characters like that.
Oh, the irony. ;-) (Or should I say ‘😉’?)
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 6:58 ` Random832
@ 2015-12-27 14:17 ` Per Starbäck
2015-12-27 14:55 ` Drew Adams
0 siblings, 1 reply; 62+ messages in thread
From: Per Starbäck @ 2015-12-27 14:17 UTC (permalink / raw)
To: emacs-devel@gnu.org
>> I have never seen any doc or typography guideline that favors
>> a quotation mark over an apostrophe for English contractions,
>> possessives, or non-word plurals. Quite the contrary. These
>> use cases are precisely the raison d'être for the apostrophe.
>
> Er, the question isn't whether to use a quotation mark or an
> apostrophe, it's whether to use a curved apostrophe or a
> straight apostrophe. That Unicode happens to unify straight
> apostrophe with straight single quote and curved apostrophe with
> curved single quote isn't relevant.
Right. And it is not primarily a Unicode thing, it is a typography
thing. There are a few characters that owe their existence to
typewriters which used less differentiation than you would use in
writing or in setting text. In a real book (for example) you would
never see the typewriter character ', but always a specific character,
like a left-single or right-single or a prime character.
Earlier there was a big difference between professional-looking (typeset)
text and amateurish-looking typed text (or later printed on a line
printer) with nothing in between. Straight apostrophes/quotes would only be
seen in the latter kind, and would be one of the tell-tale signs of
something not done by a professional.
But of course, when technology made it easier to produce nicer-looking
output, the differences became muddled. Lots of text was produced with
nice-looking fonts by people who didn't know anything about typography.
Text is often published in books more or less taken from output from
word processors used by the authors, and more and more text is read
online straight from authors who use the characters that are
conveniently located on their keyboard (as I do here for example).
Today you can see "typewriter" characters even in prestige books where
there ought to be people involved who know better, so it doesn't
surprise me that some people think that that is how an apostrophe
actually looks. Unicode has muddled it further by bad names for these
characters. I think ascii ' should have a name similar to ascii -
(HYPHEN-MINUS) which shows that this is something used as a stand-in
for several different characters.
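
As a quick illustration of the naming point, here is a small sketch
using Python's standard unicodedata module (the choice of characters
to list is mine, and the names printed are whatever the interpreter's
bundled Unicode data reports):

```python
import unicodedata

# Two ASCII stand-ins, followed by some of the typographic characters
# they are commonly used in place of.
chars = ["'", "-", "\u2018", "\u2019", "\u2032", "\u2010", "\u2212"]

# Map "U+XXXX" code point labels to the official Unicode character names.
names = {f"U+{ord(c):04X}": unicodedata.name(c) for c in chars}

for code, name in names.items():
    print(code, name)
```

The ASCII hyphen carries a name that admits its double duty, while the
ASCII quote is named as if it were one specific character.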
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-27 14:17 ` Per Starbäck
@ 2015-12-27 14:55 ` Drew Adams
2015-12-27 16:35 ` Per Starbäck
0 siblings, 1 reply; 62+ messages in thread
From: Drew Adams @ 2015-12-27 14:55 UTC (permalink / raw)
To: Per Starbäck, emacs-devel
> Unicode has muddled it further by bad names for these
> characters. I think ascii ' should have a name similar to ascii -
> (HYPHEN-MINUS) which shows that this is something used as a stand-in
> for several different characters.
Yes. And not just the names. Unicode too has a single stand-in for
multiple (2) characters. A single Unicode character is apparently
meant (recommended) to represent both the apostrophe and the right
single quotation mark. These are (should be) different animals and
they need not always have the same glyphs. But for Unicode not so.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 1:09 ` Paul Eggert
@ 2015-12-27 15:56 ` Eli Zaretskii
2015-12-27 18:45 ` Paul Eggert
0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-27 15:56 UTC (permalink / raw)
To: Paul Eggert; +Cc: drew.adams, Emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 26 Dec 2015 17:09:49 -0800
> Cc: Emacs-devel@gnu.org
>
> Section 6.2 of the Unicode Standard states:
>
> U+2019 right single quotation mark is preferred where the character is to represent a punctuation mark, as for contractions: “We’ve been here before.” In this latter case, U+2019 is also referred to as a punctuation apostrophe.
The Unicode recommendations should be taken with a grain of salt when
applying them to Emacs, especially for major modes which aren't
derived from Text mode. The Unicode Standard is about typesetting and
displaying plain text, it says as much in many places. See "Plain
Text" in Chapter 2 of the standard, which says, inter alia:
The Unicode Standard encodes plain text. The distinction between
plain text and other forms of data in the same data stream is the
function of a higher-level protocol and is not specified by the
Unicode Standard itself.
Even in the passage quoted in this thread, it says "When text is set"
(with "set" meaning "typeset" here). Whenever any markup is used, or
some other high-level protocols are applicable, Unicode (voluntarily)
takes a back seat.
The issue at hand is not with plain text, but with comments in a major
mode that supports Lisp, i.e. the text in the buffer has syntax of a
source of a program. Text handling in such buffers has its own
high-level protocols that override Unicode recommendations where
needed. As a trivial example, we fontify comments and strings in this
mode to have special appearances that are outside of the Unicode
scope. As a less trivial example, evaluate the following in a buffer
under Fundamental mode:
(insert ";; אבגדה\n")
You will see that this "Lisp comment" is displayed starting at the
right edge of the window, as prescribed by the UBA, the Unicode
Bidirectional Algorithm, which Emacs supports. Now do the same in
*scratch* -- the comment is displayed starting at the left window edge
instead, as you'd expect for a comment, because a buffer whose mode is
for program sources overrides the UBA wrt the "base paragraph
direction".
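
To make the plain-text behavior concrete, here is a rough sketch in
Python of the "first strong character" heuristic the UBA uses to pick
the base paragraph direction. It is a simplification (it ignores
directional isolates and any higher-level override), and the function
name base_direction is made up for this illustration; it is not how
Emacs implements it.

```python
import unicodedata

def base_direction(paragraph: str) -> str:
    """Return 'rtl' or 'ltr' using a simplified first-strong rule."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # strong right-to-left character
            return "rtl"
        if bidi_class == "L":           # strong left-to-right character
            return "ltr"
        # Neutral or weak characters such as ';' and ' ' decide nothing.
    return "ltr"  # no strong character found: fall back to left-to-right

# The "Lisp comment" from the example above: ';;' and the space are
# neutral, so the first Hebrew letter decides the direction.
hebrew_comment = ";; \u05d0\u05d1\u05d2\u05d3\u05d4\n"
print(base_direction(hebrew_comment))
print(base_direction(";; an English comment\n"))
```

A mode for program sources skips this detection and forces
left-to-right; in Emacs the relevant variable is
bidi-paragraph-direction.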
IOW, Emacs already behaves slightly differently in major modes that
derive from prog-mode, and therefore there's nothing inherently wrong
with deviating from plain-text related Unicode recommendations
regarding the apostrophe, quotes, etc.
So I think we should use our own judgment in this case, and what the
Unicode Standard says is not the only source of wisdom we should
consider.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 14:55 ` Drew Adams
@ 2015-12-27 16:35 ` Per Starbäck
2015-12-27 17:42 ` Drew Adams
0 siblings, 1 reply; 62+ messages in thread
From: Per Starbäck @ 2015-12-27 16:35 UTC (permalink / raw)
To: Drew Adams; +Cc: emacs-devel@gnu.org
>> Unicode has muddle[d] it further by bad names for these
>> characters. I think ascii ' should have a name similar to ascii -
>> (HYPHEN-MINUS) which shows that this is something used as a stand-in
>> for several different characters.
>
> Yes. And not just the names. Unicode too has a single stand-in for
> multiple (2) characters. A single Unicode character is apparently
> meant (recommended) to represent both the apostrophe and the right
> single quotation mark.
I don't agree. It *is* one character that is used in several ways, in
that typographical tradition sees them as the same character.
(Important for Unicode is also that no previous character set
differentiated between them, because then it would have to as well, by
its design decisions.)
That one character has several meanings, as the exclamation mark "!"
also means factorial doesn't mean it needs to be seen as two
characters. Even when there are such "double characters" in Unicode
often you are recommended to only use one of them anyway. That's the
case with U+212B "ANGSTROM SIGN" where you normally use the
letter Å (LATIN CAPITAL LETTER A WITH RING ABOVE, U+00C5) instead.
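
A small sketch of that recommendation, using Python's standard
unicodedata module (the variable names are only for illustration):
ANGSTROM SIGN is U+212B in the Unicode data, and canonical
normalization replaces it with the ordinary letter.

```python
import unicodedata

angstrom_sign = "\u212b"   # ANGSTROM SIGN
a_with_ring = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

# NFC normalization maps the "double" character onto the letter.
normalized = unicodedata.normalize("NFC", angstrom_sign)

print(unicodedata.name(angstrom_sign))
print(unicodedata.name(normalized))
print(normalized == a_with_ring)
```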
I suspect that the thought that the apostrophe is "another" character
than one of the curly quotes wouldn't at all be so strong if the
Unicode name for ' wasn't APOSTROPHE but instead was TYPEWRITER SINGLE
QUOTATION MARK.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-27 16:35 ` Per Starbäck
@ 2015-12-27 17:42 ` Drew Adams
2015-12-27 19:27 ` Per Starbäck
0 siblings, 1 reply; 62+ messages in thread
From: Drew Adams @ 2015-12-27 17:42 UTC (permalink / raw)
To: Per Starbäck; +Cc: emacs-devel
> >> Unicode has muddle[d] it further by bad names for these
> >> characters. I think ascii ' should have a name similar to ascii -
> >> (HYPHEN-MINUS) which shows that this is something used as a stand-in
> >> for several different characters.
> >
> > Yes. And not just the names. Unicode too has a single stand-in for
> > multiple (2) characters. A single Unicode character is apparently
> > meant (recommended) to represent both the apostrophe and the right
> > single quotation mark.
>
> I don't agree. It *is* one character that is used in several ways,
> in that typographical tradition sees them as the same character.
> (Important for Unicode is also that no previous character set
> differentiated between them, because then it would have to as well, by
> its design decisions.)
Yes, we disagree. We don't disagree that the Unicode standard can
define and recommend what it wants. And Unicode takes multiple
languages into consideration and sometimes makes compromises.
That's to be expected.
We do disagree that an apostrophe is the same thing as a single
quotation mark. The two might or might not look the same, but
they function quite differently. Whether Unicode chooses one or
two characters to represent those different functions is, well,
a choice.
IOW, "it *is* one character" ONLY if one sees it or defines it
as such. If not, it is not.
See the Q&A I referenced at the outset:
http://english.stackexchange.com/a/36048/51214. Or google
"apostrophe versus quotation mark" or similar.
> That one character has several meanings, as the exclamation mark
> "!" also means factorial doesn't mean it needs to be seen as two
> characters.
Correct. It does not imply that it NEEDS to be seen as two
characters. But it also does not imply that it NEEDS to be
seen as the one and the same character.
Consider the apostrophe and the prime mark. You could argue
that they do not NEED to be seen as separate characters. But
the (better) choice was made to use separate chars for them.
(And again, we're talking "characters" now, not their glyphs.)
Or consider character HYPHEN-MINUS (U+002D), character HYPHEN
(U+2010), and character MINUS SIGN (U+2212).
You might say that the first of these is analogous to the ASCII
apostrophe (U+0027) - it is essentially for compatibility. But
Unicode clearly separated hyphen from minus. NOT because they
necessarily *look* different, but because they *are* different -
they are *used* differently.
Unicode made choices, and no doubt good ones. But they are
*choices*: same char for different uses of !, same char for
different uses of ’, but different chars for different uses
of − and -. None of this was written in the stars; people
made choices. Just as we are doing for Emacs.
> I suspect that the thought that the apostrophe is "another" character
> than one of the curly quotes wouldn't at all be so strong if the
> Unicode name for ' wasn't APOSTROPHE but instead was TYPEWRITER SINGLE
> QUOTATION MARK.
Again, the argument for having two characters is based not on
the appearance so much as on the different uses. Ask what an
apostrophe IS and you will get the explanation that I cited
(http://english.stackexchange.com/a/36048/51214). Ask what a
quotation mark IS and you will get an entirely different
explanation. They are different things, whether or not someone
decides to represent them using the same character.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 15:56 ` Eli Zaretskii
@ 2015-12-27 18:45 ` Paul Eggert
0 siblings, 0 replies; 62+ messages in thread
From: Paul Eggert @ 2015-12-27 18:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: drew.adams, Emacs-devel
Eli Zaretskii wrote:
> (insert ";; אבגדה\n")
> ...
> Emacs already behaves slightly differently in major modes that
> derive from prog-mode,
Sure, if one includes the characters that delimit a comment, as in that example.
But characters within a comment are generally treated as text by Emacs, and this
is a good thing. Users should not need to learn different rules for
text-within-a-comment as opposed to other text.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 17:42 ` Drew Adams
@ 2015-12-27 19:27 ` Per Starbäck
2015-12-27 22:47 ` Drew Adams
0 siblings, 1 reply; 62+ messages in thread
From: Per Starbäck @ 2015-12-27 19:27 UTC (permalink / raw)
To: emacs-devel@gnu.org
>> That one character has several meanings, as the exclamation mark
>> "!" also means factorial doesn't mean it needs to be seen as two
>> characters.
>
> Correct. It does not imply that it NEEDS to be seen as two
> characters. But it also does not imply that it NEEDS to be
> seen as the one and the same character.
> Consider the apostrophe and the prime mark. You could argue
> that they do not NEED to be seen as separate characters. But
> the (better) choice was made to use separate chars for them.
No, only out of ignorance could you argue that, since they don't even
look the same. Please recognize that typography is much older than
computers, and that the "choice" that those should be different
characters goes back a long time. It's not something that anyone alive
when the Unicode consortium was founded has had any input on.
For most characters (including all mentioned in this post) the correct
chronology is this:
(1) there is a bunch of characters, used in writing and typesetting
(2) some technology (typewriters, computers) creates some new ersatz
characters that are used as several "real" characters, for simplicity's
sake
(3) later technology creates bigger character sets that have all those
characters. Of course the "ersatz characters" still exist as well, and
it is they that have special syntactic meanings in programming
languages etc. (Also people often keep using them in typed text
since it's easier to enter, as I do in this text for example.)
You keep arguing as if step (1) didn't exist, that the ascii
characters are the original characters and the Unicode consortium then
decides to split some of them up more or less arbitrarily.
> Unicode made choices, and no doubt good ones. But they are
> *choices*: same char for different uses of !, same char for
> different uses of ’, but different chars for different uses
> of − and -.
There *are* certainly some interesting choices made, but these are
not. All of your examples are established since a long time before
computers even existed. You think "different uses of − and -" only
because you have been conditioned by typewriters and computers
(probably primarily the latter) into thinking there is *one* character
"-" that is used in various ways.
> Or consider character HYPHEN-MINUS (U+002D), character HYPHEN
> (U+2010), and character MINUS SIGN (U+2212).
>
> You might say that the first of these is analogous to the ASCII
> apostrophe (U+0027) - it is essentially for compatibility.
Yes, that is true, but not for compatibility between "apostrophe" and
"right single quotation mark" as that imagined argument continues in
your post, but for compatibility between "left single quotation mark"
and "right single quotation mark" as well as less common characters
like "prime".
It is also analogous to ASCII " which is a compatibility character
between primarily "left double quotation mark" and "right double
quotation mark" (but also for less common characters like "double
prime").
I've cut down on quotations. This can have a tendency to run away into
what isn't relevant. What *is* relevant is that there is a common
misconception that ASCII ' somehow is *more* correct as apostrophe
than it is as a quotation character. It just isn't. In "lazy"
typewritten text (like this), by all means use ' and ". In
good-looking text they just aren't used. This is relevant for Emacs as
it has been decided to sometimes show such "good-looking text".
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-27 19:27 ` Per Starbäck
@ 2015-12-27 22:47 ` Drew Adams
2015-12-27 23:45 ` Per Starbäck
2015-12-28 9:37 ` Paul Eggert
0 siblings, 2 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-27 22:47 UTC (permalink / raw)
To: Per Starbäck, emacs-devel
> > Or consider character HYPHEN-MINUS (U+002D), character HYPHEN
> > (U+2010), and character MINUS SIGN (U+2212).
> >
> > You might say that the first of these is analogous to the ASCII
> > apostrophe (U+0027) - it is essentially for compatibility.
>
> Yes, that is true, but not for compatibility between "apostrophe" and
> "right single quotation mark" as that imagined argument continues in
> your post, but for compatibility between "left single quotation mark"
> and "right single quotation mark" as well as less common characters
> like "prime".
Huh? The Unicode _name_ of character U+0027 is... "APOSTROPHE".
And the Unicode "old name" of it is "APOSTROPHE-QUOTE".
Claiming that Unicode intends this character only for compatibility
between "left single quotation mark", "right single quotation mark",
and less common characters like "prime", and NOT for compatibility
between "apostrophe" and "right single quotation mark" is, well,
imaginative. Where do you get that notion?
---
And then there is this, which echoes the point I made that an
apostrophe _is not_ a closing quotation mark.
https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/
(cited here, BTW: http://ilovetypography.com/2015/08/07/this-month-in-typography-6/)
Using U+2019 is inconsistent with the rest of the standard
----------------------------------------------------------
Earlier in section 6.2, the standard explains the difference
between punctuation marks and modifier letters:
Punctuation marks generally break words; modifier letters
generally are considered part of a word. Consider any English
word with an apostrophe, e.g. “don’t”.
The word “don’t” is a single word. It is not the word “don”
juxtaposed against the word “t”. The apostrophe is part of the
word, which, in Unicode-speak, means it’s a modifier letter,
not a punctuation mark, regardless of what colloquial English
calls it.
According to the Unicode character database, U+2019 is a
punctuation mark (General Category = Pf), while U+02BC is a
modifier letter (General Category = Lm). Since English
apostrophes are part of the words they’re in, they are
modifier letters, and hence should be represented by U+02BC,
not U+2019.
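
The category claim is easy to check. Here is a sketch with Python's
standard library; the second half uses the regex class \w merely as
one example of software that treats punctuation as a word break, not
as a statement about how any particular editor segments words.

```python
import re
import unicodedata

# The three apostrophe candidates discussed in this thread.
candidates = {"U+0027": "'", "U+2019": "\u2019", "U+02BC": "\u02bc"}

# General Category of each candidate, per the Unicode character database.
categories = {code: unicodedata.category(ch) for code, ch in candidates.items()}
print(categories)

# Split "don?t" into runs of word characters for each candidate.
words = {code: re.findall(r"\w+", f"don{ch}t") for code, ch in candidates.items()}
print(words)
```

With either punctuation character the word comes out in two pieces;
only the modifier letter keeps it whole.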
And this, which makes a somewhat different argument:
https://www.mail-archive.com/unicode@unicode.org/msg35871.html
It refers to the previous argument thus:
Were there no modifier letters at all, Unicode had have to
introduce an apostrophe character, because an apostrophe is
not at all the same as a quotation mark and does not work the
same way neither. By handling text, not theories, Ted Clancy
at Mozilla clearly shows us that ambiguating the apostrophe
with a close-quote brings up counterproductive complications
that impact severely the productivity of the users.
Reply: https://www.mail-archive.com/unicode@unicode.org/msg35851.html
And this URL provides a history of the move from U+02BC to U+2019:
http://charupdate.info/#ambiguation
It points out that this move was so odd that it required the
invention of the word "ambiguation" to cover the confusion.
The same article suggests that the Unicode Consortium itself
"is not at ease with the new preference".
A search in the Mail Archives shows why the apostrophe and the
single close quote were ambiguated—a process that needs even a
new word to put on it, as ordinarily everybody works for
disambiguation. It was for simplification's sake, in word
processing software.
Simplification for word-processing software! Aka MS Word and
its notorious misuse of _left_ single quotation mark for things
like "‘Tis the season" (it should be "’Tis"):
The phenomenon called “the Apostrophe Catastrophe” consists in
a huge number of instances where text processing software (word
processor, desktop publishing) inserts an open quote instead of
a leading apostrophe.
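
To see how that happens, here is a deliberately naive "smart quotes"
sketch in Python. The rule it applies (opening quote at the start of
the text or after whitespace, closing quote otherwise) is an assumption
about how simple auto-correct works, not the actual algorithm of any
named word processor, and naive_smart_quotes is a made-up name.

```python
def naive_smart_quotes(text: str) -> str:
    """Replace each ASCII ' with a curly quote based on what precedes it."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
        elif i == 0 or text[i - 1].isspace():
            out.append("\u2018")  # looks like the start of a quotation
        else:
            out.append("\u2019")  # closing quote, doubling as apostrophe
    return "".join(out)

print(naive_smart_quotes("don't"))
print(naive_smart_quotes("'Tis the season"))
```

The rule handles "don't" but gets the leading apostrophe in "'Tis"
wrong, because position alone cannot tell an apostrophe from an
opening quote once both share one character.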
Interestingly, a similar discussion surrounds the use of hyphen:
https://www.mail-archive.com/unicode@unicode.org/msg35852.html
But luckily, the miscategorisation of U+2010 hasn't led to any
pressing practical problems, unlike the misuse of U+2019 for the
apostrophe.
This discussion, BTW, is from _2015_, 16 years after the Unicode
decision to switch from using U+02BC to using U+2019 as apostrophe.
Still problematic, it would seem. Certainly not cut-and-dried.
---
To be clear, I am NOT arguing that _Emacs_ should use U+02BC
instead of U+2019 as apostrophe. I argue that Emacs should
(continue to) use U+0027 (ASCII apostrophe) as apostrophe (in its
own doc, *scratch* comments, and so on). Not because it is a
more genuine apostrophe but because it is much easier for users
(and programs) to work with.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 22:47 ` Drew Adams
@ 2015-12-27 23:45 ` Per Starbäck
2015-12-28 2:01 ` Drew Adams
2015-12-28 9:37 ` Paul Eggert
1 sibling, 1 reply; 62+ messages in thread
From: Per Starbäck @ 2015-12-27 23:45 UTC (permalink / raw)
To: Drew Adams; +Cc: emacs-devel@gnu.org
>> Yes, that is true, but not for compatibility between "apostrophe" and
>> "right single quotation mark" as that imagined argument continues in
>> your post, but for compatibility between "left single quotation mark"
>> and "right single quotation mark" as well as less common characters
>> like "prime".
>
> Huh? The Unicode _name_ of character U+0027 is... "APOSTROPHE".
> And the Unicode "old name" of it is "APOSTROPHE-QUOTE".
As I've already written, a lot of confusion comes from the bad name the ascii '
has in Unicode. Avoid that confusion.
And yes, there are some people who think that the squiggle used as
apostrophe and as right-single-quotation should be seen as two
different characters depending on usage. There are arguments for and
against that, and you quote a lot of people who are for it, but how is
that relevant? Maybe I agree with the arguments, maybe I don't, and I
won't tell, because it doesn't matter. We are not going to create a
new emacs-reformed-unicode character set now, we are implementing
something that exists, and that very clearly says that U+2018 and
U+2019 are the preferred characters to use for English paired
quotation marks, and U+2019 is also the preferred character to use for
apostrophe.
> Claiming that Unicode intends this character only for compatibility
> between "left single quotation mark", "right single quotation mark",
> and less common characters like "prime", and NOT for compatibility
> between "apostrophe" and "right single quotation mark" is, well,
> imaginative. Where do you get that notion?
Just imagine that Unicode hasn't been reformed as you want, but that
there is one character that is used both as apostrophe and right
single quotation mark. Not because it's The Right Way, but because
then you will be able to read and understand what I wrote.
> To be clear, I am NOT arguing that _Emacs_ should use U+02BC
> instead of U+2019 as apostrophe. I argue that Emacs should
> (continue to) use U+0027 (ASCII apostrophe) as apostrophe (in its
> own doc, *scratch* comments, and so on). Not because it is a
> more genuine apostrophe but because it is much easier for users
> (and programs) to work with.
I think mixing typewriter text and nice-looking text in the same
buffer is the worst option. A jarring typographical hotchpotch. It
would be the same kind of error as using straight ascii "" for inner
quotes inside curly outside quotes or vice versa.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-27 23:45 ` Per Starbäck
@ 2015-12-28 2:01 ` Drew Adams
2015-12-28 5:51 ` Random832
` (2 more replies)
0 siblings, 3 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-28 2:01 UTC (permalink / raw)
To: Per Starbäck; +Cc: emacs-devel
> >> Yes, that is true, but not for compatibility between "apostrophe" and
> >> "right single quotation mark" as that imagined argument continues in
> >> your post, but for compatibility between "left single quotation mark"
> >> and "right single quotation mark" as well as less common characters
> >> like "prime".
> >
> > Huh? The Unicode _name_ of character U+0027 is... "APOSTROPHE".
> > And the Unicode "old name" of it is "APOSTROPHE-QUOTE".
>
> As I've already written, a lot of confusion comes from the bad name
> the ascii ' has in Unicode. Avoid that confusion.
So as the only demonstration of your claim that this character is
not maintained in Unicode for compatibility between "apostrophe"
and "right single quotation mark", you offer the statement that
the name is wrong.
Sheesh. You know, Unicode names have been updated more than once.
How come no update here, if this character has nothing to do with
apostrophe and is only about quotation-mark compatibility?
Any evidence for your claim that ' is in Unicode only for
compatibility between "left single quotation mark" and "right
single quotation mark"? Do you think that is even the most
common use case for ' in old-fashioned plain text, whether
typewriter or computer? ", yes, but '? I don't think so.
> And yes, there are some people who think that the squiggle used as
> apostrophe and as right-single-quotation should be seen as two
> different characters depending on usage.
The basic argument is this: an apostrophe is not a quotation
mark; their purposes/uses are different. And this is being
revisited in 2015, a decade and a half after the choice was
chiseled in stone.
> We are not going to create a new emacs-reformed-unicode
> character set now
No one suggested otherwise. The question raised was whether
a right curly quote mark should be used in *scratch* as
apostrophe.
> we are implementing something that exists, and that very clearly
> says that ... U+2019 is also the preferred character to use for
> apostrophe.
Emacs has already implemented Unicode support. That is not
in question. Dunno what you think "we are implementing" now.
The *scratch* buffer text?
As Eli has said:
The Unicode recommendations should be taken with a grain of salt
when applying them to Emacs, especially for major modes which
aren't derived from Text mode. Unicode Standard is about
typesetting and displaying plain text, it says that much in many
places.
And as I said in a related vein:
Emacs should (continue to) use U+0027 (ASCII apostrophe) as
apostrophe (in its own doc, *scratch* comments, and so on).
Not because it is a more genuine apostrophe but because it is
much easier for users (and programs) to work with.
Reading the recent controversy about the Unicode apostrophe
"preference" (which is not a recommendation, AFAIK) on the
Unicode mailing list points to even more problems with that
preference than I was aware of, for users of text processing
applications. Problems from bidi handling to inserting to
spell-checking to searching...
We certainly support the use of U+2019 any way someone wants to
use it. But that does not mean we must plaster it everywhere.
> > Claiming that Unicode intends this character only for compatibility
> > between "left single quotation mark", "right single quotation mark",
> > and less common characters like "prime", and NOT for compatibility
> > between "apostrophe" and "right single quotation mark" is, well,
> > imaginative. Where do you get that notion?
>
> Just imagine that Unicode hasn't been reformed as you want, but that
> there is one character that is used both as apostrophe and right
> single quotation mark. Not because it's The Right Way, but because
> then you will be able to read and understand what I wrote.
Please, give me the benefit of the doubt that I am able to read
and understand what you wrote. There's no need for condescension.
I am not out to reform Unicode - that's a strawman. My purpose in
this thread is to argue that U+2019 is not the best apostrophe
choice for distributed-Emacs boilerplate text such as that used in
comments, because it is harder for users to deal with. The best
choice for that is U+0027 ('), plain old keyboard apostrophe.
I've stated clearly more than once that I support the Unicode
standard and am very glad that Emacs supports it. That does not
mean that Emacs should use U+2019 (’) as apostrophe character in
its boilerplate text.
> I think mixing typewriter text and nice-looking text in the same
> buffer is the worst option. A jarring typographical hotchpotch.
There again we disagree. But even if we didn't - even granting
your esthetic sensibility, the ease-of-use reason for plain ' far
outweighs it, for me.
And even for purely presentation and navigation purposes (i.e.,
no editing involved), as I mentioned, at least some technical doc
and publishing systems, including those of large organizations
with thousands of users, have deliberately opted for the simple
', because it is judged to be _easier on users_. Even to the
extent of using QA tools that correct unintended ’ to '.
A fortiori for a text editor and programming environment such
as Emacs.
It's not just because we _can_ insert ’ everywhere that we
must do so. It's a judgment call, and depends on the context
and use cases.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 2:01 ` Drew Adams
@ 2015-12-28 5:51 ` Random832
2015-12-28 10:09 ` Drew Adams
2015-12-28 6:05 ` Per Starbäck
2015-12-28 9:12 ` Nikolai Weibull
2 siblings, 1 reply; 62+ messages in thread
From: Random832 @ 2015-12-28 5:51 UTC (permalink / raw)
To: emacs-devel
Drew Adams writes:
> So as the only demonstration of your claim that this character is
> not maintained in Unicode for compatibility between "apostrophe"
> and "right single quotation mark", you offer the statement that
> the name is wrong.
Drew Adams writes:
> These are (should be) different animals and
> they need not always have the same glyphs.
As long as we're on the subject of whose claims are assertions without
evidence, can you produce a single example of a system that actually
supported using different glyphs for these (apart from the typewriter
glyph, which isn't typographically appropriate for anything), and what
those glyphs might have looked like?
And, lest we get off the subject, the reason not to use U+2019 or any
other non-ASCII character in the default scratch buffer text is because
the user may not be able to save it, not because the ASCII one is more
typographically or semantically appropriate.
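
A sketch of that concern in Python (the helper can_save_as is made up
for this illustration; it models "can this text be written in this
encoding at all", not Emacs's actual coding-system selection):

```python
def can_save_as(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

ascii_text = "notes you don't want to save"
curly_text = "notes you don\u2019t want to save"

for encoding in ("ascii", "latin-1", "utf-8"):
    print(encoding, can_save_as(ascii_text, encoding), can_save_as(curly_text, encoding))
```

The straight apostrophe survives every encoding listed, while the
curved one needs an encoding that happens to include U+2019.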
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 2:01 ` Drew Adams
2015-12-28 5:51 ` Random832
@ 2015-12-28 6:05 ` Per Starbäck
2015-12-28 10:13 ` Drew Adams
2015-12-28 9:12 ` Nikolai Weibull
2 siblings, 1 reply; 62+ messages in thread
From: Per Starbäck @ 2015-12-28 6:05 UTC (permalink / raw)
To: Drew Adams; +Cc: emacs-devel@gnu.org
>> And yes, there are some people who think that the squiggle used as
>> apostrophe and as right-single-quotation should be seen as two
>> different characters depending on usage.
>
> The basic argument is this: an apostrophe is not a quotation
> mark; their purposes/uses are different. [...]
You say that you are not trying to reform Unicode, so let's not go over
the arguments for reforming it again and who has suggested that on what
mailing lists.
Let's stick to what Unicode actually says. It is not unclear, and
"right single quote" and "punctuation apostrophe" are never seen as
different characters there. It is one character.
What Unicode character should be used for that character? That
depends. Unicode says "When text is set, U+2019 right single quotation
mark is preferred as apostrophe". But we are not setting text. Should
we stick to the overloaded ascii characters instead? There are
arguments for that (you mention some, like "easier to use"). There has
already been lots of discussion about this, and a decision has been
made to use the typographical characters in some places.
> No one suggested otherwise. The question raised was whether
> a right curly quote mark should be used in *scratch* as
> apostrophe.
I don't see how "... as apostrophe" is important here, since it is the
same character.
Then the question is just "Should curly quote marks be used there?" I
can see arguments for or against that, and am not entering that
discussion. I just want to make sure Emacs doesn't create its own
division between apostrophes and right single quotes and displays
texts where those look different.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 2:01 ` Drew Adams
2015-12-28 5:51 ` Random832
2015-12-28 6:05 ` Per Starbäck
@ 2015-12-28 9:12 ` Nikolai Weibull
2015-12-28 10:15 ` Drew Adams
2 siblings, 1 reply; 62+ messages in thread
From: Nikolai Weibull @ 2015-12-28 9:12 UTC (permalink / raw)
To: Drew Adams; +Cc: Per Starbäck, Emacs Developers
On Mon, Dec 28, 2015 at 3:01 AM, Drew Adams <drew.adams@oracle.com> wrote:
> Any evidence for your claim that ' is in Unicode only for
> compatibility between "left single quotation mark" and "right
> single quotation mark"? Do you think that is even the most
> common use case for ' in old-fashioned plain text, whether
> typewriter or computer? ", yes, but '? I don't think so.
Given the English’ propensity to still not use contractions, but to
use single quotation marks when quoting, it’s not as clear cut as you
suggest either.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 22:47 ` Drew Adams
2015-12-27 23:45 ` Per Starbäck
@ 2015-12-28 9:37 ` Paul Eggert
2015-12-28 10:16 ` Drew Adams
2015-12-28 16:31 ` Eli Zaretskii
1 sibling, 2 replies; 62+ messages in thread
From: Paul Eggert @ 2015-12-28 9:37 UTC (permalink / raw)
To: Drew Adams, Per Starbäck, emacs-devel
Drew Adams wrote:
> Emacs should
> (continue to) use U+0027 (ASCII apostrophe) as apostrophe (in its
> own doc
The Emacs source code largely does that already, so I assume you’re talking
about how the documentation is presented to the user. It would be a nontrivial
project to change Emacs in the way you suggest, as many of the punctuation
apostrophes are in *info* buffers and are generated by Texinfo, which has taken
the Unicode approach for consistency with longstanding English typographic
conventions. Traditionally there is no textual distinction between punctuation
apostrophes and right single quotation marks, just as there is no textual
distinction between (say) parenthetical and redactive dashes, even though in
both cases there are large semantic differences. English can be a messy
language, but there it is.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-28 5:51 ` Random832
@ 2015-12-28 10:09 ` Drew Adams
0 siblings, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-28 10:09 UTC (permalink / raw)
To: Random832, emacs-devel
> > So as the only demonstration of your claim that this character is
> > not maintained in Unicode for compatibility between "apostrophe"
> > and "right single quotation mark", you offer the statement that
> > the name is wrong.
>
> Drew Adams writes:
> > These are (should be) different animals and
> > they need not always have the same glyphs.
>
> As long as we're on the subject of whose claims are assertions without
> evidence, can you produce a single example of a system that actually
> supported using different glyphs for these (apart from the typewriter
> glyph, which isn't typographically appropriate for anything), and what
> those glyphs might have looked like?
I never made such a claim.
Not only have I not said that the glyphs need to be different
or have been different, I have explicitly said that the glyphs
can be the same even when the uses are different. They could
be (yes it's a choice) considered different characters based on
their different uses, and not on their different appearances.
What I have said is that an apostrophe is not a quotation mark.
They have different jobs. An apostrophe is used within a word.
Quotation marks are used around/between words.
Here is one linguist's interesting take, BTW: the apostrophe
is the 27th English letter!
The apostrophe is not a punctuation mark. It doesn't punctuate.
Punctuation marks are placed between units (sentences, clauses,
phrases, words, morphemes) to signal structure, boundaries, or
pauses. The apostrophe appears within words. It's a 27th letter
of the alphabet. This issue concerns spelling.
http://chronicle.com/blogs/linguafranca/2013/03/22/being-an-apostrophe/
and http://languagelog.ldc.upenn.edu/nll/?p=2664
> And, lest we get off the subject, the reason not to use U+2019 or any
> other non-ASCII character in the default scratch buffer text is because
> the user may not be able to save it, not because the ASCII one is more
> typographically or semantically appropriate.
That is part of the argument I made more generally for ' (U+0027):
ease of use by users of a text editor and programming environment.
I am not the one arguing that it should be used because it is
more beautiful (though I don't personally think it is less
beautiful, in the default fonts and the fonts I use). Relative
beauty was given as a reason only by those in favor of U+2019.
I've been pretty clear that the reason to use it is to make
life easier for users - in several ways.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-28 6:05 ` Per Starbäck
@ 2015-12-28 10:13 ` Drew Adams
0 siblings, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-28 10:13 UTC (permalink / raw)
To: Per Starbäck; +Cc: emacs-devel
> > No one suggested otherwise. The question raised was whether
> > a right curly quote mark should be used in *scratch* as
> > apostrophe.
>
> I don't see how "... as apostrophe" is important here, since it
> is the same character.
By "_as apostrophe_" I mean what I said at the outset: used in one of
the apostrophe use cases, which define apostrophe by function, not
by appearance. (There was never any question about whether a right
quotation mark should be used _as a quotation mark_. The question
is only whether it should also be used as apostrophe, in *scratch*.)
All of the use cases of an apostrophe are uses _within a word_.
1. Marking the omission of one or more letters of a word (contraction).
2. Marking possessive case (e.g., "Per's pet peeve").
3. Certain plurals.
(There are 6 apostrophe use cases altogether, in the Pullum article:
http://chronicle.com/blogs/linguafranca/2013/03/22/being-an-apostrophe/.)
Wikipedia gives the same 3 use cases (and it calls apostrophe
a punctuation mark, which some linguists do not).
https://en.wikipedia.org/wiki/Apostrophe
Whether considered punctuation or not, AFAICT linguists agree
that these in-word use cases are what make an apostrophe an
apostrophe - not its appearance. None of these are use cases
for a quotation mark. Quotation marks are used outside words;
never within words.
This is the point (Wikipedia):
The apostrophe looks the same as a closing single quotation
mark, although they have different meanings.
I would say that the apostrophe _can_ look the same, and it
generally does. What is important is that the meaning is not
the same - an apostrophe is not a quotation mark, even when
they might look the same.
This is so regardless of whether Unicode has decided to "prefer"
the use of a single character for both meanings (apostrophe,
quotation mark).
(See also www.umich.edu/~jlawler/IELL-Punctuation.pdf, which
lists as separate punctuation marks, "single and double quotation
marks ‘ “ « » ” ’, and the apostrophe, or raised comma ’ ".
They are not the same mark, even when they look identical.)
> Then the question is just "Should curly quote marks be used there?"
> I can see arguments for or against that, and am not entering that
> discussion. I just want to make sure Emacs doesn't create its own
> division between apostrophes and right single quotes and displays
> texts where those look different.
It is perfectly proper, IMO, for an application to display a quotation
mark using one glyph and an apostrophe using another glyph, Unicode
"preferences" notwithstanding. An apostrophe is not a quotation mark,
by function, even if Unicode prefers the use of the same character to
represent both. And Unicode does not preclude using different chars.
And even within the Unicode body there apparently has been and still
is disagreement over that stated "preference". (Although there is
agreement that U+0027 is not preferred. The disagreement is over
other Unicode apostrophe characters, not over ASCII apostrophe.)
And Emacs can decide for itself what it needs and wants. Emacs can
respect or ignore Unicode "preferred" use of a given character, based
on its own needs. And that shows no disrespect for the Unicode
standard and no lack of supporting it. An Emacs user is free to use
whatever Unicode characters s?he likes wherever s?he likes.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-28 9:12 ` Nikolai Weibull
@ 2015-12-28 10:15 ` Drew Adams
2015-12-28 14:59 ` Nikolai Weibull
0 siblings, 1 reply; 62+ messages in thread
From: Drew Adams @ 2015-12-28 10:15 UTC (permalink / raw)
To: Nikolai Weibull; +Cc: Per Starbäck, Emacs Developers
> > Any evidence for your claim that ' is in Unicode only for
> > compatibility between "left single quotation mark" and "right
> > single quotation mark"? Do you think that is even the most
> > common use case for ' in old-fashioned plain text, whether
> > typewriter or computer? ", yes, but '? I don't think so.
>
> Given the English’ propensity to still not use contractions, but to
> use single quotation marks when quoting, it’s not as clear cut as you
> suggest either.
Sorry, but I don't know what that means, or how it relates to
what you quoted from me. Is there a propensity in English not
to use contractions? Maybe in some academic writing. Not in
general, to my knowledge.
Are you arguing that you think that ' has been used mainly for
quotation and not as an apostrophe? Perhaps in Britain, where
single quotation marks are apparently used at the top level,
but not in the US, is my guess. I think you will find relatively
few uses of single quotation marks in American English, and
relatively many uses of an apostrophe. But it's not an important
point. And I never said it was "clear cut" - I said "I don't
think so", and I _asked_ what Per thought.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-28 9:37 ` Paul Eggert
@ 2015-12-28 10:16 ` Drew Adams
2015-12-29 7:05 ` Random832
2015-12-28 16:31 ` Eli Zaretskii
1 sibling, 1 reply; 62+ messages in thread
From: Drew Adams @ 2015-12-28 10:16 UTC (permalink / raw)
To: Paul Eggert, Per Starbäck, emacs-devel
> > Emacs should (continue to) use U+0027 (ASCII apostrophe) as apostrophe
> > (in its own doc, *scratch* comments, and so on).
>
> The Emacs source code largely does that already, so I assume you’re
> talking about how the documentation is presented to the user.
The question raised was use in *scratch*. But yes, I am talking about
source-code comments that users can see.
(And yes, _personally_, I would prefer that we make life easier for
users in Info and *Help* and *Messages* and ... too, if that were
still possible. But that's not what this discussion is about.)
> It would be a nontrivial project to change Emacs in the way you
> suggest, as many of the punctuation apostrophes are in *info* buffers
> and are generated by Texinfo...
No, I'm not suggesting that now - Texinfo is clearly a lost cause. ;-)
The question was raised about *scratch* and comments in code.
> Traditionally there is no textual distinction between punctuation
> apostrophes and right single quotation marks,
By "textual distinction" are you saying only that they traditionally
look the same? If so, I agree.
> ... even though in both cases there are large semantic differences.
Again, I agree. An apostrophe is not a quotation mark, even if
they look the same or similar.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 10:15 ` Drew Adams
@ 2015-12-28 14:59 ` Nikolai Weibull
2015-12-28 18:39 ` Drew Adams
2015-12-29 9:06 ` Alan Mackenzie
0 siblings, 2 replies; 62+ messages in thread
From: Nikolai Weibull @ 2015-12-28 14:59 UTC (permalink / raw)
To: Drew Adams; +Cc: Nikolai Weibull, Per Starbäck, Emacs Developers
On Mon, Dec 28, 2015 at 11:15 AM, Drew Adams <drew.adams@oracle.com> wrote:
>> > Any evidence for your claim that ' is in Unicode only for
>> > compatibility between "left single quotation mark" and "right
>> > single quotation mark"? Do you think that is even the most
>> > common use case for ' in old-fashioned plain text, whether
>> > typewriter or computer? ", yes, but '? I don't think so.
>>
>> Given the English’ propensity to still not use contractions, but to
>> use single quotation marks when quoting, it’s not as clear cut as you
>> suggest either.
>
> Sorry, but I don't know what that means, or how it relates to
> what you quoted from me. Is there a propensity in English not
> to use contractions? Maybe in some academic writing. Not in
> general, to my knowledge.
My point is that the English will use single quotes a lot more than
Americans, given that they use them for the first level of quoting.
As they also tend to shy away from contractions, in far more areas
than academic writing, the factor between using the symbol we’re
discussing as a quoting device and as a means of displaying
contractions is also different.
> Are you arguing that you think that ' has been used mainly for
> quotation and not as an apostrophe? Perhaps in Britain, where
> single quotation marks are apparently used at the top level,
> but not in the US, is my guess. I think you will find relatively
> few uses of single quotation marks in American English, and
> relatively many uses of an apostrophe. But it's not an important
> point.
I’m not arguing anything. I just wanted to point out that what you
said is not true across the board, even when considering the same
language.
> And I never said it was "clear cut" - I said "I don't
> think so", and I _asked_ what Per thought.
No, you asked, then you _told_ him what _you_ thought. There’s a
rather large difference between the two in how I, as a reader,
interpret what you wrote, so even if you intended to say what you said
you intended, that’s not how a reader would understand it.
I didn’t reply to create further reasons for argument in this thread,
so I’m sorry if that’s been the result. I think the point you’ve
raised in regard to U+2019 not being an especially well chosen
apostrophe is valid and that U+02BC was, perhaps, a better choice. In
the end, they went with what was easier for software current at the
time to handle, thus falling victim to the same sins that their
forebears did. That said, continuing to use the worst of the three
(U+0027) is not something that I agree with.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 9:37 ` Paul Eggert
2015-12-28 10:16 ` Drew Adams
@ 2015-12-28 16:31 ` Eli Zaretskii
1 sibling, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-28 16:31 UTC (permalink / raw)
To: Paul Eggert; +Cc: per, drew.adams, emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 28 Dec 2015 01:37:20 -0800
>
> (insert ";; אבגדה\n")
> ...
> Emacs already behaves slightly differently in major modes that
> derive from prog-mode,
>
> Sure, if one includes the characters that delimit a comment, as in
> that example. But characters within a comment are generally treated
> as text by Emacs, and this is a good thing. Users should not need to
> learn different rules for text-within-a-comment as opposed to other
> text.
I've obviously failed to drive the point home, because I thought I'd
shown an example where _characters_ in a comment are treated by Emacs
not-exactly-like-any-text: we bend the UBA rules for characters in
program comments, but not for characters in text modes. That is not
triggered by the comment delimiters, it's triggered by the major mode
in effect. (The same will happen with strings in a program source.)
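A minimal illustration of the mode-dependent behavior described above,
on the assumption that the rule-bending in question is prog-mode fixing
the paragraph direction to left-to-right instead of letting the
bidirectional algorithm infer it from the text. The helper name
my-paragraph-direction-in is made up for this sketch;
bidi-paragraph-direction is the existing buffer-local variable.

```elisp
;; Hypothetical helper, not part of Emacs: report the value of
;; `bidi-paragraph-direction' after enabling MODE in a temporary buffer.
(defun my-paragraph-direction-in (mode)
  "Return `bidi-paragraph-direction' as set up by major mode MODE."
  (with-temp-buffer
    (funcall mode)
    ;; Same text as the quoted example: a comment holding Hebrew letters.
    (insert ";; אבגדה\n")
    bidi-paragraph-direction))

(my-paragraph-direction-in 'prog-mode)  ; => left-to-right
(my-paragraph-direction-in 'text-mode)  ; nil by default: direction is inferred
```

The comment delimiters play no part here: the same text is inserted in
both buffers, and only the major mode differs.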
As for users having to learn different rules: they don't, not wrt
using quote characters. Users are free to use whatever characters
they feel like in the comments they write. It's what Emacs does that
we have been talking about, not users.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-28 14:59 ` Nikolai Weibull
@ 2015-12-28 18:39 ` Drew Adams
2015-12-29 9:06 ` Alan Mackenzie
1 sibling, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-28 18:39 UTC (permalink / raw)
To: Nikolai Weibull; +Cc: Per Starbäck, Emacs Developers
> >> > Any evidence for your claim that ' is in Unicode only for
> >> > compatibility between "left single quotation mark" and "right
> >> > single quotation mark"? Do you think that is even the most
> >> > common use case for ' in old-fashioned plain text, whether
> >> > typewriter or computer? ", yes, but '? I don't think so.
^^^^^^^^^^^^^^^^
> > And I never said it was "clear cut" - I said "I don't
> > think so", and I _asked_ what Per thought.
>
> No, you asked, then you _told_ him what _you_ thought.
Yes, what I _think_. And I still think so, even given your
welcome guess that it might not be so true for British writers
as for Americans. I never said or suggested that it was clear
cut or obvious. On this, so far, all has been only conjecture.
The real question (to Per) was whether there is evidence for
the claim that ' is part of Unicode only for quotation-mark
compatibility and not for compatibility between quotation
mark and apostrophe, in spite of its Unicode character name
(which is APOSTROPHE).
I added the subsidiary question about main past typewriter
and computer use because I do _think_ that ' has probably
been used more as apostrophe, which would tend to support
its incorporation into Unicode (with the name APOSTROPHE!)
for compatibility that includes apostrophe.
I would think that claiming that the name is a mistake or
inaccurate wrt the intention, would call for some support.
Even an exchange in a Unicode mailing list where someone
suggests that the name is misguided would offer support.
> There’s a rather large difference between the two in how I, as a reader,
> interpret what you wrote, so even if you intended to say what you said
> you intended, that’s not how a reader would understand it.
Do you see anywhere where I said or suggested that this
question is clear cut? I'm writing pretty fast, to keep
up with the friendly replies of several of you ;-), but I
don't think I ever suggested such a thing by what I've
written.
If I did, let me correct that impression by emphasizing:
It is my _guess_ that most of the occurrences of ' (ASCII
apostrophe) "in old-fashioned plain text, whether typewriter
or computer" are for uses as apostrophe and not as closing
quotation mark.
On this particular (not so important) question, we are all
just guessing, so far. (And BTW, I did not ask Per for
evidence wrt this extra question, but only what he thinks.)
> I didn’t reply to create further reasons for argument in
> this thread, so I’m sorry if that’s been the result.
The thread has been a bit contentious at times. Apparently
this is a hot button. I don't think anyone has tried to get
excited about the question, but arguments have not always
remained 100% cool and logical.
I'm not a linguist or a Unicode expert. As one Emacs user,
I support Eli's decision to use ' and not ’ in *scratch*.
I don't think that Emacs must necessarily follow what Unicode
"prefers" wrt using a given character as an apostrophe, but
it can make up its own mind, which can be context-dependent
and which should (hopefully) take Emacs's own needs as a text
editor and programming environment into consideration.
> I think the point you’ve
> raised in regard to U+2019 not being an especially well chosen
> apostrophe is valid and that U+02BC was, perhaps, a better choice. In
> the end, they went with what was easier for software current at the
> time to handle, thus falling victim to the same sins that their
> forebears did.
And yes, there are real problems with using U+02BC currently
(e.g., tool and font support), which make it not a good choice
for Emacs either. (I did not suggest that Emacs use it as such.)
> That said, continuing to use the worst of the three (U+0027)
> is not something that I agree with.
Maybe we can agree to disagree about that. I don't think it
is the worst in general - for Emacs, primarily because of its
much greater ease of use.
Reading about the problems of text-processing systems in dealing
with U+2019 for things like spelling (it is not a character that
is considered part of a word), and seeing the hoops that we are
now trying to jump through with Emacs to support search & replace
for it properly, does not make me confident that we should be
broadcasting it everywhere now as our apostrophe.
There is urgency to completely support Unicode for users. And
Emacs has pretty much done that. And it will be good to further
support Unicode by improving search and replace that involves
such characters.
But there should be no urgency to impose such characters on
users in contexts where we do not need to at present. And
*scratch* is a perfect example of such a context. (IMHO)
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-27 3:44 ` Eli Zaretskii
2015-12-27 8:12 ` Nikolai Weibull
@ 2015-12-28 20:04 ` John Wiegley
2015-12-29 6:50 ` Richard Stallman
` (2 more replies)
1 sibling, 3 replies; 62+ messages in thread
From: John Wiegley @ 2015-12-28 20:04 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]
>>>>> Eli Zaretskii <eliz@gnu.org> writes:
> Feel free to commit this if you must, although it feels almost ridiculous to
> me. An apostrophe is just a character, we shouldn't develop a mania about
> characters like that.
I agree with Eli here.
In fact, the whole issue of `' vs. Unicode printing characters seems like a
horribly bad decision. The amount of development time, bugs, documentation
issues, and discussion, that has been spent on a minor typographical point
that I imagine few people actually care about, was a mistake. Rather than
solving any real problems, we've created a whole bunch of annoying new
problems for ourselves, few of which have obvious answers -- or else these
threads wouldn't grow so long every time.
`This' reads perfectly fine to me, and is what I prefer to see. But that ship
has proverbially sailed; otherwise, I'd suggest we reverse this whole debacle,
and go back to the original behavior before any more time is lost.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 20:04 ` John Wiegley
@ 2015-12-29 6:50 ` Richard Stallman
2015-12-29 16:55 ` John Wiegley
2015-12-29 15:05 ` Random832
[not found] ` <<n5u7gn$6vh$1@ger.gmane.org>
2 siblings, 1 reply; 62+ messages in thread
From: Richard Stallman @ 2015-12-29 6:50 UTC (permalink / raw)
To: John Wiegley; +Cc: eliz, eggert, Emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> `This' reads perfectly fine to me, and is what I prefer to see. But that ship
> has proverbially sailed; otherwise, I'd suggest we reverse this whole debacle,
> and go back to the original behavior before any more time is lost.
I think we should poll the users. We could set the default either
way depending on what the users prefer.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 10:16 ` Drew Adams
@ 2015-12-29 7:05 ` Random832
2015-12-29 8:01 ` Yuri Khan
` (2 more replies)
0 siblings, 3 replies; 62+ messages in thread
From: Random832 @ 2015-12-29 7:05 UTC (permalink / raw)
To: emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
> (And yes, _personally_, I would prefer that we make life easier for
> users in Info and *Help* and *Messages* and ... too, if that were
> still possible. But that's not what this discussion is about.)
To take a step back and think about solving both problems... What
about a list of characters - to include, by default, quotation marks
and apostrophes and maybe a few other things like dashes - that are
allowed to be silently mapped to best-fit ASCII equivalents when
writing a buffer in a coding system that does not support them?
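
The mapping proposed above could be sketched roughly as follows. This is
only an illustration under stated assumptions: my-ascii-best-fit-alist
and my-best-fit-string are hypothetical names that do not exist in
Emacs, the table contents are a guess at what such a list might hold,
and the sketch returns a new string instead of touching any buffer.
encode-coding-char is an existing function that returns nil when the
coding system cannot encode the character.

```elisp
;; Hypothetical table, not part of Emacs: characters that may be
;; replaced by ASCII text when the target coding system lacks them.
(defvar my-ascii-best-fit-alist
  '((#x2018 . "'")    ; LEFT SINGLE QUOTATION MARK
    (#x2019 . "'")    ; RIGHT SINGLE QUOTATION MARK
    (#x201C . "\"")   ; LEFT DOUBLE QUOTATION MARK
    (#x201D . "\"")   ; RIGHT DOUBLE QUOTATION MARK
    (#x2013 . "-")    ; EN DASH
    (#x2014 . "--"))  ; EM DASH
  "Alist mapping character codes to ASCII replacement strings.")

(defun my-best-fit-string (string coding-system)
  "Return a copy of STRING with listed unencodable characters replaced.
Only characters that appear in `my-ascii-best-fit-alist' and that
CODING-SYSTEM cannot encode are replaced; everything else is kept."
  (mapconcat
   (lambda (ch)
     (let ((fallback (cdr (assq ch my-ascii-best-fit-alist))))
       (if (and fallback (null (encode-coding-char ch coding-system)))
           fallback
         (char-to-string ch))))
   string ""))

;; Degraded only when the coding system cannot represent U+2019.
(my-best-fit-string "don\u2019t" 'us-ascii)
(my-best-fit-string "don\u2019t" 'utf-8)
```

Characters outside the table are passed through unchanged, so anything
else that cannot be encoded would still be reported in the usual way
when the buffer is saved.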
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 7:05 ` Random832
@ 2015-12-29 8:01 ` Yuri Khan
2015-12-29 14:38 ` Random832
` (2 more replies)
2015-12-29 15:55 ` Eli Zaretskii
2015-12-29 17:40 ` Drew Adams
2 siblings, 3 replies; 62+ messages in thread
From: Yuri Khan @ 2015-12-29 8:01 UTC (permalink / raw)
To: Random832; +Cc: Emacs developers
On Tue, Dec 29, 2015 at 1:05 PM, Random832 <random832@fastmail.com> wrote:
> To take a step back and think about solving both problems... What
> about a list of characters - to include, by default, quotation marks
> and apostrophes and maybe a few other things like dashes - that are
> allowed to be silently mapped to best-fit ASCII equivalents when
> writing a buffer in a coding system that does not support them?
Bad idea. If the user fixes typography in a file and forgets to change
the target encoding before saving, they risk losing their work.
Emacs could opt to auto-degrade its generated text when initializing
the *scratch* buffer when its coding system cannot represent curly
quotes, but that’s the limit.
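
A rough sketch of that narrower option, choosing between two variants of
the generated text when *scratch* is initialized. The function name
my-pick-scratch-message and the idea of keeping two variants of the
message are assumptions made for illustration; unencodable-char-position
is an existing function that returns nil when the whole string can be
encoded.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-pick-scratch-message (curly ascii coding-system)
  "Return CURLY if CODING-SYSTEM can encode all of it, else ASCII.
CURLY and ASCII are two variants of the same message text."
  (if (unencodable-char-position 0 (length curly) coding-system nil curly)
      ascii
    curly))

;; Only text that Emacs itself generates is affected; nothing the user
;; typed is ever rewritten.
(my-pick-scratch-message
 ";; notes you don\u2019t want to save"
 ";; notes you don't want to save"
 'us-ascii)
```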
<contentious suggestion>
Alternatively, new buffers could be initially created with the UTF-8
encoding, leaving the locale-specified encoding only for dealing with
legacy files.
</contentious suggestion>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 14:59 ` Nikolai Weibull
2015-12-28 18:39 ` Drew Adams
@ 2015-12-29 9:06 ` Alan Mackenzie
1 sibling, 0 replies; 62+ messages in thread
From: Alan Mackenzie @ 2015-12-29 9:06 UTC (permalink / raw)
To: Nikolai Weibull; +Cc: emacs-tangents, Per Starbäck, Drew Adams
[ redirected to emacs-tangents ]
Hello, Nikolai
On Mon, Dec 28, 2015 at 03:59:00PM +0100, Nikolai Weibull wrote:
> My point is that the English will use single quotes a lot more than
> Americans, given that they use them for the first level of quoting.
Well, I'm not quite English, but I grew up in England and was educated
there. I find myself using double quotes at the first level, always.
> As they also tend to shy away from contractions, in far more areas
> than academic writing, ....
I'm, you're, she's, we're, you're, they're, I've, you've, it's, we've,
they've, can't, won't, wouldn't, couldn't, amn't, aren't, isn't,
weren't, I'll've, you'll've, .... are part of my normal working
vocabulary. Where do you get the idea that the English avoid
contractions? (That's a genuine question, not a rhetorical one.)
[ .... ]
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 8:01 ` Yuri Khan
@ 2015-12-29 14:38 ` Random832
2015-12-29 15:58 ` Eli Zaretskii
2015-12-29 17:05 ` Paul Eggert
2 siblings, 0 replies; 62+ messages in thread
From: Random832 @ 2015-12-29 14:38 UTC (permalink / raw)
To: Yuri Khan; +Cc: Emacs developers
On Tue, Dec 29, 2015, at 03:01, Yuri Khan wrote:
> On Tue, Dec 29, 2015 at 1:05 PM, Random832 <random832@fastmail.com>
> wrote:
>
> > To take a step back and think about solving both problems... What
> > about a list of characters - to include, by default, quotation marks
> > and apostrophes and maybe a few other things like dashes - that are
> > allowed to be silently mapped to best-fit ASCII equivalents when
> > writing a buffer in a coding system that does not support them?
>
> Bad idea. If the user fixes typography in a file and forgets to change
> the target encoding before saving, they risk losing their work.
Okay, how about, it prints an echo area message (but no confirmation
question, so no workflow interruption), and does not actually alter the
buffer (so they can save again after fixing the encoding, without having
lost anything)?
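
As a sketch of that variant: degrade only the copy that is written,
leave the buffer alone, and report the substitution in the echo area.
The name my-write-best-fit is hypothetical, only the two single
quotation marks are handled to keep it short, and a real implementation
would have to hook into the normal save path instead of being a
separate function.

```elisp
;; Hypothetical command body, not part of Emacs.
(defun my-write-best-fit (file coding-system)
  "Write the current buffer to FILE using CODING-SYSTEM.
Curly single quotes are written as ASCII apostrophes, but the buffer
text itself is left unchanged."
  (let* ((text (buffer-substring-no-properties (point-min) (point-max)))
         ;; Work on a copy of the text, never on the buffer.
         (degraded (replace-regexp-in-string "[\u2018\u2019]" "'" text t t))
         (coding-system-for-write coding-system))
    ;; When its first argument is a string, `write-region' writes that
    ;; string instead of buffer contents.
    (write-region degraded nil file)
    (unless (string= text degraded)
      (message "Some quotation marks were written as ASCII apostrophes"))))
```

Because the buffer still holds the original characters, saving again
after switching to a suitable coding system loses nothing.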
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-28 20:04 ` John Wiegley
2015-12-29 6:50 ` Richard Stallman
@ 2015-12-29 15:05 ` Random832
2015-12-29 16:49 ` John Wiegley
[not found] ` <<n5u7gn$6vh$1@ger.gmane.org>
2 siblings, 1 reply; 62+ messages in thread
From: Random832 @ 2015-12-29 15:05 UTC (permalink / raw)
To: emacs-devel
On 2015-12-28, John Wiegley <jwiegley@gmail.com> wrote:
> `This' reads perfectly fine to me, and is what I prefer to see.
What does `This' actually look like to you? Please post a screenshot.
Or don't, because I can guess; I'm just emphasizing the importance of
considering that these characters don't look the same to everyone.
And most users, today, use fonts in which `This' does _not_ look fine,
and 'This' looks much better. The closest I've ever seen were PC
console fonts, which still had ` placed higher vertically, even though
they otherwise looked like paired quotes.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 7:05 ` Random832
2015-12-29 8:01 ` Yuri Khan
@ 2015-12-29 15:55 ` Eli Zaretskii
2015-12-29 17:40 ` Drew Adams
2 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-29 15:55 UTC (permalink / raw)
To: Random832; +Cc: emacs-devel
> From: Random832 <random832@fastmail.com>
> Date: Tue, 29 Dec 2015 02:05:55 -0500
>
> To take a step back and think about solving both problems... What
> about a list of characters - to include, by default, quotation marks
> and apostrophes and maybe a few other things like dashes - that are
> allowed to be silently mapped to best-fit ASCII equivalents when
> writing a buffer in a coding system that does not support them?
That could only be an optional feature, off by default. Emacs
should never change the text behind user's back, except when
explicitly asked to do so.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 8:01 ` Yuri Khan
2015-12-29 14:38 ` Random832
@ 2015-12-29 15:58 ` Eli Zaretskii
2015-12-29 17:05 ` Paul Eggert
2 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-29 15:58 UTC (permalink / raw)
To: Yuri Khan; +Cc: random832, emacs-devel
> From: Yuri Khan <yuri.v.khan@gmail.com>
> Date: Tue, 29 Dec 2015 14:01:48 +0600
> Cc: Emacs developers <emacs-devel@gnu.org>
>
> <contentious suggestion>
> Alternatively, new buffers could be initially created with the UTF-8
> encoding, leaving the locale-specified encoding only for dealing with
> legacy files.
> </contentious suggestion>
IMO, this is a non-starter. It took us a lot of time and errors to
arrive at the current defaults in encoding and decoding text in
various situations; we should only change those defaults if users
complain (which would mean UTF-8 completed conquering the world). We
are not there yet.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 15:05 ` Random832
@ 2015-12-29 16:49 ` John Wiegley
0 siblings, 0 replies; 62+ messages in thread
From: John Wiegley @ 2015-12-29 16:49 UTC (permalink / raw)
To: Random832; +Cc: emacs-devel
>>>>> Random832 <random832@fastmail.com> writes:
> What does `This' actually look like to you? Please post a screenshot. Or
> don't, because I can guess; I'm just emphasizing the importance of
> considering that these characters don't look the same to everyone.
In printed text, it looks horrible. But in *Help* documentation, it actually
emphasizes to me that a symbol name is being indicated, rather than a normal
word that's being single-quoted. That's why I like it; aesthetics have nothing
to do with my preference.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 6:50 ` Richard Stallman
@ 2015-12-29 16:55 ` John Wiegley
2015-12-29 17:30 ` Paul Eggert
0 siblings, 1 reply; 62+ messages in thread
From: John Wiegley @ 2015-12-29 16:55 UTC (permalink / raw)
To: Richard Stallman; +Cc: eliz, eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]
>>>>> Richard Stallman <rms@gnu.org> writes:
>> `This' reads perfectly fine to me, and is what I prefer to see. But that
>> ship has proverbially sailed; otherwise, I'd suggest we reverse this whole
>> debacle, and go back to the original behavior before any more time is lost.
> I think we should poll the users. We could set the default either way
> depending on what the users prefer.
The problem isn't the default: it's having to consider all the ramifications
of this change, which continue to pop up in unexpected places. Even if we turn
it off by default, we'll still be fielding bug reports from those who turn it
on.
Since I wasn't present for this change, I'd like to know:
a. How much work is left until this feature is truly "complete"? One
indicator that it's not yet is how many times it's been thought to be so.
b. How much work would it be to revert the whole thing and go back to using
`foo' for symbol quoting?
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 8:01 ` Yuri Khan
2015-12-29 14:38 ` Random832
2015-12-29 15:58 ` Eli Zaretskii
@ 2015-12-29 17:05 ` Paul Eggert
2015-12-29 18:00 ` Drew Adams
2015-12-29 18:16 ` Eli Zaretskii
2 siblings, 2 replies; 62+ messages in thread
From: Paul Eggert @ 2015-12-29 17:05 UTC (permalink / raw)
To: Yuri Khan, Random832; +Cc: Emacs developers
Yuri Khan wrote:
> <contentious suggestion>
> Alternatively, new buffers could be initially created with the UTF-8
> encoding, leaving the locale-specified encoding only for dealing with
> legacy files.
> </contentious suggestion>
It would be reasonable to add an option to Emacs to behave that way, though we
should probably not make this behavior the default (at least, not without having
more experience in its use).
As for UTF-8 “conquering the world”, obviously it hasn’t done that yet, and
Emacs will need to support legacy encodings for quite some time. That being
said, recent surveys from W3Techs continue to show steady growth for UTF-8 on
the Web, with 86% of websites currently using UTF-8 (up from 82% a year ago).
All other encodings are declining in use, or at best remaining constant to
within the stated precision of measurement (e.g., EUC-KR at 0.4%).
http://w3techs.com/technologies/history_overview/character_encoding
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 16:55 ` John Wiegley
@ 2015-12-29 17:30 ` Paul Eggert
2015-12-29 18:18 ` Drew Adams
2016-01-01 13:29 ` Marcin Borkowski
0 siblings, 2 replies; 62+ messages in thread
From: Paul Eggert @ 2015-12-29 17:30 UTC (permalink / raw)
To: Emacs-devel
John Wiegley wrote:
> in *Help* documentation, it actually
> emphasizes to me that a symbol name is being indicated, rather than a normal
> word that's being single-quoted.
In practice that is not an advantage, since Emacs uses American English with the
convention that single-quoted words are code, so it is easy to distinguish
single-quoted symbol names from American-style double-quoted normal words,
regardless of whether quotes are straight or curved. This is a common convention
in other GNU programs and in GNU documentation.
> a. How much work is left until this feature is truly "complete"? One
> indicator that it's not yet is how many times it's been thought to be so.
Emacs has never been “complete”. That being said, the current implementation
seems to be at a reasonable sweet spot.
> b. How much work would it be to revert the whole thing and go back to using
> `foo' for symbol quoting?
It would take quite a bit of work to do that.
(setq text-quoting-style 'grave) in your .emacs will give you much of the
behavior you want, and I recommend it for users who prefer quoting the
old-fashioned way. This does not suffice to alter *info* buffers, though, as
they are generated by Texinfo. In general, this issue is bigger than just Emacs:
the rest of the GNU world has mostly moved on, and Emacs has been lagging behind.
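
As a rough illustration of what that setting controls, the sketch below
runs the same doc-string-style text through `substitute-command-keys'
under different values of `text-quoting-style'. The helper name
`my-quote-demo' is made up for this example; the variable and its values
`grave', `straight' and `curve' are the ones Emacs 25 provides.

```elisp
(defun my-quote-demo (style)
  "Return the text \"`foo'\" as displayed under quoting STYLE.
STYLE is a value for `text-quoting-style': `grave', `straight' or `curve'."
  (let ((text-quoting-style style))
    (substitute-command-keys "`foo'")))

;; (my-quote-demo 'grave)     keeps the traditional `foo' form
;; (my-quote-demo 'straight)  uses apostrophes on both sides: 'foo'
;; (my-quote-demo 'curve)     uses curved single quotes

;; In an init file, to keep the traditional style everywhere this
;; variable is consulted:
;; (setq text-quoting-style 'grave)
```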
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-29 7:05 ` Random832
2015-12-29 8:01 ` Yuri Khan
2015-12-29 15:55 ` Eli Zaretskii
@ 2015-12-29 17:40 ` Drew Adams
2 siblings, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-29 17:40 UTC (permalink / raw)
To: Random832, emacs-devel
> > (And yes, _personally_, I would prefer that we make life easier for
> > users in Info and *Help* and *Messages* and ... too, if that were
> > still possible. But that's not what this discussion is about.)
>
> To take a step back and think about solving both problems... What
> about a list of characters - to include, by default, quotation marks
> and apostrophes and maybe a few other things like dashes - that are
> allowed to be silently mapped to best-fit ASCII equivalents when
> writing a buffer in a coding system that does not support them?
I don't have an opinion on that proposal, now.
But I will say that the reason for my (personal) opinion about
such buffers is not limited to saving. ` and ' are simply far
easier for users to use - by nearly any definition of "use",
not just saving.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
[not found] ` <<n5u7gn$6vh$1@ger.gmane.org>
@ 2015-12-29 17:46 ` Drew Adams
0 siblings, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-29 17:46 UTC (permalink / raw)
To: Random832, emacs-devel
> What does `This' actually look like to you? Please post a screenshot.
> Or don't, because I can guess; I'm just emphasizing the importance of
> considering that these characters don't look the same to everyone.
The same is true for curly quotes (or nearly any other chars).
Different fonts can, and often do, show the same char differently.
Add to that the fact that different users have different needs.
Add to that the fact that different users have different notions
of beauty or accessibility or visual distinction, or different
perceptions.
Some will prefer one, for some contexts. Some will prefer
another, for some contexts. Some won't care.
> And most users, today, use fonts in which `This' does _not_ look fine,
> and 'This' looks much better.
Define "look fine". Define "looks much better".
Specify the tradeoffs between your "look fine/much better" and
other conditions of usability.
The issue is not at all as simple as personal appreciation
of whether this or that "looks fine" (OK) or "looks much
better."
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-29 17:05 ` Paul Eggert
@ 2015-12-29 18:00 ` Drew Adams
2015-12-29 18:16 ` Eli Zaretskii
1 sibling, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-29 18:00 UTC (permalink / raw)
To: Paul Eggert, Yuri Khan, Random832; +Cc: Emacs developers
> As for UTF-8 “conquering the world”, obviously it hasn’t done that yet, and
> Emacs will need to support legacy encodings for quite some time. That being
> said, recent surveys from W3Techs continue to show steady growth for UTF-8
> on the Web, with 86% of websites currently using UTF-8 (up from 82% a year
> ago). All other encodings are declining in use, or at best remaining
> constant to within the stated precision of measurement (e.g., EUC-KR at 0.4%).
>
> http://w3techs.com/technologies/history_overview/character_encoding
It's not just about what proportion of websites use UTF-8.
Emacs is not a website. We are not redesigning a website.
There are plenty of problems that other _editing_ applications
have with things like Unicode apostrophe/quotation characters.
Such problems promise to be at least as great for Emacs, where
"editing" means so much more than it does for something like
MS Word.
Most importantly, there is _no hurry_ to force Unicode
apostrophes and other problematic choices on Emacs users.
Emacs already supports the use of Unicode fully by _users_.
User choice is primary in Emacs. Trying to nail this down now,
once and for all, as a design-time decision, is a crazy
form of premature conformance - overengineering.
_Wait and see_ more about how Emacs users use Unicode and
what they want and expect.
_Wait and see_ more about how other editing systems deal
with the problematic issues, such as handling of spelling,
search, etc. wrt things like apostrophes.
There is _no hurry_ to cast this stuff in bronze.
Unicode is great. But there is no need to go over the
kool-aid top on this regrooving.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 17:05 ` Paul Eggert
2015-12-29 18:00 ` Drew Adams
@ 2015-12-29 18:16 ` Eli Zaretskii
2015-12-29 19:24 ` Paul Eggert
1 sibling, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2015-12-29 18:16 UTC (permalink / raw)
To: Paul Eggert; +Cc: random832, emacs-devel, yuri.v.khan
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 29 Dec 2015 09:05:10 -0800
> Cc: Emacs developers <emacs-devel@gnu.org>
>
> recent surveys from W3Techs continue to show steady growth for UTF-8 on
> the Web, with 86% of websites currently using UTF-8
On the Web, yes. But that's not necessarily the situation with local
files.
^ permalink raw reply [flat|nested] 62+ messages in thread
* RE: ASCII-only startup message?
2015-12-29 17:30 ` Paul Eggert
@ 2015-12-29 18:18 ` Drew Adams
2016-01-01 13:29 ` Marcin Borkowski
1 sibling, 0 replies; 62+ messages in thread
From: Drew Adams @ 2015-12-29 18:18 UTC (permalink / raw)
To: Paul Eggert, Emacs-devel
> > b. How much work would it be to revert the whole thing and
> > go back to using `foo' for symbol quoting?
>
> It would take quite a bit of work to do that.
The lesson - the moral of the story - is that Emacs Dev should
not have let you shepherd Emacs off the cliff so precipitously.
"Easy does it" should have been the approach. And with continual
interaction with, and feedback from, users, as any changes were
introduced.
There _was_ some pushback from users (who follow emacs-devel) as
you went full-steam ahead. But we were explicitly told to hold
off; that this was all just an "experiment"; and that we could
express ourselves afterward. Well,... Now the "experiment" is
a fait accompli. Congratulations. Emacs Dev has a lesson to
learn here, however (IMHO).
> Emacs has been lagging behind.
This is precisely the mantra that lured Emacs Dev into giving
you the green light, Paul, or at least into taking an attitude
of let's-wait-and-see-how-the-experiment-goes. Going, going,
gone. But still going. Out with the old! In with the new!
Follow me, forward and onward...
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 18:16 ` Eli Zaretskii
@ 2015-12-29 19:24 ` Paul Eggert
0 siblings, 0 replies; 62+ messages in thread
From: Paul Eggert @ 2015-12-29 19:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: random832, emacs-devel, yuri.v.khan
Eli Zaretskii wrote:
>> recent surveys from W3Techs continue to show steady growth for UTF-8 on
>> the Web, with 86% of websites currently using UTF-8
> On the Web, yes. But that's not necessarily the situation with local
> files.
Yes, of course. It is not as easy to survey local files. That being said, in my
experience the natural tendency is for files to have the same encoding locally
that they do on the Web, and for Unicode to grow in popularity in local files
too. You can see this tendency acting in the Emacs source code itself over time,
and in the source code for other GNU projects. Although Emacs will need to
support non-UTF-8 encodings for quite some time, the overall trend toward UTF-8
is clear, and it would be reasonable to add an option to Emacs to prefer UTF-8
in new files in every locale.
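
Individual users can already approximate this in their own init file. The
fragment below is only a per-user configuration sketch, not the new option
being proposed here. It assumes the goal is simply that buffers with no
file coding system of their own are saved as UTF-8.

```elisp
;; Make UTF-8 with Unix line endings the default file coding system
;; for buffers that do not have one yet, such as newly created buffers.
;; A file that is visited still gets the coding system Emacs detects
;; for it, so existing non-UTF-8 files are not re-encoded by this.
(setq-default buffer-file-coding-system 'utf-8-unix)
```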
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2015-12-29 17:30 ` Paul Eggert
2015-12-29 18:18 ` Drew Adams
@ 2016-01-01 13:29 ` Marcin Borkowski
2016-01-01 17:48 ` John Wiegley
1 sibling, 1 reply; 62+ messages in thread
From: Marcin Borkowski @ 2016-01-01 13:29 UTC (permalink / raw)
To: Emacs-devel
On 2015-12-29, at 18:30, Paul Eggert <eggert@cs.ucla.edu> wrote:
> [...] Emacs has been lagging behind.
I don't want to revive the old, old, old thread. Personally, I don't
care too much, provided I can isearch for them. (Though I did prefer
ASCII quotes slightly.) I also think that the amount of time and effort
wasted was horrible. But this does not belong here and now.
I only want to point out that an argument like "___ has been lagging
behind" is so lousy that it's not even an argument. History knows quite
a bunch of situations when apparent "progress" was something so
abhorrent that those who were "lagging behind" are the only ones that do
not need to be ashamed.
Best wishes for 2016,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2016-01-01 13:29 ` Marcin Borkowski
@ 2016-01-01 17:48 ` John Wiegley
2016-01-01 17:50 ` John Wiegley
2016-01-02 8:14 ` Paul Eggert
0 siblings, 2 replies; 62+ messages in thread
From: John Wiegley @ 2016-01-01 17:48 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: Emacs-devel
>>>>> Marcin Borkowski <mbork@mbork.pl> writes:
> I only want to point out that an argument like "___ has been lagging behind"
> is so lousy that it's not even an argument. History knows quite a bunch of
> situations when apparent "progress" was something so abhorrent that those
> who were "lagging behind" are the only ones that do not need to be ashamed.
I want to say that, regardless of whether this comment pertains to the current
discussion or not (I don't know enough to say, really), in general there is
genuine truth here that should not be ignored. What is "shiny and new" can
sometimes be quite meriticious, and is not realized as such until after the
excitement has passed.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2016-01-01 17:48 ` John Wiegley
@ 2016-01-01 17:50 ` John Wiegley
2016-01-02 8:14 ` Paul Eggert
1 sibling, 0 replies; 62+ messages in thread
From: John Wiegley @ 2016-01-01 17:50 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: Emacs-devel
>>>>> John Wiegley <jwiegley@gmail.com> writes:
> sometimes be quite meriticious, and is not realized as such until after the
Argh, spell check not knowing words... meretricious! "apparently attractive
but having in reality no value or integrity".
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: ASCII-only startup message?
2016-01-01 17:48 ` John Wiegley
2016-01-01 17:50 ` John Wiegley
@ 2016-01-02 8:14 ` Paul Eggert
1 sibling, 0 replies; 62+ messages in thread
From: Paul Eggert @ 2016-01-02 8:14 UTC (permalink / raw)
To: Emacs-devel
John Wiegley wrote:
> What is "shiny and new" can
> sometimes be quite meretricious, and is not realized as such until after the
> excitement has passed.
Quite true. Conversely, I can remember folks telling me that plain '\n' as a
line terminator was a newfangled idea and probably wouldn’t catch on. The good
old days!
For what it’s worth here are Emacs’s opinions of the encodings for the files in
emacs-25. It has 453 UTF-8 files, and 37 files with some other text encoding
that is not ASCII.
#files  encoding                 example file
  1546  undecided-unix           INSTALL
  1394  prefer-utf-8-unix        lisp/abbrev.el
   444  utf-8-unix               INSTALL.REPO
   279  no-conversion            etc/images/undo.pbm
    14  iso-2022-7bit-unix       etc/HELLO
     9  chinese-big5-unix        leim/CXTERM-DIC/ZOZY.tit
     8  undecided-dos            nt/configure.bat
     7  chinese-iso-8bit-unix    CXTERM-DIC/QJ.tit
     7  utf-8-emacs-unix         lisp/language/tibetan.el
     2  utf-8-dos                test/etags/html-src/algrthms.html
     1  chinese-big5-dos         leim/MISC-DIC/cangjie-table.b5
     1  chinese-iso-8bit-dos     leim/MISC-DIC/pinyin.map
     1  cp850-unix               src/msdos.c
     1  iso-2022-cn-ext-dos      leim/MISC-DIC/cangjie-table.cns
     1  iso-2022-jp-unix         etc/tutorials/TUTORIAL.ja
     1  japanese-iso-8bit-unix   leim/SKK-DIC/SKK-JISYO.L
     1  japanese-shift-jis-unix  admin/charsets/mapfiles/cns2ucsdkw.txt
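
A tally of this general kind can be approximated with a few lines of Emacs
Lisp. The sketch below is not necessarily how the table above was
produced; the two `my-' functions are hypothetical names, and it assumes
that reading a file with `insert-file-contents' leaves the detected coding
system in `last-coding-system-used'. It needs `directory-files-recursively'
(Emacs 25 or later) and will count every regular file under the directory,
including any version-control metadata.

```elisp
(defun my-detected-coding-system (file)
  "Return the coding system Emacs picks when reading FILE."
  (with-temp-buffer
    (insert-file-contents file)
    last-coding-system-used))

(defun my-tally-coding-systems (directory)
  "Return an alist of (CODING-SYSTEM . COUNT) for files under DIRECTORY.
The result is sorted by decreasing COUNT."
  (let (tally)
    (dolist (file (directory-files-recursively directory ""))
      (let* ((cs (my-detected-coding-system file))
             (cell (assq cs tally)))
        (if cell
            (setcdr cell (1+ (cdr cell)))
          (push (cons cs 1) tally))))
    (sort tally (lambda (a b) (> (cdr a) (cdr b))))))

;; Example call (the path is a placeholder):
;; (my-tally-coding-systems "~/src/emacs")
```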
^ permalink raw reply [flat|nested] 62+ messages in thread
end of thread, other threads:[~2016-01-02 8:14 UTC | newest]
Thread overview: 62+ messages
2015-12-26 17:25 ASCII-only startup message? Paul Eggert
2015-12-26 18:16 ` Eli Zaretskii
2015-12-26 18:41 ` Random832
2015-12-26 18:50 ` Paul Eggert
2015-12-26 19:11 ` Eli Zaretskii
2015-12-26 19:01 ` Eli Zaretskii
2015-12-26 18:45 ` Paul Eggert
2015-12-26 19:10 ` Eli Zaretskii
2015-12-26 19:40 ` Paul Eggert
2015-12-26 20:50 ` Eli Zaretskii
2015-12-26 23:28 ` Paul Eggert
2015-12-27 0:17 ` Drew Adams
2015-12-27 1:03 ` Clément Pit--Claudel
2015-12-27 2:51 ` Drew Adams
2015-12-27 1:09 ` Paul Eggert
2015-12-27 15:56 ` Eli Zaretskii
2015-12-27 18:45 ` Paul Eggert
2015-12-27 6:58 ` Random832
2015-12-27 14:17 ` Per Starbäck
2015-12-27 14:55 ` Drew Adams
2015-12-27 16:35 ` Per Starbäck
2015-12-27 17:42 ` Drew Adams
2015-12-27 19:27 ` Per Starbäck
2015-12-27 22:47 ` Drew Adams
2015-12-27 23:45 ` Per Starbäck
2015-12-28 2:01 ` Drew Adams
2015-12-28 5:51 ` Random832
2015-12-28 10:09 ` Drew Adams
2015-12-28 6:05 ` Per Starbäck
2015-12-28 10:13 ` Drew Adams
2015-12-28 9:12 ` Nikolai Weibull
2015-12-28 10:15 ` Drew Adams
2015-12-28 14:59 ` Nikolai Weibull
2015-12-28 18:39 ` Drew Adams
2015-12-29 9:06 ` Alan Mackenzie
2015-12-28 9:37 ` Paul Eggert
2015-12-28 10:16 ` Drew Adams
2015-12-29 7:05 ` Random832
2015-12-29 8:01 ` Yuri Khan
2015-12-29 14:38 ` Random832
2015-12-29 15:58 ` Eli Zaretskii
2015-12-29 17:05 ` Paul Eggert
2015-12-29 18:00 ` Drew Adams
2015-12-29 18:16 ` Eli Zaretskii
2015-12-29 19:24 ` Paul Eggert
2015-12-29 15:55 ` Eli Zaretskii
2015-12-29 17:40 ` Drew Adams
2015-12-28 16:31 ` Eli Zaretskii
2015-12-27 3:44 ` Eli Zaretskii
2015-12-27 8:12 ` Nikolai Weibull
2015-12-28 20:04 ` John Wiegley
2015-12-29 6:50 ` Richard Stallman
2015-12-29 16:55 ` John Wiegley
2015-12-29 17:30 ` Paul Eggert
2015-12-29 18:18 ` Drew Adams
2016-01-01 13:29 ` Marcin Borkowski
2016-01-01 17:48 ` John Wiegley
2016-01-01 17:50 ` John Wiegley
2016-01-02 8:14 ` Paul Eggert
2015-12-29 15:05 ` Random832
2015-12-29 16:49 ` John Wiegley
[not found] ` <<n5u7gn$6vh$1@ger.gmane.org>
2015-12-29 17:46 ` Drew Adams
[not found] <<567ECD8C.1070408@cs.ucla.edu>