unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Disambiguate modeline character for UTF-8?
@ 2020-08-23 11:46 Ulrich Mueller
  2020-08-23 15:27 ` Stefan Monnier
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2020-08-23 11:46 UTC (permalink / raw)
  To: emacs-devel

Presumably UTF-8 is the most popular coding system today. Nevertheless,
it shares its mnemonic character displayed in the modeline with several
others (while legacy codings like ISO-8859-1 have their unique char):

   U -- utf-8*  (all variants)
   U -- utf-16* (all variants)
   U -- utf-7
   u -- utf-7-imap
   U -- koi8-u

I wonder if this could be disambiguated, such that "U" would be used
exclusively for UTF-8 and its variants. For example, as follows:

   U -- utf-8*  (all variants)
   u -- utf-16* (all variants)
   m -- utf-7*  (Mnemonic: "m" for mail-safe or MIME)
   Y -- koi8-u  (Mnemonic: "Y" looks similar to 1st letter in "Українська")

WDYT?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 11:46 Disambiguate modeline character for UTF-8? Ulrich Mueller
@ 2020-08-23 15:27 ` Stefan Monnier
  2020-08-23 16:07   ` Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Stefan Monnier @ 2020-08-23 15:27 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> Presumably UTF-8 is the most popular coding system today. Nevertheless,
> it shares its mnemonic character displayed in the modeline with several
> others (while legacy codings like ISO-8859-1 have their unique char):
>
>    U -- utf-8*  (all variants)
>    U -- utf-16* (all variants)
>    U -- utf-7
>    u -- utf-7-imap
>    U -- koi8-u

Agreed.

> I wonder if this could be disambiguated, such that "U" would be used
> exclusively for UTF-8 and its variants. For example, as follows:
>
>    U -- utf-8*  (all variants)
>    u -- utf-16* (all variants)
>    m -- utf-7*  (Mnemonic: "m" for mail-safe or MIME)
>    Y -- koi8-u  (Mnemonic: "Y" looks similar to 1st letter in "Українська")
>
> WDYT?

Yay, bikeshedding ;-)

I don't see a strong reason to limit ourselves to a single char, FWIW,
so I think `u7` is fine for utf-7* (it should be very rare anyway).


        Stefan




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 15:27 ` Stefan Monnier
@ 2020-08-23 16:07   ` Eli Zaretskii
  2020-08-23 18:24     ` Paul Eggert
  2020-08-24 18:35     ` Juri Linkov
  0 siblings, 2 replies; 97+ messages in thread
From: Eli Zaretskii @ 2020-08-23 16:07 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ulm, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 23 Aug 2020 11:27:22 -0400
> Cc: emacs-devel@gnu.org
> 
> I don't see a strong reason to limit ourselves to a single char, FWIW,
> so I think `u7` is fine for utf-7* (it should be very rare anyway).

It must be a single character, but OTOH it doesn't have to be an ASCII
character.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 16:07   ` Eli Zaretskii
@ 2020-08-23 18:24     ` Paul Eggert
  2020-08-23 18:53       ` Ulrich Mueller
  2020-08-24 18:35     ` Juri Linkov
  1 sibling, 1 reply; 97+ messages in thread
From: Paul Eggert @ 2020-08-23 18:24 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: ulm, emacs-devel

On 8/23/20 9:07 AM, Eli Zaretskii wrote:
> it doesn't have to be an ASCII
> character.

OK, then how about this refinement of Ulrich's suggestion?

  U -- utf-8*  (all variants)
  W -- utf-16* (all variants)
  Ǔ -- utf-7                  U+01D3 LATIN CAPITAL LETTER U WITH CARON
  ǔ -- utf-7-imap             U+01D4 LATIN SMALL LETTER U WITH CARON
  У -- koi8-u                 U+0423 CYRILLIC CAPITAL LETTER U

W because it's double-U, and a caron because it looks like a 7 rotated.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 18:24     ` Paul Eggert
@ 2020-08-23 18:53       ` Ulrich Mueller
  2020-08-23 18:56         ` Eli Zaretskii
  2020-08-23 18:57         ` Eli Zaretskii
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2020-08-23 18:53 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, Stefan Monnier, emacs-devel

>>>>> On Sun, 23 Aug 2020, Paul Eggert wrote:

> On 8/23/20 9:07 AM, Eli Zaretskii wrote:
>> it doesn't have to be an ASCII character.

I really didn't want to open that can of worms. Now we can have endless
bikeshedding. :)

Also, shouldn't one be extra conservative for the characters displayed
in the modeline, as not all systems may be capable of displaying the
full unicode repertoire?

> OK, then how about this refinement of Ulrich's suggestion?

>  U -- utf-8*  (all variants)
>  W -- utf-16* (all variants)
>  Ǔ -- utf-7                  U+01D3 LATIN CAPITAL LETTER U WITH CARON
>  ǔ -- utf-7-imap             U+01D4 LATIN SMALL LETTER U WITH CARON
>  У -- koi8-u                 U+0423 CYRILLIC CAPITAL LETTER U

> W because it's double-U, and a caron because it looks like a 7 rotated.

W is already used for iso-latin-8 aka iso-8859-14.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 18:53       ` Ulrich Mueller
@ 2020-08-23 18:56         ` Eli Zaretskii
  2020-08-23 18:57         ` Eli Zaretskii
  1 sibling, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2020-08-23 18:56 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: eggert, monnier, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  Stefan Monnier
>  <monnier@iro.umontreal.ca>,  emacs-devel@gnu.org
> Date: Sun, 23 Aug 2020 20:53:36 +0200
> 
> >>>>> On Sun, 23 Aug 2020, Paul Eggert wrote:
> 
> > On 8/23/20 9:07 AM, Eli Zaretskii wrote:
> >> it doesn't have to be an ASCII character.
> 
> I really didn't want to open that can of worms. Now we can have endless
> bikeshedding. :)

It was you who started it.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 18:53       ` Ulrich Mueller
  2020-08-23 18:56         ` Eli Zaretskii
@ 2020-08-23 18:57         ` Eli Zaretskii
  2020-08-23 19:13           ` Ulrich Mueller
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2020-08-23 18:57 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: eggert, monnier, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Date: Sun, 23 Aug 2020 20:53:36 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>,
>  emacs-devel@gnu.org
> 
> Also, shouldn't one be extra conservative for the characters displayed
> in the modeline, as not all systems may be capable of displaying the
> full unicode repertoire?

I just said we could do it, I didn't say we should.  From my POV, we
could simply let these sleeping dogs lie, I very much doubt that many
users even look at these indicators.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 18:57         ` Eli Zaretskii
@ 2020-08-23 19:13           ` Ulrich Mueller
  2020-08-23 19:42             ` Eli Zaretskii
  2020-08-23 19:47             ` Stefan Kangas
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2020-08-23 19:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, monnier, emacs-devel

>>>>> On Sun, 23 Aug 2020, Eli Zaretskii wrote:

>> Also, shouldn't one be extra conservative for the characters
>> displayed in the modeline, as not all systems may be capable of
>> displaying the full unicode repertoire?

> I just said we could do it, I didn't say we should.  From my POV, we
> could simply let these sleeping dogs lie, I very much doubt that many
> users even look at these indicators.

I stumbled upon this when updating a short section about Emacs in the
Gentoo developer manual, where I realised that I cannot say that "-"
and "U" in the modeline indicate ASCII and UTF-8, respectively.

IMHO these two are the most important ones nowadays, so they should be
unique. I don't really care about the rest (maybe "1" is still somewhat
important here in Europe), and tried to change as little as possible in
my suggestion. Namely, only move the ones colliding with "U" out of the
way and otherwise stay with ASCII.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 19:13           ` Ulrich Mueller
@ 2020-08-23 19:42             ` Eli Zaretskii
  2020-08-23 21:23               ` Stefan Monnier
  2020-08-23 19:47             ` Stefan Kangas
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2020-08-23 19:42 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: eggert, monnier, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: eggert@cs.ucla.edu,  monnier@iro.umontreal.ca,  emacs-devel@gnu.org
> Date: Sun, 23 Aug 2020 21:13:25 +0200
> 
> IMHO these two are the most important ones nowadays, so they should be
> unique. I don't really care about the rest (maybe "1" is still somewhat
> important here in Europe), and tried to change as little as possible in
> my suggestion. Namely, only move the ones colliding with "U" out of the
> way and otherwise stay with ASCII.

I'm asking whether this whole issue is important enough to trigger yet
another round of endless arguments and gratuitous changes for very
little gain.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 19:13           ` Ulrich Mueller
  2020-08-23 19:42             ` Eli Zaretskii
@ 2020-08-23 19:47             ` Stefan Kangas
  1 sibling, 0 replies; 97+ messages in thread
From: Stefan Kangas @ 2020-08-23 19:47 UTC (permalink / raw)
  To: Ulrich Mueller, Eli Zaretskii; +Cc: eggert, monnier, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> I stumbled upon this when updating a short section about Emacs in the
> Gentoo developer manual, where I realised that I cannot say that "-"
> and "U" in the modeline indicate ASCII and UTF-8, respectively.
>
> IMHO these two are the most important ones nowadays, so they should be
> unique. I don't really care about the rest (maybe "1" is still somewhat
> important here in Europe), and tried to change as little as possible in
> my suggestion. Namely, only move the ones colliding with "U" out of the
> way and otherwise stay with ASCII.

Do we even need an indicator for UTF-8?  I find that I only need to know
when it's something else.  But maybe I'm missing something obvious.

Best regards,
Stefan Kangas



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 19:42             ` Eli Zaretskii
@ 2020-08-23 21:23               ` Stefan Monnier
  2020-08-24  7:06                 ` Ulrich Mueller
  0 siblings, 1 reply; 97+ messages in thread
From: Stefan Monnier @ 2020-08-23 21:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ulrich Mueller, eggert, emacs-devel

> I'm asking whether this whole issue is important enough to trigger yet
> another round of endless arguments and gratuitous changes for very
> little gain.

I would appreciate it if utf-16 and utf-7 (those werd things from which
I'd rather stay away) is made somehow different from utf-8.


        Stefan




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 21:23               ` Stefan Monnier
@ 2020-08-24  7:06                 ` Ulrich Mueller
  2020-08-24 14:30                   ` Yuri Khan
  2020-08-24 14:36                   ` Drew Adams
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2020-08-24  7:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Sun, 23 Aug 2020, Stefan Monnier wrote:

>> I'm asking whether this whole issue is important enough to trigger yet
>> another round of endless arguments and gratuitous changes for very
>> little gain.

> I would appreciate it if utf-16 and utf-7 (those werd things from which
> I'd rather stay away) is made somehow different from utf-8.

The smallest change to achieve this would be to change both utf-16 and
utf-7 from "U" to "u" (and koi8-u to "Y").



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24  7:06                 ` Ulrich Mueller
@ 2020-08-24 14:30                   ` Yuri Khan
  2020-08-29 11:17                     ` Ulrich Mueller
  2020-08-24 14:36                   ` Drew Adams
  1 sibling, 1 reply; 97+ messages in thread
From: Yuri Khan @ 2020-08-24 14:30 UTC (permalink / raw)
  To: Ulrich Mueller
  Cc: Eli Zaretskii, Paul Eggert, Stefan Monnier, Emacs developers

On Mon, 24 Aug 2020 at 14:07, Ulrich Mueller <ulm@gentoo.org> wrote:

> The smallest change to achieve this would be to change both utf-16 and
> utf-7 from "U" to "u" (and koi8-u to "Y").

The letter Y has nothing in common with Ukraine (country) and
Ukrainian (language). Pretty much everyone who might be looking at a
file encoded in koi8-u will have fonts with Cyrillic coverage so using
the Cyrillic letter У should be better.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: Disambiguate modeline character for UTF-8?
  2020-08-24  7:06                 ` Ulrich Mueller
  2020-08-24 14:30                   ` Yuri Khan
@ 2020-08-24 14:36                   ` Drew Adams
  2020-08-24 15:23                     ` Ulrich Mueller
  1 sibling, 1 reply; 97+ messages in thread
From: Drew Adams @ 2020-08-24 14:36 UTC (permalink / raw)
  To: Ulrich Mueller, Stefan Monnier; +Cc: Eli Zaretskii, eggert, emacs-devel

> The smallest change to achieve this would be to change both utf-16 and
> utf-7 from "U" to "u" (and koi8-u to "Y").

Not really wanting to get into this particular
bike-shed discussion, as I don't care about it
and don't have a suggestion of what indicators
to use.

I'll just say this, as some have suggested that
one main thing they want is to be able to easily
and quickly tell whether the encoding is NOT
utf-8 (and not ASCII, presumably):

The characters "u" and "U" are not so easily
distinguished.  You might want to pick some
other, quite different looking, character for
the non-UTF-8 (i.e., UTF-16 etc.).



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24 14:36                   ` Drew Adams
@ 2020-08-24 15:23                     ` Ulrich Mueller
  2020-08-24 16:43                       ` Stefan Monnier
  2023-07-05 10:08                       ` Ulrich Mueller
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2020-08-24 15:23 UTC (permalink / raw)
  To: Drew Adams; +Cc: Eli Zaretskii, eggert, Stefan Monnier, emacs-devel

>>>>> On Mon, 24 Aug 2020, Drew Adams wrote:

> I'll just say this, as some have suggested that
> one main thing they want is to be able to easily
> and quickly tell whether the encoding is NOT
> utf-8 (and not ASCII, presumably):

> The characters "u" and "U" are not so easily
> distinguished.  You might want to pick some
> other, quite different looking, character for
> the non-UTF-8 (i.e., UTF-16 etc.).

Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
This would be visually unobtrusive, so any uncommon coding system would
stand out against it.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24 15:23                     ` Ulrich Mueller
@ 2020-08-24 16:43                       ` Stefan Monnier
  2023-07-05 10:08                       ` Ulrich Mueller
  1 sibling, 0 replies; 97+ messages in thread
From: Stefan Monnier @ 2020-08-24 16:43 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, Drew Adams, emacs-devel

> Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
> This would be visually unobtrusive, so any uncommon coding system would
> stand out against it.

U1 from me!


        Stefan ;-)




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-23 16:07   ` Eli Zaretskii
  2020-08-23 18:24     ` Paul Eggert
@ 2020-08-24 18:35     ` Juri Linkov
  2020-08-24 18:55       ` Eli Zaretskii
  1 sibling, 1 reply; 97+ messages in thread
From: Juri Linkov @ 2020-08-24 18:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, Stefan Monnier, emacs-devel

>> I don't see a strong reason to limit ourselves to a single char, FWIW,
>> so I think `u7` is fine for utf-7* (it should be very rare anyway).
>
> It must be a single character, but OTOH it doesn't have to be an ASCII
> character.

I don't know where the requirement for a single character comes from,
but since I can't memorize these cryptic characters, I customized
the mode-line to display coding names in full, except a few characters
that I can remember: "U" for UTF-8, and "-" for ASCII:

;; This fix uses mnemonics only for known codings that are frequently used.
;; Otherwise, it displays the full name of the codings.
(setq-default mode-line-mule-info
              `(""
                (current-input-method
                 (:propertize ("" current-input-method-title)
                              local-map ,mode-line-input-method-map
                              mouse-face mode-line-highlight))
                (:eval
                 (propertize
                  (cond
                   ((not (memq buffer-file-coding-system
                               '(no-conversion
                                 undecided-unix
                                 prefer-utf-8-unix
                                 utf-8
                                 utf-8-dos
                                 utf-8-emacs
                                 utf-8-emacs-dos
                                 utf-8-emacs-unix
                                 utf-8-unix)))
                    (replace-regexp-in-string
                     "-\\(?:dos\\|unix\\)$" ""
                     (format "%S" buffer-file-coding-system)))
                   (t "%z"))
                  'help-echo 'mode-line-mule-info-help-echo
                  'mouse-face 'mode-line-highlight
                  'local-map mode-line-coding-system-map))
                (:eval (mode-line-eol-desc))))

A long coding string in the mode-line also serves as a warning that
a non-standard coding is used in the buffer.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24 18:35     ` Juri Linkov
@ 2020-08-24 18:55       ` Eli Zaretskii
  2020-08-25 18:59         ` Juri Linkov
  0 siblings, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2020-08-24 18:55 UTC (permalink / raw)
  To: Juri Linkov; +Cc: ulm, monnier, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  ulm@gentoo.org,
>   emacs-devel@gnu.org
> Date: Mon, 24 Aug 2020 21:35:56 +0300
> 
> >> I don't see a strong reason to limit ourselves to a single char, FWIW,
> >> so I think `u7` is fine for utf-7* (it should be very rare anyway).
> >
> > It must be a single character, but OTOH it doesn't have to be an ASCII
> > character.
> 
> I don't know where the requirement for a single character comes from,

Look at the implementation of %z format on the mode line, and you will
see that it expects a single character.

> but since I can't memorize these cryptic characters, I customized
> the mode-line to display coding names in full, except a few characters
> that I can remember: "U" for UTF-8, and "-" for ASCII:

So on a TTY, you can have "UTF-8UTF-8UTF-8", if all the 3 encodings
are UTF-8?  Or do you only handle buffer-file-coding-system and ignore
the other 2 encodings?

> A long coding string in the mode-line also serves as a warning that
> a non-standard coding is used in the buffer.

It's okay to customize the mode line to your personal needs, but are
you really proposing this for a general-purpose feature in Emacs?
Because then we'd need to start by deciding what is "non-standard" in
this context.  For example, assuming the "standard" encoding is the
one determined by the locale, then if one lives in a non-UTF-8 locale,
they will always see "non-standard" strings in each and every .el file
they ever edit, which doesn't sound like a good idea to me.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24 18:55       ` Eli Zaretskii
@ 2020-08-25 18:59         ` Juri Linkov
  2020-08-25 19:26           ` Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Juri Linkov @ 2020-08-25 18:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, monnier, emacs-devel

>> but since I can't memorize these cryptic characters, I customized
>> the mode-line to display coding names in full, except a few characters
>> that I can remember: "U" for UTF-8, and "-" for ASCII:
>
> So on a TTY, you can have "UTF-8UTF-8UTF-8", if all the 3 encodings
> are UTF-8?  Or do you only handle buffer-file-coding-system and ignore
> the other 2 encodings?

Currently the other 2 encodings are ignored on a TTY, but since
often all 3 encodings are the same, then maybe it would be enough
to display the full name for buffer-file-coding-system,
and mnemonics for the other 2 encodings.

>> A long coding string in the mode-line also serves as a warning that
>> a non-standard coding is used in the buffer.
>
> It's okay to customize the mode line to your personal needs, but are
> you really proposing this for a general-purpose feature in Emacs?
> Because then we'd need to start by deciding what is "non-standard" in
> this context.  For example, assuming the "standard" encoding is the
> one determined by the locale, then if one lives in a non-UTF-8 locale,
> they will always see "non-standard" strings in each and every .el file
> they ever edit, which doesn't sound like a good idea to me.

A list of "standard" codings could be customizable, so every user
could add more codings to it after learning their mnemonic characters.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-25 18:59         ` Juri Linkov
@ 2020-08-25 19:26           ` Eli Zaretskii
  0 siblings, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2020-08-25 19:26 UTC (permalink / raw)
  To: Juri Linkov; +Cc: ulm, monnier, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: monnier@iro.umontreal.ca,  ulm@gentoo.org,  emacs-devel@gnu.org
> Date: Tue, 25 Aug 2020 21:59:00 +0300
> 
> > So on a TTY, you can have "UTF-8UTF-8UTF-8", if all the 3 encodings
> > are UTF-8?  Or do you only handle buffer-file-coding-system and ignore
> > the other 2 encodings?
> 
> Currently the other 2 encodings are ignored on a TTY, but since
> often all 3 encodings are the same, then maybe it would be enough

They might be the same in your locale, but it isn't necessarily the
situation for everyone.

> > It's okay to customize the mode line to your personal needs, but are
> > you really proposing this for a general-purpose feature in Emacs?
> > Because then we'd need to start by deciding what is "non-standard" in
> > this context.  For example, assuming the "standard" encoding is the
> > one determined by the locale, then if one lives in a non-UTF-8 locale,
> > they will always see "non-standard" strings in each and every .el file
> > they ever edit, which doesn't sound like a good idea to me.
> 
> A list of "standard" codings could be customizable, so every user
> could add more codings to it after learning their mnemonic characters.

My point is that "standard" depends on several factors, so a fixed
preferred value is probably not enough.

So I don't think you suggestion is a good idea in general, sorry.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24 14:30                   ` Yuri Khan
@ 2020-08-29 11:17                     ` Ulrich Mueller
  0 siblings, 0 replies; 97+ messages in thread
From: Ulrich Mueller @ 2020-08-29 11:17 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Eli Zaretskii, Paul Eggert, Stefan Monnier, Emacs developers

>>>>> On Mon, 24 Aug 2020, Yuri Khan wrote:

>> The smallest change to achieve this would be to change both utf-16 and
>> utf-7 from "U" to "u" (and koi8-u to "Y").

> The letter Y has nothing in common with Ukraine (country) and
> Ukrainian (language). Pretty much everyone who might be looking at a
> file encoded in koi8-u will have fonts with Cyrillic coverage so using
> the Cyrillic letter У should be better.

I asked a fellow Gentoo developer from Ukraine, and he confirms that
using Y for koi8-u would be "strange". He also made the suggestion to
use K for all koi8*, even if it would collide with Korean. It should be
obvious from the buffer's content if it's (e.g.) Russian or Korean, so
there isn't any real ambiguity.

Interestingly, the following comment (by RMS) in cyrillic.el goes into
the same direction:

  ;; We used to use ?K.  It is true that ?K is more strictly correct,
  ;; but it is also used for Korean.  So people who use koi8 for
  ;; languages other than Russian will have to forgive us.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2020-08-24 15:23                     ` Ulrich Mueller
  2020-08-24 16:43                       ` Stefan Monnier
@ 2023-07-05 10:08                       ` Ulrich Mueller
  2023-07-05 11:41                         ` Eli Zaretskii
  2023-07-05 12:49                         ` Disambiguate modeline character for UTF-8? Stefan Monnier
  1 sibling, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-05 10:08 UTC (permalink / raw)
  To: emacs-devel; +Cc: Drew Adams, Eli Zaretskii, eggert, Stefan Monnier

>>>>> On Mon, 24 Aug 2020, Ulrich Mueller wrote:

>>>>> On Mon, 24 Aug 2020, Drew Adams wrote:
>> I'll just say this, as some have suggested that
>> one main thing they want is to be able to easily
>> and quickly tell whether the encoding is NOT
>> utf-8 (and not ASCII, presumably):

>> The characters "u" and "U" are not so easily
>> distinguished.  You might want to pick some
>> other, quite different looking, character for
>> the non-UTF-8 (i.e., UTF-16 etc.).

> Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
> This would be visually unobtrusive, so any uncommon coding system would
> stand out against it.

Coming back to this thread (which at the time ended in bikeshedding).
The goal I had in mind was to disambiguate UTF-8, i.e. a unique modeline
character would be used for it. Currently this is not the case:

   U -- utf-8*  (all variants)
   U -- utf-16* (all variants)
   U -- utf-7
   U -- koi8-u

So, I propose to change this to either:

   + -- utf-8*  (all variants)
   (everything else unchanged)

or:

   U -- utf-8*  (all variants)
   u -- utf-16* (all variants)
   u -- utf-7
   K -- koi8-u

Note that "K" is also used for Korean. I think that's not a real
conflict, because normally it would be clear from context whether the
buffer's content is Korean or Ukrainian.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 10:08                       ` Ulrich Mueller
@ 2023-07-05 11:41                         ` Eli Zaretskii
  2023-07-05 13:04                           ` Ulrich Mueller
  2023-07-05 12:49                         ` Disambiguate modeline character for UTF-8? Stefan Monnier
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-05 11:41 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Drew Adams <drew.adams@oracle.com>,  Eli Zaretskii <eliz@gnu.org>,
>   eggert@cs.ucla.edu,  Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 05 Jul 2023 12:08:59 +0200
> 
> >>>>> On Mon, 24 Aug 2020, Ulrich Mueller wrote:
> 
> >>>>> On Mon, 24 Aug 2020, Drew Adams wrote:
> >> I'll just say this, as some have suggested that
> >> one main thing they want is to be able to easily
> >> and quickly tell whether the encoding is NOT
> >> utf-8 (and not ASCII, presumably):
> 
> >> The characters "u" and "U" are not so easily
> >> distinguished.  You might want to pick some
> >> other, quite different looking, character for
> >> the non-UTF-8 (i.e., UTF-16 etc.).
> 
> > Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
> > This would be visually unobtrusive, so any uncommon coding system would
> > stand out against it.
> 
> Coming back to this thread (which at the time ended in bikeshedding).
> The goal I had in mind was to disambiguate UTF-8, i.e. a unique modeline
> character would be used for it. Currently this is not the case:
> 
>    U -- utf-8*  (all variants)
>    U -- utf-16* (all variants)
>    U -- utf-7
>    U -- koi8-u
> 
> So, I propose to change this to either:
> 
>    + -- utf-8*  (all variants)
>    (everything else unchanged)
> 
> or:
> 
>    U -- utf-8*  (all variants)
>    u -- utf-16* (all variants)
>    u -- utf-7
>    K -- koi8-u

TBH, I don't like to change such long-time features.

The only real problem is between UTF-8 and UTF-16, since the others
are hardly ever used these days.  UTF-16 is also quite rarely used,
basically only on MS-Windows for system-level files.  So is this
really a problem that we need to solve, at the risk of breaking
people's "muscle" memory?  If I see the lower-case "u" on the
modeline when I expect to see "U" instead, I'd be surprised.  Is it
worth it?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 10:08                       ` Ulrich Mueller
  2023-07-05 11:41                         ` Eli Zaretskii
@ 2023-07-05 12:49                         ` Stefan Monnier
  2023-07-05 13:38                           ` Eli Zaretskii
  2023-07-06 19:07                           ` Filipp Gunbin
  1 sibling, 2 replies; 97+ messages in thread
From: Stefan Monnier @ 2023-07-05 12:49 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel, Drew Adams, Eli Zaretskii, eggert

>    + -- utf-8*  (all variants)
>    (everything else unchanged)
>
> or:
>
>    U -- utf-8*  (all variants)
>    u -- utf-16* (all variants)
>    u -- utf-7
>    K -- koi8-u

They both sound good to me.

BTW, things like 8bit coding systems like koi8 are (and should be)
becoming sufficiently rare nowadays in my experience that we could
consider using non-single-letter thingies for them (I'd even welcome
extra highlighting with some kind of warning color).


        Stefan




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 11:41                         ` Eli Zaretskii
@ 2023-07-05 13:04                           ` Ulrich Mueller
  2023-07-05 13:44                             ` Eli Zaretskii
  2023-07-06 12:27                             ` Po Lu
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-05 13:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel, drew.adams, eggert, monnier

>>>>> On Wed, 05 Jul 2023, Eli Zaretskii wrote:

>> Coming back to this thread (which at the time ended in bikeshedding).
>> The goal I had in mind was to disambiguate UTF-8, i.e. a unique modeline
>> character would be used for it. Currently this is not the case:
>> 
>>    U -- utf-8*  (all variants)
>>    U -- utf-16* (all variants)
>>    U -- utf-7
>>    U -- koi8-u
>> 
>> So, I propose to change this to either:
>> 
>>    + -- utf-8*  (all variants)
>>    (everything else unchanged)
>> 
>> or:
>> 
>>    U -- utf-8*  (all variants)
>>    u -- utf-16* (all variants)
>>    u -- utf-7
>>    K -- koi8-u

> TBH, I don't like to change such long-time features.

> The only real problem is between UTF-8 and UTF-16, since the others
> are hardly ever used these days.  UTF-16 is also quite rarely used,
> basically only on MS-Windows for system-level files.  So is this
> really a problem that we need to solve, at the risk of breaking
> people's "muscle" memory?  If I see the lower-case "u" on the
> modeline when I expect to see "U" instead, I'd be surprised.  Is it
> worth it?

UTF-8 is one of the most common encodings, and it is strange that it
shares its modeline indicator with anything else. And the "U" is really
ambiguous, because context won't help (or how would you decide if a
buffer's file encoding is e.g. koi8-u or utf-8?).

As you say, the others in the above list are rarely used nowadays. So,
maybe users should see the "u" or the "K" to indicate that the file has
an unusual encoding?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 12:49                         ` Disambiguate modeline character for UTF-8? Stefan Monnier
@ 2023-07-05 13:38                           ` Eli Zaretskii
  2023-07-06 19:07                           ` Filipp Gunbin
  1 sibling, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-05 13:38 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ulm, emacs-devel, drew.adams, eggert

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  Drew Adams <drew.adams@oracle.com>,  Eli Zaretskii
>  <eliz@gnu.org>,  eggert@cs.ucla.edu
> Date: Wed, 05 Jul 2023 08:49:44 -0400
> 
> BTW, things like 8bit coding systems like koi8 are (and should be)
> becoming sufficiently rare nowadays in my experience that we could
> consider using non-single-letter thingies for them

This will require infrastructure changes, since currently the mnemonic
can only be a single character.  I'm not sure such a change (and the
difficulties it will cause, like how do you display the triplet, which
is today "UUU", on the modeline) would be justified.  We could instead
simply reuse the letters used by those ancient encodings, disregarding
the fact that they are already "in use".



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 13:04                           ` Ulrich Mueller
@ 2023-07-05 13:44                             ` Eli Zaretskii
  2023-07-05 21:50                               ` Ulrich Mueller
  2023-07-06 12:27                             ` Po Lu
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-05 13:44 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Ulrich Mueller <ulm@gentoo.org>,  emacs-devel@gnu.org,
>   drew.adams@oracle.com,  eggert@cs.ucla.edu,  monnier@iro.umontreal.ca
> Date: Wed, 05 Jul 2023 15:04:08 +0200
> 
> >>>>> On Wed, 05 Jul 2023, Eli Zaretskii wrote:
> 
> > The only real problem is between UTF-8 and UTF-16, since the others
> > are hardly ever used these days.  UTF-16 is also quite rarely used,
> > basically only on MS-Windows for system-level files.  So is this
> > really a problem that we need to solve, at the risk of breaking
> > people's "muscle" memory?  If I see the lower-case "u" on the
> > modeline when I expect to see "U" instead, I'd be surprised.  Is it
> > worth it?
> 
> UTF-8 is one of the most common encodings, and it is strange that it
> shares its modeline indicator with anything else. And the "U" is really
> ambiguous, because context won't help (or how would you decide if a
> buffer's file encoding is e.g. koi8-u or utf-8?).

Is the problem that koi8-u also uses 'U'?  That is, if we change
koi8-u to some other character, will that be good enough?

The other encodings are all from the UTF family, so using 'U' for them
all does make sense.  (The lower-case 'u' for utf-7 is IMO simply a
mistake, and can be fixed with a low risk, I think, since this
encoding is rare.)



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 13:44                             ` Eli Zaretskii
@ 2023-07-05 21:50                               ` Ulrich Mueller
  2023-07-05 22:11                                 ` Paul Eggert
                                                   ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-05 21:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, drew.adams, eggert, monnier

>>>>> On Wed, 05 Jul 2023, Eli Zaretskii wrote:

>> UTF-8 is one of the most common encodings, and it is strange that it
>> shares its modeline indicator with anything else. And the "U" is really
>> ambiguous, because context won't help (or how would you decide if a
>> buffer's file encoding is e.g. koi8-u or utf-8?).

> Is the problem that koi8-u also uses 'U'?  That is, if we change
> koi8-u to some other character, will that be good enough?

It would help, but it would solve only part of the problem. (I had
suggested "K" for koi8 before.)

> The other encodings are all from the UTF family, so using 'U' for them
> all does make sense.

IMHO it doesn't make sense at all. UTF-8, UTF-16 and UTF-7 are
completely different encodings which have nothing in common except
their name.

All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
justified for the most common encoding (or maybe it's second after
ASCII)?

> (The lower-case 'u' for utf-7 is IMO simply a mistake, and can be
> fixed with a low risk, I think, since this encoding is rare.)



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 21:50                               ` Ulrich Mueller
@ 2023-07-05 22:11                                 ` Paul Eggert
  2023-07-06  8:51                                   ` Ulrich Mueller
  2023-07-06  5:33                                 ` Eli Zaretskii
  2023-07-06 12:31                                 ` Po Lu
  2 siblings, 1 reply; 97+ messages in thread
From: Paul Eggert @ 2023-07-05 22:11 UTC (permalink / raw)
  To: Ulrich Mueller, Eli Zaretskii; +Cc: emacs-devel, drew.adams, monnier

On 2023-07-05 14:50, Ulrich Mueller wrote:
> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> justified for the most common encoding (or maybe it's second after
> ASCII)?

Is the idea to use 'u' for UTF-8, and 'U' for the other Unicode-related 
encodings? That sounds good to me, since 'u' should be common and 'U' rare.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 21:50                               ` Ulrich Mueller
  2023-07-05 22:11                                 ` Paul Eggert
@ 2023-07-06  5:33                                 ` Eli Zaretskii
  2023-07-06  8:47                                   ` Ulrich Mueller
  2023-07-06 12:31                                 ` Po Lu
  2 siblings, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-06  5:33 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org,  drew.adams@oracle.com,  eggert@cs.ucla.edu,
>   monnier@iro.umontreal.ca
> Date: Wed, 05 Jul 2023 23:50:53 +0200
> 
> > The other encodings are all from the UTF family, so using 'U' for them
> > all does make sense.
> 
> IMHO it doesn't make sense at all. UTF-8, UTF-16 and UTF-7 are
> completely different encodings which have nothing in common except
> their name.

They do have something important in common: they are all Unicode
encodings, thus "U".

> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> justified for the most common encoding (or maybe it's second after
> ASCII)?

Sorry, I'm not interested in such radical changes.  UTF-8 is an
important encoding, but that doesn't justify what you propose.  In the
absolute majority of cases, "U" already means UTF-8 and nothing else,
so the issue is marginal at best, and making such significant
incompatible changes in user-facing displays is unjustified from my
POV.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06  5:33                                 ` Eli Zaretskii
@ 2023-07-06  8:47                                   ` Ulrich Mueller
  2023-07-06  9:20                                     ` Eli Zaretskii
  2023-07-06 12:32                                     ` Po Lu
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-06  8:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, drew.adams, eggert, monnier

>>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:

>> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
>> justified for the most common encoding (or maybe it's second after
>> ASCII)?

> Sorry, I'm not interested in such radical changes.  UTF-8 is an
> important encoding, but that doesn't justify what you propose.  In the
> absolute majority of cases, "U" already means UTF-8 and nothing else,
> so the issue is marginal at best, and making such significant
> incompatible changes in user-facing displays is unjustified from my
> POV.

Sorry, but in what world does this qualify as a "radical change"?

I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
and koi8 to some other character like "K".



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 22:11                                 ` Paul Eggert
@ 2023-07-06  8:51                                   ` Ulrich Mueller
  0 siblings, 0 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-06  8:51 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel, drew.adams, monnier

>>>>> On Thu, 06 Jul 2023, Paul Eggert wrote:

> Is the idea to use 'u' for UTF-8, and 'U' for the other
> Unicode-related encodings? That sounds good to me, since 'u' should be
> common and 'U' rare.

My suggestion was to keep "U" for UTF-8, and change the others to "u"
(so in the most common case there would be no change). But I'd also be
fine with the other way around.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06  8:47                                   ` Ulrich Mueller
@ 2023-07-06  9:20                                     ` Eli Zaretskii
  2023-07-06  9:46                                       ` Ulrich Mueller
  2023-07-06 12:32                                     ` Po Lu
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-06  9:20 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org,  drew.adams@oracle.com,  eggert@cs.ucla.edu,
>   monnier@iro.umontreal.ca
> Date: Thu, 06 Jul 2023 10:47:53 +0200
> 
> >>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:
> 
> >> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> >> justified for the most common encoding (or maybe it's second after
> >> ASCII)?
> 
> > Sorry, I'm not interested in such radical changes.  UTF-8 is an
> > important encoding, but that doesn't justify what you propose.  In the
> > absolute majority of cases, "U" already means UTF-8 and nothing else,
> > so the issue is marginal at best, and making such significant
> > incompatible changes in user-facing displays is unjustified from my
> > POV.
> 
> Sorry, but in what world does this qualify as a "radical change"?

In this one.  People have been staring at "U" (and "UUU" on TTY
frames) since Emacs 23.1 was released.  If someone cares about those
characters so much so that they want them changed, please think about
others who care about them and would be surprised and probably worried
by suddenly seeing a different character.  (If the assumption is that
people don't care about these indications, then this whole discussion
is moot to begin with.)

> I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
> and koi8 to some other character like "K".

Yes, I understand the proposal.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06  9:20                                     ` Eli Zaretskii
@ 2023-07-06  9:46                                       ` Ulrich Mueller
  2023-07-06 12:34                                         ` Po Lu
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-06  9:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, drew.adams, eggert, monnier

>>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:

>> > Sorry, I'm not interested in such radical changes.  UTF-8 is an
>> > important encoding, but that doesn't justify what you propose.  In the
>> > absolute majority of cases, "U" already means UTF-8 and nothing else,
>> > so the issue is marginal at best, and making such significant
>> > incompatible changes in user-facing displays is unjustified from my
>> > POV.
>> 
>> Sorry, but in what world does this qualify as a "radical change"?

> In this one.  People have been staring at "U" (and "UUU" on TTY
> frames) since Emacs 23.1 was released.  If someone cares about those
> characters so much so that they want them changed, please think about
> others who care about them and would be surprised and probably worried
> by suddenly seeing a different character.  (If the assumption is that
> people don't care about these indications, then this whole discussion
> is moot to begin with.)

Well, in the absolute majority of cases (UTF-8) the "U" would stay.

I'd rather expect users to be surprised when they see the "U" but then
find out that the encoding is something entirely different.

>> I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
>> and koi8 to some other character like "K".

> Yes, I understand the proposal.

How about the following then?

- Keep "U" for both UTF-8 and UTF-16.
- Change UTF-7 to "u" (which is already used for one of its variants).
- Change koi8 to "K".



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 13:04                           ` Ulrich Mueller
  2023-07-05 13:44                             ` Eli Zaretskii
@ 2023-07-06 12:27                             ` Po Lu
  2023-07-07  7:09                               ` UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
  1 sibling, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-06 12:27 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> UTF-8 is one of the most common encodings, and it is strange that it
> shares its modeline indicator with anything else. And the "U" is really
> ambiguous, because context won't help (or how would you decide if a
> buffer's file encoding is e.g. koi8-u or utf-8?).
>
> As you say, the others in the above list are rarely used nowadays. So,
> maybe users should see the "u" or the "K" to indicate that the file has
> an unusual encoding?

The coding system indication in the mode line is most useful for
determining which characters can be represented in the file's coding
system.  Since the same characters can be encoded in all of UTF-16,
UTF-32 and UTF-8, it is only natural for them to share the same mode
line indicator.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 21:50                               ` Ulrich Mueller
  2023-07-05 22:11                                 ` Paul Eggert
  2023-07-06  5:33                                 ` Eli Zaretskii
@ 2023-07-06 12:31                                 ` Po Lu
  2023-07-06 13:02                                   ` Andreas Schwab
  2023-07-06 13:08                                   ` Ulrich Mueller
  2 siblings, 2 replies; 97+ messages in thread
From: Po Lu @ 2023-07-06 12:31 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> IMHO it doesn't make sense at all. UTF-8, UTF-16 and UTF-7 are
> completely different encodings which have nothing in common except
> their name.

I disagree.  UTF-7, UTF-8 and UTF-16 both encode the same coded
character set (or at least the BMP of the same character set.)  That's a
far cry from there being ``nothing in common''.

> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> justified for the most common encoding (or maybe it's second after
> ASCII)?

Why would UTF-8 warrant a unique indicator on the basis of popularity
alone?  The indicator is supposed to describe the coded character set:
people who also need to know the coding system (which is not needed as
often) can also read its tooltip text.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06  8:47                                   ` Ulrich Mueller
  2023-07-06  9:20                                     ` Eli Zaretskii
@ 2023-07-06 12:32                                     ` Po Lu
  1 sibling, 0 replies; 97+ messages in thread
From: Po Lu @ 2023-07-06 12:32 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> Sorry, but in what world does this qualify as a "radical change"?

A change to long standing behavior is always radical.

> I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
> and koi8 to some other character like "K".

I'm fine with changing koi8-u.  But changing the characters that
represent Unicode encodings is unreasonable.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06  9:46                                       ` Ulrich Mueller
@ 2023-07-06 12:34                                         ` Po Lu
  0 siblings, 0 replies; 97+ messages in thread
From: Po Lu @ 2023-07-06 12:34 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> Well, in the absolute majority of cases (UTF-8) the "U" would stay.
>
> I'd rather expect users to be surprised when they see the "U" but then
> find out that the encoding is something entirely different.

When I see `U', my only expectation is for the coding system being used
to represent a Unicode character set.  The tool tip text displays
exactly which coding system that is.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 12:31                                 ` Po Lu
@ 2023-07-06 13:02                                   ` Andreas Schwab
  2023-07-06 13:08                                   ` Ulrich Mueller
  1 sibling, 0 replies; 97+ messages in thread
From: Andreas Schwab @ 2023-07-06 13:02 UTC (permalink / raw)
  To: Po Lu
  Cc: Ulrich Mueller, Eli Zaretskii, emacs-devel, drew.adams, eggert,
	monnier

On Jul 06 2023, Po Lu wrote:

> I disagree.  UTF-7, UTF-8 and UTF-16 both encode the same coded
> character set (or at least the BMP of the same character set.)

As does GB18030.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 12:31                                 ` Po Lu
  2023-07-06 13:02                                   ` Andreas Schwab
@ 2023-07-06 13:08                                   ` Ulrich Mueller
  2023-07-06 17:37                                     ` Paul Eggert
  2023-07-07  0:19                                     ` Disambiguate modeline character for UTF-8? Po Lu
  1 sibling, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-06 13:08 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

>>>>> On Thu, 06 Jul 2023, Po Lu wrote:

> I disagree.  UTF-7, UTF-8 and UTF-16 both encode the same coded
> character set (or at least the BMP of the same character set.)  That's a
> far cry from there being ``nothing in common''.

This argument applies only to UTF-8 and UTF-16.

OTOH, UTF-7 isn't part of the Unicode standard. Also, it cannot encode
all of Unicode but only the first 65536 code points [1].

>> All I'm asking for is a unique indicator for UTF-8. Wouldn't this
>> be justified for the most common encoding (or maybe it's second
>> after ASCII)?

> Why would UTF-8 warrant a unique indicator on the basis of popularity
> alone?  The indicator is supposed to describe the coded character set:
> people who also need to know the coding system (which is not needed as
> often) can also read its tooltip text.

Right, and for both UTF-7 and koi8-u the coded character set is not
Unicode but only a subset of it.

[1] https://www.rfc-editor.org/rfc/rfc2152.txt



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 13:08                                   ` Ulrich Mueller
@ 2023-07-06 17:37                                     ` Paul Eggert
  2023-07-06 18:13                                       ` Eli Zaretskii
  2023-07-06 18:44                                       ` Ulrich Müller
  2023-07-07  0:19                                     ` Disambiguate modeline character for UTF-8? Po Lu
  1 sibling, 2 replies; 97+ messages in thread
From: Paul Eggert @ 2023-07-06 17:37 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

On 2023-07-06 06:08, Ulrich Mueller wrote:
> for both UTF-7 and koi8-u the coded character set is not
> Unicode but only a subset of it

It would be helpful to use 'u' when only a subset of Unicode can be 
represented, as a clue that something odd is going on, compared to the 
more-usual 'U'.

Also, Andreas made a good point: since GB18030 encodes Unicode, 
shouldn't it be displayed as 'U' too? Why treat it specially?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 17:37                                     ` Paul Eggert
@ 2023-07-06 18:13                                       ` Eli Zaretskii
  2023-07-06 18:44                                       ` Ulrich Müller
  1 sibling, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-06 18:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: ulm, emacs-devel

> Date: Thu, 6 Jul 2023 10:37:53 -0700
> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> Also, Andreas made a good point: since GB18030 encodes Unicode, 
> shouldn't it be displayed as 'U' too? Why treat it specially?

I don't think this is what Andreas had in mind, but in any case, I
don't think we should make such a change with the GB18030 users
actually asking for that.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 17:37                                     ` Paul Eggert
  2023-07-06 18:13                                       ` Eli Zaretskii
@ 2023-07-06 18:44                                       ` Ulrich Müller
  2023-07-06 19:01                                         ` Eli Zaretskii
  1 sibling, 1 reply; 97+ messages in thread
From: Ulrich Müller @ 2023-07-06 18:44 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

>>>>> On Thu, 06 Jul 2023, Paul Eggert wrote:

> On 2023-07-06 06:08, Ulrich Mueller wrote:
>> for both UTF-7 and koi8-u the coded character set is not
>> Unicode but only a subset of it

> It would be helpful to use 'u' when only a subset of Unicode can be
> represented, as a clue that something odd is going on, compared to the
> more-usual 'U'.

How about the following patch then?


From b33df88e456092e89bad52565b68a77ea3d0c71a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ulrich=20M=C3=BCller?= <ulm@gentoo.org>
Date: Thu, 6 Jul 2023 20:36:09 +0200
Subject: [PATCH] Disambiguate mode line indication for utf-8 and utf-16

* lisp/international/mule-conf.el (utf-7):
* lisp/language/cyrillic.el (koi8-u): Change mnemonic letters to
?u and ?K, respectively.
---
 lisp/international/mule-conf.el | 2 +-
 lisp/language/cyrillic.el       | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index a27aaf9e522..f65f124b633 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1600,7 +1600,7 @@ 'ascii
 (define-coding-system 'utf-7
   "UTF-7 encoding of Unicode (RFC 2152)."
   :coding-type 'utf-8
-  :mnemonic ?U
+  :mnemonic ?u
   :mime-charset 'utf-7
   :charset-list '(unicode)
   :pre-write-conversion 'utf-7-pre-write-conversion
diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
index 7af87e65703..1ad1302095b 100644
--- a/lisp/language/cyrillic.el
+++ b/lisp/language/cyrillic.el
@@ -126,7 +126,10 @@ 'cp878
 (define-coding-system 'koi8-u
   "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
   :coding-type 'charset
-  :mnemonic ?U
+  ;; This used to be ?U which collided with UTF-8.  ?K is also used
+  ;; for Korean, but it shouldn't be a real conflict since Cyrillic
+  ;; and Hangul can be disambiguated from context.
+  :mnemonic ?K
   :charset-list '(koi8-u)
   :mime-charset 'koi8-u)
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 18:44                                       ` Ulrich Müller
@ 2023-07-06 19:01                                         ` Eli Zaretskii
  2023-07-06 19:31                                           ` Ulrich Mueller
  0 siblings, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-06 19:01 UTC (permalink / raw)
  To: Ulrich Müller; +Cc: eggert, emacs-devel

> From: Ulrich Müller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 06 Jul 2023 20:44:05 +0200
> 
> --- a/lisp/language/cyrillic.el
> +++ b/lisp/language/cyrillic.el
> @@ -126,7 +126,10 @@ 'cp878
>  (define-coding-system 'koi8-u
>    "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
>    :coding-type 'charset
> -  :mnemonic ?U
> +  ;; This used to be ?U which collided with UTF-8.  ?K is also used
> +  ;; for Korean, but it shouldn't be a real conflict since Cyrillic
> +  ;; and Hangul can be disambiguated from context.
> +  :mnemonic ?K

K is not a good idea, for 2 reasons:

  . the KOI8 family includes 3 encodings, not 1
  . U in koi8-u stands for "Ukraine", so replacing it with K will
    probably be frowned upon

How about using У instead?  (Assuming using non-ASCII works there; the
code seems to allow that.)



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-05 12:49                         ` Disambiguate modeline character for UTF-8? Stefan Monnier
  2023-07-05 13:38                           ` Eli Zaretskii
@ 2023-07-06 19:07                           ` Filipp Gunbin
  1 sibling, 0 replies; 97+ messages in thread
From: Filipp Gunbin @ 2023-07-06 19:07 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Ulrich Mueller, emacs-devel, Drew Adams, Eli Zaretskii, eggert

On 05/07/2023 08:49 -0400, Stefan Monnier wrote:

>>    + -- utf-8*  (all variants)
>>    (everything else unchanged)
>>
>> or:
>>
>>    U -- utf-8*  (all variants)
>>    u -- utf-16* (all variants)
>>    u -- utf-7
>>    K -- koi8-u
>
> They both sound good to me.
>
> BTW, things like 8bit coding systems like koi8 are (and should be)
> becoming sufficiently rare nowadays in my experience that we could
> consider using non-single-letter thingies for them (I'd even welcome
> extra highlighting with some kind of warning color).

My experience differs: for example, cp1251 is often used for
auto-generated csv files (like transaction list from an online bank) in
Russian environments.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 19:01                                         ` Eli Zaretskii
@ 2023-07-06 19:31                                           ` Ulrich Mueller
  2023-07-07  5:18                                             ` Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-06 19:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, emacs-devel

>>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:

>> From: Ulrich Müller <ulm@gentoo.org>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 06 Jul 2023 20:44:05 +0200
>> 
>> --- a/lisp/language/cyrillic.el
>> +++ b/lisp/language/cyrillic.el
>> @@ -126,7 +126,10 @@ 'cp878
>> (define-coding-system 'koi8-u
>> "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
>> :coding-type 'charset
>> -  :mnemonic ?U
>> +  ;; This used to be ?U which collided with UTF-8.  ?K is also used
>> +  ;; for Korean, but it shouldn't be a real conflict since Cyrillic
>> +  ;; and Hangul can be disambiguated from context.
>> +  :mnemonic ?K

> K is not a good idea, for 2 reasons:

>   . the KOI8 family includes 3 encodings, not 1
>   . U in koi8-u stands for "Ukraine", so replacing it with K will
>     probably be frowned upon

The K (for all KOI8 variants) was actually suggested by a person from
Ukraine, back in 2020:
https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg01010.html

> How about using У instead?  (Assuming using non-ASCII works there; the
> code seems to allow that.)

I've just tested a patch with current master, and for me the У works
both in an X frame ("У" in the mode line), in a text terminal under X
("UUУ") and in the Linux console ("UUУ").

Can we assume that users have the necessary fonts installed?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 13:08                                   ` Ulrich Mueller
  2023-07-06 17:37                                     ` Paul Eggert
@ 2023-07-07  0:19                                     ` Po Lu
  1 sibling, 0 replies; 97+ messages in thread
From: Po Lu @ 2023-07-07  0:19 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

>>>>>> On Thu, 06 Jul 2023, Po Lu wrote:
>
>> I disagree.  UTF-7, UTF-8 and UTF-16 both encode the same coded
>> character set (or at least the BMP of the same character set.)  That's a
>> far cry from there being ``nothing in common''.
>
> This argument applies only to UTF-8 and UTF-16.
>
> OTOH, UTF-7 isn't part of the Unicode standard. Also, it cannot encode
> all of Unicode but only the first 65536 code points [1].

As I said, the BMP of the same character set.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-06 19:31                                           ` Ulrich Mueller
@ 2023-07-07  5:18                                             ` Eli Zaretskii
  2023-07-07  5:48                                               ` Ulrich Müller
  0 siblings, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07  5:18 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: eggert, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Thu, 06 Jul 2023 21:31:49 +0200
> 
> >>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:
> 
> > K is not a good idea, for 2 reasons:
> 
> >   . the KOI8 family includes 3 encodings, not 1
> >   . U in koi8-u stands for "Ukraine", so replacing it with K will
> >     probably be frowned upon
> 
> The K (for all KOI8 variants) was actually suggested by a person from
> Ukraine, back in 2020:
> https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg01010.html

That's one person...

> > How about using У instead?  (Assuming using non-ASCII works there; the
> > code seems to allow that.)
> 
> I've just tested a patch with current master, and for me the У works
> both in an X frame ("У" in the mode line), in a text terminal under X
> ("UUУ") and in the Linux console ("UUУ").
> 
> Can we assume that users have the necessary fonts installed?

In Ukraine? most probably.

Did that character on a GUI frame need a non-default font, or was it
supported by the default font.  I'd expect the Cyrillic script to be
supported by the fonts people use as the default in Emacs.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  5:18                                             ` Eli Zaretskii
@ 2023-07-07  5:48                                               ` Ulrich Müller
  2023-07-07  6:16                                                 ` Po Lu
  2023-07-08  8:49                                                 ` Eli Zaretskii
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Müller @ 2023-07-07  5:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> The K (for all KOI8 variants) was actually suggested by a person from
>> Ukraine, back in 2020:
>> https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg01010.html

> That's one person...

Yes.

>> > How about using У instead?  (Assuming using non-ASCII works there; the
>> > code seems to allow that.)
>> 
>> I've just tested a patch with current master, and for me the У works
>> both in an X frame ("У" in the mode line), in a text terminal under X
>> ("UUУ") and in the Linux console ("UUУ").
>> 
>> Can we assume that users have the necessary fonts installed?

> In Ukraine? most probably.

> Did that character on a GUI frame need a non-default font, or was it
> supported by the default font.  I'd expect the Cyrillic script to be
> supported by the fonts people use as the default in Emacs.

It's supported by the default font for me (which is Droid Sans Mono).
У aka U+0423 is contained in WGL4, and I guess that most fonts (also on
GNU/Linux) would cover a superset of it.

Updated patch below.


From fbcc65ebde142f998e9dae8ad711f484585ef29b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ulrich=20M=C3=BCller?= <ulm@gentoo.org>
Date: Thu, 6 Jul 2023 20:36:09 +0200
Subject: [PATCH] Disambiguate mode line indication for utf-8 and utf-16

* lisp/international/mule-conf.el (utf-7):
* lisp/language/cyrillic.el (koi8-u): Change mnemonic letters to
?u and ?\N{cyrillic capital letter u}, respectively.
---
 lisp/international/mule-conf.el | 2 +-
 lisp/language/cyrillic.el       | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index a27aaf9e522..f65f124b633 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1600,7 +1600,7 @@ 'ascii
 (define-coding-system 'utf-7
   "UTF-7 encoding of Unicode (RFC 2152)."
   :coding-type 'utf-8
-  :mnemonic ?U
+  :mnemonic ?u
   :mime-charset 'utf-7
   :charset-list '(unicode)
   :pre-write-conversion 'utf-7-pre-write-conversion
diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
index 7af87e65703..f923c84e221 100644
--- a/lisp/language/cyrillic.el
+++ b/lisp/language/cyrillic.el
@@ -126,7 +126,8 @@ 'cp878
 (define-coding-system 'koi8-u
   "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
   :coding-type 'charset
-  :mnemonic ?U
+  ;; This used to be ?U which collided with UTF-8.
+  :mnemonic ?\N{cyrillic capital letter u} ; У
   :charset-list '(koi8-u)
   :mime-charset 'koi8-u)
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  5:48                                               ` Ulrich Müller
@ 2023-07-07  6:16                                                 ` Po Lu
  2023-07-07  6:41                                                   ` Ulrich Mueller
  2023-07-08  8:49                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-07  6:16 UTC (permalink / raw)
  To: Ulrich Müller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Müller <ulm@gentoo.org> writes:

> diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
> index a27aaf9e522..f65f124b633 100644
> --- a/lisp/international/mule-conf.el
> +++ b/lisp/international/mule-conf.el
> @@ -1600,7 +1600,7 @@ 'ascii
>  (define-coding-system 'utf-7
>    "UTF-7 encoding of Unicode (RFC 2152)."
>    :coding-type 'utf-8
> -  :mnemonic ?U
> +  :mnemonic ?u
>    :mime-charset 'utf-7
>    :charset-list '(unicode)
>    :pre-write-conversion 'utf-7-pre-write-conversion

I thought we agreed NOT to change the mnemonic used for UTF-7, which
also encodes the Unicode BMP.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  6:16                                                 ` Po Lu
@ 2023-07-07  6:41                                                   ` Ulrich Mueller
  2023-07-07  7:38                                                     ` Po Lu
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07  6:41 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Po Lu wrote:

>>  (define-coding-system 'utf-7
>>    "UTF-7 encoding of Unicode (RFC 2152)."
>>    :coding-type 'utf-8
>> -  :mnemonic ?U
>> +  :mnemonic ?u
>>    :mime-charset 'utf-7
>>    :charset-list '(unicode)
>>    :pre-write-conversion 'utf-7-pre-write-conversion

> I thought we agreed NOT to change the mnemonic used for UTF-7, which
> also encodes the Unicode BMP.

Which isn't Unicode but only a subset, so it's reasonable that it has
a different mnemonic.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* UTF-32 (was: Re: Disambiguate modeline character for UTF-8?)
  2023-07-06 12:27                             ` Po Lu
@ 2023-07-07  7:09                               ` Ulrich Mueller
  2023-07-07  7:34                                 ` Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07  7:09 UTC (permalink / raw)
  To: emacs-devel

>>>>> On Thu, 06 Jul 2023, Po Lu wrote:

> [...]  Since the same characters can be encoded in all of UTF-16,
> UTF-32 and UTF-8, it is only natural for them to share the same mode
> line indicator.

On a different tangent, Emacs doesn't seem to know about UTF-32, which
I find a little surprising.

Is there simply no need for that encoding, or am I missing something?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32 (was: Re: Disambiguate modeline character for UTF-8?)
  2023-07-07  7:09                               ` UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
@ 2023-07-07  7:34                                 ` Eli Zaretskii
  2023-07-07  8:20                                   ` UTF-32 Ulrich Mueller
  0 siblings, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07  7:34 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Date: Fri, 07 Jul 2023 09:09:20 +0200
> 
> On a different tangent, Emacs doesn't seem to know about UTF-32, which
> I find a little surprising.
> 
> Is there simply no need for that encoding, or am I missing something?

There's no need.  We don't support character codepoints that are wider
than 32 bits.




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  6:41                                                   ` Ulrich Mueller
@ 2023-07-07  7:38                                                     ` Po Lu
  2023-07-07  9:44                                                       ` Ulrich Mueller
  0 siblings, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-07  7:38 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Which isn't Unicode but only a subset, so it's reasonable that it has
> a different mnemonic.

Why?  Most of the characters users will want to save in a Unicode
document will be part of the BMP.  If it turns out that a character is
not representable, Emacs will ask the user to select a better coding
system upon trying to save the file.

Not that I expect this to happen in practice anyway, since UTF-7 is
rarely encountered nowadays.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07  7:34                                 ` Eli Zaretskii
@ 2023-07-07  8:20                                   ` Ulrich Mueller
  2023-07-07 10:16                                     ` UTF-32 Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07  8:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> On a different tangent, Emacs doesn't seem to know about UTF-32, which
>> I find a little surprising.
>> 
>> Is there simply no need for that encoding, or am I missing something?

> There's no need.  We don't support character codepoints that are wider
> than 32 bits.

IIUC UTF-32 (aka UCS-4) encodes only Unicode codepoints, and it encodes
every character as 4 bytes.

https://www.unicode.org/reports/tr19/tr19-9.html



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  7:38                                                     ` Po Lu
@ 2023-07-07  9:44                                                       ` Ulrich Mueller
  2023-07-07 10:21                                                         ` Eli Zaretskii
  2023-07-07 12:01                                                         ` Po Lu
  0 siblings, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07  9:44 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Po Lu wrote:

> Ulrich Mueller <ulm@gentoo.org> writes:
>> Which isn't Unicode but only a subset, so it's reasonable that it has
>> a different mnemonic.

> Why?  Most of the characters users will want to save in a Unicode
> document will be part of the BMP.  If it turns out that a character is
> not representable, Emacs will ask the user to select a better coding
> system upon trying to save the file.

Do you agree that the character repertoire that can be encoded by UTF-7
is not identical to the one that can be encoded by UTF-8 or UTF-16?

> Not that I expect this to happen in practice anyway, since UTF-7 is
> rarely encountered nowadays.

Yes, and when users encounter that rare case, they should get an
indication that the file they are visiting is in an unusual encoding.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07  8:20                                   ` UTF-32 Ulrich Mueller
@ 2023-07-07 10:16                                     ` Eli Zaretskii
  2023-07-07 10:34                                       ` UTF-32 Ulrich Mueller
  0 siblings, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07 10:16 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 10:20:07 +0200
> 
> >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:
> 
> >> On a different tangent, Emacs doesn't seem to know about UTF-32, which
> >> I find a little surprising.
> >> 
> >> Is there simply no need for that encoding, or am I missing something?
> 
> > There's no need.  We don't support character codepoints that are wider
> > than 32 bits.
> 
> IIUC UTF-32 (aka UCS-4) encodes only Unicode codepoints, and it encodes
> every character as 4 bytes.
> 
> https://www.unicode.org/reports/tr19/tr19-9.html

Yes, I know.  Not sure why you posted this, though.  If you are saying
that this somehow contradicts what I wrote above, please elaborate,
because I don't see the contradiction.




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  9:44                                                       ` Ulrich Mueller
@ 2023-07-07 10:21                                                         ` Eli Zaretskii
  2023-07-07 10:42                                                           ` Ulrich Mueller
  2023-07-07 12:01                                                         ` Po Lu
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07 10:21 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: luangruo, eggert, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 11:44:51 +0200
> 
> > Not that I expect this to happen in practice anyway, since UTF-7 is
> > rarely encountered nowadays.
> 
> Yes, and when users encounter that rare case, they should get an
> indication that the file they are visiting is in an unusual encoding.

We don't have a notion of "unusual" encoding in Emacs.  Basically,
anything besides ASCII and perhaps UTF-8 is "unusual" nowadays, so
such a notion won't be useful, IMO.

How is utf-7 more "unusual" than, say, ebcdic or iso-2022-jp or even
windows-1251?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07 10:16                                     ` UTF-32 Eli Zaretskii
@ 2023-07-07 10:34                                       ` Ulrich Mueller
  2023-07-07 12:49                                         ` UTF-32 Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> >> On a different tangent, Emacs doesn't seem to know about UTF-32, which
>> >> I find a little surprising.
>> >> 
>> >> Is there simply no need for that encoding, or am I missing something?
>> 
>> > There's no need.  We don't support character codepoints that are wider
>> > than 32 bits.
>> 
>> IIUC UTF-32 (aka UCS-4) encodes only Unicode codepoints, and it encodes
>> every character as 4 bytes.
>> 
>> https://www.unicode.org/reports/tr19/tr19-9.html

> Yes, I know.  Not sure why you posted this, though.  If you are saying
> that this somehow contradicts what I wrote above, please elaborate,
> because I don't see the contradiction.

I don't understand how "codepoints that are wider than 32 bits"
are related to UTF-32. UTF-8, UTF-16, and UTF-32 all encode the same
repertoire (U+0000 to U+10FFFF).

Emacs knows about UTF-8 and UTF-16 but not about UTF-32. Is it an
unreasonable question to ask why that is so? (Just out of interest,
I do not challenge it, and I have no need for UTF-32.)



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 10:21                                                         ` Eli Zaretskii
@ 2023-07-07 10:42                                                           ` Ulrich Mueller
  2023-07-07 12:04                                                             ` Po Lu
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07 10:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> > Not that I expect this to happen in practice anyway, since UTF-7 is
>> > rarely encountered nowadays.
>> 
>> Yes, and when users encounter that rare case, they should get an
>> indication that the file they are visiting is in an unusual encoding.

> We don't have a notion of "unusual" encoding in Emacs.  Basically,
> anything besides ASCII and perhaps UTF-8 is "unusual" nowadays, so
> such a notion won't be useful, IMO.

> How is utf-7 more "unusual" than, say, ebcdic or iso-2022-jp or even
> windows-1251?

Sorry, poor choice of wording. What I meant was that when users
encounter a rare case like UTF-7 (or any of the others you just
mentioned), then they should get an indication of the fact.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  9:44                                                       ` Ulrich Mueller
  2023-07-07 10:21                                                         ` Eli Zaretskii
@ 2023-07-07 12:01                                                         ` Po Lu
  2023-07-07 12:38                                                           ` Andreas Schwab
  2023-07-07 12:58                                                           ` Eli Zaretskii
  1 sibling, 2 replies; 97+ messages in thread
From: Po Lu @ 2023-07-07 12:01 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Do you agree that the character repertoire that can be encoded by UTF-7
> is not identical to the one that can be encoded by UTF-8 or UTF-16?

Only one of the repertories.  The Unicode BMP can be encoded by all of
those coding systems.

> Yes, and when users encounter that rare case, they should get an
> indication that the file they are visiting is in an unusual encoding.

Why?  Again, the purpose of the indicator is to indicate the characters
that can be represented in the buffer's coding system.  If the user
wants to know exactly which coding system is in use, he can view the
indicator's tooltip.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 10:42                                                           ` Ulrich Mueller
@ 2023-07-07 12:04                                                             ` Po Lu
  2023-07-07 13:01                                                               ` Ulrich Mueller
  0 siblings, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-07 12:04 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Sorry, poor choice of wording. What I meant was that when users
> encounter a rare case like UTF-7 (or any of the others you just
> mentioned), then they should get an indication of the fact.

Just to encourage them to convert those files into another ``common''
coding system?  Emacs should stay impartial on such matters, and I can't
think of any other purpose for specifically indicating that a rarely
encountered coding system is being used.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 12:01                                                         ` Po Lu
@ 2023-07-07 12:38                                                           ` Andreas Schwab
  2023-07-07 13:37                                                             ` Po Lu
  2023-07-07 12:58                                                           ` Eli Zaretskii
  1 sibling, 1 reply; 97+ messages in thread
From: Andreas Schwab @ 2023-07-07 12:38 UTC (permalink / raw)
  To: Po Lu; +Cc: Ulrich Mueller, Eli Zaretskii, eggert, emacs-devel

On Jul 07 2023, Po Lu wrote:

> Why?  Again, the purpose of the indicator is to indicate the characters
> that can be represented in the buffer's coding system.

No.  The purpose is to indicate the buffer's file coding system.  You
can put non-latin-1 characters in a buffer with a latin-1 file coding
system just fine.  The file coding system is only relevant when the
buffer contents is saved.

The buffer contents, if multibyte, is always represented by
utf-8-emacs-unix, but that is an internal detail.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07 10:34                                       ` UTF-32 Ulrich Mueller
@ 2023-07-07 12:49                                         ` Eli Zaretskii
  2023-07-07 13:24                                           ` UTF-32 Andreas Schwab
  2023-07-07 13:36                                           ` UTF-32 Ulrich Mueller
  0 siblings, 2 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07 12:49 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 12:34:17 +0200
> 
> >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:
> 
> >> https://www.unicode.org/reports/tr19/tr19-9.html
> 
> > Yes, I know.  Not sure why you posted this, though.  If you are saying
> > that this somehow contradicts what I wrote above, please elaborate,
> > because I don't see the contradiction.
> 
> I don't understand how "codepoints that are wider than 32 bits"
> are related to UTF-32.

Because using UTF-32 for codepoints that fit in 32 bits makes very
little sense.  See, e.g., https://en.wikipedia.org/wiki/UTF-32.

> UTF-8, UTF-16, and UTF-32 all encode the same
> repertoire (U+0000 to U+10FFFF).

UTF-8 is identical with the codepoints as long as the codepoints are
plain-ASCII.  UTF-16 is identical with the codepoints as long as the
codepoints are inside the BMP.  UTF-32 is identical with the
codepoints as long as the codepoints don't exceed 32 bits.  Since
Unicode doesn't exceed 32 bits, and Emacs extensions of the Unicode
codepoint space also don't exceed 32 bits, Emacs doesn't need to use
UTF-32.

> Emacs knows about UTF-8 and UTF-16 but not about UTF-32. Is it an
> unreasonable question to ask why that is so? (Just out of interest,
> I do not challenge it, and I have no need for UTF-32.)

The question is fine, and I think I answered it.  Did I miss some
aspects of the question?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 12:01                                                         ` Po Lu
  2023-07-07 12:38                                                           ` Andreas Schwab
@ 2023-07-07 12:58                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07 12:58 UTC (permalink / raw)
  To: Po Lu; +Cc: ulm, eggert, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 20:01:23 +0800
> 
> Again, the purpose of the indicator is to indicate the characters
> that can be represented in the buffer's coding system.

That's not true.  The mnemonic is the indication of the coding-system
itself, not of its supported characters.  You may perceive it as the
indication of the characters, but it is a wrong interpretation.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 12:04                                                             ` Po Lu
@ 2023-07-07 13:01                                                               ` Ulrich Mueller
  2023-07-07 13:38                                                                 ` Po Lu
  0 siblings, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07 13:01 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Po Lu wrote:

>> Sorry, poor choice of wording. What I meant was that when users
>> encounter a rare case like UTF-7 (or any of the others you just
>> mentioned), then they should get an indication of the fact.

> Just to encourage them to convert those files into another ``common''
> coding system?

No, to make them aware that the the encoding is something other that
UTF-8. And yes, I do want to know if the file that I'm about to save can
be used as input for the next program (e.g. a compiler or a web server).

> Emacs should stay impartial on such matters, and I can't think of any
> other purpose for specifically indicating that a rarely encountered
> coding system is being used.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07 12:49                                         ` UTF-32 Eli Zaretskii
@ 2023-07-07 13:24                                           ` Andreas Schwab
  2023-07-07 13:36                                           ` UTF-32 Ulrich Mueller
  1 sibling, 0 replies; 97+ messages in thread
From: Andreas Schwab @ 2023-07-07 13:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel

On Jul 07 2023, Eli Zaretskii wrote:

> Because using UTF-32 for codepoints that fit in 32 bits makes very
> little sense.

*Every* codepoint fits in 32 bits.  That's why UTF-32 (aka UCS-4)
exists.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07 12:49                                         ` UTF-32 Eli Zaretskii
  2023-07-07 13:24                                           ` UTF-32 Andreas Schwab
@ 2023-07-07 13:36                                           ` Ulrich Mueller
  2023-07-07 14:06                                             ` UTF-32 Eli Zaretskii
  1 sibling, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-07 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> I don't understand how "codepoints that are wider than 32 bits"
>> are related to UTF-32.

> Because using UTF-32 for codepoints that fit in 32 bits makes very
> little sense.  See, e.g., https://en.wikipedia.org/wiki/UTF-32.

Sure, it is a wasteful encoding, and it has issues with byte ordering
(but the same is true for UTF-16).

>> UTF-8, UTF-16, and UTF-32 all encode the same
>> repertoire (U+0000 to U+10FFFF).

> UTF-8 is identical with the codepoints as long as the codepoints are
> plain-ASCII.  UTF-16 is identical with the codepoints as long as the
> codepoints are inside the BMP.  UTF-32 is identical with the
> codepoints as long as the codepoints don't exceed 32 bits.  Since
> Unicode doesn't exceed 32 bits, and Emacs extensions of the Unicode
> codepoint space also don't exceed 32 bits, Emacs doesn't need to use
> UTF-32.

>> Emacs knows about UTF-8 and UTF-16 but not about UTF-32. Is it an
>> unreasonable question to ask why that is so? (Just out of interest,
>> I do not challenge it, and I have no need for UTF-32.)

> The question is fine, and I think I answered it.  Did I miss some
> aspects of the question?

The previous discussion was in the context of _file_ coding systems.
Emacs cannot read or write files encoded in UTF-32, correct?

So probably such files just don't exist, or somebody would have
implemented it in the meantime? (OTOH, GNU Recode knows about UTF-32,
UTF-32BE, and UTF-32LE. No UTF-32NUXI, though. :)



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 12:38                                                           ` Andreas Schwab
@ 2023-07-07 13:37                                                             ` Po Lu
  2023-07-07 13:45                                                               ` Andreas Schwab
  0 siblings, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-07 13:37 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Ulrich Mueller, Eli Zaretskii, eggert, emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> No.  The purpose is to indicate the buffer's file coding system.  You
> can put non-latin-1 characters in a buffer with a latin-1 file coding
> system just fine.

It should have been clear that was what I meant...

> The file coding system is only relevant when the buffer contents is
> saved.

since there is little point in editing a buffer visiting a file without
the intention to save it.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 13:01                                                               ` Ulrich Mueller
@ 2023-07-07 13:38                                                                 ` Po Lu
  0 siblings, 0 replies; 97+ messages in thread
From: Po Lu @ 2023-07-07 13:38 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> No, to make them aware that the the encoding is something other that
> UTF-8. And yes, I do want to know if the file that I'm about to save can
> be used as input for the next program (e.g. a compiler or a web server).

If you set the buffer coding system to utf-7, you should already be
aware of that fact.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07 13:37                                                             ` Po Lu
@ 2023-07-07 13:45                                                               ` Andreas Schwab
  0 siblings, 0 replies; 97+ messages in thread
From: Andreas Schwab @ 2023-07-07 13:45 UTC (permalink / raw)
  To: Po Lu; +Cc: Ulrich Mueller, Eli Zaretskii, eggert, emacs-devel

On Jul 07 2023, Po Lu wrote:

> since there is little point in editing a buffer visiting a file without
> the intention to save it.

There is nothing wrong with that.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: UTF-32
  2023-07-07 13:36                                           ` UTF-32 Ulrich Mueller
@ 2023-07-07 14:06                                             ` Eli Zaretskii
  0 siblings, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-07 14:06 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 15:36:31 +0200
> 
> The previous discussion was in the context of _file_ coding systems.
> Emacs cannot read or write files encoded in UTF-32, correct?

It can't, but when did you see such files in the wild?

The Wikipedia article says UTF-32 is used internally by programs, and
says that for a reason.

> So probably such files just don't exist, or somebody would have
> implemented it in the meantime? (OTOH, GNU Recode knows about UTF-32,
> UTF-32BE, and UTF-32LE. No UTF-32NUXI, though. :)

Implementing this would not be hard, but why implement something for
which we have no use?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-07  5:48                                               ` Ulrich Müller
  2023-07-07  6:16                                                 ` Po Lu
@ 2023-07-08  8:49                                                 ` Eli Zaretskii
  2023-07-08 15:27                                                   ` Basil Contovounesios
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-08  8:49 UTC (permalink / raw)
  To: Ulrich Müller; +Cc: eggert, emacs-devel

> From: Ulrich Müller <ulm@gentoo.org>
> Cc: eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 07:48:41 +0200
> 
> > Did that character on a GUI frame need a non-default font, or was it
> > supported by the default font.  I'd expect the Cyrillic script to be
> > supported by the fonts people use as the default in Emacs.
> 
> It's supported by the default font for me (which is Droid Sans Mono).
> У aka U+0423 is contained in WGL4, and I guess that most fonts (also on
> GNU/Linux) would cover a superset of it.
> 
> Updated patch below.

Thanks, installed on master, with a followup change to document this
in NEWS and explain how to get back the old behavior.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08  8:49                                                 ` Eli Zaretskii
@ 2023-07-08 15:27                                                   ` Basil Contovounesios
  2023-07-08 15:38                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 97+ messages in thread
From: Basil Contovounesios @ 2023-07-08 15:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ulrich Müller, eggert, emacs-devel

Eli Zaretskii [2023-07-08 11:49 +0300] wrote:

>> From: Ulrich Müller <ulm@gentoo.org>
>> Cc: eggert@cs.ucla.edu,  emacs-devel@gnu.org
>> Date: Fri, 07 Jul 2023 07:48:41 +0200
>> 
>> > Did that character on a GUI frame need a non-default font, or was it
>> > supported by the default font.  I'd expect the Cyrillic script to be
>> > supported by the fonts people use as the default in Emacs.
>> 
>> It's supported by the default font for me (which is Droid Sans Mono).
>> У aka U+0423 is contained in WGL4, and I guess that most fonts (also on
>> GNU/Linux) would cover a superset of it.
>> 
>> Updated patch below.
>
> Thanks, installed on master, with a followup change to document this
> in NEWS and explain how to get back the old behavior.

Thanks, but I think it's too early in the build to use \N{name} syntax:

Loading /home/blc/.local/src/emacs/lisp/language/cyrillic.el (source)...
Error: invalid-read-syntax ("\\N{cyrillic capital letter u}" 130 42)
  [...]
  load-with-code-conversion("/home/blc/.local/src/emacs/lisp/language/cyrillic.el"
                            "/home/blc/.local/src/emacs/lisp/language/cyrillic.el"
                            nil nil)
  load("language/cyrillic")
  load("loadup.el")

So I followed the example of lisp/international/mule-conf.el and
switched to ?\uXXXX syntax:

; Fix last change to lisp/language/cyrillic.el.
1a9d454ebf6 2023-07-08 16:24:15 +0100
https://git.sv.gnu.org/cgit/emacs.git/commit/?id=1a9d454ebf6

-- 
Basil



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08 15:27                                                   ` Basil Contovounesios
@ 2023-07-08 15:38                                                     ` Eli Zaretskii
  2023-07-08 16:21                                                       ` Basil Contovounesios
  2023-07-09  9:22                                                       ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
  0 siblings, 2 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-08 15:38 UTC (permalink / raw)
  To: Basil Contovounesios; +Cc: ulm, eggert, emacs-devel

> From: Basil Contovounesios <contovob@tcd.ie>
> Cc: Ulrich Müller <ulm@gentoo.org>,  eggert@cs.ucla.edu,
>   emacs-devel@gnu.org
> Date: Sat, 08 Jul 2023 16:27:43 +0100
> 
> Eli Zaretskii [2023-07-08 11:49 +0300] wrote:
> 
> Thanks, but I think it's too early in the build to use \N{name} syntax:

Thanks, I've somehow missed that.  But why \uNNNN instead of just the
character itself?  *.el files are always UTF-8 encoded, so there's no
need to use only ASCII.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08 15:38                                                     ` Eli Zaretskii
@ 2023-07-08 16:21                                                       ` Basil Contovounesios
  2023-07-08 16:33                                                         ` Eli Zaretskii
  2023-07-08 18:21                                                         ` Ulrich Mueller
  2023-07-09  9:22                                                       ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
  1 sibling, 2 replies; 97+ messages in thread
From: Basil Contovounesios @ 2023-07-08 16:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, eggert, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 717 bytes --]

Eli Zaretskii [2023-07-08 18:38 +0300] wrote:

>> From: Basil Contovounesios <contovob@tcd.ie>
>> Cc: Ulrich Müller <ulm@gentoo.org>,  eggert@cs.ucla.edu,
>>   emacs-devel@gnu.org
>> Date: Sat, 08 Jul 2023 16:27:43 +0100
>> 
>> Eli Zaretskii [2023-07-08 11:49 +0300] wrote:
>> 
>> Thanks, but I think it's too early in the build to use \N{name} syntax:
>
> Thanks, I've somehow missed that.  But why \uNNNN instead of just the
> character itself?

I just followed the example of characters.el and mule-conf.el, but...

> *.el files are always UTF-8 encoded, so there's no need to use only
> ASCII.

...I see cyrillic.el explicitly declares -*- coding: utf-8 -*-.
Is this what you had in mind?


[-- Attachment #2: cyrillic.diff --]
[-- Type: text/x-diff, Size: 487 bytes --]

diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
index 9ad65877140..cf3ee5a2b9d 100644
--- a/lisp/language/cyrillic.el
+++ b/lisp/language/cyrillic.el
@@ -127,7 +127,7 @@ 'koi8-u
   "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
   :coding-type 'charset
   ;; This used to be ?U which collided with UTF-8.
-  :mnemonic ?\u0423                 ; ?\N{cyrillic capital letter u} У
+  :mnemonic ?У
   :charset-list '(koi8-u)
   :mime-charset 'koi8-u)
 

[-- Attachment #3: Type: text/plain, Size: 20 bytes --]


Thanks,

-- 
Basil

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08 16:21                                                       ` Basil Contovounesios
@ 2023-07-08 16:33                                                         ` Eli Zaretskii
  2023-07-08 16:57                                                           ` Basil Contovounesios
  2023-07-08 18:21                                                         ` Ulrich Mueller
  1 sibling, 1 reply; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-08 16:33 UTC (permalink / raw)
  To: Basil Contovounesios; +Cc: ulm, eggert, emacs-devel

> From: Basil Contovounesios <contovob@tcd.ie>
> Cc: ulm@gentoo.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 08 Jul 2023 17:21:15 +0100
> 
> > *.el files are always UTF-8 encoded, so there's no need to use only
> > ASCII.
> 
> ...I see cyrillic.el explicitly declares -*- coding: utf-8 -*-.

That's history.  Nowadays we have this in file-coding-system-alist:

   ("\\.el\\'" . prefer-utf-8)

> Is this what you had in mind?

Yes, thanks.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08 16:33                                                         ` Eli Zaretskii
@ 2023-07-08 16:57                                                           ` Basil Contovounesios
  0 siblings, 0 replies; 97+ messages in thread
From: Basil Contovounesios @ 2023-07-08 16:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ulm, eggert, emacs-devel

Eli Zaretskii [2023-07-08 19:33 +0300] wrote:
>> From: Basil Contovounesios <contovob@tcd.ie>
>> Cc: ulm@gentoo.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
>> Date: Sat, 08 Jul 2023 17:21:15 +0100
>> 
>> Is this what you had in mind?
> Yes, thanks.

Done:

; Simplify last change to cyrillic.el.
05984303a58 2023-07-08 17:51:58 +0100
https://git.sv.gnu.org/cgit/emacs.git/commit/?id=05984303a58

Thanks,

-- 
Basil



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08 16:21                                                       ` Basil Contovounesios
  2023-07-08 16:33                                                         ` Eli Zaretskii
@ 2023-07-08 18:21                                                         ` Ulrich Mueller
  2023-07-08 21:31                                                           ` Basil Contovounesios
  1 sibling, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-08 18:21 UTC (permalink / raw)
  To: Basil Contovounesios; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Sat, 08 Jul 2023, Basil Contovounesios wrote:

> Is this what you had in mind?

> diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
> index 9ad65877140..cf3ee5a2b9d 100644
> --- a/lisp/language/cyrillic.el
> +++ b/lisp/language/cyrillic.el
> @@ -127,7 +127,7 @@ 'koi8-u
>    "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
>    :coding-type 'charset
>    ;; This used to be ?U which collided with UTF-8.
> -  :mnemonic ?\u0423                 ; ?\N{cyrillic capital letter u} У
> +  :mnemonic ?У
>    :charset-list '(koi8-u)
>    :mime-charset 'koi8-u)
 
Could you keep the character's description as a comment? Like this:

  :mnemonic ?У                          ; cyrillic capital letter u



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Disambiguate modeline character for UTF-8?
  2023-07-08 18:21                                                         ` Ulrich Mueller
@ 2023-07-08 21:31                                                           ` Basil Contovounesios
  0 siblings, 0 replies; 97+ messages in thread
From: Basil Contovounesios @ 2023-07-08 21:31 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller [2023-07-08 20:21 +0200] wrote:

>>>>>> On Sat, 08 Jul 2023, Basil Contovounesios wrote:
>
>> diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
>> index 9ad65877140..cf3ee5a2b9d 100644
>> --- a/lisp/language/cyrillic.el
>> +++ b/lisp/language/cyrillic.el
>> @@ -127,7 +127,7 @@ 'koi8-u
>>    "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
>>    :coding-type 'charset
>>    ;; This used to be ?U which collided with UTF-8.
>> -  :mnemonic ?\u0423                 ; ?\N{cyrillic capital letter u} У
>> +  :mnemonic ?У
>  
> Could you keep the character's description as a comment? Like this:
>
>   :mnemonic ?У                          ; cyrillic capital letter u

Sure:

; Re-add recently removed comment in cyrillic.el.
afa4fa17232 2023-07-08 22:27:20 +0100
https://git.sv.gnu.org/cgit/emacs.git/commit/?id=afa4fa17232

-- 
Basil



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?)
  2023-07-08 15:38                                                     ` Eli Zaretskii
  2023-07-08 16:21                                                       ` Basil Contovounesios
@ 2023-07-09  9:22                                                       ` Ulrich Mueller
  2023-07-09  9:57                                                         ` Lisp reader syntax and bootstrap Po Lu
  2023-07-09 11:35                                                         ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Eli Zaretskii
  1 sibling, 2 replies; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-09  9:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Basil Contovounesios, eggert, emacs-devel

>>>>> On Sat, 08 Jul 2023, Eli Zaretskii wrote:

>> Thanks, but I think it's too early in the build to use \N{name} syntax:

> Thanks, I've somehow missed that.

I had done my tests if a non-ASCII char is displayed correctly with ?У.
This including building Emacs from scratch. I changed from ?У to
?\N{name} last minute and apparently tested only if the file can be
loaded and byte-compiled, but not a full bootstrap. :( Sorry for that.

Are there any other features of lisp reader syntax where one must be
careful? Maybe the Elisp Reference Manual should warn about this?



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-09  9:22                                                       ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
@ 2023-07-09  9:57                                                         ` Po Lu
  2023-07-13  2:04                                                           ` Richard Stallman
  2023-07-09 11:35                                                         ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Eli Zaretskii
  1 sibling, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-09  9:57 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Eli Zaretskii, Basil Contovounesios, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Are there any other features of lisp reader syntax where one must be
> careful? Maybe the Elisp Reference Manual should warn about this?

I know of one: reader syntax for NaN and Inf cannot be used on machines
without IEEE floating point, where they will be read as symbols instead
(with possibly disastrous consequences.)



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?)
  2023-07-09  9:22                                                       ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
  2023-07-09  9:57                                                         ` Lisp reader syntax and bootstrap Po Lu
@ 2023-07-09 11:35                                                         ` Eli Zaretskii
  1 sibling, 0 replies; 97+ messages in thread
From: Eli Zaretskii @ 2023-07-09 11:35 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: contovob, eggert, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Basil Contovounesios <contovob@tcd.ie>,  eggert@cs.ucla.edu,
>   emacs-devel@gnu.org
> Date: Sun, 09 Jul 2023 11:22:39 +0200
> 
> Are there any other features of lisp reader syntax where one must be
> careful? Maybe the Elisp Reference Manual should warn about this?

If you change a file FOO.el that is preloaded in loadup.el, you need
to make sure any function/macro/feature you use there is
defined/implemented either in C or by files loaded by loadup _before_
FOO.el.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-09  9:57                                                         ` Lisp reader syntax and bootstrap Po Lu
@ 2023-07-13  2:04                                                           ` Richard Stallman
  2023-07-13  4:27                                                             ` Po Lu
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Stallman @ 2023-07-13  2:04 UTC (permalink / raw)
  To: Po Lu; +Cc: ulm, eliz, contovob, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I know of one: reader syntax for NaN and Inf cannot be used on machines
  > without IEEE floating point, where they will be read as symbols instead
  > (with possibly disastrous consequences.)

Maybe the reader ought to recognize these anyway, and do something
sensible with them -- such as, signal an error.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-13  2:04                                                           ` Richard Stallman
@ 2023-07-13  4:27                                                             ` Po Lu
  2023-07-13 22:07                                                               ` Paul Eggert
  2023-07-16  2:19                                                               ` Richard Stallman
  0 siblings, 2 replies; 97+ messages in thread
From: Po Lu @ 2023-07-13  4:27 UTC (permalink / raw)
  To: Richard Stallman; +Cc: ulm, eliz, contovob, eggert, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Maybe the reader ought to recognize these anyway, and do something
> sensible with them -- such as, signal an error.

Perhaps, but it still wouldn't make it any easier to write portable Lisp
code.  Lisp programmers should simply avoid using NaN and Inf, or at
least handle arithmetic errors around functions that may generate them.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-13  4:27                                                             ` Po Lu
@ 2023-07-13 22:07                                                               ` Paul Eggert
  2023-07-14  5:05                                                                 ` Ulrich Mueller
  2023-07-15  2:10                                                                 ` Richard Stallman
  2023-07-16  2:19                                                               ` Richard Stallman
  1 sibling, 2 replies; 97+ messages in thread
From: Paul Eggert @ 2023-07-13 22:07 UTC (permalink / raw)
  To: Po Lu, Richard Stallman; +Cc: ulm, eliz, contovob, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1261 bytes --]

On 2023-07-12 21:27, Po Lu wrote:
> Lisp programmers should simply avoid using NaN and Inf

Support for NaN and Inf is ubiquitous nowadays, and it should be OK to 
write Emacs Lisp programs that use NaN and Inf, especially since Emacs 
itself uses them in its own Lisp files.

As I actually made money back in the 1970s writing code for VAXes, I was 
moved to look into RMS's suggestion to signal an error when reading 
"0.0e+NaN" on a VAX. Unfortunately this broke calculator.el - that is, I 
couldn't load calculator.el (or calculator.elc), as loading signaled an 
error when it saw the NaN. And even if we removed the NaN from 
calculator.el (and I suppose, infinities from other .el files), users 
would run into similar issues with their own code.

So I instead installed the attached patch. On a VAX this approximates 
infinities and NaNs with extremal values and non-numeric objects, 
respectively. Although I do not have a VAX to test it on, and am too 
lazy to spin up an emulator, I did test it on x86-64 by pretending that 
the x86-64 lacked IEEE support, and it seemed to work OK. (At least, I 
could load calculator.el....)

This patch shouldn't change behavior (or even the executable code) on 
today's platforms. It's purely for computer museums.

[-- Attachment #2: 0001-Port-NaN-infinity-handling-better-to-VAX.patch --]
[-- Type: text/x-patch, Size: 6370 bytes --]

From 0cd519971d199836ba0a6e9f0e36af9b9accaf0d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 13 Jul 2023 14:26:29 -0700
Subject: [PATCH] Port NaN, infinity handling better to VAX

Nowadays .elc files routinely contain tokens like 1.0e+INF and
0.0e+NaN that do not work on antiques like the VAX that lack IEEE fp.
Port Emacs to these platforms, by treating infinities as extreme
values and NaNs as strings that trap if used numerically.
* src/lread.c (INFINITY): Default to HUGE_VAL if non-IEEE.
(not_a_number) [!IEEE_FLOATING_POINT]: New static array.
(syms_of_lread) [!IEEE_FLOATING_POINT]: Initialize it.
(read0): Report invalid syntax for +0.0e+NaN on platforms
that lack NaNs.
(string_to_number): On non-IEEE platforms, return HUGE_VAL
for infinity and a string for NaN.  All callers changed.
---
 doc/lispref/numbers.texi | 10 ++++++----
 etc/NEWS                 |  8 ++++++++
 src/data.c               |  3 ++-
 src/lread.c              | 29 ++++++++++++++++++++++++++---
 src/process.c            |  3 ++-
 5 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index 3e45aa90fda..bcf89fc9ab1 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -270,10 +270,6 @@ Float Basics
 signs and significands agree.  Significands of NaNs are
 machine-dependent, as are the digits in their string representation.
 
-  NaNs are not available on systems which do not use IEEE
-floating-point arithmetic; if the read syntax for a NaN is used on a
-VAX, for example, the reader signals an error.
-
   When NaNs and signed zeros are involved, non-numeric functions like
 @code{eql}, @code{equal}, @code{sxhash-eql}, @code{sxhash-equal} and
 @code{gethash} determine whether values are indistinguishable, not
@@ -283,6 +279,12 @@ Float Basics
 conversely, @code{(equal 0.0 -0.0)} returns @code{nil} whereas
 @code{(= 0.0 -0.0)} returns @code{t}.
 
+  Infinities and NaNs are not available on legacy systems that lack
+IEEE floating-point arithmetic.  On a circa 1980 VAX, for example, the
+Lisp reader approximates an infinity with the nearest finite value,
+and a NaN with some other non-numeric Lisp object that provokes an
+error if used numerically.
+
 Here are read syntaxes for these special floating-point values:
 
 @table @asis
diff --git a/etc/NEWS b/etc/NEWS
index 5d5ea990b92..997f7e82c2b 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -585,6 +585,14 @@ behavior back for any other reason, you can do that using the
 previous behavior of showing 'U' in the mode line for 'koi8-u':
 
      (coding-system-put 'koi8-u :mnemonic ?U)
+
++++
+** Infinities and NaNs no longer act as symbols on non-IEEE platforms.
+On old platforms like the VAX that do not support IEEE floating-point,
+tokens like 0.0e+NaN and 1.0e+INF are no longer read as symbols.
+Instead, the Lisp reader approximates an infinity with the nearest
+finite value, and a NaN with some other non-numeric object that
+provokes an error if used numerically.
 \f
 * Lisp Changes in Emacs 30.1
 
diff --git a/src/data.c b/src/data.c
index 6de8e0cf1a1..5a31462d8ca 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3033,7 +3033,8 @@ DEFUN ("string-to-number", Fstring_to_number, Sstring_to_number, 1, 2, 0,
     p++;
 
   Lisp_Object val = string_to_number (p, b, 0);
-  return NILP (val) ? make_fixnum (0) : val;
+  return ((IEEE_FLOATING_POINT ? NILP (val) : !NUMBERP (val))
+	  ? make_fixnum (0) : val);
 }
 \f
 enum arithop
diff --git a/src/lread.c b/src/lread.c
index 51d0d2a3c24..6792ef27206 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -75,6 +75,10 @@ #define file_tell ftell
 # ifndef INFINITY
 #  define INFINITY ((union ieee754_double) {.ieee = {.exponent = -1}}.d)
 # endif
+#else
+# ifndef INFINITY
+#  define INFINITY HUGE_VAL
+# endif
 #endif
 
 /* The objects or placeholders read with the #n=object form.
@@ -4477,10 +4481,17 @@ substitute_in_interval (INTERVAL interval, void *arg)
 }
 
 \f
+#if !IEEE_FLOATING_POINT
+/* Strings that stand in for +NaN, -NaN, respectively.  */
+static Lisp_Object not_a_number[2];
+#endif
+
 /* Convert the initial prefix of STRING to a number, assuming base BASE.
    If the prefix has floating point syntax and BASE is 10, return a
    nearest float; otherwise, if the prefix has integer syntax, return
-   the integer; otherwise, return nil.  If PLEN, set *PLEN to the
+   the integer; otherwise, return nil.  (On antique platforms that lack
+   support for NaNs, if the prefix has NaN syntax return a Lisp object that
+   will provoke an error if used as a number.)  If PLEN, set *PLEN to the
    length of the numeric prefix if there is one, otherwise *PLEN is
    unspecified.  */
 
@@ -4545,7 +4556,6 @@ string_to_number (char const *string, int base, ptrdiff_t *plen)
 		cp++;
 	      while ('0' <= *cp && *cp <= '9');
 	    }
-#if IEEE_FLOATING_POINT
 	  else if (cp[-1] == '+'
 		   && cp[0] == 'I' && cp[1] == 'N' && cp[2] == 'F')
 	    {
@@ -4558,12 +4568,17 @@ string_to_number (char const *string, int base, ptrdiff_t *plen)
 	    {
 	      state |= E_EXP;
 	      cp += 3;
+#if IEEE_FLOATING_POINT
 	      union ieee754_double u
 		= { .ieee_nan = { .exponent = 0x7ff, .quiet_nan = 1,
 				  .mantissa0 = n >> 31 >> 1, .mantissa1 = n }};
 	      value = u.d;
-	    }
+#else
+	      if (plen)
+		*plen = cp - string;
+	      return not_a_number[negative];
 #endif
+	    }
 	  else
 	    cp = ecp;
 	}
@@ -5707,6 +5722,14 @@ syms_of_lread (void)
   DEFSYM (Qcomma, ",");
   DEFSYM (Qcomma_at, ",@");
 
+#if !IEEE_FLOATING_POINT
+  for (int negative = 0; negative < 2; negative++)
+    {
+      not_a_number[negative] = build_pure_c_string (&"-0.0e+NaN"[!negative]);
+      staticpro (&not_a_number[negative]);
+    }
+#endif
+
   DEFSYM (Qinhibit_file_name_operation, "inhibit-file-name-operation");
   DEFSYM (Qascii_character, "ascii-character");
   DEFSYM (Qfunction, "function");
diff --git a/src/process.c b/src/process.c
index 67d1d3e425f..2d6e08f16b5 100644
--- a/src/process.c
+++ b/src/process.c
@@ -7130,7 +7130,8 @@ DEFUN ("internal-default-signal-process",
 	{
 	  ptrdiff_t len;
 	  tem = string_to_number (SSDATA (process), 10, &len);
-	  if (NILP (tem) || len != SBYTES (process))
+	  if ((IEEE_FLOATING_POINT ? NILP (tem) : !NUMBERP (tem))
+	      || len != SBYTES (process))
 	    return Qnil;
 	}
       process = tem;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-13 22:07                                                               ` Paul Eggert
@ 2023-07-14  5:05                                                                 ` Ulrich Mueller
  2023-07-14  6:57                                                                   ` Paul Eggert
  2023-07-15  2:10                                                                 ` Richard Stallman
  1 sibling, 1 reply; 97+ messages in thread
From: Ulrich Mueller @ 2023-07-14  5:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Po Lu, Richard Stallman, eliz, contovob, emacs-devel

>>>>> On Fri, 14 Jul 2023, Paul Eggert wrote:

> @@ -283,6 +279,12 @@ Float Basics
>  conversely, @code{(equal 0.0 -0.0)} returns @code{nil} whereas
>  @code{(= 0.0 -0.0)} returns @code{t}.
>  
> +  Infinities and NaNs are not available on legacy systems that lack
> +IEEE floating-point arithmetic.  On a circa 1980 VAX, for example, the
> +Lisp reader approximates an infinity with the nearest finite value,

"Nearest" sounds a little strange here. HUGE_VAL has still an infinite
distance from infinity.

Maybe some wording like "approximates positive and negative infinities
with the largest and smallest representable finite numbers" would be
more accurate?

> +and a NaN with some other non-numeric Lisp object that provokes an
> +error if used numerically.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-14  5:05                                                                 ` Ulrich Mueller
@ 2023-07-14  6:57                                                                   ` Paul Eggert
  0 siblings, 0 replies; 97+ messages in thread
From: Paul Eggert @ 2023-07-14  6:57 UTC (permalink / raw)
  To: Ulrich Mueller; +Cc: Po Lu, Richard Stallman, eliz, contovob, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 486 bytes --]

On 2023-07-13 22:05, Ulrich Mueller wrote:
> Maybe some wording like "approximates positive and negative infinities
> with the largest and smallest representable finite numbers" would be
> more accurate?

Unfortunately "smallest" connotes being close to zero. Also, I just 
looked at the C Standard again, and it doesn't guarantee that HUGE_VAL 
is the maximum 'double' on a VAX (!).

Anyway, thanks for pointing out the confusion. I installed the attached 
to try to clear matters up.

[-- Attachment #2: 0001-Improve-doc-for-VAX-reading-NaN-INF.patch --]
[-- Type: text/x-patch, Size: 1250 bytes --]

From be501f468ed36cddf01305b88bab44366b447c03 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 13 Jul 2023 23:36:33 -0700
Subject: [PATCH 1/2] Improve doc for VAX reading NaN, INF

* doc/lispref/numbers.texi (Float Basics): Improve description of
how Lisp reads infinities and NaNs on a VAX.
---
 doc/lispref/numbers.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index bcf89fc9ab1..a49afb73539 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -280,9 +280,9 @@ Float Basics
 @code{(= 0.0 -0.0)} returns @code{t}.
 
   Infinities and NaNs are not available on legacy systems that lack
-IEEE floating-point arithmetic.  On a circa 1980 VAX, for example, the
-Lisp reader approximates an infinity with the nearest finite value,
-and a NaN with some other non-numeric Lisp object that provokes an
+IEEE floating-point arithmetic.  On a circa 1980 VAX, for example,
+Lisp reads @samp{1.0e+INF} as a large but finite floating-point number,
+and @samp{0.0e+NaN} as some other non-numeric Lisp object that provokes an
 error if used numerically.
 
 Here are read syntaxes for these special floating-point values:
-- 
2.39.2


[-- Attachment #3: 0002-Reorder-NaN-INF-paras.patch --]
[-- Type: text/x-patch, Size: 1582 bytes --]

From 01b80a6f0e40a4390717a79a73c61899e2ec2968 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 13 Jul 2023 23:55:50 -0700
Subject: [PATCH 2/2] Reorder NaN, INF paras

* doc/lispref/numbers.texi (Float Basics):
Reorder paragraphs so that examples follow defns.
---
 doc/lispref/numbers.texi | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index a49afb73539..071ec0f518d 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -279,12 +279,6 @@ Float Basics
 conversely, @code{(equal 0.0 -0.0)} returns @code{nil} whereas
 @code{(= 0.0 -0.0)} returns @code{t}.
 
-  Infinities and NaNs are not available on legacy systems that lack
-IEEE floating-point arithmetic.  On a circa 1980 VAX, for example,
-Lisp reads @samp{1.0e+INF} as a large but finite floating-point number,
-and @samp{0.0e+NaN} as some other non-numeric Lisp object that provokes an
-error if used numerically.
-
 Here are read syntaxes for these special floating-point values:
 
 @table @asis
@@ -294,6 +288,12 @@ Float Basics
 @samp{0.0e+NaN} and @samp{-0.0e+NaN}
 @end table
 
+  Infinities and NaNs are not available on legacy systems that lack
+IEEE floating-point arithmetic.  On a circa 1980 VAX, for example,
+Lisp reads @samp{1.0e+INF} as a large but finite floating-point number,
+and @samp{0.0e+NaN} as some other non-numeric Lisp object that provokes an
+error if used numerically.
+
   The following functions are specialized for handling floating-point
 numbers:
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-13 22:07                                                               ` Paul Eggert
  2023-07-14  5:05                                                                 ` Ulrich Mueller
@ 2023-07-15  2:10                                                                 ` Richard Stallman
  2023-07-15  2:38                                                                   ` Po Lu
  2023-07-15 15:22                                                                   ` Paul Eggert
  1 sibling, 2 replies; 97+ messages in thread
From: Richard Stallman @ 2023-07-15  2:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: luangruo, ulm, eliz, contovob, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > So I instead installed the attached patch. On a VAX this approximates 
  > infinities and NaNs with extremal values and non-numeric objects, 

That's a bit of a kludge -- using those values might be wrong for
some purposes... if this will really happen.

  > This patch shouldn't change behavior (or even the executable code) on 
  > today's platforms. It's purely for computer museums.

If that's true, maybe your solution is fine.

Pu Lu, you wrote

  > I know of one: reader syntax for NaN and Inf cannot be used on machines
  > without IEEE floating point, where they will be read as symbols instead
  > (with possibly disastrous consequences.)

Do you know of a machine where this would really happen?


-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-15  2:10                                                                 ` Richard Stallman
@ 2023-07-15  2:38                                                                   ` Po Lu
  2023-07-15  5:18                                                                     ` Philip Kaludercic
  2023-07-15 15:22                                                                   ` Paul Eggert
  1 sibling, 1 reply; 97+ messages in thread
From: Po Lu @ 2023-07-15  2:38 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Paul Eggert, ulm, eliz, contovob, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Do you know of a machine where this would really happen?

VAXen, of course, and possibly future machines that haven't been
designed yet.

I think I overheard some of the NetBSD porters talking about this
problem.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-15  2:38                                                                   ` Po Lu
@ 2023-07-15  5:18                                                                     ` Philip Kaludercic
  2023-07-15  5:50                                                                       ` Po Lu
  0 siblings, 1 reply; 97+ messages in thread
From: Philip Kaludercic @ 2023-07-15  5:18 UTC (permalink / raw)
  To: Po Lu; +Cc: Richard Stallman, Paul Eggert, ulm, eliz, contovob, emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> Richard Stallman <rms@gnu.org> writes:
>
>> Do you know of a machine where this would really happen?
>
> VAXen, of course, and possibly future machines that haven't been
> designed yet.

Floating point operations are regarded as optional (but conventional)
extensions by RISC-V.

> I think I overheard some of the NetBSD porters talking about this
> problem.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-15  5:18                                                                     ` Philip Kaludercic
@ 2023-07-15  5:50                                                                       ` Po Lu
  0 siblings, 0 replies; 97+ messages in thread
From: Po Lu @ 2023-07-15  5:50 UTC (permalink / raw)
  To: Philip Kaludercic
  Cc: Richard Stallman, Paul Eggert, ulm, eliz, contovob, emacs-devel

Philip Kaludercic <philipk@posteo.net> writes:

> Floating point operations are regarded as optional (but conventional)
> extensions by RISC-V.

Right, though somehow I doubt compilers will choose to implement
non-IEEE floating point support for that particular CPU, at least by
default.

Being thorough about portability is simply good future proofing.
I don't anticipate any new non-IEEE machines entering general use in the
short term.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-15  2:10                                                                 ` Richard Stallman
  2023-07-15  2:38                                                                   ` Po Lu
@ 2023-07-15 15:22                                                                   ` Paul Eggert
  2023-07-17  2:22                                                                     ` Richard Stallman
  2023-07-17  2:32                                                                     ` Po Lu
  1 sibling, 2 replies; 97+ messages in thread
From: Paul Eggert @ 2023-07-15 15:22 UTC (permalink / raw)
  To: rms; +Cc: luangruo, ulm, eliz, contovob, emacs-devel

On 2023-07-14 19:10, Richard Stallman wrote:
> That's a bit of a kludge -- using those values might be wrong for
> some purposes...

Yes, though there are similar problems even for finite numbers, since 
they also behave differently on the VAX, sometimes significantly. E.g., 
VAX subtraction can underflow to zero via catastrophic cancellation, 
whereas IEEE subtraction cannot.

Though my change is indeed a hack, it is an improvement in that Emacs 
can now start up and load 'calculator' whereas formerly it could not. I 
"tested" this by manually setting IEEE_FLOATING_POINT to zero on x86-64, 
and compiling and running the result. Of course this is not the same as 
a real VAX.


>    > This patch shouldn't change behavior (or even the executable code) on
>    > today's platforms. It's purely for computer museums.
> 
> If that's true, maybe your solution is fine.

I surveyed the net and it appears to be true. The only Emacs platform 
that still insists on non-IEEE floating point is NetBSD/vax, and 
nowadays that platform seems to be run only for computer-museum-like 
purposes. These uses are rare even by computer-museum standards, as most 
hobbyist historians that simulate VAXes seem to prefer VMS to NetBSD, I 
expect partly because VMS "feels" older (it differs more from GNU :-). 
And VMS-using hobbyists can't run current Emacs, as Emacs 23 dropped VMS 
support.

PS. Are you aware of the licensing dispute over the emulator that 
hobbyists typically use to run VAX code? This emulator, SIMH, is based 
on software that dates back to the 1960s, so in some sense it's even 
older than Emacs. The dispute involves a license clause introduced a 
year ago that I've not seen before, and it's not clear to me that SIMH 
is free software any more. However, this change has been disputed and 
SIMH has forked off a new project Open SIMH that does not have the 
controversial clause. For details, please see:

https://groups.io/g/simh/topic/91528716#1659

The disputed licensing clause is at the end of this change:

https://github.com/simh/simh/commit/ce2adce632e1a22e6d76d4bf726d6b863373c550




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-13  4:27                                                             ` Po Lu
  2023-07-13 22:07                                                               ` Paul Eggert
@ 2023-07-16  2:19                                                               ` Richard Stallman
  1 sibling, 0 replies; 97+ messages in thread
From: Richard Stallman @ 2023-07-16  2:19 UTC (permalink / raw)
  To: Po Lu; +Cc: ulm, eliz, contovob, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Perhaps, but it still wouldn't make it any easier to write portable Lisp
  > code.  Lisp programmers should simply avoid using NaN and Inf, or at
  > least handle arithmetic errors around functions that may generate them.

If this problem is only on the VAX and the VAX is obsolete,
and the problem is now limited to replacing NaN and Inf
with specific numbers, it may not be worth any extra work.

If you want to do work for this goal, the way to do it would be
to figure out some constructs that are more portable, that we could
implement alongside what we have now.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-15 15:22                                                                   ` Paul Eggert
@ 2023-07-17  2:22                                                                     ` Richard Stallman
  2023-07-17  5:26                                                                       ` Paul Eggert
  2023-07-17  2:32                                                                     ` Po Lu
  1 sibling, 1 reply; 97+ messages in thread
From: Richard Stallman @ 2023-07-17  2:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The dispute involves a license clause introduced a 
  > year ago that I've not seen before, and it's not clear to me that SIMH 
  > is free software any more. However, this change has been disputed and 
  > SIMH has forked off a new project Open SIMH that does not have the 
  > controversial clause. For details, please see:

Bravo for them!  If only they called it Libre SIMH...

  > The disputed licensing clause is at the end of this change:

  > https://github.com/simh/simh/commit/ce2adce632e1a22e6d76d4bf726d6b863373c550

When I visit that URL I see two files created initially in that commit.
It is not explicitly clear which text is the "disputed licensing clause".
Does that consist of the text starting with line 32 in LICENSE.txt?

Anyway it is quite clear that that text makes the license nonfree.

Can you tell me how to contact the develoers of Open SIMH?


-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-15 15:22                                                                   ` Paul Eggert
  2023-07-17  2:22                                                                     ` Richard Stallman
@ 2023-07-17  2:32                                                                     ` Po Lu
  1 sibling, 0 replies; 97+ messages in thread
From: Po Lu @ 2023-07-17  2:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: rms, ulm, eliz, contovob, emacs-devel

Paul Eggert <eggert@cs.ucla.edu> writes:

> Yes, though there are similar problems even for finite numbers, since
> they also behave differently on the VAX, sometimes
> significantly. E.g., VAX subtraction can underflow to zero via
> catastrophic cancellation, whereas IEEE subtraction cannot.

Subnormal numbers (or rather, the lack thereof) shouldn't pose a
portability problem for practical Lisp code, as they don't affect reader
syntax or cause errors to be signaled.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: Lisp reader syntax and bootstrap
  2023-07-17  2:22                                                                     ` Richard Stallman
@ 2023-07-17  5:26                                                                       ` Paul Eggert
  0 siblings, 0 replies; 97+ messages in thread
From: Paul Eggert @ 2023-07-17  5:26 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel

On 2023-07-16 19:22, Richard Stallman wrote:

>    > https://github.com/simh/simh/commit/ce2adce632e1a22e6d76d4bf726d6b863373c550
> 
> When I visit that URL I see two files created initially in that commit.
> It is not explicitly clear which text is the "disputed licensing clause".
> Does that consist of the text starting with line 32 in LICENSE.txt?

Yes.


> Anyway it is quite clear that that text makes the license nonfree.
> 
> Can you tell me how to contact the develoers of Open SIMH?

Contact instructions are here:

https://opensimh.org/contacts/

The email address is <simh@groups.io>; I don't know whether one must be 
a list member to send email to the list.

PS. Their mailing list's most recent post was about the IBM 1620, which 
if you'll recall had variable-precision decimal floating point. Total 
memory was at most 20,000 decimal digits (they used a form of BCD so 
this was equivalent to 120,000 bits). I hope nobody tries to port Emacs 
to an IBM 1620....



^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2023-07-17  5:26 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-23 11:46 Disambiguate modeline character for UTF-8? Ulrich Mueller
2020-08-23 15:27 ` Stefan Monnier
2020-08-23 16:07   ` Eli Zaretskii
2020-08-23 18:24     ` Paul Eggert
2020-08-23 18:53       ` Ulrich Mueller
2020-08-23 18:56         ` Eli Zaretskii
2020-08-23 18:57         ` Eli Zaretskii
2020-08-23 19:13           ` Ulrich Mueller
2020-08-23 19:42             ` Eli Zaretskii
2020-08-23 21:23               ` Stefan Monnier
2020-08-24  7:06                 ` Ulrich Mueller
2020-08-24 14:30                   ` Yuri Khan
2020-08-29 11:17                     ` Ulrich Mueller
2020-08-24 14:36                   ` Drew Adams
2020-08-24 15:23                     ` Ulrich Mueller
2020-08-24 16:43                       ` Stefan Monnier
2023-07-05 10:08                       ` Ulrich Mueller
2023-07-05 11:41                         ` Eli Zaretskii
2023-07-05 13:04                           ` Ulrich Mueller
2023-07-05 13:44                             ` Eli Zaretskii
2023-07-05 21:50                               ` Ulrich Mueller
2023-07-05 22:11                                 ` Paul Eggert
2023-07-06  8:51                                   ` Ulrich Mueller
2023-07-06  5:33                                 ` Eli Zaretskii
2023-07-06  8:47                                   ` Ulrich Mueller
2023-07-06  9:20                                     ` Eli Zaretskii
2023-07-06  9:46                                       ` Ulrich Mueller
2023-07-06 12:34                                         ` Po Lu
2023-07-06 12:32                                     ` Po Lu
2023-07-06 12:31                                 ` Po Lu
2023-07-06 13:02                                   ` Andreas Schwab
2023-07-06 13:08                                   ` Ulrich Mueller
2023-07-06 17:37                                     ` Paul Eggert
2023-07-06 18:13                                       ` Eli Zaretskii
2023-07-06 18:44                                       ` Ulrich Müller
2023-07-06 19:01                                         ` Eli Zaretskii
2023-07-06 19:31                                           ` Ulrich Mueller
2023-07-07  5:18                                             ` Eli Zaretskii
2023-07-07  5:48                                               ` Ulrich Müller
2023-07-07  6:16                                                 ` Po Lu
2023-07-07  6:41                                                   ` Ulrich Mueller
2023-07-07  7:38                                                     ` Po Lu
2023-07-07  9:44                                                       ` Ulrich Mueller
2023-07-07 10:21                                                         ` Eli Zaretskii
2023-07-07 10:42                                                           ` Ulrich Mueller
2023-07-07 12:04                                                             ` Po Lu
2023-07-07 13:01                                                               ` Ulrich Mueller
2023-07-07 13:38                                                                 ` Po Lu
2023-07-07 12:01                                                         ` Po Lu
2023-07-07 12:38                                                           ` Andreas Schwab
2023-07-07 13:37                                                             ` Po Lu
2023-07-07 13:45                                                               ` Andreas Schwab
2023-07-07 12:58                                                           ` Eli Zaretskii
2023-07-08  8:49                                                 ` Eli Zaretskii
2023-07-08 15:27                                                   ` Basil Contovounesios
2023-07-08 15:38                                                     ` Eli Zaretskii
2023-07-08 16:21                                                       ` Basil Contovounesios
2023-07-08 16:33                                                         ` Eli Zaretskii
2023-07-08 16:57                                                           ` Basil Contovounesios
2023-07-08 18:21                                                         ` Ulrich Mueller
2023-07-08 21:31                                                           ` Basil Contovounesios
2023-07-09  9:22                                                       ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
2023-07-09  9:57                                                         ` Lisp reader syntax and bootstrap Po Lu
2023-07-13  2:04                                                           ` Richard Stallman
2023-07-13  4:27                                                             ` Po Lu
2023-07-13 22:07                                                               ` Paul Eggert
2023-07-14  5:05                                                                 ` Ulrich Mueller
2023-07-14  6:57                                                                   ` Paul Eggert
2023-07-15  2:10                                                                 ` Richard Stallman
2023-07-15  2:38                                                                   ` Po Lu
2023-07-15  5:18                                                                     ` Philip Kaludercic
2023-07-15  5:50                                                                       ` Po Lu
2023-07-15 15:22                                                                   ` Paul Eggert
2023-07-17  2:22                                                                     ` Richard Stallman
2023-07-17  5:26                                                                       ` Paul Eggert
2023-07-17  2:32                                                                     ` Po Lu
2023-07-16  2:19                                                               ` Richard Stallman
2023-07-09 11:35                                                         ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Eli Zaretskii
2023-07-07  0:19                                     ` Disambiguate modeline character for UTF-8? Po Lu
2023-07-06 12:27                             ` Po Lu
2023-07-07  7:09                               ` UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller
2023-07-07  7:34                                 ` Eli Zaretskii
2023-07-07  8:20                                   ` UTF-32 Ulrich Mueller
2023-07-07 10:16                                     ` UTF-32 Eli Zaretskii
2023-07-07 10:34                                       ` UTF-32 Ulrich Mueller
2023-07-07 12:49                                         ` UTF-32 Eli Zaretskii
2023-07-07 13:24                                           ` UTF-32 Andreas Schwab
2023-07-07 13:36                                           ` UTF-32 Ulrich Mueller
2023-07-07 14:06                                             ` UTF-32 Eli Zaretskii
2023-07-05 12:49                         ` Disambiguate modeline character for UTF-8? Stefan Monnier
2023-07-05 13:38                           ` Eli Zaretskii
2023-07-06 19:07                           ` Filipp Gunbin
2020-08-23 19:47             ` Stefan Kangas
2020-08-24 18:35     ` Juri Linkov
2020-08-24 18:55       ` Eli Zaretskii
2020-08-25 18:59         ` Juri Linkov
2020-08-25 19:26           ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).