* Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2020-08-23 11:46 UTC
To: emacs-devel

Presumably UTF-8 is the most popular coding system today. Nevertheless,
it shares its mnemonic character displayed in the modeline with several
others (while legacy codings like ISO-8859-1 have their own unique char):

U -- utf-8* (all variants)
U -- utf-16* (all variants)
U -- utf-7
u -- utf-7-imap
U -- koi8-u

I wonder if this could be disambiguated, such that "U" would be used
exclusively for UTF-8 and its variants. For example, as follows:

U -- utf-8* (all variants)
u -- utf-16* (all variants)
m -- utf-7* (Mnemonic: "m" for mail-safe or MIME)
Y -- koi8-u (Mnemonic: "Y" looks similar to the first letter of "Українська", "Ukrainian")

WDYT?
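To make the ambiguity concrete, here is a small Python sketch that groups the mnemonics listed in this message by character and reports every character shared by two or more coding systems. The table is a hypothetical, hand-transcribed model of the list above, not data queried from Emacs (inside Emacs, `coding-system-mnemonic` is where the real values would be read), and the helper name `collisions` is made up for illustration.

```python
from collections import defaultdict

# Hypothetical model: mode-line mnemonics transcribed from the message
# above, not read from a running Emacs.
CURRENT = {
    "utf-8": "U",       # all utf-8* variants
    "utf-16": "U",      # all utf-16* variants
    "utf-7": "U",
    "utf-7-imap": "u",
    "koi8-u": "U",
}


def collisions(table):
    """Map each mnemonic used by two or more coding systems to their sorted names."""
    by_char = defaultdict(list)
    for coding, char in table.items():
        by_char[char].append(coding)
    return {char: sorted(names) for char, names in by_char.items() if len(names) > 1}


if __name__ == "__main__":
    for char, names in collisions(CURRENT).items():
        print(f"{char!r} is shared by: {', '.join(names)}")
```

Running it prints `'U' is shared by: koi8-u, utf-16, utf-7, utf-8`. `utf-7-imap` is not reported because its lowercase `u` is unique within this (deliberately partial) table.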
* Re: Disambiguate modeline character for UTF-8?
From: Stefan Monnier @ 2020-08-23 15:27 UTC
To: Ulrich Mueller; +Cc: emacs-devel

> Presumably UTF-8 is the most popular coding system today. Nevertheless,
> it shares its mnemonic character displayed in the modeline with several
> others (while legacy codings like ISO-8859-1 have their unique char):
>
> U -- utf-8* (all variants)
> U -- utf-16* (all variants)
> U -- utf-7
> u -- utf-7-imap
> U -- koi8-u

Agreed.

> I wonder if this could be disambiguated, such that "U" would be used
> exclusively for UTF-8 and its variants. For example, as follows:
>
> U -- utf-8* (all variants)
> u -- utf-16* (all variants)
> m -- utf-7* (Mnemonic: "m" for mail-safe or MIME)
> Y -- koi8-u (Mnemonic: "Y" looks similar to 1st letter in "Українська")
>
> WDYT?

Yay, bikeshedding ;-)

I don't see a strong reason to limit ourselves to a single char, FWIW,
so I think `u7` is fine for utf-7* (it should be very rare anyway).

        Stefan
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2020-08-23 16:07 UTC
To: Stefan Monnier; +Cc: ulm, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 23 Aug 2020 11:27:22 -0400
> Cc: emacs-devel@gnu.org
>
> I don't see a strong reason to limit ourselves to a single char, FWIW,
> so I think `u7` is fine for utf-7* (it should be very rare anyway).

It must be a single character, but OTOH it doesn't have to be an ASCII
character.
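Eli's rule can be read as a simple validity check: a mnemonic is acceptable if it is exactly one character, ASCII or not. The Python sketch below models "one character" as one Unicode code point; `valid_mnemonic` and `describe` are hypothetical helpers that only restate the rule from this message, not Emacs's own check. It also shows a pitfall of going non-ASCII: "U" followed by a combining caron looks like one letter but is two code points.

```python
import unicodedata


def valid_mnemonic(candidate: str) -> bool:
    """Hypothetical check: accept exactly one code point, ASCII or not."""
    return len(candidate) == 1


def describe(candidate: str) -> str:
    """Report validity, ASCII-ness and (for single code points) the Unicode name."""
    if not valid_mnemonic(candidate):
        return f"{candidate!r}: rejected ({len(candidate)} code points)"
    kind = "ASCII" if candidate.isascii() else "non-ASCII"
    name = unicodedata.name(candidate, "<unnamed>")
    return f"{candidate!r}: ok, {kind}, U+{ord(candidate):04X} {name}"


if __name__ == "__main__":
    # "u7" is the two-character indicator suggested earlier in the thread;
    # "U\u030c" is U plus COMBINING CARON, which renders like a single letter.
    for c in ["U", "u7", "\u01d3", "U\u030c"]:
        print(describe(c))
```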
* Re: Disambiguate modeline character for UTF-8?
From: Paul Eggert @ 2020-08-23 18:24 UTC
To: Eli Zaretskii, Stefan Monnier; +Cc: ulm, emacs-devel

On 8/23/20 9:07 AM, Eli Zaretskii wrote:
> it doesn't have to be an ASCII
> character.

OK, then how about this refinement of Ulrich's suggestion?

U -- utf-8* (all variants)
W -- utf-16* (all variants)
Ǔ -- utf-7         U+01D3 LATIN CAPITAL LETTER U WITH CARON
ǔ -- utf-7-imap    U+01D4 LATIN SMALL LETTER U WITH CARON
У -- koi8-u        U+0423 CYRILLIC CAPITAL LETTER U

W because it's double-U, and a caron because it looks like a 7 rotated.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2020-08-23 18:53 UTC
To: Paul Eggert; +Cc: Eli Zaretskii, Stefan Monnier, emacs-devel

>>>>> On Sun, 23 Aug 2020, Paul Eggert wrote:

> On 8/23/20 9:07 AM, Eli Zaretskii wrote:
>> it doesn't have to be an ASCII character.

I really didn't want to open that can of worms. Now we can have endless
bikeshedding. :)

Also, shouldn't one be extra conservative for the characters displayed
in the modeline, as not all systems may be capable of displaying the
full Unicode repertoire?

> OK, then how about this refinement of Ulrich's suggestion?

> U -- utf-8* (all variants)
> W -- utf-16* (all variants)
> Ǔ -- utf-7         U+01D3 LATIN CAPITAL LETTER U WITH CARON
> ǔ -- utf-7-imap    U+01D4 LATIN SMALL LETTER U WITH CARON
> У -- koi8-u        U+0423 CYRILLIC CAPITAL LETTER U

> W because it's double-U, and a caron because it looks like a 7 rotated.

W is already used for iso-latin-8 aka iso-8859-14.
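Ulrich's objection amounts to checking each proposed mnemonic against those already taken by coding systems the proposal leaves alone. A minimal Python sketch of that check follows; the `EXISTING` table is a hypothetical subset holding only assignments mentioned in this thread (the real Emacs table is much larger), and `conflicts` is a made-up helper name.

```python
# Hypothetical subset of mnemonics already taken by coding systems that
# the proposal does not touch (values as mentioned in this thread).
EXISTING = {
    "us-ascii": "-",
    "iso-latin-1": "1",
    "iso-latin-8": "W",   # aka iso-8859-14
}

# Paul Eggert's proposed refinement.
PROPOSAL = {
    "utf-8": "U",
    "utf-16": "W",
    "utf-7": "\u01d3",       # LATIN CAPITAL LETTER U WITH CARON
    "utf-7-imap": "\u01d4",  # LATIN SMALL LETTER U WITH CARON
    "koi8-u": "\u0423",      # CYRILLIC CAPITAL LETTER U
}


def conflicts(proposal, existing):
    """Return (new, old, char) triples where a proposed mnemonic is already taken."""
    # Invert EXISTING; if two existing systems shared a character, only the
    # last one would be reported, which is fine for this small subset.
    taken = {char: coding for coding, char in existing.items()}
    return [
        (coding, taken[char], char)
        for coding, char in proposal.items()
        if char in taken
    ]


if __name__ == "__main__":
    for new, old, char in conflicts(PROPOSAL, EXISTING):
        print(f"{new} -> {char!r} clashes with {old}")
```

With these tables the only clash reported is `utf-16 -> 'W' clashes with iso-latin-8`, which is exactly the point raised in the reply.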
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2020-08-23 18:56 UTC
To: Ulrich Mueller; +Cc: eggert, monnier, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, Stefan Monnier
>  <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> Date: Sun, 23 Aug 2020 20:53:36 +0200
>
> >>>>> On Sun, 23 Aug 2020, Paul Eggert wrote:
>
> > On 8/23/20 9:07 AM, Eli Zaretskii wrote:
> >> it doesn't have to be an ASCII character.
>
> I really didn't want to open that can of worms. Now we can have endless
> bikeshedding. :)

It was you who started it.
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2020-08-23 18:57 UTC
To: Ulrich Mueller; +Cc: eggert, monnier, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Date: Sun, 23 Aug 2020 20:53:36 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>,
>  emacs-devel@gnu.org
>
> Also, shouldn't one be extra conservative for the characters displayed
> in the modeline, as not all systems may be capable of displaying the
> full unicode repertoire?

I just said we could do it, I didn't say we should. From my POV, we
could simply let these sleeping dogs lie; I very much doubt that many
users even look at these indicators.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2020-08-23 19:13 UTC
To: Eli Zaretskii; +Cc: eggert, monnier, emacs-devel

>>>>> On Sun, 23 Aug 2020, Eli Zaretskii wrote:

>> Also, shouldn't one be extra conservative for the characters
>> displayed in the modeline, as not all systems may be capable of
>> displaying the full unicode repertoire?

> I just said we could do it, I didn't say we should. From my POV, we
> could simply let these sleeping dogs lie, I very much doubt that many
> users even look at these indicators.

I stumbled upon this when updating a short section about Emacs in the
Gentoo developer manual, where I realised that I cannot say that "-"
and "U" in the modeline indicate ASCII and UTF-8, respectively.

IMHO these two are the most important ones nowadays, so they should be
unique. I don't really care about the rest (maybe "1" is still somewhat
important here in Europe), and tried to change as little as possible in
my suggestion. Namely, only move the ones colliding with "U" out of the
way and otherwise stay with ASCII.
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2020-08-23 19:42 UTC
To: Ulrich Mueller; +Cc: eggert, monnier, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: eggert@cs.ucla.edu, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sun, 23 Aug 2020 21:13:25 +0200
>
> IMHO these two are the most important ones nowadays, so they should be
> unique. I don't really care about the rest (maybe "1" is still somewhat
> important here in Europe), and tried to change as little as possible in
> my suggestion. Namely, only move the ones colliding with "U" out of the
> way and otherwise stay with ASCII.

I'm asking whether this whole issue is important enough to trigger yet
another round of endless arguments and gratuitous changes for very
little gain.
* Re: Disambiguate modeline character for UTF-8?
From: Stefan Monnier @ 2020-08-23 21:23 UTC
To: Eli Zaretskii; +Cc: Ulrich Mueller, eggert, emacs-devel

> I'm asking whether this whole issue is important enough to trigger yet
> another round of endless arguments and gratuitous changes for very
> little gain.

I would appreciate it if utf-16 and utf-7 (those weird things from which
I'd rather stay away) were made somehow different from utf-8.

        Stefan
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2020-08-24 7:06 UTC
To: Stefan Monnier; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Sun, 23 Aug 2020, Stefan Monnier wrote:

>> I'm asking whether this whole issue is important enough to trigger yet
>> another round of endless arguments and gratuitous changes for very
>> little gain.

> I would appreciate it if utf-16 and utf-7 (those werd things from which
> I'd rather stay away) is made somehow different from utf-8.

The smallest change to achieve this would be to change both utf-16 and
utf-7 from "U" to "u" (and koi8-u to "Y").
* Re: Disambiguate modeline character for UTF-8?
From: Yuri Khan @ 2020-08-24 14:30 UTC
To: Ulrich Mueller
Cc: Eli Zaretskii, Paul Eggert, Stefan Monnier, Emacs developers

On Mon, 24 Aug 2020 at 14:07, Ulrich Mueller <ulm@gentoo.org> wrote:

> The smallest change to achieve this would be to change both utf-16 and
> utf-7 from "U" to "u" (and koi8-u to "Y").

The letter Y has nothing in common with Ukraine (country) and
Ukrainian (language). Pretty much everyone who might be looking at a
file encoded in koi8-u will have fonts with Cyrillic coverage, so using
the Cyrillic letter У should be better.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2020-08-29 11:17 UTC
To: Yuri Khan; +Cc: Eli Zaretskii, Paul Eggert, Stefan Monnier, Emacs developers

>>>>> On Mon, 24 Aug 2020, Yuri Khan wrote:

>> The smallest change to achieve this would be to change both utf-16 and
>> utf-7 from "U" to "u" (and koi8-u to "Y").

> The letter Y has nothing in common with Ukraine (country) and
> Ukrainian (language). Pretty much everyone who might be looking at a
> file encoded in koi8-u will have fonts with Cyrillic coverage so using
> the Cyrillic letter У should be better.

I asked a fellow Gentoo developer from Ukraine, and he confirms that
using Y for koi8-u would be "strange". He also suggested using K for
all koi8*, even if it would collide with Korean. It should be obvious
from the buffer's content if it's (e.g.) Russian or Korean, so there
isn't any real ambiguity.

Interestingly, the following comment (by RMS) in cyrillic.el goes in
the same direction:

;; We used to use ?K. It is true that ?K is more strictly correct,
;; but it is also used for Korean. So people who use koi8 for
;; languages other than Russian will have to forgive us.
* RE: Disambiguate modeline character for UTF-8?
From: Drew Adams @ 2020-08-24 14:36 UTC
To: Ulrich Mueller, Stefan Monnier; +Cc: Eli Zaretskii, eggert, emacs-devel

> The smallest change to achieve this would be to change both utf-16 and
> utf-7 from "U" to "u" (and koi8-u to "Y").

Not really wanting to get into this particular bike-shed
discussion, as I don't care about it and don't have a
suggestion of what indicators to use.

I'll just say this, as some have suggested that
one main thing they want is to be able to easily
and quickly tell whether the encoding is NOT
utf-8 (and not ASCII, presumably):

The characters "u" and "U" are not so easily
distinguished. You might want to pick some
other, quite different looking, character for
the non-UTF-8 (i.e., UTF-16 etc.).
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2020-08-24 15:23 UTC
To: Drew Adams; +Cc: Eli Zaretskii, eggert, Stefan Monnier, emacs-devel

>>>>> On Mon, 24 Aug 2020, Drew Adams wrote:

> I'll just say this, as some have suggested that
> one main thing they want is to be able to easily
> and quickly tell whether the encoding is NOT
> utf-8 (and not ASCII, presumably):

> The characters "u" and "U" are not so easily
> distinguished. You might want to pick some
> other, quite different looking, character for
> the non-UTF-8 (i.e., UTF-16 etc.).

Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
This would be visually unobtrusive, so any uncommon coding system would
stand out against it.
* Re: Disambiguate modeline character for UTF-8?
From: Stefan Monnier @ 2020-08-24 16:43 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, Drew Adams, emacs-devel

> Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
> This would be visually unobtrusive, so any uncommon coding system would
> stand out against it.

U1 from me!

        Stefan ;-)
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-05 10:08 UTC
To: emacs-devel; +Cc: Drew Adams, Eli Zaretskii, eggert, Stefan Monnier

>>>>> On Mon, 24 Aug 2020, Ulrich Mueller wrote:

>>>>> On Mon, 24 Aug 2020, Drew Adams wrote:
>> I'll just say this, as some have suggested that
>> one main thing they want is to be able to easily
>> and quickly tell whether the encoding is NOT
>> utf-8 (and not ASCII, presumably):

>> The characters "u" and "U" are not so easily
>> distinguished. You might want to pick some
>> other, quite different looking, character for
>> the non-UTF-8 (i.e., UTF-16 etc.).

> Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
> This would be visually unobtrusive, so any uncommon coding system would
> stand out against it.

Coming back to this thread (which at the time ended in bikeshedding).
The goal I had in mind was to disambiguate UTF-8, i.e. a unique modeline
character would be used for it. Currently this is not the case:

U -- utf-8* (all variants)
U -- utf-16* (all variants)
U -- utf-7
U -- koi8-u

So, I propose to change this to either:

+ -- utf-8* (all variants)
(everything else unchanged)

or:

U -- utf-8* (all variants)
u -- utf-16* (all variants)
u -- utf-7
K -- koi8-u

Note that "K" is also used for Korean. I think that's not a real
conflict, because normally it would be clear from context whether the
buffer's content is Korean or Ukrainian.
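The two alternatives can be compared mechanically by asking, for each table, which other coding systems share UTF-8's mnemonic. A minimal Python sketch, assuming a hypothetical model built only from mnemonics mentioned in this thread; `korean` is a stand-in label for whichever Korean coding system uses "K", and `sharers` is a made-up helper. "Unique" here means unique within this partial model only.

```python
# Hypothetical model built only from mnemonics mentioned in this thread;
# "korean" is a stand-in label for whichever Korean coding system uses K.
CURRENT = {
    "us-ascii": "-",
    "utf-8": "U",
    "utf-16": "U",
    "utf-7": "U",
    "utf-7-imap": "u",
    "koi8-u": "U",
    "korean": "K",
}

# Option 1: "+" for UTF-8, everything else unchanged.
OPTION_1 = {**CURRENT, "utf-8": "+"}

# Option 2: keep "U" for UTF-8 and move the others out of the way.
OPTION_2 = {**CURRENT, "utf-16": "u", "utf-7": "u", "koi8-u": "K"}


def sharers(table, coding):
    """Return the other coding systems that use the same mnemonic as CODING."""
    char = table[coding]
    return sorted(c for c, m in table.items() if m == char and c != coding)


if __name__ == "__main__":
    for label, table in [("current", CURRENT), ("option 1", OPTION_1), ("option 2", OPTION_2)]:
        print(f"{label}: utf-8 shares {table['utf-8']!r} with {sharers(table, 'utf-8')}")
    # Option 2 moves koi8-u onto the Korean "K", as noted in the message.
    print("option 2: koi8-u shares 'K' with", sharers(OPTION_2, "koi8-u"))
```

Both options leave UTF-8 with no sharers in this model; option 1 achieves it without touching any other entry, while option 2 trades the "U" collision for the koi8-u/Korean overlap that the message argues is harmless in practice.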
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-05 11:41 UTC
To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Drew Adams <drew.adams@oracle.com>, Eli Zaretskii <eliz@gnu.org>,
>  eggert@cs.ucla.edu, Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 05 Jul 2023 12:08:59 +0200
>
> >>>>> On Mon, 24 Aug 2020, Ulrich Mueller wrote:
>
> >>>>> On Mon, 24 Aug 2020, Drew Adams wrote:
> >> I'll just say this, as some have suggested that
> >> one main thing they want is to be able to easily
> >> and quickly tell whether the encoding is NOT
> >> utf-8 (and not ASCII, presumably):
>
> >> The characters "u" and "U" are not so easily
> >> distinguished. You might want to pick some
> >> other, quite different looking, character for
> >> the non-UTF-8 (i.e., UTF-16 etc.).
>
> > Another idea: Since "-" is used for ASCII, maybe use "+" for UTF-8?
> > This would be visually unobtrusive, so any uncommon coding system would
> > stand out against it.
>
> Coming back to this thread (which at the time ended in bikeshedding).
> The goal I had in mind was to disambiguate UTF-8, i.e. a unique modeline
> character would be used for it. Currently this is not the case:
>
> U -- utf-8* (all variants)
> U -- utf-16* (all variants)
> U -- utf-7
> U -- koi8-u
>
> So, I propose to change this to either:
>
> + -- utf-8* (all variants)
> (everything else unchanged)
>
> or:
>
> U -- utf-8* (all variants)
> u -- utf-16* (all variants)
> u -- utf-7
> K -- koi8-u

TBH, I don't like to change such long-time features.

The only real problem is between UTF-8 and UTF-16, since the others
are hardly ever used these days. UTF-16 is also quite rarely used,
basically only on MS-Windows for system-level files. So is this
really a problem that we need to solve, at the risk of breaking
people's "muscle" memory? If I see the lower-case "u" on the
modeline when I expect to see "U" instead, I'd be surprised. Is it
worth it?
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-05 13:04 UTC
To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel, drew.adams, eggert, monnier

>>>>> On Wed, 05 Jul 2023, Eli Zaretskii wrote:

>> Coming back to this thread (which at the time ended in bikeshedding).
>> The goal I had in mind was to disambiguate UTF-8, i.e. a unique modeline
>> character would be used for it. Currently this is not the case:
>>
>> U -- utf-8* (all variants)
>> U -- utf-16* (all variants)
>> U -- utf-7
>> U -- koi8-u
>>
>> So, I propose to change this to either:
>>
>> + -- utf-8* (all variants)
>> (everything else unchanged)
>>
>> or:
>>
>> U -- utf-8* (all variants)
>> u -- utf-16* (all variants)
>> u -- utf-7
>> K -- koi8-u

> TBH, I don't like to change such long-time features.

> The only real problem is between UTF-8 and UTF-16, since the others
> are hardly ever used these days. UTF-16 is also quite rarely used,
> basically only on MS-Windows for system-level files. So is this
> really a problem that we need to solve, at the risk of breaking
> people's "muscle" memory? If I see the lower-case "u" on the
> modeline when I expect to see "U" instead, I'd be surprised. Is it
> worth it?

UTF-8 is one of the most common encodings, and it is strange that it
shares its modeline indicator with anything else. And the "U" is really
ambiguous, because context won't help (or how would you decide if a
buffer's file encoding is e.g. koi8-u or utf-8?).

As you say, the others in the above list are rarely used nowadays. So,
maybe users should see the "u" or the "K" to indicate that the file has
an unusual encoding?
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-05 13:44 UTC
To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Ulrich Mueller <ulm@gentoo.org>, emacs-devel@gnu.org,
>  drew.adams@oracle.com, eggert@cs.ucla.edu, monnier@iro.umontreal.ca
> Date: Wed, 05 Jul 2023 15:04:08 +0200
>
> >>>>> On Wed, 05 Jul 2023, Eli Zaretskii wrote:
>
> > The only real problem is between UTF-8 and UTF-16, since the others
> > are hardly ever used these days. UTF-16 is also quite rarely used,
> > basically only on MS-Windows for system-level files. So is this
> > really a problem that we need to solve, at the risk of breaking
> > people's "muscle" memory? If I see the lower-case "u" on the
> > modeline when I expect to see "U" instead, I'd be surprised. Is it
> > worth it?
>
> UTF-8 is one of the most common encodings, and it is strange that it
> shares its modeline indicator with anything else. And the "U" is really
> ambiguous, because context won't help (or how would you decide if a
> buffer's file encoding is e.g. koi8-u or utf-8?).

Is the problem that koi8-u also uses 'U'? That is, if we change
koi8-u to some other character, will that be good enough?

The other encodings are all from the UTF family, so using 'U' for them
all does make sense. (The lower-case 'u' for utf-7 is IMO simply a
mistake, and can be fixed with a low risk, I think, since this encoding
is rare.)
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-05 21:50 UTC
To: Eli Zaretskii; +Cc: emacs-devel, drew.adams, eggert, monnier

>>>>> On Wed, 05 Jul 2023, Eli Zaretskii wrote:

>> UTF-8 is one of the most common encodings, and it is strange that it
>> shares its modeline indicator with anything else. And the "U" is really
>> ambiguous, because context won't help (or how would you decide if a
>> buffer's file encoding is e.g. koi8-u or utf-8?).

> Is the problem that koi8-u also uses 'U'? That is, if we change
> koi8-u to some other character, will that be good enough?

It would help, but it would solve only part of the problem. (I had
suggested "K" for koi8 before.)

> The other encodings are all from the UTF family, so using 'U' for them
> all does make sense.

IMHO it doesn't make sense at all. UTF-8, UTF-16 and UTF-7 are
completely different encodings which have nothing in common except
their name.

All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
justified for the most common encoding (or maybe it's second after
ASCII)?

> (The lower-case 'u' for utf-7 is IMO simply a mistake, and can be
> fixed with a low risk, I think, since this encoding is rare.)
* Re: Disambiguate modeline character for UTF-8?
From: Paul Eggert @ 2023-07-05 22:11 UTC
To: Ulrich Mueller, Eli Zaretskii; +Cc: emacs-devel, drew.adams, monnier

On 2023-07-05 14:50, Ulrich Mueller wrote:
> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> justified for the most common encoding (or maybe it's second after
> ASCII)?

Is the idea to use 'u' for UTF-8, and 'U' for the other Unicode-related
encodings? That sounds good to me, since 'u' should be common and 'U'
rare.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-06 8:51 UTC
To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel, drew.adams, monnier

>>>>> On Thu, 06 Jul 2023, Paul Eggert wrote:

> Is the idea to use 'u' for UTF-8, and 'U' for the other
> Unicode-related encodings? That sounds good to me, since 'u' should be
> common and 'U' rare.

My suggestion was to keep "U" for UTF-8, and change the others to "u"
(so in the most common case there would be no change). But I'd also be
fine with the other way around.
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-06 5:33 UTC
To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org, drew.adams@oracle.com, eggert@cs.ucla.edu,
>  monnier@iro.umontreal.ca
> Date: Wed, 05 Jul 2023 23:50:53 +0200
>
> > The other encodings are all from the UTF family, so using 'U' for them
> > all does make sense.
>
> IMHO it doesn't make sense at all. UTF-8, UTF-16 and UTF-7 are
> completely different encodings which have nothing in common except
> their name.

They do have something important in common: they are all Unicode
encodings, thus "U".

> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> justified for the most common encoding (or maybe it's second after
> ASCII)?

Sorry, I'm not interested in such radical changes. UTF-8 is an
important encoding, but that doesn't justify what you propose. In the
absolute majority of cases, "U" already means UTF-8 and nothing else,
so the issue is marginal at best, and making such significant
incompatible changes in user-facing displays is unjustified from my
POV.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-06 8:47 UTC
To: Eli Zaretskii; +Cc: emacs-devel, drew.adams, eggert, monnier

>>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:

>> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
>> justified for the most common encoding (or maybe it's second after
>> ASCII)?

> Sorry, I'm not interested in such radical changes. UTF-8 is an
> important encoding, but that doesn't justify what you propose. In the
> absolute majority of cases, "U" already means UTF-8 and nothing else,
> so the issue is marginal at best, and making such significant
> incompatible changes in user-facing displays is unjustified from my
> POV.

Sorry, but in what world does this qualify as a "radical change"?

I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
and koi8 to some other character like "K".
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-06 9:20 UTC
To: Ulrich Mueller; +Cc: emacs-devel, drew.adams, eggert, monnier

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org, drew.adams@oracle.com, eggert@cs.ucla.edu,
>  monnier@iro.umontreal.ca
> Date: Thu, 06 Jul 2023 10:47:53 +0200
>
> >>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:
>
> >> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> >> justified for the most common encoding (or maybe it's second after
> >> ASCII)?
>
> > Sorry, I'm not interested in such radical changes. UTF-8 is an
> > important encoding, but that doesn't justify what you propose. In the
> > absolute majority of cases, "U" already means UTF-8 and nothing else,
> > so the issue is marginal at best, and making such significant
> > incompatible changes in user-facing displays is unjustified from my
> > POV.
>
> Sorry, but in what world does this qualify as a "radical change"?

In this one. People have been staring at "U" (and "UUU" on TTY
frames) since Emacs 23.1 was released. If someone cares about those
characters so much so that they want them changed, please think about
others who care about them and would be surprised and probably worried
by suddenly seeing a different character. (If the assumption is that
people don't care about these indications, then this whole discussion
is moot to begin with.)

> I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
> and koi8 to some other character like "K".

Yes, I understand the proposal.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-06 9:46 UTC
To: Eli Zaretskii; +Cc: emacs-devel, drew.adams, eggert, monnier

>>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:

>> > Sorry, I'm not interested in such radical changes. UTF-8 is an
>> > important encoding, but that doesn't justify what you propose. In the
>> > absolute majority of cases, "U" already means UTF-8 and nothing else,
>> > so the issue is marginal at best, and making such significant
>> > incompatible changes in user-facing displays is unjustified from my
>> > POV.
>>
>> Sorry, but in what world does this qualify as a "radical change"?

> In this one. People have been staring at "U" (and "UUU" on TTY
> frames) since Emacs 23.1 was released. If someone cares about those
> characters so much so that they want them changed, please think about
> others who care about them and would be surprised and probably worried
> by suddenly seeing a different character. (If the assumption is that
> people don't care about these indications, then this whole discussion
> is moot to begin with.)

Well, in the absolute majority of cases (UTF-8) the "U" would stay.

I'd rather expect users to be surprised when they see the "U" but then
find out that the encoding is something entirely different.

>> I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
>> and koi8 to some other character like "K".

> Yes, I understand the proposal.

How about the following then?

- Keep "U" for both UTF-8 and UTF-16.
- Change UTF-7 to "u" (which is already used for one of its variants).
- Change koi8 to "K".
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-06 12:34 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> Well, in the absolute majority of cases (UTF-8) the "U" would stay.
>
> I'd rather expect users to be surprised when they see the "U" but then
> find out that the encoding is something entirely different.

When I see `U', my only expectation is for the coding system being used
to represent a Unicode character set. The tooltip text displays
exactly which coding system that is.
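The mode-line character and the full coding-system name (roughly what the tooltip spells out) can be fetched side by side. A minimal sketch, assuming `buffer-file-coding-system' is the coding system the buffer's indicator reflects; `my-describe-file-coding-system' is a hypothetical helper. Interactively, C-h C (`describe-coding-system') gives the full details.

```elisp
(defun my-describe-file-coding-system (&optional buffer)
  "Return \"MNEMONIC  NAME\" for BUFFER's file coding system.
BUFFER defaults to the current buffer.  Hypothetical helper."
  (with-current-buffer (or buffer (current-buffer))
    (let* ((cs buffer-file-coding-system)
           (mnemonic (and cs (coding-system-get cs :mnemonic))))
      ;; The mnemonic is normally a character; print it as such.
      (format "%s  %s"
              (if (characterp mnemonic) (char-to-string mnemonic) mnemonic)
              cs))))
```

EOL variants such as `utf-8-unix' share their base coding system's mnemonic, so the first field alone cannot tell them apart; the name can.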
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-06 12:32 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> Sorry, but in what world does this qualify as a "radical change"?

A change to long-standing behavior is always radical.

> I propose changing UTF-16 and UTF-7 from uppercase "U" to lowercase "u",
> and koi8 to some other character like "K".

I'm fine with changing koi8-u. But changing the characters that
represent Unicode encodings is unreasonable.
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-06 12:31 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

Ulrich Mueller <ulm@gentoo.org> writes:

> IMHO it doesn't make sense at all. UTF-8, UTF-16 and UTF-7 are
> completely different encodings which have nothing in common except
> their name.

I disagree. UTF-7, UTF-8 and UTF-16 both encode the same coded
character set (or at least the BMP of the same character set.) That's a
far cry from there being ``nothing in common''.

> All I'm asking for is a unique indicator for UTF-8. Wouldn't this be
> justified for the most common encoding (or maybe it's second after
> ASCII)?

Why would UTF-8 warrant a unique indicator on the basis of popularity
alone? The indicator is supposed to describe the coded character set:
people who also need to know the coding system (which is not needed as
often) can also read its tooltip text.
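Both sides of this disagreement can be seen by encoding the same text with each coding system: the byte sequences differ, yet each decodes back to the same characters. A minimal sketch using the standard `encode-coding-string' and `decode-coding-string'; `my-compare-unicode-encodings' is a hypothetical helper and the sample text is arbitrary.

```elisp
(defun my-compare-unicode-encodings (text)
  "Encode TEXT with several Unicode coding systems and report the results.
Return a list of (CODING-SYSTEM BYTE-COUNT ROUND-TRIPS-P).
Hypothetical helper, for illustration only."
  (mapcar (lambda (cs)
            (let ((bytes (encode-coding-string text cs)))
              (list cs
                    (length bytes)      ; unibyte result: length = bytes
                    ;; Decoding the bytes again should give back TEXT.
                    (string= text (decode-coding-string bytes cs)))))
          '(utf-8 utf-16 utf-7)))

;; Example: (my-compare-unicode-encodings "Emacs \u0423")
```

For "Emacs \u0423" the UTF-8 form is 8 bytes (six ASCII bytes plus two for U+0423); the UTF-16 and UTF-7 forms have different lengths and contents, but all three should round-trip.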
* Re: Disambiguate modeline character for UTF-8?
From: Andreas Schwab @ 2023-07-06 13:02 UTC
To: Po Lu
Cc: Ulrich Mueller, Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

On Jul 06 2023, Po Lu wrote:

> I disagree. UTF-7, UTF-8 and UTF-16 both encode the same coded
> character set (or at least the BMP of the same character set.)

As does GB18030.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
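Andreas's point can be checked the same way: GB18030 round-trips the same text as UTF-8 while producing different bytes. A sketch, assuming Emacs's `gb18030' coding-system alias; `my-round-trips-p' is a hypothetical helper.

```elisp
(defun my-round-trips-p (text coding-system)
  "Return t if TEXT survives encoding and decoding with CODING-SYSTEM.
Hypothetical helper, for illustration only."
  (string= text
           (decode-coding-string (encode-coding-string text coding-system)
                                 coding-system)))

;; Same characters, different bytes: U+4E2D is #xD6 #xD0 in GB18030
;; but #xE4 #xB8 #xAD in UTF-8.
(list (my-round-trips-p "\u0423\u4E2D" 'gb18030)
      (my-round-trips-p "\u0423\u4E2D" 'utf-8)
      (encode-coding-string "\u4E2D" 'gb18030)
      (encode-coding-string "\u4E2D" 'utf-8))
```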
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-06 13:08 UTC
To: Po Lu; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier

>>>>> On Thu, 06 Jul 2023, Po Lu wrote:

> I disagree. UTF-7, UTF-8 and UTF-16 both encode the same coded
> character set (or at least the BMP of the same character set.) That's a
> far cry from there being ``nothing in common''.

This argument applies only to UTF-8 and UTF-16. OTOH, UTF-7 isn't part
of the Unicode standard. Also, it cannot encode all of Unicode but only
the first 65536 code points [1].

>> All I'm asking for is a unique indicator for UTF-8. Wouldn't this
>> be justified for the most common encoding (or maybe it's second
>> after ASCII)?

> Why would UTF-8 warrant a unique indicator on the basis of popularity
> alone? The indicator is supposed to describe the coded character set:
> people who also need to know the coding system (which is not needed as
> often) can also read its tooltip text.

Right, and for both UTF-7 and koi8-u the coded character set is not
Unicode but only a subset of it.

[1] https://www.rfc-editor.org/rfc/rfc2152.txt
* Re: Disambiguate modeline character for UTF-8?
From: Paul Eggert @ 2023-07-06 17:37 UTC
To: Ulrich Mueller; +Cc: emacs-devel

On 2023-07-06 06:08, Ulrich Mueller wrote:
> for both UTF-7 and koi8-u the coded character set is not
> Unicode but only a subset of it

It would be helpful to use 'u' when only a subset of Unicode can be
represented, as a clue that something odd is going on, compared to the
more-usual 'U'.

Also, Andreas made a good point: since GB18030 encodes Unicode,
shouldn't it be displayed as 'U' too? Why treat it specially?
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-06 18:13 UTC
To: Paul Eggert; +Cc: ulm, emacs-devel

> Date: Thu, 6 Jul 2023 10:37:53 -0700
> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
>
> Also, Andreas made a good point: since GB18030 encodes Unicode,
> shouldn't it be displayed as 'U' too? Why treat it specially?

I don't think this is what Andreas had in mind, but in any case, I
don't think we should make such a change without the GB18030 users
actually asking for that.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Müller @ 2023-07-06 18:44 UTC
To: Paul Eggert; +Cc: emacs-devel

>>>>> On Thu, 06 Jul 2023, Paul Eggert wrote:

> On 2023-07-06 06:08, Ulrich Mueller wrote:
>> for both UTF-7 and koi8-u the coded character set is not
>> Unicode but only a subset of it

> It would be helpful to use 'u' when only a subset of Unicode can be
> represented, as a clue that something odd is going on, compared to the
> more-usual 'U'.

How about the following patch then?

From b33df88e456092e89bad52565b68a77ea3d0c71a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ulrich=20M=C3=BCller?= <ulm@gentoo.org>
Date: Thu, 6 Jul 2023 20:36:09 +0200
Subject: [PATCH] Disambiguate mode line indication for utf-8 and utf-16

* lisp/international/mule-conf.el (utf-7):
* lisp/language/cyrillic.el (koi8-u): Change mnemonic letters to ?u
and ?K, respectively.
---
 lisp/international/mule-conf.el | 2 +-
 lisp/language/cyrillic.el       | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index a27aaf9e522..f65f124b633 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1600,7 +1600,7 @@ 'ascii
 (define-coding-system 'utf-7
   "UTF-7 encoding of Unicode (RFC 2152)."
   :coding-type 'utf-8
-  :mnemonic ?U
+  :mnemonic ?u
   :mime-charset 'utf-7
   :charset-list '(unicode)
   :pre-write-conversion 'utf-7-pre-write-conversion
diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
index 7af87e65703..1ad1302095b 100644
--- a/lisp/language/cyrillic.el
+++ b/lisp/language/cyrillic.el
@@ -126,7 +126,10 @@ 'cp878
 (define-coding-system 'koi8-u
   "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
   :coding-type 'charset
-  :mnemonic ?U
+  ;; This used to be ?U which collided with UTF-8.  ?K is also used
+  ;; for Korean, but it shouldn't be a real conflict since Cyrillic
+  ;; and Hangul can be disambiguated from context.
+  :mnemonic ?K
   :charset-list '(koi8-u)
   :mime-charset 'koi8-u)
 
-- 
2.41.0
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-06 19:01 UTC
To: Ulrich Müller; +Cc: eggert, emacs-devel

> From: Ulrich Müller <ulm@gentoo.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 06 Jul 2023 20:44:05 +0200
>
> --- a/lisp/language/cyrillic.el
> +++ b/lisp/language/cyrillic.el
> @@ -126,7 +126,10 @@ 'cp878
>  (define-coding-system 'koi8-u
>    "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
>    :coding-type 'charset
> -  :mnemonic ?U
> +  ;; This used to be ?U which collided with UTF-8.  ?K is also used
> +  ;; for Korean, but it shouldn't be a real conflict since Cyrillic
> +  ;; and Hangul can be disambiguated from context.
> +  :mnemonic ?K

K is not a good idea, for 2 reasons:

  . the KOI8 family includes 3 encodings, not 1
  . U in koi8-u stands for "Ukraine", so replacing it with K will
    probably be frowned upon

How about using У instead? (Assuming using non-ASCII works there; the
code seems to allow that.)
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-06 19:31 UTC
To: Eli Zaretskii; +Cc: eggert, emacs-devel

>>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:

>> From: Ulrich Müller <ulm@gentoo.org>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 06 Jul 2023 20:44:05 +0200
>>
>> --- a/lisp/language/cyrillic.el
>> +++ b/lisp/language/cyrillic.el
>> @@ -126,7 +126,10 @@ 'cp878
>>  (define-coding-system 'koi8-u
>>    "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
>>    :coding-type 'charset
>> -  :mnemonic ?U
>> +  ;; This used to be ?U which collided with UTF-8.  ?K is also used
>> +  ;; for Korean, but it shouldn't be a real conflict since Cyrillic
>> +  ;; and Hangul can be disambiguated from context.
>> +  :mnemonic ?K

> K is not a good idea, for 2 reasons:

>   . the KOI8 family includes 3 encodings, not 1
>   . U in koi8-u stands for "Ukraine", so replacing it with K will
>     probably be frowned upon

The K (for all KOI8 variants) was actually suggested by a person from
Ukraine, back in 2020:
https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg01010.html

> How about using У instead? (Assuming using non-ASCII works there; the
> code seems to allow that.)

I've just tested a patch with current master, and for me the У works
both in an X frame ("У" in the mode line), in a text terminal under X
("UUУ") and in the Linux console ("UUУ").

Can we assume that users have the necessary fonts installed?
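The three characters seen on text terminals ("UUУ") are the mnemonics of the keyboard, terminal, and buffer file coding systems, in that order; the X frame showed only the last one. A sketch that collects the same three values, assuming the built-in `keyboard-coding-system' and `terminal-coding-system' functions; the helper name is hypothetical, and ?- is used as a placeholder when a coding system is unset.

```elisp
(defun my-mode-line-coding-mnemonics ()
  "Return mnemonics of the keyboard, terminal, and buffer coding systems.
This mirrors the order used on text terminals (for example \"UUU\").
Hypothetical helper, for illustration only."
  (mapcar (lambda (cs)
            ;; An unset coding system has no mnemonic; use ?- instead.
            (if cs (coding-system-get cs :mnemonic) ?-))
          (list (keyboard-coding-system)
                (terminal-coding-system)
                buffer-file-coding-system)))

;; Example: (concat (my-mode-line-coding-mnemonics)) in a koi8-u buffer
;; on a UTF-8 console would then be expected to give "UUУ".
```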
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-07 5:18 UTC
To: Ulrich Mueller; +Cc: eggert, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
> Date: Thu, 06 Jul 2023 21:31:49 +0200
>
> >>>>> On Thu, 06 Jul 2023, Eli Zaretskii wrote:
>
> > K is not a good idea, for 2 reasons:
>
> > . the KOI8 family includes 3 encodings, not 1
> > . U in koi8-u stands for "Ukraine", so replacing it with K will
> >   probably be frowned upon
>
> The K (for all KOI8 variants) was actually suggested by a person from
> Ukraine, back in 2020:
> https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg01010.html

That's one person...

> > How about using У instead? (Assuming using non-ASCII works there; the
> > code seems to allow that.)
>
> I've just tested a patch with current master, and for me the У works
> both in an X frame ("У" in the mode line), in a text terminal under X
> ("UUУ") and in the Linux console ("UUУ").
>
> Can we assume that users have the necessary fonts installed?

In Ukraine? most probably.

Did that character on a GUI frame need a non-default font, or was it
supported by the default font. I'd expect the Cyrillic script to be
supported by the fonts people use as the default in Emacs.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Müller @ 2023-07-07 5:48 UTC
To: Eli Zaretskii; +Cc: eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> The K (for all KOI8 variants) was actually suggested by a person from
>> Ukraine, back in 2020:
>> https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg01010.html

> That's one person...

Yes.

>> > How about using У instead? (Assuming using non-ASCII works there; the
>> > code seems to allow that.)
>>
>> I've just tested a patch with current master, and for me the У works
>> both in an X frame ("У" in the mode line), in a text terminal under X
>> ("UUУ") and in the Linux console ("UUУ").
>>
>> Can we assume that users have the necessary fonts installed?

> In Ukraine? most probably.

> Did that character on a GUI frame need a non-default font, or was it
> supported by the default font. I'd expect the Cyrillic script to be
> supported by the fonts people use as the default in Emacs.

It's supported by the default font for me (which is Droid Sans Mono).
У aka U+0423 is contained in WGL4, and I guess that most fonts (also on
GNU/Linux) would cover a superset of it.

Updated patch below.

From fbcc65ebde142f998e9dae8ad711f484585ef29b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ulrich=20M=C3=BCller?= <ulm@gentoo.org>
Date: Thu, 6 Jul 2023 20:36:09 +0200
Subject: [PATCH] Disambiguate mode line indication for utf-8 and utf-16

* lisp/international/mule-conf.el (utf-7):
* lisp/language/cyrillic.el (koi8-u): Change mnemonic letters to ?u
and ?\N{cyrillic capital letter u}, respectively.
---
 lisp/international/mule-conf.el | 2 +-
 lisp/language/cyrillic.el       | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index a27aaf9e522..f65f124b633 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1600,7 +1600,7 @@ 'ascii
 (define-coding-system 'utf-7
   "UTF-7 encoding of Unicode (RFC 2152)."
   :coding-type 'utf-8
-  :mnemonic ?U
+  :mnemonic ?u
   :mime-charset 'utf-7
   :charset-list '(unicode)
   :pre-write-conversion 'utf-7-pre-write-conversion
diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el
index 7af87e65703..f923c84e221 100644
--- a/lisp/language/cyrillic.el
+++ b/lisp/language/cyrillic.el
@@ -126,7 +126,8 @@ 'cp878
 (define-coding-system 'koi8-u
   "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)"
   :coding-type 'charset
-  :mnemonic ?U
+  ;; This used to be ?U which collided with UTF-8.
+  :mnemonic ?\N{cyrillic capital letter u} ; У
   :charset-list '(koi8-u)
   :mime-charset 'koi8-u)
 
-- 
2.41.0
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-07 6:16 UTC
To: Ulrich Müller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Müller <ulm@gentoo.org> writes:

> diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
> index a27aaf9e522..f65f124b633 100644
> --- a/lisp/international/mule-conf.el
> +++ b/lisp/international/mule-conf.el
> @@ -1600,7 +1600,7 @@ 'ascii
>  (define-coding-system 'utf-7
>    "UTF-7 encoding of Unicode (RFC 2152)."
>    :coding-type 'utf-8
> -  :mnemonic ?U
> +  :mnemonic ?u
>    :mime-charset 'utf-7
>    :charset-list '(unicode)
>    :pre-write-conversion 'utf-7-pre-write-conversion

I thought we agreed NOT to change the mnemonic used for UTF-7, which
also encodes the Unicode BMP.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-07 6:41 UTC
To: Po Lu; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Po Lu wrote:

>>  (define-coding-system 'utf-7
>>    "UTF-7 encoding of Unicode (RFC 2152)."
>>    :coding-type 'utf-8
>> -  :mnemonic ?U
>> +  :mnemonic ?u
>>    :mime-charset 'utf-7
>>    :charset-list '(unicode)
>>    :pre-write-conversion 'utf-7-pre-write-conversion

> I thought we agreed NOT to change the mnemonic used for UTF-7, which
> also encodes the Unicode BMP.

Which isn't Unicode but only a subset, so it's reasonable that it has
a different mnemonic.
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-07 7:38 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Which isn't Unicode but only a subset, so it's reasonable that it has
> a different mnemonic.

Why? Most of the characters users will want to save in a Unicode
document will be part of the BMP. If it turns out that a character is
not representable, Emacs will ask the user to select a better coding
system upon trying to save the file.

Not that I expect this to happen in practice anyway, since UTF-7 is
rarely encountered nowadays.
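Whether a given piece of text stays within the Basic Multilingual Plane (U+0000..U+FFFF), the repertoire referred to above, can be checked character by character. A minimal sketch; `my-bmp-only-p' is a hypothetical name. Note that Emacs's internal raw-byte characters also lie above #xFFFF, so they too make it return nil.

```elisp
(defun my-bmp-only-p (string)
  "Return t if every character in STRING is in the Unicode BMP.
The Basic Multilingual Plane is U+0000..U+FFFF.  Hypothetical helper."
  (catch 'outside
    ;; `dolist' returns its third element, t, if no character escapes.
    (dolist (ch (string-to-list string) t)
      (when (> ch #xFFFF)
        (throw 'outside nil)))))
```

For example, (my-bmp-only-p "Emacs \u0423") returns t, while (my-bmp-only-p "\U0001F600") returns nil.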
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-07 9:44 UTC
To: Po Lu; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Po Lu wrote:

> Ulrich Mueller <ulm@gentoo.org> writes:
>> Which isn't Unicode but only a subset, so it's reasonable that it has
>> a different mnemonic.

> Why? Most of the characters users will want to save in a Unicode
> document will be part of the BMP. If it turns out that a character is
> not representable, Emacs will ask the user to select a better coding
> system upon trying to save the file.

Do you agree that the character repertoire that can be encoded by UTF-7
is not identical to the one that can be encoded by UTF-8 or UTF-16?

> Not that I expect this to happen in practice anyway, since UTF-7 is
> rarely encountered nowadays.

Yes, and when users encounter that rare case, they should get an
indication that the file they are visiting is in an unusual encoding.
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-07 10:21 UTC
To: Ulrich Mueller; +Cc: luangruo, eggert, emacs-devel

> From: Ulrich Mueller <ulm@gentoo.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, eggert@cs.ucla.edu, emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 11:44:51 +0200
>
> > Not that I expect this to happen in practice anyway, since UTF-7 is
> > rarely encountered nowadays.
>
> Yes, and when users encounter that rare case, they should get an
> indication that the file they are visiting is in an unusual encoding.

We don't have a notion of "unusual" encoding in Emacs. Basically,
anything besides ASCII and perhaps UTF-8 is "unusual" nowadays, so
such a notion won't be useful, IMO.

How is utf-7 more "unusual" than, say, ebcdic or iso-2022-jp or even
windows-1251?
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-07 10:42 UTC
To: Eli Zaretskii; +Cc: luangruo, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote:

>> > Not that I expect this to happen in practice anyway, since UTF-7 is
>> > rarely encountered nowadays.
>>
>> Yes, and when users encounter that rare case, they should get an
>> indication that the file they are visiting is in an unusual encoding.

> We don't have a notion of "unusual" encoding in Emacs. Basically,
> anything besides ASCII and perhaps UTF-8 is "unusual" nowadays, so
> such a notion won't be useful, IMO.

> How is utf-7 more "unusual" than, say, ebcdic or iso-2022-jp or even
> windows-1251?

Sorry, poor choice of wording. What I meant was that when users
encounter a rare case like UTF-7 (or any of the others you just
mentioned), then they should get an indication of the fact.
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-07 12:04 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Sorry, poor choice of wording. What I meant was that when users
> encounter a rare case like UTF-7 (or any of the others you just
> mentioned), then they should get an indication of the fact.

Just to encourage them to convert those files into another ``common''
coding system?

Emacs should stay impartial on such matters, and I can't think of any
other purpose for specifically indicating that a rarely encountered
coding system is being used.
* Re: Disambiguate modeline character for UTF-8?
From: Ulrich Mueller @ 2023-07-07 13:01 UTC
To: Po Lu; +Cc: Eli Zaretskii, eggert, emacs-devel

>>>>> On Fri, 07 Jul 2023, Po Lu wrote:

>> Sorry, poor choice of wording. What I meant was that when users
>> encounter a rare case like UTF-7 (or any of the others you just
>> mentioned), then they should get an indication of the fact.

> Just to encourage them to convert those files into another ``common''
> coding system?

No, to make them aware that the encoding is something other than
UTF-8. And yes, I do want to know if the file that I'm about to save can
be used as input for the next program (e.g. a compiler or a web server).

> Emacs should stay impartial on such matters, and I can't think of any
> other purpose for specifically indicating that a rarely encountered
> coding system is being used.
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-07 13:38 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> No, to make them aware that the encoding is something other than
> UTF-8. And yes, I do want to know if the file that I'm about to save can
> be used as input for the next program (e.g. a compiler or a web server).

If you set the buffer coding system to utf-7, you should already be
aware of that fact.
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-07 12:01 UTC
To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel

Ulrich Mueller <ulm@gentoo.org> writes:

> Do you agree that the character repertoire that can be encoded by UTF-7
> is not identical to the one that can be encoded by UTF-8 or UTF-16?

Only one of the repertoires. The Unicode BMP can be encoded by all of
those coding systems.

> Yes, and when users encounter that rare case, they should get an
> indication that the file they are visiting is in an unusual encoding.

Why? Again, the purpose of the indicator is to indicate the characters
that can be represented in the buffer's coding system. If the user
wants to know exactly which coding system is in use, he can view the
indicator's tooltip.
* Re: Disambiguate modeline character for UTF-8?
From: Andreas Schwab @ 2023-07-07 12:38 UTC
To: Po Lu; +Cc: Ulrich Mueller, Eli Zaretskii, eggert, emacs-devel

On Jul 07 2023, Po Lu wrote:

> Why? Again, the purpose of the indicator is to indicate the characters
> that can be represented in the buffer's coding system.

No. The purpose is to indicate the buffer's file coding system. You
can put non-latin-1 characters in a buffer with a latin-1 file coding
system just fine. The file coding system is only relevant when the
buffer contents is saved. The buffer contents, if multibyte, is always
represented by utf-8-emacs-unix, but that is an internal detail.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
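This distinction can be demonstrated directly: a buffer whose file coding system is latin-1 holds Cyrillic text without complaint, and the mismatch only shows up when the contents are checked against (or saved with) that coding system. A sketch using the built-in `unencodable-char-position'; the helper name is hypothetical.

```elisp
(defun my-first-unencodable-position ()
  "Return the position of the first char the file coding system can't encode.
Return nil if the whole buffer can be saved with
`buffer-file-coding-system'.  Hypothetical helper, for illustration."
  (unencodable-char-position (point-min) (point-max)
                             buffer-file-coding-system))

;; A latin-1 buffer can still hold U+0423; only saving would fail.
(with-temp-buffer
  (setq buffer-file-coding-system 'latin-1)
  (insert "caf\u00e9 \u0423")        ; "café У"
  (my-first-unencodable-position))  ; position of У, the 6th character
```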
* Re: Disambiguate modeline character for UTF-8?
From: Po Lu @ 2023-07-07 13:37 UTC
To: Andreas Schwab; +Cc: Ulrich Mueller, Eli Zaretskii, eggert, emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> No. The purpose is to indicate the buffer's file coding system. You
> can put non-latin-1 characters in a buffer with a latin-1 file coding
> system just fine.

It should have been clear that was what I meant...

> The file coding system is only relevant when the buffer contents is
> saved.

since there is little point in editing a buffer visiting a file without
the intention to save it.
* Re: Disambiguate modeline character for UTF-8?
From: Andreas Schwab @ 2023-07-07 13:45 UTC
To: Po Lu; +Cc: Ulrich Mueller, Eli Zaretskii, eggert, emacs-devel

On Jul 07 2023, Po Lu wrote:

> since there is little point in editing a buffer visiting a file without
> the intention to save it.

There is nothing wrong with that.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-07 12:58 UTC
To: Po Lu; +Cc: ulm, eggert, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, eggert@cs.ucla.edu, emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 20:01:23 +0800
>
> Again, the purpose of the indicator is to indicate the characters
> that can be represented in the buffer's coding system.

That's not true. The mnemonic is the indication of the coding-system
itself, not of its supported characters. You may perceive it as the
indication of the characters, but it is a wrong interpretation.
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-08 8:49 UTC
To: Ulrich Müller; +Cc: eggert, emacs-devel

> From: Ulrich Müller <ulm@gentoo.org>
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
> Date: Fri, 07 Jul 2023 07:48:41 +0200
>
> > Did that character on a GUI frame need a non-default font, or was it
> > supported by the default font. I'd expect the Cyrillic script to be
> > supported by the fonts people use as the default in Emacs.
>
> It's supported by the default font for me (which is Droid Sans Mono).
> У aka U+0423 is contained in WGL4, and I guess that most fonts (also on
> GNU/Linux) would cover a superset of it.
>
> Updated patch below.

Thanks, installed on master, with a followup change to document this
in NEWS and explain how to get back the old behavior.
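The NEWS text itself is not quoted in this thread. As a hedged sketch, assuming `coding-system-put' accepts the `:mnemonic' property at run time (which also updates what `coding-system-get' reports), an init file could restore the previous indicators along these lines:

```elisp
;; Init-file sketch: give UTF-7 and KOI8-U their pre-change
;; mode-line character back.  EOL variants such as utf-7-unix share
;; the base coding system's attributes, so they follow along.
(coding-system-put 'utf-7 :mnemonic ?U)
(coding-system-put 'koi8-u :mnemonic ?U)
```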
* Re: Disambiguate modeline character for UTF-8?
From: Basil Contovounesios @ 2023-07-08 15:27 UTC
To: Eli Zaretskii; +Cc: Ulrich Müller, eggert, emacs-devel

Eli Zaretskii [2023-07-08 11:49 +0300] wrote:

>> From: Ulrich Müller <ulm@gentoo.org>
>> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
>> Date: Fri, 07 Jul 2023 07:48:41 +0200
>>
>> > Did that character on a GUI frame need a non-default font, or was it
>> > supported by the default font. I'd expect the Cyrillic script to be
>> > supported by the fonts people use as the default in Emacs.
>>
>> It's supported by the default font for me (which is Droid Sans Mono).
>> У aka U+0423 is contained in WGL4, and I guess that most fonts (also on
>> GNU/Linux) would cover a superset of it.
>>
>> Updated patch below.
>
> Thanks, installed on master, with a followup change to document this
> in NEWS and explain how to get back the old behavior.

Thanks, but I think it's too early in the build to use \N{name} syntax:

  Loading /home/blc/.local/src/emacs/lisp/language/cyrillic.el (source)...
  Error: invalid-read-syntax ("\\N{cyrillic capital letter u}" 130 42)
    [...]
    load-with-code-conversion("/home/blc/.local/src/emacs/lisp/language/cyrillic.el" "/home/blc/.local/src/emacs/lisp/language/cyrillic.el" nil nil)
    load("language/cyrillic")
    load("loadup.el")

So I followed the example of lisp/international/mule-conf.el and
switched to ?\uXXXX syntax:

  ; Fix last change to lisp/language/cyrillic.el.
  1a9d454ebf6 2023-07-08 16:24:15 +0100
  https://git.sv.gnu.org/cgit/emacs.git/commit/?id=1a9d454ebf6

-- 
Basil
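For reference, the spellings of U+0423 discussed in this subthread all read as the same integer once Emacs is fully built; per the build error above, the \N{...} form is not usable in files loaded that early in the bootstrap. A sketch; `my-cyrillic-u-spellings' is a hypothetical name, and the literal ?У assumes the file is read as UTF-8.

```elisp
(defconst my-cyrillic-u-spellings
  (list ?У                              ; the literal character
        ?\u0423                         ; \uNNNN, four hex digits
        ?\N{CYRILLIC CAPITAL LETTER U}  ; \N{NAME}
        ?\N{U+0423}                     ; \N{U+X}, hex code point
        (char-from-name "CYRILLIC CAPITAL LETTER U")) ; run-time lookup
  "Five ways of writing U+0423; hypothetical constant for illustration.")
```

Each element should equal #x423 (1059).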
* Re: Disambiguate modeline character for UTF-8?
From: Eli Zaretskii @ 2023-07-08 15:38 UTC
To: Basil Contovounesios; +Cc: ulm, eggert, emacs-devel

> From: Basil Contovounesios <contovob@tcd.ie>
> Cc: Ulrich Müller <ulm@gentoo.org>, eggert@cs.ucla.edu,
>  emacs-devel@gnu.org
> Date: Sat, 08 Jul 2023 16:27:43 +0100
>
> Eli Zaretskii [2023-07-08 11:49 +0300] wrote:
>
> Thanks, but I think it's too early in the build to use \N{name} syntax:

Thanks, I've somehow missed that. But why \uNNNN instead of just the
character itself? *.el files are always UTF-8 encoded, so there's no
need to use only ASCII.
* Re: Disambiguate modeline character for UTF-8? 2023-07-08 15:38 ` Eli Zaretskii @ 2023-07-08 16:21 ` Basil Contovounesios 2023-07-08 16:33 ` Eli Zaretskii 2023-07-08 18:21 ` Ulrich Mueller 2023-07-09 9:22 ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller 1 sibling, 2 replies; 97+ messages in thread From: Basil Contovounesios @ 2023-07-08 16:21 UTC To: Eli Zaretskii; +Cc: ulm, eggert, emacs-devel [-- Attachment #1: Type: text/plain, Size: 717 bytes --] Eli Zaretskii [2023-07-08 18:38 +0300] wrote: >> From: Basil Contovounesios <contovob@tcd.ie> >> Cc: Ulrich Müller <ulm@gentoo.org>, eggert@cs.ucla.edu, >> emacs-devel@gnu.org >> Date: Sat, 08 Jul 2023 16:27:43 +0100 >> >> Eli Zaretskii [2023-07-08 11:49 +0300] wrote: >> >> Thanks, but I think it's too early in the build to use \N{name} syntax: > > Thanks, I've somehow missed that. But why \uNNNN instead of just the > character itself? I just followed the example of characters.el and mule-conf.el, but... > *.el files are always UTF-8 encoded, so there's no need to use only > ASCII. ...I see cyrillic.el explicitly declares -*- coding: utf-8 -*-. Is this what you had in mind? [-- Attachment #2: cyrillic.diff --] [-- Type: text/x-diff, Size: 487 bytes --] diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el index 9ad65877140..cf3ee5a2b9d 100644 --- a/lisp/language/cyrillic.el +++ b/lisp/language/cyrillic.el @@ -127,7 +127,7 @@ 'koi8-u "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)" :coding-type 'charset ;; This used to be ?U which collided with UTF-8. - :mnemonic ?\u0423 ; ?\N{cyrillic capital letter u} У + :mnemonic ?У :charset-list '(koi8-u) :mime-charset 'koi8-u) [-- Attachment #3: Type: text/plain, Size: 20 bytes --] Thanks, -- Basil
* Re: Disambiguate modeline character for UTF-8? 2023-07-08 16:21 ` Basil Contovounesios @ 2023-07-08 16:33 ` Eli Zaretskii 2023-07-08 16:57 ` Basil Contovounesios 2023-07-08 18:21 ` Ulrich Mueller 1 sibling, 1 reply; 97+ messages in thread From: Eli Zaretskii @ 2023-07-08 16:33 UTC To: Basil Contovounesios; +Cc: ulm, eggert, emacs-devel > From: Basil Contovounesios <contovob@tcd.ie> > Cc: ulm@gentoo.org, eggert@cs.ucla.edu, emacs-devel@gnu.org > Date: Sat, 08 Jul 2023 17:21:15 +0100 > > > *.el files are always UTF-8 encoded, so there's no need to use only > > ASCII. > > ...I see cyrillic.el explicitly declares -*- coding: utf-8 -*-. That's history. Nowadays we have this in file-coding-system-alist: ("\\.el\\'" . prefer-utf-8) > Is this what you had in mind? Yes, thanks.
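To make Eli's point concrete: in a UTF-8 encoded .el file the literal `?У` is stored as the two bytes D0 A3, and decoding them yields U+0423, the same code point that `?\u0423` names, and a different one from ASCII `U` (0x55), which is why the new mnemonic no longer collides with UTF-8's. Below is a minimal sketch in C (the language of Emacs's src/ files); `decode_utf8_2` is a hypothetical helper that handles only two-byte sequences and is not Emacs's own decoder.

```c
#include <stdint.h>

/* Hypothetical helper: decode a two-byte UTF-8 sequence (lead byte
   0xC2..0xDF, continuation byte 0x80..0xBF) into a code point.
   Returns -1 if the bytes do not form a valid two-byte sequence.  */
static int32_t
decode_utf8_2 (unsigned char lead, unsigned char cont)
{
  if (lead < 0xC2 || lead > 0xDF || (cont & 0xC0) != 0x80)
    return -1;
  /* The lead byte carries 5 payload bits, the continuation byte 6.  */
  return ((int32_t) (lead & 0x1F) << 6) | (cont & 0x3F);
}
```

For example, `decode_utf8_2 (0xD0, 0xA3)` gives 0x0423, the CYRILLIC CAPITAL LETTER U now used as the koi8-u mnemonic.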
* Re: Disambiguate modeline character for UTF-8? 2023-07-08 16:33 ` Eli Zaretskii @ 2023-07-08 16:57 ` Basil Contovounesios 0 siblings, 0 replies; 97+ messages in thread From: Basil Contovounesios @ 2023-07-08 16:57 UTC To: Eli Zaretskii; +Cc: ulm, eggert, emacs-devel Eli Zaretskii [2023-07-08 19:33 +0300] wrote: >> From: Basil Contovounesios <contovob@tcd.ie> >> Cc: ulm@gentoo.org, eggert@cs.ucla.edu, emacs-devel@gnu.org >> Date: Sat, 08 Jul 2023 17:21:15 +0100 >> >> Is this what you had in mind? > Yes, thanks. Done: ; Simplify last change to cyrillic.el. 05984303a58 2023-07-08 17:51:58 +0100 https://git.sv.gnu.org/cgit/emacs.git/commit/?id=05984303a58 Thanks, -- Basil
* Re: Disambiguate modeline character for UTF-8? 2023-07-08 16:21 ` Basil Contovounesios 2023-07-08 16:33 ` Eli Zaretskii @ 2023-07-08 18:21 ` Ulrich Mueller 2023-07-08 21:31 ` Basil Contovounesios 1 sibling, 1 reply; 97+ messages in thread From: Ulrich Mueller @ 2023-07-08 18:21 UTC To: Basil Contovounesios; +Cc: Eli Zaretskii, eggert, emacs-devel >>>>> On Sat, 08 Jul 2023, Basil Contovounesios wrote: > Is this what you had in mind? > diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el > index 9ad65877140..cf3ee5a2b9d 100644 > --- a/lisp/language/cyrillic.el > +++ b/lisp/language/cyrillic.el > @@ -127,7 +127,7 @@ 'koi8-u > "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)" > :coding-type 'charset > ;; This used to be ?U which collided with UTF-8. > - :mnemonic ?\u0423 ; ?\N{cyrillic capital letter u} У > + :mnemonic ?У > :charset-list '(koi8-u) > :mime-charset 'koi8-u) Could you keep the character's description as a comment? Like this: :mnemonic ?У ; cyrillic capital letter u
* Re: Disambiguate modeline character for UTF-8? 2023-07-08 18:21 ` Ulrich Mueller @ 2023-07-08 21:31 ` Basil Contovounesios 0 siblings, 0 replies; 97+ messages in thread From: Basil Contovounesios @ 2023-07-08 21:31 UTC To: Ulrich Mueller; +Cc: Eli Zaretskii, eggert, emacs-devel Ulrich Mueller [2023-07-08 20:21 +0200] wrote: >>>>>> On Sat, 08 Jul 2023, Basil Contovounesios wrote: > >> diff --git a/lisp/language/cyrillic.el b/lisp/language/cyrillic.el >> index 9ad65877140..cf3ee5a2b9d 100644 >> --- a/lisp/language/cyrillic.el >> +++ b/lisp/language/cyrillic.el >> @@ -127,7 +127,7 @@ 'koi8-u >> "KOI8-U 8-bit encoding for Cyrillic (MIME: KOI8-U)" >> :coding-type 'charset >> ;; This used to be ?U which collided with UTF-8. >> - :mnemonic ?\u0423 ; ?\N{cyrillic capital letter u} У >> + :mnemonic ?У > > Could you keep the character's description as a comment? Like this: > > :mnemonic ?У ; cyrillic capital letter u Sure: ; Re-add recently removed comment in cyrillic.el. afa4fa17232 2023-07-08 22:27:20 +0100 https://git.sv.gnu.org/cgit/emacs.git/commit/?id=afa4fa17232 -- Basil
* Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) 2023-07-08 15:38 ` Eli Zaretskii 2023-07-08 16:21 ` Basil Contovounesios @ 2023-07-09 9:22 ` Ulrich Mueller 2023-07-09 9:57 ` Lisp reader syntax and bootstrap Po Lu 2023-07-09 11:35 ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Eli Zaretskii 1 sibling, 2 replies; 97+ messages in thread From: Ulrich Mueller @ 2023-07-09 9:22 UTC To: Eli Zaretskii; +Cc: Basil Contovounesios, eggert, emacs-devel >>>>> On Sat, 08 Jul 2023, Eli Zaretskii wrote: >> Thanks, but I think it's too early in the build to use \N{name} syntax: > Thanks, I've somehow missed that. I had tested whether a non-ASCII char is displayed correctly with ?У, including building Emacs from scratch. I changed from ?У to ?\N{name} at the last minute and apparently tested only whether the file could be loaded and byte-compiled, but not a full bootstrap. :( Sorry for that. Are there any other features of lisp reader syntax where one must be careful? Maybe the Elisp Reference Manual should warn about this?
* Re: Lisp reader syntax and bootstrap 2023-07-09 9:22 ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller @ 2023-07-09 9:57 ` Po Lu 2023-07-13 2:04 ` Richard Stallman 2023-07-09 11:35 ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Eli Zaretskii 1 sibling, 1 reply; 97+ messages in thread From: Po Lu @ 2023-07-09 9:57 UTC To: Ulrich Mueller; +Cc: Eli Zaretskii, Basil Contovounesios, eggert, emacs-devel Ulrich Mueller <ulm@gentoo.org> writes: > Are there any other features of lisp reader syntax where one must be > careful? Maybe the Elisp Reference Manual should warn about this? I know of one: reader syntax for NaN and Inf cannot be used on machines without IEEE floating point, where they will be read as symbols instead (with possibly disastrous consequences.)
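For concreteness, Emacs spells these values `1.0e+INF`, `-1.0e+INF`, `0.0e+NaN` and `-0.0e+NaN`. In the lread.c patch later in this thread, the branch of `string_to_number` that recognizes the `INF`/`NaN` suffixes had been wrapped in `#if IEEE_FLOATING_POINT`, so on a non-IEEE build such tokens fell through and were read as symbols. Below is a simplified C sketch of the suffix test. `classify_special_float` and its enum are hypothetical, and unlike the real reader the sketch does not validate the mantissa digits or the NaN payload.

```c
#include <string.h>

enum special_float { SF_NONE, SF_INF, SF_NAN };

/* Hypothetical, simplified classifier for Emacs's special float
   spellings such as "1.0e+INF" or "-0.0e+NaN": find the "e+"
   exponent marker and compare what follows with "INF" or "NaN".
   Mantissa digits are not checked.  */
static enum special_float
classify_special_float (const char *token)
{
  const char *e = strstr (token, "e+");
  if (e == NULL)
    return SF_NONE;
  if (strcmp (e + 2, "INF") == 0)
    return SF_INF;
  if (strcmp (e + 2, "NaN") == 0)
    return SF_NAN;
  return SF_NONE;
}
```

An ordinary exponent such as `1.0e+10` yields `SF_NONE`, as does any token without an `e+` marker.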
* Re: Lisp reader syntax and bootstrap 2023-07-09 9:57 ` Lisp reader syntax and bootstrap Po Lu @ 2023-07-13 2:04 ` Richard Stallman 2023-07-13 4:27 ` Po Lu 0 siblings, 1 reply; 97+ messages in thread From: Richard Stallman @ 2023-07-13 2:04 UTC To: Po Lu; +Cc: ulm, eliz, contovob, eggert, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I know of one: reader syntax for NaN and Inf cannot be used on machines > without IEEE floating point, where they will be read as symbols instead > (with possibly disastrous consequences.) Maybe the reader ought to recognize these anyway, and do something sensible with them -- such as, signal an error. -- Dr Richard Stallman (https://stallman.org) Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org)
* Re: Lisp reader syntax and bootstrap 2023-07-13 2:04 ` Richard Stallman @ 2023-07-13 4:27 ` Po Lu 2023-07-13 22:07 ` Paul Eggert 2023-07-16 2:19 ` Richard Stallman 0 siblings, 2 replies; 97+ messages in thread From: Po Lu @ 2023-07-13 4:27 UTC To: Richard Stallman; +Cc: ulm, eliz, contovob, eggert, emacs-devel Richard Stallman <rms@gnu.org> writes: > Maybe the reader ought to recognize these anyway, and do something > sensible with them -- such as, signal an error. Perhaps, but it still wouldn't make it any easier to write portable Lisp code. Lisp programmers should simply avoid using NaN and Inf, or at least handle arithmetic errors around functions that may generate them.
* Re: Lisp reader syntax and bootstrap 2023-07-13 4:27 ` Po Lu @ 2023-07-13 22:07 ` Paul Eggert 2023-07-14 5:05 ` Ulrich Mueller 2023-07-15 2:10 ` Richard Stallman 2023-07-16 2:19 ` Richard Stallman 1 sibling, 2 replies; 97+ messages in thread From: Paul Eggert @ 2023-07-13 22:07 UTC To: Po Lu, Richard Stallman; +Cc: ulm, eliz, contovob, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1261 bytes --] On 2023-07-12 21:27, Po Lu wrote: > Lisp programmers should simply avoid using NaN and Inf Support for NaN and Inf is ubiquitous nowadays, and it should be OK to write Emacs Lisp programs that use NaN and Inf, especially since Emacs itself uses them in its own Lisp files. As I actually made money back in the 1970s writing code for VAXes, I was moved to look into RMS's suggestion to signal an error when reading "0.0e+NaN" on a VAX. Unfortunately this broke calculator.el - that is, I couldn't load calculator.el (or calculator.elc), as loading signaled an error when it saw the NaN. And even if we removed the NaN from calculator.el (and I suppose, infinities from other .el files), users would run into similar issues with their own code. So I instead installed the attached patch. On a VAX this approximates infinities and NaNs with extremal values and non-numeric objects, respectively. Although I do not have a VAX to test it on, and am too lazy to spin up an emulator, I did test it on x86-64 by pretending that the x86-64 lacked IEEE support, and it seemed to work OK. (At least, I could load calculator.el....) This patch shouldn't change behavior (or even the executable code) on today's platforms. It's purely for computer museums.
[-- Attachment #2: 0001-Port-NaN-infinity-handling-better-to-VAX.patch --] [-- Type: text/x-patch, Size: 6370 bytes --] From 0cd519971d199836ba0a6e9f0e36af9b9accaf0d Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 13 Jul 2023 14:26:29 -0700 Subject: [PATCH] Port NaN, infinity handling better to VAX Nowadays .elc files routinely contain tokens like 1.0e+INF and 0.0e+NaN that do not work on antiques like the VAX that lack IEEE fp. Port Emacs to these platforms, by treating infinities as extreme values and NaNs as strings that trap if used numerically. * src/lread.c (INFINITY): Default to HUGE_VAL if non-IEEE. (not_a_number) [!IEEE_FLOATING_POINT]: New static array. (syms_of_lread) [!IEEE_FLOATING_POINT]: Initialize it. (read0): Report invalid syntax for +0.0e+NaN on platforms that lack NaNs. (string_to_number): On non-IEEE platforms, return HUGE_VAL for infinity and a string for NaN. All callers changed. --- doc/lispref/numbers.texi | 10 ++++++---- etc/NEWS | 8 ++++++++ src/data.c | 3 ++- src/lread.c | 29 ++++++++++++++++++++++++++--- src/process.c | 3 ++- 5 files changed, 44 insertions(+), 9 deletions(-) diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi index 3e45aa90fda..bcf89fc9ab1 100644 --- a/doc/lispref/numbers.texi +++ b/doc/lispref/numbers.texi @@ -270,10 +270,6 @@ Float Basics signs and significands agree. Significands of NaNs are machine-dependent, as are the digits in their string representation. - NaNs are not available on systems which do not use IEEE -floating-point arithmetic; if the read syntax for a NaN is used on a -VAX, for example, the reader signals an error. - When NaNs and signed zeros are involved, non-numeric functions like @code{eql}, @code{equal}, @code{sxhash-eql}, @code{sxhash-equal} and @code{gethash} determine whether values are indistinguishable, not @@ -283,6 +279,12 @@ Float Basics conversely, @code{(equal 0.0 -0.0)} returns @code{nil} whereas @code{(= 0.0 -0.0)} returns @code{t}. 
+ Infinities and NaNs are not available on legacy systems that lack +IEEE floating-point arithmetic. On a circa 1980 VAX, for example, the +Lisp reader approximates an infinity with the nearest finite value, +and a NaN with some other non-numeric Lisp object that provokes an +error if used numerically. + Here are read syntaxes for these special floating-point values: @table @asis diff --git a/etc/NEWS b/etc/NEWS index 5d5ea990b92..997f7e82c2b 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -585,6 +585,14 @@ behavior back for any other reason, you can do that using the previous behavior of showing 'U' in the mode line for 'koi8-u': (coding-system-put 'koi8-u :mnemonic ?U) + ++++ +** Infinities and NaNs no longer act as symbols on non-IEEE platforms. +On old platforms like the VAX that do not support IEEE floating-point, +tokens like 0.0e+NaN and 1.0e+INF are no longer read as symbols. +Instead, the Lisp reader approximates an infinity with the nearest +finite value, and a NaN with some other non-numeric object that +provokes an error if used numerically. \f * Lisp Changes in Emacs 30.1 diff --git a/src/data.c b/src/data.c index 6de8e0cf1a1..5a31462d8ca 100644 --- a/src/data.c +++ b/src/data.c @@ -3033,7 +3033,8 @@ DEFUN ("string-to-number", Fstring_to_number, Sstring_to_number, 1, 2, 0, p++; Lisp_Object val = string_to_number (p, b, 0); - return NILP (val) ? make_fixnum (0) : val; + return ((IEEE_FLOATING_POINT ? NILP (val) : !NUMBERP (val)) + ? make_fixnum (0) : val); } \f enum arithop diff --git a/src/lread.c b/src/lread.c index 51d0d2a3c24..6792ef27206 100644 --- a/src/lread.c +++ b/src/lread.c @@ -75,6 +75,10 @@ #define file_tell ftell # ifndef INFINITY # define INFINITY ((union ieee754_double) {.ieee = {.exponent = -1}}.d) # endif +#else +# ifndef INFINITY +# define INFINITY HUGE_VAL +# endif #endif /* The objects or placeholders read with the #n=object form. 
@@ -4477,10 +4481,17 @@ substitute_in_interval (INTERVAL interval, void *arg) } \f +#if !IEEE_FLOATING_POINT +/* Strings that stand in for +NaN, -NaN, respectively. */ +static Lisp_Object not_a_number[2]; +#endif + /* Convert the initial prefix of STRING to a number, assuming base BASE. If the prefix has floating point syntax and BASE is 10, return a nearest float; otherwise, if the prefix has integer syntax, return - the integer; otherwise, return nil. If PLEN, set *PLEN to the + the integer; otherwise, return nil. (On antique platforms that lack + support for NaNs, if the prefix has NaN syntax return a Lisp object that + will provoke an error if used as a number.) If PLEN, set *PLEN to the length of the numeric prefix if there is one, otherwise *PLEN is unspecified. */ @@ -4545,7 +4556,6 @@ string_to_number (char const *string, int base, ptrdiff_t *plen) cp++; while ('0' <= *cp && *cp <= '9'); } -#if IEEE_FLOATING_POINT else if (cp[-1] == '+' && cp[0] == 'I' && cp[1] == 'N' && cp[2] == 'F') { @@ -4558,12 +4568,17 @@ string_to_number (char const *string, int base, ptrdiff_t *plen) { state |= E_EXP; cp += 3; +#if IEEE_FLOATING_POINT union ieee754_double u = { .ieee_nan = { .exponent = 0x7ff, .quiet_nan = 1, .mantissa0 = n >> 31 >> 1, .mantissa1 = n }}; value = u.d; - } +#else + if (plen) + *plen = cp - string; + return not_a_number[negative]; #endif + } else cp = ecp; } @@ -5707,6 +5722,14 @@ syms_of_lread (void) DEFSYM (Qcomma, ","); DEFSYM (Qcomma_at, ",@"); +#if !IEEE_FLOATING_POINT + for (int negative = 0; negative < 2; negative++) + { + not_a_number[negative] = build_pure_c_string (&"-0.0e+NaN"[!negative]); + staticpro (&not_a_number[negative]); + } +#endif + DEFSYM (Qinhibit_file_name_operation, "inhibit-file-name-operation"); DEFSYM (Qascii_character, "ascii-character"); DEFSYM (Qfunction, "function"); diff --git a/src/process.c b/src/process.c index 67d1d3e425f..2d6e08f16b5 100644 --- a/src/process.c +++ b/src/process.c @@ -7130,7 +7130,8 @@ DEFUN
("internal-default-signal-process", { ptrdiff_t len; tem = string_to_number (SSDATA (process), 10, &len); - if (NILP (tem) || len != SBYTES (process)) + if ((IEEE_FLOATING_POINT ? NILP (tem) : !NUMBERP (tem)) + || len != SBYTES (process)) return Qnil; } process = tem; -- 2.39.2
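Two idioms in this patch may be easier to see in isolation. First, `&"-0.0e+NaN"[!negative]` takes the address of element 0 or element 1 of a single string literal, so the positive spelling is simply the same literal without its leading `-`. Second, because a non-IEEE `string_to_number` can now return a non-number (the NaN stand-in string), its callers test `!NUMBERP` rather than `NILP`. The sketch below is plain C: `nan_spelling`, `enum result_kind` and `usable_as_number` are hypothetical stand-ins for Emacs's Lisp_Object machinery, not Emacs functions.

```c
#include <stdbool.h>

/* Same trick as the syms_of_lread hunk: point into one literal,
   skipping the leading '-' when NEGATIVE is false.  */
static const char *
nan_spelling (bool negative)
{
  return &"-0.0e+NaN"[!negative];
}

/* Hypothetical model of the kinds of value string_to_number returns.  */
enum result_kind { R_NIL, R_FIXNUM, R_FLOAT, R_STRING };

/* Models the caller-side test in the patch,
     IEEE_FLOATING_POINT ? NILP (val) : !NUMBERP (val),
   inverted so that true means "usable as a number".  On IEEE hosts only
   nil signals failure; on non-IEEE hosts the NaN stand-in string must
   also be rejected.  */
static bool
usable_as_number (enum result_kind k, bool ieee)
{
  bool unusable = ieee ? k == R_NIL : !(k == R_FIXNUM || k == R_FLOAT);
  return !unusable;
}
```

So `nan_spelling (true)` is `"-0.0e+NaN"` and `nan_spelling (false)` is `"0.0e+NaN"`, with no second literal needed.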
* Re: Lisp reader syntax and bootstrap 2023-07-13 22:07 ` Paul Eggert @ 2023-07-14 5:05 ` Ulrich Mueller 2023-07-14 6:57 ` Paul Eggert 2023-07-15 2:10 ` Richard Stallman 1 sibling, 1 reply; 97+ messages in thread From: Ulrich Mueller @ 2023-07-14 5:05 UTC To: Paul Eggert; +Cc: Po Lu, Richard Stallman, eliz, contovob, emacs-devel >>>>> On Fri, 14 Jul 2023, Paul Eggert wrote: > @@ -283,6 +279,12 @@ Float Basics > conversely, @code{(equal 0.0 -0.0)} returns @code{nil} whereas > @code{(= 0.0 -0.0)} returns @code{t}. > > + Infinities and NaNs are not available on legacy systems that lack > +IEEE floating-point arithmetic. On a circa 1980 VAX, for example, the > +Lisp reader approximates an infinity with the nearest finite value, "Nearest" sounds a little strange here. HUGE_VAL is still infinitely far from infinity. Maybe some wording like "approximates positive and negative infinities with the largest and smallest representable finite numbers" would be more accurate? > +and a NaN with some other non-numeric Lisp object that provokes an > +error if used numerically.
* Re: Lisp reader syntax and bootstrap 2023-07-14 5:05 ` Ulrich Mueller @ 2023-07-14 6:57 ` Paul Eggert 0 siblings, 0 replies; 97+ messages in thread From: Paul Eggert @ 2023-07-14 6:57 UTC To: Ulrich Mueller; +Cc: Po Lu, Richard Stallman, eliz, contovob, emacs-devel [-- Attachment #1: Type: text/plain, Size: 486 bytes --] On 2023-07-13 22:05, Ulrich Mueller wrote: > Maybe some wording like "approximates positive and negative infinities > with the largest and smallest representable finite numbers" would be > more accurate? Unfortunately "smallest" connotes being close to zero. Also, I just looked at the C Standard again, and it doesn't guarantee that HUGE_VAL is the maximum 'double' on a VAX (!). Anyway, thanks for pointing out the confusion. I installed the attached to try to clear matters up. [-- Attachment #2: 0001-Improve-doc-for-VAX-reading-NaN-INF.patch --] [-- Type: text/x-patch, Size: 1250 bytes --] From be501f468ed36cddf01305b88bab44366b447c03 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 13 Jul 2023 23:36:33 -0700 Subject: [PATCH 1/2] Improve doc for VAX reading NaN, INF * doc/lispref/numbers.texi (Float Basics): Improve description of how Lisp reads infinities and NaNs on a VAX. --- doc/lispref/numbers.texi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi index bcf89fc9ab1..a49afb73539 100644 --- a/doc/lispref/numbers.texi +++ b/doc/lispref/numbers.texi @@ -280,9 +280,9 @@ Float Basics @code{(= 0.0 -0.0)} returns @code{t}. Infinities and NaNs are not available on legacy systems that lack -IEEE floating-point arithmetic. On a circa 1980 VAX, for example, the -Lisp reader approximates an infinity with the nearest finite value, -and a NaN with some other non-numeric Lisp object that provokes an +IEEE floating-point arithmetic.
On a circa 1980 VAX, for example, +Lisp reads @samp{1.0e+INF} as a large but finite floating-point number, +and @samp{0.0e+NaN} as some other non-numeric Lisp object that provokes an error if used numerically. Here are read syntaxes for these special floating-point values: -- 2.39.2 [-- Attachment #3: 0002-Reorder-NaN-INF-paras.patch --] [-- Type: text/x-patch, Size: 1582 bytes --] From 01b80a6f0e40a4390717a79a73c61899e2ec2968 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 13 Jul 2023 23:55:50 -0700 Subject: [PATCH 2/2] Reorder NaN, INF paras * doc/lispref/numbers.texi (Float Basics): Reorder paragraphs so that examples follow defns. --- doc/lispref/numbers.texi | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi index a49afb73539..071ec0f518d 100644 --- a/doc/lispref/numbers.texi +++ b/doc/lispref/numbers.texi @@ -279,12 +279,6 @@ Float Basics conversely, @code{(equal 0.0 -0.0)} returns @code{nil} whereas @code{(= 0.0 -0.0)} returns @code{t}. - Infinities and NaNs are not available on legacy systems that lack -IEEE floating-point arithmetic. On a circa 1980 VAX, for example, -Lisp reads @samp{1.0e+INF} as a large but finite floating-point number, -and @samp{0.0e+NaN} as some other non-numeric Lisp object that provokes an -error if used numerically. - Here are read syntaxes for these special floating-point values: @table @asis @@ -294,6 +288,12 @@ Float Basics @samp{0.0e+NaN} and @samp{-0.0e+NaN} @end table + Infinities and NaNs are not available on legacy systems that lack +IEEE floating-point arithmetic. On a circa 1980 VAX, for example, +Lisp reads @samp{1.0e+INF} as a large but finite floating-point number, +and @samp{0.0e+NaN} as some other non-numeric Lisp object that provokes an +error if used numerically. 
+ The following functions are specialized for handling floating-point numbers: -- 2.39.2
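Paul's remark that the C standard does not promise HUGE_VAL is the largest finite `double` can be checked on any host. Below is a minimal sketch, assuming a C99 `<math.h>`. The fallback `#define` mirrors the one added to src/lread.c in the earlier patch, and the two helper functions are hypothetical. On an IEEE host HUGE_VAL is +infinity and therefore compares greater than DBL_MAX. On a VAX, which has no infinity, HUGE_VAL must be some finite value.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Fallback in the style of the lread.c patch: headers that lack
   INFINITY get HUGE_VAL instead.  */
#ifndef INFINITY
# define INFINITY HUGE_VAL
#endif

/* True on IEEE hosts, where HUGE_VAL is a genuine infinity.  */
static bool
huge_val_is_infinite (void)
{
  return isinf (HUGE_VAL);
}

/* True only if HUGE_VAL happens to equal the largest finite double;
   the C standard does not require this.  */
static bool
huge_val_is_dbl_max (void)
{
  return HUGE_VAL == DBL_MAX;
}
```

This is also why the manual text was reworded to say "a large but finite floating-point number" rather than naming a particular extremal value.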
* Re: Lisp reader syntax and bootstrap 2023-07-13 22:07 ` Paul Eggert 2023-07-14 5:05 ` Ulrich Mueller @ 2023-07-15 2:10 ` Richard Stallman 2023-07-15 2:38 ` Po Lu 2023-07-15 15:22 ` Paul Eggert 1 sibling, 2 replies; 97+ messages in thread From: Richard Stallman @ 2023-07-15 2:10 UTC To: Paul Eggert; +Cc: luangruo, ulm, eliz, contovob, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > So I instead installed the attached patch. On a VAX this approximates > infinities and NaNs with extremal values and non-numeric objects, That's a bit of a kludge -- using those values might be wrong for some purposes... if this will really happen. > This patch shouldn't change behavior (or even the executable code) on > today's platforms. It's purely for computer museums. If that's true, maybe your solution is fine. Po Lu, you wrote > I know of one: reader syntax for NaN and Inf cannot be used on machines > without IEEE floating point, where they will be read as symbols instead > (with possibly disastrous consequences.) Do you know of a machine where this would really happen? -- Dr Richard Stallman (https://stallman.org) Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org)
* Re: Lisp reader syntax and bootstrap 2023-07-15 2:10 ` Richard Stallman @ 2023-07-15 2:38 ` Po Lu 2023-07-15 5:18 ` Philip Kaludercic 2023-07-15 15:22 ` Paul Eggert 1 sibling, 1 reply; 97+ messages in thread From: Po Lu @ 2023-07-15 2:38 UTC To: Richard Stallman; +Cc: Paul Eggert, ulm, eliz, contovob, emacs-devel Richard Stallman <rms@gnu.org> writes: > Do you know of a machine where this would really happen? VAXen, of course, and possibly future machines that haven't been designed yet. I think I overheard some of the NetBSD porters talking about this problem.
* Re: Lisp reader syntax and bootstrap 2023-07-15 2:38 ` Po Lu @ 2023-07-15 5:18 ` Philip Kaludercic 2023-07-15 5:50 ` Po Lu 0 siblings, 1 reply; 97+ messages in thread From: Philip Kaludercic @ 2023-07-15 5:18 UTC To: Po Lu; +Cc: Richard Stallman, Paul Eggert, ulm, eliz, contovob, emacs-devel Po Lu <luangruo@yahoo.com> writes: > Richard Stallman <rms@gnu.org> writes: > >> Do you know of a machine where this would really happen? > > VAXen, of course, and possibly future machines that haven't been > designed yet. Floating point operations are regarded as optional (but conventional) extensions by RISC-V. > I think I overheard some of the NetBSD porters talking about this > problem.
* Re: Lisp reader syntax and bootstrap 2023-07-15 5:18 ` Philip Kaludercic @ 2023-07-15 5:50 ` Po Lu 0 siblings, 0 replies; 97+ messages in thread From: Po Lu @ 2023-07-15 5:50 UTC To: Philip Kaludercic Cc: Richard Stallman, Paul Eggert, ulm, eliz, contovob, emacs-devel Philip Kaludercic <philipk@posteo.net> writes: > Floating point operations are regarded as optional (but conventional) > extensions by RISC-V. Right, though somehow I doubt compilers will choose to implement non-IEEE floating point support for that particular CPU, at least by default. Being thorough about portability is simply good future proofing. I don't anticipate any new non-IEEE machines entering general use in the short term.
* Re: Lisp reader syntax and bootstrap 2023-07-15 2:10 ` Richard Stallman 2023-07-15 2:38 ` Po Lu @ 2023-07-15 15:22 ` Paul Eggert 2023-07-17 2:22 ` Richard Stallman 2023-07-17 2:32 ` Po Lu 1 sibling, 2 replies; 97+ messages in thread From: Paul Eggert @ 2023-07-15 15:22 UTC To: rms; +Cc: luangruo, ulm, eliz, contovob, emacs-devel On 2023-07-14 19:10, Richard Stallman wrote: > That's a bit of a kludge -- using those values might be wrong for > some purposes... Yes, though there are similar problems even for finite numbers, since they also behave differently on the VAX, sometimes significantly. E.g., VAX subtraction can underflow to zero via catastrophic cancellation, whereas IEEE subtraction cannot. Though my change is indeed a hack, it is an improvement in that Emacs can now start up and load 'calculator' whereas formerly it could not. I "tested" this by manually setting IEEE_FLOATING_POINT to zero on x86-64, and compiling and running the result. Of course this is not the same as a real VAX. > > This patch shouldn't change behavior (or even the executable code) on > > today's platforms. It's purely for computer museums. > > If that's true, maybe your solution is fine. I surveyed the net and it appears to be true. The only Emacs platform that still insists on non-IEEE floating point is NetBSD/vax, and nowadays that platform seems to be run only for computer-museum-like purposes. These uses are rare even by computer-museum standards, as most hobbyist historians that simulate VAXes seem to prefer VMS to NetBSD, I expect partly because VMS "feels" older (it differs more from GNU :-). And VMS-using hobbyists can't run current Emacs, as Emacs 23 dropped VMS support. PS. Are you aware of the licensing dispute over the emulator that hobbyists typically use to run VAX code? This emulator, SIMH, is based on software that dates back to the 1960s, so in some sense it's even older than Emacs.
The dispute involves a license clause introduced a year ago that I've not seen before, and it's not clear to me that SIMH is free software any more. However, this change has been disputed and SIMH has forked off a new project Open SIMH that does not have the controversial clause. For details, please see: https://groups.io/g/simh/topic/91528716#1659 The disputed licensing clause is at the end of this change: https://github.com/simh/simh/commit/ce2adce632e1a22e6d76d4bf726d6b863373c550
* Re: Lisp reader syntax and bootstrap 2023-07-15 15:22 ` Paul Eggert @ 2023-07-17 2:22 ` Richard Stallman 2023-07-17 5:26 ` Paul Eggert 2023-07-17 2:32 ` Po Lu 1 sibling, 1 reply; 97+ messages in thread From: Richard Stallman @ 2023-07-17 2:22 UTC To: Paul Eggert; +Cc: emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > The dispute involves a license clause introduced a > year ago that I've not seen before, and it's not clear to me that SIMH > is free software any more. However, this change has been disputed and > SIMH has forked off a new project Open SIMH that does not have the > controversial clause. For details, please see: Bravo for them! If only they called it Libre SIMH... > The disputed licensing clause is at the end of this change: > https://github.com/simh/simh/commit/ce2adce632e1a22e6d76d4bf726d6b863373c550 When I visit that URL I see two files created initially in that commit. It is not explicitly clear which text is the "disputed licensing clause". Does that consist of the text starting with line 32 in LICENSE.txt? Anyway it is quite clear that that text makes the license nonfree. Can you tell me how to contact the developers of Open SIMH? -- Dr Richard Stallman (https://stallman.org) Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org)
* Re: Lisp reader syntax and bootstrap 2023-07-17 2:22 ` Richard Stallman @ 2023-07-17 5:26 ` Paul Eggert 0 siblings, 0 replies; 97+ messages in thread From: Paul Eggert @ 2023-07-17 5:26 UTC To: rms; +Cc: emacs-devel On 2023-07-16 19:22, Richard Stallman wrote: > > https://github.com/simh/simh/commit/ce2adce632e1a22e6d76d4bf726d6b863373c550 > > When I visit that URL I see two files created initially in that commit. > It is not explicitly clear which text is the "disputed licensing clause". > Does that consist of the text starting with line 32 in LICENSE.txt? Yes. > Anyway it is quite clear that that text makes the license nonfree. > > Can you tell me how to contact the developers of Open SIMH? Contact instructions are here: https://opensimh.org/contacts/ The email address is <simh@groups.io>; I don't know whether one must be a list member to send email to the list. PS. Their mailing list's most recent post was about the IBM 1620, which if you'll recall had variable-precision decimal floating point. Total memory was at most 20,000 decimal digits (they used a form of BCD so this was equivalent to 120,000 bits). I hope nobody tries to port Emacs to an IBM 1620....
* Re: Lisp reader syntax and bootstrap 2023-07-15 15:22 ` Paul Eggert 2023-07-17 2:22 ` Richard Stallman @ 2023-07-17 2:32 ` Po Lu 1 sibling, 0 replies; 97+ messages in thread From: Po Lu @ 2023-07-17 2:32 UTC To: Paul Eggert; +Cc: rms, ulm, eliz, contovob, emacs-devel Paul Eggert <eggert@cs.ucla.edu> writes: > Yes, though there are similar problems even for finite numbers, since > they also behave differently on the VAX, sometimes > significantly. E.g., VAX subtraction can underflow to zero via > catastrophic cancellation, whereas IEEE subtraction cannot. Subnormal numbers (or rather, the lack thereof) shouldn't pose a portability problem for practical Lisp code, as they don't affect reader syntax or cause errors to be signaled.
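The property Paul and Po Lu are discussing is IEEE 754 gradual underflow. When two distinct finite doubles are subtracted and the result is too small to be a normal number, IEEE arithmetic produces a subnormal instead of zero, so `x != y` implies `x - y != 0`. Below is a short C sketch, assuming an IEEE host compiled without flush-to-zero options such as `-ffast-math`. The names `tiny_difference` and `is_subnormal` are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* 1.5 * DBL_MIN and DBL_MIN are distinct normal doubles.  Their
   difference, 0.5 * DBL_MIN, lies below DBL_MIN, so it can only be
   represented as a subnormal.  With IEEE gradual underflow it is
   nonzero; a machine without subnormals, such as the VAX, would
   underflow it to zero instead.  */
static double
tiny_difference (void)
{
  double x = 1.5 * DBL_MIN;
  double y = DBL_MIN;
  return x - y;
}

/* True if D is a subnormal (denormalized) double.  */
static bool
is_subnormal (double d)
{
  return fpclassify (d) == FP_SUBNORMAL;
}
```

As Po Lu notes, this difference affects arithmetic results, not reader syntax, so it does not stop Lisp files from loading the way an unrecognized NaN or infinity token did.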
* Re: Lisp reader syntax and bootstrap 2023-07-13 4:27 ` Po Lu 2023-07-13 22:07 ` Paul Eggert @ 2023-07-16 2:19 ` Richard Stallman 1 sibling, 0 replies; 97+ messages in thread From: Richard Stallman @ 2023-07-16 2:19 UTC To: Po Lu; +Cc: ulm, eliz, contovob, eggert, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Perhaps, but it still wouldn't make it any easier to write portable Lisp > code. Lisp programmers should simply avoid using NaN and Inf, or at > least handle arithmetic errors around functions that may generate them. If this problem is only on the VAX and the VAX is obsolete, and the problem is now limited to replacing NaN and Inf with specific numbers, it may not be worth any extra work. If you want to do work for this goal, the way to do it would be to figure out some constructs that are more portable, that we could implement alongside what we have now. -- Dr Richard Stallman (https://stallman.org) Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org)
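A minimal sketch of the advice quoted above (detect Inf and NaN where they arise, instead of relying on how they print or read back), written in Python for illustration. The `checked` helper is hypothetical; the point is only that IEEE arithmetic produces these values silently, without signaling an error:

```python
import math

def checked(x: float) -> float:
    """Return x unchanged, or raise if an operation produced Inf or NaN."""
    if math.isnan(x):
        raise ArithmeticError("result is NaN")
    if math.isinf(x):
        raise ArithmeticError("result is infinite")
    return x

big = 1e308
overflowed = big * 10            # IEEE overflow yields +inf; Python does not raise here
nan = overflowed - overflowed    # inf - inf yields NaN

assert math.isinf(overflowed)
assert nan != nan                # NaN compares unequal even to itself

try:
    checked(overflowed)
except ArithmeticError as e:
    print("caught:", e)
```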
* Re: Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) 2023-07-09 9:22 ` Lisp reader syntax and bootstrap (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller 2023-07-09 9:57 ` Lisp reader syntax and bootstrap Po Lu @ 2023-07-09 11:35 ` Eli Zaretskii 1 sibling, 0 replies; 97+ messages in thread From: Eli Zaretskii @ 2023-07-09 11:35 UTC To: Ulrich Mueller; +Cc: contovob, eggert, emacs-devel > From: Ulrich Mueller <ulm@gentoo.org> > Cc: Basil Contovounesios <contovob@tcd.ie>, eggert@cs.ucla.edu, > emacs-devel@gnu.org > Date: Sun, 09 Jul 2023 11:22:39 +0200 > > Are there any other features of lisp reader syntax where one must be > careful? Maybe the Elisp Reference Manual should warn about this? If you change a file FOO.el that is preloaded in loadup.el, you need to make sure any function/macro/feature you use there is defined/implemented either in C or by files loaded by loadup _before_ FOO.el.
* Re: Disambiguate modeline character for UTF-8? 2023-07-06 13:08 ` Ulrich Mueller 2023-07-06 17:37 ` Paul Eggert @ 2023-07-07 0:19 ` Po Lu 1 sibling, 0 replies; 97+ messages in thread From: Po Lu @ 2023-07-07 0:19 UTC To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier Ulrich Mueller <ulm@gentoo.org> writes: >>>>>> On Thu, 06 Jul 2023, Po Lu wrote: > >> I disagree. UTF-7, UTF-8 and UTF-16 all encode the same coded >> character set (or at least the BMP of the same character set.) That's a >> far cry from there being ``nothing in common''. > > This argument applies only to UTF-8 and UTF-16. > > OTOH, UTF-7 isn't part of the Unicode standard. Also, it cannot encode > all of Unicode but only the first 65536 code points [1]. As I said, the BMP of the same character set.
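To illustrate why UTF-7 counts as "mail-safe" (the property behind the "m" mnemonic proposed at the start of the thread), here is a small Python check using Python's built-in `utf-7` codec, assumed here to behave like the Emacs coding system for this purpose: the encoded form of non-ASCII text contains only 7-bit bytes and still round-trips.

```python
text = "Українська"

encoded = text.encode("utf-7")

# Every byte is 7-bit ASCII, so the result survives 7-bit mail transports.
assert all(b < 0x80 for b in encoded)
# Non-ASCII runs are written as modified base64 introduced by '+'.
assert encoded.startswith(b"+")
assert encoded.decode("utf-7") == text
print(encoded)
```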
* Re: Disambiguate modeline character for UTF-8? 2023-07-05 13:04 ` Ulrich Mueller 2023-07-05 13:44 ` Eli Zaretskii @ 2023-07-06 12:27 ` Po Lu 2023-07-07 7:09 ` UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller 1 sibling, 1 reply; 97+ messages in thread From: Po Lu @ 2023-07-06 12:27 UTC To: Ulrich Mueller; +Cc: Eli Zaretskii, emacs-devel, drew.adams, eggert, monnier Ulrich Mueller <ulm@gentoo.org> writes: > UTF-8 is one of the most common encodings, and it is strange that it > shares its modeline indicator with anything else. And the "U" is really > ambiguous, because context won't help (or how would you decide if a > buffer's file encoding is e.g. koi8-u or utf-8?). > > As you say, the others in the above list are rarely used nowadays. So, > maybe users should see the "u" or the "K" to indicate that the file has > an unusual encoding? The coding system indication in the mode line is most useful for determining which characters can be represented in the file's coding system. Since the same characters can be encoded in all of UTF-16, UTF-32 and UTF-8, it is only natural for them to share the same mode line indicator.
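Po Lu's argument can be made concrete with a small Python sketch. Python's codec names `utf-8`, `utf-16`, `utf-32` and `koi8_u` stand in for the corresponding Emacs coding systems, and the `representable` helper is hypothetical. The Unicode encoding forms accept exactly the same strings, while an 8-bit coding such as KOI8-U rejects anything outside its small repertoire:

```python
samples = ["plain ASCII", "Українська", "emoji \U0001F600"]

# The Unicode encoding forms cover the same repertoire, so every sample
# survives a round trip through each of them.
for enc in ("utf-8", "utf-16", "utf-32"):
    for s in samples:
        assert s.encode(enc).decode(enc) == s

def representable(s: str, enc: str) -> bool:
    """True if every character of s can be encoded in enc."""
    try:
        s.encode(enc)
        return True
    except UnicodeEncodeError:
        return False

# A legacy 8-bit coding like KOI8-U covers only a small subset.
assert representable("Українська", "koi8_u")
assert not representable("emoji \U0001F600", "koi8_u")
```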
* UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) 2023-07-06 12:27 ` Po Lu @ 2023-07-07 7:09 ` Ulrich Mueller 2023-07-07 7:34 ` Eli Zaretskii 0 siblings, 1 reply; 97+ messages in thread From: Ulrich Mueller @ 2023-07-07 7:09 UTC To: emacs-devel >>>>> On Thu, 06 Jul 2023, Po Lu wrote: > [...] Since the same characters can be encoded in all of UTF-16, > UTF-32 and UTF-8, it is only natural for them to share the same mode > line indicator. On a different tangent, Emacs doesn't seem to know about UTF-32, which I find a little surprising. Is there simply no need for that encoding, or am I missing something?
* Re: UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) 2023-07-07 7:09 ` UTF-32 (was: Re: Disambiguate modeline character for UTF-8?) Ulrich Mueller @ 2023-07-07 7:34 ` Eli Zaretskii 2023-07-07 8:20 ` UTF-32 Ulrich Mueller 0 siblings, 1 reply; 97+ messages in thread From: Eli Zaretskii @ 2023-07-07 7:34 UTC To: Ulrich Mueller; +Cc: emacs-devel > From: Ulrich Mueller <ulm@gentoo.org> > Date: Fri, 07 Jul 2023 09:09:20 +0200 > > On a different tangent, Emacs doesn't seem to know about UTF-32, which > I find a little surprising. > > Is there simply no need for that encoding, or am I missing something? There's no need. We don't support character codepoints that are wider than 32 bits.
* Re: UTF-32 2023-07-07 7:34 ` Eli Zaretskii @ 2023-07-07 8:20 ` Ulrich Mueller 2023-07-07 10:16 ` UTF-32 Eli Zaretskii 0 siblings, 1 reply; 97+ messages in thread From: Ulrich Mueller @ 2023-07-07 8:20 UTC To: Eli Zaretskii; +Cc: emacs-devel >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote: >> On a different tangent, Emacs doesn't seem to know about UTF-32, which >> I find a little surprising. >> >> Is there simply no need for that encoding, or am I missing something? > There's no need. We don't support character codepoints that are wider > than 32 bits. IIUC UTF-32 (aka UCS-4) encodes only Unicode codepoints, and it encodes every character as 4 bytes. https://www.unicode.org/reports/tr19/tr19-9.html
* Re: UTF-32 2023-07-07 8:20 ` UTF-32 Ulrich Mueller @ 2023-07-07 10:16 ` Eli Zaretskii 2023-07-07 10:34 ` UTF-32 Ulrich Mueller 0 siblings, 1 reply; 97+ messages in thread From: Eli Zaretskii @ 2023-07-07 10:16 UTC To: Ulrich Mueller; +Cc: emacs-devel > From: Ulrich Mueller <ulm@gentoo.org> > Cc: emacs-devel@gnu.org > Date: Fri, 07 Jul 2023 10:20:07 +0200 > > >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote: > > >> On a different tangent, Emacs doesn't seem to know about UTF-32, which > >> I find a little surprising. > >> > >> Is there simply no need for that encoding, or am I missing something? > > > There's no need. We don't support character codepoints that are wider > > than 32 bits. > > IIUC UTF-32 (aka UCS-4) encodes only Unicode codepoints, and it encodes > every character as 4 bytes. > > https://www.unicode.org/reports/tr19/tr19-9.html Yes, I know. Not sure why you posted this, though. If you are saying that this somehow contradicts what I wrote above, please elaborate, because I don't see the contradiction.
* Re: UTF-32 2023-07-07 10:16 ` UTF-32 Eli Zaretskii @ 2023-07-07 10:34 ` Ulrich Mueller 2023-07-07 12:49 ` UTF-32 Eli Zaretskii 0 siblings, 1 reply; 97+ messages in thread From: Ulrich Mueller @ 2023-07-07 10:34 UTC To: Eli Zaretskii; +Cc: emacs-devel >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote: >> >> On a different tangent, Emacs doesn't seem to know about UTF-32, which >> >> I find a little surprising. >> >> >> >> Is there simply no need for that encoding, or am I missing something? >> >> > There's no need. We don't support character codepoints that are wider >> > than 32 bits. >> >> IIUC UTF-32 (aka UCS-4) encodes only Unicode codepoints, and it encodes >> every character as 4 bytes. >> >> https://www.unicode.org/reports/tr19/tr19-9.html > Yes, I know. Not sure why you posted this, though. If you are saying > that this somehow contradicts what I wrote above, please elaborate, > because I don't see the contradiction. I don't understand how "codepoints that are wider than 32 bits" are related to UTF-32. UTF-8, UTF-16, and UTF-32 all encode the same repertoire (U+0000 to U+10FFFF). Emacs knows about UTF-8 and UTF-16 but not about UTF-32. Is it an unreasonable question to ask why that is so? (Just out of interest, I do not challenge it, and I have no need for UTF-32.)
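A quick way to see the difference in code-unit size between the three encoding forms is to encode one sample character from each range. This Python sketch uses the endian-specific codecs so that no byte-order mark is counted:

```python
def widths(ch: str) -> tuple:
    """Bytes needed for ch in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return (len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),
            len(ch.encode("utf-32-le")))

# One character from each range: ASCII, Latin-1, elsewhere in the BMP,
# and outside the BMP (U+1F600).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    u8, u16, u32 = widths(ch)
    print(f"U+{ord(ch):04X}: utf-8={u8} utf-16={u16} utf-32={u32}")
```

UTF-32 is the only one of the three with a fixed width of 4 bytes per character, which is also why it is wasteful for text that is mostly ASCII.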
* Re: UTF-32 2023-07-07 10:34 ` UTF-32 Ulrich Mueller @ 2023-07-07 12:49 ` Eli Zaretskii 2023-07-07 13:24 ` UTF-32 Andreas Schwab 2023-07-07 13:36 ` UTF-32 Ulrich Mueller 0 siblings, 2 replies; 97+ messages in thread From: Eli Zaretskii @ 2023-07-07 12:49 UTC To: Ulrich Mueller; +Cc: emacs-devel > From: Ulrich Mueller <ulm@gentoo.org> > Cc: emacs-devel@gnu.org > Date: Fri, 07 Jul 2023 12:34:17 +0200 > > >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote: > > >> https://www.unicode.org/reports/tr19/tr19-9.html > > > Yes, I know. Not sure why you posted this, though. If you are saying > > that this somehow contradicts what I wrote above, please elaborate, > > because I don't see the contradiction. > > I don't understand how "codepoints that are wider than 32 bits" > are related to UTF-32. Because using UTF-32 for codepoints that fit in 32 bits makes very little sense. See, e.g., https://en.wikipedia.org/wiki/UTF-32. > UTF-8, UTF-16, and UTF-32 all encode the same > repertoire (U+0000 to U+10FFFF). UTF-8 is identical with the codepoints as long as the codepoints are plain-ASCII. UTF-16 is identical with the codepoints as long as the codepoints are inside the BMP. UTF-32 is identical with the codepoints as long as the codepoints don't exceed 32 bits. Since Unicode doesn't exceed 32 bits, and Emacs extensions of the Unicode codepoint space also don't exceed 32 bits, Emacs doesn't need to use UTF-32. > Emacs knows about UTF-8 and UTF-16 but not about UTF-32. Is it an > unreasonable question to ask why that is so? (Just out of interest, > I do not challenge it, and I have no need for UTF-32.) The question is fine, and I think I answered it. Did I miss some aspects of the question?
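Eli's three "identical with the codepoints" statements can be checked directly. The Python sketch below compares each encoding against the code point written as a big-endian integer; the sample characters are arbitrary choices (surrogate code points are excluded, since they cannot be encoded on their own):

```python
samples = ["A", "~", "\u00e9", "\u20ac", "\ufffd", "\U0001F600", "\U0010FFFF"]

for ch in samples:
    cp = ord(ch)
    # UTF-32: always the code point itself, as a 4-byte integer.
    assert ch.encode("utf-32-be") == cp.to_bytes(4, "big")
    # UTF-16: the code point itself only inside the BMP;
    # above U+FFFF it becomes a 4-byte surrogate pair.
    if cp <= 0xFFFF:
        assert ch.encode("utf-16-be") == cp.to_bytes(2, "big")
    else:
        assert len(ch.encode("utf-16-be")) == 4
    # UTF-8: a single byte equal to the code point only for ASCII.
    if cp < 0x80:
        assert ch.encode("utf-8") == bytes([cp])
    else:
        assert len(ch.encode("utf-8")) > 1
```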
* Re: UTF-32 2023-07-07 12:49 ` UTF-32 Eli Zaretskii @ 2023-07-07 13:24 ` UTF-32 Andreas Schwab 2023-07-07 13:36 ` UTF-32 Ulrich Mueller 1 sibling, 0 replies; 97+ messages in thread From: Andreas Schwab @ 2023-07-07 13:24 UTC To: Eli Zaretskii; +Cc: Ulrich Mueller, emacs-devel On Jul 07 2023, Eli Zaretskii wrote: > Because using UTF-32 for codepoints that fit in 32 bits makes very > little sense. *Every* codepoint fits in 32 bits. That's why UTF-32 (aka UCS-4) exists. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
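Andreas's point in numbers: the Unicode range needs only 21 bits, and even the larger Emacs character space needs 22, so a 32-bit code unit always leaves at least 10 bits unused. The Emacs limit below, #x3FFFFF, is the value `(max-char)` returns in current Emacs versions; treat that constant as an assumption of this Python sketch:

```python
import sys

UNICODE_MAX = 0x10FFFF      # highest Unicode code point
EMACS_MAX_CHAR = 0x3FFFFF   # assumed: value of (max-char) in Emacs

assert sys.maxunicode == UNICODE_MAX    # CPython's own limit matches Unicode

for name, value in [("Unicode", UNICODE_MAX), ("Emacs", EMACS_MAX_CHAR)]:
    bits = value.bit_length()
    print(f"{name}: max {value:#x} needs {bits} bits, {32 - bits} spare in UTF-32")
```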
* Re: UTF-32 2023-07-07 12:49 ` UTF-32 Eli Zaretskii 2023-07-07 13:24 ` UTF-32 Andreas Schwab @ 2023-07-07 13:36 ` Ulrich Mueller 2023-07-07 14:06 ` UTF-32 Eli Zaretskii 1 sibling, 1 reply; 97+ messages in thread From: Ulrich Mueller @ 2023-07-07 13:36 UTC To: Eli Zaretskii; +Cc: emacs-devel >>>>> On Fri, 07 Jul 2023, Eli Zaretskii wrote: >> I don't understand how "codepoints that are wider than 32 bits" >> are related to UTF-32. > Because using UTF-32 for codepoints that fit in 32 bits makes very > little sense. See, e.g., https://en.wikipedia.org/wiki/UTF-32. Sure, it is a wasteful encoding, and it has issues with byte ordering (but the same is true for UTF-16). >> UTF-8, UTF-16, and UTF-32 all encode the same >> repertoire (U+0000 to U+10FFFF). > UTF-8 is identical with the codepoints as long as the codepoints are > plain-ASCII. UTF-16 is identical with the codepoints as long as the > codepoints are inside the BMP. UTF-32 is identical with the > codepoints as long as the codepoints don't exceed 32 bits. Since > Unicode doesn't exceed 32 bits, and Emacs extensions of the Unicode > codepoint space also don't exceed 32 bits, Emacs doesn't need to use > UTF-32. >> Emacs knows about UTF-8 and UTF-16 but not about UTF-32. Is it an >> unreasonable question to ask why that is so? (Just out of interest, >> I do not challenge it, and I have no need for UTF-32.) > The question is fine, and I think I answered it. Did I miss some > aspects of the question? The previous discussion was in the context of _file_ coding systems. Emacs cannot read or write files encoded in UTF-32, correct? So probably such files just don't exist, or somebody would have implemented it in the meantime? (OTOH, GNU Recode knows about UTF-32, UTF-32BE, and UTF-32LE. No UTF-32NUXI, though. :)
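On the byte-order issue: UTF-32 data may start with a byte-order mark, and a reader has to test for the UTF-32 marks before the UTF-16 ones, because the little-endian UTF-32 BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). A Python sketch with a hypothetical `sniff_utf32` helper, using the BOM constants from Python's `codecs` module:

```python
import codecs
from typing import Optional

def sniff_utf32(data: bytes) -> Optional[str]:
    """Return the UTF-32 codec named by a leading BOM, or None if there is none."""
    # Check UTF-32 before UTF-16: BOM_UTF32_LE (FF FE 00 00) starts with
    # BOM_UTF16_LE (FF FE).
    if data.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    return None

le = codecs.BOM_UTF32_LE + "Hi".encode("utf-32-le")
be = codecs.BOM_UTF32_BE + "Hi".encode("utf-32-be")

for blob in (le, be):
    enc = sniff_utf32(blob)
    assert enc is not None
    assert blob[4:].decode(enc) == "Hi"   # drop the 4-byte BOM, then decode

# The generic "utf-32" codec consumes the BOM and picks the byte order itself.
assert le.decode("utf-32") == be.decode("utf-32") == "Hi"
assert codecs.BOM_UTF32_LE.startswith(codecs.BOM_UTF16_LE)
```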
* Re: UTF-32 2023-07-07 13:36 ` UTF-32 Ulrich Mueller @ 2023-07-07 14:06 ` Eli Zaretskii 0 siblings, 0 replies; 97+ messages in thread From: Eli Zaretskii @ 2023-07-07 14:06 UTC To: Ulrich Mueller; +Cc: emacs-devel > From: Ulrich Mueller <ulm@gentoo.org> > Cc: emacs-devel@gnu.org > Date: Fri, 07 Jul 2023 15:36:31 +0200 > > The previous discussion was in the context of _file_ coding systems. > Emacs cannot read or write files encoded in UTF-32, correct? It can't, but when did you see such files in the wild? The Wikipedia article says UTF-32 is used internally by programs, and says that for a reason. > So probably such files just don't exist, or somebody would have > implemented it in the meantime? (OTOH, GNU Recode knows about UTF-32, > UTF-32BE, and UTF-32LE. No UTF-32NUXI, though. :) Implementing this would not be hard, but why implement something for which we have no use?
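As a rough illustration of how small a UTF-32 decoder is, here is a hypothetical Python sketch. It is not Emacs code (Emacs coding systems are implemented in C) and not a proposed patch: each 4-byte unit is one code point, and the only validation needed is rejecting surrogates and values above U+10FFFF.

```python
import struct

def decode_utf32(data: bytes, big_endian: bool = False) -> str:
    """Decode UTF-32 by hand: every 4-byte unit is exactly one code point."""
    if len(data) % 4:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    order = ">" if big_endian else "<"
    units = struct.unpack(f"{order}{len(data) // 4}I", data)
    chars = []
    for cp in units:
        # Reject values that are not Unicode scalar values.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point {cp:#x}")
        chars.append(chr(cp))
    return "".join(chars)

sample = "UTF-32 \u2192 \U0001F600"
assert decode_utf32(sample.encode("utf-32-le")) == sample
assert decode_utf32(sample.encode("utf-32-be"), big_endian=True) == sample
```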
* Re: Disambiguate modeline character for UTF-8? 2023-07-05 10:08 ` Ulrich Mueller 2023-07-05 11:41 ` Eli Zaretskii @ 2023-07-05 12:49 ` Stefan Monnier 2023-07-05 13:38 ` Eli Zaretskii 2023-07-06 19:07 ` Filipp Gunbin 1 sibling, 2 replies; 97+ messages in thread From: Stefan Monnier @ 2023-07-05 12:49 UTC To: Ulrich Mueller; +Cc: emacs-devel, Drew Adams, Eli Zaretskii, eggert > + -- utf-8* (all variants) > (everything else unchanged) > > or: > > U -- utf-8* (all variants) > u -- utf-16* (all variants) > u -- utf-7 > K -- koi8-u They both sound good to me. BTW, things like 8bit coding systems like koi8 are (and should be) becoming sufficiently rare nowadays in my experience that we could consider using non-single-letter thingies for them (I'd even welcome extra highlighting with some kind of warning color). Stefan
* Re: Disambiguate modeline character for UTF-8? 2023-07-05 12:49 ` Disambiguate modeline character for UTF-8? Stefan Monnier @ 2023-07-05 13:38 ` Eli Zaretskii 2023-07-06 19:07 ` Filipp Gunbin 1 sibling, 0 replies; 97+ messages in thread From: Eli Zaretskii @ 2023-07-05 13:38 UTC To: Stefan Monnier; +Cc: ulm, emacs-devel, drew.adams, eggert > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: emacs-devel@gnu.org, Drew Adams <drew.adams@oracle.com>, Eli Zaretskii > <eliz@gnu.org>, eggert@cs.ucla.edu > Date: Wed, 05 Jul 2023 08:49:44 -0400 > > BTW, things like 8bit coding systems like koi8 are (and should be) > becoming sufficiently rare nowadays in my experience that we could > consider using non-single-letter thingies for them This will require infrastructure changes, since currently the mnemonic can only be a single character. I'm not sure such a change (and the difficulties it will cause, like how do you display the triplet, which is today "UUU", on the modeline) would be justified. We could instead simply reuse the letters used by those ancient encodings, disregarding the fact that they are already "in use".
* Re: Disambiguate modeline character for UTF-8? 2023-07-05 12:49 ` Disambiguate modeline character for UTF-8? Stefan Monnier 2023-07-05 13:38 ` Eli Zaretskii @ 2023-07-06 19:07 ` Filipp Gunbin 1 sibling, 0 replies; 97+ messages in thread From: Filipp Gunbin @ 2023-07-06 19:07 UTC To: Stefan Monnier Cc: Ulrich Mueller, emacs-devel, Drew Adams, Eli Zaretskii, eggert On 05/07/2023 08:49 -0400, Stefan Monnier wrote: >> + -- utf-8* (all variants) >> (everything else unchanged) >> >> or: >> >> U -- utf-8* (all variants) >> u -- utf-16* (all variants) >> u -- utf-7 >> K -- koi8-u > > They both sound good to me. > > BTW, things like 8bit coding systems like koi8 are (and should be) > becoming sufficiently rare nowadays in my experience that we could > consider using non-single-letter thingies for them (I'd even welcome > extra highlighting with some kind of warning color). My experience differs: for example, cp1251 is often used for auto-generated csv files (like transaction list from an online bank) in Russian environments.
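Filipp's cp1251 case shows why "try UTF-8 first, then fall back to a legacy coding" is a common heuristic. A hypothetical Python sketch (the `decode_legacy_csv` helper and the sample row are illustrative only): windows-1251 stores each Cyrillic letter in one byte, and such byte runs are almost never valid UTF-8, whereas nearly any byte sequence decodes "successfully" as cp1251, so the order of the attempts matters.

```python
def decode_legacy_csv(raw: bytes) -> tuple:
    """Try UTF-8 first; fall back to windows-1251 if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("cp1251"), "cp1251"

row = "Дата;Сумма;Описание\n"
raw = row.encode("cp1251")    # e.g. an auto-generated bank export

assert len(raw) == len(row)   # one byte per character in cp1251
text, used = decode_legacy_csv(raw)
assert used == "cp1251" and text == row
```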
* Re: Disambiguate modeline character for UTF-8? 2020-08-23 19:13 ` Ulrich Mueller 2020-08-23 19:42 ` Eli Zaretskii @ 2020-08-23 19:47 ` Stefan Kangas 1 sibling, 0 replies; 97+ messages in thread From: Stefan Kangas @ 2020-08-23 19:47 UTC To: Ulrich Mueller, Eli Zaretskii; +Cc: eggert, monnier, emacs-devel Ulrich Mueller <ulm@gentoo.org> writes: > I stumbled upon this when updating a short section about Emacs in the > Gentoo developer manual, where I realised that I cannot say that "-" > and "U" in the modeline indicate ASCII and UTF-8, respectively. > > IMHO these two are the most important ones nowadays, so they should be > unique. I don't really care about the rest (maybe "1" is still somewhat > important here in Europe), and tried to change as little as possible in > my suggestion. Namely, only move the ones colliding with "U" out of the > way and otherwise stay with ASCII. Do we even need an indicator for UTF-8? I find that I only need to know when it's something else. But maybe I'm missing something obvious. Best regards, Stefan Kangas
* Re: Disambiguate modeline character for UTF-8? 2020-08-23 16:07 ` Eli Zaretskii 2020-08-23 18:24 ` Paul Eggert @ 2020-08-24 18:35 ` Juri Linkov 2020-08-24 18:55 ` Eli Zaretskii 1 sibling, 1 reply; 97+ messages in thread From: Juri Linkov @ 2020-08-24 18:35 UTC To: Eli Zaretskii; +Cc: ulm, Stefan Monnier, emacs-devel >> I don't see a strong reason to limit ourselves to a single char, FWIW, >> so I think `u7` is fine for utf-7* (it should be very rare anyway). > > It must be a single character, but OTOH it doesn't have to be an ASCII > character. I don't know where the requirement for a single character comes from, but since I can't memorize these cryptic characters, I customized the mode-line to display coding names in full, except a few characters that I can remember: "U" for UTF-8, and "-" for ASCII:

  ;; This fix uses mnemonics only for known codings that are frequently used.
  ;; Otherwise, it displays the full name of the codings.
  (setq-default
   mode-line-mule-info
   `(""
     (current-input-method
      (:propertize ("" current-input-method-title)
                   local-map ,mode-line-input-method-map
                   mouse-face mode-line-highlight))
     (:eval
      (propertize
       (cond
        ((not (memq buffer-file-coding-system
                    '(no-conversion undecided-unix prefer-utf-8-unix
                      utf-8 utf-8-dos utf-8-emacs utf-8-emacs-dos
                      utf-8-emacs-unix utf-8-unix)))
         (replace-regexp-in-string
          "-\\(?:dos\\|unix\\)$" ""
          (format "%S" buffer-file-coding-system)))
        (t "%z"))
       'help-echo 'mode-line-mule-info-help-echo
       'mouse-face 'mode-line-highlight
       'local-map mode-line-coding-system-map))
     (:eval (mode-line-eol-desc))))

A long coding string in the mode-line also serves as a warning that a non-standard coding is used in the buffer.
* Re: Disambiguate modeline character for UTF-8? 2020-08-24 18:35 ` Juri Linkov @ 2020-08-24 18:55 ` Eli Zaretskii 2020-08-25 18:59 ` Juri Linkov 0 siblings, 1 reply; 97+ messages in thread From: Eli Zaretskii @ 2020-08-24 18:55 UTC To: Juri Linkov; +Cc: ulm, monnier, emacs-devel > From: Juri Linkov <juri@linkov.net> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, ulm@gentoo.org, > emacs-devel@gnu.org > Date: Mon, 24 Aug 2020 21:35:56 +0300 > > >> I don't see a strong reason to limit ourselves to a single char, FWIW, > >> so I think `u7` is fine for utf-7* (it should be very rare anyway). > > > > It must be a single character, but OTOH it doesn't have to be an ASCII > > character. > > I don't know where the requirement for a single character comes from, Look at the implementation of %z format on the mode line, and you will see that it expects a single character. > but since I can't memorize these cryptic characters, I customized > the mode-line to display coding names in full, except a few characters > that I can remember: "U" for UTF-8, and "-" for ASCII: So on a TTY, you can have "UTF-8UTF-8UTF-8", if all the 3 encodings are UTF-8? Or do you only handle buffer-file-coding-system and ignore the other 2 encodings? > A long coding string in the mode-line also serves as a warning that > a non-standard coding is used in the buffer. It's okay to customize the mode line to your personal needs, but are you really proposing this for a general-purpose feature in Emacs? Because then we'd need to start by deciding what is "non-standard" in this context. For example, assuming the "standard" encoding is the one determined by the locale, then if one lives in a non-UTF-8 locale, they will always see "non-standard" strings in each and every .el file they ever edit, which doesn't sound like a good idea to me.
* Re: Disambiguate modeline character for UTF-8? 2020-08-24 18:55 ` Eli Zaretskii @ 2020-08-25 18:59 ` Juri Linkov 2020-08-25 19:26 ` Eli Zaretskii 0 siblings, 1 reply; 97+ messages in thread From: Juri Linkov @ 2020-08-25 18:59 UTC To: Eli Zaretskii; +Cc: ulm, monnier, emacs-devel >> but since I can't memorize these cryptic characters, I customized >> the mode-line to display coding names in full, except a few characters >> that I can remember: "U" for UTF-8, and "-" for ASCII: > > So on a TTY, you can have "UTF-8UTF-8UTF-8", if all the 3 encodings > are UTF-8? Or do you only handle buffer-file-coding-system and ignore > the other 2 encodings? Currently the other 2 encodings are ignored on a TTY, but since often all 3 encodings are the same, then maybe it would be enough to display the full name for buffer-file-coding-system, and mnemonics for the other 2 encodings. >> A long coding string in the mode-line also serves as a warning that >> a non-standard coding is used in the buffer. > > It's okay to customize the mode line to your personal needs, but are > you really proposing this for a general-purpose feature in Emacs? > Because then we'd need to start by deciding what is "non-standard" in > this context. For example, assuming the "standard" encoding is the > one determined by the locale, then if one lives in a non-UTF-8 locale, > they will always see "non-standard" strings in each and every .el file > they ever edit, which doesn't sound like a good idea to me. A list of "standard" codings could be customizable, so every user could add more codings to it after learning their mnemonic characters.
* Re: Disambiguate modeline character for UTF-8? 2020-08-25 18:59 ` Juri Linkov @ 2020-08-25 19:26 ` Eli Zaretskii 0 siblings, 0 replies; 97+ messages in thread From: Eli Zaretskii @ 2020-08-25 19:26 UTC To: Juri Linkov; +Cc: ulm, monnier, emacs-devel > From: Juri Linkov <juri@linkov.net> > Cc: monnier@iro.umontreal.ca, ulm@gentoo.org, emacs-devel@gnu.org > Date: Tue, 25 Aug 2020 21:59:00 +0300 > > > So on a TTY, you can have "UTF-8UTF-8UTF-8", if all the 3 encodings > > are UTF-8? Or do you only handle buffer-file-coding-system and ignore > > the other 2 encodings? > > Currently the other 2 encodings are ignored on a TTY, but since > often all 3 encodings are the same, then maybe it would be enough They might be the same in your locale, but it isn't necessarily the situation for everyone. > > It's okay to customize the mode line to your personal needs, but are > > you really proposing this for a general-purpose feature in Emacs? > > Because then we'd need to start by deciding what is "non-standard" in > > this context. For example, assuming the "standard" encoding is the > > one determined by the locale, then if one lives in a non-UTF-8 locale, > > they will always see "non-standard" strings in each and every .el file > > they ever edit, which doesn't sound like a good idea to me. > > A list of "standard" codings could be customizable, so every user > could add more codings to it after learning their mnemonic characters. My point is that "standard" depends on several factors, so a fixed preferred value is probably not enough. So I don't think your suggestion is a good idea in general, sorry.