* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Stefan Monnier @ 2018-07-27 12:38 UTC
To: emacs-devel; +Cc: Eli Zaretskii

>              (char-charset char))))
> +  ;; TIS620.2533 overlaps eight-bit-control, but we want to
> +  ;; show eight-bit for raw bytes, not some obscure character
> +  ;; set no one heard of.
> +  (charset (if (eq charset 'tis620-2533)
> +               'eight-bit
> +             charset))

Do we need `char-charset` to return `tis620-2533` for these, or could we
instead of the above change `char-charset` to return `eight-bit` or
`eight-bit-control`?

        Stefan

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Robert Pluim @ 2018-07-27 13:10 UTC
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>              (char-charset char))))
>> +  ;; TIS620.2533 overlaps eight-bit-control, but we want to
>> +  ;; show eight-bit for raw bytes, not some obscure character
>> +  ;; set no one heard of.
>> +  (charset (if (eq charset 'tis620-2533)
>> +               'eight-bit
>> +             charset))
>
> Do we need `char-charset` to return `tis620-2533` for these, or could we
> instead of the above change `char-charset` to return `eight-bit` or
> `eight-bit-control`?

Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
return 'eight-bit' for raw byte chars.

Robert

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-27 13:46 UTC
To: Robert Pluim, Kenichi Handa; +Cc: emacs-devel

> From: Robert Pluim <rpluim@gmail.com>
> Cc: emacs-devel@gnu.org, Eli Zaretskii <eliz@gnu.org>
> Gmane-Reply-To-List: yes
> Date: Fri, 27 Jul 2018 15:10:23 +0200
>
> Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>
> >>              (char-charset char))))
> >> +  ;; TIS620.2533 overlaps eight-bit-control, but we want to
> >> +  ;; show eight-bit for raw bytes, not some obscure character
> >> +  ;; set no one heard of.
> >> +  (charset (if (eq charset 'tis620-2533)
> >> +               'eight-bit
> >> +             charset))
> >
> > Do we need `char-charset` to return `tis620-2533` for these, or could we
> > instead of the above change `char-charset` to return `eight-bit` or
> > `eight-bit-control`?
>
> Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
> return 'eight-bit' for raw byte chars.

Yes, which is why I asked Handa-san about that.  The documentation
doesn't say what the considerations or disadvantages of using this
property are.

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Andreas Schwab @ 2018-07-27 15:00 UTC
To: emacs-devel

On Jul 27 2018, Robert Pluim <rpluim@gmail.com> wrote:

> Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
> return 'eight-bit' for raw byte chars.

But setting the language environment to "Thai" moves tis620-2533 back
before eight-bit in the priority list.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

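The point above concerns ordering in the charset priority list. A minimal sketch for checking that ordering in a running session could look like the following. The helper name `my/charset-precedes-p` is hypothetical (it is not part of Emacs), and the answer depends on the Emacs version and the current language environment; the function simply returns nil if either name is missing from the list.

```elisp
;; Sketch only: `my/charset-precedes-p' is a hypothetical helper,
;; not an Emacs function.
(defun my/charset-precedes-p (a b)
  "Return non-nil if charset A comes before charset B in the priority list."
  (let ((tail (memq a (charset-priority-list))))
    ;; TAIL starts at A; B must appear strictly after it.
    (and tail (memq b (cdr tail)) t)))

;; Is tis620-2533 currently consulted before eight-bit?
(my/charset-precedes-p 'tis620-2533 'eight-bit)
```

Evaluating the last form before and after `(set-language-environment "Thai")` is one way to observe the reordering described in the message.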
* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-27 17:54 UTC
To: emacs-devel, Andreas Schwab

On July 27, 2018 6:00:18 PM GMT+03:00, Andreas Schwab <schwab@linux-m68k.org> wrote:
> On Jul 27 2018, Robert Pluim <rpluim@gmail.com> wrote:
>
> > Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
> > return 'eight-bit' for raw byte chars.
>
> But setting the language environment to "Thai" moves tis620-2533 back
> before eight-bit in the priority list.
>
> Andreas.

In the Thai language environment, that just might make sense.

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Andreas Schwab @ 2018-07-27 17:59 UTC
To: Eli Zaretskii; +Cc: emacs-devel

On Jul 27 2018, Eli Zaretskii <eliz@gnu.org> wrote:

> On July 27, 2018 6:00:18 PM GMT+03:00, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> On Jul 27 2018, Robert Pluim <rpluim@gmail.com> wrote:
>>
>> > Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
>> > return 'eight-bit' for raw byte chars.
>>
>> But setting the language environment to "Thai" moves tis620-2533 back
>> before eight-bit in the priority list.
>>
>> Andreas.
>
> In the Thai language environment, that just might make sense.

Moving back to a different language environment doesn't fix that up.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

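Since switching environments again does not undo the reordering, a caller that cares about the order could save and restore it explicitly. The macro below is a rough sketch under that assumption; `my/with-charset-priority-preserved` is a hypothetical name, and the sketch restores only the visible order returned by `charset-priority-list`, not any other state a language environment may change.

```elisp
;; Sketch only: `my/with-charset-priority-preserved' is a hypothetical
;; macro, not part of Emacs.
(defmacro my/with-charset-priority-preserved (&rest body)
  "Run BODY, then restore the charset priority order that was in effect."
  (declare (indent 0))
  (let ((saved (make-symbol "saved")))
    `(let ((,saved (charset-priority-list)))
       (unwind-protect
           (progn ,@body)
         ;; `set-charset-priority' moves its arguments to the front in
         ;; the order given, so passing the whole saved list puts the
         ;; previously known charsets back in their old order.
         (apply #'set-charset-priority ,saved)))))

;; Example: temporarily prefer iso-8859-2, then fall back to the old order.
(my/with-charset-priority-preserved
  (set-charset-priority 'iso-8859-2)
  (car (charset-priority-list)))
```

Charsets defined while BODY runs are not in the saved list and would end up after the restored ones.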
* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-27 20:50 UTC
To: Andreas Schwab; +Cc: emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 27 Jul 2018 19:59:07 +0200
>
> >> But setting the language environment to "Thai" moves tis620-2533 back
> >> before eight-bit in the priority list.
> >>
> >> Andreas.
> >
> > In the Thai language environment, that just might make sense.
>
> Moving back to a different language environment doesn't fix that up.

That is arguably a bug, but OTOH it is a very rare use case that the
user needs to change their language environment in the middle of a
session, let alone change it twice.

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Stefan Monnier @ 2018-07-27 19:03 UTC
To: emacs-devel

> The above change is safe, more-or-less, because it only affects how we
> display information about characters.  Making an equivalent change
> inside char-charset would affect all users of that API.

Yes, clearly this would be for `master`.

> In the Thai language environment, that just might make sense.

I don't see why it would ever make sense.
Actually, I don't understand why

    (char-charset (unibyte-char-to-multibyte #x90))

should return `tis620-2533`, since the tis620.2533 standard defines
a coding-system similar to those of iso8859 and the corresponding chars
all exist in the BMP of Unicode (the "Thai block" U+0E01 through
U+0E7F), so I can't see any obvious reason why they should affect the
0x3fffxx private space used by Emacs for "eight-bit chars".

Could it be just a bug?

        Stefan

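To make the expression in the message above concrete, here is a hedged sketch that converts a byte to Emacs's internal raw-byte character and reports how it is classified. The helper name `my/raw-byte-charset-report` is hypothetical. The unrestricted `char-charset` result depends on the Emacs version and the current charset priority, so no particular value is claimed for it here.

```elisp
;; Sketch only: `my/raw-byte-charset-report' is a hypothetical helper,
;; not an Emacs function.
(defun my/raw-byte-charset-report (byte)
  "Return a plist describing how Emacs classifies raw BYTE (128..255)."
  (let ((ch (unibyte-char-to-multibyte byte)))
    (list :char ch
          ;; Unrestricted lookup: follows the charset priority list.
          :charset (char-charset ch)
          ;; Lookup restricted to the raw-byte charset only.
          :restricted (char-charset ch '(eight-bit))
          ;; Code point of CH within `eight-bit', i.e. the original byte.
          :byte (encode-char ch 'eight-bit))))

(my/raw-byte-charset-report #x90)
```

The `:char` value for #x90 is #x3FFF90, which lies in the raw-byte range the message refers to; the `:charset` entry is the one under discussion in this thread.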
* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-27 21:04 UTC
To: Stefan Monnier, Kenichi Handa; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 27 Jul 2018 15:03:48 -0400
>
> > In the Thai language environment, that just might make sense.
>
> I don't see why it would ever make sense.

Because users of the Thai language environment most probably know and
understand what this charset is for.

> Actually, I don't understand why
>
>     (char-charset (unibyte-char-to-multibyte #x90))
>
> should return `tis620-2533`, since the tis620.2533 standard defines
> a coding-system similar to those of iso8859 and the corresponding chars
> all exist in the BMP of Unicode (the "Thai block" U+0E01 through
> U+0E7F), so I can't see any obvious reason why they should affect the
> 0x3fffxx private space used by Emacs for "eight-bit chars".

You are asking why we have this charset in the first place?

> Could it be just a bug?

No, it definitely isn't; look at how this charset is defined in
mule-conf.el.  Whether we still need it in modern Emacs is a separate
question; I don't know the answer to that, but maybe Handa-san does.

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Stefan Monnier @ 2018-07-28 14:15 UTC
To: emacs-devel

>> > In the Thai language environment, that just might make sense.
>> I don't see why it would ever make sense.
> Because users of the Thai language environment most probably know and
> understand what this charset is for.

I know and understand what this charset is for.  Yet I don't see why
eight-bit-control chars should be reported as belonging to this charset
(any more than they should be reported to belong to any of the other
charsets to which they may also belong, such as all the iso8859
charsets).

I believe this happens only by accident: it seems to be the only charset
of its kind defined with `:superset (... eight-bit-control ...)` and
without :supplementary-p.

But maybe the real source of the problem is that eight-bit-control is
defined as :supplementary-p (hard to tell, because I only see doc of
how/when :supplementary-p should be used, but not what it does).

> You are asking why we have this charset in the first place?

No, I understand why we have it; what I don't understand is why it
should be considered anything but a bug that eight-bit-control chars
should be considered as belonging to that charset instead of to the
eight-bit-control charset: they belong to both charsets (and to many
others as well), but the "canonical" charset to which they belong
should be eight-bit-control.

        Stefan

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-28 15:15 UTC
To: Stefan Monnier; +Cc: Kenichi Handa, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 28 Jul 2018 10:15:39 -0400
>
> I know and understand what this charset is for.  Yet I don't see why
> eight-bit-control chars should be reported as belonging to this charset
> (any more than they should be reported to belong to any of the other
> charsets to which they may also belong, such as all the iso8859
> charsets).

Because ISO-8859 charsets don't include bytes between 128 and 159
(inclusive), I guess.

> I believe this happens only by accident: it seems to be the only charset
> of its kind defined with `:superset (... eight-bit-control ...)` and
> without :supplementary-p.

It's true it's the only such combination, but it is much less clear to
me why you think this is an accident.  It could be, but I see no
reason to assume that without some independent evidence.  I asked
Handa-san to comment on that in the hope that he might be able to shed
some light on this situation (and on the use of :supplementary-p in
general).

> But maybe the real source of the problem is that eight-bit-control is
> defined as :supplementary-p (hard to tell, because I only see doc of
> how/when :supplementary-p should be used, but not what it does).

Maybe.  And again, I don't see why you'd assume it's a bug.

> > You are asking why we have this charset in the first place?
>
> No, I understand why we have it; what I don't understand is why it
> should be considered anything but a bug that eight-bit-control chars
> should be considered as belonging to that charset instead of to the
> eight-bit-control charset

We actually don't want to expose eight-bit-control to users at all, we
want there to be a single charset called 'eight-bit' covering all the
raw bytes.

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Stefan Monnier @ 2018-07-28 16:49 UTC
To: emacs-devel

>> I know and understand what this charset is for.  Yet I don't see why
>> eight-bit-control chars should be reported as belonging to this charset
>> (any more than they should be reported to belong to any of the other
>> charsets to which they may also belong, such as all the iso8859
>> charsets).
>
> Because ISO-8859 charsets don't include bytes between 128 and 159
> (inclusive), I guess.

According to the info I found, tis620 doesn't include those codes either
(and doesn't include 160 either, contrary to iso8859-11 which is
otherwise identical).

> We actually don't want to expose eight-bit-control to users at all, we
> want there to be a single charset called 'eight-bit' covering all the
> raw bytes.

Fair enough, then the chars in eight-bit-control should have `eight-bit`
as their canonical charset (no matter the user's language environment).

        Stefan

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-28 17:37 UTC
To: Stefan Monnier; +Cc: Kenichi Handa, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 28 Jul 2018 12:49:10 -0400
>
> >> I know and understand what this charset is for.  Yet I don't see why
> >> eight-bit-control chars should be reported as belonging to this charset
> >> (any more than they should be reported to belong to any of the other
> >> charsets to which they may also belong, such as all the iso8859
> >> charsets).
> >
> > Because ISO-8859 charsets don't include bytes between 128 and 159
> > (inclusive), I guess.
>
> According to the info I found, tis620 doesn't include those codes either
> (and doesn't include 160 either, contrary to iso8859-11 which is
> otherwise identical).
>
> > We actually don't want to expose eight-bit-control to users at all, we
> > want there to be a single charset called 'eight-bit' covering all the
> > raw bytes.
>
> Fair enough, then the chars in eight-bit-control should have `eight-bit`
> as their canonical charset (no matter the user's language environment).

OK, I've removed tis620-2533 on master, let's see if this breaks
anything.

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Andreas Schwab @ 2018-07-28 7:53 UTC
To: Stefan Monnier; +Cc: emacs-devel

On Jul 27 2018, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> Actually, I don't understand why
>
>     (char-charset (unibyte-char-to-multibyte #x90))
>
> should return `tis620-2533`, since the tis620.2533 standard defines
> a coding-system similar to those of iso8859 and the corresponding chars
> all exist in the BMP of Unicode (the "Thai block" U+0E01 through
> U+0E7F), so I can't see any obvious reason why they should affect the
> 0x3fffxx private space used by Emacs for "eight-bit chars".
>
> Could it be just a bug?

tis620-2533 is defined as a superset of eight-bit-control, for whatever
reason.  That could be MULE related.  There is also mule-lao with the
same property, but that is marked :supplementary-p.

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
From: Eli Zaretskii @ 2018-07-27 13:53 UTC
To: Stefan Monnier, Kenichi Handa; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Date: Fri, 27 Jul 2018 08:38:56 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>
>
> >              (char-charset char))))
> > +  ;; TIS620.2533 overlaps eight-bit-control, but we want to
> > +  ;; show eight-bit for raw bytes, not some obscure character
> > +  ;; set no one heard of.
> > +  (charset (if (eq charset 'tis620-2533)
> > +               'eight-bit
> > +             charset))
>
> Do we need `char-charset` to return `tis620-2533` for these, or could we
> instead of the above change `char-charset` to return `eight-bit` or
> `eight-bit-control`?

I don't know, because I don't understand why in this situation we
return tis620-2533 instead of eight-bit.  Maybe it's intended, and
maybe it's just a side effect of how we implement char-charset.

The above change is safe, more-or-less, because it only affects how we
display information about characters.  Making an equivalent change
inside char-charset would affect all users of that API.

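The display-only approach described above can be approximated without touching `char-charset` itself. The sketch below is not the committed patch: the patch compares the result against `tis620-2533`, whereas this hypothetical helper, `my/display-charset`, asks whether the character is encodable in `eight-bit` at all, which is an assumption made here so the example does not depend on `tis620-2533` being defined.

```elisp
;; Sketch only: `my/display-charset' is a hypothetical helper and a
;; variation on the quoted patch, not the code that was committed.
(defun my/display-charset (char)
  "Return the charset to show for CHAR, reporting raw bytes as `eight-bit'."
  (if (char-charset char '(eight-bit))
      ;; CHAR is a raw byte: always show the single user-facing charset.
      'eight-bit
    ;; Anything else: defer to the normal priority-based lookup.
    (char-charset char)))

(my/display-charset (unibyte-char-to-multibyte #x90))
```

Because only the caller that formats the description uses the helper, other users of `char-charset` see no change in behavior, which is the safety argument made in the message.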
End of thread (newest message: 2018-07-28 17:37 UTC)

Thread overview: 15+ messages
     [not found] <20180727064907.6305.13029@vcs0.savannah.gnu.org>
     [not found] ` <20180727064909.85288203C0@vcs0.savannah.gnu.org>
2018-07-27 12:38   ` [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset Stefan Monnier
2018-07-27 13:10     ` Robert Pluim
2018-07-27 13:46       ` Eli Zaretskii
2018-07-27 15:00       ` Andreas Schwab
2018-07-27 17:54         ` Eli Zaretskii
2018-07-27 17:59           ` Andreas Schwab
2018-07-27 20:50             ` Eli Zaretskii
2018-07-27 19:03           ` Stefan Monnier
2018-07-27 21:04             ` Eli Zaretskii
2018-07-28 14:15               ` Stefan Monnier
2018-07-28 15:15                 ` Eli Zaretskii
2018-07-28 16:49                   ` Stefan Monnier
2018-07-28 17:37                     ` Eli Zaretskii
2018-07-28  7:53             ` Andreas Schwab
2018-07-27 13:53     ` Eli Zaretskii