unofficial mirror of emacs-devel@gnu.org 
* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
       [not found] ` <20180727064909.85288203C0@vcs0.savannah.gnu.org>
@ 2018-07-27 12:38   ` Stefan Monnier
  2018-07-27 13:10     ` Robert Pluim
  2018-07-27 13:53     ` Eli Zaretskii
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Monnier @ 2018-07-27 12:38 UTC (permalink / raw)
  To: emacs-devel; +Cc: Eli Zaretskii

>                            (char-charset char))))
> +           ;; TIS620.2533 overlaps eight-bit-control, but we want to
> +           ;; show eight-bit for raw bytes, not some obscure character
> +           ;; set no one heard of.
> +           (charset (if (eq charset 'tis620-2533)
> +                        'eight-bit
> +                      charset))

Do we need `char-charset` to return `tis620-2533` for these, or could
we, instead of the above, change `char-charset` to return `eight-bit` or
`eight-bit-control`?


        Stefan




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 12:38   ` [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset Stefan Monnier
@ 2018-07-27 13:10     ` Robert Pluim
  2018-07-27 13:46       ` Eli Zaretskii
  2018-07-27 15:00       ` Andreas Schwab
  2018-07-27 13:53     ` Eli Zaretskii
  1 sibling, 2 replies; 15+ messages in thread
From: Robert Pluim @ 2018-07-27 13:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>                            (char-charset char))))
>> +           ;; TIS620.2533 overlaps eight-bit-control, but we want to
>> +           ;; show eight-bit for raw bytes, not some obscure character
>> +           ;; set no one heard of.
>> +           (charset (if (eq charset 'tis620-2533)
>> +                        'eight-bit
>> +                      charset))
>
> Do we need `char-charset` to return `tis620-2533` for these, or could we
> instead of the above change `char-charset` to return `eight-bit` or
> `eight-bit-control`?

Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
return 'eight-bit' for raw byte chars.

Robert




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 13:10     ` Robert Pluim
@ 2018-07-27 13:46       ` Eli Zaretskii
  2018-07-27 15:00       ` Andreas Schwab
  1 sibling, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-27 13:46 UTC (permalink / raw)
  To: Robert Pluim, Kenichi Handa; +Cc: emacs-devel

> From: Robert Pluim <rpluim@gmail.com>
> Cc: emacs-devel@gnu.org,  Eli Zaretskii <eliz@gnu.org>
> Gmane-Reply-To-List: yes
> Date: Fri, 27 Jul 2018 15:10:23 +0200
> 
> Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> 
> >>                            (char-charset char))))
> >> +           ;; TIS620.2533 overlaps eight-bit-control, but we want to
> >> +           ;; show eight-bit for raw bytes, not some obscure character
> >> +           ;; set no one heard of.
> >> +           (charset (if (eq charset 'tis620-2533)
> >> +                        'eight-bit
> >> +                      charset))
> >
> > Do we need `char-charset` to return `tis620-2533` for these, or could we
> > instead of the above change `char-charset` to return `eight-bit` or
> > `eight-bit-control`?
> 
> Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
> return 'eight-bit' for raw byte chars.

Yes, which is why I asked Handa-san about that.  The documentation
doesn't say what the considerations or disadvantages of using this
property are.




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 12:38   ` [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset Stefan Monnier
  2018-07-27 13:10     ` Robert Pluim
@ 2018-07-27 13:53     ` Eli Zaretskii
  1 sibling, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-27 13:53 UTC (permalink / raw)
  To: Stefan Monnier, Kenichi Handa; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Date: Fri, 27 Jul 2018 08:38:56 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>
> 
> >                            (char-charset char))))
> > +           ;; TIS620.2533 overlaps eight-bit-control, but we want to
> > +           ;; show eight-bit for raw bytes, not some obscure character
> > +           ;; set no one heard of.
> > +           (charset (if (eq charset 'tis620-2533)
> > +                        'eight-bit
> > +                      charset))
> 
> Do we need `char-charset` to return `tis620-2533` for these, or could we
> instead of the above change `char-charset` to return `eight-bit` or
> `eight-bit-control`?

I don't know, because I don't understand why in this situation we
return tis620-2533 instead of eight-bit.  Maybe it's intended, and
maybe it's just a side effect of how we implement char-charset.

The above change is safe, more-or-less, because it only affects how we
display information about characters.  Making an equivalent change
inside char-charset would affect all users of that API.




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 13:10     ` Robert Pluim
  2018-07-27 13:46       ` Eli Zaretskii
@ 2018-07-27 15:00       ` Andreas Schwab
  2018-07-27 17:54         ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2018-07-27 15:00 UTC (permalink / raw)
  To: emacs-devel

On Jul 27 2018, Robert Pluim <rpluim@gmail.com> wrote:

> Setting ':supplementary-p t' on tis620-2533 induces 'char-charset' to
> return 'eight-bit' for raw byte chars.

But setting the language environment to "Thai" moves tis620-2533 back
before eight-bit in the priority list.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 15:00       ` Andreas Schwab
@ 2018-07-27 17:54         ` Eli Zaretskii
  2018-07-27 17:59           ` Andreas Schwab
  2018-07-27 19:03           ` Stefan Monnier
  0 siblings, 2 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-27 17:54 UTC (permalink / raw)
  To: emacs-devel, Andreas Schwab

On July 27, 2018 6:00:18 PM GMT+03:00, Andreas Schwab <schwab@linux-m68k.org> wrote:
> On Jul 27 2018, Robert Pluim <rpluim@gmail.com> wrote:
> 
> > Setting ':supplementary-p t' on tis620-2533 induces 'char-charset'
> to
> > return 'eight-bit' for raw byte chars.
> 
> But setting the language environment to "Thai" moves tis620-2533 back
> before eight-bit in the priority list.
> 
> Andreas.

In the Thai language environment, that just might make sense.




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 17:54         ` Eli Zaretskii
@ 2018-07-27 17:59           ` Andreas Schwab
  2018-07-27 20:50             ` Eli Zaretskii
  2018-07-27 19:03           ` Stefan Monnier
  1 sibling, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2018-07-27 17:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Jul 27 2018, Eli Zaretskii <eliz@gnu.org> wrote:

> On July 27, 2018 6:00:18 PM GMT+03:00, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> On Jul 27 2018, Robert Pluim <rpluim@gmail.com> wrote:
>> 
>> > Setting ':supplementary-p t' on tis620-2533 induces 'char-charset'
>> to
>> > return 'eight-bit' for raw byte chars.
>> 
>> But setting the language environment to "Thai" moves tis620-2533 back
>> before eight-bit in the priority list.
>> 
>> Andreas.
>
> In the Thai language environment, that just might make sense.

Moving back to a different language environment doesn't fix that up.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 17:54         ` Eli Zaretskii
  2018-07-27 17:59           ` Andreas Schwab
@ 2018-07-27 19:03           ` Stefan Monnier
  2018-07-27 21:04             ` Eli Zaretskii
  2018-07-28  7:53             ` Andreas Schwab
  1 sibling, 2 replies; 15+ messages in thread
From: Stefan Monnier @ 2018-07-27 19:03 UTC (permalink / raw)
  To: emacs-devel

> The above change is safe, more-or-less, because it only affects how we
> display information about characters.  Making an equivalent change
> inside char-charset would affect all users of that API.

Yes, clearly this would be for `master`.

> In the Thai language environment, that just might make sense.

I don't see why it would ever make sense.

Actually, I don't understand why

    (char-charset (unibyte-char-to-multibyte #x90))

should return `tis620-2533`, since the tis620.2533 standard defines
a coding-system similar to those of iso8859 and the corresponding chars
all exist in the BMP of Unicode (the "Thai block" U+0E01 through
U+0E7F), so I can't see any obvious reason why they should affect the
0x3fffxx private space used by Emacs for "eight-bit chars".
Could it be just a bug?


        Stefan





* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 17:59           ` Andreas Schwab
@ 2018-07-27 20:50             ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-27 20:50 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 27 Jul 2018 19:59:07 +0200
> 
> >> But setting the language environment to "Thai" moves tis620-2533 back
> >> before eight-bit in the priority list.
> >> 
> >> Andreas.
> >
> > In the Thai language environment, that just might make sense.
> 
> Moving back to a different language environment doesn't fix that up.

That is arguably a bug, but OTOH it is a very rare use case that the
user needs to change their language environment in the middle of a
session, let alone change it twice.




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 19:03           ` Stefan Monnier
@ 2018-07-27 21:04             ` Eli Zaretskii
  2018-07-28 14:15               ` Stefan Monnier
  2018-07-28  7:53             ` Andreas Schwab
  1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-27 21:04 UTC (permalink / raw)
  To: Stefan Monnier, Kenichi Handa; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 27 Jul 2018 15:03:48 -0400
> 
> > In the Thai language environment, that just might make sense.
> 
> I don't see why it would ever make sense.

Because users of the Thai language environment most probably know and
understand what this charset is for.

> Actually, I don't understand why
> 
>     (char-charset (unibyte-char-to-multibyte #x90))
> 
> should return `tis620-2533`, since the tis620.2533 standard defines
> a coding-system similar to those of iso8859 and the corresponding chars
> all exist in the BMP of Unicode (the "Thai block" U+0E01 through
> U+0E7F), so I can't see any obvious reason why they should affect the
> 0x3fffxx private space used by Emacs for "eight-bit chars".

You are asking why we have this charset in the first place?

> Could it be just a bug?

No, it definitely isn't, look at how this charset is defined in
mule-conf.el.  Whether we still need it in modern Emacs is a separate
question; I don't know the answer to that, but maybe Handa-san does.




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 19:03           ` Stefan Monnier
  2018-07-27 21:04             ` Eli Zaretskii
@ 2018-07-28  7:53             ` Andreas Schwab
  1 sibling, 0 replies; 15+ messages in thread
From: Andreas Schwab @ 2018-07-28  7:53 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Jul 27 2018, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> Actually, I don't understand why
>
>     (char-charset (unibyte-char-to-multibyte #x90))
>
> should return `tis620-2533`, since the tis620.2533 standard defines
> a coding-system similar to those of iso8859 and the corresponding chars
> all exist in the BMP of Unicode (the "Thai block" U+0E01 through
> U+0E7F), so I can't see any obvious reason why they should affect the
> 0x3fffxx private space used by Emacs for "eight-bit chars".
> Could it be just a bug?

tis620-2533 is defined as a superset of eight-bit-control, for whatever
reason.  That could be MULE related.  There is also mule-lao with the
same property, but that is marked :supplementary-p.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-27 21:04             ` Eli Zaretskii
@ 2018-07-28 14:15               ` Stefan Monnier
  2018-07-28 15:15                 ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2018-07-28 14:15 UTC (permalink / raw)
  To: emacs-devel

>> > In the Thai language environment, that just might make sense.
>> I don't see why it would ever make sense.
> because users of the Thai language environment most probably know and
> understand what this charset is for.

I know and understand what this charset is for.  Yet I don't see why
eight-bit-control chars should be reported as belonging to this charset
(any more than they should be reported to belong to any of the other
charsets to which they may also belong, such as all the iso8859
charsets).

I believe this happens only by accident: it seems to be the only charset
of its kind defined with `:superset (... eight-bit-control ...)` and
without :supplementary-p.

But maybe the real source of the problem is that eight-bit-control is
defined as :supplementary-p (hard to tell, because I only see doc of
how/when :supplementary-p should be used, but not what it does).

> You are asking why we have this charset in the first place?

No, I understand why we have it; what I don't understand is why it should
be considered anything but a bug that eight-bit-control chars should be
considered as belonging to that charset instead of to the
eight-bit-control charset: they belong to both charsets (and to many
others as well), but the "canonical" charset to which they belong should
be eight-bit-control.


        Stefan





* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-28 14:15               ` Stefan Monnier
@ 2018-07-28 15:15                 ` Eli Zaretskii
  2018-07-28 16:49                   ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-28 15:15 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Kenichi Handa, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 28 Jul 2018 10:15:39 -0400
> 
> I know and understand what this charset is for.  Yet I don't see why
> eight-bit-control chars should be reported as belonging to this charset
> (any more than they should be reported to belong to any of the other
> charsets to which they may also belong, such as all the iso8859
> charsets).

Because ISO-8859 charsets don't include bytes between 128 and 159
(inclusive), I guess.

> I believe this happens only by accident: it seems to be the only charset
> of its kind defined with `:superset (... eight-bit-control ...)` and
> without :supplementary-p.

It's true it's the only such combination, but it is much less clear to
me why you think this is an accident.  It could be, but I see no
reason to assume that without some independent evidence.  I asked
Handa-san to comment on that in the hope that he might be able to shed
some light on this situation (and on the use of :supplementary-p in
general).

> But maybe the real source of the problem is that eight-bit-control is
> defined as :supplementary-p (hard to tell, because I only see doc of
> how/when :supplementary-p should be used, but not what it does).

Maybe.  And again, I don't see why you'd assume it's a bug.

> > You are asking why we have this charset in the first place?
> 
> No, I understand why we have it; what I don't understand is why it should
> be considered anything but a bug that eight-bit-control chars should be
> considered as belonging to that charset instead of to the
> eight-bit-control charset

We actually don't want to expose eight-bit-control to users at all, we
want there to be a single charset called 'eight-bit' covering all the
raw bytes.




* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-28 15:15                 ` Eli Zaretskii
@ 2018-07-28 16:49                   ` Stefan Monnier
  2018-07-28 17:37                     ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2018-07-28 16:49 UTC (permalink / raw)
  To: emacs-devel

>> I know and understand what this charset is for.  Yet I don't see why
>> eight-bit-control chars should be reported as belonging to this charset
>> (any more than they should be reported to belong to any of the other
>> charsets to which they may also belong, such as all the iso8859
>> charsets).
>
> Because ISO-8859 charsets don't include bytes between 128 and 159
> (inclusive), I guess.

According to the info I found, tis620 doesn't include those codes either
(and doesn't include 160 either, contrary to iso8859-11 which is
otherwise identical).

> We actually don't want to expose eight-bit-control to users at all, we
> want there to be a single charset called 'eight-bit' covering all the
> raw bytes.

Fair enough, then the chars in eight-bit-control should have `eight-bit`
as their canonical charset (no matter the user's language environment).


        Stefan





* Re: [Emacs-diffs] emacs-26 0feb673: Display raw bytes as belonging to 'eight-bit' charset
  2018-07-28 16:49                   ` Stefan Monnier
@ 2018-07-28 17:37                     ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-07-28 17:37 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Kenichi Handa, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 28 Jul 2018 12:49:10 -0400
> 
> >> I know and understand what this charset is for.  Yet I don't see why
> >> eight-bit-control chars should be reported as belonging to this charset
> >> (any more than they should be reported to belong to any of the other
> >> charsets to which they may also belong, such as all the iso8859
> >> charsets).
> >
> > Because ISO-8859 charsets don't include bytes between 128 and 159
> > (inclusive), I guess.
> 
> According to the info I found, tis620 doesn't include those codes either
> (and doesn't include 160 either, contrary to iso8859-11 which is
> otherwise identical).
> 
> > We actually don't want to expose eight-bit-control to users at all, we
> > want there to be a single charset called 'eight-bit' covering all the
> > raw bytes.
> 
> Fair enough, then the chars in eight-bit-control should have `eight-bit`
> as their canonical charset (no matter the user's language environment).

OK, I've removed tis620-2533 on master, let's see if this breaks
anything.




end of thread, other threads:[~2018-07-28 17:37 UTC | newest]
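
The behaviour debated in this thread can be inspected from an Emacs
Lisp buffer.  What follows is only a minimal sketch, not code from any
of the messages: it assumes an Emacs in which `char-charset' accepts an
optional RESTRICTION list, and that the `:supplementary-p' attribute can
be read back through `get-charset-property' (that last point is an
assumption).  Results depend on the Emacs version and on the current
charset priority list, so no expected values are shown.

```elisp
;; Illustrative sketch only; evaluate with C-x C-e or M-:.
(let ((raw (unibyte-char-to-multibyte #x90))) ; internal char for raw byte #x90
  (list
   ;; Charset chosen according to the current priority list.
   (char-charset raw)
   ;; Same lookup, restricted to `eight-bit' through the optional
   ;; RESTRICTION argument of `char-charset'.
   (char-charset raw '(eight-bit))
   ;; tis620-2533 may not be defined in every Emacs (the thread ends
   ;; with it being removed on master), so guard the property lookup.
   (and (charsetp 'tis620-2533)
        (get-charset-property 'tis620-2533 :supplementary-p))
   ;; Highest-priority charset at the moment.
   (charset-priority-list t)))
```

Evaluating the form before and after `set-language-environment' is one
way to check the priority-list effect Andreas describes.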
