all messages for Emacs-related lists mirrored at yhetil.org
* What is a preferred charset?
@ 2018-11-21 14:55 Stephen Berman
  2018-11-21 15:24 ` Andreas Schwab
  2018-11-21 15:37 ` Eli Zaretskii
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Berman @ 2018-11-21 14:55 UTC
  To: emacs-devel

The discussion in bug#33445 made me realize that I don't know what
distinguishes a preferred charset from other charsets and why this term
is used in `describe-char' (and nowhere else in Emacs that I could
find, aside from a comment in lisp/org/ox-ascii.el).  For example,
etc/HELLO uses the non-standard text/enriched annotation "x-charset" to
make `describe-char' show "latin-iso8859-1" as the preferred charset of
INVERTED EXCLAMATION MARK (#xa1), whereas when I use `C-x 8' to enter
that character in a buffer `describe-char' says its preferred charset is
"unicode".  Why are there different preferred charsets in these cases
and what's the significance and use of that difference in general, and
should it be documented?

Steve Berman  




* Re: What is a preferred charset?
  2018-11-21 14:55 What is a preferred charset? Stephen Berman
@ 2018-11-21 15:24 ` Andreas Schwab
  2018-11-21 15:40   ` Stefan Monnier
  2018-11-21 16:48   ` Stephen Berman
  2018-11-21 15:37 ` Eli Zaretskii
  1 sibling, 2 replies; 12+ messages in thread
From: Andreas Schwab @ 2018-11-21 15:24 UTC
  To: Stephen Berman; +Cc: emacs-devel

On Nov 21 2018, Stephen Berman <stephen.berman@gmx.net> wrote:

> The discussion in bug#33445 made me realize that I don't know what
> distinguishes a preferred charset from other charsets

It's the first from (charset-priority-list) that can encode the
character.  The priority is defined by the language environment.
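
A minimal Emacs Lisp sketch of that rule, for illustration only.  The
helper name `my-first-encodable-charset' is made up here;
`charset-priority-list' and `encode-char' are the standard functions.
The built-in `char-charset' is what Emacs itself offers for this
lookup, and this sketch is not guaranteed to agree with it in every
case.

```elisp
;; Hypothetical helper, not part of Emacs: walk the priority list and
;; return the first charset in which CHAR has a code point.
(defun my-first-encodable-charset (char)
  "Return the first charset in `charset-priority-list' that can encode CHAR.
Return nil if no charset in the list can encode it."
  (catch 'found
    (dolist (cs (charset-priority-list))
      ;; `encode-char' returns nil when CHAR is not in charset CS.
      (when (encode-char char cs)
        (throw 'found cs)))))
```

Evaluating (my-first-encodable-charset #xa1) next to
(char-charset #xa1) shows what the current language environment
yields for INVERTED EXCLAMATION MARK.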

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




* Re: What is a preferred charset?
  2018-11-21 14:55 What is a preferred charset? Stephen Berman
  2018-11-21 15:24 ` Andreas Schwab
@ 2018-11-21 15:37 ` Eli Zaretskii
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2018-11-21 15:37 UTC
  To: Stephen Berman; +Cc: emacs-devel

> From: Stephen Berman <stephen.berman@gmx.net>
> Date: Wed, 21 Nov 2018 15:55:32 +0100
> 
> The discussion in bug#33445 made me realize that I don't know what
> distinguishes a preferred charset from other charsets and why this term
> is used in `describe-char' (and nowhere else in Emacs that I could
> find, aside from a comment in lisp/org/ox-ascii.el).

"Preferred" is used there in the sense of "highest priority".  See
charset-priority-list, set-charset-priority, and char-charset.  They
are described in the node "Character Sets" of the ELisp manual.
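
A short sketch of how those three functions can be tried together.
The helper name `my-char-charset-with-priority' is hypothetical, as is
the idea of restoring the order by re-applying the saved list.
`set-charset-priority' changes a global setting, so the sketch saves
the old order and puts it back afterwards.

```elisp
;; Hypothetical helper, not part of Emacs: report what `char-charset'
;; returns for CHAR while CHARSET temporarily has top priority.
(defun my-char-charset-with-priority (char charset)
  "Return (char-charset CHAR) with CHARSET temporarily at top priority."
  (let ((saved (charset-priority-list)))
    (unwind-protect
        (progn
          ;; Move CHARSET to the head of the priority list.
          (set-charset-priority charset)
          (char-charset char))
      ;; Restore the previous order by re-applying the saved list.
      (apply #'set-charset-priority saved))))
```

For example, (my-char-charset-with-priority #xa1 'latin-iso8859-1)
can be compared with a plain (char-charset #xa1).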

I guess someone tried to say "highest-priority" in fewer characters,
to avoid making the line too long.

> For example, etc/HELLO uses the non-standard text/enriched
> annotation "x-charset" to make `describe-char' show
> "latin-iso8859-1" as the preferred charset of INVERTED EXCLAMATION
> MARK (#xa1), whereas when I use `C-x 8' to enter that character in a
> buffer `describe-char' says its preferred charset is "unicode".  Why
> are there different preferred charsets in these cases and what's the
> significance and use of that difference in general

When text has the 'charset' property, we show its value as the
highest-priority charset of the characters having that property.  This
property is described in "Explicit Encoding".

In the case of HELLO, each hello phrase was given the 'charset'
property corresponding to its language's script, so as to instruct
Emacs to choose the most appropriate font for that greeting.
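
To see the property in isolation, here is a small sketch that tags a
character by hand and reads the value back.  The function name
`my-charset-property-demo' is invented for this example, and the
property is set directly with `put-text-property' rather than through
the "x-charset" annotation that HELLO uses.

```elisp
;; Hypothetical demo, not part of Emacs: put a `charset' text property
;; on INVERTED EXCLAMATION MARK and read it back.
(defun my-charset-property-demo ()
  "Return the `charset' property of a freshly tagged character."
  (with-temp-buffer
    (insert (string #xa1))
    (put-text-property (point-min) (point-max)
                       'charset 'latin-iso8859-1)
    (get-text-property (point-min) 'charset)))
```

In a buffer visiting etc/HELLO, (get-text-property (point) 'charset)
reads the same property for the character at point.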

> and should it be documented?

Now that you know what this is about, you tell me ;-)




* Re: What is a preferred charset?
  2018-11-21 15:24 ` Andreas Schwab
@ 2018-11-21 15:40   ` Stefan Monnier
  2018-11-21 16:04     ` Eli Zaretskii
  2018-11-21 16:48   ` Stephen Berman
  1 sibling, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2018-11-21 15:40 UTC
  To: emacs-devel

>> The discussion in bug#33445 made me realize that I don't know what
>> distinguishes a preferred charset from other charsets
> It's the first from (charset-priority-list) that can encode the
> character.  The priority is defined by the language environment.

And IIUC the only impact it has is on the choice of font to display it.
For latin chars, this tends to make no difference, but for some Asian
scripts it can make a much more significant difference because the same
Unicode chars are used for different languages even though they are
traditionally rendered differently in those different languages.

IOW, I think this `charset` property is used to solve the problems
described at https://en.wikipedia.org/wiki/Han_unification

BTW, I see this page mentions the existence of "variation selectors"
in Unicode.  Is that considered a good solution?  Does Emacs support them?


        Stefan





* Re: What is a preferred charset?
  2018-11-21 15:40   ` Stefan Monnier
@ 2018-11-21 16:04     ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2018-11-21 16:04 UTC
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 21 Nov 2018 10:40:42 -0500
> 
> And IIUC the only impact it has is on the choice of font to display it.
> For latin chars, this tends to make no difference, but for some Asian
> scripts it can make a much more significant difference because the same
> Unicode chars are used for different languages even though they are
> traditionally rendered differently in those different languages.

It can also have an effect on Latin characters in Far Eastern locales,
because the default font might display them as double-width or
something.

> IOW, I think this `charset` property is used to solve the problems
> described at https://en.wikipedia.org/wiki/Han_unification

Not only.  Some scripts need special support from the font, and AFAIR
this property also helps with that.




* Re: What is a preferred charset?
  2018-11-21 15:24 ` Andreas Schwab
  2018-11-21 15:40   ` Stefan Monnier
@ 2018-11-21 16:48   ` Stephen Berman
  2018-11-21 22:30     ` Eli Zaretskii
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Berman @ 2018-11-21 16:48 UTC
  To: Andreas Schwab, Eli Zaretskii; +Cc: emacs-devel

On Wed, 21 Nov 2018 16:24:26 +0100 Andreas Schwab <schwab@suse.de> wrote:

> On Nov 21 2018, Stephen Berman <stephen.berman@gmx.net> wrote:
>
>> The discussion in bug#33445 made me realize that I don't know what
>> distinguishes a preferred charset from other charsets
>
> It's the first from (charset-priority-list) that can encode the
> character.  The priority is defined by the language environment.

On Wed, 21 Nov 2018 17:37:07 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

> "Preferred" is used there in the sense of "highest priority".  See
> charset-priority-list, set-charset-priority, and char-charset.  They
> are described in the node "Character Sets" of the ELisp manual.
>
> I guess someone tried to say "highest-priority" in fewer characters,
> to avoid making the line too long.

I had read that section of the manual before posting and at first did
conclude that preferred meant highest priority, but the output of
describe-char in HELLO seemed to conflict with that:

>> For example, etc/HELLO uses the non-standard text/enriched
>> annotation "x-charset" to make `describe-char' show
>> "latin-iso8859-1" as the preferred charset of INVERTED EXCLAMATION
>> MARK (#xa1), whereas when I use `C-x 8' to enter that character in a
>> buffer `describe-char' says its preferred charset is "unicode".  Why
>> are there different preferred charsets in these cases and what's the
>> significance and use of that difference in general
>
> When text has the 'charset' property, we show its value as the
> highest-priority charset of the characters having that property.  This
> property is described in "Explicit Encoding".

On my system (where the value of locale-coding-system is utf-8-unix) the
first entries in charset-priority-list are: ascii iso-8859-1 unicode
latin-iso8859-1 ...  And calling char-charset on the character named
INVERTED EXCLAMATION MARK returns "unicode" here.  That accords with
what you both wrote above about highest priority, but...

> In the case of HELLO, each hello phrase was given the 'charset'
> property corresponding to its language's script, so as to instruct
> Emacs to choose the most appropriate font for that greeting.

...this seems to be a different criterion for preferred, not the highest
priority as defined above, but (maybe) the smallest charset able to
encode the character?

>> and should it be documented?
>
> Now that you know what this is about, you tell me ;-)

I'm still not sure.

Steve Berman




* Re: What is a preferred charset?
  2018-11-21 16:48   ` Stephen Berman
@ 2018-11-21 22:30     ` Eli Zaretskii
  2018-11-21 22:44       ` Stephen Berman
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2018-11-21 22:30 UTC
  To: Stephen Berman; +Cc: schwab, emacs-devel

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: emacs-devel@gnu.org
> Date: Wed, 21 Nov 2018 17:48:37 +0100
> 
> > When text has the 'charset' property, we show its value as the
> > highest-priority charset of the characters having that property.  This
> > property is described in "Explicit Encoding".
> 
> On my system (where the value of locale-coding-system is utf-8-unix) the
> first entries in charset-priority-list are: ascii iso-8859-1 unicode
> latin-iso8859-1 ...  And calling char-charset on the character named
> INVERTED EXCLAMATION MARK returns "unicode" here.  That accords with
> what you both wrote above about highest priority, but...
> 
> > In the case of HELLO, each hello phrase was given the 'charset'
> > property corresponding to its language's script, so as to instruct
> > Emacs to choose the most appropriate font for that greeting.
> 
> ...this seems to be a different criterion for preferred, not the highest
> priority as defined above

Not a different criterion, it's just that the 'charset' text property
overrides the charset priority order (as you'd expect).




* Re: What is a preferred charset?
  2018-11-21 22:30     ` Eli Zaretskii
@ 2018-11-21 22:44       ` Stephen Berman
  2018-11-22  3:39         ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Berman @ 2018-11-21 22:44 UTC
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

On Thu, 22 Nov 2018 00:30:22 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: emacs-devel@gnu.org
>> Date: Wed, 21 Nov 2018 17:48:37 +0100
>> 
>> > When text has the 'charset' property, we show its value as the
>> > highest-priority charset of the characters having that property.  This
>> > property is described in "Explicit Encoding".
>> 
>> On my system (where the value of locale-coding-system is utf-8-unix) the
>> first entries in charset-priority-list are: ascii iso-8859-1 unicode
>> latin-iso8859-1 ...  And calling char-charset on the character named
>> INVERTED EXCLAMATION MARK returns "unicode" here.  That accords with
>> what you both wrote above about highest priority, but...
>> 
>> > In the case of HELLO, each hello phrase was given the 'charset'
>> > property corresponding to its language's script, so as to instruct
>> > Emacs to choose the most appropriate font for that greeting.
>> 
>> ...this seems to be a different criterion for preferred, not the highest
>> priority as defined above
>
> Not a different criterion, it's just that the 'charset' text property
> overrides the charset priority order (as you'd expect).

But why override the priority order if the preferred charset by that
criterion (namely unicode, in this case) works (i.e. the character is
correctly displayed)?  I guess because there are cases where the
preferred charset according to priority order fails?  So then it is not
preferred.  But then preferred in that case does not mean according to
priority order.  So I'm still not clear about this...

Steve Berman




* Re: What is a preferred charset?
  2018-11-21 22:44       ` Stephen Berman
@ 2018-11-22  3:39         ` Eli Zaretskii
  2018-11-22  9:07           ` Stephen Berman
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2018-11-22  3:39 UTC
  To: Stephen Berman; +Cc: schwab, emacs-devel

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: schwab@suse.de,  emacs-devel@gnu.org
> Date: Wed, 21 Nov 2018 23:44:55 +0100
> 
> >> > In the case of HELLO, each hello phrase was given the 'charset'
> >> > property corresponding to its language's script, so as to instruct
> >> > Emacs to choose the most appropriate font for that greeting.
> >> 
> >> ...this seems to be a different criterion for preferred, not the highest
> >> priority as defined above
> >
> > Not a different criterion, it's just that the 'charset' text property
> > overrides the charset priority order (as you'd expect).
> 
> But why override the priority order if the preferred charset by that
> criterion (namely unicode, in this case) works (i.e. the character is
> correctly displayed)?  I guess because there are cases where the
> preferred charset according to priority order fails?

It is not a question of success or failure: every charset which
supports the character "succeeds".  We choose one of them in order to
produce the effect (such as selecting a font for displaying it) that
suits best what this particular user in this particular case expects.
When text comes from an encoding that specifies its charset (such as
Latin-N), we can determine that charset from the encoding; if not, we
use the charset-priority order that is determined by the locale, as
fallback.




* Re: What is a preferred charset?
  2018-11-22  3:39         ` Eli Zaretskii
@ 2018-11-22  9:07           ` Stephen Berman
  2018-11-22 15:30             ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Berman @ 2018-11-22  9:07 UTC
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

On Thu, 22 Nov 2018 05:39:13 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: schwab@suse.de,  emacs-devel@gnu.org
>> Date: Wed, 21 Nov 2018 23:44:55 +0100
>> 
>> >> > In the case of HELLO, each hello phrase was given the 'charset'
>> >> > property corresponding to its language's script, so as to instruct
>> >> > Emacs to choose the most appropriate font for that greeting.
>> >> 
>> >> ...this seems to be a different criterion for preferred, not the highest
>> >> priority as defined above
>> >
>> > Not a different criterion, it's just that the 'charset' text property
>> > overrides the charset priority order (as you'd expect).
>> 
>> But why override the priority order if the preferred charset by that
>> criterion (namely unicode, in this case) works (i.e. the character is
>> correctly displayed)?  I guess because there are cases where the
>> preferred charset according to priority order fails?
>
> It is not a question of success or failure: every charset which
> supports the character "succeeds".  We choose one of them in order to
> produce the effect (such as selecting a font for displaying it) that
> suits best what this particular user in this particular case expects.
> When text comes from an encoding that specifies its charset (such as
> Latin-N), we can determine that charset from the encoding; if not, we
> use the charset-priority order that is determined by the locale, as
> fallback.

So "preferred charset" means "charset the encoding specifies, if any,
otherwise the locale-specific highest priority charset"?  If so, it's
still not clear to me why HELLO specifies charsets that (at least in
some cases, like INVERTED EXCLAMATION MARK) differ from the highest
priority; is it because the specified charsets are known to correctly
display the characters regardless of locale (if that's even possible),
while it's not known whether the highest priority charset can correctly
display them?  In any case, it's ok with me to drop this now, since it's
become clear to me that "preferred charset" is not a technical term but
a term of convenience used only by describe-char, and it hasn't bothered
anyone till now (and I hadn't thought about it till now either).  Thanks
for the feedback.

Steve Berman




* Re: What is a preferred charset?
  2018-11-22  9:07           ` Stephen Berman
@ 2018-11-22 15:30             ` Eli Zaretskii
  2018-11-22 16:28               ` Stephen Berman
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2018-11-22 15:30 UTC
  To: Stephen Berman; +Cc: schwab, emacs-devel

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: schwab@suse.de,  emacs-devel@gnu.org
> Date: Thu, 22 Nov 2018 10:07:00 +0100
> 
> > It is not a question of success or failure: every charset which
> > supports the character "succeeds".  We choose one of them in order to
> > produce the effect (such as selecting a font for displaying it) that
> > suits best what this particular user in this particular case expects.
> > When text comes from an encoding that specifies its charset (such as
> > Latin-N), we can determine that charset from the encoding; if not, we
> > use the charset-priority order that is determined by the locale, as
> > fallback.
> 
> So "preferred charset" means "charset the encoding specifies, if any,
> otherwise the locale-specific highest priority charset"?

Yes, but that's not a useful definition, see below.

> If so, it's still not clear to me why HELLO specifies charsets that
> (at least in some cases, like INVERTED EXCLAMATION MARK) differ from
> the highest priority

Because it wants to demonstrate that Emacs is capable of using mixed
character sets in the same buffer, and still have each one displayed
as it would in its native locale.

> is it because the specified charsets are known to correctly
> display the characters regardless of locale (if that's even possible),
> while it's not known whether the highest priority charset can correctly
> display them?

No, the highest priority charset will also succeed in displaying
them.  But HELLO wants each greeting to be a good representative of
its native locale, regardless of the locale in which the Emacs session
showing HELLO runs.

I find the following description useful when thinking about this:
Emacs wants to know the charset of each character to be able to
display it correctly using the proper fonts (and also for a few other
features).  If the text announces its charset via the 'charset' text
property, Emacs uses that; otherwise it guesses using the locale's
defaults as guidelines.  It is similar to what Emacs does when it
needs to guess the encoding of a file.
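
That description can be sketched in a few lines of Emacs Lisp.  The
name `my-effective-charset' is hypothetical and the code is only an
illustration of the rule as stated above, not a copy of what
`describe-char' does internally.

```elisp
;; Hypothetical helper, not part of Emacs: prefer an explicit
;; `charset' text property, else fall back to the priority order.
(defun my-effective-charset (pos)
  "Return the charset Emacs would associate with the character at POS.
Use the `charset' text property if present, else `char-charset'."
  (or (get-text-property pos 'charset)
      (char-charset (char-after pos))))
```

Called on INVERTED EXCLAMATION MARK in etc/HELLO, this would return
the charset recorded in the text property; on the same character
typed into a fresh buffer, it would return whatever `char-charset'
gives in the current language environment.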

> In any case, it's ok with me to drop this now, since it's
> become clear to me that "preferred charset" is not a technical term but
> a term of convenience used only by describe-char, and it hasn't bothered
> anyone till now (and I hadn't thought about it till now either).  Thanks
> for the feedback.

Thanks for pointing out how this display might be confusing; I have
now removed the "preferred" part from the display, and added
descriptions of how each attribute of the character is obtained, so
that interested users could drill down.




* Re: What is a preferred charset?
  2018-11-22 15:30             ` Eli Zaretskii
@ 2018-11-22 16:28               ` Stephen Berman
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Berman @ 2018-11-22 16:28 UTC
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

On Thu, 22 Nov 2018 17:30:29 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> In any case, it's ok with me to drop this now, since it's
>> become clear to me that "preferred charset" is not a technical term but
>> a term of convenience used only by describe-char, and it hasn't bothered
>> anyone till now (and I hadn't thought about it till now either).  Thanks
>> for the feedback.
>
> Thanks for pointing out how this display might be confusing; I have
> now removed the "preferred" part from the display, and added
> descriptions of how each attribute of the character is obtained, so
> that interested users could drill down.

Thanks, that's an improvement.  And thanks for clarifying the issue for
me.

Steve Berman



