unofficial mirror of emacs-devel@gnu.org 
* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
       [not found] ` <20240803073044.42052C1CAF7@vcs2.savannah.gnu.org>
@ 2024-08-03  9:27   ` Po Lu
  2024-08-03 15:23     ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Po Lu @ 2024-08-03  9:27 UTC (permalink / raw)
  To: emacs-devel; +Cc: Eli Zaretskii

Eli Zaretskii <eliz@gnu.org> writes:

> branch: master
> commit 15afa72460b4a0ec910749646cb9852b4c578f5e
> Author: Eli Zaretskii <eliz@gnu.org>
> Commit: Eli Zaretskii <eliz@gnu.org>
>
>     Fix 'script-representative-chars' for the 'han' script
>     
>     * lisp/international/fontset.el (script-representative-chars):
>     Remove from 'han' codepoints that belong to 'cjk-misc'.
> ---
>  lisp/international/fontset.el | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/lisp/international/fontset.el b/lisp/international/fontset.el
> index f5b4b0b4aa4..695c313cb26 100644
> --- a/lisp/international/fontset.el
> +++ b/lisp/international/fontset.el
> @@ -208,8 +208,7 @@
>  	(kana #x304B)
>  	(bopomofo #x3105)
>  	(kanbun #x319D)
> -	(han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 #x31c0 #x4e10
> -             #x5B57 #xfe30 #xf900)
> +	(han #x2e90 #x2f00 #x3200 #x3300 #x3400 #x4e10 #x5B57 #xfe30 #xf900)

Someone reports that this set of characters still does not enable the
detection of WenQuanYi Micro Hei, which is certainly complete enough to
display all Han text that will be encountered in practice.  U+2E90,
U+2F00, U+3300 and U+3400 are absent from this font, and quite
reasonably so, since they are freestanding radicals, Kana, which belong
in the entry for kana rather than han, or obsolete.
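
To make the effect of the quoted diff concrete, here is a small Python sketch (an illustration only, not Emacs code; the variable names are made up for this example).  It compares the old and new `han' entries from the diff and checks which of the code points reported as absent from WenQuanYi Micro Hei are still listed:

```python
# Code points copied from the quoted diff to lisp/international/fontset.el.
OLD_HAN = [0x2E90, 0x2F00, 0x3010, 0x3200, 0x3300, 0x3400, 0x31C0,
           0x4E10, 0x5B57, 0xFE30, 0xF900]
NEW_HAN = [0x2E90, 0x2F00, 0x3200, 0x3300, 0x3400, 0x4E10, 0x5B57,
           0xFE30, 0xF900]

# Code points reported in this message as absent from WenQuanYi Micro Hei.
REPORTED_MISSING = [0x2E90, 0x2F00, 0x3300, 0x3400]


def fmt(code_points):
    """Format code points in the usual U+XXXX notation."""
    return ["U+%04X" % cp for cp in code_points]


# Entries dropped by the commit, in their original order.
removed = [cp for cp in OLD_HAN if cp not in NEW_HAN]
# Reported-missing code points that the new entry still lists.
still_listed = [cp for cp in REPORTED_MISSING if cp in NEW_HAN]

print("removed by the commit:", fmt(removed))
print("still listed, reported missing:", fmt(still_listed))
```

The commit drops only U+3010 and U+31C0, so all four code points named above remain in the entry.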




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-03  9:27   ` master 15afa72460b: Fix 'script-representative-chars' for the 'han' script Po Lu
@ 2024-08-03 15:23     ` Eli Zaretskii
  2024-08-04  0:16       ` Po Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2024-08-03 15:23 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>
> Date: Sat, 03 Aug 2024 17:27:27 +0800
> 
> > --- a/lisp/international/fontset.el
> > +++ b/lisp/international/fontset.el
> > @@ -208,8 +208,7 @@
> >  	(kana #x304B)
> >  	(bopomofo #x3105)
> >  	(kanbun #x319D)
> > -	(han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 #x31c0 #x4e10
> > -             #x5B57 #xfe30 #xf900)
> > +	(han #x2e90 #x2f00 #x3200 #x3300 #x3400 #x4e10 #x5B57 #xfe30 #xf900)
> 
> Someone reports that this set of characters still does not enable the
> detection of WenQuanYi Micro Hei, which is certainly complete enough to
> display all Han text that will be encountered in practice.  U+2E90,
> U+2F00, U+3300 and U+3400 are absent from this font, and quite
> reasonably so, since they are freestanding radicals, Kana, which belong
> in the entry for kana rather than han, or obsolete.

On what system did that happen?

And I don't understand why you say these characters are Kana, this
page disagrees:

  https://en.wikipedia.org/wiki/Kangxi_radical




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-03 15:23     ` Eli Zaretskii
@ 2024-08-04  0:16       ` Po Lu
  2024-08-04  4:57         ` Eli Zaretskii
  2024-08-05 16:25         ` Eli Zaretskii
  0 siblings, 2 replies; 11+ messages in thread
From: Po Lu @ 2024-08-04  0:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>
>> Date: Sat, 03 Aug 2024 17:27:27 +0800
>> 
>> > --- a/lisp/international/fontset.el
>> > +++ b/lisp/international/fontset.el
>> > @@ -208,8 +208,7 @@
>> >  	(kana #x304B)
>> >  	(bopomofo #x3105)
>> >  	(kanbun #x319D)
>> > -	(han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 #x31c0 #x4e10
>> > -             #x5B57 #xfe30 #xf900)
>> > +	(han #x2e90 #x2f00 #x3200 #x3300 #x3400 #x4e10 #x5B57 #xfe30 #xf900)
>> 
>> Someone reports that this set of characters still does not enable the
>> detection of WenQuanYi Micro Hei, which is certainly complete enough to
>> display all Han text that will be encountered in practice.  U+2E90,
>> U+2F00, U+3300 and U+3400 are absent from this font, and quite
>> reasonably so, since they are freestanding radicals, Kana, which belong
>> in the entry for kana rather than han, or obsolete.
>
> On what system did that happen?

Not "what system", "which font": WenQuanYi Micro Hei, one of the better
free Han fonts.

> And I don't understand why you say these characters are Kana, this
> page disagrees:
>
>   https://en.wikipedia.org/wiki/Kangxi_radical

That's U+2E90.  U+3300 is Kana, according to Scripts.txt:

3300..3357    ; Katakana # So  [88] SQUARE APAATO..SQUARE WATTO
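
For anyone who wants to check such a line mechanically, a minimal Python sketch follows (not part of Emacs or of admin/blocks.awk; the function name and regular expression are assumptions made for this example).  It parses the Scripts.txt line quoted above and tests whether U+3300 falls inside the range:

```python
import re

# The data line quoted above from Scripts.txt.
LINE = "3300..3357    ; Katakana # So  [88] SQUARE APAATO..SQUARE WATTO"

# Shape assumed here: FIRST[..LAST] ; Script # comment
PATTERN = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_scripts_line(line):
    """Return (first, last, script) for one data line of Scripts.txt."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError("not a Scripts.txt data line: %r" % line)
    first = int(m.group(1), 16)
    # A line with a single code point has no "..LAST" part.
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3)


first, last, script = parse_scripts_line(LINE)
print(script, last - first + 1, first <= 0x3300 <= last)
```

The range size computed this way agrees with the "[88]" count in the quoted line.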




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-04  0:16       ` Po Lu
@ 2024-08-04  4:57         ` Eli Zaretskii
  2024-08-04  7:58           ` Po Lu
  2024-08-05 16:25         ` Eli Zaretskii
  1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2024-08-04  4:57 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Sun, 04 Aug 2024 08:16:20 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Someone reports that this set of characters still does not enable the
> >> detection of WenQuanYi Micro Hei, which is certainly complete enough to
> >> display all Han text that will be encountered in practice.  U+2E90,
> >> U+2F00, U+3300 and U+3400 are absent from this font, and quite
> >> reasonably so, since they are freestanding radicals, Kana, which belong
> >> in the entry for kana rather than han, or obsolete.
> >
> > On what system did that happen?
> 
> Not "what system", "which font": WenQuanYi Micro Hei, one of the better
> free Han fonts.

<Shrug>Then users should perhaps look for better fonts.  I'm quite
astonished to hear that free fonts on free systems do so much worse a
job than MS-Windows.  I have a hard time believing that.

> > And I don't understand why you say these characters are Kana, this
> > page disagrees:
> >
> >   https://en.wikipedia.org/wiki/Kangxi_radical
> 
> That's U+2E90.  U+3300 is Kana, according to Scripts.txt:
> 
> 3300..3357    ; Katakana # So  [88] SQUARE APAATO..SQUARE WATTO

That's just one block out of 4 that you mentioned.  And if we want to
treat that as Kana, we should change admin/blocks.awk first.




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-04  4:57         ` Eli Zaretskii
@ 2024-08-04  7:58           ` Po Lu
  0 siblings, 0 replies; 11+ messages in thread
From: Po Lu @ 2024-08-04  7:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> <Shrug>Then users should perhaps look for better fonts.  I'm quite
> astonished to hear that free fonts on free systems do so much worse a
> job than MS-Windows.  I have a hard time believing that.

Since they are sufficient, Microsoft's excess is not the standard by
which to set Emacs's expectations.  (Which expectations must not be set
by proprietary fonts in any event.)

>> > And I don't understand why you say these characters are Kana, this
>> > page disagrees:
>> >
>> >   https://en.wikipedia.org/wiki/Kangxi_radical
>> 
>> That's U+2E90.  U+3300 is Kana, according to Scripts.txt:
>> 
>> 3300..3357    ; Katakana # So  [88] SQUARE APAATO..SQUARE WATTO
>
> That's just one block out of 4 that you mentioned.

The remainder are, as I said, radicals or obsolete, which are not to be
found in real documents and many perfectly serviceable fonts.

> And if we want to treat that as Kana, we should change
> admin/blocks.awk first.

Its not being treated as Kana is a bug in blocks.awk, so this is a
foregone conclusion.  Regardless, this character should be deleted from
script-representative-chars, because on my system it is provided by:

  xfthb:-ADBO-Noto Sans CJK JP-regular-normal-normal-*-16-*-*-*-*-0-iso10646-1 (#x889)

which is not the proper regional variant of Noto Sans (or Serif) CJK for
Chinese text.




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-04  0:16       ` Po Lu
  2024-08-04  4:57         ` Eli Zaretskii
@ 2024-08-05 16:25         ` Eli Zaretskii
  2024-08-05 23:58           ` Po Lu
  1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2024-08-05 16:25 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Sun, 04 Aug 2024 08:16:20 +0800
> 
> >> > --- a/lisp/international/fontset.el
> >> > +++ b/lisp/international/fontset.el
> >> > @@ -208,8 +208,7 @@
> >> >  	(kana #x304B)
> >> >  	(bopomofo #x3105)
> >> >  	(kanbun #x319D)
> >> > -	(han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 #x31c0 #x4e10
> >> > -             #x5B57 #xfe30 #xf900)
> >> > +	(han #x2e90 #x2f00 #x3200 #x3300 #x3400 #x4e10 #x5B57 #xfe30 #xf900)
> >> 
> >> Someone reports that this set of characters still does not enable the
> >> detection of WenQuanYi Micro Hei, which is certainly complete enough to
> >> display all Han text that will be encountered in practice.  U+2E90,
> >> U+2F00, U+3300 and U+3400 are absent from this font, and quite
> >> reasonably so, since they are freestanding radicals, Kana, which belong
> >> in the entry for kana rather than han, or obsolete.
> >
> > On what system did that happen?
> 
> Not "what system", "which font": WenQuanYi Micro Hei, one of the better
> free Han fonts.

If you remove U+2E90, U+2F00, U+3300 and U+3400 from the list and
rebuild Emacs, what happens if you insert U+2F75?  Does Emacs succeed
in finding another font which supports that codepoint, or does it
appear as tofu?  If the latter, what happens if you install some
additional font which does support U+2F75?

IOW, I'm interested to know what happens on GNU/Linux if more than one
font is available that together cover both the "usual" han characters
and those additional ones which you think we should remove from
script-representative-chars, but neither of these fonts supports all
of those characters.  Can Emacs solve this by itself on GNU/Linux, or
does it need "help" from the user's customization of the fontset?




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-05 16:25         ` Eli Zaretskii
@ 2024-08-05 23:58           ` Po Lu
  2024-08-06 11:35             ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Po Lu @ 2024-08-05 23:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If you remove U+2E90, U+2F00, U+3300 and U+3400 from the list and
> rebuild Emacs, what happens if you insert U+2F75?  Does Emacs succeed
> in finding another font which supports that codepoint, or does it
> appear as tofu?  If the latter, what happens if you install some
> additional font which does support U+2F75?

I'll ask, but my intuition is that no font will be discovered, since a
font must support all of any characters defined as lists in
script-representative-chars to be eligible.

> IOW, I'm interested to know what happens on GNU/Linux if more than one
> font is available that together cover both the "usual" han characters
> and those additional ones which you think we should remove from
> script-representative-chars, but neither of these fonts supports all
> of those characters.  Can Emacs solve this by itself on GNU/Linux, or
> does it need "help" from the user's customization of the fontset?

Probably the latter, unless `han' is divided into scripts for
characters, obsolete characters, radicals, and the like.
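
The situation asked about can be modelled in a few lines of Python.  This is only a sketch of the rule as stated in this message (a font must support every character listed for the script), not of Emacs's actual font-selection code; the font names and coverage sets are hypothetical:

```python
# Representative code points for 'han' after the commit (from the diff).
HAN_REPRESENTATIVE = {0x2E90, 0x2F00, 0x3200, 0x3300, 0x3400,
                      0x4E10, 0x5B57, 0xFE30, 0xF900}

# Hypothetical coverage tables; real fonts cover far more characters.
FONTS = {
    "common-han": {0x3200, 0x4E10, 0x5B57, 0xFE30, 0xF900},
    "rare-han": {0x2E90, 0x2F00, 0x2F75, 0x3300, 0x3400},
}


def eligible(coverage, representative=HAN_REPRESENTATIVE):
    """A font qualifies only if it covers every representative char."""
    return representative <= coverage


# Coverage of both fonts taken together.
together = set().union(*FONTS.values())

print([name for name, cov in FONTS.items() if eligible(cov)])
print(eligible(together))
```

Under this model neither font qualifies alone, although their combined coverage would, which is why splitting `han' into several scripts (or a user-supplied fontset) would be needed.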




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-05 23:58           ` Po Lu
@ 2024-08-06 11:35             ` Eli Zaretskii
  2024-08-07  0:17               ` Po Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2024-08-06 11:35 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Tue, 06 Aug 2024 07:58:54 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > If you remove U+2E90, U+2F00, U+3300 and U+3400 from the list and
> > rebuild Emacs, what happens if you insert U+2F75?  Does Emacs succeed
> > in finding another font which supports that codepoint, or does it
> > appear as tofu?  If the latter, what happens if you install some
> > additional font which does support U+2F75?
> 
> I'll ask, but my intuition is that no font will be discovered, since a
> font must support all of any characters defined as lists in
> script-representative-chars to be eligible.

Note that I said "if you remove those characters".

If you did note that, then does it mean when U+2F75 needs to be
installed and the current font for han doesn't support it, Emacs will
never try to look for _another_ font which supports han characters?
Or will it try, but always fail?

> > IOW, I'm interested to know what happens on GNU/Linux if more than one
> > font is available that together cover both the "usual" han characters
> > and those additional ones which you think we should remove from
> > script-representative-chars, but neither of these fonts supports all
> > of those characters.  Can Emacs solve this by itself on GNU/Linux, or
> > does it need "help" from the user's customization of the fontset?
> 
> Probably the latter, unless `han' is divided into scripts for
> characters, obsolete characters, radicals, and the like.

That is again quite disappointing, since I always thought font
backends based on Fontconfig can do a better job, because (AFAIR)
Fontconfig caches the font information and makes it available for
programs that search fonts covering specific characters.

What you describe happens on MS-Windows, but there we don't have a way
to test whether a font supports a character without actually loading
the font (the backend's 'has_char' method always fails).




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-06 11:35             ` Eli Zaretskii
@ 2024-08-07  0:17               ` Po Lu
  2024-08-07 11:47                 ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Po Lu @ 2024-08-07  0:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Note that I said "if you remove those characters".
>
> If you did note that, then does it mean when U+2F75 needs to be
> installed and the current font for han doesn't support it, Emacs will
> never try to look for _another_ font which supports han characters?
> Or will it try, but always fail?

How do you mean?  During Emacs's search for a suitable font, it is yet
to decide what is the "current font for han."

>> > IOW, I'm interested to know what happens on GNU/Linux if more than one
>> > font is available that together cover both the "usual" han characters
>> > and those additional ones which you think we should remove from
>> > script-representative-chars, but neither of these fonts supports all
>> > of those characters.  Can Emacs solve this by itself on GNU/Linux, or
>> > does it need "help" from the user's customization of the fontset?
>> 
>> Probably the latter, unless `han' is divided into scripts for
>> characters, obsolete characters, radicals, and the like.
>
> That is again quite disappointing, since I always thought font
> backends based on Fontconfig can do a better job, because (AFAIR)
> Fontconfig caches the font information and makes it available for
> programs that search fonts covering specific characters.

Fontconfig is capable of this, but not telepathy.  If Emacs submits
multiple requests for such and such a list of characters, ftfont cannot
telepathically deduce that in the one instance it should only consider
those characters which are in common usage, while in the other radicals
or obsolete characters.

> What you describe happens on MS-Windows, but there we don't have a way
> to test whether a font supports a character without actually loading
> the font (the backend's 'has_char' method always fails).





* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-07  0:17               ` Po Lu
@ 2024-08-07 11:47                 ` Eli Zaretskii
  2024-08-07 12:12                   ` Po Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2024-08-07 11:47 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Wed, 07 Aug 2024 08:17:08 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Note that I said "if you remove those characters".
> >
> > If you did note that, then does it mean when U+2F75 needs to be
> > installed and the current font for han doesn't support it, Emacs will
> > never try to look for _another_ font which supports han characters?
> > Or will it try, but always fail?
> 
> How do you mean?  During Emacs's search for a suitable font, it is yet
> to decide what is the "current font for han."

I mean the following scenario:

  . start Emacs
  . type some common han character, which will be displayed by a font
    that supports the common han characters
  . type some rare han character, such as U+2F75, not supported by the
    font chosen in the previous step

I'm asking whether Emacs will in step 3 search and find a font which
can display U+2F75, or will it show tofu because it already has a han
font, and that font doesn't support U+2F75?

> >> Probably the latter, unless `han' is divided into scripts for
> >> characters, obsolete characters, radicals, and the like.
> >
> > That is again quite disappointing, since I always thought font
> > backends based on Fontconfig can do a better job, because (AFAIR)
> > Fontconfig caches the font information and makes it available for
> > programs that search fonts covering specific characters.
> 
> Fontconfig is capable of this, but not telepathy.  If Emacs submits
> multiple requests for such and such a list of characters, ftfont cannot
> telepathically deduce that in the one instance it should only consider
> those characters which are in common usage, while in the other radicals
> or obsolete characters.

But when Emacs actually needs to display one of those rare characters,
will Emacs which uses Fontconfig then be able to find a suitable font,
if it is installed?




* Re: master 15afa72460b: Fix 'script-representative-chars' for the 'han' script
  2024-08-07 11:47                 ` Eli Zaretskii
@ 2024-08-07 12:12                   ` Po Lu
  0 siblings, 0 replies; 11+ messages in thread
From: Po Lu @ 2024-08-07 12:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Wed, 07 Aug 2024 08:17:08 +0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Note that I said "if you remove those characters".
>> >
>> > If you did note that, then does it mean when U+2F75 needs to be
>> > installed and the current font for han doesn't support it, Emacs will
>> > never try to look for _another_ font which supports han characters?
>> > Or will it try, but always fail?
>> 
>> How do you mean?  During Emacs's search for a suitable font, it is yet
>> to decide what is the "current font for han."
>
> I mean the following scenario:
>
>   . start Emacs
>   . type some common han character, which will be displayed by a font
>     that supports the common han characters
>   . type some rare han character, such as U+2F75, not supported by the
>     font chosen in the previous step
>
> I'm asking whether Emacs will in step 3 search and find a font which
> can display U+2F75, or will it show tofu because it already has a han
> font, and that font doesn't support U+2F75?

In principle, yes, but with the important exception that the font which
supports U+2F75 must also support all of the characters in the entry in
script-representative-chars for han.

> But when Emacs actually needs to display one of those rare characters,
> will Emacs which uses Fontconfig then be able to find a suitable font,
> if it is installed?

The answer is yes, at least subject to the above.
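
The three-step scenario, with the exception described above, can be sketched in Python as follows.  This is a model of the behaviour as described in this thread, not Emacs's implementation; the function name, font names and coverage sets are all hypothetical:

```python
# Representative code points for 'han' after the commit (from the diff).
HAN_REPRESENTATIVE = frozenset({0x2E90, 0x2F00, 0x3200, 0x3300, 0x3400,
                                0x4E10, 0x5B57, 0xFE30, 0xF900})

# Hypothetical fonts in priority order, each with a made-up coverage set.
FONTS = [
    ("common-han", {0x4E10, 0x5B57}),
    ("radicals-only", {0x2E90, 0x2F00, 0x2F75}),
    ("broad-han", HAN_REPRESENTATIVE | {0x2F75}),
]


def find_font_for(char, fonts, representative=HAN_REPRESENTATIVE):
    """Return the first font covering CHAR and every representative char."""
    for name, coverage in fonts:
        if char in coverage and representative <= coverage:
            return name
    return None  # no suitable font: the character would show as tofu


# Step 3 of the scenario: a rare character such as U+2F75.
print(find_font_for(0x2F75, FONTS))
# Same lookup when the only font covering U+2F75 lacks the
# representative characters.
print(find_font_for(0x2F75, FONTS[:2]))
```

In this model the lookup succeeds only through "broad-han"; "radicals-only" covers U+2F75 but is rejected because it lacks the other representative characters.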




end of thread, other threads:[~2024-08-07 12:12 UTC | newest]

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
