unofficial mirror of emacs-devel@gnu.org 
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
@ 2024-07-31 15:45 Eli Zaretskii
  2024-08-01  0:07 ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-07-31 15:45 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

I've reverted the above commit.  The change which added those
characters was not an accident: I found that Emacs would choose an
inappropriate (sub-optimal) font for Chinese characters because it
generally stops looking once it finds the first font that fulfills the
requirements.  The font Emacs sometimes selects due to those
characters missing lacked support for important Han blocks because
those blocks had no characters in script-representative-chars.
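
For readers following along: the docstring of
script-representative-chars says that when CHARS is a list, a font must
cover every character in it to qualify for the script, and when it is a
vector, any one of the characters suffices.  A minimal sketch of that
rule follows.  The helper name is hypothetical, and a plain list of code
points stands in for a font's coverage; this models the rule only and is
not how the font backends implement the check.

```elisp
(require 'cl-lib)

(defun sketch-font-qualifies-p (coverage chars)
  "Return non-nil if a font covering COVERAGE qualifies for CHARS.
COVERAGE is a list of code points the (hypothetical) font has glyphs
for.  CHARS follows the `script-representative-chars' convention: a
list means every character is required, a vector means any one is."
  (if (vectorp chars)
      (cl-some (lambda (c) (memql c coverage)) chars)
    (cl-every (lambda (c) (memql c coverage)) chars)))

;; A BMP-only font stops qualifying once a character beyond the BMP is
;; added to a list-valued entry, but still qualifies for a vector entry.
(let ((bmp-only '(#x4e10 #x5b57)))
  (list (sketch-font-qualifies-p bmp-only '(#x4e10 #x5b57))
        (sketch-font-qualifies-p bmp-only '(#x4e10 #x5b57 #x20000))
        (sketch-font-qualifies-p bmp-only [#x4e10 #x20000])))
```

Under this rule, each character added to a list-valued entry can only
narrow the set of fonts that match, which is the trade-off the rest of
the thread argues about.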

If this causes problems to Android, then please implement a fix that
is specific to Android, without affecting other platforms.

Thanks.

P.S. And once again, when you undo changes done by someone else just a
few days ago, please discuss this before making the change.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-07-31 15:45 master bf0aeaa0d7a: Re-enable displaying `han' characters on Android Eli Zaretskii
@ 2024-08-01  0:07 ` Po Lu
  2024-08-01  0:33   ` Po Lu
                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Po Lu @ 2024-08-01  0:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I've reverted the above commit.  The change which added those
> characters was not an accident: I found that Emacs would choose an
> inappropriate (sub-optimal) font for Chinese characters because it
> generally stops looking once it finds the first font that fulfills the
> requirements.

The reason behind your discovery is that with your choice of
`script-representative-chars', no font will ever match this font spec
(in the default fontset):

          ,(font-spec :registry "iso10646-1" :script 'han)

so that Emacs returns to the preceding ones, which specify a design
language rather than a script:

	  ,(font-spec :registry "iso10646-1" :lang 'ja)
	  ,(font-spec :registry "iso10646-1" :lang 'zh)

which is supported elsewhere than on Android.

> The font Emacs sometimes selects due to those characters missing
> lacked support for important Han blocks because those blocks had no
> characters in script-representative-chars.

I didn't revert your change in whole, only characters beyond the BMP
that seldom appear in real Chinese writing; of the characters that were
deleted:

  #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804

the first is "SQUARED CJK UNIFIED IDEOGRAPH-624B", which is a stylized
variant of its base character that is absent from Droid Sans Fallback.
The remainder, #x2a700, #x2b740, #x2b820, #x2ceb0, and #x2f804 are
esoteric characters that are provided by no CJK font on my GNU/Linux
system, or compatibility ideographs that were never designed to be
displayed.  Needless to say, neither are they provided by any of the CJK
fonts users will probably install on Android.

> If this causes problems to Android, then please implement a fix that
> is specific to Android, without affecting other platforms.

It does affect other platforms, but I'm only in the habit of installing
master regularly on Android.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  0:07 ` Po Lu
@ 2024-08-01  0:33   ` Po Lu
  2024-08-01  5:52     ` Eli Zaretskii
  2024-08-01  5:32   ` Eli Zaretskii
  2024-08-01  7:57   ` Andrea Corallo
  2 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-01  0:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> It does affect other platforms, but I'm only in the habit of installing
> master regularly on Android.

In the event, this was not completely accurate.  I've specialized some
generic code to Android that was enabled for all systems by accident,
rendering this matter moot, as `script-representative-chars' should not
have been consulted on other systems at the outset.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  0:07 ` Po Lu
  2024-08-01  0:33   ` Po Lu
@ 2024-08-01  5:32   ` Eli Zaretskii
  2024-08-01  8:16     ` Po Lu
  2024-08-01  7:57   ` Andrea Corallo
  2 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01  5:32 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 08:07:35 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I've reverted the above commit.  The change which added those
> > characters was not an accident: I found that Emacs would choose an
> > inappropriate (sub-optimal) font for Chinese characters because it
> > generally stops looking once it finds the first font that fulfills the
> > requirements.
> 
> The reason behind your discovery is that with your choice of
> `script-representative-chars', no font will ever match this font spec
> (in the default fontset):
> 
>           ,(font-spec :registry "iso10646-1" :script 'han)
> 
> so that Emacs returns to the preceding ones, which specify a design
> language rather than a script:
> 
> 	  ,(font-spec :registry "iso10646-1" :lang 'ja)
> 	  ,(font-spec :registry "iso10646-1" :lang 'zh)
> 
> which is supported elsewhere than on Android.

I don't understand what you are saying.  What is "my discovery"?  And
why no font will ever match the font spec for 'han'? I've definitely
seen fonts that support _all_ of the characters I added to
script-representative-chars, so why wouldn't they be found?

> > The font Emacs sometimes selects due to those characters missing
> > lacked support for important Han blocks because those blocks had no
> > characters in script-representative-chars.
> 
> I didn't revert your change in whole, only characters beyond the BMP
> that seldom appear in real Chinese writing; of the characters that were
> deleted:
> 
>   #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804

I added them because when those sub-optimal fonts are selected, some
of these characters appear as "tofu", which is ridiculous on a system
that has fonts installed that cover all of them.

And "seldom" is in the eyes of the beholder, at least IME.  When one
has text with these characters, the absolute frequency of their
appearance is not very relevant; what _is_ relevant is the fact that
the character cannot be shown by Emacs.

> the first is "SQUARED CJK UNIFIED IDEOGRAPH-624B", which is a stylized
> variant of its base character that is absent from Droid Sans Fallback.
> The remainder, #x2a700, #x2b740, #x2b820, #x2ceb0, and #x2f804 are
> esoteric characters that are provided by no CJK font on my GNU/Linux
> system, or compatibility ideographs that were never designed to be
> displayed.  Needless to say, neither are they provided by any of the CJK
> fonts users will probably install on Android.

If there's no fonts installed that support those representative
characters, and Emacs is capable of finding less capable fonts that
support some of CJK (e.g., the BMP blocks), then why is that a
problem?  The purpose of the change is to allow Emacs to find better
fonts if they are installed, instead of ignoring them.  How is that a
Bad Thing?

I still don't understand why this breaks Android, btw.  If Emacs
employs the fallback font specs with :lang you show above, why don't
they work for Android?

> > If this causes problems to Android, then please implement a fix that
> > is specific to Android, without affecting other platforms.
> 
> It does affect other platforms, but I'm only in the habit of installing
> master regularly on Android.

The log message talked only about Android.  If users on GNU/Linux
report problems caused by this change, with enough details to
understand the problems, we can reconsider the change and modify it or
even revert.  But I do need the details on those other platforms to
think and discuss this intelligently.  This is all about details, it
isn't an abstract or academic issue.  Choosing which characters to
consider representative is frequently a judgment call based on
practical considerations and practical problems with existing fonts.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  0:33   ` Po Lu
@ 2024-08-01  5:52     ` Eli Zaretskii
  2024-08-01  7:55       ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01  5:52 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 08:33:46 +0800
> 
> Po Lu <luangruo@yahoo.com> writes:
> 
> > It does affect other platforms, but I'm only in the habit of installing
> > master regularly on Android.
> 
> In the event, this was not completely accurate.  I've specialized some
> generic code to Android that was enabled for all systems by accident,
> rendering this matter moot, as `script-representative-chars' should not
> have been consulted on other systems at the outset.

I don't understand the changes you installed.  The comments and the
log message don't tell enough, and you have again installed the
changes before discussing them, although I explicitly asked you not to
do that.

I see the changes in fontset setup related to 'han', which make them
specific to Android.  But the representative characters _are_ used on
other systems, at least on MS-Windows (and AFAIU on other systems as
well: see ftfont.c and font.c).  So why you again removed the SMP
characters from the list is not clear to me; I think it's a mistake
and tend to revert that part.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  5:52     ` Eli Zaretskii
@ 2024-08-01  7:55       ` Po Lu
  2024-08-01  8:52         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-01  7:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 01 Aug 2024 08:33:46 +0800
>> 
>> Po Lu <luangruo@yahoo.com> writes:
>> 
>> > It does affect other platforms, but I'm only in the habit of installing
>> > master regularly on Android.
>> 
>> In the event, this was not completely accurate.  I've specialized some
>> generic code to Android that was enabled for all systems by accident,
>> rendering this matter moot, as `script-representative-chars' should not
>> have been consulted on other systems at the outset.
>
> I don't understand the changes you installed.  The comments and the
> log message don't tell enough, and you have again installed the
> changes before discussing them, although I explicitly asked you not to
> do that.
>
> I see the changes in fontset setup related to 'han', which make them
> specific to Android.  But the representative characters _are_ used on
> other systems, at least on MS-Windows (and AFAIU on other systems as
> well: see ftfont.c and font.c).  So why you again removed the SMP
> characters from the list is not clear to me; I think it's a mistake
> and tend to revert that part.

No font-spec in the default fontset now specifies the `han' script on
these systems, as in Emacs 29, so that `script-representative-chars' is
no longer consulted in connection with it.  What is specified is QClang,
which is tested against font metadata (e.g., the design language) rather
than character repertories.
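
The distinction drawn here is visible in the specs themselves.  A small
sketch, assuming an Emacs built with window-system support so that
font-spec is defined; the function name is made up for illustration:

```elisp
(defun sketch-han-specs ()
  "Build one spec keyed on script and one keyed on language.
Return the list (SCRIPT LANG) read back from the two specs, or nil
when `font-spec' is unavailable (e.g. a build without GUI support)."
  (when (fboundp 'font-spec)
    (let ((by-script (font-spec :registry "iso10646-1" :script 'han))
          (by-lang   (font-spec :registry "iso10646-1" :lang 'zh)))
      ;; Per the message above, the :script spec is resolved through
      ;; `script-representative-chars' (what characters a font covers),
      ;; while the :lang spec is tested against font metadata.
      (list (font-get by-script :script)
            (font-get by-lang :lang)))))
```

Only the first kind of spec is sensitive to the contents of
script-representative-chars.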




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  0:07 ` Po Lu
  2024-08-01  0:33   ` Po Lu
  2024-08-01  5:32   ` Eli Zaretskii
@ 2024-08-01  7:57   ` Andrea Corallo
  2 siblings, 0 replies; 36+ messages in thread
From: Andrea Corallo @ 2024-08-01  7:57 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> I've reverted the above commit.  The change which added those
>> characters was not an accident: I found that Emacs would choose an
>> inappropriate (sub-optimal) font for Chinese characters because it
>> generally stops looking once it finds the first font that fulfills the
>> requirements.
>
> The reason behind your discovery is that with your choice of
> `script-representative-chars', no font will ever match this font spec
> (in the default fontset):
>
>           ,(font-spec :registry "iso10646-1" :script 'han)
>
> so that Emacs returns to the preceding ones, which specify a design
> language rather than a script:
>
> 	  ,(font-spec :registry "iso10646-1" :lang 'ja)
> 	  ,(font-spec :registry "iso10646-1" :lang 'zh)
>
> which is supported elsewhere than on Android.
>
>> The font Emacs sometimes selects due to those characters missing
>> lacked support for important Han blocks because those blocks had no
>> characters in script-representative-chars.
>
> I didn't revert your change in whole, only characters beyond the BMP
> that seldom appear in real Chinese writing; of the characters that were
> deleted:

Still, even if it is not a complete revert, if you are undoing a change
by someone else, even partially, please first discuss on the list why
you want to do it, especially if it's a recent change.

As this discussion proves, the consequences of this change are not
trivial.

Thanks

  Andrea




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  5:32   ` Eli Zaretskii
@ 2024-08-01  8:16     ` Po Lu
  2024-08-01  9:49       ` Eli Zaretskii
  2024-08-02 10:44       ` Benjamin Riefenstahl
  0 siblings, 2 replies; 36+ messages in thread
From: Po Lu @ 2024-08-01  8:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I don't understand what you are saying.  What is "my discovery"?  And
> why no font will ever match the font spec for 'han'? I've definitely
> seen fonts that support _all_ of the characters I added to
> script-representative-chars, so why wouldn't they be found?

I wasn't implying that no font will ever support one or more of these
characters, but that such fonts are sufficiently obscure that the
chances of a CJK font's being located by this value of
`script-representative-chars' are nil.

>> > The font Emacs sometimes selects due to those characters missing
>> > lacked support for important Han blocks because those blocks had no
>> > characters in script-representative-chars.
>> 
>> I didn't revert your change in whole, only characters beyond the BMP
>> that seldom appear in real Chinese writing; of the characters that were
>> deleted:
>> 
>>   #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804
>
> I added them because when those sub-optimal fonts are selected, some
> of these characters appear as "tofu", which is ridiculous on a system
> that has fonts installed that cover all of them.

These characters may be han script, but they are not attested in CJK
documents in practice, except in contrived scenarios such as conversions
between incomplete character encodings.

> And "seldom" is in the eyes of the beholder, at least IME.  When one
> has text with these characters, the absolute frequency of their
> appearance is not very relevant; what _is_ relevant is the fact that
> the character cannot be shown by Emacs.

In practice, the outcome of this principle is that no font is detected
with which to display CJK documents featuring none of these characters,
very much against the expectations of CJK users.

> If there's no fonts installed that support those representative
> characters, and Emacs is capable of finding less capable fonts that
> support some of CJK (e.g., the BMP blocks), then why is that a
> problem?

I thought I explained that Emacs is _not_ capable of doing so on
Android.

> The purpose of the change is to allow Emacs to find better fonts if
> they are installed, instead of ignoring them.  How is that a Bad
> Thing?

Because it renders the `han' script incapable of matching any fonts that
are installed in practice.

> I still don't understand why this breaks Android, btw.  If Emacs
> employs the fallback font specs with :lang you show above, why don't
> they work for Android?

The problem is that QClang is not available on Android, because fonts do
not provide their design languages in one of the standard TrueType
tables its font backend groks, which deficiency prompted the addition of
the font spec in question in:

2023-02-16  Po Lu  <luangruo@yahoo.com>

	* doc/emacs/android.texi (Android Fonts):
	* doc/emacs/input.texi (On-Screen Keyboards):
	* doc/lispref/commands.texi (Misc Events): Update documentation.

	* java/org/gnu/emacs/EmacsInputConnection.java (setSelection): New
	function.
	* java/org/gnu/emacs/EmacsSurfaceView.java
	(reconfigureFrontBuffer): Make bitmap references weak references.

	* java/org/gnu/emacs/EmacsView.java (handleDirtyBitmap): Don't
	clear surfaceView bitmap.

	* lisp/comint.el (comint-mode): Set text-conversion-style to
	`action' so on screen keyboards' Return buttons send an actual key
	press event.

	* lisp/international/fontset.el (script-representative-chars)
	(setup-default-fontset): Improve detection of CJK fonts.

> The log message talked only about Android.  If users on GNU/Linux
> report problems caused by this change, with enough details to
> understand the problems, we can reconsider the change and modify it or
> even revert.  But I do need the details on those other platforms to
> think and discuss this intelligently.  This is all about details, it
> isn't an abstract or academic issue.  Choosing which characters to
> consider representative is frequently a judgment call based on
> practical considerations and practical problems with existing fonts.

The fact of the matter is that:

(let ((script-representative-chars
       '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900
	      #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
  (clear-font-cache)
  (find-font (font-spec :registry "iso10646-1" :script 'han
                        :type 'xfthb))) ;; or another ftfont backend.

returns no font on an up-to-date Fedora Workstation installation with a
wealth of multilingual fonts for CJK scripts, whereas:

(let ((script-representative-chars
       '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
  (clear-font-cache)
  (find-font (font-spec :registry "iso10646-1" :script 'han
                        :type 'xfthb)))

returns:

#<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>

which is more than adequate for editing CJK text in my language and
others.
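
To narrow down which of the added characters prevents a match on a given
system, the two tests above can be turned into a per-character probe.
This is only a sketch: the sketch-* names are hypothetical, the :type
restriction is dropped so that whatever backend the frame uses is
consulted, and the outcome depends entirely on the installed fonts and
the font backend.

```elisp
;; Declared so the `let' below binds it dynamically; defined in fontset.el.
(defvar script-representative-chars)

(defconst sketch-han-bmp-chars
  '(#x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
    #x31c0 #x4e10 #x5B57 #xfe30 #xf900)
  "The BMP characters used in the second test above.")

(defconst sketch-han-extra-chars
  '(#x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804)
  "The characters beyond the BMP that are under discussion.")

(defun sketch-han-match-per-char ()
  "Return an alist of (CHAR . FONT-ENTITY-OR-NIL), one per extra char.
Each probe requires the BMP characters plus a single extra character."
  (mapcar
   (lambda (extra)
     (let ((script-representative-chars
            `((han ,@sketch-han-bmp-chars ,extra))))
       (clear-font-cache)
       (cons extra
             (find-font (font-spec :registry "iso10646-1"
                                   :script 'han)))))
   sketch-han-extra-chars))
```

Evaluating (sketch-han-match-per-char) in a graphical session gives one
entry per character; a nil cdr means no font matched once that
character was required.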




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  7:55       ` Po Lu
@ 2024-08-01  8:52         ` Eli Zaretskii
  2024-08-01  9:47           ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01  8:52 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 15:55:43 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> In the event, this was not completely accurate.  I've specialized some
> >> generic code to Android that was enabled for all systems by accident,
> >> rendering this matter moot, as `script-representative-chars' should not
> >> have been consulted on other systems at the outset.
> >
> > I don't understand the changes you installed.  The comments and the
> > log message don't tell enough, and you have again installed the
> > changes before discussing them, although I explicitly asked you not to
> > do that.
> >
> > I see the changes in fontset setup related to 'han', which make them
> > specific to Android.  But the representative characters _are_ used on
> > other systems, at least on MS-Windows (and AFAIU on other systems as
> > well: see ftfont.c and font.c).  So why you again removed the SMP
> > characters from the list is not clear to me; I think it's a mistake
> > and tend to revert that part.
> 
> No font-spec in the default fontset now specifies the `han' script on
> these systems, as in Emacs 29, so that `script-representative-chars' is
> no longer consulted in connection with it.  What is specified is QClang,
> which is tested against font metadata (e.g., the design language) rather
> than character repertories.

But users can add a font spec for 'han' to the fontset, cannot they?
And if they do, then the representative characters _are_ important,
aren't they?  So I don't think we should remove those characters.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  8:52         ` Eli Zaretskii
@ 2024-08-01  9:47           ` Po Lu
  2024-08-01  9:56             ` Eli Zaretskii
  2024-08-01 21:17             ` Dmitry Gutov
  0 siblings, 2 replies; 36+ messages in thread
From: Po Lu @ 2024-08-01  9:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> But users can add a font spec for 'han' to the fontset, cannot they?
> And if they do, then the representative characters _are_ important,
> aren't they?  So I don't think we should remove those characters.

Such an action would be pointless, as the fontset would not match any
CJK font actually in existence, and it would break the Android build to
boot.  If anyone seriously considers non-existent characters important
enough to construct a font spec that matches them, he can easily amend
script-representative-chars for himself or define another script.  If
these pages are opened, for example:

  https://www.compart.com/en/unicode/U+20000
  https://www.compart.com/en/unicode/U+2a700
  https://www.compart.com/en/unicode/U+2b740
  https://www.compart.com/en/unicode/U+2b820
  https://www.compart.com/en/unicode/U+2ceb0

in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
tofu is displayed, and there can hardly be said to exist an OS that is
better internationalized out of the box than is Android.  The
remaining characters:

  https://www.compart.com/en/unicode/U+2f804
  https://www.compart.com/en/unicode/U+1f210

are displayed correctly, but are barely attested or expected to be
present by CJK users in practice, and U+1F210 is arguably rather a
symbol than a proper character.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  8:16     ` Po Lu
@ 2024-08-01  9:49       ` Eli Zaretskii
  2024-08-01 10:30         ` Po Lu
  2024-08-02 10:44       ` Benjamin Riefenstahl
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01  9:49 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 16:16:58 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I don't understand what you are saying.  What is "my discovery"?  And
> > why no font will ever match the font spec for 'han'? I've definitely
> > seen fonts that support _all_ of the characters I added to
> > script-representative-chars, so why wouldn't they be found?
> 
> I wasn't implying that no font will ever support one or more of these
> characters, but that such fonts are sufficiently obscure that the
> chances of a CJK font's being located by this value of
> `script-representative-chars' are nil.

I very much doubt that.  I see a few fonts on my Windows 11 system
which support all of those.  I'd be very surprised to know that no
such fonts are available on a modern GNU/Linux system.

Can someone please check that?
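
One way to run such a check from a graphical Emacs session is to ask,
for each of the seven characters, whether Emacs finds a font for it.
The sketch below uses char-displayable-p; the function name
sketch-han-extra-coverage is made up for this illustration.  Note that
the result reflects what Emacs selects under the fontset in effect, not
every installed font that might cover the character, and that the
docstring of char-displayable-p itself warns the answer may not be
accurate in all cases.

```elisp
(defun sketch-han-extra-coverage ()
  "Return a list of (CHAR . RESULT) for the seven characters in question.
RESULT is whatever `char-displayable-p' returns: nil when Emacs finds
no way to display CHAR, non-nil otherwise."
  (mapcar (lambda (ch) (cons ch (char-displayable-p ch)))
          '(#x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804)))
```

Entries with a nil cdr are the characters that would show up as tofu on
that system.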

> >>   #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804
> >
> > I added them because when those sub-optimal fonts are selected, some
> > of these characters appear as "tofu", which is ridiculous on a system
> > that has fonts installed that cover all of them.
> 
> These characters may be han script, but they are not attested in CJK
> documents in practice, except in contrived scenarios such as conversions
> between incomplete character encodings.

You keep saying that, but those are just your assertions.  These
characters are there for a reason, and they should be supported as
well as we can.  If worse comes to worst, we could split 'han' into
two or more scripts, and have separate setup in our fontsets for them.
Then each one could use its own representative characters.

> > And "seldom" is in the eyes of the beholder, at least IME.  When one
> > has text with these characters, the absolute frequency of their
> > appearance is not very relevant; what _is_ relevant is the fact that
> > the character cannot be shown by Emacs.
> 
> In practice, the outcome of this principle is that no font is detected
> with which to display CJK documents featuring none of these characters,
> very much against the expectations of CJK users.

Not IME.

> > If there's no fonts installed that support those representative
> > characters, and Emacs is capable of finding less capable fonts that
> > support some of CJK (e.g., the BMP blocks), then why is that a
> > problem?
> 
> I thought I explained that Emacs is _not_ capable of doing so on
> Android.

Then please design and implement a suitable solution for Android.  It
is not right to punish other platforms for Android-specific issues.

> > The purpose of the change is to allow Emacs to find better fonts if
> > they are installed, instead of ignoring them.  How is that a Bad
> > Thing?
> 
> Because it renders the `han' script incapable of matching any fonts that
> are installed in practice.

Again, not IME.

> > I still don't understand why this breaks Android, btw.  If Emacs
> > employs the fallback font specs with :lang you show above, why don't
> > they work for Android?
> 
> The problem is that QClang is not available on Android, because fonts do
> not provide their design languages in one of the standard TrueType
> tables its font backend groks, which deficiency prompted the addition of
> the font spec in question in:

OK, then it means we need to work around this, but without hampering
other platforms.

> The fact of the matter is that:
> 
> (let ((script-representative-chars
>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900
> 	      #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>   (clear-font-cache)
>   (find-font (font-spec :registry "iso10646-1" :script 'han
>                         :type 'xfthb))) ;; or another ftfont backend.
> 
> returns no font on an up-to-date Fedora Workstation installation with a
> wealth of multilingual fonts for CJK scripts, whereas:
> 
> (let ((script-representative-chars
>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>   (clear-font-cache)
>   (find-font (font-spec :registry "iso10646-1" :script 'han
>                         :type 'xfthb)))
> 
> returns:
> 
> #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>
> 
> which is more than adequate for editing CJK text in my language and
> others.

Not on MS-Windows: here, both of the above return

  #<font-entity harfbuzz outline Malgun\ Gothic sans iso10646-1 bold normal normal 0 nil 0 nil>

which is a lie in the latter case, since those additional characters
are not supported by this font.  Given that this method is evidently
unreliable, I don't think we should consider this a proof of your
argument.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  9:47           ` Po Lu
@ 2024-08-01  9:56             ` Eli Zaretskii
  2024-08-01 10:13               ` Po Lu
  2024-08-01 21:17             ` Dmitry Gutov
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01  9:56 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 17:47:54 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > But users can add a font spec for 'han' to the fontset, cannot they?
> > And if they do, then the representative characters _are_ important,
> > aren't they?  So I don't think we should remove those characters.
> 
> Such an action would be pointless, as the fontset would not match any
> CJK font actually in existence, and it would break the Android build to
> boot.

Then Android users should not do that.  But users of other systems
could, and we should not prevent them from doing so.

> If anyone seriously considers non-existent characters important
> enough to construct a font spec that matches them, he can easily amend
> script-representative-chars for himself or define another script.  If
> these pages are opened, for example:
> 
>   https://www.compart.com/en/unicode/U+20000
>   https://www.compart.com/en/unicode/U+2a700
>   https://www.compart.com/en/unicode/U+2b740
>   https://www.compart.com/en/unicode/U+2b820
>   https://www.compart.com/en/unicode/U+2ceb0
> 
> in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
> tofu is displayed, and there can hardly be said to exist an OS that is
> better internationalized out of the box than is Android.  The
> remaining characters:
> 
>   https://www.compart.com/en/unicode/U+2f804
>   https://www.compart.com/en/unicode/U+1f210
> 
> are displayed correctly, but are barely attested or expected to be
> present by CJK users in practice, and U+1F210 is arguably rather a
> symbol than a proper character.

On my Windows 11 system, I see all of them, and I didn't install any
additional fonts for CJK.  So your assertion is simply not true.
Maybe your GNU/Linux system is outdated, or maybe GNU/Linux users with
CJK interests are supposed to install optional packages that you don't
have installed.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  9:56             ` Eli Zaretskii
@ 2024-08-01 10:13               ` Po Lu
  2024-08-01 10:19                 ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-01 10:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 01 Aug 2024 17:47:54 +0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > But users can add a font spec for 'han' to the fontset, cannot they?
>> > And if they do, then the representative characters _are_ important,
>> > aren't they?  So I don't think we should remove those characters.
>> 
>> Such an action would be pointless, as the fontset would not match any
>> CJK font actually in existence, and it would break the Android build to
>> boot.
>
> Then Android users should not do that.  But users of other systems
> could, and we should not prevent them from doing so.

We don't prevent users from modifying script-representative-chars
anywhere, no?
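
As a concrete illustration of such a user-level change, here is a
minimal sketch.  The helper name is hypothetical, it assumes the entry
for the script is a list rather than a vector, and whether a running
session picks up the new value without clearing the font cache (as the
tests earlier in the thread do) is not something this sketch
establishes.

```elisp
(require 'cl-lib)

(defun sketch-add-representative-chars (alist script extra)
  "Return a copy of ALIST with EXTRA chars appended to SCRIPT's entry.
ALIST has the shape of `script-representative-chars' with list-valued
entries; the original ALIST is left unmodified."
  (let* ((copy (copy-alist alist))
         (old (alist-get script copy)))
    (setf (alist-get script copy)
          (cl-remove-duplicates (append old extra) :from-end t))
    copy))

;; Hypothetical use in an init file, opting in to stricter matching:
;; (setq script-representative-chars
;;       (sketch-add-representative-chars
;;        script-representative-chars 'han '(#x20000 #x2a700)))
```

Working on a copy keeps the default value intact, so the change is easy
to back out.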

>> If anyone seriously considers non-existent characters important
>> enough to construct a font spec that matches them, he can easily amend
>> script-representative-chars for himself or define another script.  If
>> these pages are opened, for example:
>> 
>>   https://www.compart.com/en/unicode/U+20000
>>   https://www.compart.com/en/unicode/U+2a700
>>   https://www.compart.com/en/unicode/U+2b740
>>   https://www.compart.com/en/unicode/U+2b820
>>   https://www.compart.com/en/unicode/U+2ceb0
>> 
>> in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
>> tofu is displayed, and there can hardly be said to exist an OS system
>> that is better internationalized out of the box than is Android.  The
>> remaining characters:
>> 
>>   https://www.compart.com/en/unicode/U+2f804
>>   https://www.compart.com/en/unicode/U+1f210
>> 
>> are displayed correctly, but are barely attested or expected to be
>> present by CJK users in practice, and U+1F210 is arguably rather a
>> symbol than a proper character.
>
> On my Windows 11 system, I see all of them, and I didn't install any
> additional fonts for CJK.  So your assertion is simply not true.
> Maybe your GNU/Linux system is outdated, or maybe GNU/Linux users with
> CJK interests are supposed to install optional packages that you don't
> have installed.

This is an installation of Fedora Workstation 40 that was updated
yesterday, and where all of the Noto font packages that are required for
displaying CJK text are undoubtedly installed.  The reason these
characters are omitted from the suggested set of CJK fonts is that there
is simply insufficient interest in these characters, and probably the
same reasoning holds on Android, where users cannot install fonts at
all, and where the entirety of these pages, save about a dozen, is tofu:

  https://commons.wikimedia.org/wiki/Category:Unicode_20000-2A6DF_CJK_Unified_Ideographs_Extension_B
  https://commons.wikimedia.org/wiki/Category:Unicode_2A700-2B73F_CJK_Unified_Ideographs_Extension_C
  https://commons.wikimedia.org/wiki/Category:Unicode_2B740-2B81F_CJK_Unified_Ideographs_Extension_D
  https://commons.wikimedia.org/wiki/Category:Unicode_2B820-2CEAF_CJK_Unified_Ideographs_Extension_E
  https://commons.wikimedia.org/wiki/Category:Unicode_2CEB0-2EBEF_CJK_Unified_Ideographs_Extension_F

Noto are apparently quite reluctant to support Extension B:

  https://github.com/notofonts/noto-cjk/issues/13

and they are the go-to source of Free CJK fonts nowadays.
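
For reference, the code points discussed in this thread can be mapped to the
extension blocks named in the Wikimedia Commons URLs above.  The following is
only a sketch: the function name `cjk_extension_block` is made up for this
illustration and is not part of Emacs; the ranges are copied from those URLs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the name of the CJK Unified Ideographs extension block that
   contains CP, or NULL if CP lies outside Extensions B through F.
   The ranges are the ones given in the category URLs above.  */
const char *
cjk_extension_block (uint32_t cp)
{
  assert (cp <= 0x10FFFF);	/* Must be a valid Unicode code point.  */

  if (cp >= 0x20000 && cp <= 0x2A6DF)
    return "Extension B";
  if (cp >= 0x2A700 && cp <= 0x2B73F)
    return "Extension C";
  if (cp >= 0x2B740 && cp <= 0x2B81F)
    return "Extension D";
  if (cp >= 0x2B820 && cp <= 0x2CEAF)
    return "Extension E";
  if (cp >= 0x2CEB0 && cp <= 0x2EBEF)
    return "Extension F";
  return NULL;
}
```

With this, U+20000, U+2A700, U+2B740, U+2B820 and U+2CEB0 are each the first
code point of Extensions B, C, D, E and F respectively, whereas U+4E10 and
U+5B57 fall outside all five blocks.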




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01 10:13               ` Po Lu
@ 2024-08-01 10:19                 ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01 10:19 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 18:13:03 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Po Lu <luangruo@yahoo.com>
> >> Cc: emacs-devel@gnu.org
> >> Date: Thu, 01 Aug 2024 17:47:54 +0800
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > But users can add a font spec for 'han' to the fontset, cannot they?
> >> > And if they do, then the representative characters _are_ important,
> >> > aren't they?  So I don't think we should remove those characters.
> >> 
> >> Such an action would be pointless, as the fontset would not match any
> >> CJK font actually in existence, and it would break the Android build to
> >> boot.
> >
> > Then Android users should not do that.  But users of other systems
> > could, and we should not prevent them from doing so.
> 
> We don't prevent users from modifying script-representative-chars
> anywhere, no?

We don't prevent users from doing silly things, no.  But they
shouldn't.

> > On my Windows 11 system, I see all of them, and I didn't install any
> > additional fonts for CJK.  So your assertion is simply not true.
> > Maybe your GNU/Linux system is outdated, or maybe GNU/Linux users with
> > CJK interests are supposed to install optional packages that you don't
> > have installed.
> 
> This is an installation of Fedora Workstation 40 that was updated
> yesterday, and where all of the Noto font packages that are required for
> displaying CJK text are undoubtedly installed.  The reason these
> characters are omitted from the suggested set of CJK fonts is that there
> is simply insufficient interest in these characters, and probably the
> same reasoning holds on Android, where users cannot install fonts at
> all, and where the entirety of these pages, save about a dozen, is tofu:
> 
>   https://commons.wikimedia.org/wiki/Category:Unicode_20000-2A6DF_CJK_Unified_Ideographs_Extension_B
>   https://commons.wikimedia.org/wiki/Category:Unicode_2A700-2B73F_CJK_Unified_Ideographs_Extension_C
>   https://commons.wikimedia.org/wiki/Category:Unicode_2B740-2B81F_CJK_Unified_Ideographs_Extension_D
>   https://commons.wikimedia.org/wiki/Category:Unicode_2B820-2CEAF_CJK_Unified_Ideographs_Extension_E
>   https://commons.wikimedia.org/wiki/Category:Unicode_2CEB0-2EBEF_CJK_Unified_Ideographs_Extension_F
> 
> Noto are apparently quite reluctant to support Extension B:
> 
>   https://github.com/notofonts/noto-cjk/issues/13
> 
> and they are the go-to source of Free CJK fonts nowadays.

I'm unconvinced, sorry.  I'll wait a bit for others to chime in,
before I decide what to do with this, but currently I tend to revert
your change to script-representative-chars.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  9:49       ` Eli Zaretskii
@ 2024-08-01 10:30         ` Po Lu
  2024-08-01 10:35           ` Eli Zaretskii
  2024-08-02 10:52           ` Benjamin Riefenstahl
  0 siblings, 2 replies; 36+ messages in thread
From: Po Lu @ 2024-08-01 10:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 01 Aug 2024 16:16:58 +0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > I don't understand what you are saying.  What is "my discovery"?  And
>> > why no font will ever match the font spec for 'han'? I've definitely
>> > seen fonts that support _all_ of the characters I added to
>> > script-representative-chars, so why wouldn't they be found?
>> 
>> I wasn't implying that no font will ever support one or more of these
>> characters, but that such fonts are sufficiently obscure that the
>> chances of a CJK font's being located by this value of
>> `script-representative-chars' is nil.
>
> I very much doubt that.  I see a few fonts on my Windows 11 system
> which support all of those.  I'd be very surprised to know that no
> such fonts are available on a modern GNU/Linux system.
>
> Can someone please check that?

I think I just did, but anyone is more than welcome to put the links I
posted into Firefox.

>> These characters may be han script, but they are not attested in CJK
>> documents in practice, except in contrived scenarios such as conversions
>> between incomplete character encodings.
>
> You keep saying that, but those are just your assertions.  These
> characters are there for a reason, and they should be supported as
> well as we can.  If worse comes to worst, we could split 'han' into
> two or more scripts, and have separate setup in our fontsets for them.
> Then each one could use its own representative characters.

I keep saying this because I read and write CJK characters in Emacs and
elsewhere, every day, without ever encountering the characters in
question, since, had I done so, the tofu would not have escaped my
notice.

> Not IME.

With respect, I think I write within compass when I say that no one has
expressed enough interest in these characters to persuade Noto
developers of their importance despite having nearly a decade:

  https://github.com/notofonts/noto-cjk/issues/13

and so these characters are likely to remain unavailable from a single
font on GNU/Linux systems for the foreseeable future.

>> > If there's no fonts installed that support those representative
>> > characters, and Emacs is capable of finding less capable fonts that
>> > support some of CJK (e.g., the BMP blocks), then why is that a
>> > problem?
>> 
>> I thought I explained that Emacs is _not_ capable of doing so on
>> Android.
>
> Then please design and implement a suitable solution for Android.  It
> is not right to punish other platforms for Android-specific issues.

No one is being punished or put at a disadvantage in any wise.
Previously, script-representative-chars was downright ignored, an
arrangement that existed till the Android port was installed and which
has been restored, and no more.

>> > The purpose of the change is to allow Emacs to find better fonts if
>> > they are installed, instead of ignoring them.  How is that a Bad
>> > Thing?
>> 
>> Because it renders the `han' script incapable of matching any fonts that
>> are installed in practice.
>
> Again, not IME.

Any GNU/Linux user is perfectly welcome to run:

  (dolist (char '(#x20000 #x2a700 #x2b740 #x2b820 #x2ceb0))
    (insert (format "%c" char)))

and reach the same conclusion I have.
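
The same check can be made outside Emacs by writing the characters to a
terminal as UTF-8.  The sketch below is an illustration only; `utf8_encode` is
a hypothetical helper, and it does not reject surrogates since it is only
meant for the five code points above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode CP as UTF-8 into OUT, which must hold at least 5 bytes, and
   NUL-terminate the result.  Return the number of bytes written, not
   counting the NUL.  */
size_t
utf8_encode (uint32_t cp, char out[5])
{
  size_t n;

  assert (cp <= 0x10FFFF);
  if (cp < 0x80)
    {
      out[0] = (char) cp;
      n = 1;
    }
  else if (cp < 0x800)
    {
      out[0] = (char) (0xC0 | (cp >> 6));
      out[1] = (char) (0x80 | (cp & 0x3F));
      n = 2;
    }
  else if (cp < 0x10000)
    {
      out[0] = (char) (0xE0 | (cp >> 12));
      out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (char) (0x80 | (cp & 0x3F));
      n = 3;
    }
  else
    {
      /* All five characters in question take this four-byte path.  */
      out[0] = (char) (0xF0 | (cp >> 18));
      out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (char) (0x80 | (cp & 0x3F));
      n = 4;
    }
  out[n] = '\0';
  return n;
}
```

Calling `utf8_encode` on each of 0x20000, 0x2a700, 0x2b740, 0x2b820 and
0x2ceb0 and passing the result to `fputs` prints the characters; whether
glyphs or tofu appear then depends on the fonts available to the terminal.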

>> > I still don't understand why this breaks Android, btw.  If Emacs
>> > employs the fallback font specs with :lang you show above, why don't
>> > they work for Android?
>> 
>> The problem is that QClang is not available on Android, because fonts do
>> not provide their design languages in one of the standard TrueType
>> tables its font backend groks, which deficiency prompted the addition of
>> the font spec in question in:
>
> OK, then it means we need to work around this, but without hampering
> other platforms.

Yes, that's the essence of the change installed in emacs-30.

>> The fact of the matter is that:
>> 
>> (let ((script-representative-chars
>>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900
>> 	      #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>>   (clear-font-cache)
>>   (find-font (font-spec :registry "iso10646-1" :script 'han
>>                         :type 'xfthb))) ;; or another ftfont backend.
>> 
>> returns no font on an up-to-date Fedora Workstation installation with a
>> wealth of multilingual fonts for CJK scripts, whereas:
>> 
>> (let ((script-representative-chars
>>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>>   (clear-font-cache)
>>   (find-font (font-spec :registry "iso10646-1" :script 'han
>>                         :type 'xfthb)))
>> 
>> returns:
>> 
>> #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>
>> 
>> which is more than adequate for editing CJK text in my language and
>> others.
>
> Not on MS-Windows: here, both of the above return
>
>   #<font-entity harfbuzz outline Malgun\ Gothic sans iso10646-1 bold normal normal 0 nil 0 nil>
>
> which is a lie in the latter case, since those additional characters
> are not supported by this font.  Given that this method is evidently
> unreliable, I don't think we should consider this a proof of your
> argument.

That's queer, and I suspect something is amiss in clear-font-cache.
Nevertheless, the results from Mozilla are essentially conclusive, as
far as I am concerned.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01 10:30         ` Po Lu
@ 2024-08-01 10:35           ` Eli Zaretskii
  2024-08-02 10:52           ` Benjamin Riefenstahl
  1 sibling, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-01 10:35 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 18:30:31 +0800
> 
> Nevertheless, the results from Mozilla are essentially conclusive, as
> far as I am concerned.

Which in my case shows all of the characters.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  9:47           ` Po Lu
  2024-08-01  9:56             ` Eli Zaretskii
@ 2024-08-01 21:17             ` Dmitry Gutov
  1 sibling, 0 replies; 36+ messages in thread
From: Dmitry Gutov @ 2024-08-01 21:17 UTC (permalink / raw)
  To: Po Lu, Eli Zaretskii; +Cc: emacs-devel

On 01/08/2024 12:47, Po Lu wrote:
> If
> these pages are opened, for example:
> 
>    https://www.compart.com/en/unicode/U+20000
>    https://www.compart.com/en/unicode/U+2a700
>    https://www.compart.com/en/unicode/U+2b740
>    https://www.compart.com/en/unicode/U+2b820
>    https://www.compart.com/en/unicode/U+2ceb0
> 
> in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
> tofu is displayed, and there can hardly be said to exist an OS system
> that is better internationalized out of the box than is Android.  The
> remaining characters:
> 
>    https://www.compart.com/en/unicode/U+2f804
>    https://www.compart.com/en/unicode/U+1f210
> 
> are displayed correctly, but are barely attested or expected to be
> present by CJK users in practice, and U+1F210 is arguably rather a
> symbol than a proper character.

Same here on my GNU/Linux system.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01  8:16     ` Po Lu
  2024-08-01  9:49       ` Eli Zaretskii
@ 2024-08-02 10:44       ` Benjamin Riefenstahl
  2024-08-02 11:42         ` Po Lu
  1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Riefenstahl @ 2024-08-02 10:44 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, emacs-devel

Hi there,

Po Lu writes:
> (let ((script-representative-chars
>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900
> 	      #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>   (clear-font-cache)
>   (find-font (font-spec :registry "iso10646-1" :script 'han
>                         :type 'xfthb))) ;; or another ftfont backend.
>
> returns no font on an up-to-date Fedora Workstation installation with a
> wealth of multilingual fonts for CJK scripts, whereas:
>
> (let ((script-representative-chars
>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>   (clear-font-cache)
>   (find-font (font-spec :registry "iso10646-1" :script 'han
>                         :type 'xfthb)))
>
> returns:
>
> #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>

FTR, for me these both return (with the backend ftcrhb):

#<font-entity ftcrhb GOOG Noto\ Sans\ CJK\ KR nil iso10646-1 regular normal normal 0 nil nil 0>

This is on "Debian GNU/Linux 12 (bookworm)".

benny




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-01 10:30         ` Po Lu
  2024-08-01 10:35           ` Eli Zaretskii
@ 2024-08-02 10:52           ` Benjamin Riefenstahl
  2024-08-02 12:29             ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Riefenstahl @ 2024-08-02 10:52 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, emacs-devel

Po Lu writes:
>   (dolist (char '(#x20000 #x2a700 #x2b740 #x2b820 #x2ceb0))
>     (insert (format "%c" char)))

Indeed these characters seem to be unsupported in Emacs here.  They do
not show up in Firefox either.





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-02 10:44       ` Benjamin Riefenstahl
@ 2024-08-02 11:42         ` Po Lu
  0 siblings, 0 replies; 36+ messages in thread
From: Po Lu @ 2024-08-02 11:42 UTC (permalink / raw)
  To: Benjamin Riefenstahl; +Cc: Eli Zaretskii, emacs-devel

Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net> writes:

> Hi there,
>
> Po Lu writes:
>> (let ((script-representative-chars
>>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900
>> 	      #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>>   (clear-font-cache)
>>   (find-font (font-spec :registry "iso10646-1" :script 'han
>>                         :type 'xfthb))) ;; or another ftfont backend.
>>
>> returns no font on an up-to-date Fedora Workstation installation with a
>> wealth of multilingual fonts for CJK scripts, whereas:
>>
>> (let ((script-representative-chars
>>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>> 	      #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>>   (clear-font-cache)
>>   (find-font (font-spec :registry "iso10646-1" :script 'han
>>                         :type 'xfthb)))
>>
>> returns:
>>
>> #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>
>
> FTR, for me these both return (with the backend ftcrhb):
>
> #<font-entity ftcrhb GOOG Noto\ Sans\ CJK\ KR nil iso10646-1 regular normal normal 0 nil nil 0>
>
> This is on "Debian GNU/Linux 12 (bookworm)".
>
> benny

Yes, it was concluded that this test is not reliable.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-02 10:52           ` Benjamin Riefenstahl
@ 2024-08-02 12:29             ` Eli Zaretskii
  2024-08-02 12:55               ` Benjamin Riefenstahl
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-02 12:29 UTC (permalink / raw)
  To: Benjamin Riefenstahl; +Cc: luangruo, emacs-devel

> From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Fri, 02 Aug 2024 12:52:40 +0200
> 
> Po Lu writes:
> >   (dolist (char '(#x20000 #x2a700 #x2b740 #x2b820 #x2ceb0))
> >     (insert (format "%c" char)))
> 
> Indeed these characters seem to be unsupported in Emacs here.  They do
> not show up in Firefox either.

So find-font returns on your system a font which doesn't actually
support some of the characters in script-representative-chars, is that
true?




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-02 12:29             ` Eli Zaretskii
@ 2024-08-02 12:55               ` Benjamin Riefenstahl
  2024-08-02 13:13                 ` Benjamin Riefenstahl
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Riefenstahl @ 2024-08-02 12:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Eli Zaretskii writes:
> So find-font returns on your system a font which doesn't actually
> support some of the characters in script-representative-chars, is that
> true?

I'm just reporting what I saw in testing earlier.  Now, when I try to
repeat the find-font test I get nil for both tests.  This is both in my
current (new) session as well as in emacs -Q.  I don't know what is
going on.  Must have been something I did earlier in that other session.

  (dolist (char '(#x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
                  #x31c0 #x4e10 #x5B57 #xfe30 #xf900
                  #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0))
    (insert (format "%c" char)))

Still shows that the characters in the first two lines work, while the
last line does not.

This is all on:

GNU Emacs 30.0.60 (build 5, x86_64-pc-linux-gnu, GTK+ Version 3.24.38,
cairo version 1.16.0) of 2024-07-25




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-02 12:55               ` Benjamin Riefenstahl
@ 2024-08-02 13:13                 ` Benjamin Riefenstahl
  2024-08-03  7:12                   ` pipcet
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Riefenstahl @ 2024-08-02 13:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, emacs-devel

Benjamin Riefenstahl writes:
> I don't know what is going on.  Must have been something I did earlier
> in that other session.

It's the order.  If I first evaluate the version with fewer characters,
I get a font for that and for the longer list of characters after that,
too.

If I do it the other way round, I get nil for both.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-02 13:13                 ` Benjamin Riefenstahl
@ 2024-08-03  7:12                   ` pipcet
  2024-08-03  8:52                     ` Po Lu
  2024-08-03 15:15                     ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: pipcet @ 2024-08-03  7:12 UTC (permalink / raw)
  To: Benjamin Riefenstahl; +Cc: Eli Zaretskii, luangruo, emacs-devel

"Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:

> Benjamin Riefenstahl writes:
>> I don't know what is going on.  Must have been something I did earlier
>> in that other session.
>
> It's the order.  If I first evaluate the version with fewer characters,
> I get a font for that and for the longer list of characters after that,
> too.

I've looked at that a little, and I don't think 'clear-font-cache', uh,
clears the font cache.

ftfont.c also interns random binary strings as symbols here. This helps:

diff --git a/src/ftfont.c b/src/ftfont.c
index c89feea1d46..882d3eec256 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -174,11 +174,11 @@ get_adstyle_property (FcPattern *p)
   USE_SAFE_ALLOCA;
   tmp = SAFE_ALLOCA (end - str);
   for (i = 0; i < end - str; ++i)
-    tmp[i] = ((end[i] != '?'
-	       && end[i] != '*'
-	       && end[i] != '"'
-	       && end[i] != '-')
-	      ? end[i] : ' ');
+    tmp[i] = ((str[i] != '?'
+	       && str[i] != '*'
+	       && str[i] != '"'
+	       && str[i] != '-')
+	      ? str[i] : ' ');
   adstyle = font_intern_prop (tmp, end - str, 1);
   SAFE_FREE ();
   if (font_style_to_value (FONT_WIDTH_INDEX, adstyle, 0) >= 0)
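
To spell out what the patch changes: the old loop indexed from `end`, so it
read the bytes that follow the string instead of the string itself, which
would explain the random binary data ending up in the interned symbol.  A
standalone sketch of the corrected loop follows; `sanitize_adstyle` is a
hypothetical name for this illustration, not a function in ftfont.c.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the LEN bytes starting at SRC into DST, replacing each of the
   characters '?', '*', '"' and '-' with a space.  DST must have room
   for LEN bytes; no terminating NUL is written.  */
void
sanitize_adstyle (char *dst, const char *src, size_t len)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < len; ++i)
    {
      /* Index the source string itself, never the memory past its end.  */
      char c = src[i];
      dst[i] = (c != '?' && c != '*' && c != '"' && c != '-') ? c : ' ';
    }
}
```

For the five-byte input `a-b*c` this yields `a b c`.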

xfont.c is particularly weird: it's limited to 64k characters, of
course, but it also hardcodes 'han as a supported script for all
Japanese or Korean fonts; and xfont_has_char will return false for all
non-ASCII chars in iso10646-1 fonts that don't have "ja" or "ko"
adstyles. In addition, it has its own caching mechanism
(xfont_scripts_cache) which is never cleared, shrunk, or exposed to
Lisp.

sfntfont.c only looks at the first fixnum in a vector specified in
Vscript_representative_chars, and fails if it isn't there, even though
it should continue looking.
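
Going by the documented meaning of script-representative-chars, a list
requires every character to be present while a vector requires only one of
them, so a vector has to be scanned to its end before giving up.  A rough
sketch of the two behaviors, with made-up names (`has_char_fn`,
`bmp_only_font`, `matches_any`, `matches_all`) and a stand-in "font" that
covers only the BMP:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coverage test: true if the font has a glyph for CP.  */
typedef bool (*has_char_fn) (uint32_t cp);

/* Stand-in for a font that covers the BMP and nothing beyond it.  */
bool
bmp_only_font (uint32_t cp)
{
  return cp <= 0xFFFF;
}

/* Vector semantics: one supported character suffices, so every entry
   is tried rather than only the first.  */
bool
matches_any (const uint32_t *chars, size_t n, has_char_fn has_char)
{
  assert (has_char != NULL);
  for (size_t i = 0; i < n; ++i)
    if (has_char (chars[i]))
      return true;
  return false;
}

/* List semantics: every character is required.  */
bool
matches_all (const uint32_t *chars, size_t n, has_char_fn has_char)
{
  assert (has_char != NULL);
  for (size_t i = 0; i < n; ++i)
    if (!has_char (chars[i]))
      return false;
  return true;
}
```

Given the array { 0x20000, 0x4e10 }, `matches_any` accepts the BMP-only font
even though the first entry is missing, while `matches_all` rejects it.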

w32font.c seems to ignore Vscript_representative_chars entirely. This
also appears to apply to the harfbuzz backend.

GNU Unifont now supports #x20000 (since February), but is split into two
fonts, one for the BMP and one for the upper planes, so it won't be
detected here.

So I'm not sure which font backends the additional required characters
are supposed to have a positive effect on.

Pip





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03  7:12                   ` pipcet
@ 2024-08-03  8:52                     ` Po Lu
  2024-08-03  9:21                       ` pipcet
  2024-08-03 15:15                     ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-03  8:52 UTC (permalink / raw)
  To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

pipcet@protonmail.com writes:

> "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:
>
>> Benjamin Riefenstahl writes:
>>> I don't know what is going on.  Must have been something I did earlier
>>> in that other session.
>>
>> It's the order.  If I first evaluate the version with fewer characters,
>> I get a font for that and for the longer list of characters after that,
>> too.
>
> I've looked at that a little, and I don't think 'clear-font-cache', uh,
> clears the font cache.
>
> ftfont.c also interns random binary strings as symbols here. This helps:
>
> diff --git a/src/ftfont.c b/src/ftfont.c
> index c89feea1d46..882d3eec256 100644
> --- a/src/ftfont.c
> +++ b/src/ftfont.c
> @@ -174,11 +174,11 @@ get_adstyle_property (FcPattern *p)
>    USE_SAFE_ALLOCA;
>    tmp = SAFE_ALLOCA (end - str);
>    for (i = 0; i < end - str; ++i)
> -    tmp[i] = ((end[i] != '?'
> -	       && end[i] != '*'
> -	       && end[i] != '"'
> -	       && end[i] != '-')
> -	      ? end[i] : ' ');
> +    tmp[i] = ((str[i] != '?'
> +	       && str[i] != '*'
> +	       && str[i] != '"'
> +	       && str[i] != '-')
> +	      ? str[i] : ' ');
>    adstyle = font_intern_prop (tmp, end - str, 1);
>    SAFE_FREE ();
>    if (font_style_to_value (FONT_WIDTH_INDEX, adstyle, 0) >= 0)

[...]

> sfntfont.c only looks at the first fixnum in a vector specified in
> Vscript_representative_chars, and fails if it isn't there, even though
> it should continue looking.

These have been fixed.

> xfont.c is particularly weird: it's limited to 64k characters, of
> course, but it also hardcodes 'han as a supported script for all
> Japanese or Korean fonts; and xfont_has_char will return false for all
> non-ASCII chars in iso10646-1 fonts that don't have "ja" or "ko"
> adstyles. In addition, it has its own caching mechanism
> (xfont_scripts_cache) which is never cleared, shrunk, or exposed to
> Lisp.

No, xfont_has_char will return a value that indicates that the presence
of the character cannot be established without opening the font.

> w32font.c seems to ignore Vscript_representative_chars entirely. This
> also appears to apply to the harfbuzz backend.

I wouldn't tamper with either of these backends.

> GNU Unifont now supports #x20000 (since February), but is split into two
> fonts, one for the BMP and one for the upper planes, so it won't be
> detected here.
>
> So I'm not sure which font backends the additional required characters
> are supposed to have a positive effect on.

All except w32font, I guess.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03  8:52                     ` Po Lu
@ 2024-08-03  9:21                       ` pipcet
  2024-08-03  9:33                         ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: pipcet @ 2024-08-03  9:21 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pipcet@protonmail.com writes:
>
>> "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:
>>
>>> Benjamin Riefenstahl writes:
>>>> I don't know what is going on.  Must have been something I did earlier
>>>> in that other session.
>>>
>>> It's the order.  If I first evaluate the version with fewer characters,
>>> I get a font for that and for the longer list of characters after that,
>>> too.
>>
>> I've looked at that a little, and I don't think 'clear-font-cache', uh,
>> clears the font cache.
>>
>> ftfont.c also interns random binary strings as symbols here. This helps:
>>
>> diff --git a/src/ftfont.c b/src/ftfont.c
>> index c89feea1d46..882d3eec256 100644
>> --- a/src/ftfont.c
>> +++ b/src/ftfont.c
>> @@ -174,11 +174,11 @@ get_adstyle_property (FcPattern *p)
>>    USE_SAFE_ALLOCA;
>>    tmp = SAFE_ALLOCA (end - str);
>>    for (i = 0; i < end - str; ++i)
>> -    tmp[i] = ((end[i] != '?'
>> -	       && end[i] != '*'
>> -	       && end[i] != '"'
>> -	       && end[i] != '-')
>> -	      ? end[i] : ' ');
>> +    tmp[i] = ((str[i] != '?'
>> +	       && str[i] != '*'
>> +	       && str[i] != '"'
>> +	       && str[i] != '-')
>> +	      ? str[i] : ' ');
>>    adstyle = font_intern_prop (tmp, end - str, 1);
>>    SAFE_FREE ();
>>    if (font_style_to_value (FONT_WIDTH_INDEX, adstyle, 0) >= 0)
>
> [...]
>
>> sfntfont.c only looks at the first fixnum in a vector specified in
>> Vscript_representative_chars, and fails if it isn't there, even though
>> it should continue looking.
>
> These have been fixed.

Thank you!

One more thing I'm noticing is that on Android, the foundry
string/ach_vendor_id is interned as OG^A@ rather than GOOG. It seems to
me that removing the second read() call in
daefd6771a4879bb8e71ea67f69522700155df01 may have caused the
problem. IIUC, panose[] is four-byte-aligned, but ul_unicode_range is
not, and that messes up our offsets. Is that correct?
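
To illustrate the offsets involved: in the OpenType OS/2 table, panose[]
starts at byte 32, the four ulUnicodeRange words start at byte 42 (not a
multiple of four), and achVendID starts at byte 58.  Reading the tag two bytes
late would produce "OG" followed by the first two bytes of the next field,
which is consistent with the OG^A@ seen here.  The sketch below is not the
code in Emacs; `os2_read_tag` and the enum names are invented for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offsets within the on-disk OS/2 table.  The table is packed,
   with no padding between fields.  */
enum
{
  OS2_PANOSE_OFFSET = 32,		/* uint8 panose[10] */
  OS2_UL_UNICODE_RANGE_OFFSET = 42,	/* 4 x uint32, not 4-byte aligned */
  OS2_ACH_VEND_ID_OFFSET = 58		/* 4-byte vendor tag */
};

/* Copy the 4-byte tag at OFFSET within TABLE (LEN bytes long) into ID
   and NUL-terminate it.  Return 0 on success, or -1 if TABLE is too
   short to contain the tag.  */
int
os2_read_tag (const unsigned char *table, size_t len, size_t offset,
	      char id[5])
{
  assert (table != NULL && id != NULL);
  if (len < offset + 4)
    return -1;
  memcpy (id, table + offset, 4);
  id[4] = '\0';
  return 0;
}
```

Reading at OS2_ACH_VEND_ID_OFFSET from a table whose vendor tag is GOOG gives
"GOOG"; reading at offset 60 instead gives "OG" plus two bytes of the field
that follows.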

Are we actually using ul_unicode_range, by the way?

>> xfont.c is particularly weird: it's limited to 64k characters, of
>> course, but it also hardcodes 'han as a supported script for all
>> Japanese or Korean fonts; and xfont_has_char will return false for all
>> non-ASCII chars in iso10646-1 fonts that don't have "ja" or "ko"
>> adstyles. In addition, it has its own caching mechanism
>> (xfont_scripts_cache) which is never cleared, shrunk, or exposed to
>> Lisp.
>
> No, xfont_has_char will return a value that indicates that the presence
> of the character cannot be established without opening the font.

You're right; I still wonder whether this is the intended behavior...

Pip





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03  9:21                       ` pipcet
@ 2024-08-03  9:33                         ` Po Lu
  2024-08-03 13:13                           ` pipcet
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-03  9:33 UTC (permalink / raw)
  To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

pipcet@protonmail.com writes:

> Thank you!
>
> One more thing I'm noticing is that on Android, the foundry
> string/ach_vendor_id is interned as OG^A@ rather than GOOG. It seems to
> me that removing the second read() call in
> daefd6771a4879bb8e71ea67f69522700155df01 may have caused the
> problem. IIUC, panose[] is four-byte-aligned, but ul_unicode_range is
> not, and that messes up our offsets. Is that correct?

Right, I misunderstood FreeType's implementation.  Also fixed.

> Are we actually using ul_unicode_range, by the way?

Not that I recall.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03  9:33                         ` Po Lu
@ 2024-08-03 13:13                           ` pipcet
  2024-08-03 13:31                             ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: pipcet @ 2024-08-03 13:13 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pipcet@protonmail.com writes:
>
>> Thank you!
>>
>> One more thing I'm noticing is that on Android, the foundry
>> string/ach_vendor_id is interned as OG^A@ rather than GOOG. It seems to
>> me that removing the second read() call in
>> daefd6771a4879bb8e71ea67f69522700155df01 may have caused the
>> problem. IIUC, panose[] is four-byte-aligned, but ul_unicode_range is
>> not, and that messes up our offsets. Is that correct?
>
> Right, I misunderstood FreeType's implementation.  Also fixed.

Thanks again! I really appreciate that there's a second set of eyeballs
going over these before pushing to master, let alone emacs-30 :-)

The next issue on my LineageOS (an Android variant) phone is that the
Noto CJK fonts have an "OTTO" header, not a "true"/00010000 one, so
they're not enumerated at all. At first glance, this doesn't appear to
be a LineageOS quirk; downloads available elsewhere also have the OTTO
header.

My understanding of the source code is that we currently don't support OTTO
fonts at all, and my experiment in forcing the header to be recognized
seems to agree with me there: I get tofu even for ASCII characters,
which FontForge indicates are present in the font.
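
For clarity, the header in question is the first four bytes of the file,
read as a big-endian tag.  A minimal sketch of telling the flavors apart,
using the tag values from the OpenType specification; the names `sfnt_flavor`
and `sfnt_classify` are hypothetical and this is not how sfnt.c is structured:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sfnt_flavor
{
  SFNT_TRUETYPE,	/* 0x00010000 or "true": TrueType outlines */
  SFNT_CFF,		/* "OTTO": CFF outlines */
  SFNT_COLLECTION,	/* "ttcf": font collection */
  SFNT_UNKNOWN
};

/* Classify a font file from the first four bytes of DATA, which is
   LEN bytes long.  */
enum sfnt_flavor
sfnt_classify (const unsigned char *data, size_t len)
{
  uint32_t tag;

  assert (data != NULL);
  if (len < 4)
    return SFNT_UNKNOWN;

  /* The tag is stored big-endian.  */
  tag = (((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16)
	 | ((uint32_t) data[2] << 8) | (uint32_t) data[3]);

  switch (tag)
    {
    case 0x00010000:
    case 0x74727565:	/* "true" */
      return SFNT_TRUETYPE;
    case 0x4F54544F:	/* "OTTO" */
      return SFNT_CFF;
    case 0x74746366:	/* "ttcf" */
      return SFNT_COLLECTION;
    default:
      return SFNT_UNKNOWN;
    }
}
```

Recognizing the "OTTO" tag only identifies the file; rendering such a font
also requires reading its CFF outlines, which are a different format from
the "glyf" outlines of a TrueType font.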

I have (lossily!) converted Noto Sans CJK SC to TTF format (with a
"glyf" table), and installed that in
/data/data/org.gnu.emacs/files/fonts, customized the "region" face to
use that font, and now I can see some Han characters when I select them,
but they turn into tofu when using the default face. (It's possible this
is due to the experiment I've described above and goes away when I
revert it...)

Of course, Noto CJK doesn't provide U+20000 (it does provide U+2000B),
so all this is a bit off-topic.

Pip





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 13:13                           ` pipcet
@ 2024-08-03 13:31                             ` Po Lu
  2024-08-03 14:31                               ` pipcet
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-03 13:31 UTC (permalink / raw)
  To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

pipcet@protonmail.com writes:

> Thanks again! I really appreciate that there's a second set of eyeballs
> going over these before pushing to master, let alone emacs-30 :-)
>
> The next issue on my LineageOS (an Android variant) phone is that the
> Noto CJK fonts have an "OTTO" header, not a "true"/00010000 one, so
> they're not enumerated at all. At first glance, this doesn't appear to
> be a LineageOS quirk; downloads available elsewhere also have the OTTO
> header.
>
> My understanding of the source code is we currently don't support OTTO
> fonts at all, and my experiment in forcing the header to be recognized
> seems to agree with me there: I get tofu even for ASCII characters,
> which FontForge indicates are present in the font.

Yes, because OTTO fonts are actually a completely distinct format from
TTF, with a unique bytecode language for constructing glyph outlines.
See:

  https://learn.microsoft.com/en-us/typography/opentype/spec/glyphformatcomparison
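
As a rough illustration of the difference (a sketch only, with a
hypothetical function name, not code from sfnt.c): both kinds of file
share the same sfnt table directory, but a TrueType font carries its
outlines in a "glyf" table while an OTTO font carries them in a "CFF "
table, so forcing the header to be accepted still leaves no outlines
that a glyf-only rasterizer can read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if the sfnt table directory at FONT lists a table
   named TAG (four characters, e.g. "glyf" or "CFF ").  The directory
   is a 12-byte header followed by 16-byte records, each of which
   starts with the table's tag.  */
int
sfnt_has_table (const unsigned char *font, size_t size, const char *tag)
{
  size_t num_tables, i;

  assert (font != NULL && tag != NULL);

  if (size < 12)
    return 0;

  /* numTables is a big-endian 16-bit value at offset 4.  */
  num_tables = ((size_t) font[4] << 8) | font[5];

  /* Reject directories that run past the end of the buffer.  */
  if (size - 12 < num_tables * 16)
    return 0;

  for (i = 0; i < num_tables; i++)
    if (memcmp (font + 12 + i * 16, tag, 4) == 0)
      return 1;

  return 0;
}
```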

> I have (lossily!) converted Noto Sans CJK SC to TTF format (with a
> "glyf" table), and installed that in
> /data/data/org.gnu.emacs/files/fonts, customized the "region" face to
> use that font, and now I can see some Han characters when I select them,
> but they turn into tofu when using the default face. (It's possible this
> is due to the experiment I've described above and goes away when I
> revert it...)
>
> Of course, Noto CJK doesn't provide U+20000 (it does provide U+2000B),
> so all this is a bit off-topic.

That's quite a circuitous means of obtaining a TTF version of the Noto
CJK fonts.  These fonts are compiled into both formats from a number of
common source files, and there is an index of the options available at:

  https://github.com/notofonts/noto-cjk/tree/main/Serif

an amalgamation of all CJK variants is:

  https://github.com/googlefonts/noto-cjk/raw/main/Serif/Variable/OTC/NotoSerifCJK-VF.ttf.ttc

but since it is enormous, I suggest installing only those regional
variants which cover the scripts you require.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 13:31                             ` Po Lu
@ 2024-08-03 14:31                               ` pipcet
  2024-08-03 14:54                                 ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: pipcet @ 2024-08-03 14:31 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pipcet@protonmail.com writes:
>
>> Thanks again! I really appreciate that there's a second set of eyeballs
>> going over these before pushing to master, let alone emacs-30 :-)
>>
>> The next issue on my LineageOS (an Android variant) phone is that the
>> Noto CJK fonts have an "OTTO" header, not a "true"/00010000 one, so
>> they're not enumerated at all. At first glance, this doesn't appear to
>> be a LineageOS quirk; downloads available elsewhere also have the OTTO
>> header.
>>
>> My understanding of the source code is we currently don't support OTTO
>> fonts at all, and my experiment in forcing the header to be recognized
>> seems to agree with me there: I get tofu even for ASCII characters,
>> which FontForge indicates are present in the font.
>
> Yes, because OTTO fonts are actually a completely distinct format from
> TTF, with a unique bytecode language for constructing glyph outlines.
> See:
>
>   https://learn.microsoft.com/en-us/typography/opentype/spec/glyphformatcomparison

Thanks! So supporting these fonts on Android using the sfnt driver is
very hard, correct? The androidfont.c fallback driver appears to support
them, but I understand that's not a good option either. (I tried
(set-frame-parameter nil 'font-backend "android") and the difference in
a C-h h buffer was quite noticeable).

>> I have (lossily!) converted Noto Sans CJK SC to TTF format (with a
>> "glyf" table), and installed that in
>> /data/data/org.gnu.emacs/files/fonts, customized the "region" face to
>> use that font, and now I can see some Han characters when I select them,
>> but they turn into tofu when using the default face. (It's possible this
>> is due to the experiment I've described above and goes away when I
>> revert it...)
>>
>> Of course, Noto CJK doesn't provide U+20000 (it does provide U+2000B),
>> so all this is a bit off-topic.
>
> That's quite a circuitous means of obtaining a TTF version of the Noto
> CJK fonts.

Agreed, but at least the size is manageable (probably due to loss of
quality, though).

> These fonts are compiled into both formats from a number of
> common source files, and there is an index of the options available at:
>
>   https://github.com/notofonts/noto-cjk/tree/main/Serif
>
> an amalgamation of all CJK variants is:
>
>   https://github.com/googlefonts/noto-cjk/raw/main/Serif/Variable/OTC/NotoSerifCJK-VF.ttf.ttc
>
> but since it is enormous, I suggest installing only those regional
> variants which cover the scripts you require.

In any case, asking users to install an extra font only to see basic CJK
glyphs can't be a long-term solution, can it?

Pip





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 14:31                               ` pipcet
@ 2024-08-03 14:54                                 ` Po Lu
  2024-08-07 17:52                                   ` Pip Cet
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-03 14:54 UTC (permalink / raw)
  To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

pipcet@protonmail.com writes:

> Thanks! So supporting these fonts on Android using the sfnt driver is
> very hard, correct?

Yes, it would amount to writing a new font driver.

> The androidfont.c fallback driver appears to support them, but I
> understand that's not a good option either. (I tried
> (set-frame-parameter nil 'font-backend "android") and the difference
> in a C-h h buffer was quite noticeable).

Correct.

> In any case, asking users to install an extra font only to see basic
> CJK glyphs can't be a long-term solution, can it?

Till someone implements the aforesaid new font driver, why not?  The
font is installed once, and that is the end of the matter, and
ultimately it is the same font that would ideally be loaded from
/system/fonts, with no degradation in quality.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03  7:12                   ` pipcet
  2024-08-03  8:52                     ` Po Lu
@ 2024-08-03 15:15                     ` Eli Zaretskii
  1 sibling, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2024-08-03 15:15 UTC (permalink / raw)
  To: pipcet; +Cc: b.riefenstahl, luangruo, emacs-devel

> Date: Sat, 03 Aug 2024 07:12:18 +0000
> From: pipcet@protonmail.com
> Cc: Eli Zaretskii <eliz@gnu.org>, luangruo@yahoo.com, emacs-devel@gnu.org
> 
> w32font.c seems to ignore Vscript_representative_chars entirely. This
> also appears to apply to the harfbuzz backend.

While the font search on MS-Windows indeed cannot use
script-representative-chars, the representative characters are still
used on Windows, just in a different way and only for certain scripts.
See w32-find-non-USB-fonts.




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 14:54                                 ` Po Lu
@ 2024-08-07 17:52                                   ` Pip Cet
  2024-08-08  0:10                                     ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Pip Cet @ 2024-08-07 17:52 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pipcet@protonmail.com writes:
>
>> Thanks! So supporting these fonts on Android using the sfnt driver is
>> very hard, correct?
>
> Yes, it would amount to writing a new font driver.

Just so that we have that option, I've done the minimal work necessary
to build Emacs for Android with fontconfig and the ftcrhb and ftcr font
drivers. As far as I can tell, it works now (including, regrettably,
color emoji provided in font files). Harfbuzz shaping seems to work,
too. I still have to look at what the DRAW_CURSOR stuff does...

Of course there are good reasons not to want to do this: the
sfntfont-android driver is very fast, this is not.  It also requires an
additional file to configure fontconfig and a cache directory for
fontconfig's disk-based cache. Plus there are the extra dependencies...

Obviously, this would be and remain an optional feature, with the other
font drivers still available (that's not true for my test builds
including these and quite a few other changes, which currently disable
the android-specific drivers: https://codeberg.org/pipcet/emacs-android )

>> The androidfont.c fallback driver appears to support them, but I
>> understand that's not a good option either. (I tried
>> (set-frame-parameter nil 'font-backend "android") and the difference
>> in a C-h h buffer was quite noticeable).
>
> Correct.

Which devices is the androidfont.c driver currently used on?  Very old
ones with 15/16 bpp?

>> In any case, asking users to install an extra font only to see basic
>> CJK glyphs can't be a long-term solution, can it?
>
> Till someone implements the aforesaid new font driver, why not?  The
> font is installed once, and that is the end of the matter, and
> ultimately it is the same font that would ideally be loaded from
> /system/fonts, with no degradation in quality.

How do we know? I can't find this alleged source form for the Noto CJK
font anywhere, just binaries produced by Google.  Maybe they are editing
one of the versions directly, and generating the other one from it,
which would probably mean the secondary version is degraded in quality,
but even then Google would have to make available instructions for how
to build one from the other in order for the font to be considered
free (or, if I understand the OSI definition correctly, "open source")

> The source code must be the preferred form in which a programmer would
> modify the program. Deliberately obfuscated source code is not
> allowed. Intermediate forms such as the output of a preprocessor or
> translator are not allowed.

Anyway, here's the main part of the patch:



Pip





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-07 17:52                                   ` Pip Cet
@ 2024-08-08  0:10                                     ` Po Lu
  2024-08-09 12:33                                       ` Pip Cet
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-08  0:10 UTC (permalink / raw)
  To: Pip Cet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> Just so that we have that option, I've done the minimal work necessary
> to build Emacs for Android with fontconfig and the ftcrhb and ftcr font
> drivers. As far as I can tell, it works now (including, regrettably,
> color emoji provided in font files). Harfbuzz shaping seems to work,
> too. I still have to look at what the DRAW_CURSOR stuff does...
>
> Of course there are good reasons not to want to do this: the
> sfntfont-android driver is very fast, this is not.  It also requires an
> additional file to configure fontconfig and a cache directory for
> fontconfig's disk-based cache. Plus there are the extra dependencies...
>
> Obviously, this would be and remain an optional feature, with the other
> font drivers still available (that's not true for my test builds
> including these and quite a few other changes, which currently disable
> the android-specific drivers: https://codeberg.org/pipcet/emacs-android )

Thanks, but I won't agree to install this: the invariable rule is that
people, for foolish reasons, will begin to use this font driver, with
all its flaws and imperfections, and we will ultimately be held
responsible for its upkeep.  What is truly irritating is that FreeType
is part of the OS, but that it is not stable, and that the linker is
rigged to prevent third-party programs from linking to such unstable
libraries.  If not for this, I could have agreed to a version of the
ftfont driver disentangled from Fontconfig.

> Which devices is the androidfont.c driver currently used on?  Very old
> ones with 15/16 bpp?

No, on non-standard operating systems where /system/fonts does not
exist.  Bit depth is no object, since the OS always provides Emacs with
a 32-bit "RGBA" (which is ABGR on little-endian systems) surface on
which to draw, and handles conversion between this format and that of
the screen.
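
To spell out the byte-order remark, a small sketch (the helper names
are hypothetical and none of this is code from the Android port):
a pixel stored as the bytes R, G, B, A in memory reads back as the
word 0xAABBGGRR on a little-endian host, which is why the format is
"ABGR" when treated as a 32-bit integer there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store one pixel at PIXEL as four bytes in memory order R, G, B, A.  */
void
store_rgba (unsigned char *pixel, uint8_t r, uint8_t g, uint8_t b,
	    uint8_t a)
{
  assert (pixel != NULL);

  pixel[0] = r;
  pixel[1] = g;
  pixel[2] = b;
  pixel[3] = a;
}

/* Reinterpret the four bytes at PIXEL as a host-order 32-bit word.  */
uint32_t
load_word (const unsigned char *pixel)
{
  uint32_t word;

  assert (pixel != NULL);

  memcpy (&word, pixel, sizeof word);
  return word;
}

/* Return nonzero on a little-endian host.  */
int
host_is_little_endian (void)
{
  const uint16_t probe = 1;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 1;
}
```

With R=0x11, G=0x22, B=0x33, A=0x44, load_word yields 0x44332211 on a
little-endian host and 0x11223344 on a big-endian one.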

> How do we know? I can't find this alleged source form for the Noto CJK
> font anywhere, just binaries produced by Google.  Maybe they are editing
> one of the versions directly, and generating the other one from it,
> which would probably mean the secondary version is degraded in quality,
> but even then Google would have to make available instructions for how
> to build one from the other in order for the font to be considered
> free (or, if I understand the OSI definition correctly, "open source")

Noto CJK is a version of Source Han Sans:

  https://github.com/adobe-fonts/source-han-sans

and its binaries are generated from the same source code by AFDKO
(though I am surprised to learn that this source is not in UFO format).




* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-08  0:10                                     ` Po Lu
@ 2024-08-09 12:33                                       ` Pip Cet
  2024-08-09 13:10                                         ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Pip Cet @ 2024-08-09 12:33 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Just so that we have that option, I've done the minimal work necessary
>> to build Emacs for Android with fontconfig and the ftcrhb and ftcr font
>> drivers. As far as I can tell, it works now (including, regrettably,
>> color emoji provided in font files). Harfbuzz shaping seems to work,
>> too. I still have to look at what the DRAW_CURSOR stuff does...
>>
>> Of course there are good reasons not to want to do this: the
>> sfntfont-android driver is very fast, this is not.  It also requires an
>> additional file to configure fontconfig and a cache directory for
>> fontconfig's disk-based cache. Plus there are the extra dependencies...
>>
>> Obviously, this would be and remain an optional feature, with the other
>> font drivers still available (that's not true for my test builds
>> including these and quite a few other changes, which currently disable
>> the android-specific drivers: https://codeberg.org/pipcet/emacs-android )
>
> Thanks, but I won't agree to install this

No problem at all. Thanks for your response.

> the invariable rule is that
> people, for foolish reasons, will begin to use this font driver, with
> all its flaws and imperfections, and we will ultimately be held
> responsible for its upkeep.  What is truly irritating is that FreeType
> is part of the OS, but that it is not stable, and that the linker is
> rigged to prevent third-party programs from linking to such unstable
> libraries.  If not for this, I could have agreed to a version of the
> ftfont driver disentangled from Fontconfig.

I think the decision to rely on fontconfig for the ftcr(hb) drivers has
been made, though. I must confess I haven't looked at sfnt.c very much,
but I'm surprised to find it has been made part of Emacs. Since it has,
though, we might as well use it. And it seems Noto is working on
replacing the CJK fonts by TrueType fonts.

>> Which devices is the androidfont.c driver currently used on?  Very old
>> ones with 15/16 bpp?
>
> No, on non-standard operating systems where /system/fonts does not
> exist.  Bit depth is no object, since the OS always provides Emacs with
> a 32-bit "RGBA" (which is ABGR on little-endian systems) surface on
> which to draw, and handles conversion between this format and that of
> the screen.

Thanks!

>> How do we know? I can't find this alleged source form for the Noto CJK
>> font anywhere, just binaries produced by Google.  Maybe they are editing
>> one of the versions directly, and generating the other one from it,
>> which would probably mean the secondary version is degraded in quality,
>> but even then Google would have to make available instructions for how
>> to build one from the other in order for the font to be considered
>> free (or, if I understand the OSI definition correctly, "open source")
>
> Noto CJK is a version of Source Han Sans:

Thanks, I'm aware it's a modified version of that font.

>   https://github.com/adobe-fonts/source-han-sans

> and its binaries are generated from the same source code by AFDKO
> (though I am surprised to learn that this source is not in UFO format).

I don't think it's the "source" at all. It's an intermediate binary,
produced by a proprietary tool (as Adobe states), probably from a
similarly proprietary, actual source format.

Anyway, as for technical issues, the Type 1 font in the Source Han Sans
CID is clearly closer to the source than the TrueType fonts are. Thus,
there is some quality degradation when I, or when Google, generates a
TrueType font from it.

Pip





* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-09 12:33                                       ` Pip Cet
@ 2024-08-09 13:10                                         ` Po Lu
  0 siblings, 0 replies; 36+ messages in thread
From: Po Lu @ 2024-08-09 13:10 UTC (permalink / raw)
  To: Pip Cet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

>>   https://github.com/adobe-fonts/source-han-sans
>
>> and its binaries are generated from the same source code by AFDKO
>> (though I am surprised to learn that this source is not in UFO format).
>
> I don't think it's the "source" at all. It's an intermediate binary,
> produced by a proprietary tool (as Adobe states), probably from a
> similarly proprietary, actual source format.
>
> Anyway, as for technical issues, the Type 1 font in the Source Han Sans
> CID is clearly closer to the source than the TrueType fonts are. Thus,
> there is some quality degradation when I, or when Google, generates a
> TrueType font from it.

Be that as it may, there is no perceptible difference between the two,
if hinting is disabled for the OTF original.

At all events, I do invite interested persons to undertake implementing
support for OTF font files in a manner that does not require FreeType,
Fontconfig, or further dependencies.  This might be feasible (so far as
Noto and other outline fonts are concerned) without also supporting
color glyphs.




Thread overview: 36+ messages
2024-07-31 15:45 master bf0aeaa0d7a: Re-enable displaying `han' characters on Android Eli Zaretskii
2024-08-01  0:07 ` Po Lu
2024-08-01  0:33   ` Po Lu
2024-08-01  5:52     ` Eli Zaretskii
2024-08-01  7:55       ` Po Lu
2024-08-01  8:52         ` Eli Zaretskii
2024-08-01  9:47           ` Po Lu
2024-08-01  9:56             ` Eli Zaretskii
2024-08-01 10:13               ` Po Lu
2024-08-01 10:19                 ` Eli Zaretskii
2024-08-01 21:17             ` Dmitry Gutov
2024-08-01  5:32   ` Eli Zaretskii
2024-08-01  8:16     ` Po Lu
2024-08-01  9:49       ` Eli Zaretskii
2024-08-01 10:30         ` Po Lu
2024-08-01 10:35           ` Eli Zaretskii
2024-08-02 10:52           ` Benjamin Riefenstahl
2024-08-02 12:29             ` Eli Zaretskii
2024-08-02 12:55               ` Benjamin Riefenstahl
2024-08-02 13:13                 ` Benjamin Riefenstahl
2024-08-03  7:12                   ` pipcet
2024-08-03  8:52                     ` Po Lu
2024-08-03  9:21                       ` pipcet
2024-08-03  9:33                         ` Po Lu
2024-08-03 13:13                           ` pipcet
2024-08-03 13:31                             ` Po Lu
2024-08-03 14:31                               ` pipcet
2024-08-03 14:54                                 ` Po Lu
2024-08-07 17:52                                   ` Pip Cet
2024-08-08  0:10                                     ` Po Lu
2024-08-09 12:33                                       ` Pip Cet
2024-08-09 13:10                                         ` Po Lu
2024-08-03 15:15                     ` Eli Zaretskii
2024-08-02 10:44       ` Benjamin Riefenstahl
2024-08-02 11:42         ` Po Lu
2024-08-01  7:57   ` Andrea Corallo
