* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
@ 2024-07-31 15:45 Eli Zaretskii
  2024-08-01  0:07 ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2024-07-31 15:45 UTC
To: Po Lu; +Cc: emacs-devel

I've reverted the above commit. The change which added those
characters was not an accident: I found that Emacs would choose an
inappropriate (sub-optimal) font for Chinese characters because it
generally stops looking once it finds the first font that fulfills the
requirements.

The font Emacs sometimes selects due to those characters missing
lacked support for important Han blocks because those blocks had no
characters in script-representative-chars.

If this causes problems to Android, then please implement a fix that
is specific to Android, without affecting other platforms.

Thanks.

P.S. And once again, when you undo changes done by someone else just a
few days ago, please discuss this before making the change.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-01 0:07 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I've reverted the above commit. The change which added those
> characters was not an accident: I found that Emacs would choose an
> inappropriate (sub-optimal) font for Chinese characters because it
> generally stops looking once it finds the first font that fulfills the
> requirements.

The reason behind your discovery is that with your choice of
`script-representative-chars', no font will ever match this font spec
(in the default fontset):

  ,(font-spec :registry "iso10646-1" :script 'han)

so that Emacs returns to the preceding ones, which specify a design
language rather than a script:

  ,(font-spec :registry "iso10646-1" :lang 'ja)
  ,(font-spec :registry "iso10646-1" :lang 'zh)

which is supported elsewhere than on Android.

> The font Emacs sometimes selects due to those characters missing
> lacked support for important Han blocks because those blocks had no
> characters in script-representative-chars.

I didn't revert your change in whole, only characters beyond the BMP
that seldom appear in real Chinese writing; of the characters that were
deleted:

  #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804

the first is "SQUARED CJK UNIFIED IDEOGRAPH-624B", which is a stylized
variant of its base character that is absent from Droid Sans Fallback.
The remainder, #x2a700, #x2b740, #x2b820, #x2ceb0, and #x2f804 are
esoteric characters that are provided by no CJK font on my GNU/Linux
system, or compatibility ideographs that were never designed to be
displayed. Needless to say, neither are they provided by any of the CJK
fonts users will probably install on Android.

> If this causes problems to Android, then please implement a fix that
> is specific to Android, without affecting other platforms.

It does affect other platforms, but I'm only in the habit of installing
master regularly on Android.
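
The seven deleted code points listed in the message above are the entries of the `han' list that lie beyond the Basic Multilingual Plane, i.e. above U+FFFF. A minimal sketch of that split in Emacs Lisp follows; the helper name `my-split-bmp' is hypothetical and not part of Emacs, and the input is the full `han' list quoted later in this thread.

```elisp
;; Hypothetical helper, not part of Emacs: split CHARS into the code
;; points inside the BMP (<= #xffff) and those in supplementary planes.
(defun my-split-bmp (chars)
  "Return (BMP . SUPPLEMENTARY) for the list of code points CHARS."
  (let (bmp supp)
    (dolist (ch chars)
      (if (> ch #xffff)
          (push ch supp)
        (push ch bmp)))
    ;; `push' builds the lists in reverse, so restore the input order.
    (cons (nreverse bmp) (nreverse supp))))

;; The cdr of the result holds the supplementary-plane code points.
(my-split-bmp
 '(#x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
   #x31c0 #x4e10 #x5B57 #xfe30 #xf900
   #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))
```

The car of the result corresponds to the shorter list used in the second `find-font' experiment later in the thread.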

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-01 0:33 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> It does affect other platforms, but I'm only in the habit of installing
> master regularly on Android.

In the event, this was not completely accurate. I've specialized some
generic code to Android that was enabled for all systems by accident,
rendering this matter moot, as `script-representative-chars' should not
have been consulted on other systems at the outset.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-01 5:52 UTC
To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 08:33:46 +0800
>
> Po Lu <luangruo@yahoo.com> writes:
>
> > It does affect other platforms, but I'm only in the habit of installing
> > master regularly on Android.
>
> In the event, this was not completely accurate. I've specialized some
> generic code to Android that was enabled for all systems by accident,
> rendering this matter moot, as `script-representative-chars' should not
> have been consulted on other systems at the outset.

I don't understand the changes you installed. The comments and the
log message don't tell enough, and you have again installed the
changes before discussing them, although I explicitly asked you not to
do that.

I see the changes in fontset setup related to 'han', which make them
specific to Android. But the representative characters _are_ used on
other systems, at least on MS-Windows (and AFAIU on other systems as
well: see ftfont.c and font.c). So why you again removed the SMP
characters from the list is not clear to me; I think it's a mistake
and tend to revert that part.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-01 7:55 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 01 Aug 2024 08:33:46 +0800
>>
>> Po Lu <luangruo@yahoo.com> writes:
>>
>> > It does affect other platforms, but I'm only in the habit of installing
>> > master regularly on Android.
>>
>> In the event, this was not completely accurate. I've specialized some
>> generic code to Android that was enabled for all systems by accident,
>> rendering this matter moot, as `script-representative-chars' should not
>> have been consulted on other systems at the outset.
>
> I don't understand the changes you installed. The comments and the
> log message don't tell enough, and you have again installed the
> changes before discussing them, although I explicitly asked you not to
> do that.
>
> I see the changes in fontset setup related to 'han', which make them
> specific to Android. But the representative characters _are_ used on
> other systems, at least on MS-Windows (and AFAIU on other systems as
> well: see ftfont.c and font.c). So why you again removed the SMP
> characters from the list is not clear to me; I think it's a mistake
> and tend to revert that part.

No font-spec in the default fontset now specifies the `han' script on
these systems, as in Emacs 29, so that `script-representative-chars' is
no longer consulted in connection with it. What is specified is QClang,
which is tested against font metadata (e.g., the design language) rather
than character repertories.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-01 8:52 UTC
To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 15:55:43 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> In the event, this was not completely accurate. I've specialized some
> >> generic code to Android that was enabled for all systems by accident,
> >> rendering this matter moot, as `script-representative-chars' should not
> >> have been consulted on other systems at the outset.
> >
> > I don't understand the changes you installed. The comments and the
> > log message don't tell enough, and you have again installed the
> > changes before discussing them, although I explicitly asked you not to
> > do that.
> >
> > I see the changes in fontset setup related to 'han', which make them
> > specific to Android. But the representative characters _are_ used on
> > other systems, at least on MS-Windows (and AFAIU on other systems as
> > well: see ftfont.c and font.c). So why you again removed the SMP
> > characters from the list is not clear to me; I think it's a mistake
> > and tend to revert that part.
>
> No font-spec in the default fontset now specifies the `han' script on
> these systems, as in Emacs 29, so that `script-representative-chars' is
> no longer consulted in connection with it. What is specified is QClang,
> which is tested against font metadata (e.g., the design language) rather
> than character repertories.

But users can add a font spec for 'han' to the fontset, cannot they?
And if they do, then the representative characters _are_ important,
aren't they? So I don't think we should remove those characters.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-01 9:47 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> But users can add a font spec for 'han' to the fontset, cannot they?
> And if they do, then the representative characters _are_ important,
> aren't they? So I don't think we should remove those characters.

Such an action would be pointless, as the fontset would not match any
CJK font actually in existence, and it would break the Android build to
boot. If anyone seriously considers non-existent characters important
enough to construct a font spec that matches them, he can easily amend
script-representative-chars for himself or define another script. If
these pages are opened, for example:

  https://www.compart.com/en/unicode/U+20000
  https://www.compart.com/en/unicode/U+2a700
  https://www.compart.com/en/unicode/U+2b740
  https://www.compart.com/en/unicode/U+2b820
  https://www.compart.com/en/unicode/U+2ceb0

in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
tofu is displayed, and there can hardly be said to exist an OS system
that is better internationalized out of the box than is Android. The
remaining characters:

  https://www.compart.com/en/unicode/U+2f804
  https://www.compart.com/en/unicode/U+1f210

are displayed correctly, but are barely attested or expected to be
present by CJK users in practice, and U+1F210 is arguably rather a
symbol than a proper character.
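
The message above says a user can amend script-representative-chars for himself. A minimal sketch of what that could look like in an init file is given below. The helper `my-add-script-chars' is hypothetical and not part of Emacs, and the sketch assumes the `han' entry stores its characters as a list, as it does in the values quoted elsewhere in this thread.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-add-script-chars (alist script extras)
  "Return a copy of ALIST in which SCRIPT's character list also has EXTRAS.
ALIST has the shape of `script-representative-chars'.  Entries whose
characters are stored as a vector are returned unchanged."
  (mapcar (lambda (entry)
            (if (and (eq (car entry) script) (listp (cdr entry)))
                ;; The trailing nil makes `append' copy every argument,
                ;; so the destructive `delete-dups' touches no caller data.
                (cons script (delete-dups (append (cdr entry) extras nil)))
              entry))
          alist))

;; Possible use in an init file (assumption: the `han' entry is a list):
;; (setq script-representative-chars
;;       (my-add-script-chars script-representative-chars 'han
;;                            '(#x20000 #x2a700)))
;; (clear-font-cache)
```

Per the discussion in this thread, such a change only matters where some font spec in the fontset names the `han' script.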

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-01 9:56 UTC
To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 17:47:54 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > But users can add a font spec for 'han' to the fontset, cannot they?
> > And if they do, then the representative characters _are_ important,
> > aren't they? So I don't think we should remove those characters.
>
> Such an action would be pointless, as the fontset would not match any
> CJK font actually in existence, and it would break the Android build to
> boot.

Then Android users should not do that. But users of other systems
could, and we should not prevent them from doing so.

> If anyone seriously considers non-existent characters important
> enough to construct a font spec that matches them, he can easily amend
> script-representative-chars for himself or define another script. If
> these pages are opened, for example:
>
> https://www.compart.com/en/unicode/U+20000
> https://www.compart.com/en/unicode/U+2a700
> https://www.compart.com/en/unicode/U+2b740
> https://www.compart.com/en/unicode/U+2b820
> https://www.compart.com/en/unicode/U+2ceb0
>
> in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
> tofu is displayed, and there can hardly be said to exist an OS system
> that is better internationalized out of the box than is Android. The
> remaining characters:
>
> https://www.compart.com/en/unicode/U+2f804
> https://www.compart.com/en/unicode/U+1f210
>
> are displayed correctly, but are barely attested or expected to be
> present by CJK users in practice, and U+1F210 is arguably rather a
> symbol than a proper character.

On my Windows 11 system, I see all of them, and I didn't install any
additional fonts for CJK. So your assertion is simply not true.
Maybe your GNU/Linux system is outdated, or maybe GNU/Linux users with
CJK interests are supposed to install optional packages that you don't
have installed.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-01 10:13 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Thu, 01 Aug 2024 17:47:54 +0800
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > But users can add a font spec for 'han' to the fontset, cannot they?
>> > And if they do, then the representative characters _are_ important,
>> > aren't they? So I don't think we should remove those characters.
>>
>> Such an action would be pointless, as the fontset would not match any
>> CJK font actually in existence, and it would break the Android build to
>> boot.
>
> Then Android users should not do that. But users of other systems
> could, and we should not prevent them from doing so.

We don't prevent users from modifying script-representative-chars
anywhere, no?

>> If anyone seriously considers non-existent characters important
>> enough to construct a font spec that matches them, he can easily amend
>> script-representative-chars for himself or define another script. If
>> these pages are opened, for example:
>>
>> https://www.compart.com/en/unicode/U+20000
>> https://www.compart.com/en/unicode/U+2a700
>> https://www.compart.com/en/unicode/U+2b740
>> https://www.compart.com/en/unicode/U+2b820
>> https://www.compart.com/en/unicode/U+2ceb0
>>
>> in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
>> tofu is displayed, and there can hardly be said to exist an OS system
>> that is better internationalized out of the box than is Android. The
>> remaining characters:
>>
>> https://www.compart.com/en/unicode/U+2f804
>> https://www.compart.com/en/unicode/U+1f210
>>
>> are displayed correctly, but are barely attested or expected to be
>> present by CJK users in practice, and U+1F210 is arguably rather a
>> symbol than a proper character.
>
> On my Windows 11 system, I see all of them, and I didn't install any
> additional fonts for CJK. So your assertion is simply not true.
> Maybe your GNU/Linux system is outdated, or maybe GNU/Linux users with
> CJK interests are supposed to install optional packages that you don't
> have installed.

This is an installation of Fedora Workstation 40 that was updated
yesterday, and where all of the Noto font packages that are required for
displaying CJK text are undoubtedly installed. The reason these
characters are omitted from the suggested set of CJK fonts is that there
is simply insufficient interest in these characters, and probably the
same reasoning holds on Android, where users cannot install fonts at
all, and where the entirety of these pages, save about a dozen, is tofu:

  https://commons.wikimedia.org/wiki/Category:Unicode_20000-2A6DF_CJK_Unified_Ideographs_Extension_B
  https://commons.wikimedia.org/wiki/Category:Unicode_2A700-2B73F_CJK_Unified_Ideographs_Extension_C
  https://commons.wikimedia.org/wiki/Category:Unicode_2B740-2B81F_CJK_Unified_Ideographs_Extension_D
  https://commons.wikimedia.org/wiki/Category:Unicode_2B820-2CEAF_CJK_Unified_Ideographs_Extension_E
  https://commons.wikimedia.org/wiki/Category:Unicode_2CEB0-2EBEF_CJK_Unified_Ideographs_Extension_F

Noto are apparently quite reluctant to support Extension B:

  https://github.com/notofonts/noto-cjk/issues/13

and they are the go-to source of Free CJK fonts nowadays.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-01 10:19 UTC
To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 18:13:03 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> From: Po Lu <luangruo@yahoo.com>
> >> Cc: emacs-devel@gnu.org
> >> Date: Thu, 01 Aug 2024 17:47:54 +0800
> >>
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >>
> >> > But users can add a font spec for 'han' to the fontset, cannot they?
> >> > And if they do, then the representative characters _are_ important,
> >> > aren't they? So I don't think we should remove those characters.
> >>
> >> Such an action would be pointless, as the fontset would not match any
> >> CJK font actually in existence, and it would break the Android build to
> >> boot.
> >
> > Then Android users should not do that. But users of other systems
> > could, and we should not prevent them from doing so.
>
> We don't prevent users from modifying script-representative-chars
> anywhere, no?

We don't prevent users from doing silly things, no. But they
shouldn't.

> > On my Windows 11 system, I see all of them, and I didn't install any
> > additional fonts for CJK. So your assertion is simply not true.
> > Maybe your GNU/Linux system is outdated, or maybe GNU/Linux users with
> > CJK interests are supposed to install optional packages that you don't
> > have installed.
>
> This is an installation of Fedora Workstation 40 that was updated
> yesterday, and where all of the Noto font packages that are required for
> displaying CJK text are undoubtedly installed. The reason these
> characters are omitted from the suggested set of CJK fonts is that there
> is simply insufficient interest in these characters, and probably the
> same reasoning holds on Android, where users cannot install fonts at
> all, and where the entirety of these pages, save about a dozen, is tofu:
>
> https://commons.wikimedia.org/wiki/Category:Unicode_20000-2A6DF_CJK_Unified_Ideographs_Extension_B
> https://commons.wikimedia.org/wiki/Category:Unicode_2A700-2B73F_CJK_Unified_Ideographs_Extension_C
> https://commons.wikimedia.org/wiki/Category:Unicode_2B740-2B81F_CJK_Unified_Ideographs_Extension_D
> https://commons.wikimedia.org/wiki/Category:Unicode_2B820-2CEAF_CJK_Unified_Ideographs_Extension_E
> https://commons.wikimedia.org/wiki/Category:Unicode_2CEB0-2EBEF_CJK_Unified_Ideographs_Extension_F
>
> Noto are apparently quite reluctant to support Extension B:
>
> https://github.com/notofonts/noto-cjk/issues/13
>
> and they are the go-to source of Free CJK fonts nowadays.

I'm unconvinced, sorry. I'll wait a bit for others to chime in, before
I decide what to do with this, but currently I tend to revert your
change to script-representative-chars.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Dmitry Gutov @ 2024-08-01 21:17 UTC
To: Po Lu, Eli Zaretskii; +Cc: emacs-devel

On 01/08/2024 12:47, Po Lu wrote:
> If
> these pages are opened, for example:
>
> https://www.compart.com/en/unicode/U+20000
> https://www.compart.com/en/unicode/U+2a700
> https://www.compart.com/en/unicode/U+2b740
> https://www.compart.com/en/unicode/U+2b820
> https://www.compart.com/en/unicode/U+2ceb0
>
> in Mozilla (not to mention Emacs) on my GNU/Linux system or on Android,
> tofu is displayed, and there can hardly be said to exist an OS system
> that is better internationalized out of the box than is Android. The
> remaining characters:
>
> https://www.compart.com/en/unicode/U+2f804
> https://www.compart.com/en/unicode/U+1f210
>
> are displayed correctly, but are barely attested or expected to be
> present by CJK users in practice, and U+1F210 is arguably rather a
> symbol than a proper character.

Same here on my GNU/Linux system.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-01 5:32 UTC
To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 08:07:35 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I've reverted the above commit. The change which added those
> > characters was not an accident: I found that Emacs would choose an
> > inappropriate (sub-optimal) font for Chinese characters because it
> > generally stops looking once it finds the first font that fulfills the
> > requirements.
>
> The reason behind your discovery is that with your choice of
> `script-representative-chars', no font will ever match this font spec
> (in the default fontset):
>
> ,(font-spec :registry "iso10646-1" :script 'han)
>
> so that Emacs returns to the preceding ones, which specify a design
> language rather than a script:
>
> ,(font-spec :registry "iso10646-1" :lang 'ja)
> ,(font-spec :registry "iso10646-1" :lang 'zh)
>
> which is supported elsewhere than on Android.

I don't understand what you are saying. What is "my discovery"? And
why no font will ever match the font spec for 'han'? I've definitely
seen fonts that support _all_ of the characters I added to
script-representative-chars, so why wouldn't they be found?

> > The font Emacs sometimes selects due to those characters missing
> > lacked support for important Han blocks because those blocks had no
> > characters in script-representative-chars.
>
> I didn't revert your change in whole, only characters beyond the BMP
> that seldom appear in real Chinese writing; of the characters that were
> deleted:
>
> #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804

I added them because when those sub-optimal fonts are selected, some
of these characters appear as "tofu", which is ridiculous on a system
that has fonts installed that cover all of them.

And "seldom" is in the eyes of the beholder, at least IME. When one
has text with these characters, the absolute frequency of their
appearance is not very relevant; what _is_ relevant is the fact that
the character cannot be shown by Emacs.

> the first is "SQUARED CJK UNIFIED IDEOGRAPH-624B", which is a stylized
> variant of its base character that is absent from Droid Sans Fallback.
> The remainder, #x2a700, #x2b740, #x2b820, #x2ceb0, and #x2f804 are
> esoteric characters that are provided by no CJK font on my GNU/Linux
> system, or compatibility ideographs that were never designed to be
> displayed. Needless to say, neither are they provided by any of the CJK
> fonts users will probably install on Android.

If there's no fonts installed that support those representative
characters, and Emacs is capable of finding less capable fonts that
support some of CJK (e.g., the BMP blocks), then why is that a
problem?

The purpose of the change is to allow Emacs to find better fonts if
they are installed, instead of ignoring them. How is that a Bad
Thing?

I still don't understand why this breaks Android, btw. If Emacs
employs the fallback font specs with :lang you show above, why don't
they work for Android?

> > If this causes problems to Android, then please implement a fix that
> > is specific to Android, without affecting other platforms.
>
> It does affect other platforms, but I'm only in the habit of installing
> master regularly on Android.

The log message talked only about Android. If users on GNU/Linux
report problems caused by this change, with enough details to
understand the problems, we can reconsider the change and modify it or
even revert. But I do need the details on those other platforms to
think and discuss this intelligently. This is all about details, it
isn't an abstract or academic issue. Choosing which characters to
consider representative is frequently a judgment call based on
practical considerations and practical problems with existing fonts.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-01 8:16 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I don't understand what you are saying. What is "my discovery"? And
> why no font will ever match the font spec for 'han'? I've definitely
> seen fonts that support _all_ of the characters I added to
> script-representative-chars, so why wouldn't they be found?

I wasn't implying that no font will ever support one or more of these
characters, but that such fonts are sufficiently obscure that the
chances of a CJK font's being located by this value of
`script-representative-chars' is nil.

>> > The font Emacs sometimes selects due to those characters missing
>> > lacked support for important Han blocks because those blocks had no
>> > characters in script-representative-chars.
>>
>> I didn't revert your change in whole, only characters beyond the BMP
>> that seldom appear in real Chinese writing; of the characters that were
>> deleted:
>>
>> #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804
>
> I added them because when those sub-optimal fonts are selected, some
> of these characters appear as "tofu", which is ridiculous on a system
> that has fonts installed that cover all of them.

These characters may be han script, but they are not attested in CJK
documents in practice, except in contrived scenarios such as conversions
between incomplete character encodings.

> And "seldom" is in the eyes of the beholder, at least IME. When one
> has text with these characters, the absolute frequency of their
> appearance is not very relevant; what _is_ relevant is the fact that
> the character cannot be shown by Emacs.

In practice, the outcome of this principle is that no font is detected
with which to display CJK documents featuring none of these characters,
very much against the expectations of CJK users.

> If there's no fonts installed that support those representative
> characters, and Emacs is capable of finding less capable fonts that
> support some of CJK (e.g., the BMP blocks), then why is that a
> problem?

I thought I explained that Emacs is _not_ capable of doing so on
Android.

> The purpose of the change is to allow Emacs to find better fonts if
> they are installed, instead of ignoring them. How is that a Bad
> Thing?

Because it renders the `han' script incapable of matching any fonts that
are installed in practice.

> I still don't understand why this breaks Android, btw. If Emacs
> employs the fallback font specs with :lang you show above, why don't
> they work for Android?

The problem is that QClang is not available on Android, because fonts do
not provide their design languages in one of the standard TrueType
tables its font backend groks, which deficiency prompted the addition of
the font spec in question in:

  2023-02-16  Po Lu  <luangruo@yahoo.com>

    * doc/emacs/android.texi (Android Fonts):
    * doc/emacs/input.texi (On-Screen Keyboards):
    * doc/lispref/commands.texi (Misc Events): Update documentation.
    * java/org/gnu/emacs/EmacsInputConnection.java (setSelection): New
    function.
    * java/org/gnu/emacs/EmacsSurfaceView.java
    (reconfigureFrontBuffer): Make bitmap references weak
    references.
    * java/org/gnu/emacs/EmacsView.java (handleDirtyBitmap): Don't
    clear surfaceView bitmap.
    * lisp/comint.el (comint-mode): Set text-conversion-style to
    `action' so on screen keyboards' Return buttons send an actual
    key press event.
    * lisp/international/fontset.el (script-representative-chars)
    (setup-default-fontset): Improve detection of CJK fonts.

> The log message talked only about Android. If users on GNU/Linux
> report problems caused by this change, with enough details to
> understand the problems, we can reconsider the change and modify it or
> even revert. But I do need the details on those other platforms to
> think and discuss this intelligently. This is all about details, it
> isn't an abstract or academic issue. Choosing which characters to
> consider representative is frequently a judgment call based on
> practical considerations and practical problems with existing fonts.

The fact of the matter is that:

  (let ((script-representative-chars
         '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
                #x31c0 #x4e10 #x5B57 #xfe30 #xf900
                #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
    (clear-font-cache)
    (find-font (font-spec :registry "iso10646-1" :script 'han
                          :type 'xfthb))) ;; or another ftfont backend.

returns no font on an up-to-date Fedora Workstation installation with a
wealth of multilingual fonts for CJK scripts, whereas:

  (let ((script-representative-chars
         '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
                #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
    (clear-font-cache)
    (find-font (font-spec :registry "iso10646-1" :script 'han
                          :type 'xfthb)))

returns:

  #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>

which is more than adequate for editing CJK text in my language and
others.

* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-01 9:49 UTC
To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 01 Aug 2024 16:16:58 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I don't understand what you are saying. What is "my discovery"? And
> > why no font will ever match the font spec for 'han'? I've definitely
> > seen fonts that support _all_ of the characters I added to
> > script-representative-chars, so why wouldn't they be found?
>
> I wasn't implying that no font will ever support one or more of these
> characters, but that such fonts are sufficiently obscure that the
> chances of a CJK font's being located by this value of
> `script-representative-chars' is nil.

I very much doubt that. I see a few fonts on my Windows 11 system
which support all of those. I'd be very surprised to know that no
such fonts are available on a modern GNU/Linux system. Can someone
please check that?

> >> #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804
> >
> > I added them because when those sub-optimal fonts are selected, some
> > of these characters appear as "tofu", which is ridiculous on a system
> > that has fonts installed that cover all of them.
>
> These characters may be han script, but they are not attested in CJK
> documents in practice, except in contrived scenarios such as conversions
> between incomplete character encodings.

You keep saying that, but those are just your assertions. These
characters are there for a reason, and they should be supported as
well as we can.

If worse comes to worst, we could split 'han' into two or more
scripts, and have separate setup in our fontsets for them. Then each
one could use its own representative characters.

> > And "seldom" is in the eyes of the beholder, at least IME. When one
> > has text with these characters, the absolute frequency of their
> > appearance is not very relevant; what _is_ relevant is the fact that
> > the character cannot be shown by Emacs.
>
> In practice, the outcome of this principle is that no font is detected
> with which to display CJK documents featuring none of these characters,
> very much against the expectations of CJK users.

Not IME.

> > If there's no fonts installed that support those representative
> > characters, and Emacs is capable of finding less capable fonts that
> > support some of CJK (e.g., the BMP blocks), then why is that a
> > problem?
>
> I thought I explained that Emacs is _not_ capable of doing so on
> Android.

Then please design and implement a suitable solution for Android. It
is not right to punish other platforms for Android-specific issues.

> > The purpose of the change is to allow Emacs to find better fonts if
> > they are installed, instead of ignoring them. How is that a Bad
> > Thing?
>
> Because it renders the `han' script incapable of matching any fonts that
> are installed in practice.

Again, not IME.

> > I still don't understand why this breaks Android, btw. If Emacs
> > employs the fallback font specs with :lang you show above, why don't
> > they work for Android?
>
> The problem is that QClang is not available on Android, because fonts do
> not provide their design languages in one of the standard TrueType
> tables its font backend groks, which deficiency prompted the addition of
> the font spec in question in:

OK, then it means we need to work around this, but without hampering
other platforms.

> The fact of the matter is that:
>
> (let ((script-representative-chars
>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>               #x31c0 #x4e10 #x5B57 #xfe30 #xf900
>               #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>   (clear-font-cache)
>   (find-font (font-spec :registry "iso10646-1" :script 'han
>                         :type 'xfthb))) ;; or another ftfont backend.
>
> returns no font on an up-to-date Fedora Workstation installation with a
> wealth of multilingual fonts for CJK scripts, whereas:
>
> (let ((script-representative-chars
>        '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>               #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>   (clear-font-cache)
>   (find-font (font-spec :registry "iso10646-1" :script 'han
>                         :type 'xfthb)))
>
> returns:
>
> #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>
>
> which is more than adequate for editing CJK text in my language and
> others.

Not on MS-Windows: here, both of the above return

  #<font-entity harfbuzz outline Malgun\ Gothic sans iso10646-1 bold normal normal 0 nil 0 nil>

which is a lie in the latter case, since those additional characters
are not supported by this font. Given that this method is evidently
unreliable, I don't think we should consider this a proof of your
argument.
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-01 9:49 ` Eli Zaretskii @ 2024-08-01 10:30 ` Po Lu 2024-08-01 10:35 ` Eli Zaretskii 2024-08-02 10:52 ` Benjamin Riefenstahl 0 siblings, 2 replies; 36+ messages in thread From: Po Lu @ 2024-08-01 10:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Po Lu <luangruo@yahoo.com> >> Cc: emacs-devel@gnu.org >> Date: Thu, 01 Aug 2024 16:16:58 +0800 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > I don't understand what you are saying. What is "my discovery"? And >> > why no font will ever match the font spec for 'han'? I've definitely >> > seen fonts that support _all_ of the characters I added to >> > script-representative-chars, so why wouldn't they be found? >> >> I wasn't implying that no font will ever support one or more of these >> characters, but that such fonts are sufficiently obscure that the >> chances of a CJK font's being located by this value of >> `script-representative-chars' is nil. > > I very much doubt that. I see a few fonts on my Windows 11 system > which support all of those. I'd be very surprised to know that no > such fonts are available on a modern GNU/Linux system. > > Can someone please check that? I think I just did, but anyone is more than welcome to put the links I posted into Firefox. >> These characters may be han script, but they are not attested in CJK >> documents in practice, except in contrived scenarios such as conversions >> between incomplete character encodings. > > You keep saying that, but those are just your assertions. These > characters are there for a reason, and they should be supported as > well as we can. If worse comes to worst, we could split 'han' into > two or more scripts, and have separate setup in our fontsets for them. > Then each one could use its own representative characters. 
I keep saying this because I read and write CJK characters in Emacs and elsewhere, every day, without ever encountering the characters in question, since, had I done so, the tofu would not have escaped my notice. > Not IME. With respect, I think I write within compass when I say that no one has expressed enough interest in these characters to persuade Noto developers of their importance despite having nearly a decade: https://github.com/notofonts/noto-cjk/issues/13 and so these characters are likely to remain unavailable from a single font on GNU/Linux systems for the foreseeable future. >> > If there's no fonts installed that support those representative >> > characters, and Emacs is capable of finding less capable fonts that >> > support some of CJK (e.g., the BMP blocks), then why is that a >> > problem? >> >> I thought I explained that Emacs is _not_ capable of doing so on >> Android. > > Then please design and implement a suitable solution for Android. It > is not right to punish other platforms for Android-specific issues. No one is being punished or put at a disadvantage in any wise. Previously, script-representative-chars was downright ignored, an arrangement that existed till the Android port was installed and which has been restored, and no more. >> > The purpose of the change is to allow Emacs to find better fonts if >> > they are installed, instead of ignoring them. How is that a Bad >> > Thing? >> >> Because it renders the `han' script incapable of matching any fonts that >> are installed in practice. > > Again, not IME. Any GNU/Linux user is perfectly welcome to run: (dolist (char '(#x20000 #x2a700 #x2b740 #x2b820 #x2ceb0)) (insert (format "%c" char))) and reach the same conclusion I have. >> > I still don't understand why this breaks Android, btw. If Emacs >> > employs the fallback font specs with :lang you show above, why don't >> > they work for Android? 
>> >> The problem is that QClang is not available on Android, because fonts do >> not provide their design languages in one of the standard TrueType >> tables its font backend groks, which deficiency prompted the addition of >> the font spec in question in: > > OK, then it means we need to work around this, but without hampering > other platforms. Yes, that's the essence of the change installed in emacs-30. >> The fact of the matter is that: >> >> (let ((script-representative-chars >> '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 >> #x31c0 #x4e10 #x5B57 #xfe30 #xf900 >> #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804)))) >> (clear-font-cache) >> (find-font (font-spec :registry "iso10646-1" :script 'han >> :type 'xfthb))) ;; or another ftfont backend. >> >> returns no font on an up-to-date Fedora Workstation installation with a >> wealth of multilingual fonts for CJK scripts, whereas: >> >> (let ((script-representative-chars >> '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 >> #x31c0 #x4e10 #x5B57 #xfe30 #xf900)))) >> (clear-font-cache) >> (find-font (font-spec :registry "iso10646-1" :script 'han >> :type 'xfthb))) >> >> returns: >> >> #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0> >> >> which is more than adequate for editing CJK text in my language and >> others. > > Not on MS-Windows: here, both of the above return > > #<font-entity harfbuzz outline Malgun\ Gothic sans iso10646-1 bold normal normal 0 nil 0 nil> > > which is a lie in the latter case, since those additional characters > are not supported by this font. Given that this method is evidently > unreliable, I don't think we should consider this a proof of your > argument. That's queer, and I suspect something is amiss in clear-font-cache. Nevertheless, the results from Mozilla are essentially conclusive, as far as I am concerned. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-01 10:30 ` Po Lu @ 2024-08-01 10:35 ` Eli Zaretskii 2024-08-02 10:52 ` Benjamin Riefenstahl 1 sibling, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2024-08-01 10:35 UTC (permalink / raw) To: Po Lu; +Cc: emacs-devel > From: Po Lu <luangruo@yahoo.com> > Cc: emacs-devel@gnu.org > Date: Thu, 01 Aug 2024 18:30:31 +0800 > > Nevertheless, the results from Mozilla are essentially conclusive, as > far as I am concerned. Which in my case shows all of the characters. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-01 10:30 ` Po Lu 2024-08-01 10:35 ` Eli Zaretskii @ 2024-08-02 10:52 ` Benjamin Riefenstahl 2024-08-02 12:29 ` Eli Zaretskii 1 sibling, 1 reply; 36+ messages in thread From: Benjamin Riefenstahl @ 2024-08-02 10:52 UTC (permalink / raw) To: Po Lu; +Cc: Eli Zaretskii, emacs-devel Po Lu writes: > (dolist (char '(#x20000 #x2a700 #x2b740 #x2b820 #x2ceb0)) > (insert (format "%c" char))) Indeed these characters seem to be unsupported in Emacs here. They do not show up in Firefox either. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-02 10:52 ` Benjamin Riefenstahl @ 2024-08-02 12:29 ` Eli Zaretskii 2024-08-02 12:55 ` Benjamin Riefenstahl 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2024-08-02 12:29 UTC (permalink / raw) To: Benjamin Riefenstahl; +Cc: luangruo, emacs-devel > From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org > Date: Fri, 02 Aug 2024 12:52:40 +0200 > > Po Lu writes: > > (dolist (char '(#x20000 #x2a700 #x2b740 #x2b820 #x2ceb0)) > > (insert (format "%c" char))) > > Indeed these characters seem to be unsupported in Emacs here. They do > not show up in Firefox either. So find-font returns on your system a font which doesn't actually support some of the characters in script-representative-chars, is that true? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-02 12:29 ` Eli Zaretskii @ 2024-08-02 12:55 ` Benjamin Riefenstahl 2024-08-02 13:13 ` Benjamin Riefenstahl 0 siblings, 1 reply; 36+ messages in thread From: Benjamin Riefenstahl @ 2024-08-02 12:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: luangruo, emacs-devel Eli Zaretskii writes: > So find-font returns on your system a font which doesn't actually > support some of the characters in script-representative-chars, is that > true? I'm just reporting what I saw in testing earlier. Now, when I try to repeat the find-font test I get nil for both tests. This is both in my current (new) session as well as in emacs -Q. I don't know what is going on. Must have been something I did earlier in that other session. (dolist (char '(#x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400 #x31c0 #x4e10 #x5B57 #xfe30 #xf900 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0)) (insert (format "%c" char))) Still shows that the characters in the first two lines work, while the last line does not. This is all on: GNU Emacs 30.0.60 (build 5, x86_64-pc-linux-gnu, GTK+ Version 3.24.38, cairo version 1.16.0) of 2024-07-25 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-02 12:55 ` Benjamin Riefenstahl @ 2024-08-02 13:13 ` Benjamin Riefenstahl 2024-08-03 7:12 ` pipcet 0 siblings, 1 reply; 36+ messages in thread From: Benjamin Riefenstahl @ 2024-08-02 13:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: luangruo, emacs-devel Benjamin Riefenstahl writes: > I don't know what is going on. Must have been something I did earlier > in that other session. It's the order. If I first evaluate the version with fewer characters, I get a font for that and for the longer list of characters after that, too. If I do it the other way round, I get nil for both. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-02 13:13 ` Benjamin Riefenstahl
@ 2024-08-03 7:12 ` pipcet
  2024-08-03 8:52 ` Po Lu
  2024-08-03 15:15 ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: pipcet @ 2024-08-03 7:12 UTC (permalink / raw)
  To: Benjamin Riefenstahl; +Cc: Eli Zaretskii, luangruo, emacs-devel

"Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:

> Benjamin Riefenstahl writes:
>> I don't know what is going on. Must have been something I did earlier
>> in that other session.
>
> It's the order. If I first evaluate the version with fewer characters,
> I get a font for that and for the longer list of characters after that,
> too.

I've looked at that a little, and I don't think 'clear-font-cache', uh,
clears the font cache.

ftfont.c also interns random binary strings as symbols here. This helps:

diff --git a/src/ftfont.c b/src/ftfont.c
index c89feea1d46..882d3eec256 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -174,11 +174,11 @@ get_adstyle_property (FcPattern *p)
   USE_SAFE_ALLOCA;
   tmp = SAFE_ALLOCA (end - str);
   for (i = 0; i < end - str; ++i)
-    tmp[i] = ((end[i] != '?'
-               && end[i] != '*'
-               && end[i] != '"'
-               && end[i] != '-')
-              ? end[i] : ' ');
+    tmp[i] = ((str[i] != '?'
+               && str[i] != '*'
+               && str[i] != '"'
+               && str[i] != '-')
+              ? str[i] : ' ');
   adstyle = font_intern_prop (tmp, end - str, 1);
   SAFE_FREE ();
   if (font_style_to_value (FONT_WIDTH_INDEX, adstyle, 0) >= 0)

xfont.c is particularly weird: it's limited to 64k characters, of
course, but it also hardcodes 'han as a supported script for all
Japanese or Korean fonts; and xfont_has_char will return false for all
non-ASCII chars in iso10646-1 fonts that don't have "ja" or "ko"
adstyles. In addition, it has its own caching mechanism
(xfont_scripts_cache) which is never cleared, shrunk, or exposed to
Lisp.

sfntfont.c only looks at the first fixnum in a vector specified in
Vscript_representative_chars, and fails if it isn't there, even though
it should continue looking.

w32font.c seems to ignore Vscript_representative_chars entirely. This
also appears to apply to the harfbuzz backend.

GNU Unifont now supports #x20000 (since February), but is split into two
fonts, one for the BMP and one for the upper planes, so it won't be
detected here.

So I'm not sure which font backends the additional required characters
are supposed to have a positive effect on.

Pip

^ permalink raw reply related [flat|nested] 36+ messages in thread
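For readers following the ftfont.c hunk in the message above, here is a minimal standalone sketch of the corrected loop, written outside Emacs. The function name `sanitize_adstyle` and its signature are hypothetical and exist only for illustration; the real code lives in get_adstyle_property and uses SAFE_ALLOCA and font_intern_prop, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the loop in get_adstyle_property: copy the
   bytes in [str, end) into dst, replacing '?', '*', '"' and '-' with
   spaces.  dst must have room for end - str bytes.  No terminator is
   written, since the caller is assumed to pass an explicit length on,
   as the original does with font_intern_prop.  */
size_t
sanitize_adstyle (const char *str, const char *end, char *dst)
{
  assert (str != NULL && end != NULL && dst != NULL && str <= end);
  size_t len = (size_t) (end - str);
  for (size_t i = 0; i < len; ++i)
    {
      /* Index from str, the start of the token.  Indexing from end,
         as the old code did, reads the bytes that follow the token.  */
      char c = str[i];
      dst[i] = (c != '?' && c != '*' && c != '"' && c != '-') ? c : ' ';
    }
  return len;
}
```

Called on the first nine bytes of "Semi-Bold?x", this would be expected to fill the buffer with "Semi Bold" and return 9, whereas reading from `end` would have picked up "?x" and whatever follows it in memory.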
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-03 7:12 ` pipcet @ 2024-08-03 8:52 ` Po Lu 2024-08-03 9:21 ` pipcet 2024-08-03 15:15 ` Eli Zaretskii 1 sibling, 1 reply; 36+ messages in thread From: Po Lu @ 2024-08-03 8:52 UTC (permalink / raw) To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel pipcet@protonmail.com writes: > "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes: > >> Benjamin Riefenstahl writes: >>> I don't know what is going on. Must have been something I did earlier >>> in that other session. >> >> It's the order. If I first evaluate the version with fewer characters, >> I get a font for that and for the longer list of characters after that, >> too. > > I've looked at that a little, and I don't think 'clear-font-cache', uh, > clears the font cache. > > ftfont.c also interns random binary strings as symbols here. This helps: > > diff --git a/src/ftfont.c b/src/ftfont.c > index c89feea1d46..882d3eec256 100644 > --- a/src/ftfont.c > +++ b/src/ftfont.c > @@ -174,11 +174,11 @@ get_adstyle_property (FcPattern *p) > USE_SAFE_ALLOCA; > tmp = SAFE_ALLOCA (end - str); > for (i = 0; i < end - str; ++i) > - tmp[i] = ((end[i] != '?' > - && end[i] != '*' > - && end[i] != '"' > - && end[i] != '-') > - ? end[i] : ' '); > + tmp[i] = ((str[i] != '?' > + && str[i] != '*' > + && str[i] != '"' > + && str[i] != '-') > + ? str[i] : ' '); > adstyle = font_intern_prop (tmp, end - str, 1); > SAFE_FREE (); > if (font_style_to_value (FONT_WIDTH_INDEX, adstyle, 0) >= 0) [...] > sfntfont.c only looks at the first fixnum in a vector specified in > Vscript_representative_chars, and fails if it isn't there, even though > it should continue looking. These have been fixed. 
> xfont.c is particularly weird: it's limited to 64k characters, of > course, but it also hardcodes 'han as a supported script for all > Japanese or Korean fonts; and xfont_has_char will return false for all > non-ASCII chars in iso10646-1 fonts that don't have "ja" or "ko" > adstyles. In addition, it has its own caching mechanism > (xfont_scripts_cache) which is never cleared, shrunk, or exposed to > Lisp. No, xfont_has_char will return a value that indicates that the presence of the character cannot be established without opening the font. > w32font.c seems to ignore Vscript_representative_chars entirely. This > also appears to apply to the harfbuzz backend. I wouldn't tamper with either of these backends. > GNU Unifont now supports #x20000 (since February), but is split into two > fonts, one for the BMP and one for the upper planes, so it won't be > detected here. > > So I'm not sure which font backends the additional required characters > are supposed to have a positive effect on. All except w32font, I guess. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-03 8:52 ` Po Lu @ 2024-08-03 9:21 ` pipcet 2024-08-03 9:33 ` Po Lu 0 siblings, 1 reply; 36+ messages in thread From: pipcet @ 2024-08-03 9:21 UTC (permalink / raw) To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel "Po Lu" <luangruo@yahoo.com> writes: > pipcet@protonmail.com writes: > >> "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes: >> >>> Benjamin Riefenstahl writes: >>>> I don't know what is going on. Must have been something I did earlier >>>> in that other session. >>> >>> It's the order. If I first evaluate the version with fewer characters, >>> I get a font for that and for the longer list of characters after that, >>> too. >> >> I've looked at that a little, and I don't think 'clear-font-cache', uh, >> clears the font cache. >> >> ftfont.c also interns random binary strings as symbols here. This helps: >> >> diff --git a/src/ftfont.c b/src/ftfont.c >> index c89feea1d46..882d3eec256 100644 >> --- a/src/ftfont.c >> +++ b/src/ftfont.c >> @@ -174,11 +174,11 @@ get_adstyle_property (FcPattern *p) >> USE_SAFE_ALLOCA; >> tmp = SAFE_ALLOCA (end - str); >> for (i = 0; i < end - str; ++i) >> - tmp[i] = ((end[i] != '?' >> - && end[i] != '*' >> - && end[i] != '"' >> - && end[i] != '-') >> - ? end[i] : ' '); >> + tmp[i] = ((str[i] != '?' >> + && str[i] != '*' >> + && str[i] != '"' >> + && str[i] != '-') >> + ? str[i] : ' '); >> adstyle = font_intern_prop (tmp, end - str, 1); >> SAFE_FREE (); >> if (font_style_to_value (FONT_WIDTH_INDEX, adstyle, 0) >= 0) > > [...] > >> sfntfont.c only looks at the first fixnum in a vector specified in >> Vscript_representative_chars, and fails if it isn't there, even though >> it should continue looking. > > These have been fixed. Thank you! One more thing I'm noticing is that on Android, the foundry string/ach_vendor_id is interned as OG^A@ rather than GOOG. 
It seems to me that removing the second read() call in daefd6771a4879bb8e71ea67f69522700155df01 may have caused the problem. IIUC, panose[] is four-byte-aligned, but ul_unicode_range is not, and that messes up our offsets. Is that correct? Are we actually using ul_unicode_range, by the way? >> xfont.c is particularly weird: it's limited to 64k characters, of >> course, but it also hardcodes 'han as a supported script for all >> Japanese or Korean fonts; and xfont_has_char will return false for all >> non-ASCII chars in iso10646-1 fonts that don't have "ja" or "ko" >> adstyles. In addition, it has its own caching mechanism >> (xfont_scripts_cache) which is never cleared, shrunk, or exposed to >> Lisp. > > No, xfont_has_char will return a value that indicates that the presence > of the character cannot be established without opening the font. You're right; I still wonder whether this is the intended behavior... Pip ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-03 9:21 ` pipcet @ 2024-08-03 9:33 ` Po Lu 2024-08-03 13:13 ` pipcet 0 siblings, 1 reply; 36+ messages in thread From: Po Lu @ 2024-08-03 9:33 UTC (permalink / raw) To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel pipcet@protonmail.com writes: > Thank you! > > One more thing I'm noticing is that on Android, the foundry > string/ach_vendor_id is interned as OG^A@ rather than GOOG. It seems to > me that removing the second read() call in > daefd6771a4879bb8e71ea67f69522700155df01 may have caused the > problem. IIUC, panose[] is four-byte-aligned, but ul_unicode_range is > not, and that messes up our offsets. Is that correct? Right, I misunderstood FreeType's implementation. Also fixed. > Are we actually using ul_unicode_range, by the way? Not that I recall. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android 2024-08-03 9:33 ` Po Lu @ 2024-08-03 13:13 ` pipcet 2024-08-03 13:31 ` Po Lu 0 siblings, 1 reply; 36+ messages in thread From: pipcet @ 2024-08-03 13:13 UTC (permalink / raw) To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel "Po Lu" <luangruo@yahoo.com> writes: > pipcet@protonmail.com writes: > >> Thank you! >> >> One more thing I'm noticing is that on Android, the foundry >> string/ach_vendor_id is interned as OG^A@ rather than GOOG. It seems to >> me that removing the second read() call in >> daefd6771a4879bb8e71ea67f69522700155df01 may have caused the >> problem. IIUC, panose[] is four-byte-aligned, but ul_unicode_range is >> not, and that messes up our offsets. Is that correct? > > Right, I misunderstood FreeType's implementation. Also fixed. Thanks again! I really appreciate that there's a second set of eyeballs going over these before pushing to master, let alone emacs-30 :-) The next issue on my LineageOS (an Android variant) phone is that the Noto CJK fonts have an "OTTO" header, not a "true"/00010000 one, so they're not enumerated at all. At first glance, this doesn't appear to be a LineageOS quirk; downloads available elsewhere also have the OTTO header. My understanding of the source code is we currently don't support OTTO fonts at all, and my experiment in forcing the header to be recognized seems to agree with me there: I get tofu even for ASCII characters, which FontForge indicates are present in the font. I have (lossily!) converted Noto Sans CJK SC to TTF format (with a "glyf" table), and installed that in /data/data/org.gnu.emacs/files/fonts, customized the "region" face to use that font, and now I can see some Han characters when I select them, but they turn into tofu when using the default face. (It's possible this is due to the experiment I've described above and goes away when I revert it...) 
Of course, Noto CJK doesn't provide U+20000 (it does provide U+2000B), so all this is a bit off-topic. Pip ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 13:13 ` pipcet
@ 2024-08-03 13:31 ` Po Lu
  2024-08-03 14:31 ` pipcet
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-03 13:31 UTC (permalink / raw)
  To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

pipcet@protonmail.com writes:

> Thanks again! I really appreciate that there's a second set of eyeballs
> going over these before pushing to master, let alone emacs-30 :-)
>
> The next issue on my LineageOS (an Android variant) phone is that the
> Noto CJK fonts have an "OTTO" header, not a "true"/00010000 one, so
> they're not enumerated at all. At first glance, this doesn't appear to
> be a LineageOS quirk; downloads available elsewhere also have the OTTO
> header.
>
> My understanding of the source code is we currently don't support OTTO
> fonts at all, and my experiment in forcing the header to be recognized
> seems to agree with me there: I get tofu even for ASCII characters,
> which FontForge indicates are present in the font.

Yes, because OTTO fonts are actually a completely distinct format from
TTF, with a unique bytecode language for constructing glyph outlines.
See:

  https://learn.microsoft.com/en-us/typography/opentype/spec/glyphformatcomparison

> I have (lossily!) converted Noto Sans CJK SC to TTF format (with a
> "glyf" table), and installed that in
> /data/data/org.gnu.emacs/files/fonts, customized the "region" face to
> use that font, and now I can see some Han characters when I select them,
> but they turn into tofu when using the default face. (It's possible this
> is due to the experiment I've described above and goes away when I
> revert it...)
>
> Of course, Noto CJK doesn't provide U+20000 (it does provide U+2000B),
> so all this is a bit off-topic.

That's quite a circuitous means of obtaining a TTF version of the Noto
CJK fonts. These fonts are compiled into both formats from a number of
common source files, and there is an index of the options available at:

  https://github.com/notofonts/noto-cjk/tree/main/Serif

an amalgamation of all CJK variants is:

  https://github.com/googlefonts/noto-cjk/raw/main/Serif/Variable/OTC/NotoSerifCJK-VF.ttf.ttc

but since it is enormous, I suggest installing only those regional
variants which cover the scripts you require.

^ permalink raw reply [flat|nested] 36+ messages in thread
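The header distinction discussed in the message above ("OTTO" versus "true"/00010000) can be sketched as a small classifier over the first four bytes of a font file. This is an illustrative sketch, not code from sfntfont.c: the names `sfnt_flavor` and `classify_sfnt_tag` are hypothetical, and the "ttcf" collection tag is included only because the thread links a .ttc file.

```c
#include <stdint.h>

/* Hypothetical classification of the four-byte tag that opens an
   sfnt-wrapped font file.  */
enum sfnt_flavor
{
  SFNT_TRUETYPE,   /* TrueType outlines in a "glyf" table */
  SFNT_CFF,        /* "OTTO": CFF outlines, a different glyph format */
  SFNT_COLLECTION, /* "ttcf": a collection holding several fonts */
  SFNT_UNKNOWN
};

enum sfnt_flavor
classify_sfnt_tag (const unsigned char tag[4])
{
  /* The tag is stored big-endian on disk.  */
  uint32_t v = ((uint32_t) tag[0] << 24) | ((uint32_t) tag[1] << 16)
               | ((uint32_t) tag[2] << 8) | (uint32_t) tag[3];

  switch (v)
    {
    case 0x00010000: /* version 1.0 */
    case 0x74727565: /* "true" */
      return SFNT_TRUETYPE;
    case 0x4F54544F: /* "OTTO" */
      return SFNT_CFF;
    case 0x74746366: /* "ttcf" */
      return SFNT_COLLECTION;
    default:
      return SFNT_UNKNOWN;
    }
}
```

A driver that only interprets "glyf" outlines would stop at SFNT_CFF, which is consistent with the tofu described above when the header check was forced: recognizing the tag does not by itself make the outlines readable.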
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 13:31 ` Po Lu
@ 2024-08-03 14:31 ` pipcet
  2024-08-03 14:54 ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: pipcet @ 2024-08-03 14:31 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pipcet@protonmail.com writes:
>
>> Thanks again! I really appreciate that there's a second set of eyeballs
>> going over these before pushing to master, let alone emacs-30 :-)
>>
>> The next issue on my LineageOS (an Android variant) phone is that the
>> Noto CJK fonts have an "OTTO" header, not a "true"/00010000 one, so
>> they're not enumerated at all. At first glance, this doesn't appear to
>> be a LineageOS quirk; downloads available elsewhere also have the OTTO
>> header.
>>
>> My understanding of the source code is we currently don't support OTTO
>> fonts at all, and my experiment in forcing the header to be recognized
>> seems to agree with me there: I get tofu even for ASCII characters,
>> which FontForge indicates are present in the font.
>
> Yes, because OTTO fonts are actually a completely distinct format from
> TTF, with a unique bytecode language for constructing glyph outlines.
> See:
>
> https://learn.microsoft.com/en-us/typography/opentype/spec/glyphformatcomparison

Thanks! So supporting these fonts on Android using the sfnt driver is
very hard, correct?

The androidfont.c fallback driver appears to support them, but I
understand that's not a good option either. (I tried
(set-frame-parameter nil 'font-backend "android") and the difference in
a C-h h buffer was quite noticeable).

>> I have (lossily!) converted Noto Sans CJK SC to TTF format (with a
>> "glyf" table), and installed that in
>> /data/data/org.gnu.emacs/files/fonts, customized the "region" face to
>> use that font, and now I can see some Han characters when I select them,
>> but they turn into tofu when using the default face. (It's possible this
>> is due to the experiment I've described above and goes away when I
>> revert it...)
>>
>> Of course, Noto CJK doesn't provide U+20000 (it does provide U+2000B),
>> so all this is a bit off-topic.
>
> That's quite a circuitous means of obtaining a TTF version of the Noto
> CJK fonts.

Agreed, but at least the size is manageable (probably due to loss of
quality, though).

> These fonts are compiled into both formats from a number of
> common source files, and there is an index of the options available at:
>
> https://github.com/notofonts/noto-cjk/tree/main/Serif
>
> an amalgamation of all CJK variants is:
>
> https://github.com/googlefonts/noto-cjk/raw/main/Serif/Variable/OTC/NotoSerifCJK-VF.ttf.ttc
>
> but since it is enormous, I suggest installing only those regional
> variants which cover the scripts you require.

In any case, asking users to install an extra font only to see basic CJK
glyphs can't be a long-term solution, can it?

Pip

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 14:31 ` pipcet
@ 2024-08-03 14:54 ` Po Lu
  2024-08-07 17:52 ` Pip Cet
  0 siblings, 1 reply; 36+ messages in thread
From: Po Lu @ 2024-08-03 14:54 UTC (permalink / raw)
  To: pipcet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

pipcet@protonmail.com writes:

> Thanks! So supporting these fonts on Android using the sfnt driver is
> very hard, correct?

Yes, it would amount to writing a new font driver.

> The androidfont.c fallback driver appears to support them, but I
> understand that's not a good option either. (I tried
> (set-frame-parameter nil 'font-backend "android") and the difference
> in a C-h h buffer was quite noticeable).

Correct.

> In any case, asking users to install an extra font only to see basic
> CJK glyphs can't be a long-term solution, can it?

Till someone implements the aforesaid new font driver, why not? The
font is installed once, and that is the end of the matter, and
ultimately it is the same font that would ideally be loaded from
/system/fonts, with no degradation in quality.

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
  2024-08-03 14:54 ` Po Lu
@ 2024-08-07 17:52 ` Pip Cet
  2024-08-08 0:10 ` Po Lu
  0 siblings, 1 reply; 36+ messages in thread
From: Pip Cet @ 2024-08-07 17:52 UTC (permalink / raw)
  To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pipcet@protonmail.com writes:
>
>> Thanks! So supporting these fonts on Android using the sfnt driver is
>> very hard, correct?
>
> Yes, it would amount to writing a new font driver.

Just so that we have that option, I've done the minimal work necessary
to build Emacs for Android with fontconfig and the ftcrhb and ftcr font
drivers. As far as I can tell, it works now (including, regrettably,
color emoji provided in font files). Harfbuzz shaping seems to work,
too. I still have to look at what the DRAW_CURSOR stuff does...

Of course there are good reasons not to want to do this: the
sfntfont-android driver is very fast, this is not. It also requires an
additional file to configure fontconfig and a cache directory for
fontconfig's disk-based cache. Plus there are the extra dependencies...

Obviously, this would be and remain an optional feature, with the other
font drivers still available (that's not true for my test builds
including these and quite a few other changes, which currently disable
the android-specific drivers: https://codeberg.org/pipcet/emacs-android )

>> The androidfont.c fallback driver appears to support them, but I
>> understand that's not a good option either. (I tried
>> (set-frame-parameter nil 'font-backend "android") and the difference
>> in a C-h h buffer was quite noticeable).
>
> Correct.

Which devices is the androidfont.c driver currently used on? Very old
ones with 15/16 bpp?

>> In any case, asking users to install an extra font only to see basic
>> CJK glyphs can't be a long-term solution, can it?
>
> Till someone implements the aforesaid new font driver, why not? The
> font is installed once, and that is the end of the matter, and
> ultimately it is the same font that would ideally be loaded from
> /system/fonts, with no degradation in quality.

How do we know? I can't find this alleged source form for the Noto CJK
font anywhere, just binaries produced by Google. Maybe they are editing
one of the versions directly, and generating the other one from it,
which would probably mean the secondary version is degraded in quality,
but even then Google would have to make available instructions for how
to build one from the other in order for the font to be considered free
(or, if I understand the OSI definition correctly, "open source"):

> The source code must be the preferred form in which a programmer would
> modify the program. Deliberately obfuscated source code is not
> allowed. Intermediate forms such as the output of a preprocessor or
> translator are not allowed.

Anyway, here's the main part of the patch:

Pip

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-08 0:10 UTC
To: Pip Cet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> Just so that we have that option, I've done the minimal work necessary
> to build Emacs for Android with fontconfig and the ftcrhb and ftcr font
> drivers. As far as I can tell, it works now (including, regrettably,
> color emoji provided in font files). Harfbuzz shaping seems to work,
> too. I still have to look at what the DRAW_CURSOR stuff does...
>
> Of course there are good reasons not to want to do this: the
> sfntfont-android driver is very fast, this is not. It also requires an
> additional file to configure fontconfig and a cache directory for
> fontconfig's disk-based cache. Plus there are the extra dependencies...
>
> Obviously, this would be and remain an optional feature, with the other
> font drivers still available (that's not true for my test builds
> including these and quite a few other changes, which currently disable
> the android-specific drivers: https://codeberg.org/pipcet/emacs-android )

Thanks, but I won't agree to install this: the invariable rule is that
people, for foolish reasons, will begin to use this font driver, with
all its flaws and imperfections, and we will ultimately be held
responsible for its upkeep. What is truly irritating is that FreeType
is part of the OS, but that it is not stable, and that the linker is
rigged to prevent third-party programs from linking to such unstable
libraries. If not for this, I could have agreed to a version of the
ftfont driver disentangled from Fontconfig.

> Which devices is the androidfont.c driver currently used on? Very old
> ones with 15/16 bpp?

No, on non-standard operating systems where /system/fonts does not
exist. Bit depth is no object, since the OS always provides Emacs with
a 32-bit "RGBA" (which is ABGR on little-endian systems) surface on
which to draw, and handles conversion between this format and that of
the screen.

> How do we know? I can't find this alleged source form for the Noto CJK
> font anywhere, just binaries produced by Google. Maybe they are editing
> one of the versions directly, and generating the other one from it,
> which would probably mean the secondary version is degraded in quality,
> but even then Google would have to make available instructions for how
> to build one from the other in order for the font to be considered
> free (or, if I understand the OSI definition correctly, "open source")

Noto CJK is a version of Source Han Sans:

  https://github.com/adobe-fonts/source-han-sans

and its binaries are generated from the same source code by AFDKO
(though I am surprised to learn that this source is not in UFO format).
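The remark above that a 32-bit "RGBA" surface is ABGR on little-endian
systems can be illustrated with a short sketch. This is only an
illustration in Python, not code from Emacs or Android; the helper name
`pixel_as_word` and the sample channel values are hypothetical. It
stores the four channel bytes in memory in R, G, B, A order and reads
them back as a single 32-bit integer in each byte order.

```python
import struct

def pixel_as_word(r, g, b, a, byteorder):
    """Return the 32-bit integer formed by the bytes R, G, B, A in memory.

    byteorder is "<" for little-endian or ">" for big-endian.
    (Hypothetical helper for illustration; not part of Emacs.)
    """
    memory = bytes([r, g, b, a])  # byte 0 is R, byte 3 is A
    (word,) = struct.unpack(byteorder + "I", memory)
    return word

little = pixel_as_word(0x11, 0x22, 0x33, 0x44, "<")
big = pixel_as_word(0x11, 0x22, 0x33, 0x44, ">")
print(f"little-endian word: {little:#010x}")  # 0x44332211, i.e. A B G R
print(f"big-endian word:    {big:#010x}")     # 0x11223344, i.e. R G B A
```

The bytes in memory are identical in both cases; only the reading of
them as one integer differs, which is why the same surface can be
described as RGBA (by byte position) or ABGR (as a little-endian word).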
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Pip Cet @ 2024-08-09 12:33 UTC
To: Po Lu; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Just so that we have that option, I've done the minimal work necessary
>> to build Emacs for Android with fontconfig and the ftcrhb and ftcr font
>> drivers. As far as I can tell, it works now (including, regrettably,
>> color emoji provided in font files). Harfbuzz shaping seems to work,
>> too. I still have to look at what the DRAW_CURSOR stuff does...
>>
>> Of course there are good reasons not to want to do this: the
>> sfntfont-android driver is very fast, this is not. It also requires an
>> additional file to configure fontconfig and a cache directory for
>> fontconfig's disk-based cache. Plus there are the extra dependencies...
>>
>> Obviously, this would be and remain an optional feature, with the other
>> font drivers still available (that's not true for my test builds
>> including these and quite a few other changes, which currently disable
>> the android-specific drivers: https://codeberg.org/pipcet/emacs-android )
>
> Thanks, but I won't agree to install this

No problem at all. Thanks for your response.

> the invariable rule is that
> people, for foolish reasons, will begin to use this font driver, with
> all its flaws and imperfections, and we will ultimately be held
> responsible for its upkeep. What is truly irritating is that FreeType
> is part of the OS, but that it is not stable, and that the linker is
> rigged to prevent third-party programs from linking to such unstable
> libraries. If not for this, I could have agreed to a version of the
> ftfont driver disentangled from Fontconfig.

I think the decision to rely on fontconfig for the ftcr(hb) drivers has
been made, though. I must confess I haven't looked at sfnt.c very much,
but I'm surprised to find it has been made part of Emacs. Since it has,
though, we might as well use it. And it seems Noto is working on
replacing the CJK fonts by TrueType fonts.

>> Which devices is the androidfont.c driver currently used on? Very old
>> ones with 15/16 bpp?
>
> No, on non-standard operating systems where /system/fonts does not
> exist. Bit depth is no object, since the OS always provides Emacs with
> a 32-bit "RGBA" (which is ABGR on little-endian systems) surface on
> which to draw, and handles conversion between this format and that of
> the screen.

Thanks!

>> How do we know? I can't find this alleged source form for the Noto CJK
>> font anywhere, just binaries produced by Google. Maybe they are editing
>> one of the versions directly, and generating the other one from it,
>> which would probably mean the secondary version is degraded in quality,
>> but even then Google would have to make available instructions for how
>> to build one from the other in order for the font to be considered
>> free (or, if I understand the OSI definition correctly, "open source")
>
> Noto CJK is a version of Source Han Sans:

Thanks, I'm aware it's a modified version of that font.

>   https://github.com/adobe-fonts/source-han-sans
> and its binaries are generated from the same source code by AFDKO
> (though I am surprised to learn that this source is not in UFO format).

I don't think it's the "source" at all. It's an intermediate binary,
produced by a proprietary tool (as Adobe states), probably from a
similarly proprietary, actual source format.

Anyway, as for technical issues, the Type 1 font in the Source Han Sans
CID is clearly closer to the source than the TrueType fonts are. Thus,
there is some quality degradation when I, or when Google, generates a
TrueType font from it.

Pip
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-09 13:10 UTC
To: Pip Cet; +Cc: Benjamin Riefenstahl, Eli Zaretskii, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

>>   https://github.com/adobe-fonts/source-han-sans
>
>> and its binaries are generated from the same source code by AFDKO
>> (though I am surprised to learn that this source is not in UFO format).
>
> I don't think it's the "source" at all. It's an intermediate binary,
> produced by a proprietary tool (as Adobe states), probably from a
> similarly proprietary, actual source format.
>
> Anyway, as for technical issues, the Type 1 font in the Source Han Sans
> CID is clearly closer to the source than the TrueType fonts are. Thus,
> there is some quality degradation when I, or when Google, generates a
> TrueType font from it.

Be that as it may, there is no perceptible difference between the two,
if hinting is disabled for the OTF original.

At all events, I do invite interested persons to undertake implementing
support for OTF font files in a manner that does not require FreeType,
Fontconfig, or further dependencies. This might be feasible (so far as
Noto and other outline fonts are concerned) without also supporting
color glyphs.
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Eli Zaretskii @ 2024-08-03 15:15 UTC
To: pipcet; +Cc: b.riefenstahl, luangruo, emacs-devel

> Date: Sat, 03 Aug 2024 07:12:18 +0000
> From: pipcet@protonmail.com
> Cc: Eli Zaretskii <eliz@gnu.org>, luangruo@yahoo.com, emacs-devel@gnu.org
>
> w32font.c seems to ignore Vscript_representative_chars entirely. This
> also appears to apply to the harfbuzz backend.

While the font search on MS-Windows indeed cannot use
script-representative-chars, the representative characters are still
used on Windows, just in a different way and only for certain scripts.
See w32-find-non-USB-fonts.
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Benjamin Riefenstahl @ 2024-08-02 10:44 UTC
To: Po Lu; +Cc: Eli Zaretskii, emacs-devel

Hi there,

Po Lu writes:
>   (let ((script-representative-chars
>          '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>                 #x31c0 #x4e10 #x5B57 #xfe30 #xf900
>                 #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>     (clear-font-cache)
>     (find-font (font-spec :registry "iso10646-1" :script 'han
>                           :type 'xfthb))) ;; or another ftfont backend.
>
> returns no font on an up-to-date Fedora Workstation installation with a
> wealth of multilingual fonts for CJK scripts, whereas:
>
>   (let ((script-representative-chars
>          '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>                 #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>     (clear-font-cache)
>     (find-font (font-spec :registry "iso10646-1" :script 'han
>                           :type 'xfthb)))
>
> returns:
>
>   #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>

FTR, for me these both return (with the backend ftcrhb):

  #<font-entity ftcrhb GOOG Noto\ Sans\ CJK\ KR nil iso10646-1 regular normal normal 0 nil nil 0>

This is on "Debian GNU/Linux 12 (bookworm)".

benny
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Po Lu @ 2024-08-02 11:42 UTC
To: Benjamin Riefenstahl; +Cc: Eli Zaretskii, emacs-devel

Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net> writes:

> Hi there,
>
> Po Lu writes:
>>   (let ((script-representative-chars
>>          '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>>                 #x31c0 #x4e10 #x5B57 #xfe30 #xf900
>>                 #x1f210 #x20000 #x2a700 #x2b740 #x2b820 #x2ceb0 #x2f804))))
>>     (clear-font-cache)
>>     (find-font (font-spec :registry "iso10646-1" :script 'han
>>                           :type 'xfthb))) ;; or another ftfont backend.
>>
>> returns no font on an up-to-date Fedora Workstation installation with a
>> wealth of multilingual fonts for CJK scripts, whereas:
>>
>>   (let ((script-representative-chars
>>          '((han #x2e90 #x2f00 #x3010 #x3200 #x3300 #x3400
>>                 #x31c0 #x4e10 #x5B57 #xfe30 #xf900))))
>>     (clear-font-cache)
>>     (find-font (font-spec :registry "iso10646-1" :script 'han
>>                           :type 'xfthb)))
>>
>> returns:
>>
>>   #<font-entity xfthb ADBO Noto\ Sans\ CJK\ HK nil iso10646-1 medium normal normal 0 nil nil 0>
>
> FTR, for me these both return (with the backend ftcrhb):
>
>   #<font-entity ftcrhb GOOG Noto\ Sans\ CJK\ KR nil iso10646-1 regular
>   normal normal 0 nil nil 0>
>
> This is on "Debian GNU/Linux 12 (bookworm)".
>
> benny

Yes, it was concluded that this test is not reliable.
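The two `find-font' experiments quoted above turn on how a font is
matched against the representative characters of a script. The sketch
below is a toy model in Python, not Emacs code. It assumes the rule as
I understand it from the documentation of `script-representative-chars':
when the value for a script is a list, a font must cover every
character in that list. The font names and coverage sets are invented
for illustration, and `first_matching_font` is a hypothetical helper.

```python
# Representative characters for `han' as quoted in the thread.
BMP_CHARS = [0x2E90, 0x2F00, 0x3010, 0x3200, 0x3300, 0x3400,
             0x31C0, 0x4E10, 0x5B57, 0xFE30, 0xF900]
EXTRA_CHARS = [0x1F210, 0x20000, 0x2A700, 0x2B740,
               0x2B820, 0x2CEB0, 0x2F804]

def first_matching_font(fonts, representative_chars):
    """Return the name of the first font covering every character, else None.

    Toy model of the assumed "all characters required" rule; the real
    lookup in Emacs goes through the font backends.
    """
    for name, coverage in fonts:
        if all(ch in coverage for ch in representative_chars):
            return name
    return None

# Hypothetical fonts: these coverage sets are made up for the example.
fonts = [
    ("Hypothetical Latin", set(range(0x20, 0x7F))),
    ("Hypothetical CJK", set(BMP_CHARS) | {0x20000}),
]

print(first_matching_font(fonts, BMP_CHARS))                # Hypothetical CJK
print(first_matching_font(fonts, BMP_CHARS + EXTRA_CHARS))  # None
```

Under this model the outcome depends entirely on what the installed
fonts cover, so a single missing character in the longer list is enough
to make the lookup fail on one system and succeed on another.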
* Re: master bf0aeaa0d7a: Re-enable displaying `han' characters on Android
From: Andrea Corallo @ 2024-08-01 7:57 UTC
To: Po Lu; +Cc: Eli Zaretskii, emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> I've reverted the above commit. The change which added those
>> characters was not an accident: I found that Emacs would choose an
>> inappropriate (sub-optimal) font for Chinese characters because it
>> generally stops looking once it finds the first font that fulfills the
>> requirements.
>
> The reason behind your discovery is that with your choice of
> `script-representative-chars', no font will ever match this font spec
> (in the default fontset):
>
>   ,(font-spec :registry "iso10646-1" :script 'han)
>
> so that Emacs returns to the preceding ones, which specify a design
> language rather than a script:
>
>   ,(font-spec :registry "iso10646-1" :lang 'ja)
>   ,(font-spec :registry "iso10646-1" :lang 'zh)
>
> which is supported elsewhere than on Android.
>
>> The font Emacs sometimes selects due to those characters missing
>> lacked support for important Han blocks because those blocks had no
>> characters in script-representative-chars.
>
> I didn't revert your change in whole, only characters beyond the BMP
> that seldom appear in real Chinese writing; of the characters that were
> deleted:

Still, even if it is not a complete revert, if you are undoing even
partially a change by someone else, please discuss first why you'd want
to do it on the list, especially if it's a recent change.

As this discussion proves, the consequences of this change are not
trivial.

Thanks

  Andrea