* bug#31315: wrong font encoding for fallback font @ 2018-04-30 7:21 Werner LEMBERG 2018-04-30 15:13 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Werner LEMBERG @ 2018-04-30 7:21 UTC (permalink / raw) To: 31315 [-- Attachment #1: Type: Text/Plain, Size: 5948 bytes --] The attached image shows that some CJK characters are displayed incorrectly. For the used outline font Emacs reports xft:-PfEd-AR PL UKai TW MBE-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 One character (the highlighted one) is missing in this font, and Emacs uses a different font as a fallback: x:-misc-droid sans fallback-medium-r-normal--18-130-100-100-p-179-gb18030.2000-0 Note that the different font backend seems to produce the ugly rendering; the font in question is the outline font `DroidSansFallbackFull.ttf'. The problem now is that the encoding of the fallback font is not respected. In the image, the highlighted character is U+83EF, but Emacs incorrectly displays U+51BF instead. The GB 18030 bytes to represent U+51BF are \x83\xEF; this clearly shows that Emacs lacks an iconv call (or an equivalent to that); instead, it seems to simply feed the Unicode value to the font backend. * * * It's a completely different question why on my system Emacs uses a font encoded in GB 18030 as a fallback font. It's probably related to the fact that I use `mew' as my e-mail program, manually extended to cover GB 18030. Unfortunately, I wasn't able yet to trigger the issue with `emacs -Q' (which by default uses iso10646 for the fallback font). On the other hand, as soon as the problem happens, it happens with any buffer containing CJK characters not displayable with the current font, so it seems a genuine Emacs core bug. 
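The suspected failure mode described above can be reproduced outside Emacs. The following is a minimal sketch, assuming Python's standard `gb18030` codec agrees with the X11 `gb18030.2000-0` mapping for this character; it is not Emacs code, and the variable names are illustrative only. It treats the Unicode code point of the intended character as if it were already a two-byte GB 18030 code and shows which character a GB 18030 font would then display.

```python
# Sketch of the suspected failure mode: the Unicode code point is handed
# to the X font as if it were already a GB 18030 two-byte code.
intended = "\u83ef"  # the character the buffer actually contains

# What seems to happen: the code point 0x83EF is sent unchanged.
code_sent = ord(intended).to_bytes(2, "big")  # the two bytes 0x83 0xEF

# A font accessed through a gb18030.2000-0 registry interprets those two
# bytes as a GB 18030 code, which names a different character.
displayed = code_sent.decode("gb18030")

# What should be sent instead: the GB 18030 code of the intended character.
code_expected = intended.encode("gb18030")

print(f"intended  U+{ord(intended):04X}")
print(f"displayed U+{ord(displayed):04X}")  # the report observed U+51BF
print(f"sent {code_sent.hex()}, expected {code_expected.hex()}")
```

If the displayed code point printed here matches the one seen in the screenshot, that supports the reading that the conversion step is skipped rather than done incorrectly.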
In GNU Emacs 27.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.20.10) of 2018-02-12 built on linux Repository revision: 3a718ffca097b35218c3e041a94adff937f3052f Windowing system distributor 'The X.Org Foundation', version 11.0.11803000 System Description: openSUSE Leap 42.3 Recent messages: Wrote /home/wl/Mail/draft/2 Draft is prepared Sole completion next-line: End of buffer [3 times] Kill draft message? (y or n) y Saving file /home/wl/Mail/draft/2... Wrote /home/wl/Mail/draft/2 Draft was killed Quit Type C-x 1 to remove help window. Configured features: XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 THREADS LIBSYSTEMD JSON LCMS2 Important settings: value of $LANG: de_AT.UTF-8 value of $XMODIFIERS: @im=none locale-coding-system: utf-8-unix Major mode: Summary Minor modes in effect: shell-dirtrack-mode: t TeX-PDF-mode: t tooltip-mode: t global-eldoc-mode: t electric-indent-mode: t mouse-wheel-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t buffer-read-only: t column-number-mode: t transient-mark-mode: t Load-path shadows: /usr/local/share/emacs/site-lisp/thai-word hides /usr/local/share/emacs/27.0.50/lisp/language/thai-word Features: (shadow emacsbug message dired dired-loaddefs format-spec rfc822 mml mml-sec epa derived epg gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mail-utils preview prv-emacs noutline outline tex-jp wid-edit descr-text apropos font-latex plain-tex tex-buf latex easy-mmode tex-ispell tex-style tex-mode compile shell pcomplete comint ansi-color ring latexenc misearch multi-isearch shr-color color shr svg browse-url network-stream puny nsm rmc starttls tls gnutls qp pp mew-varsx mew-unix elec-pair 
edmacro kmacro rng-nxml rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap nxml-util nxml-enc xmltok sgml-mode dom hideshow cal-menu calendar cal-loaddefs mew-auth mew-config mew-imap2 mew-imap mew-nntp2 mew-nntp mew-pop mew-smtp mew-ssl mew-ssh mew-net mew-highlight mew-sort mew-fib mew-ext mew-refile mew-demo mew-attach mew-draft mew-message mew-thread mew-virtual mew-summary4 mew-summary3 mew-summary2 mew-summary mew-search mew-pick mew-passwd mew-scan mew-syntax mew-bq mew-smime mew-pgp mew-header mew-exec mew-mark mew-mime mew-edit mew-decode mew-encode mew-cache mew-minibuf mew-complete mew-addrbook mew-local mew-vars3 mew-vars2 mew-vars mew-env mew-mule3 mew-mule mew-gemacs mew-key mew-func mew-blvs mew-const mew tex dbus xml crm advice auto-loads tex-site quail help-mode cjktilde mm-util mail-prsvr disp-table finder-inf package easymenu epg-config url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json map url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote dbusbind 
inotify lcms2 dynamic-setting system-font-setting font-render-setting move-toolbar gtk x-toolkit x multi-tty make-network-process emacs) Memory information: ((conses 16 454688 29723) (symbols 48 49126 5) (miscs 40 4726 718) (strings 32 150458 3023) (string-bytes 1 3707077) (vectors 16 46771) (vector-slots 8 2048741 170478) (floats 8 303 739) (intervals 56 66880 624) (buffers 992 20) (heap 1024 70029 4352)) [-- Attachment #2: emacs.png --] [-- Type: Image/Png, Size: 79126 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-04-30 7:21 bug#31315: wrong font encoding for fallback font Werner LEMBERG @ 2018-04-30 15:13 ` Eli Zaretskii 2018-04-30 15:42 ` Andreas Schwab 2018-05-01 6:36 ` Werner LEMBERG 0 siblings, 2 replies; 23+ messages in thread From: Eli Zaretskii @ 2018-04-30 15:13 UTC (permalink / raw) To: Werner LEMBERG, Kenichi Handa; +Cc: 31315 > Date: Mon, 30 Apr 2018 09:21:06 +0200 (CEST) > From: Werner LEMBERG <wl@gnu.org> (Adding Handa-san to the discussion in the hope that he might have some comments.) > The attached image shows that some CJK characters are displayed > incorrectly. For the used outline font Emacs reports > > xft:-PfEd-AR PL UKai TW MBE-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 > > One character (the highlighted one) is missing in this font, and Emacs > uses a different font as a fallback: > > x:-misc-droid sans fallback-medium-r-normal--18-130-100-100-p-179-gb18030.2000-0 > > Note that the different font backend seems to produce the ugly > rendering; the font in question is the outline font > `DroidSansFallbackFull.ttf'. That's why it is fallback, I guess... ;-) And I think you might be mistaken in your interpretation of what "gb18030.2000" in the font name means: I think it's the font registry, not its encoding. How sure are you that the encoding of this font is indeed gb18030.2000? (I'm not an expert on this stuff.) > The problem now is that the encoding of the fallback font is not > respected. In the image, the highlighted character is U+83EF, but > Emacs incorrectly displays U+51BF instead. > > The GB 18030 bytes to represent U+51BF are \x83\xEF; this clearly > shows that Emacs lacks an iconv call (or an equivalent to that); > instead, it seems to simply feed the Unicode value to the font > backend. Tz-tz-tz, how can you even suggest something like that about Emacs ;-) If you look in xfont_encode_char, you will see that it does encode the character before handing it to the font-drawing function. 
But I see that font-encoding-alist has this to say about gb18030: ("gb18030" unicode) Does replacing that with something like this: ("gb18030" (gb18030 . unicode)) solve the problem? What we put in font-encoding-alist now was a deliberate change in Jan 2008, in response to a bug report; see http://lists.gnu.org/archive/html/emacs-devel/2008-01/msg00754.html If fonts like this one need to have characters encoded by gb18030, then I think we need to change what the value says. But this area in Emacs is under-documented, so I'm not sure I've got it right, in particular what is the effect of ENCODING and REPERTORY in this context. For most font back-ends, ENCODING is ignored, because the back-end is capable to encode the character we hand to it. But the xfont back-end indeed uses Emacs's encoding functions to do that externally to the corresponding X APIs. Which might explain why this problem, if indeed we fail to specify the correct encoding for this charset, was never reported till now: xfont is rarely if ever used. > It's a completely different question why on my system Emacs uses a > font encoded in GB 18030 as a fallback font. It's probably related to > the fact that I use `mew' as my e-mail program, manually extended to > cover GB 18030. Unfortunately, I wasn't able yet to trigger the issue > with `emacs -Q' (which by default uses iso10646 for the fallback > font). Well, we cannot try helping you to unlock this unless you tell how you "manually extended" Emacs. In general, the way to request that Emacs uses fonts you like with certain characters or charsets is by customizing your fontsets. I cannot say more without hearing the details. > On the other hand, as soon as the problem happens, it happens > with any buffer containing CJK characters not displayable with the > current font, so it seems a genuine Emacs core bug. What "problem" do you allude to here? The first (seemingly incorrect encoding) or the second (fallback to this particular font)? 
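To make the role of the ENCODING half of such an entry concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the table layout, the `font_code` helper, and the use of `None` to mean "send the code point unchanged" are all hypothetical and do not mirror how xfont.c or font-encoding-alist are actually implemented. It only illustrates why an entry that pairs the gb18030 registry with a Unicode encoding yields the wrong code for a font that expects GB 18030 codes.

```python
import re

# Hypothetical stand-in for a registry-pattern table: each entry pairs a
# regexp matched against the font registry with the codec used to turn a
# character into the code sent to the font.  None means "send the Unicode
# code point unchanged".
OLD_TABLE = [(r"gb18030", None)]
NEW_TABLE = [(r"gb18030", "gb18030")]


def font_code(char, registry, table):
    """Return the integer code that would be sent to the font for CHAR."""
    for pattern, codec in table:
        if re.search(pattern, registry):
            if codec is None:
                return ord(char)
            return int.from_bytes(char.encode(codec), "big")
    return ord(char)  # no entry matched: fall back to the code point


registry = "gb18030.2000-0"
char = "\u83ef"
old_code = font_code(char, registry, OLD_TABLE)
new_code = font_code(char, registry, NEW_TABLE)
print(hex(old_code), hex(new_code))
```

With the old table the code is simply the code point 0x83EF; with the new table it is the GB 18030 code of the same character, which is what a font exposed under that registry expects.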
* bug#31315: wrong font encoding for fallback font 2018-04-30 15:13 ` Eli Zaretskii @ 2018-04-30 15:42 ` Andreas Schwab 2018-04-30 19:26 ` Eli Zaretskii 2018-05-01 6:36 ` Werner LEMBERG 1 sibling, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2018-04-30 15:42 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 31315 On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote: > If you look in xfont_encode_char, you will see that it does encode the > character before handing it to the font-drawing function. But I see > that font-encoding-alist has this to say about gb18030: > > ("gb18030" unicode) > > Does replacing that with something like this: > > ("gb18030" (gb18030 . unicode)) > > solve the problem? I think it should be gb18030-2-byte, both as encoding and repertory. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-04-30 15:42 ` Andreas Schwab @ 2018-04-30 19:26 ` Eli Zaretskii 2018-04-30 20:03 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2018-04-30 19:26 UTC (permalink / raw) To: Andreas Schwab; +Cc: 31315 > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: Werner LEMBERG <wl@gnu.org>, Kenichi Handa <handa@gnu.org>, 31315@debbugs.gnu.org > Date: Mon, 30 Apr 2018 17:42:50 +0200 > > On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote: > > > Does replacing that with something like this: > > > > ("gb18030" (gb18030 . unicode)) > > > > solve the problem? > > I think it should be gb18030-2-byte, both as encoding and repertory. Well, if we want to limit the repertory, then why not gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030? But I agree that it would be good to try that, and maybe both. Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-04-30 19:26 ` Eli Zaretskii @ 2018-04-30 20:03 ` Andreas Schwab 2018-05-01 2:37 ` Eli Zaretskii 2018-05-01 6:47 ` Werner LEMBERG 0 siblings, 2 replies; 23+ messages in thread From: Andreas Schwab @ 2018-04-30 20:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 31315 On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote: >> From: Andreas Schwab <schwab@linux-m68k.org> >> Cc: Werner LEMBERG <wl@gnu.org>, Kenichi Handa <handa@gnu.org>, 31315@debbugs.gnu.org >> Date: Mon, 30 Apr 2018 17:42:50 +0200 >> >> On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote: >> >> > Does replacing that with something like this: >> > >> > ("gb18030" (gb18030 . unicode)) >> > >> > solve the problem? >> >> I think it should be gb18030-2-byte, both as encoding and repertory. > > Well, if we want to limit the repertory, then why not > gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030? All gb18030 encoded fonts I found only provide 2-byte codes. That may be an inherent limitation. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-04-30 20:03 ` Andreas Schwab @ 2018-05-01 2:37 ` Eli Zaretskii 2018-05-01 6:47 ` Werner LEMBERG 1 sibling, 0 replies; 23+ messages in thread From: Eli Zaretskii @ 2018-05-01 2:37 UTC (permalink / raw) To: Andreas Schwab; +Cc: 31315 > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: wl@gnu.org, handa@gnu.org, 31315@debbugs.gnu.org > Date: Mon, 30 Apr 2018 22:03:42 +0200 > > >> > Does replacing that with something like this: > >> > > >> > ("gb18030" (gb18030 . unicode)) > >> > > >> > solve the problem? > >> > >> I think it should be gb18030-2-byte, both as encoding and repertory. > > > > Well, if we want to limit the repertory, then why not > > gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030? > > All gb18030 encoded fonts I found only provide 2-byte codes. That may > be an inherent limitation. Thanks. Werner, can you try that, please? ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-04-30 20:03 ` Andreas Schwab 2018-05-01 2:37 ` Eli Zaretskii @ 2018-05-01 6:47 ` Werner LEMBERG 2018-05-01 8:13 ` Andreas Schwab 1 sibling, 1 reply; 23+ messages in thread From: Werner LEMBERG @ 2018-05-01 6:47 UTC (permalink / raw) To: schwab; +Cc: 31315 >>> I think it should be gb18030-2-byte, both as encoding and repertory. >> >> Well, if we want to limit the repertory, then why not >> gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030? > > All gb18030 encoded fonts I found only provide 2-byte codes. You actually have found a font encoded in GB18030? Which one? I have never seen that. > That may be an inherent limitation. The limitation is not 2-byte codes but the number of glyphs in a font. An .otf or .ttf font can't hold more than 2^16 glyphs. It is thus common to have separate fonts for accessing glyphs from U+10000 and higher. Werner ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-05-01 6:47 ` Werner LEMBERG @ 2018-05-01 8:13 ` Andreas Schwab 2018-05-01 9:11 ` Werner LEMBERG 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2018-05-01 8:13 UTC (permalink / raw) To: Werner LEMBERG; +Cc: 31315 On Mai 01 2018, Werner LEMBERG <wl@gnu.org> wrote: > You actually have found a font encoded in GB18030? Which one? I have > never seen that. The same that you have plus two others. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-05-01 8:13 ` Andreas Schwab @ 2018-05-01 9:11 ` Werner LEMBERG 2018-05-01 15:00 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Werner LEMBERG @ 2018-05-01 9:11 UTC (permalink / raw) To: schwab; +Cc: 31315 >> You actually have found a font encoded in GB18030? Which one? I have >> never seen that. > > The same that you have plus two others. But those fonts are *not* encoded in GB18030 at all. It is the X11 font interface that provides GB18030 encoding access. BTW, looking into my `/usr/share/fonts/encodings/large' directory, I see a file `gb18030.2000-0.enc.gz', which contains only two-byte entries. In other words, it omits all three- and four-byte entries, thus covering only a subset of GB18030. Werner ^ permalink raw reply [flat|nested] 23+ messages in thread
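The effect of a mapping that stops at two-byte codes can be illustrated with a few sample characters. This is a minimal sketch that uses Python's standard `gb18030` codec rather than the X11 encoding file itself, so it shows the code lengths defined by the encoding, not the contents of `gb18030.2000-0.enc.gz`. The sample labels are illustrative.

```python
# Count how many bytes GB 18030 needs for a few sample characters.
samples = {
    "ASCII letter A": "A",
    "BMP ideograph U+4E00": "\u4e00",
    "BMP ideograph U+83EF": "\u83ef",
    "supplementary ideograph U+20000": "\U00020000",
}

lengths = {name: len(ch.encode("gb18030")) for name, ch in samples.items()}

for name, n in lengths.items():
    print(f"{name}: {n} byte(s)")

# Characters that a mapping limited to two-byte codes could not address.
unreachable = [name for name, n in lengths.items() if n > 2]
print("not reachable with two-byte codes:", unreachable)
```

The ideographs discussed in this thread fall in the two-byte range, so a two-byte-only mapping is sufficient for them, while characters beyond the BMP need the longer codes.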
* bug#31315: wrong font encoding for fallback font 2018-05-01 9:11 ` Werner LEMBERG @ 2018-05-01 15:00 ` Eli Zaretskii 2018-05-01 17:42 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2018-05-01 15:00 UTC (permalink / raw) To: Werner LEMBERG; +Cc: 31315, schwab > Date: Tue, 01 May 2018 11:11:30 +0200 (CEST) > Cc: eliz@gnu.org, handa@gnu.org, 31315@debbugs.gnu.org > From: Werner LEMBERG <wl@gnu.org> > > > >> You actually have found a font encoded in GB18030? Which one? I have > >> never seen that. > > > > The same that you have plus two others. > > But those fonts are *not* encoded in GB18030 at all. It is the X11 > font interface that provides GB18030 encoding access. > > BTW, looking into my `/usr/share/fonts/encodings/large' directory, I > see a file `gb18030.2000-0.enc.gz', which contains only two-byte > entries. In other words, it omits all three- and four-byte entries, > thus covering only a subset of GB18030. Andreas, given that '("gb18030" (gb18030 . unicode))' appears to work, do you see any potential problems with using it, instead of gb18030-2-byte? Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-05-01 15:00 ` Eli Zaretskii @ 2018-05-01 17:42 ` Andreas Schwab 2018-05-05 8:57 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2018-05-01 17:42 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 31315 On Mai 01 2018, Eli Zaretskii <eliz@gnu.org> wrote: > Andreas, given that '("gb18030" (gb18030 . unicode))' appears to work, > do you see any potential problems with using it, instead of > gb18030-2-byte? I leave that question to those who are more knowledgeable of the X font interface. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-05-01 17:42 ` Andreas Schwab @ 2018-05-05 8:57 ` Eli Zaretskii 0 siblings, 0 replies; 23+ messages in thread From: Eli Zaretskii @ 2018-05-05 8:57 UTC (permalink / raw) To: Andreas Schwab; +Cc: 31315-done > From: Andreas Schwab <schwab@linux-m68k.org> > Cc: Werner LEMBERG <wl@gnu.org>, handa@gnu.org, 31315@debbugs.gnu.org > Date: Tue, 01 May 2018 19:42:16 +0200 > > On Mai 01 2018, Eli Zaretskii <eliz@gnu.org> wrote: > > > Andreas, given that '("gb18030" (gb18030 . unicode))' appears to work, > > do you see any potential problems with using it, instead of > > gb18030-2-byte? > > I leave that question to those who are more knowledgeable of the X font > interface. I invite those people to chime in. Meanwhile, I pushed that change to the master branch, and I'm boldly marking this bug "done". Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-04-30 15:13 ` Eli Zaretskii 2018-04-30 15:42 ` Andreas Schwab @ 2018-05-01 6:36 ` Werner LEMBERG 2018-05-01 15:22 ` Eli Zaretskii 1 sibling, 1 reply; 23+ messages in thread From: Werner LEMBERG @ 2018-05-01 6:36 UTC (permalink / raw) To: eliz; +Cc: 31315 > And I think you might be mistaken in your interpretation of what > "gb18030.2000" in the font name means: I think it's the font registry, > not its encoding. Yes, but the font registry implies the used encoding to access the font. > How sure are you that the encoding of this font is indeed > gb18030.2000? Quite sure. To be more precise: The real encoding of the font is irrelevant (the Droid Sans Fallback font is a standard TrueType font that has only a Unicode cmap); what matters is how the font backend provides the font to the client. Calling `xlsfonts' I see that X11 offers access as follows. -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0 -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0 >> The problem now is that the encoding of the fallback font is not >> respected. In the image, the highlighted character is U+83EF, but >> Emacs incorrectly displays U+51BF instead. >> >> The GB 18030 bytes to represent U+51BF are \x83\xEF; this clearly >> shows that Emacs lacks an iconv call (or an equivalent to that); >> instead, it seems to simply feed the Unicode value to the font >> backend. 
> > Tz-tz-tz, how can you even suggest something like that about Emacs ;-) > > If you look in xfont_encode_char, you will see that it does encode > the character before handing it to the font-drawing function. But I > see that font-encoding-alist has this to say about gb18030: > > ("gb18030" unicode) > > Does replacing that with something like this: > > ("gb18030" (gb18030 . unicode)) > > solve the problem? Yes, it seems so. > What we put in font-encoding-alist now was a deliberate change in > Jan 2008, in response to a bug report; see > > http://lists.gnu.org/archive/html/emacs-devel/2008-01/msg00754.html > > If fonts like this one need to have characters encoded by gb18030, > then I think we need to change what the value says. As can be seen above, the font itself doesn't need GB18030. It's the font backend that provides this encoding, and Emacs accesses it. > But this area in Emacs is under-documented, so I'm not sure I've > got it right, in particular what is the effect of ENCODING and > REPERTORY in this context. For most font back-ends, ENCODING is > ignored, because the back-end is capable to encode the character we > hand to it. But the xfont back-end indeed uses Emacs's encoding > functions to do that externally to the corresponding X APIs. Which > might explain why this problem, if indeed we fail to specify the > correct encoding for this charset, was never reported till now: > xfont is rarely if ever used. Emacs doesn't fail to specify the correct encoding. The problem is that it feeds the font backend with characters in the wrong encoding (namely Unicode instead of GB 18030). >> It's a completely different question why on my system Emacs uses a >> font encoded in GB 18030 as a fallback font. It's probably related >> to the fact that I use `mew' as my e-mail program, manually >> extended to cover GB 18030. Unfortunately, I wasn't able yet to >> trigger the issue with `emacs -Q' (which by default uses iso10646 >> for the fallback font). 
>
> Well, we cannot try helping you to unlock this unless you tell how
> you "manually extended" Emacs.

Oh, I haven't extended Emacs, sorry for the bad wording. I've simply
added a line to mew's elisp code to make it recognize GB18030 in
e-mails.

> In general, the way to request that Emacs uses fonts you like with
> certain characters or charsets is by customizing your fontsets. I
> cannot say more without hearing the details.

I don't have any fontsets customized in my `.emacs' file.

>> On the other hand, as soon as the problem happens, it happens with
>> any buffer containing CJK characters not displayable with the
>> current font, so it seems a genuine Emacs core bug.
>
> What "problem" do you allude to here? The first (seemingly
> incorrect encoding) or the second (fallback to this particular
> font)?

Both. If I open a new Unicode-encoded file, Emacs continues to use
GB18030.2000 as the charset registry/encoding for displaying fallback
characters, failing to convert Unicode to GB18030 before accessing the
characters from the font backend.


    Werner

^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#31315: wrong font encoding for fallback font 2018-05-01 6:36 ` Werner LEMBERG @ 2018-05-01 15:22 ` Eli Zaretskii 2018-05-01 19:30 ` Werner LEMBERG 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2018-05-01 15:22 UTC (permalink / raw) To: Werner LEMBERG; +Cc: 31315 > Date: Tue, 01 May 2018 08:36:44 +0200 (CEST) > Cc: handa@gnu.org, 31315@debbugs.gnu.org > From: Werner LEMBERG <wl@gnu.org> > > > And I think you might be mistaken in your interpretation of what > > "gb18030.2000" in the font name means: I think it's the font registry, > > not its encoding. > > Yes, but the font registry implies the used encoding to access the > font. Having said that, you seem to contradict yourself right away: > The real encoding of the font is irrelevant (the Droid Sans Fallback > font is a standard TrueType font that has only a Unicode cmap); So I still think we may be miscommunicating. > what matters is how the font backend provides the font to the > client. Calling `xlsfonts' I see that X11 offers access as follows. > > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0 > -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0 I think we have a terminology problem here, most probably my fault. What exactly do you mean when you say "font backend" in this context? And what is "the client" in this case? 
I'm afraid using xlsfonts doesn't help me understand what am I missing, because I have only a vague idea of what that command does, beyond the basic fact that it lists fonts. > > What we put in font-encoding-alist now was a deliberate change in > > Jan 2008, in response to a bug report; see > > > > http://lists.gnu.org/archive/html/emacs-devel/2008-01/msg00754.html > > > > If fonts like this one need to have characters encoded by gb18030, > > then I think we need to change what the value says. > > As can be seen above, the font itself doesn't need GB18030. It's the > font backend that provides this encoding, and Emacs accesses it. In my terminology, "font backend" is in Emacs (xfont.c, xftfont.c, etc.), and the encoding happens in the backend, guided by font-encoding-alist, among other things. And your OP vs the experiment with changing font-encoding-alist clearly shows that encoding characters correctly for the xfont backend _is_ required to display the correct glyphs with fonts handled by that backend. > > But this area in Emacs is under-documented, so I'm not sure I've > > got it right, in particular what is the effect of ENCODING and > > REPERTORY in this context. For most font back-ends, ENCODING is > > ignored, because the back-end is capable to encode the character we > > hand to it. But the xfont back-end indeed uses Emacs's encoding > > functions to do that externally to the corresponding X APIs. Which > > might explain why this problem, if indeed we fail to specify the > > correct encoding for this charset, was never reported till now: > > xfont is rarely if ever used. > > Emacs doesn't fail to specify the correct encoding. The problem is > that it feeds the font backend with characters in the wrong encoding > (namely Unicode instead of GB 18030). "Fails to specify the correct encoding" is the reason why it uses wrong encoding for the characters in the font backend xfont.c. I believe this is again a terminology problem. 
> >> It's a completely different question why on my system Emacs uses a > >> font encoded in GB 18030 as a fallback font. It's probably related > >> to the fact that I use `mew' as my e-mail program, manually > >> extended to cover GB 18030. Unfortunately, I wasn't able yet to > >> trigger the issue with `emacs -Q' (which by default uses iso10646 > >> for the fallback font). > > > > Well, we cannot try helping you to unlock this unless you tell how > > you "manually extended" Emacs. > > Oh, I haven't extended Emacs, sorry for the bad wording. I've simply > added a line to mew's elisp code to make it recognize GB18030 in > e-mails. If you received a GB18030 encoded email, it is expected that Emacs will try to find a font that explicitly supports GB18030. This is a feature that AFAIU is very important to CJK users: they expect Emacs to select a font that declares support for the character's charset as set by the decoding machinery. > > In general, the way to request that Emacs uses fonts you like with > > certain characters or charsets is by customizing your fontsets. I > > cannot say more without hearing the details. > > I don't have any fontsets customized in my `.emacs' file. Well, it sounds like you should. Emacs chooses fonts using techniques that prefer speed to accuracy, and if that gives suboptimal results, the way to improve them is to guide Emacs by tailoring your fontset to the fonts you have installed and to the visual appearance you happen to like. > >> On the other hand, as soon as the problem happens, it happens with > >> any buffer containing CJK characters not displayable with the > >> current font, so it seems a genuine Emacs core bug. > > > > What "problem" do you allude to here? The first (seemingly > > incorrect encoding) or the second (fallback to this particular > > font)? > > Both. 
> If I open a new Unicode-encoded file, Emacs continues to
> use GB18030.2000 as the charset registry/encoding for displaying
> fallback characters, failing to convert Unicode to GB18030 before
> accessing the characters from the font backend.

The former part is not a bug at all. When Emacs needs to display a
character that is not supported by the frame's default font, it first
tries all the fonts it already has loaded, before it searches the rest
of the fonts on your system. So once the GB18030.2000 font is loaded,
Emacs will use it for any character not supported by other loaded
fonts.

Or did I miss something?

^ permalink raw reply [flat|nested] 23+ messages in thread
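The lookup order described in this message can be sketched as follows. This is a toy model only: the `pick_font` helper, the font names, and the coverage sets are hypothetical, and real font selection in Emacs involves fontsets and other criteria that are not modelled here. It shows just the one property under discussion, namely that an already loaded font wins over fonts that have not been loaded yet.

```python
def pick_font(char, loaded_fonts, system_fonts):
    """Return the name of the first font covering CHAR, or None.

    Fonts are modelled as (name, set_of_characters) pairs; already
    loaded fonts are tried before the rest of the system fonts.
    """
    for name, coverage in loaded_fonts:
        if char in coverage:
            return name
    for name, coverage in system_fonts:
        if char in coverage:
            return name
    return None


# Hypothetical coverage data, for illustration only.
loaded = [("default", {"a", "b"}), ("droid-gb18030", {"\u83ef", "\u4e00"})]
system = [("other-cjk", {"\u83ef"})]

choice = pick_font("\u83ef", loaded, system)
print(choice)
```

In this model, once the GB18030 variant has been loaded for one buffer, it is picked for every later character it covers, which matches the behaviour reported for newly opened files.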
* bug#31315: wrong font encoding for fallback font
  2018-05-01 15:22       ` Eli Zaretskii
@ 2018-05-01 19:30         ` Werner LEMBERG
  2018-05-02  7:27           ` Werner LEMBERG
  2018-05-02 15:22           ` Eli Zaretskii
  0 siblings, 2 replies; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-01 19:30 UTC (permalink / raw)
  To: eliz; +Cc: 31315

>> what matters is how the font backend provides the font to the
>> client. Calling `xlsfonts' I see that X11 offers access as
>> follows.
>>
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0
>> -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0
>
> I think we have a terminology problem here, most probably my fault.
> What exactly do you mean when you say "font backend" in this
> context? And what is "the client" in this case?

OK, sorry. I mean the X11 font backend. Here's my global picture.

          gb18030           unicode
  Emacs -----------> xft ------------> DroidSansFallback.ttf

For me, Emacs is a client of the xft font interface. In our
particular case, xft provides `DroidSansFallback.ttf' to Emacs as a
font encoded in GB18030 – Emacs obviously has requested a font in this
encoding. Behind the scenes, however, xft communicates with the
`DroidSansFallback.ttf' font using Unicode (the font has no other
cmap).

> If you received a GB18030 encoded email, it is expected that Emacs
> will try to find a font that explicitly supports GB18030.
> > This is a feature that AFAIU is very important to CJK users: they > expect Emacs to select a font that declares support for the > character's charset as set by the decoding machinery. While this is correct for other CJK encodings like GB, JIS, KSC, or Big5, it is *not* true for GB18030. This is *only* an encoding and *not* a charset! It is simply another representation of Unicode, comparable to UTF-8 or UCS4. There doesn't exist a single font natively encoded in GB18030! This encoding only exists to be code-wise backward compatible with GB 2312. To a certain extent it is valid to assume that a user of GB18030 expects Chinese glyph representation forms for characters in the CJK range. However, since full Unicode is supported, this assumption is rather weak. The X11 interface is too old actually to handle GB18030 correctly. For example, on my GNU/Linux box xft offers the following: -adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0 As the `jp' in the name indicates this font contains Japanese glyph representation forms. Since `Noto Sans CJK' provides all CJK glyphs in the BMP, xft happily tags it with GB18030... >> > In general, the way to request that Emacs uses fonts you like >> > with certain characters or charsets is by customizing your >> > fontsets. I cannot say more without hearing the details. >> >> I don't have any fontsets customized in my `.emacs' file. > > Well, it sounds like you should. Emacs chooses fonts using > techniques that prefer speed to accuracy, and if that gives > suboptimal results, the way to improve them is to guide Emacs by > tailoring your fontset to the fonts you have installed and to the > visual appearance you happen to like. For the purpose of reporting this bug I thought it would be best to not use further deviations of `emacs -Q'... >> Both. 
If I open a new file Unicode encoded file, Emacs continues >> to use GB18030.2000 as the charset registry/encoding for displaying >> fallback characters, failing to convert Unicode to GB18030 before >> accessing the characters from the font backend. > > The former part is not a bug at all. I agree. I only wanted to tell you what I observe. Werner ^ permalink raw reply [flat|nested] 23+ messages in thread
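Editorial illustration: the claim that GB18030 is just another encoding form of Unicode, and the missing conversion step from the original report, can be tried out with Python's standard `gb18030` codec. This is a minimal sketch; it only mimics the reported symptom and says nothing about where the conversion would belong in Emacs or in the font backend.

```python
# U+83EF is the character the buffer contains (taken from the original report).
wanted = "\u83ef"

# What a font addressed through a GB18030 registry expects: the GB18030 code.
gb_bytes = wanted.encode("gb18030")
print("GB18030:", gb_bytes.hex())

# The same character in two other Unicode encoding forms, for comparison.
print("UTF-8:  ", wanted.encode("utf-8").hex())
print("UTF-16: ", wanted.encode("utf-16-be").hex())

# The reported symptom: the code point value 0x83EF is handed over as if it
# were already a two-byte GB18030 code, with no conversion in between.
raw = ord(wanted).to_bytes(2, "big")
shown = raw.decode("gb18030")
print(f"wanted U+{ord(wanted):04X}, displayed U+{ord(shown):04X}")
```

The last line prints the code point that ends up on screen when the conversion is skipped; the original report identifies it as U+51BF.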
* bug#31315: wrong font encoding for fallback font
From: Werner LEMBERG @ 2018-05-02 7:27 UTC
To: eliz; +Cc: 31315

> To a certain extent it is valid to assume that a user of GB18030
> expects Chinese glyph representation forms for characters in the CJK
> range. However, since full Unicode is supported, this assumption is
> rather weak.
>
> The X11 interface is too old actually to handle GB18030 correctly.

Oops, please ignore this sentence, which I forgot to delete.

> For example, on my GNU/Linux box xft offers the following:
>
>   -adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0
>
> As the `jp' in the name indicates this font contains Japanese glyph
> representation forms. Since `Noto Sans CJK' provides all CJK glyphs
> in the BMP, xft happily tags it with GB18030...


    Werner
* bug#31315: wrong font encoding for fallback font
From: Eli Zaretskii @ 2018-05-02 15:22 UTC
To: Werner LEMBERG; +Cc: 31315

> Date: Tue, 01 May 2018 21:30:14 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
>
> > I think we have a terminology problem here, most probably my fault.
> > What exactly do you mean when you say "font backend" in this
> > context? And what is "the client" in this case?
>
> OK, sorry. I mean the X11 font backend. Here's my global picture.
>
>          gb18030          unicode
>   Emacs -----------> xft ------------> DroidSansFallback.ttf
>
> For me, Emacs is a client of the xft font interface. In our
> particular case, xft provides `DroidSansFallback.ttf' to Emacs as a
> font encoded in GB18030 – Emacs obviously has requested a font in this
> encoding. Behind the scenes, however, xft communicates with the
> `DroidSansFallback.ttf' font using Unicode (the font has no other
> cmap).

If by "xft" you mean the part of the X libraries that supports the
APIs used by xfont.c, then I think we are on the same page now.

> > If you received a GB18030 encoded email, it is expected that Emacs
> > will try to find a font that explicitly supports GB18030.
> >
> > This is a feature that AFAIU is very important to CJK users: they
> > expect Emacs to select a font that declares support for the
> > character's charset as set by the decoding machinery.
>
> While this is correct for other CJK encodings like GB, JIS, KSC, or
> Big5, it is *not* true for GB18030. This is *only* an encoding and
> *not* a charset! It is simply another representation of Unicode,
> comparable to UTF-8 or UCS4. There doesn't exist a single font
> natively encoded in GB18030! This encoding only exists to be
> code-wise backward compatible with GB 2312.

Maybe so, but GB18030 is a Chinese encoding, and as such it behaves in
Emacs as all the other Chinese encodings.

Emacs employs that logic for every charset it has defined, including
Latin-2, for example: if text was decoded from an encoding which
supports a particular charset, Emacs puts the corresponding 'charset'
text property on the decoded text, and the machinery which selects the
appropriate font tries first to find a font which supports that
charset. The idea is that users in a particular culture have certain
distinct preferences wrt fonts, and that an encoding that supports a
certain charset or culture provides a hint about those preferences.
This idea is very central in how Emacs selects fonts.

> To a certain extent it is valid to assume that a user of GB18030
> expects Chinese glyph representation forms for characters in the CJK
> range. However, since full Unicode is supported, this assumption is
> rather weak.

Weak or not, Emacs tries to heed it.

> >> I don't have any fontsets customized in my `.emacs' file.
> >
> > Well, it sounds like you should. Emacs chooses fonts using
> > techniques that prefer speed to accuracy, and if that gives
> > suboptimal results, the way to improve them is to guide Emacs by
> > tailoring your fontset to the fonts you have installed and to the
> > visual appearance you happen to like.
>
> For the purpose of reporting this bug I thought it would be best to
> not use further deviations of `emacs -Q'...

My comment was not in the context of the bug report (where your
assumption is absolutely correct); it is rather a response to your
broader complaint regarding an ugly font that creeps into the display
of text which was encoded in GB18030. You can tell Emacs to use other
fonts for that charset by customizing your fontset.

> >> Both. If I open a new file Unicode encoded file, Emacs continues
> >> to use GB18030.2000 as the charset registry/encoding for displaying
> >> fallback characters, failing to convert Unicode to GB18030 before
> >> accessing the characters from the font backend.
> >
> > The former part is not a bug at all.
>
> I agree. I only wanted to tell you what I observe.

Well, you called that a "problem". I understand that we now agree the
first part is not a problem in itself.
* bug#31315: wrong font encoding for fallback font
From: Werner LEMBERG @ 2018-05-03 5:52 UTC
To: eliz; +Cc: 31315

> If by "xft" you mean the part of the X libraries that supports the
> APIs used by xfont.c, then I think we are on the same page now.

OK.

>> While this is correct for other CJK encodings like GB, JIS, KSC, or
>> Big5, it is *not* true for GB18030. This is *only* an encoding and
>> *not* a charset! It is simply another representation of Unicode,
>> comparable to UTF-8 or UCS4. There doesn't exist a single font
>> natively encoded in GB18030! This encoding only exists to be
>> code-wise backward compatible with GB 2312.
>
> Maybe so, but GB18030 is a Chinese encoding, and as such it behaves
> in Emacs as all the other Chinese encodings.

I know, and I agree. BUT! xft doesn't do what Emacs expects. *Any*
font that covers the whole BMP (in particular, the whole CJK part of
it) gets a `GB18030' tag from xft. In other words, the `Chinese'
property isn't in the font from the very beginning.[*]

> Emacs employs that logic for every charset it has defined, including
> Latin-2, for example: if text was decoded from an encoding which
> supports a particular charset, Emacs puts the corresponding
> 'charset' text property on the decoded text, and the machinery which
> selects the appropriate font tries first to find a font which
> supports that charset. The idea is that users in a particular
> culture have certain distinct preferences wrt fonts, and that an
> encoding that supports a certain charset or culture provides a hint
> about those preferences. This idea is very central in how Emacs
> selects fonts.

Being the FreeType maintainer, and having co-developed Emacs's
internal buffer encoding scheme many, many years ago, I know all this.
I can only repeat that Emacs might tag a certain text with GB18030 so
that the user can deduce a Chinese origin. However, there is *no*
guarantee that the user gets a Chinese-flavoured font – at least not
from the xft interface.[**]

As a corollary, it is fully sufficient for xft to handle GB18030 equal
to Unicode (i.e., `iso10646').


    Werner


[*] Actually, having Unicode fonts that provide CJK glyphs for the
    whole BMP completely spoils Emacs's font selection scheme based on
    charsets – as shown in one of my previous e-mails, xft provides
    all common CJK encodings for such fonts because Unicode is a
    superset of those encodings.

[**] If, say, the Pango font interface is used instead to access a
     modern CJK OpenType font, Emacs might request `script=hani,
     lang=ZHS' if it encounters GB18030 to resolve Unicode's Han
     unification, ensuring simplified Chinese glyph representation
     forms.
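Editorial illustration: the point that one outline font is offered under many registries can be checked by splitting the XLFD names quoted earlier in the thread into their 14 fields and grouping them by family. This is a small sketch; `parse_xlfd` is a hypothetical helper (not an X11 or Emacs function) and it assumes that no field contains a hyphen, which holds for the names quoted here.

```python
from collections import defaultdict

XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "registry", "encoding",
)


def parse_xlfd(name):
    """Split an XLFD font name into its 14 named fields."""
    parts = name.split("-")
    # A well-formed name starts with "-", so parts[0] is the empty string.
    if len(parts) != len(XLFD_FIELDS) + 1 or parts[0] != "":
        raise ValueError(f"not a 14-field XLFD name: {name!r}")
    return dict(zip(XLFD_FIELDS, parts[1:]))


# A subset of the names quoted in the thread.
names = [
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1",
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0",
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0",
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1",
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0",
    "-adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0",
]

# Group the advertised registry/encoding pairs by font family.
by_family = defaultdict(list)
for name in names:
    fields = parse_xlfd(name)
    by_family[fields["family"]].append(
        f"{fields['registry']}-{fields['encoding']}")

for family, registries in by_family.items():
    print(f"{family}: {', '.join(registries)}")
```

The grouping shows the same family under Chinese, Japanese, and Unicode registries at once, and a font whose name says `jp' under a GB18030 registry, so the registry by itself carries little information about the glyph style.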
* bug#31315: wrong font encoding for fallback font
From: Eli Zaretskii @ 2018-05-03 17:48 UTC
To: Werner LEMBERG; +Cc: 31315

> Date: Thu, 03 May 2018 07:52:27 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
>
> > Emacs employs that logic for every charset it has defined, including
> > Latin-2, for example: if text was decoded from an encoding which
> > supports a particular charset, Emacs puts the corresponding
> > 'charset' text property on the decoded text, and the machinery which
> > selects the appropriate font tries first to find a font which
> > supports that charset. The idea is that users in a particular
> > culture have certain distinct preferences wrt fonts, and that an
> > encoding that supports a certain charset or culture provides a hint
> > about those preferences. This idea is very central in how Emacs
> > selects fonts.
>
> Being the FreeType maintainer, and having co-developed Emacs's
> internal buffer encoding scheme many, many years ago, I all know this.

Sorry, I couldn't know that.

> I can only repeat that Emacs might tag a certain text with GB18030 so
> that the user can deduce a Chinese origin. However, there is *no*
> guarantee that the user gets a Chinese-flavoured font – at least not
> from the xft interface.[**]

IME, there's no guarantee about anything in the Emacs font look up
heuristics, except that empirically it does TRT in about 85% of uses.

May I invite you to work on revisiting the design and implementation
of the Emacs font-look up facilities, and on modernizing them? I'm
afraid we didn't have an active developer in this area for several
years, and I fear that we will stagnate (or already are stagnating).

TIA
* bug#31315: wrong font encoding for fallback font
From: Werner LEMBERG @ 2018-05-03 19:05 UTC
To: eliz; +Cc: 31315

>> I can only repeat that Emacs might tag a certain text with GB18030
>> so that the user can deduce a Chinese origin. However, there is
>> *no* guarantee that the user gets a Chinese-flavoured font – at
>> least not from the xft interface.[**]
>
> IME, there's no guarantee about anything in the Emacs font look up
> heuristics, except that empirically it does TRT in about 85% of
> uses.

If you don't install pan-CJK Unicode fonts, I fully agree :-)

> May I invite you to work on revisiting the design and implementation
> of the Emacs font-look up facilities, and on modernizing them? I'm
> afraid we didn't have an active developer in this area for several
> years, and I fear that we will stagnate (or already are stagnating).

Alas, my Elisp knowledge is ... well ... not impressive. Basically,
I'm just an Emacs user, not an Emacs developer.

With Xft, there is no possibility for improvement IMHO. The probably
best choice is to switch to Pango for font access (in case you don't
do that already). An Emacs charset essentially triggers a language
and script setting, and those two parameters can be passed to Pango,
AFAIK.


    Werner
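Editorial illustration: the idea that a charset "triggers a language and script setting" can be pictured as a lookup table from charset registry to an OpenType script tag and language-system tag, in the spirit of the `script=hani, lang=ZHS' example from the message of 2018-05-03 5:52. This is a hypothetical sketch: the table, the `shaping_hints` function, and the choice of registries are assumptions, and no Pango or Emacs API is called.

```python
# OpenType script tag and language-system tag per legacy CJK charset.
# The tags are standard OpenType tags; the mapping itself is illustrative.
OPENTYPE_HINTS = {
    "gb18030": ("hani", "ZHS"),    # Simplified Chinese
    "gb2312": ("hani", "ZHS"),
    "big5": ("hani", "ZHT"),       # Traditional Chinese
    "jisx0208": ("hani", "JAN"),   # Japanese
    "ksc5601": ("hani", "KOR"),    # Korean
}


def shaping_hints(registry):
    """Return (script, language) for a charset registry, or None if unknown."""
    key = registry.lower().split(".")[0]    # "gb18030.2000" -> "gb18030"
    return OPENTYPE_HINTS.get(key)


print(shaping_hints("GB18030.2000"))
print(shaping_hints("jisx0208.1983"))
print(shaping_hints("iso10646"))    # a plain Unicode registry carries no hint
```

With such hints a font backend could select language-specific glyph forms inside a single pan-CJK font instead of relying on the registry the font is advertised under.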
* bug#31315: wrong font encoding for fallback font
From: Eli Zaretskii @ 2018-05-03 19:59 UTC
To: Werner LEMBERG; +Cc: 31315

> Date: Thu, 03 May 2018 21:05:28 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
>
> > May I invite you to work on revisiting the design and implementation
> > of the Emacs font-look up facilities, and on modernizing them? I'm
> > afraid we didn't have an active developer in this area for several
> > years, and I fear that we will stagnate (or already are stagnating).
>
> Alas, my Elisp knowledge is ... well ... not impressive. Basically,
> I'm just an Emacs user, not an Emacs developer.

Almost all of the relevant code is in C, not in Lisp.

> With Xft, there is no possibility for improvement IMHO. The probably
> best choice is to switch to Pango for font access (in case you don't
> do that already).

There are several back-ends besides Xft, the most advanced being
xftfont.c. They all didn't see any serious development for the past
several years. And yes, acquiring new back-ends is also a worthy
goal.

All of that requires a level of expertise that IMO we currently don't
have.

> An Emacs charset essentially triggers a language and script setting,
> and those two parameters can be passed to Pango, AFAIK.

Emacs handles charsets and scripts separately. A script is matched
against OTF/TTF features of fonts to make sure the necessary shaping
features required by a script are supported.
* bug#31315: wrong font encoding for fallback font
From: Werner LEMBERG @ 2018-05-04 5:11 UTC
To: eliz; +Cc: 31315

>> Alas, my Elisp knowledge is ... well ... not impressive.
>> Basically, I'm just an Emacs user, not an Emacs developer.
>
> Almost all of the relevant code is in C, not in Lisp.

OK.

>> With Xft, there is no possibility for improvement IMHO. The
>> probably best choice is to switch to Pango for font access (in case
>> you don't do that already).
>
> There are several back-ends besides Xft, the most advanced being
> xftfont.c. They all didn't see any serious development for the past
> several years. And yes, acquiring new back-ends is also a worthy
> goal.
>
> All of that requires a level of expertise that IMO we currently
> don't have.

What I can certainly offer is advice. However, I'm not sure I'm the
right person for working on the innards of Emacs, given the usual lack
of time (sigh).

>> An Emacs charset essentially triggers a language and script
>> setting, and those two parameters can be passed to Pango, AFAIK.
>
> Emacs handles charsets and scripts separately. A script is matched
> against OTF/TTF features of fonts to make sure the necessary shaping
> features required by a script are supported.

Interesting. Access to OpenType features is exactly what's needed to
improve font display for selected charsets. Where can I find the
related code in Emacs?

Additionally, I suggest that the Emacs maintainers set up a GSoC
project, namely to improve font rendering. This is a broad topic,
which could be further split into smaller subprojects.

Emacs uses Handa-san's libotf library (are there any other projects
that use this library?), but AFAICS it doesn't receive a lot of
testing. On the other hand, there is Behdad Esfahbod's `HarfBuzz'
shaping engine that comes with a large suite of tests. One such
subproject could be to take these tests and use them to improve
libotf, especially for Indic scripts.

  https://www.freedesktop.org/wiki/Software/HarfBuzz/


    Werner
* bug#31315: wrong font encoding for fallback font
From: Eli Zaretskii @ 2018-05-04 13:05 UTC
To: Werner LEMBERG; +Cc: 31315

> Date: Fri, 04 May 2018 07:11:37 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
>
> > Emacs handles charsets and scripts separately. A script is matched
> > against OTF/TTF features of fonts to make sure the necessary shaping
> > features required by a script are supported.
>
> Interesting. Access to OpenType features is exactly what's needed to
> improve font display for selected charsets. Where can I find the
> related code in Emacs?

Emacs delegates that to the font back-end. E.g., in font_match_p you
will see that if certain 'otf' capabilities are required, Emacs calls
the otf_capability method of the font driver. The ftfont.c driver
implements that as ftfont_otf_capability.

And which OTF capabilities are required for what scripts is set up in
the fontsets by fontset.el.

> Additionally, I suggest that the Emacs maintainers set up a GSoC
> project, namely to improve font rendering. This is a broad topic,
> which could be further split into smaller subprojects.

Good idea.

> Emacs uses Handa-san's libotf library (are there any other projects
> that use this library?), but AFAICS it doesn't receive a lot of
> testing. On the other hand, there is Behdad Esfahbod's `HarfBuzz'
> shaping engine that comes with a large suite of tests. One of such
> subprojects could be to take these tests and use them to improve
> libotf, especially for Indic scripts.
>
>   https://www.freedesktop.org/wiki/Software/HarfBuzz/

I think a better development would be to teach Emacs to use HarfBuzz
as its shaping engine. HarfBuzz is available on many platforms, and
is AFAIK actively developed, so we will gain a better shaper and
advanced features if we do that.

Once again, such a project needs a motivated volunteer to carry it
out.
end of thread, newest message 2018-05-05 8:57 UTC

Thread overview: 23+ messages
2018-04-30  7:21 bug#31315: wrong font encoding for fallback font Werner LEMBERG
2018-04-30 15:13 ` Eli Zaretskii
2018-04-30 15:42 ` Andreas Schwab
2018-04-30 19:26 ` Eli Zaretskii
2018-04-30 20:03 ` Andreas Schwab
2018-05-01  2:37 ` Eli Zaretskii
2018-05-01  6:47 ` Werner LEMBERG
2018-05-01  8:13 ` Andreas Schwab
2018-05-01  9:11 ` Werner LEMBERG
2018-05-01 15:00 ` Eli Zaretskii
2018-05-01 17:42 ` Andreas Schwab
2018-05-05  8:57 ` Eli Zaretskii
2018-05-01  6:36 ` Werner LEMBERG
2018-05-01 15:22 ` Eli Zaretskii
2018-05-01 19:30 ` Werner LEMBERG
2018-05-02  7:27 ` Werner LEMBERG
2018-05-02 15:22 ` Eli Zaretskii
2018-05-03  5:52 ` Werner LEMBERG
2018-05-03 17:48 ` Eli Zaretskii
2018-05-03 19:05 ` Werner LEMBERG
2018-05-03 19:59 ` Eli Zaretskii
2018-05-04  5:11 ` Werner LEMBERG
2018-05-04 13:05 ` Eli Zaretskii