unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#31315: wrong font encoding for fallback font
@ 2018-04-30  7:21 Werner LEMBERG
  2018-04-30 15:13 ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-04-30  7:21 UTC (permalink / raw)
  To: 31315


The attached image shows that some CJK characters are displayed
incorrectly.  For the outline font in use, Emacs reports

  xft:-PfEd-AR PL UKai TW MBE-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1

One character (the highlighted one) is missing in this font, and Emacs
uses a different font as a fallback:

  x:-misc-droid sans fallback-medium-r-normal--18-130-100-100-p-179-gb18030.2000-0

Note that the different font backend seems to produce the ugly
rendering; the font in question is the outline font
`DroidSansFallbackFull.ttf'.

The problem now is that the encoding of the fallback font is not
respected.  In the image, the highlighted character is U+83EF, but
Emacs incorrectly displays U+51BF instead.

The GB 18030 bytes to represent U+51BF are \x83\xEF; this clearly
shows that Emacs lacks an iconv call (or an equivalent to that);
instead, it seems to simply feed the Unicode value to the font
backend.
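
The mismatch described above can be reproduced outside Emacs.  The
following is a minimal Python sketch, not Emacs code: it assumes
Python's built-in `gb18030' codec, and the helper names
(`correct_code', `naive_code', `glyph_selected') are hypothetical.  It
shows what happens when a Unicode code point is handed unconverted to
a consumer that interprets the value as a two-byte GB 18030 code.

```python
def correct_code(ch: str) -> bytes:
    """Bytes a GB 18030-indexed font interface expects for `ch`."""
    return ch.encode("gb18030")


def naive_code(ch: str) -> bytes:
    """Bytes produced when the Unicode code point is passed through as-is.

    Only meaningful for BMP characters, whose code point fits in two bytes.
    """
    return ord(ch).to_bytes(2, "big")


def glyph_selected(code: bytes) -> str:
    """Character a GB 18030-indexed font interface would render for `code`."""
    return code.decode("gb18030")


if __name__ == "__main__":
    wanted = "\u83ef"  # the highlighted character from the report
    shown = glyph_selected(naive_code(wanted))
    print(f"wanted U+{ord(wanted):04X}, correct code {correct_code(wanted).hex()}")
    print(f"naive code {naive_code(wanted).hex()} selects U+{ord(shown):04X}")
```

If the analysis in this report is right, the second line of output
should name the character that Emacs displayed by mistake.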

                        *   *   *

It's a completely different question why on my system Emacs uses a
font encoded in GB 18030 as a fallback font.  It's probably related to
the fact that I use `mew' as my e-mail program, manually extended to
cover GB 18030.  Unfortunately, I have not yet been able to trigger the issue
with `emacs -Q' (which by default uses iso10646 for the fallback
font).  On the other hand, as soon as the problem happens, it happens
with any buffer containing CJK characters not displayable with the
current font, so it seems a genuine Emacs core bug.



In GNU Emacs 27.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.20.10)
 of 2018-02-12 built on linux
Repository revision: 3a718ffca097b35218c3e041a94adff937f3052f
Windowing system distributor 'The X.Org Foundation', version 11.0.11803000
System Description: openSUSE Leap 42.3

Recent messages:
Wrote /home/wl/Mail/draft/2
Draft is prepared
Sole completion
next-line: End of buffer [3 times]
Kill draft message? (y or n) y
Saving file /home/wl/Mail/draft/2...
Wrote /home/wl/Mail/draft/2
Draft was killed
Quit
Type C-x 1 to remove help window.  

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY
GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS
GTK3 X11 THREADS LIBSYSTEMD JSON LCMS2

Important settings:
  value of $LANG: de_AT.UTF-8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix

Major mode: Summary

Minor modes in effect:
  shell-dirtrack-mode: t
  TeX-PDF-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/usr/local/share/emacs/site-lisp/thai-word hides /usr/local/share/emacs/27.0.50/lisp/language/thai-word

Features:
(shadow emacsbug message dired dired-loaddefs format-spec rfc822 mml
mml-sec epa derived epg gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mail-utils preview prv-emacs
noutline outline tex-jp wid-edit descr-text apropos font-latex
plain-tex tex-buf latex easy-mmode tex-ispell tex-style tex-mode
compile shell pcomplete comint ansi-color ring latexenc misearch
multi-isearch shr-color color shr svg browse-url network-stream puny
nsm rmc starttls tls gnutls qp pp mew-varsx mew-unix elec-pair edmacro
kmacro rng-nxml rng-valid rng-loc rng-uri rng-parse nxml-parse
rng-match rng-dt rng-util rng-pttrn nxml-ns nxml-mode nxml-outln
nxml-rap nxml-util nxml-enc xmltok sgml-mode dom hideshow cal-menu
calendar cal-loaddefs mew-auth mew-config mew-imap2 mew-imap mew-nntp2
mew-nntp mew-pop mew-smtp mew-ssl mew-ssh mew-net mew-highlight
mew-sort mew-fib mew-ext mew-refile mew-demo mew-attach mew-draft
mew-message mew-thread mew-virtual mew-summary4 mew-summary3
mew-summary2 mew-summary mew-search mew-pick mew-passwd mew-scan
mew-syntax mew-bq mew-smime mew-pgp mew-header mew-exec mew-mark
mew-mime mew-edit mew-decode mew-encode mew-cache mew-minibuf
mew-complete mew-addrbook mew-local mew-vars3 mew-vars2 mew-vars
mew-env mew-mule3 mew-mule mew-gemacs mew-key mew-func mew-blvs
mew-const mew tex dbus xml crm advice auto-loads tex-site quail
help-mode cjktilde mm-util mail-prsvr disp-table finder-inf package
easymenu epg-config url-handlers url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map url-vars seq
byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham
georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray
minibuffer cl-preloaded nadvice loaddefs button faces cus-face
macroexp files text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote
dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 454688 29723)
 (symbols 48 49126 5)
 (miscs 40 4726 718)
 (strings 32 150458 3023)
 (string-bytes 1 3707077)
 (vectors 16 46771)
 (vector-slots 8 2048741 170478)
 (floats 8 303 739)
 (intervals 56 66880 624)
 (buffers 992 20)
 (heap 1024 70029 4352))

[-- Attachment #2: emacs.png --]
[-- Type: Image/Png, Size: 79126 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#31315: wrong font encoding for fallback font
  2018-04-30  7:21 bug#31315: wrong font encoding for fallback font Werner LEMBERG
@ 2018-04-30 15:13 ` Eli Zaretskii
  2018-04-30 15:42   ` Andreas Schwab
  2018-05-01  6:36   ` Werner LEMBERG
  0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2018-04-30 15:13 UTC (permalink / raw)
  To: Werner LEMBERG, Kenichi Handa; +Cc: 31315

> Date: Mon, 30 Apr 2018 09:21:06 +0200 (CEST)
> From: Werner LEMBERG <wl@gnu.org>

(Adding Handa-san to the discussion in the hope that he might have
some comments.)

> The attached image shows that some CJK characters are displayed
> incorrectly.  For the outline font in use, Emacs reports
> 
>   xft:-PfEd-AR PL UKai TW MBE-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
> 
> One character (the highlighted one) is missing in this font, and Emacs
> uses a different font as a fallback:
> 
>   x:-misc-droid sans fallback-medium-r-normal--18-130-100-100-p-179-gb18030.2000-0
> 
> Note that the different font backend seems to produce the ugly
> rendering; the font in question is the outline font
> `DroidSansFallbackFull.ttf'.

That's why it is fallback, I guess... ;-)

And I think you might be mistaken in your interpretation of what
"gb18030.2000" in the font name means: I think it's the font registry,
not its encoding.  How sure are you that the encoding of this font is
indeed gb18030.2000?  (I'm not an expert on this stuff.)

> The problem now is that the encoding of the fallback font is not
> respected.  In the image, the highlighted character is U+83EF, but
> Emacs incorrectly displays U+51BF instead.
> 
> The GB 18030 bytes to represent U+51BF are \x83\xEF; this clearly
> shows that Emacs lacks an iconv call (or an equivalent to that);
> instead, it seems to simply feed the Unicode value to the font
> backend.

Tz-tz-tz, how can you even suggest something like that about Emacs ;-)

If you look in xfont_encode_char, you will see that it does encode the
character before handing it to the font-drawing function.  But I see
that font-encoding-alist has this to say about gb18030:

 ("gb18030" unicode)

Does replacing that with something like this:

 ("gb18030" (gb18030 . unicode))

solve the problem?

What we put in font-encoding-alist now was a deliberate change in Jan
2008, in response to a bug report; see

  http://lists.gnu.org/archive/html/emacs-devel/2008-01/msg00754.html

If fonts like this one need to have characters encoded by gb18030,
then I think we need to change what the value says.  But this area in
Emacs is under-documented, so I'm not sure I've got it right, in
particular what is the effect of ENCODING and REPERTORY in this
context.  For most font back-ends, ENCODING is ignored, because the
back-end is capable of encoding the character we hand to it.  But the
xfont back-end indeed uses Emacs's encoding functions to do that
externally to the corresponding X APIs.  Which might explain why this
problem, if indeed we fail to specify the correct encoding for this
charset, was never reported till now: xfont is rarely if ever used.
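
To make the role of the table concrete, here is a toy Python model of
a lookup keyed on the font name.  It is only a sketch under stated
assumptions: it is not how Emacs implements font-encoding-alist, it
ignores REPERTORY entirely, and every name in it (`pick_encoder',
`OLD_TABLE', `NEW_TABLE', the Unicode default) is hypothetical.  It
only illustrates why the entry's ENCODING decides which code reaches a
font that is indexed by GB 18030.

```python
import re
from typing import Callable, List, Tuple

Encoder = Callable[[str], int]
Table = List[Tuple[str, Encoder]]

# Font name taken from the original report.
FONT = "-misc-droid sans fallback-medium-r-normal--18-130-100-100-p-179-gb18030.2000-0"


def as_unicode(ch: str) -> int:
    """Use the Unicode code point itself as the font code."""
    return ord(ch)


def as_gb18030(ch: str) -> int:
    """Use the GB 18030 byte sequence, read as one big-endian integer."""
    return int.from_bytes(ch.encode("gb18030"), "big")


def pick_encoder(font_name: str, table: Table) -> Encoder:
    """Return the encoder of the first entry whose pattern matches the name.

    Falling back to Unicode when nothing matches is an assumption of this model.
    """
    for pattern, encoder in table:
        if re.search(pattern, font_name):
            return encoder
    return as_unicode


OLD_TABLE: Table = [("gb18030", as_unicode)]   # models ("gb18030" unicode)
NEW_TABLE: Table = [("gb18030", as_gb18030)]   # models the proposed change

if __name__ == "__main__":
    for label, table in (("old", OLD_TABLE), ("new", NEW_TABLE)):
        code = pick_encoder(FONT, table)("\u83ef")
        print(f"{label}: code sent to the font = {code:#06x}")
```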

> It's a completely different question why on my system Emacs uses a
> font encoded in GB 18030 as a fallback font.  It's probably related to
> the fact that I use `mew' as my e-mail program, manually extended to
> cover GB 18030.  Unfortunately, I have not yet been able to trigger the issue
> with `emacs -Q' (which by default uses iso10646 for the fallback
> font).

Well, we cannot try helping you to unlock this unless you tell how you
"manually extended" Emacs.  In general, the way to request that Emacs
uses fonts you like with certain characters or charsets is by
customizing your fontsets.  I cannot say more without hearing the
details.

> On the other hand, as soon as the problem happens, it happens
> with any buffer containing CJK characters not displayable with the
> current font, so it seems a genuine Emacs core bug.

What "problem" do you allude to here?  The first (seemingly incorrect
encoding) or the second (fallback to this particular font)?






* bug#31315: wrong font encoding for fallback font
  2018-04-30 15:13 ` Eli Zaretskii
@ 2018-04-30 15:42   ` Andreas Schwab
  2018-04-30 19:26     ` Eli Zaretskii
  2018-05-01  6:36   ` Werner LEMBERG
  1 sibling, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2018-04-30 15:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 31315

On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote:

> If you look in xfont_encode_char, you will see that it does encode the
> character before handing it to the font-drawing function.  But I see
> that font-encoding-alist has this to say about gb18030:
>
>  ("gb18030" unicode)
>
> Does replacing that with something like this:
>
>  ("gb18030" (gb18030 . unicode))
>
> solve the problem?

I think it should be gb18030-2-byte, both as encoding and repertory.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#31315: wrong font encoding for fallback font
  2018-04-30 15:42   ` Andreas Schwab
@ 2018-04-30 19:26     ` Eli Zaretskii
  2018-04-30 20:03       ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2018-04-30 19:26 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 31315

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Werner LEMBERG <wl@gnu.org>,  Kenichi Handa <handa@gnu.org>,  31315@debbugs.gnu.org
> Date: Mon, 30 Apr 2018 17:42:50 +0200
> 
> On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > Does replacing that with something like this:
> >
> >  ("gb18030" (gb18030 . unicode))
> >
> > solve the problem?
> 
> I think it should be gb18030-2-byte, both as encoding and repertory.

Well, if we want to limit the repertory, then why not
gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030?

But I agree that it would be good to try that, and maybe both.

Thanks.






* bug#31315: wrong font encoding for fallback font
  2018-04-30 19:26     ` Eli Zaretskii
@ 2018-04-30 20:03       ` Andreas Schwab
  2018-05-01  2:37         ` Eli Zaretskii
  2018-05-01  6:47         ` Werner LEMBERG
  0 siblings, 2 replies; 23+ messages in thread
From: Andreas Schwab @ 2018-04-30 20:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 31315

On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: Werner LEMBERG <wl@gnu.org>,  Kenichi Handa <handa@gnu.org>,  31315@debbugs.gnu.org
>> Date: Mon, 30 Apr 2018 17:42:50 +0200
>> 
>> On Apr 30 2018, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> > Does replacing that with something like this:
>> >
>> >  ("gb18030" (gb18030 . unicode))
>> >
>> > solve the problem?
>> 
>> I think it should be gb18030-2-byte, both as encoding and repertory.
>
> Well, if we want to limit the repertory, then why not
> gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030?

All gb18030 encoded fonts I found only provide 2-byte codes.  That may
be an inherent limitation.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#31315: wrong font encoding for fallback font
  2018-04-30 20:03       ` Andreas Schwab
@ 2018-05-01  2:37         ` Eli Zaretskii
  2018-05-01  6:47         ` Werner LEMBERG
  1 sibling, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-01  2:37 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 31315

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: wl@gnu.org,  handa@gnu.org,  31315@debbugs.gnu.org
> Date: Mon, 30 Apr 2018 22:03:42 +0200
> 
> >> > Does replacing that with something like this:
> >> >
> >> >  ("gb18030" (gb18030 . unicode))
> >> >
> >> > solve the problem?
> >> 
> >> I think it should be gb18030-2-byte, both as encoding and repertory.
> >
> > Well, if we want to limit the repertory, then why not
> > gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030?
> 
> All gb18030 encoded fonts I found only provide 2-byte codes.  That may
> be an inherent limitation.

Thanks.

Werner, can you try that, please?






* bug#31315: wrong font encoding for fallback font
  2018-04-30 15:13 ` Eli Zaretskii
  2018-04-30 15:42   ` Andreas Schwab
@ 2018-05-01  6:36   ` Werner LEMBERG
  2018-05-01 15:22     ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-01  6:36 UTC (permalink / raw)
  To: eliz; +Cc: 31315


> And I think you might be mistaken in your interpretation of what
> "gb18030.2000" in the font name means: I think it's the font registry,
> not its encoding.

Yes, but the font registry implies the encoding used to access the
font.

> How sure are you that the encoding of this font is indeed
> gb18030.2000?

Quite sure.  To be more precise: The real encoding of the font is
irrelevant (the Droid Sans Fallback font is a standard TrueType font
that has only a Unicode cmap); what matters is how the font backend
provides the font to the client.  Calling `xlsfonts' I see that X11
offers access as follows.

  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0
  -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0

>> The problem now is that the encoding of the fallback font is not
>> respected.  In the image, the highlighted character is U+83EF, but
>> Emacs incorrectly displays U+51BF instead.
>>
>> The GB 18030 bytes to represent U+51BF are \x83\xEF; this clearly
>> shows that Emacs lacks an iconv call (or an equivalent to that);
>> instead, it seems to simply feed the Unicode value to the font
>> backend.
>
> Tz-tz-tz, how can you even suggest something like that about Emacs ;-)
>
> If you look in xfont_encode_char, you will see that it does encode
> the character before handing it to the font-drawing function.  But I
> see that font-encoding-alist has this to say about gb18030:
>
>  ("gb18030" unicode)
>
> Does replacing that with something like this:
>
>  ("gb18030" (gb18030 . unicode))
>
> solve the problem?

Yes, it seems so.

> What we put in font-encoding-alist now was a deliberate change in
> Jan 2008, in response to a bug report; see
>
>   http://lists.gnu.org/archive/html/emacs-devel/2008-01/msg00754.html
>
> If fonts like this one need to have characters encoded by gb18030,
> then I think we need to change what the value says.

As can be seen above, the font itself doesn't need GB18030.  It's the
font backend that provides this encoding, and Emacs accesses it.

> But this area in Emacs is under-documented, so I'm not sure I've
> got it right, in particular what is the effect of ENCODING and
> REPERTORY in this context.  For most font back-ends, ENCODING is
> ignored, because the back-end is capable of encoding the character we
> hand to it.  But the xfont back-end indeed uses Emacs's encoding
> functions to do that externally to the corresponding X APIs.  Which
> might explain why this problem, if indeed we fail to specify the
> correct encoding for this charset, was never reported till now:
> xfont is rarely if ever used.

Emacs doesn't fail to specify the correct encoding.  The problem is
that it feeds the font backend with characters in the wrong encoding
(namely Unicode instead of GB 18030).

>> It's a completely different question why on my system Emacs uses a
>> font encoded in GB 18030 as a fallback font.  It's probably related
>> to the fact that I use `mew' as my e-mail program, manually
>> extended to cover GB 18030.  Unfortunately, I have not yet been able to
>> trigger the issue with `emacs -Q' (which by default uses iso10646
>> for the fallback font).
>
> Well, we cannot try helping you to unlock this unless you tell how
> you "manually extended" Emacs.

Oh, I haven't extended Emacs, sorry for the bad wording.  I've simply
added a line to mew's elisp code to make it recognize GB18030 in
e-mails.

> In general, the way to request that Emacs uses fonts you like with
> certain characters or charsets is by customizing your fontsets.  I
> cannot say more without hearing the details.

I don't have any fontsets customized in my `.emacs' file.

>> On the other hand, as soon as the problem happens, it happens with
>> any buffer containing CJK characters not displayable with the
>> current font, so it seems a genuine Emacs core bug.
>
> What "problem" do you allude to here?  The first (seemingly
> incorrect encoding) or the second (fallback to this particular
> font)?

Both.  If I open a new Unicode-encoded file, Emacs continues to
use GB18030.2000 as the charset registry/encoding for displaying
fallback characters, failing to convert Unicode to GB18030 before
accessing the characters from the font backend.


    Werner






* bug#31315: wrong font encoding for fallback font
  2018-04-30 20:03       ` Andreas Schwab
  2018-05-01  2:37         ` Eli Zaretskii
@ 2018-05-01  6:47         ` Werner LEMBERG
  2018-05-01  8:13           ` Andreas Schwab
  1 sibling, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-01  6:47 UTC (permalink / raw)
  To: schwab; +Cc: 31315


>>> I think it should be gb18030-2-byte, both as encoding and repertory.
>>
>> Well, if we want to limit the repertory, then why not
>> gb18030-4-byte-bmp, which AFAIU is the mandatory part of gb18030?
> 
> All gb18030 encoded fonts I found only provide 2-byte codes.

You actually have found a font encoded in GB18030?  Which one?  I have
never seen that.

> That may be an inherent limitation.

The limitation is not 2-byte codes but the number of glyphs in a font.
An .otf or .ttf font can't hold more than 2^16 glyphs.  It is thus
common to have separate fonts for accessing glyphs from U+10000 and
higher.


    Werner






* bug#31315: wrong font encoding for fallback font
  2018-05-01  6:47         ` Werner LEMBERG
@ 2018-05-01  8:13           ` Andreas Schwab
  2018-05-01  9:11             ` Werner LEMBERG
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2018-05-01  8:13 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315

On May 01 2018, Werner LEMBERG <wl@gnu.org> wrote:

> You actually have found a font encoded in GB18030?  Which one?  I have
> never seen that.

The same that you have plus two others.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#31315: wrong font encoding for fallback font
  2018-05-01  8:13           ` Andreas Schwab
@ 2018-05-01  9:11             ` Werner LEMBERG
  2018-05-01 15:00               ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-01  9:11 UTC (permalink / raw)
  To: schwab; +Cc: 31315


>> You actually have found a font encoded in GB18030?  Which one?  I have
>> never seen that.
> 
> The same that you have plus two others.

But those fonts are *not* encoded in GB18030 at all.  It is the X11
font interface that provides GB18030 encoding access.

BTW, looking into my `/usr/share/fonts/encodings/large' directory, I
see a file `gb18030.2000-0.enc.gz', which contains only two-byte
entries.  In other words, it omits all four-byte entries,
thus covering only a subset of GB18030.
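
To illustrate which characters fall outside a two-byte-only table:
GB 18030 codes are one, two, or four bytes long.  The following is a
small Python sketch that assumes Python's built-in `gb18030' codec;
the helper names are hypothetical and it does not read the `.enc' file
itself.

```python
def gb18030_length(ch: str) -> int:
    """Number of bytes GB 18030 uses for a single character."""
    return len(ch.encode("gb18030"))


def reachable_with_two_bytes(text: str) -> str:
    """Keep only the characters of `text` that have a two-byte GB 18030 code."""
    return "".join(ch for ch in text if gb18030_length(ch) == 2)


if __name__ == "__main__":
    # ASCII, a CJK ideograph, a Latin-1 control code, a supplementary-plane ideograph
    for ch in ("A", "\u83ef", "\u0080", "\U00020000"):
        print(f"U+{ord(ch):04X}: {gb18030_length(ch)} byte(s)")
```

A font interface limited to two-byte codes can therefore address the
ideograph U+83EF, but not characters such as U+0080 or U+20000, which
need four bytes.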


    Werner






* bug#31315: wrong font encoding for fallback font
  2018-05-01  9:11             ` Werner LEMBERG
@ 2018-05-01 15:00               ` Eli Zaretskii
  2018-05-01 17:42                 ` Andreas Schwab
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-01 15:00 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315, schwab

> Date: Tue, 01 May 2018 11:11:30 +0200 (CEST)
> Cc: eliz@gnu.org, handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> 
> >> You actually have found a font encoded in GB18030?  Which one?  I have
> >> never seen that.
> > 
> > The same that you have plus two others.
> 
> But those fonts are *not* encoded in GB18030 at all.  It is the X11
> font interface that provides GB18030 encoding access.
> 
> BTW, looking into my `/usr/share/fonts/encodings/large' directory, I
> see a file `gb18030.2000-0.enc.gz', which contains only two-byte
> entries.  In other words, it omits all four-byte entries,
> thus covering only a subset of GB18030.

Andreas, given that '("gb18030" (gb18030 . unicode))' appears to work,
do you see any potential problems with using it, instead of
gb18030-2-byte?

Thanks.






* bug#31315: wrong font encoding for fallback font
  2018-05-01  6:36   ` Werner LEMBERG
@ 2018-05-01 15:22     ` Eli Zaretskii
  2018-05-01 19:30       ` Werner LEMBERG
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-01 15:22 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315

> Date: Tue, 01 May 2018 08:36:44 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> > And I think you might be mistaken in your interpretation of what
> > "gb18030.2000" in the font name means: I think it's the font registry,
> > not its encoding.
> 
> Yes, but the font registry implies the encoding used to access the
> font.

Having said that, you seem to contradict yourself right away:

> The real encoding of the font is irrelevant (the Droid Sans Fallback
> font is a standard TrueType font that has only a Unicode cmap);

So I still think we may be miscommunicating.

> what matters is how the font backend provides the font to the
> client.  Calling `xlsfonts' I see that X11 offers access as follows.
> 
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0
>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0

I think we have a terminology problem here, most probably my fault.
What exactly do you mean when you say "font backend" in this context?
And what is "the client" in this case?

I'm afraid using xlsfonts doesn't help me understand what I am
missing, because I have only a vague idea of what that command does,
beyond the basic fact that it lists fonts.

> > What we put in font-encoding-alist now was a deliberate change in
> > Jan 2008, in response to a bug report; see
> >
> >   http://lists.gnu.org/archive/html/emacs-devel/2008-01/msg00754.html
> >
> > If fonts like this one need to have characters encoded by gb18030,
> > then I think we need to change what the value says.
> 
> As can be seen above, the font itself doesn't need GB18030.  It's the
> font backend that provides this encoding, and Emacs accesses it.

In my terminology, "font backend" is in Emacs (xfont.c, xftfont.c,
etc.), and the encoding happens in the backend, guided by
font-encoding-alist, among other things.  And your OP vs the
experiment with changing font-encoding-alist clearly shows that
encoding characters correctly for the xfont backend _is_ required to
display the correct glyphs with fonts handled by that backend.

> > But this area in Emacs is under-documented, so I'm not sure I've
> > got it right, in particular what is the effect of ENCODING and
> > REPERTORY in this context.  For most font back-ends, ENCODING is
> > ignored, because the back-end is capable of encoding the character we
> > hand to it.  But the xfont back-end indeed uses Emacs's encoding
> > functions to do that externally to the corresponding X APIs.  Which
> > might explain why this problem, if indeed we fail to specify the
> > correct encoding for this charset, was never reported till now:
> > xfont is rarely if ever used.
> 
> Emacs doesn't fail to specify the correct encoding.  The problem is
> that it feeds the font backend with characters in the wrong encoding
> (namely Unicode instead of GB 18030).

"Fails to specify the correct encoding" is the reason why it uses
wrong encoding for the characters in the font backend xfont.c.  I
believe this is again a terminology problem.

> >> It's a completely different question why on my system Emacs uses a
> >> font encoded in GB 18030 as a fallback font.  It's probably related
> >> to the fact that I use `mew' as my e-mail program, manually
> >> extended to cover GB 18030.  Unfortunately, I have not yet been able to
> >> trigger the issue with `emacs -Q' (which by default uses iso10646
> >> for the fallback font).
> >
> > Well, we cannot try helping you to unlock this unless you tell how
> > you "manually extended" Emacs.
> 
> Oh, I haven't extended Emacs, sorry for the bad wording.  I've simply
> added a line to mew's elisp code to make it recognize GB18030 in
> e-mails.

If you received a GB18030 encoded email, it is expected that Emacs
will try to find a font that explicitly supports GB18030.  This is a
feature that AFAIU is very important to CJK users: they expect Emacs
to select a font that declares support for the character's charset as
set by the decoding machinery.

> > In general, the way to request that Emacs uses fonts you like with
> > certain characters or charsets is by customizing your fontsets.  I
> > cannot say more without hearing the details.
> 
> I don't have any fontsets customized in my `.emacs' file.

Well, it sounds like you should.  Emacs chooses fonts using techniques
that prefer speed to accuracy, and if that gives suboptimal results,
the way to improve them is to guide Emacs by tailoring your fontset to
the fonts you have installed and to the visual appearance you happen
to like.

> >> On the other hand, as soon as the problem happens, it happens with
> >> any buffer containing CJK characters not displayable with the
> >> current font, so it seems a genuine Emacs core bug.
> >
> > What "problem" do you allude to here?  The first (seemingly
> > incorrect encoding) or the second (fallback to this particular
> > font)?
> 
> Both.  If I open a new Unicode-encoded file, Emacs continues to
> use GB18030.2000 as the charset registry/encoding for displaying
> fallback characters, failing to convert Unicode to GB18030 before
> accessing the characters from the font backend.

The former part is not a bug at all.  When Emacs needs to display a
character that is not supported by the frame's default font, it first
tries all the fonts it already has loaded, before it searches the rest
of the fonts on your system.  So once the GB18030.2000 font is loaded,
Emacs will use it for any character not supported by other loaded
fonts.  Or did I miss something?
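
The search order just described can be sketched as follows.  This is a
deliberately simplified Python model and not Emacs's actual algorithm:
the function name, the font names, and the coverage sets are all
hypothetical, and real font selection also consults fontsets, scripts,
and charset priorities.

```python
from typing import Dict, List, Optional, Set


def choose_font(
    ch: str,
    loaded: List[str],
    installed: List[str],
    coverage: Dict[str, Set[str]],
) -> Optional[str]:
    """Pick a font for `ch`: fonts already loaded first, then the rest."""
    # 1. Any font that is already loaded and covers the character wins.
    for name in loaded:
        if ch in coverage.get(name, set()):
            return name
    # 2. Otherwise search the remaining installed fonts and load the first match.
    for name in installed:
        if name not in loaded and ch in coverage.get(name, set()):
            loaded.append(name)
            return name
    return None


if __name__ == "__main__":
    coverage = {
        "default": {"a"},
        "other-cjk": {"\u4e2d"},
        "gb-fallback": {"\u83ef", "\u4e2d"},
    }
    loaded = ["default"]
    installed = ["default", "other-cjk", "gb-fallback"]
    print(choose_font("\u83ef", loaded, installed, coverage))
    # Once "gb-fallback" is loaded, it is preferred for later characters too.
    print(choose_font("\u4e2d", loaded, installed, coverage))
```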






* bug#31315: wrong font encoding for fallback font
  2018-05-01 15:00               ` Eli Zaretskii
@ 2018-05-01 17:42                 ` Andreas Schwab
  2018-05-05  8:57                   ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2018-05-01 17:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 31315

On May 01 2018, Eli Zaretskii <eliz@gnu.org> wrote:

> Andreas, given that '("gb18030" (gb18030 . unicode))' appears to work,
> do you see any potential problems with using it, instead of
> gb18030-2-byte?

I leave that question to those who are more knowledgeable of the X font
interface.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#31315: wrong font encoding for fallback font
  2018-05-01 15:22     ` Eli Zaretskii
@ 2018-05-01 19:30       ` Werner LEMBERG
  2018-05-02  7:27         ` Werner LEMBERG
  2018-05-02 15:22         ` Eli Zaretskii
  0 siblings, 2 replies; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-01 19:30 UTC (permalink / raw)
  To: eliz; +Cc: 31315

>> what matters is how the font backend provides the font to the
>> client.  Calling `xlsfonts' I see that X11 offers access as
>> follows.
>>
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0
>
> I think we have a terminology problem here, most probably my fault.
> What exactly do you mean when you say "font backend" in this
> context?  And what is "the client" in this case?

OK, sorry.  I mean the X11 font backend.  Here's my global picture.

          gb18030               unicode
 Emacs  ----------->   xft   ------------>  DroidSansFallback.ttf

For me, Emacs is a client of the xft font interface.  In our
particular case, xft provides `DroidSansFallback.ttf' to Emacs as a
font encoded in GB18030 – Emacs obviously has requested a font in this
encoding.  Behind the scenes, however, xft communicates with the
`DroidSansFallback.ttf' font using Unicode (the font has no other
cmap).
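
To make the picture concrete, here is a small Python sketch of the
missing conversion step.  It uses Python's standard `gb18030' codec as
a stand-in for whatever conversion Emacs would have to do; the function
names are hypothetical, and the assumption that the font is indexed by
the big-endian value of the GB18030 bytes is mine.

```python
def font_code_without_conversion(char):
    """What the report suspects happens: the Unicode value is used as-is."""
    return ord(char)


def font_code_with_conversion(char):
    """Index assumed for a font exposed as gb18030.2000-0: the GB18030 bytes."""
    return int.from_bytes(char.encode("gb18030"), "big")


def glyph_shown_for_code(code):
    """Character a GB18030-indexed font would display for a two-byte CODE."""
    return code.to_bytes(2, "big").decode("gb18030")


if __name__ == "__main__":
    wanted = "\u83ef"  # the highlighted character from the original report
    wrong = glyph_shown_for_code(font_code_without_conversion(wanted))
    right = glyph_shown_for_code(font_code_with_conversion(wanted))
    print(f"wanted:             U+{ord(wanted):04X}")
    print(f"without conversion: U+{ord(wrong):04X}")
    print(f"with conversion:    U+{ord(right):04X}")
```

Without the conversion, the value 0x83EF is interpreted as a GB18030
code and selects a different character; according to the original
report that character is U+51BF.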

> If you received a GB18030 encoded email, it is expected that Emacs
> will try to find a font that explicitly supports GB18030.
>
> This is a feature that AFAIU is very important to CJK users: they
> expect Emacs to select a font that declares support for the
> character's charset as set by the decoding machinery.

While this is correct for other CJK encodings like GB, JIS, KSC, or
Big5, it is *not* true for GB18030.  This is *only* an encoding and
*not* a charset!  It is simply another representation of Unicode,
comparable to UTF-8 or UCS4.  There doesn't exist a single font
natively encoded in GB18030!  This encoding only exists to be
code-wise backward compatible with GB 2312.
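
A quick way to see that GB18030 behaves like a Unicode transformation
format is to round-trip a few code points through it.  The sketch below
relies on Python's standard `gb18030' and `utf-8' codecs; the sample
characters are arbitrary choices of mine.

```python
def gb18030_round_trip(text):
    """Encode TEXT as GB18030 and decode it again."""
    return text.encode("gb18030").decode("gb18030")


# ASCII, Latin-1 Supplement, a BMP ideograph, and a non-BMP character.
SAMPLES = ["A", "\u00e9", "\u83ef", "\U0001f600"]

if __name__ == "__main__":
    for ch in SAMPLES:
        gb = ch.encode("gb18030")
        utf8 = ch.encode("utf-8")
        assert gb18030_round_trip(ch) == ch
        print(f"U+{ord(ch):04X}  gb18030={gb.hex()}  utf-8={utf8.hex()}")
```

Every sample survives the round trip, including the character outside
the BMP, which takes four bytes in GB18030.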

To a certain extent it is valid to assume that a user of GB18030
expects Chinese glyph representation forms for characters in the CJK
range.  However, since full Unicode is supported, this assumption is
rather weak.

The X11 interface is too old actually to handle GB18030 correctly.
For example, on my GNU/Linux box xft offers the following:

  -adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0

As the `jp' in the name indicates this font contains Japanese glyph
representation forms.  Since `Noto Sans CJK' provides all CJK glyphs
in the BMP, xft happily tags it with GB18030...

>> > In general, the way to request that Emacs uses fonts you like
>> > with certain characters or charsets is by customizing your
>> > fontsets.  I cannot say more without hearing the details.
>>
>> I don't have any fontsets customized in my `.emacs' file.
>
> Well, it sounds like you should.  Emacs chooses fonts using
> techniques that prefer speed to accuracy, and if that gives
> suboptimal results, the way to improve them is to guide Emacs by
> tailoring your fontset to the fonts you have installed and to the
> visual appearance you happen to like.

For the purpose of reporting this bug I thought it would be best to
not use further deviations of `emacs -Q'...

>> Both.  If I open a new Unicode-encoded file, Emacs continues
>> to use GB18030.2000 as the charset registry/encoding for displaying
>> fallback characters, failing to convert Unicode to GB18030 before
>> accessing the characters from the font backend.
>
> The former part is not a bug at all.

I agree.  I only wanted to tell you what I observe.


    Werner


* bug#31315: wrong font encoding for fallback font
  2018-05-01 19:30       ` Werner LEMBERG
@ 2018-05-02  7:27         ` Werner LEMBERG
  2018-05-02 15:22         ` Eli Zaretskii
  1 sibling, 0 replies; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-02  7:27 UTC (permalink / raw)
  To: eliz; +Cc: 31315


> To a certain extent it is valid to assume that a user of GB18030
> expects Chinese glyph representation forms for characters in the CJK
> range.  However, since full Unicode is supported, this assumption is
> rather weak.
>
> The X11 interface is too old actually to handle GB18030 correctly.

Oops, please ignore this sentence, which I forgot to delete.

> For example, on my GNU/Linux box xft offers the following:
>
>   -adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0
>
> As the `jp' in the name indicates this font contains Japanese glyph
> representation forms.  Since `Noto Sans CJK' provides all CJK glyphs
> in the BMP, xft happily tags it with GB18030...


    Werner






* bug#31315: wrong font encoding for fallback font
  2018-05-01 19:30       ` Werner LEMBERG
  2018-05-02  7:27         ` Werner LEMBERG
@ 2018-05-02 15:22         ` Eli Zaretskii
  2018-05-03  5:52           ` Werner LEMBERG
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-02 15:22 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315

> Date: Tue, 01 May 2018 21:30:14 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> > I think we have a terminology problem here, most probably my fault.
> > What exactly do you mean when you say "font backend" in this
> > context?  And what is "the client" in this case?
> 
> OK, sorry.  I mean the X11 font backend.  Here's my global picture.
> 
>           gb18030               unicode
>  Emacs  ----------->   xft   ------------>  DroidSansFallback.ttf
> 
> For me, Emacs is a client of the xft font interface.  In our
> particular case, xft provides `DroidSansFallback.ttf' to Emacs as a
> font encoded in GB18030 – Emacs obviously has requested a font in this
> encoding.  Behind the scenes, however, xft communicates with the
> `DroidSansFallback.ttf' font using Unicode (the font has no other
> cmap).

If by "xft" you mean the part of the X libraries that supports the
APIs used by xfont.c, then I think we are on the same page now.

> > If you received a GB18030 encoded email, it is expected that Emacs
> > will try to find a font that explicitly supports GB18030.
> >
> > This is a feature that AFAIU is very important to CJK users: they
> > expect Emacs to select a font that declares support for the
> > character's charset as set by the decoding machinery.
> 
> While this is correct for other CJK encodings like GB, JIS, KSC, or
> Big5, it is *not* true for GB18030.  This is *only* an encoding and
> *not* a charset!  It is simply another representation of Unicode,
> comparable to UTF-8 or UCS4.  There doesn't exist a single font
> natively encoded in GB18030!  This encoding only exists to be
> code-wise backward compatible with GB 2312.

Maybe so, but GB18030 is a Chinese encoding, and as such it behaves in
Emacs as all the other Chinese encodings.

Emacs employs that logic for every charset it has defined, including
Latin-2, for example: if text was decoded from an encoding which
supports a particular charset, Emacs puts the corresponding 'charset'
text property on the decoded text, and the machinery which selects the
appropriate font tries first to find a font which supports that
charset.  The idea is that users in a particular culture have certain
distinct preferences wrt fonts, and that an encoding that supports a
certain charset or culture provides a hint about those preferences.
This idea is very central in how Emacs selects fonts.
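
As a very rough illustration of that preference (a toy model in Python,
not the real font-selection code; `rank_fonts' and the (name, registry)
pairs are hypothetical):

```python
def rank_fonts(fonts, charset_hint):
    """Order (name, registry) pairs so that fonts whose registry matches
    CHARSET_HINT come first; the relative order is otherwise preserved."""
    # sorted() is stable, and False sorts before True.
    return sorted(fonts, key=lambda font: font[1] != charset_hint)


if __name__ == "__main__":
    fonts = [
        ("AR PL UKai TW MBE", "iso10646-1"),
        ("droid sans fallback", "gb18030.2000-0"),
    ]
    # Text decoded from GB18030 carries a hint that moves the second font up.
    print(rank_fonts(fonts, "gb18030.2000-0")[0][0])
```

Text without a charset hint leaves the candidate order unchanged in
this model.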

> To a certain extent it is valid to assume that a user of GB18030
> expects Chinese glyph representation forms for characters in the CJK
> range.  However, since full Unicode is supported, this assumption is
> rather weak.

Weak or not, Emacs tries to heed it.

> >> I don't have any fontsets customized in my `.emacs' file.
> >
> > Well, it sounds like you should.  Emacs chooses fonts using
> > techniques that prefer speed to accuracy, and if that gives
> > suboptimal results, the way to improve them is to guide Emacs by
> > tailoring your fontset to the fonts you have installed and to the
> > visual appearance you happen to like.
> 
> For the purpose of reporting this bug I thought it would be best to
> not use further deviations of `emacs -Q'...

My comment was not in the context of the bug report (where your
assumption is absolutely correct), it is rather a response to your
broader complaint regarding an ugly font that creeps into display of
text which was encoded in GB18030.  You can tell Emacs to use other
fonts for that charset by customizing your fontset.

> >> Both.  If I open a new Unicode-encoded file, Emacs continues
> >> to use GB18030.2000 as the charset registry/encoding for displaying
> >> fallback characters, failing to convert Unicode to GB18030 before
> >> accessing the characters from the font backend.
> >
> > The former part is not a bug at all.
> 
> I agree.  I only wanted to tell you what I observe.

Well, you called that a "problem".  I understand that we now agree the
first part is not a problem in itself.






* bug#31315: wrong font encoding for fallback font
  2018-05-02 15:22         ` Eli Zaretskii
@ 2018-05-03  5:52           ` Werner LEMBERG
  2018-05-03 17:48             ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-03  5:52 UTC (permalink / raw)
  To: eliz; +Cc: 31315

> If by "xft" you mean the part of the X libraries that supports the
> APIs used by xfont.c, then I think we are on the same page now.

OK.

>> While this is correct for other CJK encodings like GB, JIS, KSC, or
>> Big5, it is *not* true for GB18030.  This is *only* an encoding and
>> *not* a charset!  It is simply another representation of Unicode,
>> comparable to UTF-8 or UCS4.  There doesn't exist a single font
>> natively encoded in GB18030!  This encoding only exists to be
>> code-wise backward compatible with GB 2312.
> 
> Maybe so, but GB18030 is a Chinese encoding, and as such it behaves
> in Emacs as all the other Chinese encodings.

I know, and I agree.  BUT!  xft doesn't do what Emacs expects.  *Any*
font that covers the whole BMP (in particular, the whole CJK part of
it) gets a `GB18030' tag from xft.  In other words, the `Chinese'
property isn't in the font from the very beginning.[*]

> Emacs employs that logic for every charset it has defined, including
> Latin-2, for example: if text was decoded from an encoding which
> supports a particular charset, Emacs puts the corresponding
> 'charset' text property on the decoded text, and the machinery which
> selects the appropriate font tries first to find a font which
> supports that charset.  The idea is that users in a particular
> culture have certain distinct preferences wrt fonts, and that an
> encoding that supports a certain charset or culture provides a hint
> about those preferences.  This idea is very central in how Emacs
> selects fonts.

Being the FreeType maintainer, and having co-developed Emacs's
internal buffer encoding scheme many, many years ago, I know all this.
I can only repeat that Emacs might tag a certain text with GB18030 so
that the user can deduce a Chinese origin.  However, there is *no*
guarantee that the user gets a Chinese-flavoured font – at least not
from the xft interface.[**]

As a corollary, it is fully sufficient for xft to handle GB18030 equal
to Unicode (i.e., `iso10646').


    Werner


[*] Actually, having Unicode fonts that provide CJK glyphs for the
    whole BMP completely spoils Emacs's font selection scheme based on
    charsets – as shown in one of my previous e-mails, xft provides
    all common CJK encodings for such fonts because Unicode is a
    superset of those encodings.

[**] If, say, the Pango font interface is used instead to access a
     modern CJK OpenType font, Emacs might request `script=hani,
     lang=ZHS' if it encounters GB18030 to resolve Unicode's Han
     unification, ensuring simplified Chinese glyph representation
     forms.


* bug#31315: wrong font encoding for fallback font
  2018-05-03  5:52           ` Werner LEMBERG
@ 2018-05-03 17:48             ` Eli Zaretskii
  2018-05-03 19:05               ` Werner LEMBERG
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-03 17:48 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315

> Date: Thu, 03 May 2018 07:52:27 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> > Emacs employs that logic for every charset it has defined, including
> > Latin-2, for example: if text was decoded from an encoding which
> > supports a particular charset, Emacs puts the corresponding
> > 'charset' text property on the decoded text, and the machinery which
> > selects the appropriate font tries first to find a font which
> > supports that charset.  The idea is that users in a particular
> > culture have certain distinct preferences wrt fonts, and that an
> > encoding that supports a certain charset or culture provides a hint
> > about those preferences.  This idea is very central in how Emacs
> > selects fonts.
> 
> Being the FreeType maintainer, and having co-developed Emacs's
> internal buffer encoding scheme many, many years ago, I know all this.

Sorry, I couldn't know that.

> I can only repeat that Emacs might tag a certain text with GB18030 so
> that the user can deduce a Chinese origin.  However, there is *no*
> guarantee that the user gets a Chinese-flavoured font – at least not
> from the xft interface.[**]

IME, there's no guarantee about anything in the Emacs font look up
heuristics, except that empirically it does TRT in about 85% of uses.

May I invite you to work on revisiting the design and implementation
of the Emacs font-lookup facilities, and on modernizing them?  I'm
afraid we didn't have an active developer in this area for several
years, and I fear that we will stagnate (or already are stagnating).

TIA






* bug#31315: wrong font encoding for fallback font
  2018-05-03 17:48             ` Eli Zaretskii
@ 2018-05-03 19:05               ` Werner LEMBERG
  2018-05-03 19:59                 ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-03 19:05 UTC (permalink / raw)
  To: eliz; +Cc: 31315


>> I can only repeat that Emacs might tag a certain text with GB18030
>> so that the user can deduce a Chinese origin.  However, there is
>> *no* guarantee that the user gets a Chinese-flavoured font – at
>> least not from the xft interface.[**]
> 
> IME, there's no guarantee about anything in the Emacs font look up
> heuristics, except that empirically it does TRT in about 85% of
> uses.

If you don't install pan-CJK Unicode fonts, I fully agree :-)

> May I invite you to work on revisiting the design and implementation
> of the Emacs font-look up facilities, and on modernizing them?  I'm
> afraid we didn't have an active developer in this area for several
> years, and I fear that we will stagnate (or already are stagnating).

Alas, my Elisp knowledge is ... well ... not impressive.  Basically,
I'm just an Emacs user, not an Emacs developer.

With Xft, there is no possibility for improvement IMHO.  The probably
best choice is to switch to Pango for font access (in case you don't
do that already).  An Emacs charset essentially triggers a language
and script setting, and those two parameters can be passed to Pango,
AFAIK.


    Werner


* bug#31315: wrong font encoding for fallback font
  2018-05-03 19:05               ` Werner LEMBERG
@ 2018-05-03 19:59                 ` Eli Zaretskii
  2018-05-04  5:11                   ` Werner LEMBERG
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-03 19:59 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315

> Date: Thu, 03 May 2018 21:05:28 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> > May I invite you to work on revisiting the design and implementation
> > of the Emacs font-look up facilities, and on modernizing them?  I'm
> > afraid we didn't have an active developer in this area for several
> > years, and I fear that we will stagnate (or already are stagnating).
> 
> Alas, my Elisp knowledge is ... well ... not impressive.  Basically,
> I'm just an Emacs user, not an Emacs developer.

Almost all of the relevant code is in C, not in Lisp.

> With Xft, there is no possibility for improvement IMHO.  The probably
> best choice is to switch to Pango for font access (in case you don't
> do that already).

There are several back-ends besides Xft, the most advanced being
xftfont.c.  None of them has seen any serious development for the past
several years.  And yes, acquiring new back-ends is also a worthy
goal.

All of that requires a level of expertise that IMO we currently don't
have.

> An Emacs charset essentially triggers a language and script setting,
> and those two parameters can be passed to Pango, AFAIK.

Emacs handles charsets and scripts separately.  A script is matched
against OTF/TTF features of fonts to make sure the necessary shaping
features required by a script are supported.






* bug#31315: wrong font encoding for fallback font
  2018-05-03 19:59                 ` Eli Zaretskii
@ 2018-05-04  5:11                   ` Werner LEMBERG
  2018-05-04 13:05                     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Werner LEMBERG @ 2018-05-04  5:11 UTC (permalink / raw)
  To: eliz; +Cc: 31315


>> Alas, my Elisp knowledge is ... well ... not impressive.
>> Basically, I'm just an Emacs user, not an Emacs developer.
> 
> Almost all of the relevant code is in C, not in Lisp.

OK.

>> With Xft, there is no possibility for improvement IMHO.  The
>> probably best choice is to switch to Pango for font access (in case
>> you don't do that already).
> 
> There are several back-ends besides Xft, the most advanced being
> xftfont.c.  None of them has seen any serious development for the past
> several years.  And yes, acquiring new back-ends is also a worthy
> goal.
> 
> All of that requires a level of expertise that IMO we currently
> don't have.

What I can certainly offer is advice.  However, I'm not sure I'm the
right person for working on the innards of Emacs, given the usual lack
of time (sigh).

>> An Emacs charset essentially triggers a language and script
>> setting, and those two parameters can be passed to Pango, AFAIK.
> 
> Emacs handles charsets and scripts separately.  A script is matched
> against OTF/TTF features of fonts to make sure the necessary shaping
> features required by a script are supported.

Interesting.  Access to OpenType features is exactly what's needed to
improve font display for selected charsets.  Where can I find the
related code in Emacs?

Additionally, I suggest that the Emacs maintainers set up a GSoC
project, namely to improve font rendering.  This is a broad topic,
which could be further split into smaller subprojects.

Emacs uses Handa-san's libotf library (are there any other projects
that use this library?), but AFAICS it doesn't receive a lot of
testing.  On the other hand, there is Behdad Esfahbod's `HarfBuzz'
shaping engine that comes with a large suite of tests.  One such
subproject could be to take these tests and use them to improve
libotf, especially for Indic scripts.

  https://www.freedesktop.org/wiki/Software/HarfBuzz/


    Werner






* bug#31315: wrong font encoding for fallback font
  2018-05-04  5:11                   ` Werner LEMBERG
@ 2018-05-04 13:05                     ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-04 13:05 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: 31315

> Date: Fri, 04 May 2018 07:11:37 +0200 (CEST)
> Cc: handa@gnu.org, 31315@debbugs.gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> > Emacs handles charsets and scripts separately.  A script is matched
> > against OTF/TTF features of fonts to make sure the necessary shaping
> > features required by a script are supported.
> 
> Interesting.  Access to OpenType features is exactly what's needed to
> improve font display for selected charsets.  Where can I find the
> related code in Emacs?

Emacs delegates that to the font back-end.  E.g., in font_match_p you
will see that if certain 'otf' capabilities are required, Emacs calls
the otf_capability method of the font driver.  The ftfont.c driver
implements that as ftfont_otf_capability.  And which OTF capabilities
are required for what scripts is set up in the fontsets by fontset.el.

> Additionally, I suggest that the Emacs maintainers set up a GSoC
> project, namely to improve font rendering.  This is a broad topic,
> which could be further split into smaller subprojects.

Good idea.

> Emacs uses Handa-san's libotf library (are there any other projects
> that use this library?), but AFAICS it doesn't receive a lot of
> testing.  On the other hand, there is Behdad Esfahbod's `HarfBuzz'
> shaping engine that comes with a large suite of tests.  One such
> subproject could be to take these tests and use them to improve
> libotf, especially for Indic scripts.
> 
>   https://www.freedesktop.org/wiki/Software/HarfBuzz/

I think a better development would be to teach Emacs to use HarfBuzz
as its shaping engine.  HarfBuzz is available on many platforms, and
is AFAIK actively developed, so we will gain a better shaper and
advanced features if we do that.  Once again, such a project needs a
motivated volunteer to carry it out.






* bug#31315: wrong font encoding for fallback font
  2018-05-01 17:42                 ` Andreas Schwab
@ 2018-05-05  8:57                   ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2018-05-05  8:57 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 31315-done

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Werner LEMBERG <wl@gnu.org>,  handa@gnu.org,  31315@debbugs.gnu.org
> Date: Tue, 01 May 2018 19:42:16 +0200
> 
> On May 01 2018, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > Andreas, given that '("gb18030" (gb18030 . unicode))' appears to work,
> > do you see any potential problems with using it, instead of
> > gb18030-2-byte?
> 
> I leave that question to those who are more knowledgeable of the X font
> interface.

I invite those people to chime in.  Meanwhile, I pushed that change to
the master branch, and I'm boldly marking this bug "done".

Thanks.






end of thread, other threads:[~2018-05-05  8:57 UTC | newest]

Thread overview: 23+ messages
2018-04-30  7:21 bug#31315: wrong font encoding for fallback font Werner LEMBERG
2018-04-30 15:13 ` Eli Zaretskii
2018-04-30 15:42   ` Andreas Schwab
2018-04-30 19:26     ` Eli Zaretskii
2018-04-30 20:03       ` Andreas Schwab
2018-05-01  2:37         ` Eli Zaretskii
2018-05-01  6:47         ` Werner LEMBERG
2018-05-01  8:13           ` Andreas Schwab
2018-05-01  9:11             ` Werner LEMBERG
2018-05-01 15:00               ` Eli Zaretskii
2018-05-01 17:42                 ` Andreas Schwab
2018-05-05  8:57                   ` Eli Zaretskii
2018-05-01  6:36   ` Werner LEMBERG
2018-05-01 15:22     ` Eli Zaretskii
2018-05-01 19:30       ` Werner LEMBERG
2018-05-02  7:27         ` Werner LEMBERG
2018-05-02 15:22         ` Eli Zaretskii
2018-05-03  5:52           ` Werner LEMBERG
2018-05-03 17:48             ` Eli Zaretskii
2018-05-03 19:05               ` Werner LEMBERG
2018-05-03 19:59                 ` Eli Zaretskii
2018-05-04  5:11                   ` Werner LEMBERG
2018-05-04 13:05                     ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
