unofficial mirror of emacs-devel@gnu.org 
* Font back end font selection process
@ 2009-06-07  3:54 Adrian Robert
  2009-06-07  5:59 ` Stephen J. Turnbull
  2009-06-08  2:49 ` Kenichi Handa
  0 siblings, 2 replies; 5+ messages in thread
From: Adrian Robert @ 2009-06-07  3:54 UTC (permalink / raw)
  To: Emacs-Devel devel

I am working on updating the NS font driver to work with script and  
friends so that correct nonASCII fonts can be chosen using the  
default fontset skeleton mechanism.  The back end seems to use these  
methods to request a font from the list() method:

- registry in the font spec proper
- :script property in "extra" properties
- :lang property in "extra"
- part of the :otf property bundle in "extra"
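
For illustration, the four kinds of request can be written down as Lisp font specs. This is only a sketch: the back end hands the driver a C-level font spec object, and the variable name and the particular property values below are made-up examples, not output from a trace.

```elisp
;; Sketch only: one font spec per kind of request listed above.
;; `my-example-requests' and its values are hypothetical examples.
(defvar my-example-requests
  (list
   (font-spec :registry "iso10646-1")         ; registry in the spec proper
   (font-spec :script 'thai)                  ; :script in "extra"
   (font-spec :lang 'ja)                      ; :lang in "extra"
   (font-spec :otf '(thai nil nil (mark))))   ; :otf bundle in "extra"
  "Example font specs, one per request type.")

;; `font-get' reads a single property back out of a spec.
(font-get (nth 1 my-example-requests) :script)
```

A driver's list() method would see the same properties and decide, per property, whether it can honor the request.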

I haven't found a way to respond to the first type of query using  
Cocoa APIs yet.  The others that get requested, and their order, seem  
to depend on the language in question.  In particular, for some  
languages like Thai, only OTF requests ever seem to get made.  It  
seems like this class might be the scripts requiring compositional  
rendering, but why, since emacs used to be able to handle  
compositional rendering without making use of any OTF-specific  
properties provided by a font driver?

Also, often I have noticed that when given a Chinese text file  
(encoded in UTF-8), the only request that comes through is :lang=ja.   
How should the font driver know to return a kanji font instead of  
hiragana / katakana?  Wouldn't it be better to  
request :script=han, adding :lang=ja or :lang=zh only if emacs has  
some knowledge that the file IS actually in one of these languages?   
The file encoding might be one piece of information to take into  
account, but when it is UTF-8 it would need to run some kind of  
lexical analysis, or query the user.

I also noticed that if no entities are returned from a list() request  
with a family and a script specified, it next makes a list() request  
with no family specified.  Instead of this it would be good to  
request a match() with the family still specified, as this gives the  
driver the opportunity to find a font that "looks like" the family  
(e.g. presence of serifs, etc.), instead of just a random font  
covering the needed characters.  Indeed, I have not noticed match()  
being called at all when searching for a font for a script -- instead  
the back end just goes with the ASCII font (and renders boxes)  
before ever making such a request.






* Font back end font selection process
  2009-06-07  3:54 Font back end font selection process Adrian Robert
@ 2009-06-07  5:59 ` Stephen J. Turnbull
  2009-06-08  2:49 ` Kenichi Handa
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen J. Turnbull @ 2009-06-07  5:59 UTC (permalink / raw)
  To: Adrian Robert; +Cc: Emacs-Devel devel

Adrian Robert writes:

 > Also, often I have noticed that when given a Chinese text file  
 > (encoded in UTF-8), the only request that comes through is :lang=ja.   

That reflects the historical origin of Mule, I would guess.

 > How should the font driver know to return a kanji font instead of  
 > hiragana / katakana?

If kana are present, it's Japanese.  If Hangul are present, it's
Korean.  If the accents outnumber the base characters, it's
Vietnamese.  Otherwise, it's Chinese.

There are more precise criteria based on usage of simplified
characters, but that would be good enough for a start.
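
A minimal sketch of that rule in Lisp, assuming `char-script-table' classifies kana and Hangul as expected. The function name is hypothetical, the Vietnamese clause is left out, and the function is meant to be called only on text already known to contain han characters.

```elisp
;; Sketch of the heuristic above; `my-guess-cjk-language' is a
;; hypothetical name, not an existing Emacs function.
(defun my-guess-cjk-language (text)
  "Guess a :lang symbol (ja, ko or zh) for TEXT.
Kana anywhere in TEXT means Japanese, Hangul means Korean, and
anything else is assumed to be Chinese."
  (let ((scripts (mapcar (lambda (ch) (aref char-script-table ch))
                         (string-to-list text))))
    (cond ((memq 'kana scripts) 'ja)
          ((memq 'hangul scripts) 'ko)
          (t 'zh))))
```

For example, (my-guess-cjk-language "日本語のテキスト") sees the kana and picks ja, while a string of han characters alone falls through to zh.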






* Re: Font back end font selection process
  2009-06-07  3:54 Font back end font selection process Adrian Robert
  2009-06-07  5:59 ` Stephen J. Turnbull
@ 2009-06-08  2:49 ` Kenichi Handa
  2009-06-10  7:27   ` Adrian Robert
  1 sibling, 1 reply; 5+ messages in thread
From: Kenichi Handa @ 2009-06-08  2:49 UTC (permalink / raw)
  To: Adrian Robert; +Cc: emacs-devel

In article <8BA022EF-AACD-495A-ABBB-24B230475217@gmail.com>, Adrian Robert <adrian.b.robert@gmail.com> writes:

> I am working on updating the NS font driver to work with script and  
> friends so that correct nonASCII fonts can be chosen using the  
> default fontset skeleton mechanism.  The back end seems to use these  
> methods to request a font from the list() method:

> - registry in the font spec proper
> - :script property in "extra" properties
> - :lang property in "extra"
> - part of the :otf property bundle in "extra"

> I haven't found a way to respond to the first type of query using  
> Cocoa APIs yet.

In that case, you can simply reject any registry other than
"iso10646-1".

> The others that get requested, and the order, seems  
> to depend on the language in question.  In particular, for some  
> languages like Thai, only OTF requests ever seem to get made.  It  
> seems like this class might be the scripts requiring compositional  
> rendering, but why, since emacs used to be able to handle  
> compositional rendering without making use of any OTF-specific  
> properties provided by a font driver?

Emacs 23 can still use a non-OTF Thai font if the registry is
tis620 or iso8859-11.  The default fontset has this entry
for Thai.

     (thai  ,(font-spec :registry "iso10646-1" :otf '(thai nil nil (mark)))
	    (nil . "TIS620*")
	    (nil . "ISO8859-11"))

The reason why I added :otf for "iso10646-1" is that now we
have many OTF Thai fonts usable with Xft font-backend (and
perhaps with uniscribe backend).  OTF Thai fonts provide
better Thai rendering than the simple relative stacking
method of Emacs 22.  But, if OTF is not available on Cocoa,
I'll change the entry for Thai to something like this:

     (thai  ,(font-spec :registry "iso10646-1" :otf '(thai nil nil (mark)))
	    ,(font-spec :registry "iso10646-1" :script 'thai)
	    (nil . "TIS620*")
	    (nil . "ISO8859-11"))

Does it solve your problem?
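
Until the default fontset itself changes, a similar fallback could be approximated from an init file. This is a sketch, assuming `set-fontset-font' with its ADD argument. Note that appending places the new entry after the TIS620 and ISO8859-11 entries rather than directly after the :otf one, so the order differs slightly from the entry proposed above.

```elisp
;; Sketch: append a plain :script request for Thai to the default
;; fontset (NAME t), so it is tried after the existing entries.
(set-fontset-font t 'thai
                  (font-spec :registry "iso10646-1" :script 'thai)
                  nil 'append)
```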

> Also, often I have noticed that when given a Chinese text file  
> (encoded in UTF-8), the only request that comes through is :lang=ja.   

?? For han script, the default fontset has this entry:

     (han (nil . "GB2312.1980-0")
	  (nil . "JISX0208*")
	  (nil . "JISX0212*")
	  (nil . "big5*")
	  (nil . "KSC5601.1987*")
	  (nil . "CNS11643.1992-1")
	  (nil . "CNS11643.1992-2")
	  (nil . "CNS11643.1992-3")
	  (nil . "CNS11643.1992-4")
	  (nil . "CNS11643.1992-5")
	  (nil . "CNS11643.1992-6")
	  (nil . "CNS11643.1992-7")
	  (nil . "gbk-0")
	  (nil . "gb18030")
	  (nil . "JISX0213.2000-1")
	  (nil . "JISX0213.2000-2")
	  (nil . "JISX0213.2004-1")
	  ,(font-spec :registry "iso10646-1" :lang 'ja)
	  ,(font-spec :registry "iso10646-1" :lang 'zh))

So, not only `ja', emacs should try `zh' if `ja' is not
available.  Doesn't it happen on Cocoa?

> How should the font driver know to return a kanji font instead of  
> hiragana / katakana?

A font driver can return any 'ja' iso10646-1 font for this
request (even if the font supports only kana):

	  ,(font-spec :registry "iso10646-1" :lang 'ja)

If the first font in the returned list doesn't support a
specific han character, Emacs tries another font in the
returned list.
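
The fallback through the returned list can be pictured with a toy model. This is only a sketch: the function name and the (NAME . COVERED-CHARS) pairs are hypothetical stand-ins for the font entities a driver returns, not the data structures Emacs actually uses.

```elisp
;; Toy model of "try the next font in the returned list".
;; CANDIDATES is a hypothetical list of (NAME . COVERED-CHARS) pairs.
(defun my-first-font-covering (char candidates)
  "Return the name of the first font in CANDIDATES that covers CHAR.
Return nil if no candidate covers it."
  (let (found)
    (while (and candidates (not found))
      ;; Characters are integers, so `memq' is a safe membership test.
      (when (memq char (cdr (car candidates)))
        (setq found (car (car candidates))))
      (setq candidates (cdr candidates)))
    found))

(defvar my-example-candidates
  '(("kana-only" . (?か ?な))
    ("full-ja"   . (?か ?な ?漢)))
  "Hypothetical result of a :lang ja request, in driver order.")
```

With these candidates, a kana character is served by the first font, while a han character skips it and lands on the second, which is why a kana-only font at the head of the list does no harm.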

> Wouldn't it be better to  
> request :script=han, adding :lang=ja or :lang=zh only if emacs has  
> some knowledge that the file IS actually in one of these languages?   
> The file encoding might be one piece of information to take into  
> account, but when it is UTF-8 it would need to run some kind of  
> lexical analysis, or query the user.

If the buffer's file is in UTF-8, Emacs currently does this:
if the current language environment is "Japanese", it tries
:lang=ja before :lang=zh; if it is "Chinese-XXX", it tries
:lang=zh before :lang=ja.  Otherwise, it tries them in the
order in which the default fontset defines them (thus :lang=ja
first).  I thought that would work well in most cases.
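
The ordering rule just described can be restated as a few lines of Lisp. This is a sketch of the rule only, not the code Emacs runs, and the function name is hypothetical; the argument is a language environment name such as the value of `current-language-environment'.

```elisp
;; Restatement of the rule above; `my-han-lang-order' is hypothetical.
(defun my-han-lang-order (lang-env)
  "Return the order in which to try :lang values for han in LANG-ENV.
A \"Chinese-XXX\" environment puts zh first; \"Japanese\" and every
other environment keep the default fontset order, with ja first."
  (if (string-match-p "\\`Chinese-" lang-env)
      '(zh ja)
    '(ja zh)))
```

So (my-han-lang-order "Chinese-GB") gives (zh ja), while "Japanese" and "English" both give (ja zh).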

"Some kind of lexical analysis" would surely be very good, but
currently we don't have that facility.  And "query the
user" is too annoying.  I think it is better to provide a
good user interface for specifying a font for each script (or
range of characters).

> I also noticed that if no entities are returned from a list() request  
> with a family and a script specified, it next makes a list() request  
> with no family specified.  Instead of this it would be good to  
> request a match() with the family still specified, as this gives the  
> driver the opportunity to find a font that "looks like" the family  
> (e.g. presence of serifs, etc.), instead of just a random font  
> covering the needed characters.  Indeed, I have not noticed match()  
> being called at all when searching for a font for a script -- instead  
> the back end just goes with the ascii font (and rendering boxes)  
> before ever making such a request.

Ah, that sounds like a good idea.  Another way is to allow font
drivers to also list fonts of similar families (sorted by
closeness of family) and to modify font_sort_entities to
preserve that order when properties other than family
are the same.

---
Kenichi Handa
handa@m17n.org





* Re: Font back end font selection process
  2009-06-08  2:49 ` Kenichi Handa
@ 2009-06-10  7:27   ` Adrian Robert
  2009-06-10 11:04     ` Kenichi Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Robert @ 2009-06-10  7:27 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel


On Jun 8, 2009, at 9:49 AM, Kenichi Handa wrote:

> In article <8BA022EF-AACD-495A-ABBB-24B230475217@gmail.com>, Adrian  
> Robert <adrian.b.robert@gmail.com> writes:
>>
>> - registry in the font spec proper
>> - :script property in "extra" properties
>> - :lang property in "extra"
>> - part of the :otf property bundle in "extra"
>
>> I haven't found a way to respond to the first type of query using
>> Cocoa APIs yet.
>
> In that case, you can simply reject any registry other than
> "iso10646-1".

OK, that's what we are doing.



>>  In particular, for some
>> languages like Thai, only OTF requests ever seem to get made.  It
>> seems like this class might be the scripts requiring compositional
>> rendering, but why, since emacs used to be able to handle
>> compositional rendering without making use of any OTF-specific
>> properties provided by a font driver?
>
> Emacs 23 still can use non-OTF Thai font if the registry is
> tis620 or iso8859-11.  The default fontset has this entry
> for Thai.
>
>      (thai  ,(font-spec :registry "iso10646-1" :otf '(thai nil nil  
> (mark)))
> 	    (nil . "TIS620*")
> 	    (nil . "ISO8859-11"))
>
> The reason why I added :otf for "iso10646-1" is that now we
> have many OTF Thai fonts usable with Xft font-backend (and
> perhaps with uniscribe backend).  OTF Thai fonts provide
> better Thai rendering than the simple relative stacking
> method of Emacs 22.  But, if OTF is not available on Cocoa,
> I'll change the entry for Thai to something like this:
>
>      (thai  ,(font-spec :registry "iso10646-1" :otf '(thai nil nil  
> (mark)))
> 	    ,(font-spec :registry "iso10646-1" :script 'thai)
> 	    (nil . "TIS620*")
> 	    (nil . "ISO8859-11"))
>
> Does it solve your problem?

Currently I'm just responding to the 'thai' in :otf with a Thai font  
and it seems to work reasonably.  None of the otf functions are  
implemented in the NS font driver and I'm unsure whether they can be,  
but emacs' text layout must fall back to stacking automatically.  If  
it would be better to refuse the :otf list() request at this stage  
then adding the :script 'thai entry would be good.  The same goes for  
other entries in the default fontset that use :otf in the same way.



>> Also, often I have noticed that when given a Chinese text file
>> (encoded in UTF-8), the only request that comes through is :lang=ja.
>
> ?? For han script, the default fontset has this entry:
>
>      (han (nil . "GB2312.1980-0")
> 	  (nil . "JISX0208*")
> 	  (nil . "JISX0212*")
> 	  (nil . "big5*")
> ...
> 	  ,(font-spec :registry "iso10646-1" :lang 'ja)
> 	  ,(font-spec :registry "iso10646-1" :lang 'zh))

Why not have

(font-spec :registry "iso10646-1" :script 'han)

before the lang entries?



> So, not only `ja', emacs should try `zh' if `ja' is not
> available.  Doesn't it happen on Cocoa?

As long as there are Japanese fonts on the system (always true on OS  
X), the first 'ja request will return fonts and the 'zh one will  
never get made.



>> How should the font driver know to return a kanji font instead of
>> hiragana / katakana?
>
> A font driver can return any 'ja' iso10646-1 fonts for this
> request (even if the font support only kana):
>
> 	  ,(font-spec :registry "iso10646-1" :lang 'ja)
>
> If the first font in the returned list doesn't support a
> specific han character, Emacs tries another font in the
> returned list.

Ah, OK so for purposes of list() the driver should treat :lang='ja as  
"kana | kanji" instead of "kana & kanji", and treat kanji itself as  
"kanji | hanzi".








* Re: Font back end font selection process
  2009-06-10  7:27   ` Adrian Robert
@ 2009-06-10 11:04     ` Kenichi Handa
  0 siblings, 0 replies; 5+ messages in thread
From: Kenichi Handa @ 2009-06-10 11:04 UTC (permalink / raw)
  To: Adrian Robert; +Cc: emacs-devel

In article <E11A11F8-6256-4C97-A8FC-C8CA036E5002@gmail.com>, Adrian Robert <adrian.b.robert@gmail.com> writes:

> Currently I'm just responding to the 'thai' in :otf with a Thai font  
> and it seems to work reasonably.  None of the otf functions are  
> implemented in the NS font driver and I'm unsure whether they can be,  
> but emacs' text layout must fall back to stacking automatically.  If  
> it would be better to refuse the :otf list() request at this stage  
> then adding the :script 'thai entry would be good.  The same goes for  
> other entries in the default fontset that use :otf in the same way.

If the NS backend doesn't support OTF, it is better for the `list'
method to return nil for that request.  So, I'll add

	    ,(font-spec :registry "iso10646-1" :script 'thai)

for Thai.  By the way, for lao, the default fontset already
has this entry after the entry specifying :otf property.

	  ,(font-spec :registry "iso10646-1" :script 'lao)

But for the other scripts that request OTF, it is
impossible to implement a fallback method.  Simple
stacking doesn't work for them.

>>> Also, often I have noticed that when given a Chinese text file
>>> (encoded in UTF-8), the only request that comes through is :lang=ja.
> >
> > ?? For han script, the default fontset has this entry:
> >
> >      (han (nil . "GB2312.1980-0")
> > 	  (nil . "JISX0208*")
> > 	  (nil . "JISX0212*")
> > 	  (nil . "big5*")
> > ...
> > 	  ,(font-spec :registry "iso10646-1" :lang 'ja)
> > 	  ,(font-spec :registry "iso10646-1" :lang 'zh))

> Why not have

> (font-spec :registry "iso10646-1" :script 'han)

> before the lang entries?

Just to reduce the number of font-specs to try.  Here I
assume that a font that supports han script supports ja
and/or zh, and thus adding the entry of :script 'han is
redundant.

> > So, not only `ja', emacs should try `zh' if `ja' is not
> > available.  Doesn't it happen on Cocoa?

>>> How should the font driver know to return a kanji font instead of
>>> hiragana / katakana?
> >
> > A font driver can return any 'ja' iso10646-1 fonts for this
> > request (even if the font support only kana):
> >
> > 	  ,(font-spec :registry "iso10646-1" :lang 'ja)
> >
> > If the first font in the returned list doesn't support a
> > specific han character, Emacs tries another font in the
> > returned list.

> Ah, OK so for purposes of list() the driver should treat :lang='ja as  
> "kana | kanji" instead of "kana & kanji", and treat kanji itself as  
> "kanji | hanzi".

Yes.

---
Kenichi Handa
handa@m17n.org




