* bug#1399: 23.0.60; Some Unicode glyphs incorrectly mapped to CJK
From: Ian Eure @ 2008-11-20 23:33 UTC
  To: emacs-pretest-bug

It seems that some Unicode glyphs are incorrectly categorized.

For example, U+201C, U+201D, U+2018, U+2019 (LEFT/RIGHT SINGLE/DOUBLE  
QUOTATION MARK) are all mapped into the CJK category. This results in
the use of the STHeiti font for those characters, which are a  
different width than the normal font I've chosen.

I think it's incorrect for them to be categorized as CJK, since they  
are widely used in Latin scripts.


In GNU Emacs 23.0.60.1 (i386-apple-darwin9.5.0, NS apple-appkit-949.35)
  of 2008-11-20 on neutron.local
Windowing system distributor `Apple', version  
97.112.112.108.101.45.97.112.112.107.105.116.45.57.52.57.46.51.53
configured using `configure  '--with-ns''

Important settings:
   value of $LC_ALL: nil
   value of $LC_COLLATE: nil
   value of $LC_CTYPE: nil
   value of $LC_MESSAGES: nil
   value of $LC_MONETARY: nil
   value of $LC_NUMERIC: nil
   value of $LC_TIME: nil
   value of $LANG: nil
   value of $XMODIFIERS: nil
   locale-coding-system: nil
   default-enable-multibyte-characters: t

Major mode: Help

Minor modes in effect:
   ime-bindings: t
   tooltip-mode: t
   mouse-wheel-mode: t
   menu-bar-mode: t
   file-name-shadow-mode: t
   global-font-lock-mode: t
   font-lock-mode: t
   global-auto-composition-mode: t
   auto-composition-mode: t
   auto-encryption-mode: t
   auto-compression-mode: t
   line-number-mode: t
   transient-mark-mode: t
   view-mode: t

Recent input:
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-n C-n C-x b * <backspace> f o o <return> <return>
<backspace> C-x k RET C-x b f o <tab> <return> C-a
C-k C-p C-f C-f C-k C-a C-f C-a M-x d e s c r i b e
- c h a r <tab> <return> w C-_ C-x o C-n C-e C-b C-b
C-b C-b <return> C-c C-b C-b C-p C-b C-b C-b C-b C-b
C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b
C-b C-b C-b C-b M-> C-p C-p C-p C-p C-p C-n C-n C-e
C-a C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-n C-n C-n C-n C-n C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p M-p M-p C-p C-p C-p
C-p C-p <help-echo> <down-mouse-1> <mouse-movement>
<drag-mouse-1> C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-a
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p <help-echo> <down-mouse-1>
<down-mouse-1> <mouse-1> <wheel-up> <double-wheel-up>
<down-mouse-1> <mouse-1> <wheel-up> <wheel-down> <double-wheel-down>
<triple-wheel-down> <triple-wheel-down> <triple-wheel-down>
<triple-wheel-down> <triple-wheel-down> <triple-wheel-down>
<triple-wheel-up> <triple-wheel-up> <triple-wheel-up>
<triple-wheel-up> <triple-wheel-up> <triple-wheel-up>
<triple-wheel-up> <triple-wheel-up> <down-mouse-1>
<mouse-1> <help-echo> <down-mouse-1> <mouse-1> C-x
b f o n t <tab> <return> C-x 1 C-v C-v C-v M-v C-v
C-v C-v M-v M-v M-v C-x b <return> C-x b <return> C-x
b * H <tab> <return> C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-n C-a C-SPC C-e M-w <menu-bar>
<help-menu> <send-emacs-bug-report>

Recent messages:
uncompressing fontset.el.gz...done
call-interactively: Beginning of buffer
Type C-x 1 to delete the help window.
Undo!
Mark set
Auto-saving...done
byte-code: Beginning of buffer [3 times]
byte-code: End of buffer [6 times]
byte-code: Beginning of buffer [7 times]
Mark set







* bug#1399: 23.0.60; Some Unicode glyphs incorrectly mapped to CJK
From: Chong Yidong @ 2008-11-30  2:26 UTC
  To: Kenichi Handa; +Cc: 1399

Hi Handa-san,

Could you take a look at this bug report?

Ian Eure <ian@digg.com> wrote:

> It seems that some Unicode glyphs are incorrectly categorized.
>
> For example, U+201C, U+201D, U+2018, U+2019 (LEFT/RIGHT SINGLE/DOUBLE  
> QUOTATION MARK) are all mapped into the CJK category. This results in
> the use of the STHeiti font for those characters, which are a  
> different width than the normal font I've chosen.
>
> I think it's incorrect for them to be categorized as CJK, since they  
> are widely used in Latin scripts.







* bug#1399: 23.0.60; Some Unicode glyphs incorrectly mapped to CJK
From: Kenichi Handa @ 2009-03-17  2:00 UTC
  To: 1399; +Cc: ian

Sorry for the late response.

> It seems that some Unicode glyphs are incorrectly categorized.
> 
> For example, U+201C, U+201D, U+2018, U+2019 (LEFT/RIGHT SINGLE/DOUBLE  
> QUOTATION MARK) are all mapped into the CJK category. This results in
> the use of the STHeiti font for those characters, which are a  
> different width than the normal font I've chosen.

Category doesn't affect the font selection.

As all of those characters belong to the `symbol' script,
Emacs first lists fonts that have at least one of #x201C,
#x2200, #x2500 (see script-representative-chars), and then
selects the one that best matches your default font's
family, foundry, etc.
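
To check this on a given build, forms like the following can be
evaluated in the *scratch* buffer.  This is only a sketch: the values
returned depend on the Emacs version and on the fonts installed.

```elisp
;; Script that Emacs assigns to U+201C.  The explanation above says
;; this is `symbol'; other Emacs versions may differ.
(aref char-script-table #x201C)

;; Representative characters used to shortlist fonts for that script.
(assq 'symbol script-representative-chars)

;; Font spec(s) the default fontset currently holds for U+201C.
;; The third argument asks for all of them, not just the first.
(fontset-font "fontset-default" #x201C t)
```

`M-x describe-char' on one of the quotation marks shows the same
information interactively, along with the font actually used.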

In your case, perhaps all of the listed fonts differ from
the default font in family, foundry, etc., and thus Emacs
selects an arbitrary one from the listed fonts.

Currently, Emacs can't know which kind of font is more
suitable for those characters: a font that has double-width
glyphs for them, or a font that has single-width glyphs.

So, if you prefer a specific font for symbol characters, you
must modify the default fontset (or whatever fontset you are
using) for symbol characters, for example, like this:

(set-fontset-font
  "fontset-default"
  'symbol
  '("FAMILYNAME" . "iso10646-1"))
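
If changing the font for the whole `symbol' script is too broad, a
narrower variant is to pass single characters as the target.  The
sketch below assumes that only the four quotation marks from the
report need the override; "FAMILYNAME" is still a placeholder for a
family installed on your system.

```elisp
;; Override only the four quotation marks and leave the rest of the
;; `symbol' script alone.  "FAMILYNAME" is a placeholder.
(dolist (ch '(#x2018 #x2019 #x201C #x201D))
  (set-fontset-font "fontset-default" ch
                    (font-spec :family "FAMILYNAME")))
```

Putting this in your init file should apply it to frames created
afterwards.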

> I think it's incorrect for them to be categorized as CJK, since they  
> are widely used in Latin scripts.

Character categories are not exclusive.  Even if a character
has a CJK category, that doesn't mean the character is not
also Latin.
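
One way to see every category a character belongs to is sketched
below.  Which categories are listed depends on the Emacs version, so
no particular result is assumed here.

```elisp
;; Build an alist of (MNEMONIC . DESCRIPTION) for each category that
;; U+201C belongs to.  `category-set-mnemonics' returns a string with
;; one mnemonic character per category.
(mapcar (lambda (c)
          (cons (string c) (category-docstring c)))
        (category-set-mnemonics (char-category-set #x201C)))
```

A character that appears in several categories has one entry per
category in the result.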

---
Kenichi Handa
handa@m17n.org






