* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: ynyaaa @ 2020-02-18 13:50 UTC
To: 39659
'han' script is defined in char-script-table as:
2E80-2FDF han
3200-9FFF han
F900-FAFF han
FE30-FE4F han
1F200-1F2FF han
20000-2A6DF han
2A700-2EBEF han
2F800-2FA1F han
It is better to set values as:
3200-33FF cjk-misc
4DC0-4DFF cjk-misc
FE30-FE4F cjk-misc
1F200-1F2FF cjk-misc
Alternatively, if enclosed CJK ideographs are to stay in the 'han' script,
then for consistency enclosed Hangul should be 'hangul' script,
enclosed Katakana should be 'kana' script,
and enclosed numbers should be 'symbol' script.
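To make the proposed change concrete, here is a minimal Python sketch. It is only an illustrative model of the ranges listed above, not how Emacs stores char-script-table; the names CURRENT_HAN, PROPOSED_CJK_MISC, script_before and script_after are hypothetical, and a result of None only means "not covered by this model".

```python
# Illustrative model only: Emacs keeps this data in a char-table, not in lists.
# All ranges are copied from the report above (inclusive code point ranges).
CURRENT_HAN = [
    (0x2E80, 0x2FDF), (0x3200, 0x9FFF), (0xF900, 0xFAFF), (0xFE30, 0xFE4F),
    (0x1F200, 0x1F2FF), (0x20000, 0x2A6DF), (0x2A700, 0x2EBEF),
    (0x2F800, 0x2FA1F),
]
PROPOSED_CJK_MISC = [
    (0x3200, 0x33FF), (0x4DC0, 0x4DFF), (0xFE30, 0xFE4F), (0x1F200, 0x1F2FF),
]


def in_ranges(cp, ranges):
    """Return True if code point cp falls inside any inclusive range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def script_before(cp):
    """Script under the current definition, restricted to the 'han' ranges."""
    return "han" if in_ranges(cp, CURRENT_HAN) else None


def script_after(cp):
    """Script once the proposed cjk-misc ranges override the 'han' ranges."""
    if in_ranges(cp, PROPOSED_CJK_MISC):
        return "cjk-misc"
    return script_before(cp)
```

With this model, U+3231 (PARENTHESIZED IDEOGRAPH STOCK) moves from 'han' to 'cjk-misc', while U+4E00 stays 'han'.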
In GNU Emacs 27.0.60 (build 1, x86_64-w64-mingw32)
of 2019-12-29 built on CIRROCUMULUS
Repository revision: 21c3020fcec0a32122d2680a391864a75393031b
Repository branch: emacs-27
Windowing system distributor 'Microsoft Corp.', version 10.0.18363
System Description: Microsoft Windows 10 Pro (v10.0.1909.18363.657)
Recent messages:
Configured using:
'configure --without-dbus --host=x86_64-w64-mingw32
--without-compress-install -C 'CFLAGS=-O2 -static -g3''
Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY W32NOTIFY ACL GNUTLS LIBXML2
HARFBUZZ ZLIB TOOLKIT_SCROLL_BARS MODULES THREADS PDUMPER LCMS2 GMP
Important settings:
value of $LANG: JPN
locale-coding-system: cp932
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(rect wid-edit descr-text mule-diag thingatpt cl-extra novice help-fns
radix-tree cl-print debug backtrace find-func gnutls network-stream nsm
mailalias smtpmail auth-source cl-seq eieio eieio-core cl-macs
eieio-loaddefs json map misearch multi-isearch help-mode pp shadow sort
mail-extr term/bobcat emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
japan-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads w32notify w32 lcms2 multi-tty make-network-process
emacs)
Memory information:
((conses 16 921325 302508)
(symbols 48 58666 0)
(strings 32 119118 12074)
(string-bytes 1 2586170)
(vectors 16 88868)
(vector-slots 8 2545555 209000)
(floats 8 47 281)
(intervals 56 44668 5857)
(buffers 1000 22))
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: Eli Zaretskii @ 2020-02-18 16:02 UTC
To: ynyaaa, Kenichi Handa; +Cc: 39659
> From: ynyaaa@gmail.com
> Date: Tue, 18 Feb 2020 22:50:57 +0900
>
> 'han' script is defined in char-script-table as:
> 2E80-2FDF han
> 3200-9FFF han
> F900-FAFF han
> FE30-FE4F han
> 1F200-1F2FF han
> 20000-2A6DF han
> 2A700-2EBEF han
> 2F800-2FA1F han
>
> It is better to set values as:
> 3200-33FF cjk-misc
> 4DC0-4DFF cjk-misc
> FE30-FE4F cjk-misc
> 1F200-1F2FF cjk-misc
>
> If enclosed CJK Ideographs should be 'han' script,
> enclosed Hanguls should be 'hangul' script,
> enclosed Katakana should be 'kana' script,
> and enclosed Numbers should be 'symbol' script.
Please provide some rationale for the differences; just saying
"better" and "should" doesn't explain why you think the changes are
an improvement.
CC'ing Handa-san, who I hope will have some comments on this.
Thanks.
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: ynyaaa @ 2020-02-19 9:53 UTC
To: Eli Zaretskii; +Cc: 39659
Eli Zaretskii <eliz@gnu.org> writes:
>> From: ynyaaa@gmail.com
>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>>
>> 'han' script is defined in char-script-table as:
>> 2E80-2FDF han
>> 3200-9FFF han
>> F900-FAFF han
>> FE30-FE4F han
>> 1F200-1F2FF han
>> 20000-2A6DF han
>> 2A700-2EBEF han
>> 2F800-2FA1F han
>>
>> It is better to set values as:
>> 3200-33FF cjk-misc
>> 4DC0-4DFF cjk-misc
>> FE30-FE4F cjk-misc
>> 1F200-1F2FF cjk-misc
>>
>> If enclosed CJK Ideographs should be 'han' script,
>> enclosed Hanguls should be 'hangul' script,
>> enclosed Katakana should be 'kana' script,
>> and enclosed Numbers should be 'symbol' script.
>
> Please provide some rationale for the differences, just saying
> "better" and "should" doesn't explain why you think the changes are
> for the good.
>
> CC'ing Handa-san, who I hope will have some comments on this.
>
> Thanks.
Because they are not han characters.
I think that combinatorial characters are not han characters,
and that they are symbolic characters.
As for enclosed latin letters, they are treated as 'symbol' script.
249C-24B5 PARENTHESIZED LATIN SMALL LETTER *
24B6-24CF CIRCLED LATIN CAPITAL LETTER *
24D0-24E9 CIRCLED LATIN SMALL LETTER *
1F110-1F129 PARENTHESIZED LATIN CAPITAL LETTER *
1F130-1F149 SQUARED LATIN CAPITAL LETTER *
1F150-1F169 NEGATIVE CIRCLED LATIN CAPITAL LETTER *
1F170-1F189 NEGATIVE SQUARED LATIN CAPITAL LETTER *
1F12A TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
1F12B CIRCLED ITALIC LATIN CAPITAL LETTER C
1F12C CIRCLED ITALIC LATIN CAPITAL LETTER R
1F18A CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTER P
1F1A5 SQUARED LATIN SMALL LETTER D
If the script is instead set to han, hangul or kana for combinatorial
characters that contain han, hangul or kana characters, the script values
would be as follows:
CodePoint     Script   Comment
3200-321E     hangul   enclosed hangul
321F          -        unassigned
3220-3247     han      enclosed han
3248-324F     symbol   enclosed number
3250          symbol   combined latin
3251-325F     symbol   enclosed number
3260-327E     hangul   enclosed hangul
327F          symbol   symbol
3280-32B0     han      enclosed han
32B1-32BF     symbol   enclosed number
32C0-32CB     han      square character with han
32CC-32CF     symbol   square character with latin
32D0-32FE     kana     enclosed kana
32FF          han      square character with han
3300-3357     kana     square character with kana
3358-3370     han      square character with han
3371-337A     symbol   square character with latin
337B-337F     han      square character with han
3380-33DF     symbol   square character with latin
33E0-33FE     han      square character with han
33FF          symbol   square character with latin
4DC0-4DFF     symbol   symbol
FE30-FE44     symbol   symbol for vertical
FE45-FE46     symbol   symbol
FE47-FE48     symbol   symbol for vertical
FE49-FE4F     symbol   symbol
1F200-1F202   kana     enclosed/square character with kana
...           -        unassigned
1F210-1F212   han      enclosed han
1F213         kana     enclosed kana
1F214-1F248   han      enclosed han
...           -        unassigned
1F250-1F251   han      enclosed han
...           -        unassigned
1F260-1F265   symbol   symbol
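As a rough illustration of how such a fine-grained assignment could be looked up, here is a minimal Python sketch covering only the U+3200..U+33FF rows of the table above. It is a model for discussion, not Emacs code; ROWS, STARTS and script_for are hypothetical names, and None stands for "unassigned or outside this excerpt".

```python
from bisect import bisect_right

# Rows copied from the U+3200..U+33FF part of the table above
# (inclusive ranges). U+321F is unassigned and therefore left out.
ROWS = [
    (0x3200, 0x321E, "hangul"), (0x3220, 0x3247, "han"),
    (0x3248, 0x324F, "symbol"), (0x3250, 0x3250, "symbol"),
    (0x3251, 0x325F, "symbol"), (0x3260, 0x327E, "hangul"),
    (0x327F, 0x327F, "symbol"), (0x3280, 0x32B0, "han"),
    (0x32B1, 0x32BF, "symbol"), (0x32C0, 0x32CB, "han"),
    (0x32CC, 0x32CF, "symbol"), (0x32D0, 0x32FE, "kana"),
    (0x32FF, 0x32FF, "han"), (0x3300, 0x3357, "kana"),
    (0x3358, 0x3370, "han"), (0x3371, 0x337A, "symbol"),
    (0x337B, 0x337F, "han"), (0x3380, 0x33DF, "symbol"),
    (0x33E0, 0x33FE, "han"), (0x33FF, 0x33FF, "symbol"),
]
STARTS = [lo for lo, _, _ in ROWS]  # sorted range starts for binary search


def script_for(cp):
    """Return the script for code point cp, or None if no row covers it."""
    i = bisect_right(STARTS, cp) - 1  # last row whose start is <= cp
    if i < 0:
        return None
    lo, hi, script = ROWS[i]
    return script if cp <= hi else None
```

The sorted starts plus a binary search keep the lookup cheap even if the table grows; a gap such as U+321F simply falls past the end of the preceding row.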
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: Eli Zaretskii @ 2020-02-19 15:43 UTC
To: ynyaaa; +Cc: 39659
> From: ynyaaa@gmail.com
> Cc: Kenichi Handa <handa@gnu.org>, 39659@debbugs.gnu.org
> Date: Wed, 19 Feb 2020 18:53:07 +0900
>
> >> It is better to set values as:
> >> 3200-33FF cjk-misc
> >> 4DC0-4DFF cjk-misc
> >> FE30-FE4F cjk-misc
> >> 1F200-1F2FF cjk-misc
> >>
> >> If enclosed CJK Ideographs should be 'han' script,
> >> enclosed Hanguls should be 'hangul' script,
> >> enclosed Katakana should be 'kana' script,
> >> and enclosed Numbers should be 'symbol' script.
> >
> > Please provide some rationale for the differences, just saying
> > "better" and "should" doesn't explain why you think the changes are
> > for the good.
> >
> > CC'ing Handa-san, who I hope will have some comments on this.
> >
> > Thanks.
>
> Because they are not han characters.
> I think that combinatorial characters are not han characters,
> and that they are symbolic characters.
So your interpretation of cjk-misc is that they are symbols, not
letters? I'm asking because I don't really know what is meant by
"cjk-misc"; I don't think we have it documented anywhere.
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: ynyaaa @ 2020-02-20 6:27 UTC
To: Eli Zaretskii; +Cc: 39659
Eli Zaretskii <eliz@gnu.org> writes:
>> From: ynyaaa@gmail.com
>> Cc: Kenichi Handa <handa@gnu.org>, 39659@debbugs.gnu.org
>> Date: Wed, 19 Feb 2020 18:53:07 +0900
>>
>> >> It is better to set values as:
>> >> 3200-33FF cjk-misc
>> >> 4DC0-4DFF cjk-misc
>> >> FE30-FE4F cjk-misc
>> >> 1F200-1F2FF cjk-misc
>> >>
>> >> If enclosed CJK Ideographs should be 'han' script,
>> >> enclosed Hanguls should be 'hangul' script,
>> >> enclosed Katakana should be 'kana' script,
>> >> and enclosed Numbers should be 'symbol' script.
>> >
>> > Please provide some rationale for the differences, just saying
>> > "better" and "should" doesn't explain why you think the changes are
>> > for the good.
>> >
>> > CC'ing Handa-san, who I hope will have some comments on this.
>> >
>> > Thanks.
>>
>> Because they are not han characters.
>> I think that combinatorial characters are not han characters,
>> and that they are symbolic characters.
>
> So your interpretation of cjk-misc is that they are symbols, not
> letters? I'm asking because I don't really know what is meant by
> "cjk-misc", I don't think we have it documented anywhere.
I guess the cjk-misc script means CJK-related characters.
The relevant block names in the Unicode Character Database
(https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt) are:
3000..303F; CJK Symbols and Punctuation
31C0..31EF; CJK Strokes
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
4DC0..4DFF; Yijing Hexagram Symbols
FE30..FE4F; CJK Compatibility Forms
1F200..1F2FF; Enclosed Ideographic Supplement
Yijing Hexagram Symbols (U+4DC0..U+4DFF) are Chinese symbols related to:
2630-2637 TRIGRAM FOR *
268A-268B MONOGRAM FOR *
268C-268F DIGRAM FOR *
1D300-1D35F Tai Xuan Jing Symbols
The script symbol for "Yijing Hexagram Symbols" could be 'symbol or
'yijing-hexagram-symbol.
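For anyone who wants to check which Unicode block a given code point belongs to, here is a minimal Python sketch that parses lines in the Blocks.txt format quoted above. It embeds only the seven lines cited in this message rather than downloading the full file; BLOCKS_TXT, parse_blocks and block_name are hypothetical names used for illustration.

```python
# Excerpt in Blocks.txt format ("START..END; Block Name"), limited to the
# lines quoted in this message. The real file lists every Unicode block.
BLOCKS_TXT = """\
3000..303F; CJK Symbols and Punctuation
31C0..31EF; CJK Strokes
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
4DC0..4DFF; Yijing Hexagram Symbols
FE30..FE4F; CJK Compatibility Forms
1F200..1F2FF; Enclosed Ideographic Supplement
"""


def parse_blocks(text):
    """Parse Blocks.txt-style text into (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        lo, hi = span.split("..")
        blocks.append((int(lo, 16), int(hi, 16), name.strip()))
    return blocks


BLOCKS = parse_blocks(BLOCKS_TXT)


def block_name(cp):
    """Return the block name containing code point cp, or None."""
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return None
```

Under this excerpt, U+4DC0 maps to "Yijing Hexagram Symbols" and U+3250 to "Enclosed CJK Letters and Months"; code points outside the seven listed blocks return None.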
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: handa @ 2020-02-29 3:39 UTC
To: ynyaaa; +Cc: 39659
In article <86y2syj43g.fsf@gmail.com>, ynyaaa@gmail.com writes:
>>> From: ynyaaa@gmail.com
>>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>>>
>>> 'han' script is defined in char-script-table as:
>>> 2E80-2FDF han
>>> 3200-9FFF han
>>> F900-FAFF han
>>> FE30-FE4F han
>>> 1F200-1F2FF han
>>> 20000-2A6DF han
>>> 2A700-2EBEF han
>>> 2F800-2FA1F han
>>>
>>> It is better to set values as:
>>> 3200-33FF cjk-misc
>>> 4DC0-4DFF cjk-misc
>>> FE30-FE4F cjk-misc
>>> 1F200-1F2FF cjk-misc
The script names were at first assigned to help fontset.el which sets up
the default fontset by using script names in defining font specs (for
CHARSET_REGISTRY of X fonts or "script" of OpenType fonts). So there
was no precise semantics.
I think it is ok to change/fix char-script-table to improve some
behavior of Emacs without breaking fontset.el.
---
K. Handa
handa@gnu.org
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: Eli Zaretskii @ 2020-02-29 7:34 UTC
To: handa; +Cc: ynyaaa, 39659
> From: handa <handa@gnu.org>
> Cc: eliz@gnu.org, 39659@debbugs.gnu.org
> Date: Sat, 29 Feb 2020 12:39:30 +0900
>
> In article <86y2syj43g.fsf@gmail.com>, ynyaaa@gmail.com writes:
> >>> From: ynyaaa@gmail.com
> >>> Date: Tue, 18 Feb 2020 22:50:57 +0900
> >>>
> >>> 'han' script is defined in char-script-table as:
> >>> 2E80-2FDF han
> >>> 3200-9FFF han
> >>> F900-FAFF han
> >>> FE30-FE4F han
> >>> 1F200-1F2FF han
> >>> 20000-2A6DF han
> >>> 2A700-2EBEF han
> >>> 2F800-2FA1F han
> >>>
> >>> It is better to set values as:
> >>> 3200-33FF cjk-misc
> >>> 4DC0-4DFF cjk-misc
> >>> FE30-FE4F cjk-misc
> >>> 1F200-1F2FF cjk-misc
>
> The script names were at first assigned to help fontset.el which sets up
> the default fontset by using script names in defining font specs (for
> CHARSET_REGISTRY of X fonts or "script" of OpenType fonts). So there
> was no precise semantics.
OK, but would you agree that the latter group of character blocks,
i.e.
3200-33FF
4DC0-4DFF
FE30-FE4F
1F200-1F2FF
should be in the cjk-misc category? Or, to phrase this differently:
why was cjk-misc created in the first place, since the only difference
between it and han in the default fontset seems to be this single
element:
(nil . "JISX0213.2004-1")
which is present for the han script, but absent for cjk-misc. I don't
think I see where the CHARSET_REGISTRY of X or "script" of OpenType
fonts come into play, when distinguishing between han and cjk-misc
is concerned.
> I think it is ok to change/fix char-script-table to improve some
> behavior of Emacs without breaking fontset.el.
Can you elaborate about this? I don't think I understand which fixes
you had in mind, and how they could or could not break fontset.el.
Thanks.
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
From: handa @ 2020-03-08 1:13 UTC
To: Eli Zaretskii; +Cc: ynyaaa, 39659
In article <83sgitetio.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > The script names were at first assigned to help fontset.el which sets up
> > the default fontset by using script names in defining font specs (for
> > CHARSET_REGISTRY of X fonts or "script" of OpenType fonts). So there
> > was no precise semantics.
> OK, but would you agree that the latter group of character blocks,
> i.e.
> 3200-33FF
> 4DC0-4DFF
> FE30-FE4F
> 1F200-1F2FF
> should be in the cjk-misc category? Or, to phrase this differently:
> why was cjk-misc created in the first place,
When I defined them, it was a transition period for the font-related
environment. As far as I remember, cjk-misc was introduced later for
fonts that cover characters used in CJK environments but not yet
covered by the legacy CJK X fonts (JISX0208, JISX0212, GB2312, KSC5601).
> since the only difference between it and han in the default fontset
> seems to be this single element:
> (nil . "JISX0213.2004-1")
> which is present for the han script, but absent for cjk-misc.
The definition of the default fontset has been changed frequently as
the font-related environment changed. Perhaps the current setting
should be reconsidered in light of the current font-related environment.
> > I think it is ok to change/fix char-script-table to improve some
> > behavior of Emacs without breaking fontset.el.
> Can you elaborate about this? I don't think I understand which fixes
> you had in mind, and how they could or could not break fontset.el.
As I don't know the recent trends in the font-related environment, I
cannot suggest how to fix the current setting. All I can say is that
when someone changes char-script-table, they should also check how scripts
are used in the definition of the default fontset.
---
K. Handa
handa@gnu.org