unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
@ 2020-02-18 13:50 ynyaaa
  2020-02-18 16:02 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: ynyaaa @ 2020-02-18 13:50 UTC (permalink / raw)
  To: 39659


'han' script is defined in char-script-table as:
	2E80-2FDF	han
	3200-9FFF	han
	F900-FAFF	han
	FE30-FE4F	han
	1F200-1F2FF	han
	20000-2A6DF	han
	2A700-2EBEF	han
	2F800-2FA1F	han

It is better to set values as:
	3200-33FF	cjk-misc
	4DC0-4DFF	cjk-misc
	FE30-FE4F	cjk-misc
	1F200-1F2FF	cjk-misc

If enclosed CJK Ideographs are to stay in the 'han' script, then by the
same logic enclosed Hangul should be 'hangul' script,
enclosed Katakana should be 'kana' script,
and enclosed Numbers should be 'symbol' script.
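
The effect of the proposed reassignment can be sketched outside Emacs as
a small range table.  This is an illustrative Python model only: the
names build_table and script_of are hypothetical, Emacs itself keeps
this data in a char-table rather than a list, and the sketch assumes
that a later assignment overrides an earlier one where ranges overlap.

```python
# Illustrative model only; Emacs stores this data in a char-table.
# Ranges are copied from the two lists in the report.
HAN_RANGES = [
    (0x2E80, 0x2FDF), (0x3200, 0x9FFF), (0xF900, 0xFAFF), (0xFE30, 0xFE4F),
    (0x1F200, 0x1F2FF), (0x20000, 0x2A6DF), (0x2A700, 0x2EBEF),
    (0x2F800, 0x2FA1F),
]
CJK_MISC_RANGES = [
    (0x3200, 0x33FF), (0x4DC0, 0x4DFF), (0xFE30, 0xFE4F), (0x1F200, 0x1F2FF),
]


def build_table(*layers):
    """Flatten (script, ranges) layers into (first, last, script) entries."""
    entries = []
    for script, ranges in layers:
        for first, last in ranges:
            entries.append((first, last, script))
    return entries


def script_of(table, cp):
    """Return the script of code point cp, or None if no range covers it."""
    result = None
    for first, last, script in table:
        if first <= cp <= last:
            result = script  # keep scanning so the last matching layer wins
    return result


current = build_table(("han", HAN_RANGES))
proposed = build_table(("han", HAN_RANGES), ("cjk-misc", CJK_MISC_RANGES))

for cp in (0x3200, 0x4DC0, 0x4E00, 0xFE30, 0x1F200):
    print(f"U+{cp:04X}: {script_of(current, cp)} -> {script_of(proposed, cp)}")
```

In this model only the four proposed ranges change; ideographs such as
U+4E00 remain 'han'.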


In GNU Emacs 27.0.60 (build 1, x86_64-w64-mingw32)
 of 2019-12-29 built on CIRROCUMULUS
Repository revision: 21c3020fcec0a32122d2680a391864a75393031b
Repository branch: emacs-27
Windowing system distributor 'Microsoft Corp.', version 10.0.18363
System Description: Microsoft Windows 10 Pro (v10.0.1909.18363.657)

Recent messages:

Configured using:
 'configure --without-dbus --host=x86_64-w64-mingw32
 --without-compress-install -C 'CFLAGS=-O2 -static -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY W32NOTIFY ACL GNUTLS LIBXML2
HARFBUZZ ZLIB TOOLKIT_SCROLL_BARS MODULES THREADS PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(rect wid-edit descr-text mule-diag thingatpt cl-extra novice help-fns
radix-tree cl-print debug backtrace find-func gnutls network-stream nsm
mailalias smtpmail auth-source cl-seq eieio eieio-core cl-macs
eieio-loaddefs json map misearch multi-isearch help-mode pp shadow sort
mail-extr term/bobcat emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
japan-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads w32notify w32 lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 921325 302508)
 (symbols 48 58666 0)
 (strings 32 119118 12074)
 (string-bytes 1 2586170)
 (vectors 16 88868)
 (vector-slots 8 2545555 209000)
 (floats 8 47 281)
 (intervals 56 44668 5857)
 (buffers 1000 22))






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-18 13:50 bug#39659: 27.0.60; inappropriate han script definition in char-script-table ynyaaa
@ 2020-02-18 16:02 ` Eli Zaretskii
  2020-02-19  9:53   ` ynyaaa
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2020-02-18 16:02 UTC (permalink / raw)
  To: ynyaaa, Kenichi Handa; +Cc: 39659

> From: ynyaaa@gmail.com
> Date: Tue, 18 Feb 2020 22:50:57 +0900
> 
> 'han' script is defined in char-script-table as:
> 	2E80-2FDF	han
> 	3200-9FFF	han
> 	F900-FAFF	han
> 	FE30-FE4F	han
> 	1F200-1F2FF	han
> 	20000-2A6DF	han
> 	2A700-2EBEF	han
> 	2F800-2FA1F	han
> 
> It is better to set values as:
> 	3200-33FF	cjk-misc
> 	4DC0-4DFF	cjk-misc
> 	FE30-FE4F	cjk-misc
> 	1F200-1F2FF	cjk-misc
> 
> If enclosed CJK Ideographs should be 'han' script,
> enclosed Hanguls should be 'hangul' script,
> enclosed Katakana should be 'kana' script,
> and enclosed Numbers should be 'symbol' script.

Please provide some rationale for the differences.  Just saying
"better" and "should" doesn't explain why you think the changes are
for the better.

CC'ing Handa-san, who I hope will have some comments on this.

Thanks.






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-18 16:02 ` Eli Zaretskii
@ 2020-02-19  9:53   ` ynyaaa
  2020-02-19 15:43     ` Eli Zaretskii
  2020-02-29  3:39     ` handa
  0 siblings, 2 replies; 8+ messages in thread
From: ynyaaa @ 2020-02-19  9:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 39659

Eli Zaretskii <eliz@gnu.org> writes:

>> From: ynyaaa@gmail.com
>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>> 
>> 'han' script is defined in char-script-table as:
>> 	2E80-2FDF	han
>> 	3200-9FFF	han
>> 	F900-FAFF	han
>> 	FE30-FE4F	han
>> 	1F200-1F2FF	han
>> 	20000-2A6DF	han
>> 	2A700-2EBEF	han
>> 	2F800-2FA1F	han
>> 
>> It is better to set values as:
>> 	3200-33FF	cjk-misc
>> 	4DC0-4DFF	cjk-misc
>> 	FE30-FE4F	cjk-misc
>> 	1F200-1F2FF	cjk-misc
>> 
>> If enclosed CJK Ideographs should be 'han' script,
>> enclosed Hanguls should be 'hangul' script,
>> enclosed Katakana should be 'kana' script,
>> and enclosed Numbers should be 'symbol' script.
>
> Please provide some rationale for the differences, just saying
> "better" and "should" doesn't explain why you think the changes are
> for the good.
>
> CC'ing Handa-san, who I hope will have some comments on this.
>
> Thanks.

Because they are not han characters.
I think that combinatorial characters are not han characters,
and that they are symbolic characters.

As for enclosed latin letters, they are treated as 'symbol' script.
	249C-24B5	PARENTHESIZED LATIN SMALL LETTER *
	24B6-24CF	CIRCLED LATIN CAPITAL LETTER *
	24D0-24E9	CIRCLED LATIN SMALL LETTER *
	1F110-1F129	PARENTHESIZED LATIN CAPITAL LETTER *
	1F130-1F149	SQUARED LATIN CAPITAL LETTER *
	1F150-1F169	NEGATIVE CIRCLED LATIN CAPITAL LETTER *
	1F170-1F189	NEGATIVE SQUARED LATIN CAPITAL LETTER *
	1F12A		TORTOISE SHELL BRACKETED LATIN CAPITAL LETTER S
	1F12B		CIRCLED ITALIC LATIN CAPITAL LETTER C
	1F12C		CIRCLED ITALIC LATIN CAPITAL LETTER R
	1F18A		CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTER P
	1F1A5		SQUARED LATIN SMALL LETTER D

If the script is instead set to han, hangul or kana for combinatorial
characters that contain han, hangul or kana characters, the script
values would be as follows:

CodePoint	Script	Comment
3200-321E	hangul	enclosed hangul
321F		-	unassigned
3220-3247	han	enclosed han
3248-324F	symbol	enclosed number
3250		symbol	combined latin
3251-325F	symbol	enclosed number
3260-327E	hangul	enclosed hangul
327F		symbol	symbol
3280-32B0	han	enclosed han
32B1-32BF	symbol	enclosed number
32C0-32CB	han	square character with han
32CC-32CF	symbol	square character with latin
32D0-32FE	kana	enclosed kana
32FF		han	square character with han
3300-3357	kana	square character with kana
3358-3370	han	square character with han
3371-337A	symbol	square character with latin
337B-337F	han	square character with han
3380-33DF	symbol	square character with latin
33E0-33FE	han	square character with han
33FF		symbol	square character with latin

4DC0-4DFF	symbol	symbol

FE30-FE44	symbol	symbol for vertical
FE45-FE46	symbol	symbol
FE47-FE48	symbol	symbol for vertical
FE49-FE4F	symbol	symbol

1F200-1F202	kana	enclosed/square character with kana
...		-	unassigned
1F210-1F212	han	enclosed han
1F213		kana	enclosed kana
1F214-1F248	han	enclosed han
...		-	unassigned
1F250-1F251	han	enclosed han
...		-	unassigned
1F260-1F265	symbol	symbol
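
A table like this is easier to review if it can be checked mechanically.
The sketch below encodes only the 3200-33FF rows and looks up a code
point by binary search.  It is a Python illustration with hypothetical
names (ROWS, check_rows, lookup_script), not how Emacs stores or
queries char-script-table.

```python
from bisect import bisect_right

# (first, last, script) rows copied from the 3200-33FF part of the table.
# U+321F is unassigned and therefore deliberately absent.
ROWS = [
    (0x3200, 0x321E, "hangul"), (0x3220, 0x3247, "han"),
    (0x3248, 0x324F, "symbol"), (0x3250, 0x3250, "symbol"),
    (0x3251, 0x325F, "symbol"), (0x3260, 0x327E, "hangul"),
    (0x327F, 0x327F, "symbol"), (0x3280, 0x32B0, "han"),
    (0x32B1, 0x32BF, "symbol"), (0x32C0, 0x32CB, "han"),
    (0x32CC, 0x32CF, "symbol"), (0x32D0, 0x32FE, "kana"),
    (0x32FF, 0x32FF, "han"), (0x3300, 0x3357, "kana"),
    (0x3358, 0x3370, "han"), (0x3371, 0x337A, "symbol"),
    (0x337B, 0x337F, "han"), (0x3380, 0x33DF, "symbol"),
    (0x33E0, 0x33FE, "han"), (0x33FF, 0x33FF, "symbol"),
]
STARTS = [first for first, _, _ in ROWS]


def check_rows(rows):
    """Return True if the rows are in ascending order and never overlap."""
    return all(a[1] < b[0] for a, b in zip(rows, rows[1:]))


def lookup_script(cp):
    """Return the script assigned to cp, or None if no row covers it."""
    i = bisect_right(STARTS, cp) - 1  # last row starting at or before cp
    if i < 0:
        return None
    _, last, script = ROWS[i]
    return script if cp <= last else None


print(check_rows(ROWS))
for cp in (0x3200, 0x321F, 0x3231, 0x32D0, 0x33FF):
    print(f"U+{cp:04X}: {lookup_script(cp)}")
```

The overlap check catches the kind of slip that is easy to make when
splitting one block into many small ranges by hand.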






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-19  9:53   ` ynyaaa
@ 2020-02-19 15:43     ` Eli Zaretskii
  2020-02-20  6:27       ` ynyaaa
  2020-02-29  3:39     ` handa
  1 sibling, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2020-02-19 15:43 UTC (permalink / raw)
  To: ynyaaa; +Cc: 39659

> From: ynyaaa@gmail.com
> Cc: Kenichi Handa <handa@gnu.org>,  39659@debbugs.gnu.org
> Date: Wed, 19 Feb 2020 18:53:07 +0900
> 
> >> It is better to set values as:
> >> 	3200-33FF	cjk-misc
> >> 	4DC0-4DFF	cjk-misc
> >> 	FE30-FE4F	cjk-misc
> >> 	1F200-1F2FF	cjk-misc
> >> 
> >> If enclosed CJK Ideographs should be 'han' script,
> >> enclosed Hanguls should be 'hangul' script,
> >> enclosed Katakana should be 'kana' script,
> >> and enclosed Numbers should be 'symbol' script.
> >
> > Please provide some rationale for the differences, just saying
> > "better" and "should" doesn't explain why you think the changes are
> > for the good.
> >
> > CC'ing Handa-san, who I hope will have some comments on this.
> >
> > Thanks.
> 
> Because they are not han characters.
> I think that combinatorial characters are not han characters,
> and that they are symbolic characters.

So your interpretation of cjk-misc is that they are symbols, not
letters?  I'm asking because I don't really know what is meant by
"cjk-misc", I don't think we have it documented anywhere.






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-19 15:43     ` Eli Zaretskii
@ 2020-02-20  6:27       ` ynyaaa
  0 siblings, 0 replies; 8+ messages in thread
From: ynyaaa @ 2020-02-20  6:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 39659

Eli Zaretskii <eliz@gnu.org> writes:

>> From: ynyaaa@gmail.com
>> Cc: Kenichi Handa <handa@gnu.org>,  39659@debbugs.gnu.org
>> Date: Wed, 19 Feb 2020 18:53:07 +0900
>> 
>> >> It is better to set values as:
>> >> 	3200-33FF	cjk-misc
>> >> 	4DC0-4DFF	cjk-misc
>> >> 	FE30-FE4F	cjk-misc
>> >> 	1F200-1F2FF	cjk-misc
>> >> 
>> >> If enclosed CJK Ideographs should be 'han' script,
>> >> enclosed Hanguls should be 'hangul' script,
>> >> enclosed Katakana should be 'kana' script,
>> >> and enclosed Numbers should be 'symbol' script.
>> >
>> > Please provide some rationale for the differences, just saying
>> > "better" and "should" doesn't explain why you think the changes are
>> > for the good.
>> >
>> > CC'ing Handa-san, who I hope will have some comments on this.
>> >
>> > Thanks.
>> 
>> Because they are not han characters.
>> I think that combinatorial characters are not han characters,
>> and that they are symbolic characters.
>
> So your interpretation of cjk-misc is that they are symbols, not
> letters?  I'm asking because I don't really know what is meant by
> "cjk-misc", I don't think we have it documented anywhere.

I guess the cjk-misc script means CJK-related characters.
The relevant block names in the Unicode Character Database are listed below.
(https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt)
	3000..303F; CJK Symbols and Punctuation
	31C0..31EF; CJK Strokes
	3200..32FF; Enclosed CJK Letters and Months
	3300..33FF; CJK Compatibility
	4DC0..4DFF; Yijing Hexagram Symbols
	FE30..FE4F; CJK Compatibility Forms
	1F200..1F2FF; Enclosed Ideographic Supplement

Yijing Hexagram Symbols (U+4DC0..U+4DFF) are Chinese symbols related to
	2630-2637	TRIGRAM FOR *
	268A-268B	MONOGRAM FOR *
	268C-268F	DIGRAM FOR *
	1D300-1D35F	Tai Xuan Jing Symbols
The script symbol for "Yijing Hexagram Symbols" may be 'symbol or
'yijing-hexagram-symbol.
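
The block lines quoted above follow the "START..END; Name" layout of
Blocks.txt, so they can be parsed directly when comparing block
boundaries against script ranges.  The following Python sketch is an
illustration only; parse_blocks and block_of are hypothetical helper
names, and the input is just the seven lines quoted in this message,
not the whole file.

```python
import re

# The seven block lines quoted above, in Blocks.txt layout.
BLOCK_LINES = """\
3000..303F; CJK Symbols and Punctuation
31C0..31EF; CJK Strokes
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
4DC0..4DFF; Yijing Hexagram Symbols
FE30..FE4F; CJK Compatibility Forms
1F200..1F2FF; Enclosed Ideographic Supplement
"""

LINE_RE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$")


def parse_blocks(text):
    """Parse 'START..END; Name' lines into (first, last, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop '#' comments and blanks
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            first, last, name = match.groups()
            blocks.append((int(first, 16), int(last, 16), name))
    return blocks


def block_of(blocks, cp):
    """Return the name of the block containing cp, or None."""
    for first, last, name in blocks:
        if first <= cp <= last:
            return name
    return None


BLOCKS = parse_blocks(BLOCK_LINES)
for cp in (0x3200, 0x4DC0, 0xFE30, 0x1F200):
    print(f"U+{cp:04X}: {block_of(BLOCKS, cp)}")
```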






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-19  9:53   ` ynyaaa
  2020-02-19 15:43     ` Eli Zaretskii
@ 2020-02-29  3:39     ` handa
  2020-02-29  7:34       ` Eli Zaretskii
  1 sibling, 1 reply; 8+ messages in thread
From: handa @ 2020-02-29  3:39 UTC (permalink / raw)
  To: ynyaaa; +Cc: 39659

In article <86y2syj43g.fsf@gmail.com>, ynyaaa@gmail.com writes:
>>> From: ynyaaa@gmail.com
>>> Date: Tue, 18 Feb 2020 22:50:57 +0900
>>> 
>>> 'han' script is defined in char-script-table as:
>>> 2E80-2FDF	han
>>> 3200-9FFF	han
>>> F900-FAFF	han
>>> FE30-FE4F	han
>>> 1F200-1F2FF	han
>>> 20000-2A6DF	han
>>> 2A700-2EBEF	han
>>> 2F800-2FA1F	han
>>> 
>>> It is better to set values as:
>>> 3200-33FF	cjk-misc
>>> 4DC0-4DFF	cjk-misc
>>> FE30-FE4F	cjk-misc
>>> 1F200-1F2FF	cjk-misc

The script names were at first assigned to help fontset.el, which sets up
the default fontset by using script names in defining font specs (for
CHARSET_REGISTRY of X fonts or "script" of OpenType fonts).  So there
was no precise semantics.

I think it is ok to change/fix char-script-table to improve some
behavior of Emacs without breaking fontset.el.

---
K. Handa
handa@gnu.org






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-29  3:39     ` handa
@ 2020-02-29  7:34       ` Eli Zaretskii
  2020-03-08  1:13         ` handa
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2020-02-29  7:34 UTC (permalink / raw)
  To: handa; +Cc: ynyaaa, 39659

> From: handa <handa@gnu.org>
> Cc: eliz@gnu.org,  39659@debbugs.gnu.org
> Date: Sat, 29 Feb 2020 12:39:30 +0900
> 
> In article <86y2syj43g.fsf@gmail.com>, ynyaaa@gmail.com writes:
> >>> From: ynyaaa@gmail.com
> >>> Date: Tue, 18 Feb 2020 22:50:57 +0900
> >>> 
> >>> 'han' script is defined in char-script-table as:
> >>> 2E80-2FDF	han
> >>> 3200-9FFF	han
> >>> F900-FAFF	han
> >>> FE30-FE4F	han
> >>> 1F200-1F2FF	han
> >>> 20000-2A6DF	han
> >>> 2A700-2EBEF	han
> >>> 2F800-2FA1F	han
> >>> 
> >>> It is better to set values as:
> >>> 3200-33FF	cjk-misc
> >>> 4DC0-4DFF	cjk-misc
> >>> FE30-FE4F	cjk-misc
> >>> 1F200-1F2FF	cjk-misc
> 
> The script names were at first assigned to help fontset.el, which sets up
> the default fontset by using script names in defining font specs (for
> CHARSET_REGISTRY of X fonts or "script" of OpenType fonts).  So there
> was no precise semantics.

OK, but would you agree that the latter group of character blocks,
i.e.

 3200-33FF
 4DC0-4DFF
 FE30-FE4F
 1F200-1F2FF

should be in the cjk-misc category?  Or, to phrase this differently:
why was cjk-misc created in the first place, since the only difference
between it and han in the default fontset seems to be this single
element:

	  (nil . "JISX0213.2004-1")

which is present for the han script, but absent for cjk-misc.  I don't
think I see where the CHARSET_REGISTRY of X or "script" of OpenType
fonts comes into play where distinguishing between han and cjk-misc
is concerned.

> I think it is ok to change/fix char-script-table to improve some
> behavior of Emacs without breaking fontset.el.

Can you elaborate about this?  I don't think I understand which fixes
you had in mind, and how they could or could not break fontset.el.

Thanks.






* bug#39659: 27.0.60; inappropriate han script definition in char-script-table
  2020-02-29  7:34       ` Eli Zaretskii
@ 2020-03-08  1:13         ` handa
  0 siblings, 0 replies; 8+ messages in thread
From: handa @ 2020-03-08  1:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ynyaaa, 39659

In article <83sgitetio.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > The script names were at first assigned to help fontset.el, which sets up
> > the default fontset by using script names in defining font specs (for
> > CHARSET_REGISTRY of X fonts or "script" of OpenType fonts).  So there
> > was no precise semantics.

> OK, but would you agree that the latter group of character blocks,
> i.e.

>  3200-33FF
>  4DC0-4DFF
>  FE30-FE4F
>  1F200-1F2FF

> should be in the cjk-misc category?  Or, to phrase this differently:
> why was cjk-misc created in the first place,

When I defined them, it was a transition period for the font-related
environment.  As far as I remember, cjk-misc was introduced later for
fonts that cover characters used in CJK environments but not yet
covered by legacy CJK X fonts (JISX0208, JISX0212, GB2312, KSC5601).

> since the only difference between it and han in the default fontset
> seems to be this single element:
> 	  (nil . "JISX0213.2004-1")
> which is present for the han script, but absent for cjk-misc.  

The definition of the default fontset has been changed frequently as
the font-related environment changed.  Perhaps the current setting
should be reconsidered based on the current font-related environment.

> > I think it is ok to change/fix char-script-table to improve some
> > behavior of Emacs without breaking fontset.el.

> Can you elaborate about this?  I don't think I understand which fixes
> you had in mind, and how they could or could not break fontset.el.

As I don't know the recent trends in the font-related environment, I
cannot suggest how to fix the current setting.  All I can say is that
when someone changes char-script-table, they should also check how
scripts are used in the definition of the default fontset.

---
K. Handa
handa@gnu.org





