segmentation fault displaying etc/HELLO on Windows

unofficial mirror of emacs-devel@gnu.org 

* segmentation fault displaying etc/HELLO on Windows
@ 2008-07-29 20:49 Juanma Barranquero
  2008-07-30  6:48 ` Jason Rumney
  0 siblings, 1 reply; 30+ messages in thread
From: Juanma Barranquero @ 2008-07-29 20:49 UTC (permalink / raw)
  To: emacs-devel

Scrolling seems to be quite fast now on Windows (not yet up to 22.X
levels, but much faster than before).

However, C-h H crashes Emacs:

Program received signal SIGSEGV, Segmentation fault.
0x011ea1ff in w32_compute_glyph_string_overhangs (s=0x82eaf0) at w32term.c:1194
1194          font->driver->text_extents (font, code, s->nchars, &metrics);
(gdb) bt
#0  0x011ea1ff in w32_compute_glyph_string_overhangs (s=0x82eaf0) at w32term.c:1194
#1  0x010569a4 in draw_glyphs (w=0x2f4b800, x=444, row=0x2fd8428, area=TEXT_AREA, start=0, end=57, hl=DRAW_NORMAL_TEXT, overlaps=0) at xdisp.c:20359
#2  0x01059339 in x_write_glyphs (start=0x32ba000, len=57) at xdisp.c:21819
#3  0x0115bafb in update_window_line (w=0x2f4b800, vpos=7, mouse_face_overwritten_p=0x82f0ec) at dispnew.c:4453
#4  0x0115c6cc in update_window (w=0x2f4b800, force_p=0) at dispnew.c:4309
#5  0x0115ed86 in update_window_tree (w=0x2f4b800, force_p=0) at dispnew.c:4002
#6  0x01160574 in update_frame (f=0x2f4ba00, force_p=0, inhibit_hairy_id_p=0) at dispnew.c:3930
#7  0x010474b1 in redisplay_internal (preserve_echo_area=<value optimized out>) at xdisp.c:11854
#8  0x0108a279 in read_char (commandflag=1, nmaps=3, maps=0x82fb70, prev_event=47511553, used_mouse_menu=0x82fc34, end_time=0x0) at keyboard.c:2680
#9  0x0108e8ca in read_key_sequence (keybuf=0x82fcd4, bufsize=30, prompt=47511553, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:9406
#10 0x01091a3d in command_loop_1 () at keyboard.c:1646
#11 0x01019016 in internal_condition_case (bfun=0x10917af <command_loop_1>, handlers=47575281, hfun=0x1088a16 <cmd_error>) at eval.c:1511
#12 0x01087d2b in command_loop_2 () at keyboard.c:1362
#13 0x010190c0 in internal_catch (tag=47571353, func=0x1087d08 <command_loop_2>, arg=47511553) at eval.c:1247
#14 0x0108885b in command_loop () at keyboard.c:1341
#15 0x01088baf in recursive_edit_1 () at keyboard.c:950
#16 0x01088d1a in Frecursive_edit () at keyboard.c:1012
#17 0x01002c41 in main (argc=2, argv=0xa92750) at emacs.c:1749

-- 
 Juanma

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-29 20:49 segmentation fault displaying etc/HELLO on Windows Juanma Barranquero
@ 2008-07-30  6:48 ` Jason Rumney
  2008-07-30 11:48   ` Juanma Barranquero
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Rumney @ 2008-07-30  6:48 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

Juanma Barranquero wrote:

> However, C-h H crashes Emacs:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x011ea1ff in w32_compute_glyph_string_overhangs (s=0x82eaf0) at w32term.c:1194
> 1194          font->driver->text_extents (font, code, s->nchars, &metrics);
> (gdb) bt
> #0  0x011ea1ff in w32_compute_glyph_string_overhangs (s=0x82eaf0) at w32term.c:1194

That seems to suggest that s->font is invalid, but s->font_not_found_p
is not set. Can you see which characters are being drawn here
(s->first_glyph->u.ch)? That would let you narrow down which font should
be used (based on earlier versions) and start stepping through the code
earlier (use a conditional breakpoint based on the character being
displayed).

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-30  6:48 ` Jason Rumney
@ 2008-07-30 11:48   ` Juanma Barranquero
  2008-07-30 13:05     ` Jason Rumney
  0 siblings, 1 reply; 30+ messages in thread
From: Juanma Barranquero @ 2008-07-30 11:48 UTC (permalink / raw)
  To: Jason Rumney; +Cc: emacs-devel

On Wed, Jul 30, 2008 at 08:48, Jason Rumney <jasonr@gnu.org> wrote:

>> Program received signal SIGSEGV, Segmentation fault.
>> 0x011ea1ff in w32_compute_glyph_string_overhangs (s=0x82eaf0) at w32term.c:1194
>> 1194          font->driver->text_extents (font, code, s->nchars, &metrics);
>> (gdb) bt
>> #0  0x011ea1ff in w32_compute_glyph_string_overhangs (s=0x82eaf0) at w32term.c:1194

Hmm. I hate heisenbugs.

1) The backtrace above was at home, rebuilding (but not
bootstrapping), with system-configuration-options "--with-gcc (4.3)
--cflags -DENABLE_CHECKING=1 -DSITELOAD_PURESIZE_EXTRA=200000
-IC:/emacs/build/include -fno-crossjumping".

2) However, at work, after bootstrapping and with the same
system-configuration-options, I get this backtrace:

Program received signal SIGSEGV, Segmentation fault.
x_produce_glyphs (it=0x82e08c) at xdisp.c:19561
19561         PREPARE_FACE_FOR_DISPLAY (f, face);
(gdb) bt
#0  x_produce_glyphs (it=0x82e08c) at xdisp.c:19561
#1  0x0103b097 in display_line (it=0x82e08c) at xdisp.c:16662
#2  0x0103fbe6 in try_window (window=49766404, pos={charpos = 1, bytepos = 1}, check_margins=1) at xdisp.c:14073
#3  0x01053f16 in redisplay_window (window=49766404, just_this_one_p=0) at xdisp.c:13691
#4  0x01055fd9 in redisplay_window_0 (window=49766404) at xdisp.c:12276
#5  0x01018d91 in internal_condition_case_1 (bfun=0x1055fb6 <redisplay_window_0>, arg=49766404, handlers=47498501, hfun=0x1023dce <redisplay_window_error>) at eval.c:1559
#6  0x0102db3d in redisplay_windows (window=34) at xdisp.c:12255
#7  0x010474fd in redisplay_internal (preserve_echo_area=<value optimized out>) at xdisp.c:11821
#8  0x0108a279 in read_char (commandflag=1, nmaps=3, maps=0x82fb70, prev_event=47515649, used_mouse_menu=0x82fc34, end_time=0x0) at keyboard.c:2680
#9  0x0108e8ca in read_key_sequence (keybuf=0x82fcd4, bufsize=30, prompt=47515649, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:9406
#10 0x01091a3d in command_loop_1 () at keyboard.c:1646
#11 0x01019016 in internal_condition_case (bfun=0x10917af <command_loop_1>, handlers=47579377, hfun=0x1088a16 <cmd_error>) at eval.c:1511
#12 0x01087d2b in command_loop_2 () at keyboard.c:1362
#13 0x010190c0 in internal_catch (tag=47575449, func=0x1087d08 <command_loop_2>, arg=47515649) at eval.c:1247
#14 0x0108885b in command_loop () at keyboard.c:1341
#15 0x01088baf in recursive_edit_1 () at keyboard.c:950
#16 0x01088d1a in Frecursive_edit () at keyboard.c:1012
#17 0x01002c41 in main (argc=2, argv=0xa841e0) at emacs.c:1749

3) So I thought, let's do a non-optimizing build. This time I
bootstrapped (at work) with system-configuration-options "--with-gcc
(4.3) --no-opt --cflags -DENABLE_CHECKING=1
-DSITELOAD_PURESIZE_EXTRA=200000 -IC:/emacs/build/include". Now Emacs
does *not* crash.

All compilations were done with gcc (GCC) 4.3.0 20080305
(alpha-testing) mingw-20080502

Any pointer about how to debug this?

  Juanma

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-30 11:48   ` Juanma Barranquero
@ 2008-07-30 13:05     ` Jason Rumney
  2008-07-30 13:11       ` Jason Rumney
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Rumney @ 2008-07-30 13:05 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

Juanma Barranquero wrote:

> 3) So I thought, let's do a non-optimizing build. This time I
> bootstrapped (at work) with system-configuration-options "--with-gcc
> (4.3) --no-opt --cflags -DENABLE_CHECKING=1
> -DSITELOAD_PURESIZE_EXTRA=200000 -IC:/emacs/build/include". Now Emacs
> does *not* crash.

That might explain why I don't see any crashes. I suspect the problem
lies in the changes made to uniscribe_encode_char, so I'll review those.
I did resize some buffers a couple of times while writing the code, and
may have ended up with inconsistent sizing.

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-30 13:05     ` Jason Rumney
@ 2008-07-30 13:11       ` Jason Rumney
  2008-07-30 14:03         ` Juanma Barranquero
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Rumney @ 2008-07-30 13:11 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

Jason Rumney wrote:
> Juanma Barranquero wrote:
> 
>> 3) So I thought, let's do a non-optimizing build. This time I
>> bootstrapped (at work) with system-configuration-options "--with-gcc
>> (4.3) --no-opt --cflags -DENABLE_CHECKING=1
>> -DSITELOAD_PURESIZE_EXTRA=200000 -IC:/emacs/build/include". Now Emacs
>> does *not* crash.
> 
> That might explain why I don't see any crashes. I suspect the problem
> lies in the changes made to uniscribe_encode_char, so I'll review those.
> I did resize some buffers a couple of times while writing the code, and
> may have ended up with inconsistent sizing.

Indeed, that was the problem. I reduced the buffer size to 1, since
characters that produce multiple glyphs can't be handled properly by
uniscribe_encode_char, but passed in a size of 20 (a number I'd picked
after I'd come across Indic characters that produce more than the 2
glyphs I'd originally allowed for) to the system function. It might
explain some of the unexplained display corruption I was seeing with
some Indic characters that I had to work around, so I may be able to
simplify that function now.

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-30 13:11       ` Jason Rumney
@ 2008-07-30 14:03         ` Juanma Barranquero
  2008-07-30 14:19           ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Jason Rumney
  2008-07-31  1:49           ` segmentation fault displaying etc/HELLO on Windows Kyle M. Lee
  0 siblings, 2 replies; 30+ messages in thread
From: Juanma Barranquero @ 2008-07-30 14:03 UTC (permalink / raw)
  To: Jason Rumney; +Cc: emacs-devel

On Wed, Jul 30, 2008 at 15:11, Jason Rumney <jasonr@gnu.org> wrote:

> Indeed, that was the problem. I reduced the buffer size to 1, since
> characters that produce multiple glyphs can't be handled properly by
> uniscribe_encode_char, but passed in a size of 20 (a number I'd picked
> after I'd come across Indic characters that produce more than the 2
> glyphs I'd originally allowed for) to the system function.

I can confirm that it doesn't crash anymore.

And the performance improvement is really great; I've been able to
scroll down the whole etc/NEWS buffer with the redisplay keeping the
pace and not a single undesired recenter in sight...

BTW, one of those questions about weird font selection choices by the
font backend:

In etc/HELLO there are two instances of U+2200 (FOR ALL). The newest
release of DejaVu (2.26) added a glyph for that codepoint to DejaVu
Sans Mono, which I use as default font.

Now the weird thing is, the first FOR ALL in etc/HELLO is shown as

        character: ∀ (8704, #o21000, #x2200)
preferred charset: unicode (Unicode (ISO10646))
       code point: 0x2200
           syntax: . 	which means: punctuation
         category: h:Korean j:Japanese
      buffer code: #xE2 #x88 #x80
        file code: ESC #x24 #x42 #x22 #x4F (encoded by coding system iso-2022-7bit-dos)
          display: by this font (glyph code)
    uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x7A2)

while the second one is

        character: ∀ (8704, #o21000, #x2200)
preferred charset: unicode (Unicode (ISO10646))
       code point: 0x2200
           syntax: . 	which means: punctuation
         category: h:Korean j:Japanese
      buffer code: #xE2 #x88 #x80
        file code: ESC #x24 #x42 #x22 #x4F (encoded by coding system iso-2022-7bit-dos)
          display: by this font (glyph code)
    uniscribe:-outline-MS Mincho-normal-normal-normal-mono-13-*-*-*-c-*-jisx0208*-* (#x421)

Shouldn't it use DejaVu Sans Mono for both?

  Juanma

* Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows)
  2008-07-30 14:03         ` Juanma Barranquero
@ 2008-07-30 14:19           ` Jason Rumney
  2008-07-30 15:03             ` Choice of fonts displaying etc/HELLO Jason Rumney
  2008-08-01 12:56             ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Kenichi Handa
  2008-07-31  1:49           ` segmentation fault displaying etc/HELLO on Windows Kyle M. Lee
  1 sibling, 2 replies; 30+ messages in thread
From: Jason Rumney @ 2008-07-30 14:19 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

Juanma Barranquero wrote:

> In etc/HELLO there are two instances of U+2200 (FOR ALL). The newest
> release of DejaVu (2.26) added a glyph for that codepoint to DejaVu
> Sans Mono, which I use as default font.
> 
> Now the weird thing is, the first FOR ALL in etc/HELLO is shown as
> 
>         character: ∀ (8704, #o21000, #x2200)
> preferred charset: unicode (Unicode (ISO10646))
>        code point: 0x2200
>            syntax: . 	which means: punctuation
>          category: h:Korean j:Japanese
>       buffer code: #xE2 #x88 #x80
>         file code: ESC #x24 #x42 #x22 #x4F (encoded by coding system iso-2022-7bit-dos)
>           display: by this font (glyph code)
>     uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x7A2)
> 
> while the second one is
> 
>         character: ∀ (8704, #o21000, #x2200)
> preferred charset: unicode (Unicode (ISO10646))
>        code point: 0x2200
>            syntax: . 	which means: punctuation
>          category: h:Korean j:Japanese
>       buffer code: #xE2 #x88 #x80
>         file code: ESC #x24 #x42 #x22 #x4F (encoded by coding system iso-2022-7bit-dos)
>           display: by this font (glyph code)
>     uniscribe:-outline-MS Mincho-normal-normal-normal-mono-13-*-*-*-c-*-jisx0208*-* (#x421)
> 
> Shouldn't it use DejaVu Sans Mono for both?

I have no idea why these use different fonts (even the file code is the
same, so it is not a difference in the iso-2022 codepoints chosen). On my
installation both use MS Mincho (I probably have an older version of
DejaVu Mono that does not support that character). And why does that
character have a category of h:Korean j:Japanese?

Handa-san, can you explain this?

* Re: Choice of fonts displaying etc/HELLO
  2008-07-30 14:19           ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Jason Rumney
@ 2008-07-30 15:03             ` Jason Rumney
  2008-07-30 15:26               ` Juanma Barranquero
  2008-08-01 12:56             ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Kenichi Handa
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Rumney @ 2008-07-30 15:03 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel

Jason Rumney wrote:
> Juanma Barranquero wrote:
> 
>> In etc/HELLO there are two instances of U+2200 (FOR ALL). The newest
>> release of DejaVu (2.26) added a glyph for that codepoint to DejaVu
>> Sans Mono, which I use as default font.
>>
>> Now the weird thing is, the first FOR ALL in etc/HELLO is shown as
>>
>>         character: ∀ (8704, #o21000, #x2200)
>> preferred charset: unicode (Unicode (ISO10646))
>>        code point: 0x2200
>>            syntax: . 	which means: punctuation
>>          category: h:Korean j:Japanese
>>       buffer code: #xE2 #x88 #x80
>>         file code: ESC #x24 #x42 #x22 #x4F (encoded by coding system iso-2022-7bit-dos)
>>           display: by this font (glyph code)
>>     uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x7A2)
>>
>> while the second one is
>>
>>         character: ∀ (8704, #o21000, #x2200)
>> preferred charset: unicode (Unicode (ISO10646))
>>        code point: 0x2200
>>            syntax: . 	which means: punctuation
>>          category: h:Korean j:Japanese
>>       buffer code: #xE2 #x88 #x80
>>         file code: ESC #x24 #x42 #x22 #x4F (encoded by coding system iso-2022-7bit-dos)
>>           display: by this font (glyph code)
>>     uniscribe:-outline-MS Mincho-normal-normal-normal-mono-13-*-*-*-c-*-jisx0208*-* (#x421)
>>
>> Shouldn't it use DejaVu Sans Mono for both?
> 
> I have no idea why these use different fonts (even the file code is the
> same, so it is not a difference in the iso-2022 codepoints chosen). On my
> installation both use MS Mincho (I probably have an older version of
> DejaVu Mono that does not support that character). And why does that
> character have a category of h:Korean j:Japanese?

This is strange. After upgrading my DejaVu fonts to the latest version,
both for-all signs are displayed using DejaVu Sans Mono. But I do now
notice that the first has:

There are text properties here:
  auto-composed        t
  charset              mule-unicode-0100-24ff

while the second says:

  auto-composed        t
  charset              japanese-jisx0208

So the file code is probably different, despite what the
"file code" line above reports.

* Re: Choice of fonts displaying etc/HELLO
  2008-07-30 15:03             ` Choice of fonts displaying etc/HELLO Jason Rumney
@ 2008-07-30 15:26               ` Juanma Barranquero
  2008-08-01 12:50                 ` Kenichi Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Juanma Barranquero @ 2008-07-30 15:26 UTC (permalink / raw)
  To: Jason Rumney; +Cc: emacs-devel

On Wed, Jul 30, 2008 at 17:03, Jason Rumney <jasonr@gnu.org> wrote:

> So the file code is probably different, despite what the
> "file code" line above reports.

^[$,1x    => ESC #x24 #x2c #x31 #x78 #x20

vs.

^[$B"O    => ESC #x24 #x42 #x22 #x4F

apparently. So the last one is correctly reported.

  Juanma

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-30 14:03         ` Juanma Barranquero
  2008-07-30 14:19           ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Jason Rumney
@ 2008-07-31  1:49           ` Kyle M. Lee
  2008-07-31  2:03             ` Juanma Barranquero
  1 sibling, 1 reply; 30+ messages in thread
From: Kyle M. Lee @ 2008-07-31  1:49 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel, Jason Rumney

Juanma Barranquero wrote:
> On Wed, Jul 30, 2008 at 15:11, Jason Rumney <jasonr@gnu.org> wrote:
> 
>> Indeed, that was the problem. I reduced the buffer size to 1, since
>> characters that produce multiple glyphs can't be handled properly by
>> uniscribe_encode_char, but passed in a size of 20 (a number I'd picked
>> after I'd come across Indic characters that produce more than the 2
>> glyphs I'd originally allowed for) to the system function.
> 
> I can confirm that it doesn't crash anymore.
> 

Does that mean I have to compile CVS Emacs with --no-opt?

* Re: segmentation fault displaying etc/HELLO on Windows
  2008-07-31  1:49           ` segmentation fault displaying etc/HELLO on Windows Kyle M. Lee
@ 2008-07-31  2:03             ` Juanma Barranquero
  0 siblings, 0 replies; 30+ messages in thread
From: Juanma Barranquero @ 2008-07-31  2:03 UTC (permalink / raw)
  To: Kyle M. Lee; +Cc: emacs-devel, Jason Rumney

> Does that mean I have to compile CVS Emacs with --no-opt?

No.

   Juanma

* Re: Choice of fonts displaying etc/HELLO
  2008-07-30 15:26               ` Juanma Barranquero
@ 2008-08-01 12:50                 ` Kenichi Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Kenichi Handa @ 2008-08-01 12:50 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: emacs-devel, jasonr

In article <f7ccd24b0807300826w445a5710qd23eca35fe5711fe@mail.gmail.com>, "Juanma Barranquero" <lekktu@gmail.com> writes:

> On Wed, Jul 30, 2008 at 17:03, Jason Rumney <jasonr@gnu.org> wrote:
> > So the file code is probably different, despite what the
> > "file code" line above reports.

> ^[$,1x    => ESC #x24 #x2c #x31 #x78 #x20

> vs.

> ^[$B"O    => ESC #x24 #x42 #x22 #x4F

> apparently. So the last one is correctly reported.

Ah, this is a bug in describe-char.  It didn't check the
`charset' text property when encoding.

I've just installed this change.

2008-08-01  Kenichi Handa  <handa@m17n.org>

	* descr-text.el (describe-char-display): Call encode-coding-char
	with the arg CHARSET.
	(describe-char): Pay attention to the text-property `charset'.

	* international/mule-cmds.el (encode-coding-char): New optional
	arg CHARSET.

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows)
  2008-07-30 14:19           ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Jason Rumney
  2008-07-30 15:03             ` Choice of fonts displaying etc/HELLO Jason Rumney
@ 2008-08-01 12:56             ` Kenichi Handa
  2008-08-01 13:17               ` Choice of fonts displaying etc/HELLO Jason Rumney
  1 sibling, 1 reply; 30+ messages in thread
From: Kenichi Handa @ 2008-08-01 12:56 UTC (permalink / raw)
  To: Jason Rumney; +Cc: lekktu, emacs-devel

In article <48907856.6040308@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> I have no idea why these use different fonts (even the file code is the
> same, so it is not a difference in the iso-2022 codepoints chosen). On my
> installation both use MS Mincho (I probably have an older version of
> DejaVu Mono that does not support that character).

As you already know, this is because the file designates
different charsets for those two characters, so Emacs adds
different `charset' text properties to them, which affects
font selection.

> And why does that character have a category of h:Korean
> j:Japanese?

That character is indeed included in the Korean charset ksc5601
and the Japanese charset jisx0208, and all such characters
belong to categories "h" and "j" (set in characters.el).

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: Choice of fonts displaying etc/HELLO
  2008-08-01 12:56             ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Kenichi Handa
@ 2008-08-01 13:17               ` Jason Rumney
  2008-08-01 13:51                 ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Rumney @ 2008-08-01 13:17 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, emacs-devel

Kenichi Handa wrote:
> In article <48907856.6040308@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

>> And why does that character have a category of h:Korean
>> j:Japanese?
> 
> That character is indeed included in the Korean charset ksc5601
> and the Japanese charset jisx0208, and all such characters
> belong to categories "h" and "j" (set in characters.el).

But is that the right way to categorize them? This character (and others
around it) are not Japanese or Korean characters. They just happen to be
included in those encodings. The same goes for Cyrillic and Greek
characters.

* Re: Choice of fonts displaying etc/HELLO
  2008-08-01 13:17               ` Choice of fonts displaying etc/HELLO Jason Rumney
@ 2008-08-01 13:51                 ` Eli Zaretskii
  2008-08-05  7:33                   ` Kenichi Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2008-08-01 13:51 UTC (permalink / raw)
  To: Jason Rumney; +Cc: lekktu, emacs-devel, handa

> Date: Fri, 01 Aug 2008 14:17:24 +0100
> From: Jason Rumney <jasonr@gnu.org>
> Cc: lekktu@gmail.com, emacs-devel@gnu.org
> 
> Kenichi Handa wrote:
> > In article <48907856.6040308@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:
> 
> >> And why does that character have a category of h:Korean
> >> j:Japanese?
> > 
> > That character is indeed included in the Korean charset ksc5601
> > and the Japanese charset jisx0208, and all such characters
> > belong to categories "h" and "j" (set in characters.el).
> 
> But is that the right way to categorize them? This character (and others
> around it) are not Japanese or Korean characters. They just happen to be
> included in those encodings. The same goes for Cyrillic and Greek
> characters.

I agree with Jason.  I reported a similar curiosity the moment the
unicode-2 branch was merged with the trunk.

* Re: Choice of fonts displaying etc/HELLO
  2008-08-01 13:51                 ` Eli Zaretskii
@ 2008-08-05  7:33                   ` Kenichi Handa
  2008-08-05 18:12                     ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Kenichi Handa @ 2008-08-05  7:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, emacs-devel, jasonr

In article <uvdykn8rl.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > But is that the right way to categorize them? This character (and others
> > around it) are not Japanese or Korean characters. They just happen to be
> > included in those encodings. The same goes for Cyrillic and Greek
> > characters.

> I agree with Jason.  I reported a similar curiosity the moment the
> unicode-2 branch was merged with the trunk.

We haven't defined the semantics of "category" clearly.  We can
think of the category name "Japanese" as meaning "characters
belonging to one of the Japanese character sets".  Then that
character certainly has that category.  And in Emacs 22, that
kind of definition was certainly useful.

Although I agree that such a definition is not
appropriate now, I have no good idea how to improve it
without breaking backward compatibility.

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: Choice of fonts displaying etc/HELLO
  2008-08-05  7:33                   ` Kenichi Handa
@ 2008-08-05 18:12                     ` Eli Zaretskii
  2008-08-06  5:30                       ` Kenichi Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2008-08-05 18:12 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> CC: jasonr@gnu.org, lekktu@gmail.com, emacs-devel@gnu.org
> Date: Tue, 05 Aug 2008 16:33:06 +0900
> 
> In article <uvdykn8rl.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > But is that the right way to categorize them? This character (and others
> > > around it) are not Japanese or Korean characters. They just happen to be
> > > included in those encodings. The same goes for Cyrillic and Greek
> > > characters.
> 
> > I agree with Jason.  I reported a similar curiosity the moment the
> > unicode-2 branch was merged with the trunk.
> 
> We haven't defined the semantics of "category" clearly.  We can
> think of the category name "Japanese" as meaning "characters
> belonging to one of the Japanese character sets".  Then that
> character certainly has that category.  And in Emacs 22, that
> kind of definition was certainly useful.
> 
> Although I agree that such a definition is not
> appropriate now, I have no good idea how to improve it
> without breaking backward compatibility.

I think the only meaningful categories are those defined by Unicode.
That is, a block of Cyrillic characters should have the cyrillic
category, the block of Japanese characters should have the japanese
category, etc.  The fact that the character is covered by some
ISO-2022 charset is not interesting in Emacs 23, unless I'm missing
something.

What kind of backward compatibility problems could we cause?  Who or
what code depends on these categories?

* Re: Choice of fonts displaying etc/HELLO
  2008-08-05 18:12                     ` Eli Zaretskii
@ 2008-08-06  5:30                       ` Kenichi Handa
  2008-08-06  6:14                         ` Stephen J. Turnbull
  2008-08-06 17:56                         ` Eli Zaretskii
  0 siblings, 2 replies; 30+ messages in thread
From: Kenichi Handa @ 2008-08-06  5:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, jasonr, emacs-devel

In article <uzlnrjpp5.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> I think the only meaningful categories are those defined by Unicode.
> That is, a block of Cyrillic characters should have the cyrillic
> category, the block of Japanese characters should have the japanese
> category, etc.

Unfortunately, there's no such block as "Japanese" in
Unicode.

> The fact that the character is covered by some
> ISO-2022 charset is not interesting in Emacs 23, unless I'm missing
> something.

> What kind of backward compatibility problems could we cause?  Who or
> what code depends on these categories?

(re-search-forward "\\cj") can effectively search for
characters belonging to a Japanese charset.  It is used, for
instance, in japanese-hankaku-region (in japan-util.el), and
"\\cc" is used in encode-hz-region (in china-util.el).

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: Choice of fonts displaying etc/HELLO
  2008-08-06  5:30                       ` Kenichi Handa
@ 2008-08-06  6:14                         ` Stephen J. Turnbull
  2008-08-06  6:29                           ` Kenichi Handa
  2008-08-06 17:56                         ` Eli Zaretskii
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen J. Turnbull @ 2008-08-06  6:14 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, Eli Zaretskii, emacs-devel, jasonr

Kenichi Handa writes:

 > Unfortunately, there's no such block as "Japanese" in
 > Unicode.

Unicode blocks are not really based on language, although the
correlation is pretty high.

Emacs could adapt fontconfig's "orthography" mechanism instead.

* Re: Choice of fonts displaying etc/HELLO
  2008-08-06  6:14                         ` Stephen J. Turnbull
@ 2008-08-06  6:29                           ` Kenichi Handa
  2008-08-06 15:52                             ` Stephen J. Turnbull
  0 siblings, 1 reply; 30+ messages in thread
From: Kenichi Handa @ 2008-08-06  6:29 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: lekktu, eliz, emacs-devel, jasonr

In article <874p5ywtz4.fsf@uwakimon.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Kenichi Handa writes:
> > Unfortunately, there's no such block as "Japanese" in
> > Unicode.

> Unicode blocks are not really based on language, although the
> correlation is pretty high.

Yes.

> Emacs could adapt fontconfig's "orthography" mechanism instead.

But, for selecting fonts for symbols (the current case is
U+2200 [FOR ALL]), such a mechanism doesn't work.

In addition, currently, Emacs doesn't know in which language
a text is written.  So, we can't use an appropriate ":lang"
property of fontconfig.

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: Choice of fonts displaying etc/HELLO
  2008-08-06  6:29                           ` Kenichi Handa
@ 2008-08-06 15:52                             ` Stephen J. Turnbull
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen J. Turnbull @ 2008-08-06 15:52 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, eliz, jasonr, emacs-devel

Kenichi Handa writes:
 > In article <874p5ywtz4.fsf@uwakimon.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes:

 > > Emacs could adapt fontconfig's "orthography" mechanism instead.
 > 
 > But, for selecting fonts for symbols (the current case is
 > U+2200 [FOR ALL]), such a mechanism doesn't work.

Of course.  Here's how I think about it.

Historically, East Asian coded character sets have tended to try to be
UCSes, including everything that might be needed, since that was
possible in an MBCS.  Trying to implement single-octet codes with code
page switching made no sense.  That approach is unintuitive to
Westerners who are used to having separate "code pages" or fonts for
specialty usage like mathematics, and it has its practical limits for
East Asians, too, what with the addition of 11,000 pre-composed Hangul
and the CNS with ~80,000 code points.  Of course, now we have a true
UCS (even though it has some problems) in Unicode, so we should use
it.  And now FOR ALL should not be considered a "Japanese" character
even if it does have a code point in some JIS standard.  I would say
the same for GREEK SMALL LETTER ALPHA.

It's also very annoying in practical use (in Emacs, anyway) that GREEK
SMALL LETTER ALPHA (not to mention LATIN SMALL LETTER A) has multiple
encodings in the "native" coded character set and several others.

Of course this can be useful if you happen not to have a math font but
do have a Greek font or Japanese font that contains ALPHA or FOR ALL.
For those purposes, fontconfig's character set feature is exactly what
you want.

 > In addition, currently, Emacs doesn't know in which language
 > a text is written.  So, we can't use an appropriate ":lang"
 > property of fontconfig.

Well, for most users almost all of the time we do.  The LANG
environment variable will tell us.  This will make most users very
happy at little cost in coding.
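
As a rough illustration of the LANG idea, here is a minimal Python sketch (not Emacs code) that reduces a POSIX locale name of the usual form `language[_territory][.codeset][@modifier]`, such as `ja_JP.UTF-8`, to a bare language tag like the `ja` that fontconfig's `:lang` property accepts. The function name `lang_from_locale` is hypothetical, and returning only the language part is a simplification: fontconfig also has territory-qualified tags such as `zh-tw`.

```python
import os
from typing import Optional


def lang_from_locale(value: Optional[str]) -> Optional[str]:
    """Return the language part of a POSIX locale name, e.g. "ja_JP.UTF-8" -> "ja".

    Returns None for an unset value or for the "C"/"POSIX" locales,
    which say nothing about the language of the user's text.
    """
    if not value:
        return None
    # Drop the modifier ("@euro") first, then the codeset (".UTF-8").
    name = value.split("@", 1)[0].split(".", 1)[0]
    if name in ("C", "POSIX"):
        return None
    # Keep only the language, dropping the territory ("_JP").
    return name.split("_", 1)[0].lower() or None


if __name__ == "__main__":
    # Hypothetical use: derive a default :lang value from the environment.
    print(lang_from_locale(os.environ.get("LANG")))
```

POSIX lets `LC_ALL` and the `LC_*` variables override `LANG`, so a fuller version would consult those first.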

Agreed, in multilingual use, we can't use fontconfig directly.  My
idea is that fontconfig has already constructed a database of language
repertoires and operations which might help in doing analysis of a
text to determine its language.  Also, instead of using the UCS-like
repertoires of East Asian scripts to determine character categories, I
suggest the fontconfig repertoires are more appropriate and will lead
to more attractive presentation for users who have appropriate fonts.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-06  5:30                       ` Kenichi Handa
  2008-08-06  6:14                         ` Stephen J. Turnbull
@ 2008-08-06 17:56                         ` Eli Zaretskii
  2008-08-07  1:14                           ` Kenichi Handa
  1 sibling, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2008-08-06 17:56 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, jasonr, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> CC: lekktu@gmail.com, emacs-devel@gnu.org, jasonr@gnu.org
> Date: Wed, 06 Aug 2008 14:30:15 +0900
> 
> In article <uzlnrjpp5.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I think the only meaningful categories are those defined by Unicode.
> > That is, a block of Cyrillic characters should have the cyrillic
> > category, the block of Japanese characters should have japanese
> > category, etc.
> 
> Unfortunately, there's no such block as "Japanese" in
> Unicode.

Sorry, I meant Katakana and Hiragana.

> > What kind of backward compatibility problems could we cause?  Who or
> > what code depends on these categories?
> 
> (re-search-forward "\\cj") can effectively search for
> characters belonging to Japanese charset.  It is used, for
> instance, in japanese-hankaku-region (in japan-util.el), and
> "\\cc" is used in encode-hz-region (in china-util.el).

Well, would it still work if "\\cj" matches Katakana and Hiragana
characters?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-06 17:56                         ` Eli Zaretskii
@ 2008-08-07  1:14                           ` Kenichi Handa
  2008-08-07  3:22                             ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Kenichi Handa @ 2008-08-07  1:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, jasonr, emacs-devel

In article <usktijact.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > (re-search-forward "\\cj") can effectively search for
> > characters belonging to Japanese charset.  It is used, for
> > instance, in japanese-hankaku-region (in japan-util.el), and
> > "\\cc" is used in encode-hz-region (in china-util.el).

> Well, would it still work if "\\cj" matches Katakana and Hiragana
> characters?

??? It matched in Emacs 22 and still matches in Emacs 23.
So, it still works, of course.

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-07  1:14                           ` Kenichi Handa
@ 2008-08-07  3:22                             ` Eli Zaretskii
  2008-08-07  3:54                               ` Kenichi Handa
  2008-08-07  4:54                               ` Miles Bader
  0 siblings, 2 replies; 30+ messages in thread
From: Eli Zaretskii @ 2008-08-07  3:22 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, jasonr, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> CC: lekktu@gmail.com, emacs-devel@gnu.org, jasonr@gnu.org
> Date: Thu, 07 Aug 2008 10:14:53 +0900
> 
> In article <usktijact.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > (re-search-forward "\\cj") can effectively search for
> > > characters belonging to Japanese charset.  It is used, for
> > > instance, in japanese-hankaku-region (in japan-util.el), and
> > > "\\cc" is used in encode-hz-region (in china-util.el).
> 
> > Well, would it still work if "\\cj" matches Katakana and Hiragana
> > characters?
> 
> ??? It matched in Emacs 22 and still matches in Emacs 23.
> So, it still works, of course.

I meant would it break something if "\\cj" matched only the Katakana
and Hiragana characters instead of what it matches today?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-07  3:22                             ` Eli Zaretskii
@ 2008-08-07  3:54                               ` Kenichi Handa
  2008-08-07  4:54                               ` Miles Bader
  1 sibling, 0 replies; 30+ messages in thread
From: Kenichi Handa @ 2008-08-07  3:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, jasonr, emacs-devel

In article <uiqudjyqg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > ??? It matched in Emacs 22 and still matches in Emacs 23.
> > So, it still works, of course.

> I meant would it break something if "\\cj" matched only the Katakana
> and Hiragana characters instead of what it matches today?

Ah.  At least "\\cj" should also match the FULLWIDTH
versions of Latin and symbol characters.  And there may well be
third-party code that expects it to match Kanji
characters too.

By the way, FYI, for Hiragana and Katakana, we already have
categories H and K.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-07  3:22                             ` Eli Zaretskii
  2008-08-07  3:54                               ` Kenichi Handa
@ 2008-08-07  4:54                               ` Miles Bader
  2008-08-07 18:03                                 ` Eli Zaretskii
  1 sibling, 1 reply; 30+ messages in thread
From: Miles Bader @ 2008-08-07  4:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, jasonr, emacs-devel, Kenichi Handa

Eli Zaretskii <eliz@gnu.org> writes:
> I meant would it break something if "\\cj" matched only the Katakana
> and Hiragana characters instead of what it matches today?

I don't know what it would break, but that doesn't seem like
particularly intuitive behavior.

I think emacs' concept of characters belonging to multiple language
categories is pretty neat actually.

-miles

-- 
People who are more than casually interested in computers should have at
least some idea of what the underlying hardware is like.  Otherwise the
programs they write will be pretty weird.  -- Donald Knuth




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-07  4:54                               ` Miles Bader
@ 2008-08-07 18:03                                 ` Eli Zaretskii
  2008-08-07 19:30                                   ` Stephen J. Turnbull
  2008-08-11  8:48                                   ` Miles Bader
  0 siblings, 2 replies; 30+ messages in thread
From: Eli Zaretskii @ 2008-08-07 18:03 UTC (permalink / raw)
  To: Miles Bader; +Cc: lekktu, jasonr, emacs-devel, handa

> From: Miles Bader <miles.bader@necel.com>
> Cc: Kenichi Handa <handa@m17n.org>, lekktu@gmail.com, jasonr@gnu.org,
>         emacs-devel@gnu.org
> Date: Thu, 07 Aug 2008 13:54:55 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> > I meant would it break something if "\\cj" matched only the Katakana
> > and Hiragana characters instead of what it matches today?
> 
> I don't know what it would break, but that doesn't seem like
> particularly intuitive behavior.

??? Why not?

> I think emacs' concept of characters belonging to multiple language
> categories is pretty neat actually.

Maybe I'm missing something, but I don't see how the fact that, say,
Cyrillic characters are claimed to belong to Japanese category could
be considered ``neat''.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-07 18:03                                 ` Eli Zaretskii
@ 2008-08-07 19:30                                   ` Stephen J. Turnbull
  2008-08-11  8:48                                   ` Miles Bader
  1 sibling, 0 replies; 30+ messages in thread
From: Stephen J. Turnbull @ 2008-08-07 19:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, jasonr, emacs-devel, handa, Miles Bader

Eli Zaretskii writes:
 > > From: Miles Bader <miles.bader@necel.com>
 > > Eli Zaretskii <eliz@gnu.org> writes:
 > > > I meant would it break something if "\\cj" matched only the Katakana
 > > > and Hiragana characters instead of what it matches today?
 > > 
 > > I don't know what it would break, but that doesn't seem like
 > > particularly intuitive behavior.
 > 
 > ??? Why not?

Because although Katakana and Hiragana are the only uniquely Japanese
word constituents, the written form of the Japanese language also uses
a set of ideographs (Kanji) borrowed from Chinese, as well as an
idiosyncratic set of symbols (eg, precomposed Roman numerals,
precomposed multiletter units such as "mm" and "kg").  Since the
admissible set of ideographs is defined by Ministry of Education
standards, the Japanese *set* of Kanji is not the same as the Chinese
*set*, and therefore needs a category of its own.  So the Japanese
category should include, at least, Hiragana, Katakana, (Japanese)
Kanji, and the idiosyncratic symbol set.
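
To make the two readings of "Japanese" concrete, here is a rough Python sketch contrasting a kana-only test with a broader test built from Unicode blocks. The block ranges are Unicode's; treating their union as "Japanese" is an assumption made only for illustration. As noted above, the admissible Japanese Kanji set is narrower than the Unicode ideograph block, so the broad test over-matches. The function names are hypothetical and this is not how Emacs defines \cj.

```python
# Unicode block ranges (inclusive) that roughly cover written Japanese.
# Block-based approximation: CJK Unified Ideographs also contains many
# ideographs that are not in the Japanese standard sets.
JAPANESE_BLOCKS = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3300, 0x33FF),  # CJK Compatibility (squared units such as ㎜, ㎏)
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (Kanji, shared with Hanzi/Hanja)
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms (Ａ, ｱ, ...)
)


def looks_japanese(ch: str) -> bool:
    """Broad, block-based stand-in for a "Japanese" category."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in JAPANESE_BLOCKS)


def only_kana(ch: str) -> bool:
    """The narrower Hiragana/Katakana-only reading."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x30FF
```

With these definitions `only_kana("漢")` is False while `looks_japanese("漢")` is True, which is the gap between the two proposals.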

 > > I think emacs' concept of characters belonging to multiple language
 > > categories is pretty neat actually.
 > 
 > Maybe I'm missing something, but I don't see how the fact that, say,
 > Cyrillic characters are claimed to belong to Japanese category could
 > be considered ``neat''.

It's not considered "neat" that Cyrillic is (in old Mule) considered
to be Japanese, at least not by me.  However, I do think it's useful,
at least, that the Hanzi (several varieties of Chinese) overlap the
Kanji (Japanese versions of same) and Hanja (Korean version).
Similarly for the accented characters that are used by Spanish and
French alike (although they don't use the same set, there is some
overlap), etc, etc.  I suppose that's what Miles meant?

Now, that inclusion of Cyrillic in Japanese is due to the fact that
with a character set size of nearly 10,000 and an official list of
about 6000 characters needed for daily use, the Japanese decided that
a more or less universal character set would be a good idea so they
added Cyrillic, Greek, and a number of math symbols, as well as a
bunch of other scripts and "stuff".  In the old Mule encoding I
suppose the \cX categories were implemented basically by looking at
the leading byte, and so if Cyrillic were encoded according to the JIS
standard it would get included in \cj; if it were encoded according to
ISO 8859/5, it would not be included in \cj.  (That's true for XEmacs,
Handa-san is of course authoritative for Emacs.)
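
The charset-driven behavior guessed at above can be pictured with a toy model: if a character's categories come from its charset (that is, its leading byte) rather than from the character itself, the same Cyrillic letter is in \cj when it arrives as JIS X 0208 and not when it arrives as ISO 8859-5. This is a hypothetical Python sketch, not Mule's actual implementation; the category table is invented for illustration, while the two code values are the real positions of CYRILLIC CAPITAL LETTER ZHE in each standard.

```python
# Hypothetical model of charset-driven categories: a character is identified
# by (charset, code) and simply inherits the categories of its charset.
# Charset names follow Emacs's naming; the category table is illustrative.
CHARSET_CATEGORIES = {
    "ascii": {"a", "l"},
    "cyrillic-iso8859-5": {"y"},
    "japanese-jisx0208": {"j"},
}


def categories(charset: str, code: int) -> set:
    """Return the categories of (charset, code); the code is ignored."""
    # Only the charset matters in this model, which is exactly why the
    # same letter ends up in different categories depending on encoding.
    return set(CHARSET_CATEGORIES.get(charset, ()))


# CYRILLIC CAPITAL LETTER ZHE in two charsets.
zhe_jis = ("japanese-jisx0208", 0x2728)  # row 7, cell 8 of JIS X 0208
zhe_iso = ("cyrillic-iso8859-5", 0xB6)   # ISO 8859-5
```

Here `categories(*zhe_jis)` gives `{"j"}` and `categories(*zhe_iso)` gives `{"y"}`, although both denote the same letter.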

While I think it is worth the pain to clean up this inelegant
inclusion of Greek, Cyrillic, etc in Japanese (among other things,
"native" fonts can be used instead of typically ugly fonts designed by
foreigners), it probably will break user applications.  Eg, I can
imagine an MUA that does things like check for \([[:ascii:]]\|\cj\)*
to see if a message could be encoded in MIME charset ISO-2022-JP.  (I
don't know if any of the mainstream MUAs do that, though.)
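
Outside Emacs, the same question ("can this message go out as ISO-2022-JP?") can be asked of an encoder directly instead of through a category regexp. A minimal Python sketch using the standard `iso2022_jp` codec; the function name is hypothetical and no particular MUA is implied. Because JIS X 0208 includes Cyrillic and Greek, text in those scripts passes this test, which is exactly the behavior a change to \cj could disturb.

```python
def fits_iso2022_jp(text: str) -> bool:
    """Return True if text can be encoded as ISO-2022-JP (RFC 1468).

    The repertoire is ASCII, JIS X 0201 Roman and JIS X 0208, so
    Cyrillic and Greek pass while accented Latin and Hangul do not.
    """
    try:
        text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True
```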

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-07 18:03                                 ` Eli Zaretskii
  2008-08-07 19:30                                   ` Stephen J. Turnbull
@ 2008-08-11  8:48                                   ` Miles Bader
  2008-08-11 19:03                                     ` Eli Zaretskii
  1 sibling, 1 reply; 30+ messages in thread
From: Miles Bader @ 2008-08-11  8:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, emacs-devel, handa, jasonr

Eli Zaretskii <eliz@gnu.org> writes:
>> > I meant would it break something if "\\cj" matched only the Katakana
>> > and Hiragana characters instead of what it matches today?
>> 
>> I don't know what it would break, but that doesn't seem like
>> particularly intuitive behavior.
>
> ??? Why not?

Because Japanese as a language uses more than just Katakana and
Hiragana.  You are (apparently) suggesting that \\cj match only
characters that are _uniquely_ japanese, and while that might be an
interesting predicate in some cases, it doesn't seem particularly useful
in general (well to me anyway).

If there's any use at _all_ for the \\c feature, then it should match
how japanese is actually written, rather than "a random subset which
happens to be trivial to implement from the set of data we have
available today".

>> I think emacs' concept of characters belonging to multiple language
>> categories is pretty neat actually.
>
> Maybe I'm missing something, but I don't see how the fact that, say,
> Cyrillic characters are claimed to belong to Japanese category could
> be considered ``neat''.

I didn't say that all the particular results of that functionality were
good -- cyrillic as japanese is one example of where it's silly.

However the _ability_ to have characters belong to multiple categories
is a good thing, and I think it fits the natural way people think of
them.

-Miles

-- 
Idiot, n. A member of a large and powerful tribe whose influence in human
affairs has always been dominant and controlling.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Choice of fonts displaying etc/HELLO
  2008-08-11  8:48                                   ` Miles Bader
@ 2008-08-11 19:03                                     ` Eli Zaretskii
  0 siblings, 0 replies; 30+ messages in thread
From: Eli Zaretskii @ 2008-08-11 19:03 UTC (permalink / raw)
  To: Miles Bader; +Cc: lekktu, emacs-devel, handa, jasonr

> From: Miles Bader <miles@gnu.org>
> Cc: lekktu@gmail.com,  jasonr@gnu.org,  emacs-devel@gnu.org,  handa@m17n.org
> Date: Mon, 11 Aug 2008 17:48:26 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >> > I meant would it break something if "\\cj" matched only the Katakana
> >> > and Hiragana characters instead of what it matches today?
> >> 
> >> I don't know what it would break, but that doesn't seem like
> >> particularly intuitive behavior.
> >
> > ??? Why not?
> 
> Because Japanese as a language uses more than just Katakana and
> Hiragana.  You are (apparently) suggesting that \\cj match only
> characters that are _uniquely_ japanese, and while that might be an
> interesting predicate in some cases, it doesn't seem particularly useful
> in general (well to me anyway).
> 
> If there's any use at _all_ for the \\c feature, then it should match
> how japanese is actually written, rather than "a random subset which
> happens to be trivial to implement from the set of data we have
> available today".

Is adding Kanji to Hiragana and Katakana all this tirade is about?
Because I'm for it, no need to persuade me so violently.

> However the _ability_ to have characters belong to multiple categories
> is a good thing, and I think it fits the natural way people think of
> them.

Any examples, other than Kanji, where this is useful?

In any case, I think that one category should be the main one, and
each character should belong to only one such category.  And that is
what we should show in "C-u C-x =".





^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2008-08-11 19:03 UTC | newest]

Thread overview: 30+ messages
2008-07-29 20:49 segmentation fault displaying etc/HELLO on Windows Juanma Barranquero
2008-07-30  6:48 ` Jason Rumney
2008-07-30 11:48   ` Juanma Barranquero
2008-07-30 13:05     ` Jason Rumney
2008-07-30 13:11       ` Jason Rumney
2008-07-30 14:03         ` Juanma Barranquero
2008-07-30 14:19           ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Jason Rumney
2008-07-30 15:03             ` Choice of fonts displaying etc/HELLO Jason Rumney
2008-07-30 15:26               ` Juanma Barranquero
2008-08-01 12:50                 ` Kenichi Handa
2008-08-01 12:56             ` Choice of fonts displaying etc/HELLO (was: Re: segmentation fault displaying etc/HELLO on Windows) Kenichi Handa
2008-08-01 13:17               ` Choice of fonts displaying etc/HELLO Jason Rumney
2008-08-01 13:51                 ` Eli Zaretskii
2008-08-05  7:33                   ` Kenichi Handa
2008-08-05 18:12                     ` Eli Zaretskii
2008-08-06  5:30                       ` Kenichi Handa
2008-08-06  6:14                         ` Stephen J. Turnbull
2008-08-06  6:29                           ` Kenichi Handa
2008-08-06 15:52                             ` Stephen J. Turnbull
2008-08-06 17:56                         ` Eli Zaretskii
2008-08-07  1:14                           ` Kenichi Handa
2008-08-07  3:22                             ` Eli Zaretskii
2008-08-07  3:54                               ` Kenichi Handa
2008-08-07  4:54                               ` Miles Bader
2008-08-07 18:03                                 ` Eli Zaretskii
2008-08-07 19:30                                   ` Stephen J. Turnbull
2008-08-11  8:48                                   ` Miles Bader
2008-08-11 19:03                                     ` Eli Zaretskii
2008-07-31  1:49           ` segmentation fault displaying etc/HELLO on Windows Kyle M. Lee
2008-07-31  2:03             ` Juanma Barranquero

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
