BIG5-HKSCS?

unofficial mirror of emacs-devel@gnu.org 

* BIG5-HKSCS?
@ 2003-11-12 16:11 Simon Josefsson
  2003-11-13  1:53 ` BIG5-HKSCS? Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-12 16:11 UTC (permalink / raw)


How would one add a new coding system?  Someone requested support for
BIG5-HKSCS.  The relevant references appear to be (although the second
file was 10MB so I haven't downloaded it):

http://www.iana.org/assignments/charset-reg/Big5-HKSCS
http://www.info.gov.hk/digital21/eng/hkscs/download/e_hkscs.pdf

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-12 16:11 BIG5-HKSCS? Simon Josefsson
@ 2003-11-13  1:53 ` Kenichi Handa
  2003-11-13  4:14   ` BIG5-HKSCS? Simon Josefsson
  2003-11-13  4:49   ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 2 replies; 58+ messages in thread
From: Kenichi Handa @ 2003-11-13  1:53 UTC (permalink / raw)
  Cc: emacs-devel

In article <ilubrrha7oc.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> How would one add a new coding system?  Someone requested support for
> BIG5-HKSCS.  The relevant references appear to be (although the second
> file was 10MB so I haven't downloaded it):

It's not easy to support BIG5-HKSCS in the current Emacs,
and I don't have time to work on it now, sorry.  But the
emacs-unicode version already supports it.

You can get that version from CVS as below:

% cvs -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs login
Logging in to :pserver:anoncvs@subversions.gnu.org:2401/cvsroot/emacs
CVS password: <-- Hit Return here
% cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

---
Ken'ichi HANDA
handa@m17n.org

* Re: BIG5-HKSCS?
  2003-11-13  1:53 ` BIG5-HKSCS? Kenichi Handa
@ 2003-11-13  4:14   ` Simon Josefsson
  2003-11-13  5:34     ` BIG5-HKSCS? Kenichi Handa
  2003-11-13  4:49   ` BIG5-HKSCS? Simon Josefsson
  1 sibling, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-13  4:14 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilubrrha7oc.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> How would one add a new coding system?  Someone requested support for
>> BIG5-HKSCS.  The relevant references appear to be (although the second
>> file was 10MB so I haven't downloaded it):
>
> It's not easy to support BIG5-HKSCS in the current Emacs,
> and I don't have time to work on it now, sorry.  But the
> emacs-unicode version already supports it.

Good enough for me.  Do you have an opinion on whether falling back to
BIG5 when BIG5-HKSCS is not available [in Gnus, for displaying
incoming e-mail in BIG5-HKSCS] is reasonable behaviour?

I browsed the BIG5-HKSCS specification, and it appears to add lots of
characters (~1500) but it didn't seem to alter any, and I can't tell
whether the additions are critical or just rarely used symbols.  I
doubt rendering it as BIG5 is worse than QP, though, which is the
current behaviour.

* Re: BIG5-HKSCS?
  2003-11-13  4:14   ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-13  5:34     ` Kenichi Handa
  2003-11-13  5:50       ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-13  5:34 UTC (permalink / raw)
  Cc: emacs-devel

In article <iluislo9a7g.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> Good enough for me.  Do you have an opinion on whether falling back to
> BIG5 when BIG5-HKSCS is not available [in Gnus, for displaying
> incoming e-mail in BIG5-HKSCS] is reasonable behaviour?

> I browsed the BIG5-HKSCS specification, and it appears to add lots of
> characters (~1500) but it didn't seem to alter any

Hmmm, if that is true, it's possible to support it in the
current Emacs.  Internally, Emacs represents Big5 characters
in two charsets, chinese-big5-1 and chinese-big5-2.  The
former contains Big5 chars #xA140..#xC8FE, the latter
#xC940..#xFEFE.  That means that chinese-big5-1 still has
room for those additional 1500 characters.

> , and I can't tell
> whether the additions are critical or just rarely used symbols.  I
> doubt rendering it as BIG5 is worse than QP, though, which is the
> current behaviour.

If BIG5-HKSCS really just adds characters to BIG5, I think
it is reasonable to fall back to BIG5.  But, as I wrote
above, it seems possible to support the whole of BIG5-HKSCS in
the current Emacs with fairly little effort.  Could you
please wait for a while?

---
Ken'ichi HANDA
handa@m17n.org

* Re: BIG5-HKSCS?
  2003-11-13  5:34     ` BIG5-HKSCS? Kenichi Handa
@ 2003-11-13  5:50       ` Simon Josefsson
  0 siblings, 0 replies; 58+ messages in thread
From: Simon Josefsson @ 2003-11-13  5:50 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <iluislo9a7g.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> Good enough for me.  Do you have an opinion on whether falling back to
>> BIG5 when BIG5-HKSCS is not available [in Gnus, for displaying
>> incoming e-mail in BIG5-HKSCS] is reasonable behaviour?
>
>> I browsed the BIG5-HKSCS specification, and it appears to add lots of
>> characters (~1500) but it didn't seem to alter any
>
> Hmmm, if that is true, it's possible to support it in the
> current Emacs.  Internally, Emacs represents Big5 characters
> in two charsets, chinese-big5-1 and chinese-big5-2.  The
> former contains Big5 chars #xA140..#xC8FE, the latter
> #xC940..#xFEFE.  That means that chinese-big5-1 still has
> room for those additional 1500 characters.
>
>> , and I can't tell
>> whether the additions are critical or just rarely used symbols.  I
>> doubt rendering it as BIG5 is worse than QP, though, which is the
>> current behaviour.
>
> If BIG5-HKSCS really just adds characters to BIG5, I think
> it is reasonable to fall back to BIG5.  But, as I wrote
> above, it seems possible to support the whole of BIG5-HKSCS in
> the current Emacs with fairly little effort.  Could you
> please wait for a while?

I don't read Chinese, so I don't care much, but someone in
gnu.emacs.gnus might be happy. :-) I recall that the characters it
added were in the User-Defined and Vendor-Defined areas of BIG5, so
making those mean BIG5-HKSCS could potentially conflict with other
BIG5 variants, though.

But all this is beyond my non-ASCII knowledge, so don't count on me to
test or provide any useful feedback.  I'll propose to add the
BIG5-HKSCS -> BIG5 alias to Gnus, though, for old Emacs.
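The proposed alias amounts to "decode as BIG5 when no BIG5-HKSCS decoder exists". A rough Python sketch of that policy follows; it is not Gnus code. The `FALLBACKS` table, `have_codec` and `decode_body` are hypothetical names, and Python's codec registry stands in for the set of coding systems an older Emacs knows about.

```python
import codecs

# Hypothetical alias table: charset -> substitute when no codec exists for it.
FALLBACKS = {"big5-hkscs": "big5"}


def have_codec(name: str) -> bool:
    """True if Python's codec registry knows NAME."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def decode_body(data: bytes, charset: str, available=have_codec) -> str:
    """Decode DATA as CHARSET, using FALLBACKS when CHARSET is unavailable."""
    name = charset.lower()
    if not available(name) and name in FALLBACKS:
        name = FALLBACKS[name]
    # Octets the chosen charset cannot decode (e.g. HKSCS additions when
    # falling back to plain Big5) become U+FFFD instead of raising.
    return data.decode(name, errors="replace")
```

If BIG5-HKSCS only adds characters, ordinary Big5 text decodes identically either way: for instance b"\xa4\xa4" (the Big5 code for 中) gives "中" with or without a BIG5-HKSCS codec, and only the added characters degrade to replacement marks.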

Thanks for your work and prompt responses!

* Re: BIG5-HKSCS?
  2003-11-13  1:53 ` BIG5-HKSCS? Kenichi Handa
  2003-11-13  4:14   ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-13  4:49   ` Simon Josefsson
  2003-11-13  6:10     ` BIG5-HKSCS? Kenichi Handa
  1 sibling, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-13  4:49 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> % cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

I tried starting Gnus on it, but it failed.  It died with an Elisp
backtrace regarding define-key or something like that within bbdb.
Since bbdb isn't a critical part, I just disabled it, but then it
crashed within Fbase64_encode_string.

I think emacs-unicode-2 is too unstable for me to continue looking at it,
but I can try again in a few months.

(gdb) bt
#0  abort () at emacs.c:417
#1  0x0818e68c in Fbase64_encode_string (string=956298188, no_line_break=72)
    at fns.c:3224
#2  0x08185292 in Ffuncall (nargs=2, args=0xbfffd880) at eval.c:2727
#3  0x081b0ff5 in Fbyte_code (bytestr=409382308, vector=1,
    maxdepth=-1073751824) at bytecode.c:710
#4  0x08185689 in funcall_lambda (fun=1215756704, nargs=1,
    arg_vector=0xbfffda24) at eval.c:2911
#5  0x0818514d in Ffuncall (nargs=2, args=0xbfffda20) at eval.c:2781
#6  0x081b0ff5 in Fbyte_code (bytestr=418352412, vector=1,
    maxdepth=-1073751520) at bytecode.c:710
#7  0x08185689 in funcall_lambda (fun=1216104416, nargs=2,
    arg_vector=0xbfffdb48) at eval.c:2911
#8  0x0818514d in Ffuncall (nargs=3, args=0xbfffdb44) at eval.c:2781
#9  0x081b0ff5 in Fbyte_code (bytestr=406124996, vector=2,
    maxdepth=-1073751228) at bytecode.c:710
#10 0x08185689 in funcall_lambda (fun=1216141496, nargs=1,
    arg_vector=0xbfffdc64) at eval.c:2911
#11 0x0818514d in Ffuncall (nargs=2, args=0xbfffdc60) at eval.c:2781
#12 0x081b0ff5 in Fbyte_code (bytestr=955237444, vector=1,
    maxdepth=-1073750944) at bytecode.c:710
#13 0x08185689 in funcall_lambda (fun=1215587456, nargs=2,
    arg_vector=0xbfffdd84) at eval.c:2911
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) up
#1  0x0818e68c in Fbase64_encode_string (string=956298188, no_line_break=72)
    at fns.c:3224
3224        abort ();
(gdb) p encoded_length
$1 = 72
(gdb) p allength
$2 = 56
(gdb) p length
$3 = 36
(gdb) p string
$4 = 956298188
(gdb) q
A debugging session is active.
Do you still want to close the debugger?(y or n) y
jas@latte:~/src/emacs-unicode/src$

* Re: BIG5-HKSCS?
  2003-11-13  4:49   ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-13  6:10     ` Kenichi Handa
  2003-11-13  6:51       ` BIG5-HKSCS? Simon Josefsson
  2003-11-15 22:32       ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 2 replies; 58+ messages in thread
From: Kenichi Handa @ 2003-11-13  6:10 UTC (permalink / raw)
  Cc: emacs-devel

In article <ilur80c50uj.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  % cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

> I tried starting Gnus on it, but it failed.  It died with an Elisp
> backtrace regarding define-key or something like that within bbdb.
> Since bbdb isn't a critical part, 

As bbdb is not a part of Emacs, I have no idea what is wrong
with it.   Anyway,

> I just disabled it, but then it crashed within
> Fbase64_encode_string.

I found a simple/silly mistake in fns.c, and have just
installed a fix.  Could you please update your working
directory and try again?

---
Ken'ichi HANDA
handa@m17n.org

* Re: BIG5-HKSCS?
  2003-11-13  6:10     ` BIG5-HKSCS? Kenichi Handa
@ 2003-11-13  6:51       ` Simon Josefsson
  2003-11-13  9:01         ` BIG5-HKSCS? Kenichi Handa
  2003-11-15 22:32       ` BIG5-HKSCS? Simon Josefsson
  1 sibling, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-13  6:51 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> I just disabled it, but then it crashed within
>> Fbase64_encode_string.
>
> I found a simple/silly mistake in fns.c, and have just
> installed a fix.  Could you please update your working
> directory and try again?

The HMAC-MD5 function seems to fail, causing my login attempts in Gnus
to fail.  Reproduce it by:

jas@latte:~/src/emacs-unicode/src$ ./emacs -q ../lisp/gnus/rfc2104.el

then do M-x eval-buffer RET and try to evaluate some of the test
vectors, the first one should give:

(rfc2104-hash 'md5 64 16 "Jefe" "what do ya want for nothing?")
 => "750c783e6ab0b503eaa86e310a5db738"

With emacs-unicode I get "f898573306b1366f6edd841a9f5b2871".
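The expected value is the standard RFC 2104 HMAC-MD5 test vector. For cross-checking outside Emacs, here is a minimal Python sketch of the same construction on raw octets, with block size 64 and digest length 16 as in the rfc2104-hash call above. `hmac_md5` is a hypothetical name; Python's `hmac` module is used only to confirm the result.

```python
import hashlib
import hmac


def hmac_md5(key: bytes, msg: bytes, block_size: int = 64) -> str:
    """HMAC-MD5 as in RFC 2104: H((K ^ opad) || H((K ^ ipad) || msg))."""
    if len(key) > block_size:
        key = hashlib.md5(key).digest()      # long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to B octets
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(ipad + msg).digest() # 16 raw octets (L = 16)
    return hashlib.md5(opad + inner).hexdigest()


key, msg = b"Jefe", b"what do ya want for nothing?"
# Cross-check against the standard library implementation.
assert hmac_md5(key, msg) == hmac.new(key, msg, hashlib.md5).hexdigest()
```

This should print 750c783e6ab0b503eaa86e310a5db738 for the "Jefe" vector. Every intermediate value is an octet string, and the inner digest typically contains octets in the 128..255 range, which is exactly where a character/byte mix-up would change the result.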

Is anyone using the emacs-unicode branch with Gnus?

* Re: BIG5-HKSCS?
  2003-11-13  6:51       ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-13  9:01         ` Kenichi Handa
  2003-11-13 13:29           ` BIG5-HKSCS? Oliver Scholz
  2003-11-13 16:34           ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 2 replies; 58+ messages in thread
From: Kenichi Handa @ 2003-11-13  9:01 UTC (permalink / raw)
  Cc: emacs-devel

In article <iluekwcwyl8.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> The HMAC-MD5 function seems to fail, causing my login attempts in Gnus
> to fail.  Reproduce it by:

> jas@latte:~/src/emacs-unicode/src$ ./emacs -q ../lisp/gnus/rfc2104.el

> then do M-x eval-buffer RET and try to evaluate some of the test
> vectors, the first one should give:

> (rfc2104-hash 'md5 64 16 "Jefe" "what do ya want for nothing?")
>  => "750c783e6ab0b503eaa86e310a5db738"

> With emacs-unicode I get "f898573306b1366f6edd841a9f5b2871".

Thank you for testing.  I've just installed a fix for
rfc2104.el.  I'd like to ask you to try it again.

This is a typical problem of emacs-unicode in which
characters 128..255 are valid Unicode characters, thus, for
instance, (concat '(?a ?\300)) returns a multibyte string of
`a' and `À'.  But in the current Emacs, it returns a unibyte
string.

I suspect a similar fix is necessary in several other
places.
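A rough Python analogy of the difference described here: the same two codes become either two octets or two characters, and the character version no longer serialises to the original two octets. This is only an analogy; Python's bytes and str are not Emacs unibyte and multibyte strings, and the variable names are illustrative.

```python
codes = [ord("a"), 0o300]            # like the list '(?a ?\300)

as_octets = bytes(codes)             # b"a\xc0": two octets (cf. a unibyte string)
as_chars = "".join(map(chr, codes))  # "aÀ": two characters (cf. a multibyte string)

# Written out as UTF-8, the character version is three octets, so code that
# hashes or XORs "bytes" would silently see different data.
as_utf8 = as_chars.encode("utf-8")   # b"a\xc3\x80"

# Getting the original octets back requires naming a charset explicitly.
round_trip = as_chars.encode("latin-1")
assert round_trip == as_octets
```

Note that recovering the octets from the character form means naming latin-1 explicitly, much as the rfc2104.el fix has to mention a Latin-1 charset.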

> Is anyone using the emacs-unicode branch with Gnus?

I, at least, am not a Gnus user.  I'd like to ask people to
use emacs-unicode in various ways to find bugs.  What I can
test is limited, but usually I can fix them quite easily,
as in this case.

---
Ken'ichi HANDA
handa@m17n.org

* Re: BIG5-HKSCS?
  2003-11-13  9:01         ` BIG5-HKSCS? Kenichi Handa
@ 2003-11-13 13:29           ` Oliver Scholz
  2003-11-13 23:40             ` BIG5-HKSCS? Kenichi Handa
  2003-11-13 16:34           ` BIG5-HKSCS? Simon Josefsson
  1 sibling, 1 reply; 58+ messages in thread
From: Oliver Scholz @ 2003-11-13 13:29 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
[...]
> I'd like to ask people to use emacs-unicode in various ways to find
> bugs. What I can test is limited, but usually I can fix them quite
> easily, as in this case.
[...]

Alright, I decided to follow this advice and checked out
emacs-unicode-2 an hour ago.  When I eval this expression on
GNU/Linux ...

(set-face-font 'mode-line
	       "-adobe-helvetica-medium-r-normal-*-12-*-*-*-*-*-iso8859-1")

... then Emacs segfaults.

I append the backtrace. Where is the right place to send bug reports
to? emacs-pretest-bug@gnu.org, perhaps with "[unicode]" in the
subject? Or to the emacs-unicode mailing list?

[Thank you a lot for your work, BTW!]

    Oliver


The backtrace:

#0  0x080bb099 in choose_face_font (f=0x8473830, attrs=0x89d61e8, font_spec=405708356, needs_overstrike=0xbfffe858)
    at xfaces.c:6620
#1  0x080b3528 in load_face_font (f=0x8473830, face=0x89d61a0) at xfaces.c:1235
#2  0x080bbbcd in realize_x_face (cache=0x8b8aff8, attrs=0xbfffe964) at xfaces.c:6980
#3  0x080bb893 in realize_face (cache=0x8b8aff8, attrs=0xbfffe964, former_face_id=2) at xfaces.c:6869
#4  0x080bb82d in realize_named_face (f=0x8473830, symbol=405751916, id=2) at xfaces.c:6839
#5  0x080bb1f6 in realize_basic_faces (f=0x8473830) at xfaces.c:6670
#6  0x080b302b in recompute_basic_faces (f=0x8473830) at xfaces.c:951
#7  0x0805f869 in init_iterator (it=0xbfffeab4, w=0x8bedfc8, charpos=-1, bytepos=-1, row=0x0, base_face_id=DEFAULT_FACE_ID)
    at xdisp.c:2012
#8  0x080677e6 in x_consider_frame_title (frame=1212626992) at xdisp.c:7885
#9  0x08067904 in prepare_menu_bars () at xdisp.c:7944
#10 0x0806a533 in redisplay_internal (preserve_echo_area=0) at xdisp.c:9743
#11 0x08069edc in redisplay () at xdisp.c:9533
#12 0x080e2de5 in read_char (commandflag=1, nmaps=2, maps=0xbffff314, prev_event=405708356, used_mouse_menu=0xbffff358)
    at keyboard.c:2496
#13 0x080ea115 in read_key_sequence (keybuf=0xbffff464, bufsize=30, prompt=405708356, dont_downcase_last=0, 
    can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:8827
#14 0x080e10c6 in command_loop_1 () at keyboard.c:1505
#15 0x0813773d in internal_condition_case (bfun=0x80e0db4 <command_loop_1>, handlers=405796004, hfun=0x80e09b4 <cmd_error>)
    at eval.c:1333
#16 0x080e0c78 in command_loop_2 () at keyboard.c:1292
#17 0x081372b5 in internal_catch (tag=405757252, func=0x80e0c54 <command_loop_2>, arg=405708356) at eval.c:1094
#18 0x080e0c23 in command_loop () at keyboard.c:1271
#19 0x080e0778 in recursive_edit_1 () at keyboard.c:987
#20 0x080e08a0 in Frecursive_edit () at keyboard.c:1043
#21 0x080df722 in main (argc=2, argv=0xbffffa34) at emacs.c:1673


-- 
Oliver Scholz               23 Brumaire an 212 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org

* Re: BIG5-HKSCS?
  2003-11-13 13:29           ` BIG5-HKSCS? Oliver Scholz
@ 2003-11-13 23:40             ` Kenichi Handa
  2003-11-14 13:35               ` BIG5-HKSCS? Oliver Scholz
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-13 23:40 UTC (permalink / raw)
  Cc: emacs-devel

In article <873ccswg5i.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <epameinondas@gmx.de> writes:
> Alright, I decided to follow this advice and checked out
> emacs-unicode-2 an hour ago.

Thank you very much!!

> When I eval this expression on GNU/Linux ...

> (set-face-font 'mode-line
> 	       "-adobe-helvetica-medium-r-normal-*-12-*-*-*-*-*-iso8859-1")

> ... then Emacs segfaults.

I can't reproduce it, but I found one bug in xfaces.c and
have just installed a fix.  Perhaps it fixes the above bug.

> I append the backtrace. Where is the right place to send bug reports
> to? emacs-pretest-bug@gnu.org, perhaps with "[unicode]" in the
> subject? Or to the emacs-unicode mailing list?

I think emacs-unicode@gnu.org is suitable.

---
Ken'ichi HANDA
handa@m17n.org

* Re: BIG5-HKSCS?
  2003-11-13 23:40             ` BIG5-HKSCS? Kenichi Handa
@ 2003-11-14 13:35               ` Oliver Scholz
  0 siblings, 0 replies; 58+ messages in thread
From: Oliver Scholz @ 2003-11-14 13:35 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <873ccswg5i.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <epameinondas@gmx.de> writes:
[...]
>> ... then Emacs segfaults.
>
> I can't reproduce it, but I found one bug in xfaces.c and
> have just installed a fix.  Perhaps it fixes the above bug.
[...]

Yes, it is fixed now. Thanks.

    Oliver

* Re: BIG5-HKSCS?
  2003-11-13  9:01         ` BIG5-HKSCS? Kenichi Handa
  2003-11-13 13:29           ` BIG5-HKSCS? Oliver Scholz
@ 2003-11-13 16:34           ` Simon Josefsson
  2003-11-14  0:47             ` eight-bit char handling in emacs-unicode Kenichi Handa
  1 sibling, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-13 16:34 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <iluekwcwyl8.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> The HMAC-MD5 function seems to fail, causing my login attempts in Gnus
>> to fail.  Reproduce it by:
>
>> jas@latte:~/src/emacs-unicode/src$ ./emacs -q ../lisp/gnus/rfc2104.el
>
>> then do M-x eval-buffer RET and try to evaluate some of the test
>> vectors, the first one should give:
>
>> (rfc2104-hash 'md5 64 16 "Jefe" "what do ya want for nothing?")
>>  => "750c783e6ab0b503eaa86e310a5db738"
>
>> With emacs-unicode I get "f898573306b1366f6edd841a9f5b2871".
>
> Thank you for testing.  I've just installed a fix for
> rfc2104.el.  I'd like to ask you to try it again.

rfc2104.el now works, thanks.  But does the fix really have to
explicitly mention charsets like iso-latin-1?  Is there no way to
handle binary octet strings in emacs-unicode?  Preferably in a
portable way, that works on old Emacs versions and on XEmacs.

> This is a typical problem of emacs-unicode in which
> characters 128..255 are valid Unicode characters, thus, for
> instance, (concat '(?a ?\300)) returns a multibyte string of
> `a' and `À'.  But in the current Emacs, it returns a unibyte
> string.
>
> I suspect a similar fix is necessary in several other
> places.

Having a way to deal with data that is a pure single byte, without
involving coding systems, seems like a rather important thing to me.

>> Is anyone using the emacs-unicode branch with Gnus?
>
> I, at least, am not a Gnus user.  I'd like to ask people to
> use emacs-unicode in various ways to find bugs.  What I can
> test is limited, but usually I can fix them quite easily,
> as in this case.

It starts now, but it crashed when I entered a summary buffer:

Program received signal SIGSEGV, Segmentation fault.
0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
1591                      char_ranges[n_char_ranges++] = c;
(gdb) bt
#0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
#1  0x090ed860 in ?? ()
#2  0x081a30d0 in Fskip_chars_forward (string=1, lim=1) at syntax.c:1344
#3  0x081b1a43 in Fbyte_code (bytestr=6, vector=160, maxdepth=152054512)
    at bytecode.c:1418
#4  0x08185689 in funcall_lambda (fun=1223225480, nargs=1,
    arg_vector=0xbfffcf44) at eval.c:2911
#5  0x0818514d in Ffuncall (nargs=2, args=0xbfffcf40) at eval.c:2781
#6  0x081b0ff5 in Fbyte_code (bytestr=406381860, vector=1,
    maxdepth=-1073754304) at bytecode.c:710
#7  0x08185689 in funcall_lambda (fun=1213250456, nargs=2,
    arg_vector=0xbfffd084) at eval.c:2911
#8  0x0818514d in Ffuncall (nargs=3, args=0xbfffd080) at eval.c:2781
#9  0x081b0ff5 in Fbyte_code (bytestr=408546780, vector=2,
    maxdepth=-1073753984) at bytecode.c:710
#10 0x08185689 in funcall_lambda (fun=1222504096, nargs=0,
    arg_vector=0xbfffd1b4) at eval.c:2911
#11 0x0818514d in Ffuncall (nargs=1, args=0xbfffd1b0) at eval.c:2781
#12 0x081b0ff5 in Fbyte_code (bytestr=416820644, vector=0,
    maxdepth=-1073753680) at bytecode.c:710
#13 0x08185689 in funcall_lambda (fun=1222459392, nargs=0,
    arg_vector=0xbfffd2d4) at eval.c:2911
#14 0x0818514d in Ffuncall (nargs=1, args=0xbfffd2d0) at eval.c:2781
#15 0x081b0ff5 in Fbyte_code (bytestr=410610228, vector=0,
    maxdepth=-1073753392) at bytecode.c:710
#16 0x08185689 in funcall_lambda (fun=1222459176, nargs=2,
    arg_vector=0xbfffd3f4) at eval.c:2911
#17 0x0818514d in Ffuncall (nargs=3, args=0xbfffd3f0) at eval.c:2781
#18 0x081b0ff5 in Fbyte_code (bytestr=416766892, vector=2,
    maxdepth=-1073753104) at bytecode.c:710
#19 0x08185689 in funcall_lambda (fun=1222077040, nargs=2,
    arg_vector=0xbfffd514) at eval.c:2911
#20 0x0818514d in Ffuncall (nargs=3, args=0xbfffd510) at eval.c:2781
#21 0x081b0ff5 in Fbyte_code (bytestr=416766916, vector=2,
    maxdepth=-1073752816) at bytecode.c:710
#22 0x08185689 in funcall_lambda (fun=1222110576, nargs=1,
    arg_vector=0xbfffd634) at eval.c:2911
#23 0x0818514d in Ffuncall (nargs=2, args=0xbfffd630) at eval.c:2781
#24 0x081b0ff5 in Fbyte_code (bytestr=416640468, vector=1,
    maxdepth=-1073752528) at bytecode.c:710
#25 0x08185689 in funcall_lambda (fun=1221949600, nargs=6,
    arg_vector=0xbfffd764) at eval.c:2911
#26 0x0818514d in Ffuncall (nargs=7, args=0xbfffd760) at eval.c:2781
#27 0x081b0ff5 in Fbyte_code (bytestr=408688788, vector=6,
    maxdepth=-1073752224) at bytecode.c:710
#28 0x08185689 in funcall_lambda (fun=1221947744, nargs=7,
    arg_vector=0xbfffd894) at eval.c:2911
#29 0x0818514d in Ffuncall (nargs=8, args=0xbfffd890) at eval.c:2781
#30 0x081b0ff5 in Fbyte_code (bytestr=408688788, vector=7,
    maxdepth=-1073751920) at bytecode.c:710
#31 0x08185689 in funcall_lambda (fun=1214659912, nargs=3,
    arg_vector=0xbfffd9c4) at eval.c:2911
#32 0x0818514d in Ffuncall (nargs=4, args=0xbfffd9c0) at eval.c:2781
#33 0x081b0ff5 in Fbyte_code (bytestr=406477324, vector=3,
    maxdepth=-1073751616) at bytecode.c:710
#34 0x08185689 in funcall_lambda (fun=1223292464, nargs=1,
    arg_vector=0xbfffdb24) at eval.c:2911
#35 0x0818514d in Ffuncall (nargs=2, args=0xbfffdb20) at eval.c:2781
#36 0x08180cce in Fcall_interactively (function=407759756,
    record_flag=406023676, keys=1211380872) at callint.c:850
#37 0x0812e9db in Fcommand_execute (cmd=407759756, record_flag=406023676,
    keys=1, special=406023676) at keyboard.c:9725
#38 0x08123462 in command_loop_1 () at keyboard.c:1756
#39 0x0818345e in internal_condition_case (bfun=0x8123100 <command_loop_1>,
    handlers=406111316, hfun=0x8122c40 <cmd_error>) at eval.c:1333
#40 0x08122f9e in command_loop_2 () at keyboard.c:1292
#41 0x08182fbb in internal_catch (tag=1, func=0x8122f70 <command_loop_2>,
    arg=406023676) at eval.c:1094
#42 0x08122f3e in command_loop () at keyboard.c:1271
#43 0x081229d4 in recursive_edit_1 () at keyboard.c:987
#44 0x08122b01 in Frecursive_edit () at keyboard.c:1043
#45 0x081211e0 in main (argc=3, argv=0xbfffe374) at emacs.c:1673
(gdb) l
1673      Frecursive_edit ();
1674      /* NOTREACHED */
1675      return 0;
1676    }
1677    ^L
1678    /* Sort the args so we can find the most important ones
1679       at the beginning of argv.  */
1680
1681    /* First, here's a table of all the standard options.  */
1682
(gdb) up
#1  0x090ed860 in ?? ()
(gdb) up
#2  0x081a30d0 in Fskip_chars_forward (string=1, lim=1) at syntax.c:1344
1344      return skip_chars (1, string, lim);
(gdb) p string
$1 = 1
(gdb) p lim
$2 = 1
(gdb) up
#3  0x081b1a43 in Fbyte_code (bytestr=6, vector=160, maxdepth=152054512)
    at bytecode.c:1418
1418                TOP = Fskip_chars_forward (TOP, v1);
(gdb) up
#4  0x08185689 in funcall_lambda (fun=1223225480, nargs=1,
    arg_vector=0xbfffcf44) at eval.c:2911
2911          val = Fbyte_code (AREF (fun, COMPILED_BYTECODE),
(gdb) up
#5  0x0818514d in Ffuncall (nargs=2, args=0xbfffcf40) at eval.c:2781
2781            val = funcall_lambda (fun, numargs, args + 1);
(gdb) up
#6  0x081b0ff5 in Fbyte_code (bytestr=406381860, vector=1,
    maxdepth=-1073754304) at bytecode.c:710
710                 TOP = Ffuncall (op + 1, &TOP);
(gdb) q
A debugging session is active.
Do you still want to close the debugger?(y or n) y
jas@latte:~/src/emacs-unicode/src$

* eight-bit char handling in emacs-unicode
  2003-11-13 16:34           ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-14  0:47             ` Kenichi Handa
  2003-11-14 13:25               ` Oliver Scholz
                                 ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Kenichi Handa @ 2003-11-14  0:47 UTC (permalink / raw)
  Cc: emacs-devel

In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> rfc2104.el now works, thanks.  But does the fix really have to
> explicitly mention charsets like iso-latin-1?  Is there no way to
> handle binary octet strings in emacs-unicode?  Preferably in a
> portable way, that works on old Emacs versions and on XEmacs.

>>  This is a typical problem of emacs-unicode in which
>>  characters 128..255 are valid Unicode characters, thus, for
>>  instance, (concat '(?a ?\300)) returns a multibyte string of
>>  `a' and `À'.  But in the current Emacs, it returns a unibyte
>>  string.
>> 
>>  I suspect a similar fix is necessary in several other
>>  places.

> Having a way to deal with data that is a pure single byte, without
> involving coding systems, seems like a rather important thing to me.

I agree with you.  Currently, I can think of these methods:

(1) Perhaps the easiest way.

Check `default-enable-multibyte-characters' or a newly
introduced variable `byte-as-byte' to decide whether an
integer 128..255 must be treated as a Latin-1 char or a
byte.   So,
(concat '(?a ?\300)) => "aÀ" (multibyte string)
(let ((byte-as-byte t))
  (concat '(?a ?\300))) => "a\300" (unibyte string)

(2) Introduce a new function `eight-bit-char'.

It converts its argument to an ASCII or eight-bit character.
(eight-bit-char ?a) => 97
(eight-bit-char ?\300) => 4194240
Then,
(concat (list ?a (eight-bit-char ?\300))) => "a\300"

(3) Make a series of new functions (I don't think this is a good idea)

concat vs concat-unibyte
string vs string-unibyte
aset vs aset-unibyte

(4) Most drastic way (the cleanest but requires lots of work)

The basic problem is that we don't distinguish a character
(code) from a number.  So, we introduce a character object
(like XEmacs).  The function `character' converts a
character code into the corresponding character object.  The
Lisp reader always generates a character object for ?a,
?\300, etc.  So:
 (concat '(?a ?\300)) => "aÀ"
 (concat '(?a #o300)) => "a\300"
 (concat (list ?a (character #o300))) => "aÀ"
 (concat (list ?a #o300 (character #o300))) => "a\300À"

Note: (character X) == (decode-char 'ucs X)
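To compare proposals (2) and (4), here is a Python sketch that models each concat variant by the octets its result would be written out as (UTF-8 for a character, the octet itself for a raw byte). That serialisation, the `Char` class and the `concat_p2`/`concat_p4` names are assumptions made for illustration, not Emacs APIs. Only the value 4194240 (#x3FFFC0) for (eight-bit-char ?\300) comes from the message; the sketch extrapolates it as #x3FFF00 plus the byte value.

```python
RAW_BYTE_BASE = 0x3FFF00  # assumed: (eight-bit-char B) = #x3FFF00 + B for B in 128..255


def eight_bit_char(code: int) -> int:
    """Proposal (2): ASCII is returned unchanged, 128..255 map to raw-byte codes."""
    if 0 <= code < 0x80:
        return code
    if 0x80 <= code <= 0xFF:
        return RAW_BYTE_BASE + code
    raise ValueError(f"not a byte value: {code}")


def concat_p2(codes) -> bytes:
    """Proposal (2): plain integers are characters, raw-byte codes are octets."""
    out = bytearray()
    for c in codes:
        if RAW_BYTE_BASE + 0x80 <= c <= RAW_BYTE_BASE + 0xFF:
            out.append(c - RAW_BYTE_BASE)  # raw byte, emitted as is
        else:
            out += chr(c).encode("utf-8")  # character, emitted as UTF-8
    return bytes(out)


class Char:
    """Proposal (4): a character object distinct from an integer."""
    __slots__ = ("code",)

    def __init__(self, code: int) -> None:
        self.code = code


def concat_p4(items) -> bytes:
    """Proposal (4): Char objects are characters, plain integers are octets."""
    out = bytearray()
    for item in items:
        if isinstance(item, Char):
            out += chr(item.code).encode("utf-8")
        elif 0 <= item <= 0xFF:
            out.append(item)
        else:
            raise ValueError(f"integer {item} is not an octet")
    return bytes(out)
```

With these definitions, concat_p4([Char(ord("a")), 0o300, Char(0o300)]) yields b"a\xc0\xc3\x80", the octet-level counterpart of "a\300À" in the last example of (4). The trade-off shows up directly: (2) keeps integers as characters and needs an explicit marker for bytes, while (4) needs a distinct type for characters.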

> It starts now, but it crashed when I entered a summary buffer:

> Program received signal SIGSEGV, Segmentation fault.
> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
> 1591                      char_ranges[n_char_ranges++] = c;
> (gdb) bt
> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591

I just tried gnus but I couldn't reproduce it.  So, I need
more help.  Could you show me the results of the following?

(gdb) p n_char_ranges
(gdb) p c
(gdb) p string
(gdb) xstring
(gdb) p *$

---
Ken'ichi HANDA
handa@m17n.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-14  0:47             ` eight-bit char handling in emacs-unicode Kenichi Handa
@ 2003-11-14 13:25               ` Oliver Scholz
  2003-11-15  1:09                 ` Kenichi Handa
  2003-11-15  3:04               ` Simon Josefsson
  2003-11-17 21:17               ` Stefan Monnier
  2 siblings, 1 reply; 58+ messages in thread
From: Oliver Scholz @ 2003-11-14 13:25 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
[...]
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>> 1591                      char_ranges[n_char_ranges++] = c;
>> (gdb) bt
>> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>
> I just tried gnus but I couldn't reproduce it.  So, I need
> more help.

I get this error, too, not when I enter a summary buffer, but when I
hit RET in the summary buffer to display an article.  I tracked this
through the code. It takes place in
`mail-extract-address-components'. I found a way to reproduce this
without Gnus:

r -q --eval '(progn (load "mail-extr") (mail-extr-skip-whitespace-forward))'


I can't reproduce it, if I evaluate the body of
`mail-extr-skip-whitespace-forward', though. Weird. Could this have
something to do with the Latin-1 no-break space?

    Oliver

* Re: eight-bit char handling in emacs-unicode
  2003-11-14 13:25               ` Oliver Scholz
@ 2003-11-15  1:09                 ` Kenichi Handa
  2003-11-15 10:26                   ` Oliver Scholz
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-15  1:09 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <87n0aznl06.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <epameinondas@gmx.de> writes:
> I get this error, too, not when I enter a summary buffer, but when I
> hit RET in the summary buffer to display an article.  I tracked this
> through the code. It takes place in
> `mail-extract-address-components'. I found a way to reproduce this
> without Gnus:

> r -q --eval '(progn (load "mail-extr") (mail-extr-skip-whitespace-forward))'

Thank you.  Now I see what was wrong.  I've just installed a
fix.

> I can't reproduce it, if I evaluate the body of
> `mail-extr-skip-whitespace-forward', though. Weird. Could this have
> something to do with the Latin-1 no-break space?

I think so.  The bug occurs when we do (skip-chars-forward
_MULTIBYTE_STRING_) in an ASCII-only buffer, which I think
I've never tried. :-(

---
Ken'ichi HANDA
handa@m17n.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-15  1:09                 ` Kenichi Handa
@ 2003-11-15 10:26                   ` Oliver Scholz
  2003-11-15 21:47                     ` Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Oliver Scholz @ 2003-11-15 10:26 UTC (permalink / raw)
  Cc: jas, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

[...]
> Thank you.  Now I see what was wrong.  I've just installed a
> fix.
[...]

Thanks. It works now. I guess I may flatter myself now to be the first
one who has sent a message with Gnus on Emacs 22 out to the public.

That's something I will tell to my grandchildren some day. :-)

I'll continue testing emacs-unicode.

    Oliver
-- 
Oliver Scholz               25 Brumaire an 212 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-15 10:26                   ` Oliver Scholz
@ 2003-11-15 21:47                     ` Simon Josefsson
  0 siblings, 0 replies; 58+ messages in thread
From: Simon Josefsson @ 2003-11-15 21:47 UTC (permalink / raw)
  Cc: emacs-devel, Kenichi Handa

Oliver Scholz <epameinondas@gmx.de> writes:

> Kenichi Handa <handa@m17n.org> writes:
>
> [...]
>> Thank you.  Now I see what was wrong.  I've just installed a
>> fix.
> [...]
>
> Thanks. It works now. I guess I may flatter myself now to be the first
> one who has sent a message with Gnus on Emacs 22 out to the public.

And here is the second one. :-)  (Assuming this is sent OK...)

I can't reproduce the crash I got earlier, perhaps it is fixed.

I noticed that M-SPC within *Message* buffers activates the region, but
does not highlight the selected area.  It works in other buffers, though.

* Re: eight-bit char handling in emacs-unicode
  2003-11-14  0:47             ` eight-bit char handling in emacs-unicode Kenichi Handa
  2003-11-14 13:25               ` Oliver Scholz
@ 2003-11-15  3:04               ` Simon Josefsson
  2003-11-16 15:03                 ` Alex Schroeder
  2003-11-17 21:17               ` Stefan Monnier
  2 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-15  3:04 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> rfc2104.el now works, thanks.  But does the fix really have to
>> explicitly mention charsets like iso-latin-1?  Is there no way to
>> handle binary octet strings in emacs-unicode?  Preferably in a
>> portable way, that works on old Emacs versions and on XEmacs.
>
>>>  This is a typical problem of emacs-unicode in which
>>>  characters 128..255 are valid Unicode characters, thus, for
>>>  instance, (concat '(?a ?\300)) returns a multibyte string of
>>>  `a' and `À'.  But in the current Emacs, it returns a unibyte
>>>  string.
>>> 
>>>  I suspect a similar fix is necessary in several other
>>>  places.
>
>> Having a way to deal with data that is a pure single byte, without
>> involving coding systems, seems like a rather important thing to me.
>
> I agree with you.  Currently, I can think of these methods:

Can you think of one that would work on Emacs 21?  Having a stable
idiom for dealing with octets would be useful; forcing third-party
packages to try several methods can easily lead to unreadable code.

> (1) Perhaps the easiest way.
>
> Check `default-enable-multibyte-characters' or a newly
> introduced variable `byte-as-byte' to decide whether an
> integer 128..255 must be treated as a Latin-1 char or a
> byte.  So,
> (concat '(?a ?\300)) => "aÀ" (multibyte string)
> (let ((byte-as-byte t))
>   (concat '(?a ?\300))) => "a\300" (unibyte string)
>
> (2) Introduce a new function `eight-bit-char'.
>
> It converts an argument to ascii or eight-bit-char.
> (eight-bit-char ?a) => 97
> (eight-bit-char ?\300) => 4194240
> Then,
> (concat '(?a (eight-bit-char ?\300))) => "a\300"

Both would work for me, although superficially both look like quick
hacks to me.
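Option (2) amounts to a fixed mapping between raw bytes and a reserved block of character codes. A minimal C sketch, assuming that block starts at 0x3FFF00 (an assumption chosen only because it reproduces the value 4194240 given for ?\300); `eight_bit_char` and `eight_bit_char_to_byte` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offset: bytes 0x80..0xFF map to 0x3FFF80..0x3FFFFF.  */
#define EIGHT_BIT_OFFSET 0x3FFF00

/* Sketch of option (2): ASCII codes map to themselves, bytes
   0x80..0xFF map to their "eight-bit" character codes.  */
static int32_t eight_bit_char(int32_t byte)
{
    assert(byte >= 0 && byte <= 0xFF);
    return byte < 0x80 ? byte : byte + EIGHT_BIT_OFFSET;
}

/* Inverse mapping.  Returns -1 for a code that is neither ASCII nor an
   eight-bit character, e.g. the Latin-1 character U+00C0.  */
static int32_t eight_bit_char_to_byte(int32_t c)
{
    if (c >= 0 && c < 0x80)
        return c;
    if (c >= EIGHT_BIT_OFFSET + 0x80 && c <= EIGHT_BIT_OFFSET + 0xFF)
        return c - EIGHT_BIT_OFFSET;
    return -1;
}
```

With this mapping, `eight_bit_char('a')` stays 97 and only 128..255 need the extra codes, so the byte 0300 and the character U+00C0 never share a code.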

> (3) Make a series of new functions (I think it's not good)
>
> concat vs concat-unibyte
> string vs string-unibyte
> aset vs aset-unibyte

I agree it isn't good.

> (4) Most drastic way (the cleanest but requires lots of work)
>
> The basic problem is that we don't distinguish a character
> (code) and a number.  So, we introduce a character object
> (like XEmacs).  The function `character' converts a
> character code into the corresponding character object.  The
> Lisp reader always generates a character object for ?a,
> ?\300, etc.   So:
>  (concat '(?a ?\300)) => "aÀ"
>  (concat '(?a #o300)) => "a\300"
>  (concat '(?a (character #o300))) => "aÀ"
>  (concat '(?a #o300 (character #o300))) => "a\300À"
>
> Note: (character X) == (decode-char 'ucs X)

This would be nice.  Characters aren't numbers (unless within the
internal representation, but the internal representation should be
hidden), so separating the two types is useful.  So to be consistent
with that, I think your `character' function should be called
`ucs-character' or similar.
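The disambiguation that option (4) buys can be sketched in C with a tagged element type. This is only an illustration under stated assumptions: `struct elt`, `concat_elts` and the raw-byte offset 0x3FFF00 (picked to match the 4194240 quoted for ?\300) are hypothetical, not Emacs code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed offset for raw-byte characters (0300 -> 4194240).  */
#define EIGHT_BIT_OFFSET 0x3FFF00

/* Hypothetical tagged element: the reader would yield ELT_CHAR for ?a
   and ?\300, while #o300 stays a plain ELT_NUMBER.  */
enum elt_kind { ELT_NUMBER, ELT_CHAR };

struct elt {
    enum elt_kind kind;
    int32_t value;   /* a byte 0..255 for ELT_NUMBER, a code point for ELT_CHAR */
};

/* Sketch of `concat' under option (4): a number 128..255 becomes the
   raw-byte character for that byte, while a character object keeps its
   own code.  OUT must have room for N codes.  */
static void concat_elts(const struct elt *in, size_t n, int32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i].kind == ELT_NUMBER) {
            assert(in[i].value >= 0 && in[i].value <= 0xFF);
            out[i] = in[i].value < 0x80 ? in[i].value
                                        : in[i].value + EIGHT_BIT_OFFSET;
        } else {
            out[i] = in[i].value;
        }
    }
}
```

For the input corresponding to `(concat '(?a #o300 (character #o300)))`, this yields 97, 4194240 and 192: the number and the character with the same value end up distinct.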

>> It started now, but when I enter a summary buffer it crashed:
>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>> 1591                      char_ranges[n_char_ranges++] = c;
>> (gdb) bt
>> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>
> I just tried gnus but I couldn't reproduce it.  So, I need
> more help.  Could you show me the results of the following?
>
> (gdb) p n_char_ranges
> (gdb) p c
> (gdb) p string
> (gdb) xstring
> (gdb) p *$

I'll try to get time to try emacs-unicode-2 more, but no promises.

Thanks.

* Re: eight-bit char handling in emacs-unicode
  2003-11-15  3:04               ` Simon Josefsson
@ 2003-11-16 15:03                 ` Alex Schroeder
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Schroeder @ 2003-11-16 15:03 UTC (permalink / raw)

Simon Josefsson <jas@extundo.com> writes:

> Characters aren't numbers (unless within the internal
> representation, but the internal representation should be hidden),

As far as I understand the Emacs design philosophy, we don't believe
that internal representation should be hidden.  If it is not hidden,
we can easily write code to modify it without having to recompile
Emacs.   But that's just an aside.  :)

Alex.
-- 
http://www.emacswiki.org/alex/
There is no substitute for experience.

* Re: eight-bit char handling in emacs-unicode
  2003-11-14  0:47             ` eight-bit char handling in emacs-unicode Kenichi Handa
  2003-11-14 13:25               ` Oliver Scholz
  2003-11-15  3:04               ` Simon Josefsson
@ 2003-11-17 21:17               ` Stefan Monnier
  2003-11-18  7:33                 ` Kenichi Handa
  2 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-11-17 21:17 UTC (permalink / raw)
  Cc: emacs-devel, jas

> The basic problem is that we don't distinguish a character
> (code) and a number.  So, we introduce a character object

That's one way to look at the problem.
Another is to say that the problem is instead that we do not distinguish
between arrays of chars and arrays of bytes.  We just use strings and
buffers and expect to be able to mix bytes and chars in them.

Such mixes are admittedly very rare for strings, but they're pretty common
for buffers.

So when we write 192 at a location, we don't know whether we should put
there the byte 192 or the eight-bit-char character that will be encoded
into a 192 byte.

In Emacs-21 we worked around the problem by arranging for "the
eight-bit-char that encodes to 192" to be represented by the integer 192, so
as to avoid having to choose.  But with unicode, the 128-255 zone cannot be
dedicated to eight-bit-char since it's already used up for latin-1, so we
have to face the problem more directly.

Where Emacs-21 still had to choose, we just used heuristics,
so `concat' will sometimes return a unibyte string, and sometimes a
multibyte string.

So I think your options 1-3 are better than 4.  BTW, your function
`eight-bit-char' should be named `byte-to-char' instead.

Which of 1 to 3 is the best is not clear, and maybe we can just live with
`make-string-unibyte' and `make-string-multibyte'.  Note that 1-3 are
not mutually exclusive so we can use them all.

        Stefan

* Re: eight-bit char handling in emacs-unicode
  2003-11-17 21:17               ` Stefan Monnier
@ 2003-11-18  7:33                 ` Kenichi Handa
  2003-11-18 17:12                   ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-18  7:33 UTC (permalink / raw)
  Cc: emacs-devel, jas

In article <jwvhe12emr3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  The basic problem is that we don't distinguish a character
>>  (code) and a number.  So, we introduce a character object

> That's one way to look at the problem.
> Another is to say that the problem is instead that we do not distinguish
> between arrays of chars and arrays of bytes.

I agree that it's possible to grasp the problem in that way,
but I'm not sure which is the better way.  Could you explain
WHY yours is better?

[...]
> In Emacs-21 we worked around the problem by arranging for "the
> eight-bit-char that encodes to 192" to be represented by the integer 192, so
> as to avoid having to choose.  But with unicode, the 128-255 zone cannot be
> dedicated to eight-bit-char since it's already used up for latin-1, so we
> have to face the problem more directly.

> Where Emacs-21 still had to choose, we just used heuristics,
> so `concat' will sometimes return a unibyte string, and sometimes a
> multibyte string.

> So I think your options 1-3 are better than 4.  BTW, your function
> `eight-bit-char' should be named `byte-to-char' instead.

> Which of 1 to 3 is the best is not clear, and maybe we can just live with
> `make-string-unibyte' and `make-string-multibyte'.

I think you mean string-make-unibyte/multibyte, but, for the
current problem, we can't use it because string-make-unibyte
may behave differently in different language environments.
A lang. env. that gives iso-8859-1 or Unicode the highest
priority for the character `À' is OK:

(string-make-unibyte (concat '(?a 192))) = "a\300"

But if a lang. env. prefers a charset that does not encode
`À' as 192 (e.g. Vietnamese VSCII), we fail.

> Note that 1-3 are not mutually exclusive so we can use
> them all.

Yes, but, at least, I really want to avoid "(3) Make a
series of new functions".

---
Ken'ichi HANDA
handa@m17n.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-18  7:33                 ` Kenichi Handa
@ 2003-11-18 17:12                   ` Stefan Monnier
  2003-11-19  0:06                     ` Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-11-18 17:12 UTC (permalink / raw)
  Cc: emacs-devel, jas

>>> The basic problem is that we don't distinguish a character
>>> (code) and a number.  So, we introduce a character object

>> That's one way to look at the problem.
>> Another is to say that the problem is instead that we do not distinguish
>> between arrays of chars and arrays of bytes.

> I agree that it's possible to grasp the problem in that way,
> but I'm not sure which is the better way.  Could you explain
> WHY yours is better?

I'm not sure whether it's better or worse.  The problem I have with the
introduction of a new type for chars is that it is a change with
far-reaching consequences, and I'm not sure it would solve all our
problems, since many of them have to do with bad elisp code.

>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>> `make-string-unibyte' and `make-string-multibyte'.

> I think you mean string-make-unibyte/multibyte, but, for the
> current problem, we can't use it because string-make-unibyte
> may behave differently in different language environments.
> A lang. env. that gives iso-8859-1 or Unicode the highest
> priority for the character `À' is OK:

> (string-make-unibyte (concat '(?a 192))) = "a\300"

> But if a lang. env. prefers a charset that does not encode
> `À' as 192 (e.g. Vietnamese VSCII), we fail.

No.  My `make-string-unibyte' should only work to convert "bytes in
multibyte string" to "bytes in unibyte string": there's no char, thus no
coding-system.  If the multibyte string argument contains a char that's
not an eight-bit-char, then it's an error.
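The strict conversion described here can be sketched in C. `make_string_unibyte` is a hypothetical name, and the raw-byte range 0x3FFF80..0x3FFFFF is an assumption derived from the 4194240 value for ?\300 given earlier in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed raw-byte ("eight-bit") characters: 0x3FFF80..0x3FFFFF.  */
#define EIGHT_BIT_OFFSET 0x3FFF00

/* Copy character codes into bytes, accepting only ASCII and eight-bit
   characters; no coding system is consulted.  Returns 0 on success,
   or -1 if any other character (e.g. U+00C0) is found, in which case
   the caller should signal an error.  */
static int make_string_unibyte(const int32_t *chars, size_t n,
                               unsigned char *out)
{
    assert(n == 0 || (chars != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        int32_t c = chars[i];
        if (c >= 0 && c < 0x80)
            out[i] = (unsigned char) c;
        else if (c >= EIGHT_BIT_OFFSET + 0x80 && c <= EIGHT_BIT_OFFSET + 0xFF)
            out[i] = (unsigned char) (c - EIGHT_BIT_OFFSET);
        else
            return -1;
    }
    return 0;
}
```

Because nothing depends on the language environment, the result for a given input is always the same, which is the point of separating this from `encode-coding-string`.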

To do what your string-make-unibyte does you should use
`encode-coding-string' where the coding system is passed explicitly.

I've changed my Emacs so that string-make-unibyte does the above
(i.e. signals an error if it encounters a non-byte char) and it works fairly
well, except for the few places where the elisp code is sloppy and needs to
be fixed.

>> Note that 1-3 are not mutually exclusive so we can use
>> them all.

> Yes, but, at least, I really want to avoid "(3) Make a
> series of new functions".

(defun concat-unibyte (&rest x)
  (make-string-unibyte (apply 'concat x)))
...

so we don't need this series of new functions, but if some of them are used
often enough, we can add them of course.

        Stefan

* Re: eight-bit char handling in emacs-unicode
  2003-11-18 17:12                   ` Stefan Monnier
@ 2003-11-19  0:06                     ` Kenichi Handa
  2003-11-19  3:05                       ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-19  0:06 UTC (permalink / raw)
  Cc: emacs-devel, jas

In article <jwvn0atd38w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> I'm not sure whether it's better or worse.  The problem I have with the
> introduction of a new type for chars is that it is a change with
> far-reaching consequences, and I'm not sure it would solve all our
> problems, since many of them have to do with bad elisp code.

I see.  Apart from the design itself, I agree that it's
difficult to introduce a new type.  But, when I discussed
the Character type object with Richard a few years ago,
he was not that negative, provided that it gives a definite
improvement.

>>>  Which of 1 to 3 is the best is not clear, and maybe we can just live with
>>>  `make-string-unibyte' and `make-string-multibyte'.

>>  I think you mean string-make-unibyte/multibyte, but, for the

> No.  My `make-string-unibyte' should only work to convert "bytes in
> multibyte string" to "bytes in unibyte string": there's no char, thus no
> coding-system.

I see.  In emacs-unicode, I already introduced
string-to-multibyte which, I think, is the same as your
make-string-multibyte.   But,

> If the multibyte string argument contains a char that's
> not an eight-bit-char, then it's an error.

Then, we can't use make-string-unibyte for the current case
because, in emacs-unicode, (concat '(?a 192)) returns a
multibyte string whose second element is A-grave, not an
eight-bit-char.  Am I missing something?

> To do what your string-make-unibyte does you should use
> `encode-coding-string' where the coding system is passed explicitly.

Those are conceptually different things (I remember the
similar discussion we had a while ago).

encode-coding-string does:
char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
  --CES--> encoded-byte-sequence

string-make-unibyte does:
char-sequence --CCS--> code-point-sequence
  --concat--> code-point-sequence

These two yield the same result only when the CCS supports all
chars in "char-sequence" and the CES is stateless
(e.g. iso-latin-1).

> I've changed my Emacs so that string-make-unibyte does the above
> (i.e. signals an error if it encounters a non-byte char) and it works fairly
> well, except for the few places where the elisp code is sloppy and needs to
> be fixed.

How did you change it?  string-make-unibyte internally uses
the function copy_text.  Did you change it?  But, then, each
time you copy a multibyte string into a unibyte buffer, you
should get an error.

>>>  Note that 1-3 are not mutually exclusive so we can use
>>>  them all.

>>  Yes, but, at least, I really want to avoid "(3) Make a
>>  series of new functions".

> (defun concat-unibyte (&rest x)
>   (make-string-unibyte (apply 'concat x)))
> ...

As I wrote above, this should signal an error on:
  (concat-unibyte '(?a 192))

> so we don't need this series of new functions, but if some of them are used
> often enough, we can add them of course.

---
Ken'ichi HANDA
handa@m17n.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-19  0:06                     ` Kenichi Handa
@ 2003-11-19  3:05                       ` Stefan Monnier
  2003-11-19 10:46                         ` Juri Linkov
  2003-11-21  0:41                         ` Kenichi Handa
  0 siblings, 2 replies; 58+ messages in thread
From: Stefan Monnier @ 2003-11-19  3:05 UTC (permalink / raw)
  Cc: emacs-devel, jas

> I see.  Apart from the design itself, I agree that it's difficult to
> introduce a new type.  But, when I discussed the Character type object
> with Richard a few years ago, he was not that negative, provided that
> it gives a definite improvement.

Sounds about right to me: we have one free tag that we could use for chars
(and that I currently use to boost the max buffer size from 256MB to 512MB
in my local code).
But it needs to pay for itself.

> Then, we can't use make-string-unibyte for the current case
> because, in emacs-unicode, (concat '(?a 192)) returns a
> multibyte string whose second element is A-grave, not an
> eight-bit-char.  Am I missing something?

Well, obviously we need to make it accept this case (i.e. accept both the
latin-1 192 and the eight-bit-char 192).  I'm sure there'll be other issues.
I haven't had much time to think about it and you're obviously better
placed to foresee potential problems.
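A sketch of the more lenient variant in C, again assuming the hypothetical raw-byte range 0x3FFF80..0x3FFFFF; `make_string_unibyte_lenient` is an illustrative name. Note that U+00C0 and the raw-byte character for 0300 both become the byte 0xC0, so the conversion can no longer be inverted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed raw-byte ("eight-bit") characters: 0x3FFF80..0x3FFFFF.  */
#define EIGHT_BIT_OFFSET 0x3FFF00

/* Accept ASCII, Latin-1 (U+0080..U+00FF) and eight-bit characters,
   mapping each to the byte with the same value.  Returns 0 on success,
   or -1 if any other character is found.  */
static int make_string_unibyte_lenient(const int32_t *chars, size_t n,
                                       unsigned char *out)
{
    assert(n == 0 || (chars != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        int32_t c = chars[i];
        if (c >= 0 && c <= 0xFF)              /* ASCII or Latin-1 */
            out[i] = (unsigned char) c;
        else if (c >= EIGHT_BIT_OFFSET + 0x80 && c <= EIGHT_BIT_OFFSET + 0xFF)
            out[i] = (unsigned char) (c - EIGHT_BIT_OFFSET);
        else
            return -1;
    }
    return 0;
}
```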

>> To do what your string-make-unibyte does you should use
>> `encode-coding-string' where the coding system is passed explicitly.

> Those are conceptually different things (I remember the
> similar discussion we had a while ago).

> encode-coding-string does:
> char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>     --CES--> encoded-byte-sequence

> string-make-unibyte does:
> char-sequence --CCS--> code-point-sequence
>     --concat--> code-point-sequence

> These two yield the same result only when the CCS supports all
> chars in "char-sequence" and the CES is stateless
> (e.g. iso-latin-1).

You lost me here (I'm a poor soul who doesn't know much outside of the
latin-1 world).
I thought that string-make-unibyte only behaves meaningfully for
"normal 8bit coding-systems" such as latin-1.

>> I've changed my Emacs so that string-make-unibyte does the above
>> (i.e. signals an error if it encounters a non-byte char) and it works fairly
>> well, except for the few places where the elisp code is sloppy and needs to
>> be fixed.

> How did you change it?  string-make-unibyte internally uses
> the function copy_text.  Did you change it?  But, then, each
> time you copy a multibyte string into a unibyte buffer, you
> should get an error.

Of course: it's an error.  A unibyte buffer cannot represent multibyte
chars, so you need to encode them first (into a unibyte string).

Now to tell you the truth, my change had to accept a few (not so) special
cases and it took a bit of fiddling to make the code lenient enough to
accept elisp code I didn't feel like "fixing".  I can't remember the details
off-hand, but I remember having problems with regexp matching functions
where multibyte regexps are used in unibyte buffers.

-- Stefan

* Re: eight-bit char handling in emacs-unicode
  2003-11-19  3:05                       ` Stefan Monnier
@ 2003-11-19 10:46                         ` Juri Linkov
  2003-11-19 13:48                           ` Stefan Monnier
  2003-11-20 23:41                           ` Kenichi Handa
  2003-11-21  0:41                         ` Kenichi Handa
  1 sibling, 2 replies; 58+ messages in thread
From: Juri Linkov @ 2003-11-19 10:46 UTC (permalink / raw)
  Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> Now to tell you the truth, my change had to accept a few (not so) special
> cases and it took a bit of fiddling to make the code lenient enough to
> accept elisp code I didn't feel like "fixing".  I can't remember the details
> off-hand, but I remember having problems with regexp matching functions
> where multibyte regexps are used in unibyte buffers.

Do you mean unibyte regexps in multibyte buffers?  For example,
gnus/message.el currently has a wrong regexp that prevents Gnus
from being used in some language environments.  To reproduce this
bug, you can eval the following:

(progn
 (set-language-environment 'ukrainian)
 (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))

It fails with (invalid-regexp "Invalid range end").
Could you suggest how to fix this bug?

-- 
http://www.jurta.org/emacs/

* Re: eight-bit char handling in emacs-unicode
  2003-11-19 10:46                         ` Juri Linkov
@ 2003-11-19 13:48                           ` Stefan Monnier
  2003-11-20 23:41                           ` Kenichi Handa
  1 sibling, 0 replies; 58+ messages in thread
From: Stefan Monnier @ 2003-11-19 13:48 UTC (permalink / raw)
  Cc: emacs-devel

>> Now to tell you the truth, my change had to accept a few (not so) special
>> cases and it took a bit of fiddling to make the code lenient enough to
>> accept elisp code I didn't feel like "fixing".  I can't remember the details
>> off-hand, but I remember having problems with regexp matching functions
>> where multibyte regexps are used in unibyte buffers.

> Do you mean unibyte regexps in multibyte buffers?  For example,

No: multibyte is a superset of unibyte, so there's no problem searching
for unibyte elements in a multibyte sequence.

> gnus/message.el currently has a wrong regexp that prevents Gnus
> from being used in some language environments.  To reproduce this
> bug, you can eval the following:

> (progn
>  (set-language-environment 'ukrainian)
>  (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))

In my Emacs this doesn't fail because the unibyte string is turned into
multibyte without looking at the coding-system (i.e. it will only match
ASCII and chars from eight-bit-control or eight-bit-graphic: probably not
what the author intended).


        Stefan

* Re: eight-bit char handling in emacs-unicode
  2003-11-19 10:46                         ` Juri Linkov
  2003-11-19 13:48                           ` Stefan Monnier
@ 2003-11-20 23:41                           ` Kenichi Handa
  1 sibling, 0 replies; 58+ messages in thread
From: Kenichi Handa @ 2003-11-20 23:41 UTC (permalink / raw)
  Cc: monnier, emacs-devel

In article <87ptfovdnj.fsf@mail.jurta.org>, Juri Linkov <juri@jurta.org> writes:
> (progn
>  (set-language-environment 'ukrainian)
>  (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))

> It fails with (invalid-regexp "Invalid range end").
> Could you suggest how to fix this bug?

The current Emacs simply converts the unibyte regexp string to
multibyte, and in Ukrainian, as nonascii-translation-table
converts ?\200 to 299040, but ?\237 to 2295, the above
regexp leads to "Invalid range end".  This behaviour itself
is a bug.  We must treat \200-\237 the same way as
\200\201...\236\237 (emacs-unicode already does that).
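The range problem can be illustrated in C with a toy, non-monotonic translation. `toy_translate` is hypothetical and does not reproduce the real Ukrainian nonascii-translation-table values; it only shows why translating the two endpoints can produce a reversed range, while expanding the range first cannot:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy translation: bytes 0x80..0xFF map to DEcreasing codes, so the
   translated endpoints of \200-\237 come out in the wrong order.  */
static int32_t toy_translate(unsigned char b)
{
    return b < 0x80 ? b : 0x2600 - (b - 0x80);
}

/* Buggy approach: translate only the endpoints.  Returns -1
   ("Invalid range end") when they end up out of order.  */
static int translate_endpoints(unsigned char lo, unsigned char hi,
                               int32_t *tlo, int32_t *thi)
{
    *tlo = toy_translate(lo);
    *thi = toy_translate(hi);
    return *tlo <= *thi ? 0 : -1;
}

/* Fixed approach: treat lo-hi as the explicit list lo, lo+1, ..., hi
   and translate each member.  OUT needs room for hi - lo + 1 codes.
   Returns the number of codes written.  */
static size_t expand_then_translate(unsigned char lo, unsigned char hi,
                                    int32_t *out)
{
    size_t n = 0;
    assert(lo <= hi);
    for (unsigned int b = lo; b <= hi; b++)
        out[n++] = toy_translate((unsigned char) b);
    return n;
}
```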

But fixing that bug doesn't solve the Gnus problem, because
the intention of the part "\200-\237" is apparently to match
C1 control chars, not their multibyte equivalents in the
current language environment.  So the correct fix is to
change the above as below.

(re-search-forward
  (string-as-multibyte "[\000-\007\013\015-\032\034-\037\200-\237]")
  nil t)

---
Ken'ichi HANDA
handa@m17n.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-19  3:05                       ` Stefan Monnier
  2003-11-19 10:46                         ` Juri Linkov
@ 2003-11-21  0:41                         ` Kenichi Handa
  2003-11-21  5:27                           ` Stefan Monnier
  1 sibling, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-21  0:41 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  I see.  Apart from the design itself, I agree that it's difficult to
>>  introduce a new type.  But, when I discussed the Character type object
>>  with Richard a few years ago, he was not that negative, provided that
>>  it gives a definite improvement.

> Sounds about right to me: we have one free tag that we could use for chars

Yes, and as that is the last free tag, I still hesitate to
consume it for the Character object.

>>  Then, we can't use make-string-unibyte for the current case
>>  because, in emacs-unicode, (concat '(?a 192)) returns a
>>  multibyte string whose second element is A-grave, not an
>>  eight-bit-char.  Am I missing something?

> Well, obviously we need to make it accept this case (i.e. accept both the
> latin-1 192 and the eight-bit-char 192).

Now I see your intention.  But isn't the semantics of
such a function very weird?

>>>  To do what your string-make-unibyte does you should use
>>>  `encode-coding-string' where the coding system is passed explicitly.

>>  Those are conceptually different things (I remember the
>>  similar discussion we had a while ago).

>>  encode-coding-string does:
>>  char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>>    --CES-->  encoded-byte-sequence

>>  string-make-unibyte does:
>>  char-sequence --CCS--> code-point-sequence
>>    --concat-->  code-point-sequence

>>  These two yield the same result only when the CCS supports all
>>  chars in "char-sequence" and the CES is stateless
>>  (e.g. iso-latin-1).

> You lost me here (I'm a poor soul who doesn't know much outside of the
> latin-1 world).

CCS: Coded Character Set
CES: Character Encoding Scheme
coding-system of Emacs: a set of CCSs plus a CES, e.g.
   iso-latin-1:  CCSs are ascii and latin-iso8859-1,
                 CES is the 8-bit version of ISO-2022
   iso-2022-jp:  CCSs are ascii, japanese-jisx0208, ...
                 CES is the 7-bit version of ISO-2022
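To show why a stateful CES matters, here is a minimal C sketch of a 7-bit ISO-2022 encoder limited to two CCSs. It assumes the iso-2022-jp designation sequences ESC $ B (JIS X 0208) and ESC ( B (ASCII); the type and function names are hypothetical. The bytes emitted for a character depend on what was designated before it, so they cannot be produced by mapping each character to its code point independently:

```c
#include <assert.h>
#include <stddef.h>

/* The two CCSs of this sketch; real iso-2022-jp supports more.  */
enum ccs { CCS_ASCII, CCS_JISX0208 };

struct coded_char {
    enum ccs ccs;
    unsigned int code;  /* 7-bit code for ASCII, 2-byte code point for JIS X 0208 */
};

/* Emit a designation escape sequence whenever the CCS changes, and
   designate ASCII again at the end.  OUT must be large enough (5 bytes
   per input char plus 3 suffices).  Returns the number of bytes.  */
static size_t encode_iso2022jp(const struct coded_char *in, size_t n,
                               unsigned char *out)
{
    size_t len = 0;
    enum ccs state = CCS_ASCII;
    for (size_t i = 0; i < n; i++) {
        assert(in[i].ccs == CCS_JISX0208 || in[i].code < 0x80);
        if (in[i].ccs != state) {
            out[len++] = 0x1B;                          /* ESC */
            out[len++] = in[i].ccs == CCS_JISX0208 ? '$' : '(';
            out[len++] = 'B';                           /* ESC $ B or ESC ( B */
            state = in[i].ccs;
        }
        if (state == CCS_JISX0208) {
            out[len++] = (unsigned char) (in[i].code >> 8);
            out[len++] = (unsigned char) (in[i].code & 0xFF);
        } else {
            out[len++] = (unsigned char) in[i].code;
        }
    }
    if (state != CCS_ASCII) {
        out[len++] = 0x1B;
        out[len++] = '(';
        out[len++] = 'B';
    }
    return len;
}
```

For example, the sequence ASCII `a`, JIS X 0208 code point 0x2422, ASCII `b` encodes to 10 bytes, 6 of which are escape sequences or depend on the current designation.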

> I thought that string-make-unibyte only behaves meaningfully for
> "normal 8bit coding-systems" such as latin-1.

Yes, but that doesn't mean it is conceptually the same as
encode-coding-string.  The result of string-make-unibyte
should still be regarded as a sequence of characters, but the
result of encode-coding-string is a sequence of bytes.
This is where a unibyte string is ambiguous.

The number 192 can be regarded as:
(1) just a number, a byte
(2) a code point of some character set.
(3) a character code

A unibyte string can contain (1) and (2) without
distinguishing them, but a multibyte string can contain (1)
and (3) while distinguishing them.

---
Ken'ichi HANDA
handa@m17n.org

* Re: eight-bit char handling in emacs-unicode
  2003-11-21  0:41                         ` Kenichi Handa
@ 2003-11-21  5:27                           ` Stefan Monnier
  2003-11-21  6:27                             ` Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-11-21  5:27 UTC (permalink / raw)
  Cc: jas, emacs-devel

>> I thought that string-make-unibyte only behaves meaningfully for
>> "normal 8bit coding-systems" such as latin-1.

> Yes, but that doesn't mean it is conceptually the same as
> encode-coding-string.  The result of string-make-unibyte
> should still be regarded as a sequence of characters, but the
> result of encode-coding-string is a sequence of bytes.

Why/when is the distinction meaningful (given the fact that it
can only be used meaningfully with 8bit coding-systems where the
distinction seems more philosophical than anything else)?

> This is where a unibyte string is ambiguous.

> The number 192 can be regarded as:
> (1) just a number, a byte
> (2) a code point of some character set.
> (3) a character code

But the second case is only possible for 8bit character sets, right?

Until now, I always thought that Emacs only dealt with
- byte streams representing encoded sequences of code points: case 1.
- sequences of internal character codes (internally encoded in emacs-mule
  or unicode depending on the branch you use): case 3.
Is there any place where we really deal with sequences of code points of
external charsets (other than in the degenerate case where such a sequence
is indistinguishable from case 1, maybe)?

> A unibyte string can contain (1) and (2) without
> distinguishing them, but a multibyte string can contain (1)
> and (3) while distinguishing them.

Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
character `a' ?


        Stefan

* Re: eight-bit char handling in emacs-unicode
  2003-11-21  5:27                           ` Stefan Monnier
@ 2003-11-21  6:27                             ` Kenichi Handa
  2003-11-21 14:59                               ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-21  6:27 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvzneqwbo3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  Yes, but that doesn't mean it is conceptually the same as
>>  encode-coding-string.  The result of string-make-unibyte
>>  should still be regarded as a sequence of characters, but the
>>  result of encode-coding-string is a sequence of bytes.

> Why/when is the distinction meaningful (given the fact that it
> can only be used meaningfully with 8bit coding-systems where the
> distinction seems more philosophical than anything else)?

It is perfectly possible to live in an environment where
the only charset used is iso-8859-1 but the only coding
system used is utf-8.  In this environment, the results of
encode-coding-string and string-make-unibyte are of course
not the same, but both operations are still meaningful.
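A small C sketch of the difference in such an environment: converting "aÀ" to Latin-1 code points (what string-make-unibyte would do in a Latin-1 language environment) versus encoding it as UTF-8 (what encode-coding-string with utf-8 does). The function names are hypothetical, and the encoder only handles code points below U+0800:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* string-make-unibyte-like step in a Latin-1 environment: each char
   becomes its ISO-8859-1 code point.  Returns -1 for chars above U+00FF. */
static int to_latin1_code_points(const uint32_t *chars, size_t n,
                                 unsigned char *out)
{
    assert(n == 0 || (chars != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (chars[i] > 0xFF)
            return -1;
        out[i] = (unsigned char) chars[i];
    }
    return 0;
}

/* encode-coding-string-like step with utf-8, limited to the 1- and
   2-byte forms (every Latin-1 char fits).  OUT needs 2 * N bytes.
   Returns the number of bytes written, or -1 for a code point >= U+0800. */
static long encode_utf8_short(const uint32_t *chars, size_t n,
                              unsigned char *out)
{
    long len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = chars[i];
        if (c < 0x80) {
            out[len++] = (unsigned char) c;
        } else if (c < 0x800) {
            out[len++] = (unsigned char) (0xC0 | (c >> 6));
            out[len++] = (unsigned char) (0x80 | (c & 0x3F));
        } else {
            return -1;
        }
    }
    return len;
}
```

The first step turns `À` (U+00C0) into the single byte 0xC0 ("a\300"); the second produces 0xC3 0x80 ("a\303\200").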

>>  This is where a unibyte string is ambiguous.

>>  The number 192 can be regarded as:
>>  (1) just a number, a byte
>>  (2) a code point of some character set.
>>  (3) a character code

> But the second case is only possible for 8bit character sets, right?

Yes.  But, as I wrote above, it doesn't mean that we are
restricted to simple 8bit-oriented coding-systems.

> Until now, I always thought that Emacs only dealt with
> - byte streams representing encoded sequences of code points: case 1.
> - sequences of internal character codes (internally encoded in emacs-mule
>   or unicode depending on the branch you use): case 3.
> Is there any place where we deal with sequences of code points of external
> charsets really (other than in the degenerate case where such a sequence
> is indistinguishable from case 1, maybe).

I'd like to repeat that although we don't have such an
environment now, it doesn't mean it is impossible to assume
such environment.

>>  A unibyte string can contain (1) and (2) without
>>  distinguishing them, but a multibyte string can contain (1)
>>  and (3) while distinguishing them.

> Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
> character `a' ?

Good point.  Of course not.  I dared not mention that to make
the discussion simpler.
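The point can be made concrete with the raw-byte representation that later Emacs releases (23 and up) use; whether the 2003 emacs-unicode branch used exactly these codes is an assumption here. In those releases a raw byte B in 128..255 is stored in a multibyte string as the character B + #x3FFF00, so the byte 192 and the character U+00C0 stay distinct, while the byte 97 and `a' are the same character. This Python model represents a multibyte string as a list of integer character codes, because these raw-byte codes lie beyond what Python's `str` can hold:

```python
RAW_BYTE_OFFSET = 0x3FFF00  # raw byte B (0x80..0xFF) <-> char B + #x3FFF00 in Emacs 23+


def byte_to_char(b: int) -> int:
    """Character code a multibyte string uses to hold the raw byte B."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte")
    if b < 0x80:
        return b                  # ASCII: the byte and the character coincide
    return b + RAW_BYTE_OFFSET    # eight-bit char, #x3FFF80..#x3FFFFF


raw_192 = byte_to_char(192)       # case (1): the byte 192
char_192 = ord("\u00c0")          # case (3): the character A-grave, U+00C0
raw_97 = byte_to_char(97)         # case (1): the byte 97
char_97 = ord("a")                # case (3): the character `a'

print(raw_192 != char_192)        # True: distinguishable
print(raw_97 == char_97)          # True: not distinguishable
```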

---
Ken'ichi HANDA
handa@m17n.org


* Re: eight-bit char handling in emacs-unicode
  2003-11-21  6:27                             ` Kenichi Handa
@ 2003-11-21 14:59                               ` Stefan Monnier
  2003-11-22  1:25                                 ` Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-11-21 14:59 UTC (permalink / raw)
  Cc: jas, emacs-devel

>> Why/when is the distinction meaningful (given the fact that it
>> can only be used meaningfully with 8bit coding-systems where the
>> distinction seems more philosophical than anything else) ?

> It is perfectly possible to live in such an environment
> where only the charset iso-8859-1 is used but only the
> coding system utf-8 is used.  In this environment, the
> results of encode-coding-string and string-make-unibyte are
> of course not the same, but still both operations are
> meaningful.

I see that encode-coding-string does the utf-8 encoding, but what
does string-make-unibyte do in such a case and what is it used for ?

>> Until now, I always thought that Emacs only dealt with
>> - byte streams representing encoded sequences of code points: case 1.
>> - sequences of internal character codes (internally encoded in emacs-mule
>> or unicode depending on the branch you use): case 3.
>> Is there any place where we deal with sequences of code points of external
>> charsets really (other than in the degenerate case where such a sequence
>> is indistinguishable from case 1, maybe).

> I'd like to repeat that although we don't have such an
> environment now, it doesn't mean it is impossible to assume
> such environment.

I guess I don't understand how that is possible (and useful) and what that
would look like.


        Stefan


* Re: eight-bit char handling in emacs-unicode
  2003-11-21 14:59                               ` Stefan Monnier
@ 2003-11-22  1:25                                 ` Kenichi Handa
  2003-11-22 23:53                                   ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-22  1:25 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvvfpdsrab.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  It is perfectly possible to live in such an environment
>>  where only the charset iso-8859-1 is used but only the
>>  coding system utf-8 is used.  In this environment, the
>>  results of encode-coding-string and string-make-unibyte are
>>  of course not the same, but still both operations are
>>  meaningful.

> I see that encode-coding-string does the utf-8 encoding, but what
> does string-make-unibyte do in such a case and what is it used for ?

It gets iso-8859-1 code-points of all characters in a
multibyte string and concatenates them (the same as what it
does in latin-1 lang. env.).

In his environment, he has no problem in using unibyte
buffer because it can represent all characters he wants.

>>>  Until now, I always thought that Emacs only dealt with
>>>  - byte streams representing encoded sequences of code points: case 1.
>>>  - sequences of internal character codes (internally encoded in emacs-mule
>>>  or unicode depending on the branch you use): case 3.
>>>  Is there any place where we deal with sequences of code points of external
>>>  charsets really (other than in the degenerate case where such a sequence
>>>  is indistinguishable from case 1, maybe).

>>  I'd like to repeat that although we don't have such an
>>  environment now,

Ah, no, we have UTF-8 lang. env. now.

>> it doesn't mean it is impossible to assume such
>> environment.

> I guess I don't understand how that is possible (and useful) and what that
> would look like.

Please try C-x C-m L utf-8 RET and see how
string-make-unibyte and string-make-multibyte work.

---
Ken'ichi HANDA
handa@m17n.org


* Re: eight-bit char handling in emacs-unicode
  2003-11-22  1:25                                 ` Kenichi Handa
@ 2003-11-22 23:53                                   ` Stefan Monnier
  2003-11-23  7:30                                     ` Kenichi Handa
       [not found]                                     ` <jwv7k1gtswz.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
  0 siblings, 2 replies; 58+ messages in thread
From: Stefan Monnier @ 2003-11-22 23:53 UTC (permalink / raw)
  Cc: jas, emacs-devel

>>> It is perfectly possible to live in such an environment
>>> where only the charset iso-8859-1 is used but only the
>>> coding system utf-8 is used.  In this environment, the
>>> results of encode-coding-string and string-make-unibyte are
>>> of course not the same, but still both operations are
>>> meaningful.

>> I see that encode-coding-string does the utf-8 encoding, but what
>> does string-make-unibyte do in such a case and what is it used for ?

> It gets iso-8859-1 code-points of all characters in a
> multibyte string and concatenates them (the same as what it
> does in latin-1 lang. env.).

You mean it does the same as (encode-coding-string str 'latin-1) ?
Then why use string-make-unibyte ?

> Please try C-x C-m L utf-8 RET and see how
> string-make-unibyte and string-make-multibyte work.

I'll try that, but I'd like to understand the motivation for making it work
the way it works.  I've always understood those two as "trying to DTRT" in
a very ad-hoc way such that people that used to work in an 8bit non-ASCII
environment don't need to worry about coding-systems and still have
things working mostly correctly.


        Stefan


* Re: eight-bit char handling in emacs-unicode
  2003-11-22 23:53                                   ` Stefan Monnier
@ 2003-11-23  7:30                                     ` Kenichi Handa
  2003-11-23 23:48                                       ` Stefan Monnier
       [not found]                                     ` <jwv7k1gtswz.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
  1 sibling, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-23  7:30 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvoev4ufqd.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>>>  It is perfectly possible to live in such an environment
>>>>  where only the charset iso-8859-1 is used but only the
>>>>  coding system utf-8 is used.  In this environment, the
>>>>  results of encode-coding-string and string-make-unibyte are
>>>>  of course not the same, but still both operations are
>>>>  meaningful.

>>>  I see that encode-coding-string does the utf-8 encoding, but what
>>>  does string-make-unibyte do in such a case and what is it used for ?

>>  It gets iso-8859-1 code-points of all characters in a
>>  multibyte string and concatenates them (the same as what it
>>  does in latin-1 lang. env.).

> You mean it does the same as (encode-coding-string str 'latin-1) ?

Not exactly the same when STR contains, for instance,
Cyrillic characters.  How to deal with unsupported
characters differs in operations.  Encode-coding-string may
behave leniently so that the result can be decoded back
correctly (perhaps by adding some escape sequence).  But,
string-make-unibyte should never change the number of
characters.  And,

> Then why use string-make-unibyte ?

There's no way to know that we should use the coding-system
latin-1 in this situation.  All we know is that the default
coding-system is utf-8, and the default character set is
iso-8859-1.
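The difference on unsupported characters can be modeled in Python under two assumptions: `lenient_encode` uses Python's `backslashreplace` error handler as a stand-in for a coding system that escapes what it cannot encode, and `make_unibyte_low8` follows the "take just the low 8 bits" fallback that the `string-make-unibyte` docstring describes. The first may change the length; the second never does, but the byte it produces for a Cyrillic letter no longer means anything:

```python
def lenient_encode(s: str) -> bytes:
    """Stand-in for an encoder that escapes characters it cannot represent,
    so the result can be decoded back; the byte count may grow."""
    return s.encode("latin-1", errors="backslashreplace")


def make_unibyte_low8(s: str) -> bytes:
    """Model of the low-8-bits fallback: exactly one byte per character."""
    return bytes(ord(c) & 0xFF for c in s)


text = "a\u0416\u00e9"                # 'a', Cyrillic Zhe (U+0416), e-acute
escaped = lenient_encode(text)       # b'a\\u0416\xe9' (8 bytes)
truncated = make_unibyte_low8(text)  # b'a\x16\xe9'   (3 bytes)

print(len(text), len(escaped), len(truncated))  # 3 8 3
```

Zhe became 0x16, a control byte: the character count is preserved, but its meaning is lost.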

>>  Please try C-x C-m L utf-8 RET and see how
>>  string-make-unibyte and string-make-multibyte work.

> I'll try that, but I'd like to understand the motivation for making it work
> the way it works.  I've always understood those two as "trying to DTRT" in
> a very ad-hoc way such that people that used to work in an 8bit non-ASCII
> environment don't need to worry about coding-systems and still have
> things working mostly correctly.

Doing unibyte<->multibyte conversion automatically
may be an ad-hoc way.  The way how they work for unsupported
characters may also be an ad-hoc way.

But, the concept of unibyte<->multibyte conversion itself is
not ad-hoc.  Don't you think their meaning is very clear
when you grasp them as my way?  Do you see any inconsistency
in my explanation about them?

---
Ken'ichi HANDA
handa@m17n.org


* Re: eight-bit char handling in emacs-unicode
  2003-11-23  7:30                                     ` Kenichi Handa
@ 2003-11-23 23:48                                       ` Stefan Monnier
  2003-11-25  1:07                                         ` Kenichi Handa
  2003-11-25  4:28                                         ` Richard Stallman
  0 siblings, 2 replies; 58+ messages in thread
From: Stefan Monnier @ 2003-11-23 23:48 UTC (permalink / raw)
  Cc: jas, emacs-devel

> But, the concept of unibyte<->multibyte conversion itself is
> not ad-hoc.  Don't you think their meaning is very clear
> when you grasp them as my way?  Do you see any inconsistency
> in my explanation about them?

No, as a matter of fact I don't see why in a utf-8 environment,
it makes any sense to have a function that turns a multibyte string
into a unibyte string encoded in latin-1 (without even complaining when
it encounters other characters).

It'd make sense if the environment said "latin-1 when you can,
utf-8 otherwise" or something like that, but then we would use
encode-coding-string anyway.

Besides, if any non-latin-1 char is encountered by string-make-unibyte, then
we end up with a unibyte string that has an unknown meaning because some
chars might have been encoded in latin-1, and others in some other encoding.

I just don't know of a concrete case where it makes sense to use
string-make-unibyte.

        Stefan


* Re: eight-bit char handling in emacs-unicode
  2003-11-23 23:48                                       ` Stefan Monnier
@ 2003-11-25  1:07                                         ` Kenichi Handa
       [not found]                                           ` <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
  2003-11-25  4:28                                         ` Richard Stallman
  1 sibling, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-25  1:07 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvr7zybqvr.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  But, the concept of unibyte<->multibyte conversion itself is
>>  not ad-hoc.  Don't you think their meaning is very clear
>>  when you grasp them as my way?  Do you see any inconsistency
>>  in my explanation about them?

> No, as a matter of fact I don't see why in a utf-8 environment,
> it makes any sense to have a function that turns a multibyte string
> into a unibyte string encoded in latin-1

It seems that you keep saying that "A does B, thus it's
nonsense".  But, I'm arguing that "A does C".

It doesn't make sense because you treat the result as "a
unibyte string encoded in Latin-1".

It makes sense if you treat the result as "a unibyte string
in which each byte represents one Unicode
code-point", doesn't it?

> (without even complaining when it encounters other
> characters).

I think it's ok (or better) that string-make-unibyte
complains in such a case.   

> It'd make sense if the environment said "latin-1 when you can,
> utf-8 otherwise" or something like that, but then we would use
> encode-coding-string anyway.

It's itself nonsense to have such a coding system.  Do you
agree with having string-make-unibyte if it signals an error
on non-Latin-1 characters?

> Besides, if any non-latin-1 char is encountered by string-make-unibyte, then
> we end up with a unibyte string that has an unknown meaning because some
> chars might have been encoded in latin-1, and others in some other encoding.

> I just don't know of a concrete case where it makes sense to use
> string-make-unibyte.

I'll paraphrase my previous example as this:

  It is perfectly possible to live in such an environment
  where only the characters U+0000..U+00FF of Unicode are
  used but only the coding system utf-8 is used.

But, I don't claim that the above is a realistic case.

Another non-realistic but concrete case is:

  Use only the charset iso-8859-5 and the encoding CTEXT.

---
Ken'ichi HANDA
handa@m17n.org


[parent not found: <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>]

* Re: eight-bit char handling in emacs-unicode
       [not found]                                           ` <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
@ 2003-11-26  0:07                                             ` Kenichi Handa
  2003-11-26 14:14                                               ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-26  0:07 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>  It seems that you keep saying that "A does B, thus it's
>>  nonsense".  But, I'm arguing that "A does C".

> Well, the thing is: I still don't understand what is C.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",

Yes, that's what I said.

> but since the output is a unibyte string,
> that restrict it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic

Yes.  But I thought whether it is generic is not the point here.

> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).

My examples show that we can't use encode-coding-string.
How can we use encode-coding-string without knowing what
coding system to use?  I haven't heard your answer yet.

>>  It doesn't make sense because you treat the result as "a
>>  unibyte string encoded in Latin-1".

>>  It makes sense if you treat the result as "a unibyte string
>>  in which each byte represents one Unicode
>>  code-point", doesn't it?

> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it ?

Yes.  And that covers all characters the user uses in this
case.

>>>  It'd make sense if the environment said "latin-1 when you can,
>>>  utf-8 otherwise" or something like that, but then we would use
>>>  encode-coding-string anyway.

>>  It's itself nonsense to have such a coding system.

> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).

Ah, I see.  But, my understanding is that
string-make-unibyte/multibyte are designed not to change the
number of characters to make the difference of
unibyte/multibyte transparent in Lisp.  That restriction
leads to a case that non-supported characters are handled
incorrectly.  But, I think Richard's design policy was that
incorrect handling of non-supported characters is better
than a possibly more disastrous error caused by the change
of number of characters.

>>  Do you agree with having string-make-unibyte if it signals an error on
>>  non-Latin-1 characters?

> Of course: that's pretty much what I suggested: make-string-unibyte only
> accepts multibyte chars that correspond to "bytes".

I agree with that.  But, it just changes the behaviour of
the function on error case.  It doesn't change the concept
of what it does.

>>>  I just don't know of a concrete case where it makes sense to use
>>>  string-make-unibyte.

>>  I'll paraphrase my previous example as this:

>>    It is perfectly possible to live in such an environment
>>    where only the characters U+0000..U+00FF of Unicode are
>>    used but only the coding system utf-8 is used.

>>  But, I don't claim that the above is a realistic case.

>>  Another non-realistic but concrete case is:

>>    Use only the charset iso-8859-5 and the encoding CTEXT.

> I don't see any use of string-make-unibyte in your two examples.

Again, I'd like to ask how to use encode-coding-string
without knowing the proper coding-system in each case.

> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.

In the second case, of course "supported characters" are
those included in the charset iso-8859-5, and
string-make-unibyte should accept them.  Again, the result
is the same as encoding by the coding system iso-8859-5, but
we only know about the coding system CTEXT here.

---
Ken'ichi HANDA
handa@m17n.org


* Re: eight-bit char handling in emacs-unicode
  2003-11-26  0:07                                             ` Kenichi Handa
@ 2003-11-26 14:14                                               ` Stefan Monnier
  2003-11-27  1:34                                                 ` Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-11-26 14:14 UTC (permalink / raw)
  Cc: jas, emacs-devel

>> but since the output is a unibyte string,
>> that restricts it to cases where the code-points can be encoded in 8 bits,
>> thus it doesn't sound very generic
> Yes.  But I thought whether it is generic is not the point here.

Except that if it's not generic (in the sense that it does not behave
meaningfully in all language environments), then it can't be used in generic
elisp code, right?

>> and I don't see any application for it
>> (nor do I see any practical difference with using encode-coding-string
>> since the output AFAIK would be the same).

> My examples show that we can't use encode-coding-string.
> How can we use encode-coding-string without knowing what
> coding system to use?  I haven't heard your answer yet.

I can't answer this question without knowing the answer to my question:
what is string-make-unibyte used for.  I'm not saying that we can do
something like:

  (defun string-make-unibyte (s) (encode-coding-string s <blabla>))

but I'm saying that everywhere where the current string-make-unibyte is
used, we should be able to easily replace it by a call to
encode-coding-string or a call to my make-string-unibyte (which does
not pay attention to the language environment and only accepts multibyte
chars that correspond to bytes, i.e. eight-bit-control or
eight-bit-graphic, or ASCII, and multibyte chars whose internal code point
is 128-255).
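The proposed `make-string-unibyte` (a name used in this thread, not an established API) can be sketched as a Python model. It represents a multibyte string as a list of character codes and assumes the raw-byte encoding that later Emacs releases use (byte B in 128..255 stored as B + #x3FFF00). It accepts codes 0..255 and raw-byte characters, ignores the language environment, and raises an error on anything else:

```python
RAW_BYTE_OFFSET = 0x3FFF00  # assumed Emacs 23+ code for raw byte B: B + #x3FFF00


def make_string_unibyte(chars: list[int]) -> bytes:
    """Hypothetical strict conversion: every character must correspond to a
    byte, independent of the language environment."""
    out = bytearray()
    for c in chars:
        if 0 <= c <= 0xFF:
            # ASCII, or a character whose internal code point is 128..255
            out.append(c)
        elif RAW_BYTE_OFFSET + 0x80 <= c <= RAW_BYTE_OFFSET + 0xFF:
            # eight-bit character standing for a raw byte
            out.append(c - RAW_BYTE_OFFSET)
        else:
            raise ValueError(f"character #x{c:X} does not correspond to a byte")
    return bytes(out)


print(make_string_unibyte([0x61, 0xE9, 0x3FFFC0]))  # b'a\xe9\xc0'
```

Calling it on `[0x416]` (Cyrillic Zhe) raises `ValueError` instead of silently producing some byte.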

> But, my understanding is that
> string-make-unibyte/multibyte are designed not to change the
> number of characters to make the difference of
> unibyte/multibyte transparent in Lisp.

That is indeed an absolute requirement.

>> Of course: that's pretty much what I suggested: make-string-unibyte only
>> accepts multibyte chars that correspond to "bytes".

> I agree with that.  But, it just changes the behaviour of
> the function on error case.  It doesn't change the concept
> of what it does.

Except that I said "byte" not "code point", which makes a difference
in non-latin-1 locales.

>> I don't see any use of string-make-unibyte in your two examples.
> Again, I'd like to ask how to use encode-coding-string
> without knowing the proper coding-system in each case.

How could I know the coding-system to use when replacing
`string-make-unibyte' if I don't have any actual call to
string-make-unibyte to work with ?

        Stefan


* Re: eight-bit char handling in emacs-unicode
  2003-11-26 14:14                                               ` Stefan Monnier
@ 2003-11-27  1:34                                                 ` Kenichi Handa
  2003-11-27 14:23                                                   ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-27  1:34 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvhe0rp6ml.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>>  but since the output is a unibyte string,
>>>  that restricts it to cases where the code-points can be encoded in 8 bits,
>>>  thus it doesn't sound very generic
>>  Yes.  But I thought whether it is generic is not the point here.

> Except that if it's not generic (in the sense that it does not behave
> meaningfully in all language environments), then it can't be used in generic
> elisp code, right?

Yes.  But, it simply means that insertion of multibyte
string in a unibyte buffer can't be generic.

>>  My examples show that we can't use encode-coding-string.
>>  How can we use encode-coding-string without knowing what
>>  coding system to use?  I haven't heard your answer yet.

> I can't answer this question without knowing the answer to my question:
> what is string-make-unibyte used for.

It is used for converting a multibyte string to unibyte
before it is inserted in a unibyte buffer.

> I'm not saying that we can do something like:

>   (defun string-make-unibyte (s) (encode-coding-string s <blabla>))

??? I thought you were saying that, because you
wrote this:

> To do what your string-make-unibyte does you should use
> `encode-coding-string' where the coding system is passed explicitly.

Anyway,

> but I'm saying that everywhere where the current string-make-unibyte is
> used, we should be able to easily replace it by a call to
> encode-coding-string or a call to my make-string-unibyte (which does
> not pay attention to the language environment and only accepts multibyte
> chars that correspond to bytes, i.e. eight-bit-control or
> eight-bit-graphic, or ASCII, and multibyte chars whose internal code point
> is 128-255).

It's an ambiguous statement.  Which are you saying?

Replace string-make-unibyte by:
(1) encode-coding-string or make-string-unibyte.

(2) a code that applies encode-coding-string or
make-string-unibyte to the whole string depending on
something (perhaps on the input string?).

(3) a code that applies encode-coding-string to substrings
where that is appropriate, and applies make-string-unibyte
to the remaining substrings.

(4) something that I still don't understand.

>>>  I don't see any use of string-make-unibyte in your two examples.
>>  Again, I'd like to ask how to use encode-coding-string
>>  without knowing the proper coding-system in each case.

> How could I know the coding-system to use when replacing
> `string-make-unibyte' if I don't have any actual call to
> string-make-unibyte to work with ?

What strange logic!?  You have been arguing that we should
replace string-make-unibyte with something that uses
encode-coding-string.  Then you should have an idea about
what coding-system to use for encode-coding-string.

---
Ken'ichi HANDA
handa@m17n.org


* Re: eight-bit char handling in emacs-unicode
  2003-11-27  1:34                                                 ` Kenichi Handa
@ 2003-11-27 14:23                                                   ` Stefan Monnier
  2003-12-01  0:43                                                     ` Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-11-27 14:23 UTC (permalink / raw)
  Cc: jas, emacs-devel

>> I can't answer this question without knowing the answer to my question:
>> what is string-make-unibyte used for.

> It is used for converting a multibyte string to unibyte
> before it is inserted in a unibyte buffer.

I meant `what is "converting from multibyte to unibyte" used for'.
I.e. it can be used for different things in different contexts and I can't
answer in general, so I need a concrete case.

> It's an ambiguous statement.  Which are you saying?

> Replace string-make-unibyte by:
> (1) encode-coding-string or make-string-unibyte.

> (2) a code that applies encode-coding-string or
> make-string-unibyte to the whole string depending on
> something (perhaps on the input string?).

> (3) a code that applies encode-coding-string to substrings
> where that is appropriate, and applies make-string-unibyte
> to the remaining substrings.

> (4) something that I still don't understand.

I'm saying that each *call* to string-make-unibyte can be replaced
by a call to either encode-coding-string or make-string-unibyte.

But the decision of which to use and which coding-system to use
depends on the context.

Now why would we want to do the work of changing all those calls?
Because all those that would use encode-coding-string are incorrect
in using string-make-unibyte because they won't do the right thing
in some language environments.

        Stefan


* Re: eight-bit char handling in emacs-unicode
  2003-11-27 14:23                                                   ` Stefan Monnier
@ 2003-12-01  0:43                                                     ` Kenichi Handa
  2003-12-01 16:15                                                       ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-12-01  0:43 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvad6hlwu1.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>>  I can't answer this question without knowing the answer to my question:
>>>  what is string-make-unibyte used for.

>>  It is used for converting a multibyte string to unibyte
>>  before it is inserted in a unibyte buffer.

> I meant `what is "converting from multibyte to unibyte" used for'.
> I.e. it can be used for different things in different contexts and I can't
> answer in general, so I need a concrete case.

It is used for not losing information about text even if
you kill a text in a multibyte buffer and paste it in a
unibyte buffer.  When you kill the just pasted text of a
unibyte buffer and paste it in the original multibyte
buffer, you recover the same character sequence.
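The kill/yank round trip can be modeled in Python for a Latin-2 user. Assumptions: the primary charset is ISO-8859-2 (Python's `iso8859_2` codec supplies the code points), `to_unibyte` and `to_multibyte` are hypothetical stand-ins for the implicit conversions done when yanking into a unibyte buffer and back, and characters outside the charset fall back to their low 8 bits:

```python
CHARSET = "iso8859_2"  # assumed primary charset of the language environment


def to_unibyte(s: str) -> bytes:
    """Hypothetical yank into a unibyte buffer: one byte per character."""
    out = bytearray()
    for c in s:
        try:
            out += c.encode(CHARSET)   # the character's Latin-2 code point
        except UnicodeEncodeError:
            out.append(ord(c) & 0xFF)  # unsupported: keep the low 8 bits
    return bytes(out)


def to_multibyte(b: bytes) -> str:
    """Hypothetical yank back into a multibyte buffer."""
    return b.decode(CHARSET)


slovak = "žltý kôň"    # every character is in ISO-8859-2
japanese = "日本"       # not in ISO-8859-2

print(to_multibyte(to_unibyte(slovak)) == slovak)      # True
print(to_multibyte(to_unibyte(japanese)) == japanese)  # False
```

The Slovak text survives the round trip; the Japanese text comes back as two unrelated Latin-2 characters, which is the silent loss Stefan objects to.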

Anyway, I already showed you this example:

  In Latin-2 environment but the default encoding is CTEXT.

In that case also, inserting multibyte latin-2 string in
unibyte buffer works the same way as in this case:

  In Latin-2 environment and the default encoding is iso-latin-2.

And, that's because the functionality of string-make-unibyte
doesn't have to know about coding system.  All it has to
know is which character set to use.

If you can't answer in general, please answer to this
concrete question.

  In Latin-2 environment where one's primary character set
  is latin-iso8859-2 but the default encoding is CTEXT, how
  to make insertion of a multibyte string (containing only
  latin-iso8859-2 characters) in a unibyte buffer work with
  your method?  Such an insertion may happen when a user
  kills a text in a multibyte buffer and yanks it into a unibyte
  buffer.

>>  It's an ambiguous statement.  Which are you saying?

>>  Replace string-make-unibyte by:
>>  (1) encode-coding-string or make-string-unibyte.

>>  (2) a code that applies encode-coding-string or
>>  make-string-unibyte to the whole string depending on
>>  something (perhaps on the input string?).

>>  (3) a code that applies encode-coding-string to substrings
>>  where that is appropriate, and applies make-string-unibyte
>>  to the remaining substrings.

>>  (4) something that I still don't understand.

> I'm saying that each *call* to string-make-unibyte can be replaced
> by a call to either encode-coding-string or make-string-unibyte.

> But the decision of which to use and which coding-system to use
> depends on the context.

Are you talking about the actual Emacs Lisp codes that
explicitly call make-string-unibyte?  I've been talking
about the functionality of make-string-unibyte itself,
especially about the implicit call to the C function
copy_text that does the same thing as make-string-unibyte.
Is that the reason why it seems that we are talking at cross
purposes?

> Now why would we want to do the work of changing all those calls?
> Because all those that would use encode-coding-string are incorrect
> in using string-make-unibyte because they won't do the right thing
> in some language environments.

What is the right thing to do when a multibyte Japanese text
is being pasted into a unibyte buffer?

I think signalling an error is the only right thing, and
I've never objected to making copy_text and
Fstring_make_unibyte signal an error in such a case.

---
Ken'ichi HANDA
handa@m17n.org


* Re: eight-bit char handling in emacs-unicode
  2003-12-01  0:43                                                     ` Kenichi Handa
@ 2003-12-01 16:15                                                       ` Stefan Monnier
  2003-12-02 13:07                                                         ` Kenichi Handa
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier @ 2003-12-01 16:15 UTC (permalink / raw)
  Cc: jas, emacs-devel

> It is used for not losing information about text even if
> you kill a text in a multibyte buffer and paste it in a
> unibyte buffer.

That's the kind of concrete case I needed, thank you.
Now I'll have to go back and reread the thread to understand things
better.  Are there other cases like that ?

Also, should we really allow such a thing ?
I mean, it's a dangerous operation since it only works if the user
is lucky enough to use just the right subset of characters.  So we
should at least signal an error if the conversion is unsafe (in
that make-string-multibyte will not recover the original string).
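The safety condition above (refuse the conversion when converting back would not recover the original string) can be sketched as a checked conversion. This is a Python model, not Emacs code: `safe_make_unibyte` is a hypothetical name, the charset is a parameter, and Python codecs stand in for Emacs charsets:

```python
def safe_make_unibyte(s: str, charset: str = "iso8859_2") -> bytes:
    """Convert S to one byte per character using CHARSET, signalling an
    error instead of silently losing information."""
    out = bytearray()
    for c in s:
        try:
            b = c.encode(charset)
        except UnicodeEncodeError:
            raise ValueError(f"{c!r} (U+{ord(c):04X}) is not in {charset}") from None
        if len(b) != 1:  # guards against passing a multi-byte coding system
            raise ValueError(f"{c!r} does not map to a single byte")
        out += b
    result = bytes(out)
    # the converse conversion must give back exactly the original characters
    if result.decode(charset) != s:
        raise ValueError("conversion would not round-trip")
    return result


print(safe_make_unibyte("kôň"))  # b'k\xf4\xf2'
```

With this check, Latin-2 and Japanese text are treated alike whenever they fall outside the charset: `safe_make_unibyte("日本")` raises `ValueError`.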

BTW, in which kind of circumstances is the user presented with both
a multibyte buffer and a unibyte buffer ?

> Are you talking about the actual Emacs Lisp codes that
> explicitly call make-string-unibyte?  I've been talking
> about the functionality of make-string-unibyte itself,
> especially about the implicit call to the C function
> copy_text that does the same thing as make-string-unibyte.
> Is that the reason why it seems that we are talking at cross
> purposes?

I'm talking about both.

> What is the right thing to do when a multibyte Japanese text
> is being pasted into a unibyte buffer?

> I think signalling an error is the only right thing, and
> I've never objected to making copy_text and
> Fstring_make_unibyte signal an error in such a case.

I agree on the signalling, of course, I just want to push it further
and signal even when pasting latin-2 multibyte text into a unibyte buffer.
After all, why should Slovak users be able to do that but Japanese users
not ?  In my view, every time we use this kind of thing, we're taking
a temporary shortcut that is "good enough for 8bit users" but not for the
rest of the world.
AFAIK, unibyte buffers should only be used internally and never presented
to the user.  This is because unibyte buffers contain bytes (in my view)
whereas the user wants to see characters.

        Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: eight-bit char handling in emacs-unicode
  2003-12-01 16:15                                                       ` Stefan Monnier
@ 2003-12-02 13:07                                                         ` Kenichi Handa
  2003-12-02 16:06                                                           ` Stefan Monnier
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-12-02 13:07 UTC (permalink / raw)
  Cc: jas, emacs-devel

In article <jwvd6b8ttfj.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  It is used to avoid losing information about text even if
>>  you kill text in a multibyte buffer and paste it into a
>>  unibyte buffer.

> That's the kind of concrete case I needed, thank you.

I'm very glad that now we can start to argue on the same
wavelength.

> Now I'll have to go back and reread the thread to understand things
> better.

Please.

>  Are there other cases like that ?

For instance, when searching for a multibyte string in a unibyte
buffer.  But if we are searching for a regular expression
that contains a character range (e.g. [a-z]), the current
way of simple multibyte->unibyte conversion doesn't work in
many cases.  I fixed it in the unicode branch.

> Also, should we really allow such a thing ?

I myself tend to agree with dropping such a way of unibyte
support, but that should be decided by Richard.

> I mean, it's a dangerous operation since it only works if the user
> is lucky enough to use just the right subset of
> characters.

But we can expect such luck in many situations where
people mostly use only characters belonging to their
primary charset.

> So we should at least signal an error if the conversion is
> unsafe (in the sense that string-make-multibyte will not
> recover the original string).

Shall we test it with HEAD to check how often such an error
occurs?

> BTW, in which kind of circumstances is the user presented with both
> a multibyte buffer and a unibyte buffer ?

Even if one starts Emacs with --unibyte, Emacs sometimes
makes a multibyte buffer (e.g. C-h h).  And even if one
starts Emacs with --multibyte, he may have a file that
contains, for instance, latin-1 characters and raw-byte
data, and he may want to read such a file with the coding
system raw-text (then C-x = always shows \000..\377).

>>  Are you talking about the actual Emacs Lisp code that
>>  explicitly calls string-make-unibyte?  I've been talking
>>  about the functionality of string-make-unibyte itself,
>>  especially about the implicit call to the C function
>>  copy_text that does the same thing as string-make-unibyte.
>>  Is that the reason why it seems that we are talking at cross
>>  purposes?

> I'm talking about both.

> I agree on the signalling, of course, I just want to push it further
> and signal even when pasting latin-2 multibyte text into a unibyte buffer.
> After all, why should Slovak users be able to do that but Japanese users
> not ?  In my view, every time we use this kind of thing, we're taking
> a temporary shortcut that is "good enough for 8bit users" but not for the
> rest of the world.

The fact that something doesn't work for double-byte charset
users can't be a reason strong enough for dropping it for
single-byte charset users.

> AFAIK, unibyte buffers should only be used internally and never presented
> to the user.  This is because unibyte buffers contain bytes (in my view)
> whereas the user wants to see characters.

I agree that is a very clean view, and I myself expressed
the same thing several times.  But, it seems that Richard
doesn't want to drop the current way of unibyte support.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: eight-bit char handling in emacs-unicode
  2003-12-02 13:07                                                         ` Kenichi Handa
@ 2003-12-02 16:06                                                           ` Stefan Monnier
  0 siblings, 0 replies; 58+ messages in thread
From: Stefan Monnier @ 2003-12-02 16:06 UTC (permalink / raw)
  Cc: jas, emacs-devel

>> So we should at least signal an error if the conversion is
>> unsafe (in the sense that string-make-multibyte will not
>> recover the original string).

> Shall we test it with HEAD to check how often such an error
> occurs?

That would be great.

>> BTW, in which kind of circumstances is the user presented with both
>> a multibyte buffer and a unibyte buffer ?

> Even if one starts Emacs with --unibyte, Emacs sometimes
> makes a multibyte buffer (e.g. C-h h).

I guess in a unibyte session, it makes sense, because in such a case,
unibyte buffers do contain characters and the user explicitly tells us
"don't bother me about multiple charsets, just pretend all fits within
8bits".

> And, even if one starts Emacs with --multibyte, he may have a file that
> contains, for instance, latin-1 characters and raw-byte data, and he may
> want to read such a file with the coding system raw-text (then C-x =
> always shows \000..\377).

Is such a buffer necessarily unibyte ?  Why not multibyte ?
Or is it for performance reasons ?
And what should happen if we paste text containing 8859-5 or BIG5
text in such a buffer ?

> The fact that something doesn't work for double-byte charset
> users can't be a reason strong enough for dropping it for
> single-byte charset users.

Agreed.  But we should encourage people to "do it right" by calling
the appropriate encoding/decoding functions so it works for all cases.
I believe that a good way to encourage people is by discouraging the use of
string-make-unibyte (and other ways to use copy_text similarly).
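
A minimal sketch of "doing it right": name the coding system explicitly with encode-coding-string and decode-coding-string instead of relying on string-make-unibyte. The my-sample-* variable names are hypothetical and serve only as illustration.

```elisp
;; Sample text "Grüße", written with \u escapes so that the snippet
;; does not depend on the encoding of the file it is saved in.
(defvar my-sample-text "Gr\u00fc\u00dfe")

;; Each call returns a unibyte string of bytes in the named encoding.
(defvar my-sample-utf-8 (encode-coding-string my-sample-text 'utf-8))
(defvar my-sample-latin-1 (encode-coding-string my-sample-text 'iso-latin-1))

;; Decoding with the same coding system recovers the characters.
;; UTF-8 also covers text (e.g. Japanese) that Latin-1 cannot represent.
(decode-coding-string my-sample-utf-8 'utf-8)
```

Here the choice of encoding is visible in the code rather than implied by the session, so the same call works for Latin-1 and CJK users alike.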

        Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: eight-bit char handling in emacs-unicode
  2003-11-23 23:48                                       ` Stefan Monnier
  2003-11-25  1:07                                         ` Kenichi Handa
@ 2003-11-25  4:28                                         ` Richard Stallman
  1 sibling, 0 replies; 58+ messages in thread
From: Richard Stallman @ 2003-11-25  4:28 UTC (permalink / raw)
  Cc: jas, emacs-devel, handa

    No, as a matter of fact I don't see why in a utf-8 environment,
    it makes any sense to have a function that turns a multibyte string
    into a unibyte string encoded in latin-1 (without even complaining when
    it encounters other characters).

There are programs that need to do explicit encoding.  This will
always be necessary.

^ permalink raw reply	[flat|nested] 58+ messages in thread

[parent not found: <jwv7k1gtswz.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>]

* Re: eight-bit char handling in emacs-unicode
       [not found]                                     ` <jwv7k1gtswz.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
@ 2003-12-09 21:49                                       ` Richard Stallman
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Stallman @ 2003-12-09 21:49 UTC (permalink / raw)
  Cc: emacs-devel, handa, jas

    So you seem to be thinking about a piece of elisp (or maybe C) that will
    call string-make-unibyte, but I'm wondering which piece of code you're
    thinking of, because this piece of code will work if your keyboard uses
    latin-1 encoding, but not if it uses utf-8 encoding.

That may be good enough for some users.

    Also I'm wondering why this piece of code needs to use string-make-unibyte,
    instead of encode-coding-string (the only good reason I can think of is
    that the coding-system to use is not immediately apparent).

One possible reason to use string-make-unibyte is because you want things
to work "as if they'd been converted by something else".  As long as
Emacs performs this conversion in other situations on its own,
it is useful to make it available as a separate function.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-13  6:10     ` BIG5-HKSCS? Kenichi Handa
  2003-11-13  6:51       ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-15 22:32       ` Simon Josefsson
  2003-11-17  1:12         ` BIG5-HKSCS? Kenichi Handa
  1 sibling, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-15 22:32 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilur80c50uj.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> Kenichi Handa <handa@m17n.org> writes:
>>>  % cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs
>
>> I tried starting Gnus on it, but it failed.  It died with an Elisp
>> backtrace regarding define-key or something like that within bbdb.
>> Since bbdb isn't a critical part, 
>
> As bbdb is not a part of Emacs, I have no idea what is wrong
> with it.

Here is a test case:

  (setq bbdb-mode-map (make-keymap))
  (suppress-keymap bbdb-mode-map)
  (define-key bbdb-mode-map [(?\;)]        'bbdb-record-edit-notes)
  (define-key bbdb-mode-map [(??)]         'bbdb-help)

Thanks.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-15 22:32       ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-17  1:12         ` Kenichi Handa
  2003-11-17  2:06           ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-17  1:12 UTC (permalink / raw)
  Cc: emacs-unicode, emacs-devel

In article <ilud6bte00n.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>>  As bbdb is not a part of Emacs, I have no idea what is wrong
>>  with it.

> Here is a test case:

>   (setq bbdb-mode-map (make-keymap))
>   (suppress-keymap bbdb-mode-map)
>   (define-key bbdb-mode-map [(?\;)]        'bbdb-record-edit-notes)
>   (define-key bbdb-mode-map [(??)]         'bbdb-help)

Thank you.  I found a bug in handling Lucid-style event type
lists.  I've just installed a fix.

By the way, Oliver Scholz <epameinondas@gmx.de> has also
started testing emacs-unicode and our discussion shifted to
emacs-unicode@gnu.org mailing list.  I think it is better to
use that mailing list for bug-reports specific to
emacs-unicode.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-17  1:12         ` BIG5-HKSCS? Kenichi Handa
@ 2003-11-17  2:06           ` Simon Josefsson
  2003-11-17  5:45             ` BIG5-HKSCS? Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-17  2:06 UTC (permalink / raw)
  Cc: emacs-unicode, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilud6bte00n.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>>>  As bbdb is not a part of Emacs, I have no idea what is wrong
>>>  with it.
>
>> Here is a test case:
>
>>   (setq bbdb-mode-map (make-keymap))
>>   (suppress-keymap bbdb-mode-map)
>>   (define-key bbdb-mode-map [(?\;)]        'bbdb-record-edit-notes)
>>   (define-key bbdb-mode-map [(??)]         'bbdb-help)
>
> Thank you.  I found a bug in handling Lucid-style event type
> lists.  I've just installed a fix.

Thanks, I'll try it.

> By the way, Oliver Scholz <epameinondas@gmx.de> has also
> started testing emacs-unicode and our discussion shifted to
> emacs-unicode@gnu.org mailing list.  I think it is better to
> use that mailing list for bug-reports specific to
> emacs-unicode.

Where can I find archives for the list?  I sent a report to the list,
but it didn't appear to be a proper mailing list, so I got a user
unknown bounce from <oldo@coli.uni-sb.de>.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-17  2:06           ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-17  5:45             ` Eli Zaretskii
  2003-11-17  7:43               ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2003-11-17  5:45 UTC (permalink / raw)
  Cc: emacs-unicode, emacs-devel, handa

> From: Simon Josefsson <jas@extundo.com>
> Date: Mon, 17 Nov 2003 03:06:22 +0100
> 
> > By the way, Oliver Scholz <epameinondas@gmx.de> has also
> > started testing emacs-unicode and our discussion shifted to
> > emacs-unicode@gnu.org mailing list.  I think it is better to
> > use that mailing list for bug-reports specific to
> > emacs-unicode.
> 
> Where can I find archives for the list?

If you have a login on fencepost.gnu.org, I can tell you where to
find the archives (mail me privately).  Otherwise, tough.

> I sent a report to the list, but it didn't appear to be a proper
> mailing list, so I got a user unknown bounce from
> <oldo@coli.uni-sb.de>.

What do you mean by ``didn't appear to be a proper mailing list''?
What did you do and what happened, exactly?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-17  5:45             ` BIG5-HKSCS? Eli Zaretskii
@ 2003-11-17  7:43               ` Simon Josefsson
  2003-11-18  7:01                 ` BIG5-HKSCS? Richard Stallman
  0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-17  7:43 UTC (permalink / raw)
  Cc: emacs-unicode, handa, emacs-devel

Eli Zaretskii <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Mon, 17 Nov 2003 03:06:22 +0100
>> 
>> > By the way, Oliver Scholz <epameinondas@gmx.de> has also
>> > started testing emacs-unicode and our discussion shifted to
>> > emacs-unicode@gnu.org mailing list.  I think it is better to
>> > use that mailing list for bug-reports specific to
>> > emacs-unicode.
>> 
>> Where can I find archives for the list?
>
> If you have a login on fencepost.gnu.org, I can tell you where to
> find the archives (mail me privately).  Otherwise, tough.

I found it.

>> I sent a report to the list, but it didn't appear to be a proper
>> mailing list, so I got a user unknown bounce from
>> <oldo@coli.uni-sb.de>.
>
> What do you mean by ``didn't appear to be a proper mailing list''?

I meant that it wasn't run by some mailing list software that set the
sender address to the mailing list software, instead of maintaining
the original sender address (i.e., my address).  The consequence is
that when there is a network (or other) problem for any member of the
list, I get a bounce.  Fortunately, the list doesn't appear to have
many members.  Moving the list to mail.gnu.org would, besides fixing
that problem, allow public archiving of the list, and user
customization of delivery options.

> What did you do and what happened, exactly?

I sent a mail to the list, and got a user unknown bounce from one
member on the list.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-17  7:43               ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-18  7:01                 ` Richard Stallman
  2003-11-18  8:56                   ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Richard Stallman @ 2003-11-18  7:01 UTC (permalink / raw)
  Cc: eliz, emacs-devel, emacs-unicode, handa

    I meant that it wasn't run by some mailing list software that set the
    sender address to the mailing list software, instead of maintaining
    the original sender address (i.e., my address).

That way of running a list is a pain in the neck, because it makes
sending a reply just to the sender of a message rather inconvenient.
So we do not generally set up our lists that way.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-18  7:01                 ` BIG5-HKSCS? Richard Stallman
@ 2003-11-18  8:56                   ` Simon Josefsson
  2003-11-19  5:15                     ` BIG5-HKSCS? Richard Stallman
  0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-18  8:56 UTC (permalink / raw)
  Cc: eliz, emacs-devel, emacs-unicode, handa

Richard Stallman <rms@gnu.org> writes:

>     I meant that it wasn't run by some mailing list software that set the
>     sender address to the mailing list software, instead of maintaining
>     the original sender address (i.e., my address).
>
> That way of running a list is a pain in the neck, because it makes
> sending a reply just to the sender of a message rather inconvenient.
> So we do not generally set up our lists that way.

I think there is some confusion here; the sender address is not the
same as adding a Reply-To header, which I believe you refer to.  I
agree that using mailing list software that adds a Reply-To header
pointing to the mailing list itself is just wrong.  But altering the
sender address to the mailing list software itself avoids a torrent of
bounces to everyone that sends a message to the list.

For example, for each and every message I have sent to
emacs-unicode@gnu.org I have received a bounce message saying that one
of the lists' member is not known.  If that was a large list, with
thousands of subscribers, the risk that, say, 10 of the subscribes
have disappeared from the net is rather high.  Getting tens of bounces
for every message you send to a list is not good.

I think it would be best to make emacs-unicode@gnu.org a proper
mail.gnu.org mailing list.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-18  8:56                   ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-19  5:15                     ` Richard Stallman
  2003-11-20  5:48                       ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Richard Stallman @ 2003-11-19  5:15 UTC (permalink / raw)
  Cc: eliz, emacs-devel, emacs-unicode, handa

    I think there is some confusion here; the sender address is not the
    same as adding a Reply-To header, which I believe you refer to.  I
    agree that using mailing list software that adds a Reply-To header
    pointing to the mailing list itself is just wrong.  But altering the
    sender address to the mailing list software itself avoids a torrent of
    bounces to everyone that sends a message to the list.

You are right, I did misunderstand that point.  Sorry.

    I think it would be best to make emacs-unicode@gnu.org a proper
    mail.gnu.org mailing list.

It is a proper mailing list now, just not managed through mailman.

I have no opinion on whether to use mailman to manage that list,
but I strongly object to the idea that there is something less
valid or less proper about defining a mailing list as an alias.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-19  5:15                     ` BIG5-HKSCS? Richard Stallman
@ 2003-11-20  5:48                       ` Simon Josefsson
  2003-11-20  5:56                         ` BIG5-HKSCS? Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-20  5:48 UTC (permalink / raw)
  Cc: eliz, emacs-devel, emacs-unicode, handa

Richard Stallman <rms@gnu.org> writes:

>     I think it would be best to make emacs-unicode@gnu.org a proper
>     mail.gnu.org mailing list.
>
> It is a proper mailing list now, just not managed through mailman.
>
> I have no opinion on whether to use mailman to manage that list,
> but I strongly object to the idea that there is something less
> valid or less proper about defining a mailing list as an alias.

Right.  I used the term "proper mailing list" in search of a better
term, that would distinguish between mailing lists handled by mailman
and mailing lists handled by sendmail aliases.

(Incidentally, lately I have been receiving bounces for eliz@gnu.org
for every post I send to emacs-unicode@gnu.org...  perhaps fixed by
now though.)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-20  5:48                       ` BIG5-HKSCS? Simon Josefsson
@ 2003-11-20  5:56                         ` Eli Zaretskii
  2003-11-20  6:20                           ` BIG5-HKSCS? Simon Josefsson
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2003-11-20  5:56 UTC (permalink / raw)
  Cc: emacs-unicode, emacs-devel, handa

> From: Simon Josefsson <jas@extundo.com>
> Date: Thu, 20 Nov 2003 06:48:49 +0100
> 
> (Incidentally, lately I have been receiving bounces for eliz@gnu.org
> for every post I send to emacs-unicode@gnu.org...  perhaps fixed by
> now though.)

When was the last bounce, please?  I think I fixed that a few days
ago.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: BIG5-HKSCS?
  2003-11-20  5:56                         ` BIG5-HKSCS? Eli Zaretskii
@ 2003-11-20  6:20                           ` Simon Josefsson
  0 siblings, 0 replies; 58+ messages in thread
From: Simon Josefsson @ 2003-11-20  6:20 UTC (permalink / raw)
  Cc: emacs-unicode, emacs-devel, handa

Eli Zaretskii <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Thu, 20 Nov 2003 06:48:49 +0100
>> 
>> (Incidentally, lately I have been receiving bounces for eliz@gnu.org
>> for every post I send to emacs-unicode@gnu.org...  perhaps fixed by
>> now though.)
>
> When was the last bounce, please?  I think I fixed that a few days
> ago.

Last time I posted (except previous message) was 18 Nov.  I haven't
received a bounce to the previous mail yet, and since you received it,
I guess it works...

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2003-12-09 21:49 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
2003-11-12 16:11 BIG5-HKSCS? Simon Josefsson
2003-11-13  1:53 ` BIG5-HKSCS? Kenichi Handa
2003-11-13  4:14   ` BIG5-HKSCS? Simon Josefsson
2003-11-13  5:34     ` BIG5-HKSCS? Kenichi Handa
2003-11-13  5:50       ` BIG5-HKSCS? Simon Josefsson
2003-11-13  4:49   ` BIG5-HKSCS? Simon Josefsson
2003-11-13  6:10     ` BIG5-HKSCS? Kenichi Handa
2003-11-13  6:51       ` BIG5-HKSCS? Simon Josefsson
2003-11-13  9:01         ` BIG5-HKSCS? Kenichi Handa
2003-11-13 13:29           ` BIG5-HKSCS? Oliver Scholz
2003-11-13 23:40             ` BIG5-HKSCS? Kenichi Handa
2003-11-14 13:35               ` BIG5-HKSCS? Oliver Scholz
2003-11-13 16:34           ` BIG5-HKSCS? Simon Josefsson
2003-11-14  0:47             ` eight-bit char handling in emacs-unicode Kenichi Handa
2003-11-14 13:25               ` Oliver Scholz
2003-11-15  1:09                 ` Kenichi Handa
2003-11-15 10:26                   ` Oliver Scholz
2003-11-15 21:47                     ` Simon Josefsson
2003-11-15  3:04               ` Simon Josefsson
2003-11-16 15:03                 ` Alex Schroeder
2003-11-17 21:17               ` Stefan Monnier
2003-11-18  7:33                 ` Kenichi Handa
2003-11-18 17:12                   ` Stefan Monnier
2003-11-19  0:06                     ` Kenichi Handa
2003-11-19  3:05                       ` Stefan Monnier
2003-11-19 10:46                         ` Juri Linkov
2003-11-19 13:48                           ` Stefan Monnier
2003-11-20 23:41                           ` Kenichi Handa
2003-11-21  0:41                         ` Kenichi Handa
2003-11-21  5:27                           ` Stefan Monnier
2003-11-21  6:27                             ` Kenichi Handa
2003-11-21 14:59                               ` Stefan Monnier
2003-11-22  1:25                                 ` Kenichi Handa
2003-11-22 23:53                                   ` Stefan Monnier
2003-11-23  7:30                                     ` Kenichi Handa
2003-11-23 23:48                                       ` Stefan Monnier
2003-11-25  1:07                                         ` Kenichi Handa
     [not found]                                           ` <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
2003-11-26  0:07                                             ` Kenichi Handa
2003-11-26 14:14                                               ` Stefan Monnier
2003-11-27  1:34                                                 ` Kenichi Handa
2003-11-27 14:23                                                   ` Stefan Monnier
2003-12-01  0:43                                                     ` Kenichi Handa
2003-12-01 16:15                                                       ` Stefan Monnier
2003-12-02 13:07                                                         ` Kenichi Handa
2003-12-02 16:06                                                           ` Stefan Monnier
2003-11-25  4:28                                         ` Richard Stallman
     [not found]                                     ` <jwv7k1gtswz.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
2003-12-09 21:49                                       ` Richard Stallman
2003-11-15 22:32       ` BIG5-HKSCS? Simon Josefsson
2003-11-17  1:12         ` BIG5-HKSCS? Kenichi Handa
2003-11-17  2:06           ` BIG5-HKSCS? Simon Josefsson
2003-11-17  5:45             ` BIG5-HKSCS? Eli Zaretskii
2003-11-17  7:43               ` BIG5-HKSCS? Simon Josefsson
2003-11-18  7:01                 ` BIG5-HKSCS? Richard Stallman
2003-11-18  8:56                   ` BIG5-HKSCS? Simon Josefsson
2003-11-19  5:15                     ` BIG5-HKSCS? Richard Stallman
2003-11-20  5:48                       ` BIG5-HKSCS? Simon Josefsson
2003-11-20  5:56                         ` BIG5-HKSCS? Eli Zaretskii
2003-11-20  6:20                           ` BIG5-HKSCS? Simon Josefsson

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git