* BIG5-HKSCS?
From: Simon Josefsson @ 2003-11-12 16:11 UTC

How would one add a new coding system? Someone requested support for
BIG5-HKSCS. The relevant references appear to be (although the second
file was 10MB so I haven't downloaded it):

http://www.iana.org/assignments/charset-reg/Big5-HKSCS
http://www.info.gov.hk/digital21/eng/hkscs/download/e_hkscs.pdf

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: BIG5-HKSCS?
From: Kenichi Handa @ 2003-11-13 1:53 UTC
Cc: emacs-devel

In article <ilubrrha7oc.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> How would one add a new coding system? Someone requested support for
> BIG5-HKSCS. The relevant references appear to be (although the second
> file was 10MB so I haven't downloaded it):

It's not easy to support BIG5-HKSCS in the current Emacs,
and I don't have time to work on it now, sorry. But the
emacs-unicode version already supports it. You can get that
version from CVS as below:

% cvs -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs login
Logging in to :pserver:anoncvs@subversions.gnu.org:2401/cvsroot/emacs
CVS password: <-- Hit Return here
% cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

---
Ken'ichi HANDA
handa@m17n.org
* Re: BIG5-HKSCS?
From: Simon Josefsson @ 2003-11-13 4:14 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilubrrha7oc.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> How would one add a new coding system? Someone requested support for
>> BIG5-HKSCS. The relevant references appear to be (although the second
>> file was 10MB so I haven't downloaded it):
>
> It's not easy to support BIG5-HKSCS in the current Emacs,
> and I don't have time to work on it now, sorry. But the
> emacs-unicode version already supports it.

Good enough for me. Do you have an opinion on whether falling back to
BIG5 when BIG5-HKSCS is not available [in Gnus, for displaying
incoming e-mail in BIG5-HKSCS] is reasonable behaviour?

I browsed the BIG5-HKSCS specification, and it appears to add lots of
characters (~1500) but it didn't seem to alter any, and I can't tell
whether the additions are critical or just rarely used symbols. I
doubt rendering it as BIG5 is worse than QP, though, which is the
current behaviour.
* Re: BIG5-HKSCS?
From: Kenichi Handa @ 2003-11-13 5:34 UTC
Cc: emacs-devel

In article <iluislo9a7g.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> Good enough for me. Do you have an opinion on whether falling back to
> BIG5 when BIG5-HKSCS is not available [in Gnus, for displaying
> incoming e-mail in BIG5-HKSCS] is reasonable behaviour?

> I browsed the BIG5-HKSCS specification, and it appears to add lots of
> characters (~1500) but it didn't seem to alter any

Hmm, if that is true, it's possible to support it in the
current Emacs. Internally, Emacs represents Big5 characters
in two charsets, chinese-big5-1 and chinese-big5-2. The
former contains Big5 chars #xA140..#xC8FE, the latter
#xC940..#xFEFE. That means that chinese-big5-1 still has
room for those additional 1500 characters.

> , and I can't tell
> whether the additions are critical or just rarely used symbols. I
> doubt rendering it as BIG5 is worse than QP, though, which is the
> current behaviour.

If BIG5-HKSCS really just adds characters to BIG5, I think
it is reasonable to fall back to BIG5. But, as I wrote
above, it seems possible to support the whole of BIG5-HKSCS
in the current Emacs with fairly little effort. Could you
please wait for a while?

---
Ken'ichi HANDA
handa@m17n.org
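The internal split described above can be illustrated with a small classifier. This is a sketch in Python, not Emacs code: the two ranges are the ones quoted in the message, the second-byte ranges (0x40-0x7E and 0xA1-0xFE) are standard Big5, and the function names are hypothetical.

```python
# Hypothetical helpers illustrating the Big5 split described above;
# these are not Emacs functions. The ranges are taken from the message.
CHARSET_RANGES = {
    "chinese-big5-1": (0xA140, 0xC8FE),
    "chinese-big5-2": (0xC940, 0xFEFE),
}


def is_big5_trail_byte(b: int) -> bool:
    """Big5 second bytes lie in 0x40-0x7E or 0xA1-0xFE."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def big5_charset(code: int):
    """Return the internal charset a 2-byte Big5 code falls into, or None."""
    if not is_big5_trail_byte(code & 0xFF):
        return None
    for name, (lo, hi) in CHARSET_RANGES.items():
        if lo <= code <= hi:
            return name
    return None


def count_codes(lo: int, hi: int) -> int:
    """Count codes in [lo, hi] whose second byte is a valid Big5 trail byte."""
    return sum(1 for c in range(lo, hi + 1) if is_big5_trail_byte(c & 0xFF))


if __name__ == "__main__":
    print(big5_charset(0xA440))  # chinese-big5-1
    print(big5_charset(0xC940))  # chinese-big5-2
    print(count_codes(*CHARSET_RANGES["chinese-big5-1"]))  # 6280 (40 rows x 157)
```

Under the assumption that chinese-big5-1 is a 94x94 internal charset (8836 positions), the 6280 Big5 codes it holds leave roughly 2556 unused slots, which is consistent with the remark that about 1500 extra characters would fit.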
* Re: BIG5-HKSCS?
From: Simon Josefsson @ 2003-11-13 5:50 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <iluislo9a7g.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> Good enough for me. Do you have an opinion on whether falling back to
>> BIG5 when BIG5-HKSCS is not available [in Gnus, for displaying
>> incoming e-mail in BIG5-HKSCS] is reasonable behaviour?
>
>> I browsed the BIG5-HKSCS specification, and it appears to add lots of
>> characters (~1500) but it didn't seem to alter any
>
> Hmm, if that is true, it's possible to support it in the
> current Emacs. Internally, Emacs represents Big5 characters
> in two charsets, chinese-big5-1 and chinese-big5-2. The
> former contains Big5 chars #xA140..#xC8FE, the latter
> #xC940..#xFEFE. That means that chinese-big5-1 still has
> room for those additional 1500 characters.
>
>> , and I can't tell
>> whether the additions are critical or just rarely used symbols. I
>> doubt rendering it as BIG5 is worse than QP, though, which is the
>> current behaviour.
>
> If BIG5-HKSCS really just adds characters to BIG5, I think
> it is reasonable to fall back to BIG5. But, as I wrote
> above, it seems possible to support the whole of BIG5-HKSCS
> in the current Emacs with fairly little effort. Could you
> please wait for a while?

I don't read Chinese, so I don't care much, but someone in
gnu.emacs.gnus might be happy. :-)

I recall that the characters it added were in the User-Defined and
Vendor-Defined areas of BIG5, so making those mean BIG5-HKSCS could
potentially conflict with other BIG5 variants, though. But all this
is beyond my non-ASCII knowledge, so don't count on me to test or
provide any useful feedback. I'll propose adding the BIG5-HKSCS ->
BIG5 alias to Gnus, though, for old Emacs versions.

Thanks for your work and prompt responses!
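A BIG5-HKSCS -> BIG5 alias of the kind proposed here amounts to a lookup before decoding. The Python sketch below shows the idea with a hypothetical alias table and a caller-supplied set of supported charsets (standing in for an older Emacs); it is not Gnus code. Bytes the fallback charset cannot map are replaced with U+FFFD instead of aborting the whole decode.

```python
# Hypothetical fallback table: requested charset -> charset to use instead.
CHARSET_FALLBACKS = {"big5-hkscs": "big5"}


def choose_charset(charset: str, supported) -> str:
    """Return charset if supported, else its fallback (if any), else charset."""
    name = charset.lower()
    if name in supported:
        return name
    return CHARSET_FALLBACKS.get(name, name)


def decode_body(data: bytes, charset: str, supported) -> str:
    """Decode with the chosen charset; unmappable bytes become U+FFFD."""
    return data.decode(choose_charset(charset, supported), errors="replace")


if __name__ == "__main__":
    old_emacs = {"us-ascii", "big5"}  # pretend BIG5-HKSCS is unavailable
    print(choose_charset("BIG5-HKSCS", old_emacs))  # big5
    text = decode_body(b"\xa4\x40", "BIG5-HKSCS", old_emacs)
    print(hex(ord(text)))  # 0x4e00, the character at Big5 code A440
```

Because HKSCS is described in the thread as only adding characters, plain Big5 text decodes identically under the fallback; only the added characters are lost.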
* Re: BIG5-HKSCS?
From: Simon Josefsson @ 2003-11-13 4:49 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> % cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

I tried starting Gnus on it, but it failed. It died with an elisp
backtrace regarding define-key or something like that within bbdb.
Since bbdb isn't a critical part, I just disabled it, but then it
crashed within Fbase64_encode_string. I think emacs-unicode-2 is too
unstable for me to continue looking at it, but I can try again in a
few months.

(gdb) bt
#0  abort () at emacs.c:417
#1  0x0818e68c in Fbase64_encode_string (string=956298188, no_line_break=72) at fns.c:3224
#2  0x08185292 in Ffuncall (nargs=2, args=0xbfffd880) at eval.c:2727
#3  0x081b0ff5 in Fbyte_code (bytestr=409382308, vector=1, maxdepth=-1073751824) at bytecode.c:710
#4  0x08185689 in funcall_lambda (fun=1215756704, nargs=1, arg_vector=0xbfffda24) at eval.c:2911
#5  0x0818514d in Ffuncall (nargs=2, args=0xbfffda20) at eval.c:2781
#6  0x081b0ff5 in Fbyte_code (bytestr=418352412, vector=1, maxdepth=-1073751520) at bytecode.c:710
#7  0x08185689 in funcall_lambda (fun=1216104416, nargs=2, arg_vector=0xbfffdb48) at eval.c:2911
#8  0x0818514d in Ffuncall (nargs=3, args=0xbfffdb44) at eval.c:2781
#9  0x081b0ff5 in Fbyte_code (bytestr=406124996, vector=2, maxdepth=-1073751228) at bytecode.c:710
#10 0x08185689 in funcall_lambda (fun=1216141496, nargs=1, arg_vector=0xbfffdc64) at eval.c:2911
#11 0x0818514d in Ffuncall (nargs=2, args=0xbfffdc60) at eval.c:2781
#12 0x081b0ff5 in Fbyte_code (bytestr=955237444, vector=1, maxdepth=-1073750944) at bytecode.c:710
#13 0x08185689 in funcall_lambda (fun=1215587456, nargs=2, arg_vector=0xbfffdd84) at eval.c:2911
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) up
#1  0x0818e68c in Fbase64_encode_string (string=956298188, no_line_break=72) at fns.c:3224
3224          abort ();
(gdb) p encoded_length
$1 = 72
(gdb) p allength
$2 = 56
(gdb) p length
$3 = 36
(gdb) p string
$4 = 956298188
(gdb) q
A debugging session is active.
Do you still want to close the debugger?(y or n) y
jas@latte:~/src/emacs-unicode/src$
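The values printed in this session are consistent with a simple buffer-size check. The Python sketch below assumes that Fbase64_encode_string reserved `length + length/3 + 1` bytes, plus one per 76-character MIME line and a few bytes of slack, and aborted when the encoder produced more than that; the function names are ours. With length = 36 it reproduces allength = 56, while 72 output characters correspond to 54 input bytes, more than the 36 the buffer was sized for.

```python
import base64

MIME_LINE_LENGTH = 76  # assumed value of the fns.c constant (RFC 2045 line limit)


def allocated_length(nbytes: int) -> int:
    """Buffer size reserved before encoding (assumed to mirror fns.c)."""
    allength = nbytes + nbytes // 3 + 1
    allength += allength // MIME_LINE_LENGTH + 1 + 6
    return allength


def encoded_length(nbytes: int) -> int:
    """Base64 output size without line breaks: 4 chars per started 3-byte group."""
    return 4 * ((nbytes + 2) // 3)


if __name__ == "__main__":
    print(allocated_length(36))  # 56, matching `p allength`
    print(encoded_length(36))    # 48, what 36 input bytes should produce
    print(encoded_length(54))    # 72, matching `p encoded_length`
    # Cross-check the size formula against the standard library.
    assert encoded_length(54) == len(base64.b64encode(bytes(54)))
```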
* Re: BIG5-HKSCS?
From: Kenichi Handa @ 2003-11-13 6:10 UTC
Cc: emacs-devel

In article <ilur80c50uj.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> Kenichi Handa <handa@m17n.org> writes:
>> % cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs

> I tried starting Gnus on it, but it failed. It died with an elisp
> backtrace regarding define-key or something like that within bbdb.
> Since bbdb isn't a critical part,

As bbdb is not a part of Emacs, I have no idea what is wrong
with it. Anyway,

> I just disabled it, but then it crashed within
> Fbase64_encode_string.

I found a simple/silly mistake in fns.c, and have just
installed a fix. Could you please update your working
directory and try again?

---
Ken'ichi HANDA
handa@m17n.org
* Re: BIG5-HKSCS?
From: Simon Josefsson @ 2003-11-13 6:51 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> I just disabled it, but then it crashed within
>> Fbase64_encode_string.
>
> I found a simple/silly mistake in fns.c, and have just
> installed a fix. Could you please update your working
> directory and try again?

The HMAC-MD5 function seems to fail, causing my login attempts in Gnus
to fail. Reproduce it by running:

jas@latte:~/src/emacs-unicode/src$ ./emacs -q ../lisp/gnus/rfc2104.el

then do M-x eval-buffer RET and try to evaluate some of the test
vectors; the first one should give:

(rfc2104-hash 'md5 64 16 "Jefe" "what do ya want for nothing?")
=> "750c783e6ab0b503eaa86e310a5db738"

With emacs-unicode I get "f898573306b1366f6edd841a9f5b2871".

Is anyone using the emacs-unicode branch with Gnus?
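For reference, the expected value is the RFC 2104 HMAC-MD5 test vector for key "Jefe". The Python sketch below (not Gnus's rfc2104.el) computes it by hand on raw octets, which is the property the Lisp code depends on: the padded key XORed with 0x36/0x5C and the 16-byte inner digest must be hashed as bytes, never reinterpreted as characters. It is cross-checked against the standard `hmac` module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # MD5 block size in bytes (the 64 in the rfc2104-hash call)


def hmac_md5_hex(key: bytes, message: bytes) -> str:
    """HMAC-MD5 per RFC 2104, computed on raw octets."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.md5(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(ipad + message).digest()  # 16 raw bytes; any may be >= 0x80
    return hashlib.md5(opad + inner).hexdigest()


if __name__ == "__main__":
    key, msg = b"Jefe", b"what do ya want for nothing?"
    print(hmac_md5_hex(key, msg))  # 750c783e6ab0b503eaa86e310a5db738
    # The standard library agrees with the hand-rolled version.
    assert hmac_md5_hex(key, msg) == hmac.new(key, msg, hashlib.md5).hexdigest()
```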
* Re: BIG5-HKSCS?
From: Kenichi Handa @ 2003-11-13 9:01 UTC
Cc: emacs-devel

In article <iluekwcwyl8.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> The HMAC-MD5 function seems to fail, causing my login attempts in Gnus
> to fail. Reproduce it by running:

> jas@latte:~/src/emacs-unicode/src$ ./emacs -q ../lisp/gnus/rfc2104.el

> then do M-x eval-buffer RET and try to evaluate some of the test
> vectors; the first one should give:

> (rfc2104-hash 'md5 64 16 "Jefe" "what do ya want for nothing?")
> => "750c783e6ab0b503eaa86e310a5db738"

> With emacs-unicode I get "f898573306b1366f6edd841a9f5b2871".

Thank you for testing. I've just installed a fix for
rfc2104.el. I'd like to ask you to try it again.

This is a typical problem of emacs-unicode, in which
characters 128..255 are valid Unicode characters; thus, for
instance, (concat '(?a ?\300)) returns a multibyte string of
`a' and `À'. In the current Emacs, it returns a unibyte
string.

I suspect a similar fix is necessary in several other
places.

> Is anyone using the emacs-unicode branch with Gnus?

At least, I'm not a Gnus user. I'd like to ask people to
use emacs-unicode in various ways to find bugs. What I can
test is limited, but usually I can fix them quite easily,
as in this case.

---
Ken'ichi HANDA
handa@m17n.org
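The difference described above can be mimicked with Python's `bytes` and `str`, used here only as stand-ins for unibyte and multibyte strings (an analogy, not how Emacs is implemented). Assuming a UTF-8-style internal representation as in emacs-unicode, A-grave (U+00C0) occupies two bytes, so code that hashes the internal representation sees different octets than the single byte #o300 it meant.

```python
import hashlib

codes = [ord("a"), 0o300]  # the list (?a ?\300)

# Current Emacs: integers 128..255 are raw bytes, giving the unibyte "a\300".
unibyte = bytes(codes)

# emacs-unicode: 0o300 is the character U+00C0, giving a 2-character multibyte string.
multibyte = "".join(chr(c) for c in codes)

# A UTF-8 representation of the multibyte string has three bytes, not two.
as_utf8 = multibyte.encode("utf-8")

if __name__ == "__main__":
    print(unibyte)                                      # b'a\xc0'
    print(len(unibyte), len(multibyte), len(as_utf8))  # 2 2 3
    print(as_utf8)                                      # b'a\xc3\x80'
    # Hashing the two representations gives different digests: the kind of
    # mismatch a byte-oriented function such as an HMAC is sensitive to.
    print(hashlib.md5(unibyte).digest() == hashlib.md5(as_utf8).digest())  # False
```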
* Re: BIG5-HKSCS?
From: Oliver Scholz @ 2003-11-13 13:29 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

[...]
> I'd like to ask people to use emacs-unicode in various ways to find
> bugs. What I can test is limited, but usually I can fix them quite
> easily, as in this case.
[...]

Alright, I decided to follow this advice and checked out
emacs-unicode-2 an hour ago.

When I eval this expression on GNU/Linux ...

(set-face-font 'mode-line
               "-adobe-helvetica-medium-r-normal-*-12-*-*-*-*-*-iso8859-1")

... then Emacs segfaults. I append the backtrace. Where is the right
place to send bug reports to? emacs-pretest-bug@gnu.org, perhaps with
"[unicode]" in the subject? Or to the emacs-unicode mailing list?

[Thanks a lot for your work, BTW!]

    Oliver

The backtrace:

#0  0x080bb099 in choose_face_font (f=0x8473830, attrs=0x89d61e8, font_spec=405708356, needs_overstrike=0xbfffe858) at xfaces.c:6620
#1  0x080b3528 in load_face_font (f=0x8473830, face=0x89d61a0) at xfaces.c:1235
#2  0x080bbbcd in realize_x_face (cache=0x8b8aff8, attrs=0xbfffe964) at xfaces.c:6980
#3  0x080bb893 in realize_face (cache=0x8b8aff8, attrs=0xbfffe964, former_face_id=2) at xfaces.c:6869
#4  0x080bb82d in realize_named_face (f=0x8473830, symbol=405751916, id=2) at xfaces.c:6839
#5  0x080bb1f6 in realize_basic_faces (f=0x8473830) at xfaces.c:6670
#6  0x080b302b in recompute_basic_faces (f=0x8473830) at xfaces.c:951
#7  0x0805f869 in init_iterator (it=0xbfffeab4, w=0x8bedfc8, charpos=-1, bytepos=-1, row=0x0, base_face_id=DEFAULT_FACE_ID) at xdisp.c:2012
#8  0x080677e6 in x_consider_frame_title (frame=1212626992) at xdisp.c:7885
#9  0x08067904 in prepare_menu_bars () at xdisp.c:7944
#10 0x0806a533 in redisplay_internal (preserve_echo_area=0) at xdisp.c:9743
#11 0x08069edc in redisplay () at xdisp.c:9533
#12 0x080e2de5 in read_char (commandflag=1, nmaps=2, maps=0xbffff314, prev_event=405708356, used_mouse_menu=0xbffff358) at keyboard.c:2496
#13 0x080ea115 in read_key_sequence (keybuf=0xbffff464, bufsize=30, prompt=405708356, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1) at keyboard.c:8827
#14 0x080e10c6 in command_loop_1 () at keyboard.c:1505
#15 0x0813773d in internal_condition_case (bfun=0x80e0db4 <command_loop_1>, handlers=405796004, hfun=0x80e09b4 <cmd_error>) at eval.c:1333
#16 0x080e0c78 in command_loop_2 () at keyboard.c:1292
#17 0x081372b5 in internal_catch (tag=405757252, func=0x80e0c54 <command_loop_2>, arg=405708356) at eval.c:1094
#18 0x080e0c23 in command_loop () at keyboard.c:1271
#19 0x080e0778 in recursive_edit_1 () at keyboard.c:987
#20 0x080e08a0 in Frecursive_edit () at keyboard.c:1043
#21 0x080df722 in main (argc=2, argv=0xbffffa34) at emacs.c:1673

-- 
Oliver Scholz               23 Brumaire an 212 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org
* Re: BIG5-HKSCS?
From: Kenichi Handa @ 2003-11-13 23:40 UTC
Cc: emacs-devel

In article <873ccswg5i.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <epameinondas@gmx.de> writes:
> Alright, I decided to follow this advice and checked out
> emacs-unicode-2 an hour ago.

Thank you very much!!

> When I eval this expression on GNU/Linux ...

> (set-face-font 'mode-line
>                "-adobe-helvetica-medium-r-normal-*-12-*-*-*-*-*-iso8859-1")

> ... then Emacs segfaults.

I can't reproduce it, but I found one bug in xfaces.c and have
just installed a fix. Perhaps it fixes the above bug.

> I append the backtrace. Where is the right place to send bug reports
> to? emacs-pretest-bug@gnu.org, perhaps with "[unicode]" in the
> subject? Or to the emacs-unicode mailing list?

I think emacs-unicode@gnu.org is suitable.

---
Ken'ichi HANDA
handa@m17n.org
* Re: BIG5-HKSCS?
From: Oliver Scholz @ 2003-11-14 13:35 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <873ccswg5i.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <epameinondas@gmx.de> writes:
[...]
>> ... then Emacs segfaults.
>
> I can't reproduce it, but I found one bug in xfaces.c and have
> just installed a fix. Perhaps it fixes the above bug.
[...]

Yes, it is fixed now. Thanks.

    Oliver
-- 
Oliver Scholz               24 Brumaire an 212 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org
* Re: BIG5-HKSCS?
From: Simon Josefsson @ 2003-11-13 16:34 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <iluekwcwyl8.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> The HMAC-MD5 function seems to fail, causing my login attempts in Gnus
>> to fail. Reproduce it by running:
>
>> jas@latte:~/src/emacs-unicode/src$ ./emacs -q ../lisp/gnus/rfc2104.el
>
>> then do M-x eval-buffer RET and try to evaluate some of the test
>> vectors; the first one should give:
>
>> (rfc2104-hash 'md5 64 16 "Jefe" "what do ya want for nothing?")
>> => "750c783e6ab0b503eaa86e310a5db738"
>
>> With emacs-unicode I get "f898573306b1366f6edd841a9f5b2871".
>
> Thank you for testing. I've just installed a fix for
> rfc2104.el. I'd like to ask you to try it again.

rfc2104.el now works, thanks. But does the fix really have to
explicitly mention charsets like iso-latin-1? Is there no way to
handle binary octet strings in emacs-unicode? Preferably in a
portable way that works on old Emacs versions and on XEmacs.

> This is a typical problem of emacs-unicode, in which
> characters 128..255 are valid Unicode characters; thus, for
> instance, (concat '(?a ?\300)) returns a multibyte string of
> `a' and `À'. In the current Emacs, it returns a unibyte
> string.
>
> I suspect a similar fix is necessary in several other
> places.

Having a way to deal with pure single-byte data, without involving
coding systems, seems like a rather important thing to me.

>> Is anyone using the emacs-unicode branch with Gnus?
>
> At least, I'm not a Gnus user. I'd like to ask people to
> use emacs-unicode in various ways to find bugs. What I can
> test is limited, but usually I can fix them quite easily,
> as in this case.

It starts now, but when I enter a summary buffer it crashes:

Program received signal SIGSEGV, Segmentation fault.
0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
1591              char_ranges[n_char_ranges++] = c;
(gdb) bt
#0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
#1  0x090ed860 in ?? ()
#2  0x081a30d0 in Fskip_chars_forward (string=1, lim=1) at syntax.c:1344
#3  0x081b1a43 in Fbyte_code (bytestr=6, vector=160, maxdepth=152054512) at bytecode.c:1418
#4  0x08185689 in funcall_lambda (fun=1223225480, nargs=1, arg_vector=0xbfffcf44) at eval.c:2911
#5  0x0818514d in Ffuncall (nargs=2, args=0xbfffcf40) at eval.c:2781
#6  0x081b0ff5 in Fbyte_code (bytestr=406381860, vector=1, maxdepth=-1073754304) at bytecode.c:710
#7  0x08185689 in funcall_lambda (fun=1213250456, nargs=2, arg_vector=0xbfffd084) at eval.c:2911
#8  0x0818514d in Ffuncall (nargs=3, args=0xbfffd080) at eval.c:2781
#9  0x081b0ff5 in Fbyte_code (bytestr=408546780, vector=2, maxdepth=-1073753984) at bytecode.c:710
#10 0x08185689 in funcall_lambda (fun=1222504096, nargs=0, arg_vector=0xbfffd1b4) at eval.c:2911
#11 0x0818514d in Ffuncall (nargs=1, args=0xbfffd1b0) at eval.c:2781
#12 0x081b0ff5 in Fbyte_code (bytestr=416820644, vector=0, maxdepth=-1073753680) at bytecode.c:710
#13 0x08185689 in funcall_lambda (fun=1222459392, nargs=0, arg_vector=0xbfffd2d4) at eval.c:2911
#14 0x0818514d in Ffuncall (nargs=1, args=0xbfffd2d0) at eval.c:2781
---Type <return> to continue, or q <return> to quit---
#15 0x081b0ff5 in Fbyte_code (bytestr=410610228, vector=0, maxdepth=-1073753392) at bytecode.c:710
#16 0x08185689 in funcall_lambda (fun=1222459176, nargs=2, arg_vector=0xbfffd3f4) at eval.c:2911
#17 0x0818514d in Ffuncall (nargs=3, args=0xbfffd3f0) at eval.c:2781
#18 0x081b0ff5 in Fbyte_code (bytestr=416766892, vector=2, maxdepth=-1073753104) at bytecode.c:710
#19 0x08185689 in funcall_lambda (fun=1222077040, nargs=2, arg_vector=0xbfffd514) at eval.c:2911
#20 0x0818514d in Ffuncall (nargs=3, args=0xbfffd510) at eval.c:2781
#21 0x081b0ff5 in Fbyte_code (bytestr=416766916, vector=2, maxdepth=-1073752816) at bytecode.c:710
#22 0x08185689 in funcall_lambda (fun=1222110576, nargs=1, arg_vector=0xbfffd634) at eval.c:2911
#23 0x0818514d in Ffuncall (nargs=2, args=0xbfffd630) at eval.c:2781
#24 0x081b0ff5 in Fbyte_code (bytestr=416640468, vector=1, maxdepth=-1073752528) at bytecode.c:710
#25 0x08185689 in funcall_lambda (fun=1221949600, nargs=6, arg_vector=0xbfffd764) at eval.c:2911
#26 0x0818514d in Ffuncall (nargs=7, args=0xbfffd760) at eval.c:2781
#27 0x081b0ff5 in Fbyte_code (bytestr=408688788, vector=6, maxdepth=-1073752224) at bytecode.c:710
#28 0x08185689 in funcall_lambda (fun=1221947744, nargs=7,
---Type <return> to continue, or q <return> to quit---
    arg_vector=0xbfffd894) at eval.c:2911
#29 0x0818514d in Ffuncall (nargs=8, args=0xbfffd890) at eval.c:2781
#30 0x081b0ff5 in Fbyte_code (bytestr=408688788, vector=7, maxdepth=-1073751920) at bytecode.c:710
#31 0x08185689 in funcall_lambda (fun=1214659912, nargs=3, arg_vector=0xbfffd9c4) at eval.c:2911
#32 0x0818514d in Ffuncall (nargs=4, args=0xbfffd9c0) at eval.c:2781
#33 0x081b0ff5 in Fbyte_code (bytestr=406477324, vector=3, maxdepth=-1073751616) at bytecode.c:710
#34 0x08185689 in funcall_lambda (fun=1223292464, nargs=1, arg_vector=0xbfffdb24) at eval.c:2911
#35 0x0818514d in Ffuncall (nargs=2, args=0xbfffdb20) at eval.c:2781
#36 0x08180cce in Fcall_interactively (function=407759756, record_flag=406023676, keys=1211380872) at callint.c:850
#37 0x0812e9db in Fcommand_execute (cmd=407759756, record_flag=406023676, keys=1, special=406023676) at keyboard.c:9725
#38 0x08123462 in command_loop_1 () at keyboard.c:1756
#39 0x0818345e in internal_condition_case (bfun=0x8123100 <command_loop_1>, handlers=406111316, hfun=0x8122c40 <cmd_error>) at eval.c:1333
#40 0x08122f9e in command_loop_2 () at keyboard.c:1292
#41 0x08182fbb in internal_catch (tag=1, func=0x8122f70 <command_loop_2>, arg=406023676) at eval.c:1094
#42 0x08122f3e in command_loop () at keyboard.c:1271
---Type <return> to continue, or q <return> to quit---
#43 0x081229d4 in recursive_edit_1 () at keyboard.c:987
#44 0x08122b01 in Frecursive_edit () at keyboard.c:1043
#45 0x081211e0 in main (argc=3, argv=0xbfffe374) at emacs.c:1673
(gdb) l
1673      Frecursive_edit ();
1674      /* NOTREACHED */
1675      return 0;
1676    }
1677    ^L
1678    /* Sort the args so we can find the most important ones
1679       at the beginning of argv.  */
1680
1681    /* First, here's a table of all the standard options.  */
1682
(gdb) up
#1  0x090ed860 in ?? ()
(gdb) up
#2  0x081a30d0 in Fskip_chars_forward (string=1, lim=1) at syntax.c:1344
1344      return skip_chars (1, string, lim);
(gdb) p string
$1 = 1
(gdb) p lim
$2 = 1
(gdb) up
#3  0x081b1a43 in Fbyte_code (bytestr=6, vector=160, maxdepth=152054512) at bytecode.c:1418
1418                TOP = Fskip_chars_forward (TOP, v1);
(gdb) up
#4  0x08185689 in funcall_lambda (fun=1223225480, nargs=1, arg_vector=0xbfffcf44) at eval.c:2911
2911          val = Fbyte_code (AREF (fun, COMPILED_BYTECODE),
(gdb) up
#5  0x0818514d in Ffuncall (nargs=2, args=0xbfffcf40) at eval.c:2781
2781            val = funcall_lambda (fun, numargs, args + 1);
(gdb) up
#6  0x081b0ff5 in Fbyte_code (bytestr=406381860, vector=1, maxdepth=-1073754304) at bytecode.c:710
710               TOP = Ffuncall (op + 1, &TOP);
(gdb) q
A debugging session is active.
Do you still want to close the debugger?(y or n) y
jas@latte:~/src/emacs-unicode/src$
* eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-14 0:47 UTC
Cc: emacs-devel

In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> rfc2104.el now works, thanks. But does the fix really have to
> explicitly mention charsets like iso-latin-1? Is there no way to
> handle binary octet strings in emacs-unicode? Preferably in a
> portable way that works on old Emacs versions and on XEmacs.

>> This is a typical problem of emacs-unicode, in which
>> characters 128..255 are valid Unicode characters; thus, for
>> instance, (concat '(?a ?\300)) returns a multibyte string of
>> `a' and `À'. In the current Emacs, it returns a unibyte
>> string.
>>
>> I suspect a similar fix is necessary in several other
>> places.

> Having a way to deal with pure single-byte data, without involving
> coding systems, seems like a rather important thing to me.

I agree with you. Currently, I can think of these methods:

(1) Perhaps the easiest way.

    Check `default-enable-multibyte-characters' or a newly
    introduced variable `byte-as-byte' to decide whether an
    integer 128..255 must be treated as a Latin-1 char or a
    byte. So,
        (concat '(?a ?\300)) => "aÀ" (multibyte string)
        (let ((byte-as-byte t))
          (concat '(?a ?\300))) => "a\300" (unibyte string)

(2) Introduce a new function `eight-bit-char'.

    It converts an argument to an ASCII or eight-bit char.
        (eight-bit-char ?a) => 97
        (eight-bit-char ?\300) => 4194240
    Then,
        (concat (list ?a (eight-bit-char ?\300))) => "a\300"

(3) Make a series of new functions (I think it's not good):
        concat vs concat-unibyte
        string vs string-unibyte
        aset vs aset-unibyte

(4) Most drastic way (the cleanest, but requires lots of work)

    The basic problem is that we don't distinguish a character
    (code) and a number. So, we introduce a character object
    (like XEmacs). The function `character' converts a
    character code into the corresponding character object.
    The Lisp reader always generates a character object for
    ?a, ?\300, etc. So:
        (concat '(?a ?\300)) => "aÀ"
        (concat '(?a #o300)) => "a\300"
        (concat (list ?a (character #o300))) => "aÀ"
        (concat (list ?a #o300 (character #o300))) => "a\300À"
    Note: (character X) == (decode-char 'ucs X)

> It starts now, but when I enter a summary buffer it crashes:

> Program received signal SIGSEGV, Segmentation fault.
> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
> 1591              char_ranges[n_char_ranges++] = c;
> (gdb) bt
> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591

I just tried Gnus, but I couldn't reproduce it, so I need
more help. Could you show me the results of the following?

(gdb) p n_char_ranges
(gdb) p c
(gdb) p string
(gdb) xstring
(gdb) p *$

---
Ken'ichi HANDA
handa@m17n.org
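Options (1) and (2) can be modelled in a few lines of Python. This is a hypothetical sketch, not Emacs code: `eight_bit_char` reproduces the two values given for option (2) (ASCII unchanged, ?\300 => 4194240), which amounts to adding #x3FFF00 to a byte so that raw bytes 128..255 land at #x3FFF80..#x3FFFFF; `concat_codes` imitates option (1)'s `byte-as-byte` switch by returning `bytes` for a unibyte result and `str` for a multibyte one.

```python
BYTE8_OFFSET = 0x3FFF00  # inferred from (eight-bit-char ?\300) => 4194240


def eight_bit_char(code: int) -> int:
    """Option (2): ASCII stays ASCII; 128..255 map to raw-byte character codes."""
    if 0 <= code < 0x80:
        return code
    if 0x80 <= code <= 0xFF:
        return code + BYTE8_OFFSET
    raise ValueError(f"not a byte: {code}")


def is_raw_byte_char(c: int) -> bool:
    """True for the character codes produced from bytes 128..255."""
    return BYTE8_OFFSET + 0x80 <= c <= BYTE8_OFFSET + 0xFF


def raw_byte_value(c: int) -> int:
    """Inverse of eight_bit_char for the raw-byte range."""
    if not is_raw_byte_char(c):
        raise ValueError(f"not a raw-byte character: {c}")
    return c - BYTE8_OFFSET


def concat_codes(codes, byte_as_byte=False):
    """Option (1): 128..255 are bytes when byte_as_byte is set, Latin-1 otherwise."""
    if byte_as_byte:
        return bytes(codes)  # unibyte result; raises ValueError if a code is > 255
    return "".join(chr(c) for c in codes)  # multibyte result


if __name__ == "__main__":
    print(eight_bit_char(ord("a")))          # 97
    print(eight_bit_char(0o300))             # 4194240
    print(raw_byte_value(4194240) == 0o300)  # True
    print(concat_codes([ord("a"), 0o300], byte_as_byte=True))  # b'a\xc0'
```

Keeping raw bytes above the Unicode range means they can never be confused with Latin-1 characters, which is the distinction option (4) would make explicit with a separate character type.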
* Re: eight-bit char handling in emacs-unicode
From: Oliver Scholz @ 2003-11-14 13:25 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
[...]
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>> 1591              char_ranges[n_char_ranges++] = c;
>> (gdb) bt
>> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>
> I just tried Gnus, but I couldn't reproduce it, so I need
> more help.

I get this error too, not when I enter a summary buffer, but when I
hit RET in the summary buffer to display an article. I tracked this
through the code. It takes place in
`mail-extract-address-components'. I found a way to reproduce this
without Gnus:

r -q --eval '(progn (load "mail-extr") (mail-extr-skip-whitespace-forward))'

I can't reproduce it if I evaluate the body of
`mail-extr-skip-whitespace-forward', though. Weird. Could this have
something to do with the Latin-1 no-break space?

    Oliver
-- 
Oliver Scholz               24 Brumaire an 212 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-15 1:09 UTC
Cc: jas, emacs-devel

In article <87n0aznl06.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <epameinondas@gmx.de> writes:
> I get this error too, not when I enter a summary buffer, but when I
> hit RET in the summary buffer to display an article. I tracked this
> through the code. It takes place in
> `mail-extract-address-components'. I found a way to reproduce this
> without Gnus:

> r -q --eval '(progn (load "mail-extr") (mail-extr-skip-whitespace-forward))'

Thank you. Now I see what was wrong. I've just installed a
fix.

> I can't reproduce it if I evaluate the body of
> `mail-extr-skip-whitespace-forward', though. Weird. Could this have
> something to do with the Latin-1 no-break space?

I think so. The bug occurs when we do (skip-chars-forward
_MULTIBYTE_STRING_) in an ASCII-only buffer, which I think I
had never tried. :-(

---
Ken'ichi HANDA
handa@m17n.org
* Re: eight-bit char handling in emacs-unicode
From: Oliver Scholz @ 2003-11-15 10:26 UTC
Cc: jas, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

[...]
> Thank you. Now I see what was wrong. I've just installed a
> fix.
[...]

Thanks. It works now. I guess I may flatter myself that I am the
first one to have sent a message to the public with Gnus on Emacs 22.
That's something I will tell my grandchildren some day. :-)

I'll continue testing emacs-unicode.

    Oliver
-- 
Oliver Scholz               25 Brumaire an 212 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org
* Re: eight-bit char handling in emacs-unicode
From: Simon Josefsson @ 2003-11-15 21:47 UTC
Cc: emacs-devel, Kenichi Handa

Oliver Scholz <epameinondas@gmx.de> writes:

> Kenichi Handa <handa@m17n.org> writes:
>
> [...]
>> Thank you. Now I see what was wrong. I've just installed a
>> fix.
> [...]
>
> Thanks. It works now. I guess I may flatter myself now to be the first
> one who has sent a message with Gnus on Emacs 22 out to the public.

And here is the second one. :-) (Assuming this is sent OK...)

I can't reproduce the crash I got earlier; perhaps it is fixed.

I noticed that M-SPC within *Message* buffers activates the region, but does not highlight the selected area. It works in other buffers, though.
* Re: eight-bit char handling in emacs-unicode
From: Simon Josefsson @ 2003-11-15 3:04 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> rfc2104.el now works, thanks. But does the fix really have to
>> explicitly mention charsets like iso-latin-1? Is there no way to
>> handle binary octet strings in emacs-unicode? Preferably in a
>> portable way, that works on old Emacs versions and on XEmacs.
>
>>> This is a typical problem of emacs-unicode in which
>>> characters 128..255 are valid Unicode characters, thus, for
>>> instance, (concat '(?a ?\300)) returns a multibyte string of
>>> `a' and `À'. But in the current Emacs, it returns a unibyte
>>> string.
>>>
>>> I suspect the similar fix is necessary in several other
>>> places.
>
>> Having a way to deal with data that is a pure single byte, without
>> involving coding systems, seems like a rather important thing to me.
>
> I agree with you. Currently, I can think of these methods:

Can you think of one that would work on Emacs 21? Having a stable idiom for dealing with octets would be useful; forcing third-party packages to try several methods can easily lead to unreadable code.

> (1) Perhaps the easiest way.
>
> Check `default-enable-multibyte-characters' or a newly
> introduced variable `byte-as-byte' to decide whether an
> integer 128..255 must be treated as a Latin-1 char or a
> byte. So,
>   (concat '(?a ?\300)) => "aÀ" (multibyte string)
>   (let ((byte-as-byte t))
>     (concat '(?a ?\300))) => "a\300" (unibyte string)
>
> (2) Introduce a new function `eight-bit-char'.
>
> It converts an argument to ascii or eight-bit-char.
>   (eight-bit-char ?a) => 97
>   (eight-bit-char ?\300) => 4194240
> Then,
>   (concat (list ?a (eight-bit-char ?\300))) => "a\300"

Both would work for me, although superficially both look like quick hacks to me.

> (3) Make a series of new functions (I think it's not good)
>
>   concat vs concat-unibyte
>   string vs string-unibyte
>   aset vs aset-unibyte

I agree it isn't good.

> (4) Most drastic way (the cleanest but requires lots of work)
>
> The basic problem is that we don't distinguish a character
> (code) and a number. So, we introduce a character object
> (like XEmacs). The function `character' converts a
> character code into the corresponding character object. The
> lisp reader always generates a character object for ?a,
> ?\300, etc. So:
>   (concat '(?a ?\300)) => "aÀ"
>   (concat '(?a #o300)) => "a\300"
>   (concat (list ?a (character #o300))) => "aÀ"
>   (concat (list ?a #o300 (character #o300))) => "a\300À"
>
> Note: (character X) == (decode-char 'ucs X)

This would be nice. Characters aren't numbers (unless within the internal representation, but the internal representation should be hidden), so separating the two types is useful. To be consistent with that, I think your `character' function should be called `ucs-character' or similar.

>> It started now, but when I enter a summary buffer it crashed:
>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>> 1591 char_ranges[n_char_ranges++] = c;
>> (gdb) bt
>> #0 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>
> I just tried gnus but I couldn't reproduce it. So, I need
> more help. Could you show me the results of the following?
>
> (gdb) p n_char_ranges
> (gdb) p c
> (gdb) p string
> (gdb) xstring
> (gdb) p *$

I'll try to find time to try emacs-unicode-2 more, but no promises.

Thanks.
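Option (2) can be modelled outside Emacs to see what it does. The following is a minimal Python sketch, not Emacs code. It assumes that raw bytes 128..255 are represented as the char codes 0x3FFF80..0x3FFFFF, which agrees with the thread's own example `(eight-bit-char ?\300) => 4194240`. The names `eight_bit_char` and `is_raw_byte_char` are hypothetical.

```python
# Hypothetical model of the proposed `eight-bit-char'.  Assumption: raw
# bytes 128..255 are represented as char codes 0x3FFF80..0x3FFFFF, which
# matches ?\300 => 4194240 in the thread.
RAW_BYTE_BASE = 0x3FFF00  # 0x3FFF00 + 0xC0 == 4194240


def eight_bit_char(code: int) -> int:
    """Return an ASCII code unchanged, or the raw-byte char for 128..255."""
    if not 0 <= code <= 255:
        raise ValueError(f"not a byte value: {code}")
    if code < 128:
        return code               # ASCII stays as it is
    return RAW_BYTE_BASE + code   # 128..255 become "eight-bit" chars


def is_raw_byte_char(c: int) -> bool:
    """True for the char codes that stand for a single raw byte."""
    return RAW_BYTE_BASE + 128 <= c <= RAW_BYTE_BASE + 255


print(eight_bit_char(ord("a")))  # 97
print(eight_bit_char(0o300))     # 4194240
print(is_raw_byte_char(192))     # False: plain 192 is the Latin-1 char À
```

The sketch keeps ASCII unchanged because ASCII characters and ASCII bytes coincide. Only the upper half needs a code that cannot be confused with a Latin-1 character.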
* Re: eight-bit char handling in emacs-unicode
From: Alex Schroeder @ 2003-11-16 15:03 UTC

Simon Josefsson <jas@extundo.com> writes:

> Characters aren't numbers (unless within the internal
> representation, but the internal representation should be hidden),

As far as I understand the Emacs design philosophy, we don't believe that the internal representation should be hidden. If it is not hidden, we can easily write code to modify it without having to recompile Emacs. But that's just an aside. :)

Alex.
-- 
http://www.emacswiki.org/alex/
There is no substitute for experience.
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-17 21:17 UTC
Cc: emacs-devel, jas

> The basic problem is that we don't distinguish a character
> (code) and a number. So, we introduce a character object

That's one way to look at the problem.
Another is to say that the problem is instead that we do not distinguish
between arrays of chars and arrays of bytes.

We just use strings and buffers and expect to be able to mix bytes and chars in them. Such mixes are admittedly very rare for strings, but they're pretty common for buffers. So when we write 192 at a location, we don't know whether we should put the byte 192 there or the eight-bit-char character that will be encoded into a 192 byte.

In Emacs-21 we worked around the problem by arranging for "the
eight-bit-char that encodes to 192" to be represented by the integer 192, so
as to avoid having to choose. But with unicode, the 128-255 zone cannot be
dedicated to eight-bit-char since it's already used up for latin-1, so we
have to face the problem more directly.
The places where Emacs-21 still had to choose, we just used heuristics,
so `concat' will sometimes return a unibyte string, and sometimes
multibyte string.

So I think your options 1-3 are better than 4. BTW, your function
`eight-bit-char' should be named `byte-to-char' instead.

Which of 1 to 3 is the best is not clear, and maybe we can just live with
`make-string-unibyte' and `make-string-multibyte'. Note that 1-3 are not mutually exclusive so we can use them all.

Stefan
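Stefan's alternative, keeping arrays of bytes and arrays of chars as separate types, is roughly the design Python 3 uses with `bytes` and `str`. The sketch below is only an analogy, not a claim about Emacs. It shows that, once the container type is known, "write 192 here" has a single meaning. The helper `append_192` is hypothetical.

```python
# Python keeps "arrays of bytes" (bytes) and "arrays of chars" (str) as
# distinct types, one concrete form of the split Stefan describes.
def append_192(seq):
    """Append the value 192 to either a byte array or a char array."""
    if isinstance(seq, bytes):
        return seq + bytes([192])   # the byte 0xC0
    if isinstance(seq, str):
        return seq + chr(192)       # the character U+00C0, À
    raise TypeError("expected bytes or str")


print(append_192(b"a"))  # b'a\xc0'
print(append_192("a"))   # aÀ

# Mixing the two types is rejected instead of guessed at:
try:
    b"a" + "\u00c0"
except TypeError as err:
    print("refused:", err)
```

The cost of this design is visible in the last lines: every place that mixes the two must convert explicitly.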
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-18 7:33 UTC
Cc: emacs-devel, jas

In article <jwvhe12emr3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> The basic problem is that we don't distinguish a character
>> (code) and a number. So, we introduce a character object

> That's one way to look at the problem.
> Another is to say that the problem is instead that we do not distinguish
> between arrays of chars and arrays of bytes.

I agree that it's possible to grasp the problem in that way,
but I'm not sure which is the better way. Could you explain
WHY yours is better?

[...]
> In Emacs-21 we worked around the problem by arranging for "the
> eight-bit-char that encodes to 192" to be represented by the integer 192, so
> as to avoid having to choose. But with unicode, the 128-255 zone cannot be
> dedicated to eight-bit-char since it's already used up for latin-1, so we
> have to face the problem more directly.
> The places where Emacs-21 still had to choose, we just used heuristics,
> so `concat' will sometimes return a unibyte string, and sometimes
> multibyte string.

> So I think your options 1-3 are better than 4. BTW, your function
> `eight-bit-char' should be named `byte-to-char' instead.

> Which of 1 to 3 is the best is not clear, and maybe we can just live with
> `make-string-unibyte' and `make-string-multibyte'.

I think you mean string-make-unibyte/multibyte, but, for the
current problem, we can't use it because string-make-unibyte
may behave differently in different language environment.

Such a lang. env. that makes iso-8859-1 or Unicode the
highest priority for the character `À' is ok.

  (string-make-unibyte (concat '(?a 192))) = "a\300"

But, if some lang. env. prefers such a charset for `À' that
encodes it not to 192 (e.g. Vietnamese VSCII), we fail.

> Note that 1-3 are not mutually exclusive so we can use
> them all.

Yes, but, at least, I really want to avoid "(3) Make a
series of new functions".

---
Ken'ichi HANDA
handa@m17n.org
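Handa's objection is that the result of string-make-unibyte depends on which charset the language environment prefers for each character. Below is a rough Python model of that dependence. Python codecs stand in for Emacs charsets. `mac_roman` is used only because Python has no VSCII codec; all that matters is that it places `À` somewhere other than 0xC0. The function name is hypothetical.

```python
# Rough model: each char is mapped to its code point in whatever 8-bit
# charset the language environment prefers, so the same multibyte string
# can yield different bytes.  Codecs are stand-ins for Emacs charsets.
def make_unibyte_in_env(s: str, preferred_charset: str) -> bytes:
    return s.encode(preferred_charset)


text = "a" + chr(192)                                 # "aÀ"
as_latin1 = make_unibyte_in_env(text, "latin-1")      # b'a\xc0'
as_macroman = make_unibyte_in_env(text, "mac_roman")  # À is not 0xC0 here
print(as_latin1 == as_macroman)                       # False
```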
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-18 17:12 UTC
Cc: emacs-devel, jas

>>> The basic problem is that we don't distinguish a character
>>> (code) and a number. So, we introduce a character object
>> That's one way to look at the problem.
>> Another is to say that the problem is instead that we do not distinguish
>> between arrays of chars and arrays of bytes.
> I agree that it's possible to grasp the problem in that way,
> but I'm not sure which is the better way. Could you explain
> WHY yours is better?

I'm not sure whether it's better or worse. The problem I have with the
introduction of a new type for chars is that it is a change that has far
reaching consequences and I'm not sure it would solve all our problems
since many of the problems have to do with bad elisp code.

>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>> `make-string-unibyte' and `make-string-multibyte'.
> I think you mean string-make-unibyte/multibyte, but, for the
> current problem, we can't use it because string-make-unibyte
> may behave differently in different language environment.
> Such a lang. env. that makes iso-8859-1 or Unicode the
> highest priority for the character `À' is ok.
>   (string-make-unibyte (concat '(?a 192))) = "a\300"
> But, if some lang. env. prefers such a charset for `À' that
> encodes it not to 192 (e.g. Vietnamese VSCII), we fail.

No. My `make-string-unibyte' should only work to convert "bytes in
multibyte string" to "bytes in unibyte string": there's no char, thus no
coding-system. If the multibyte string argument contains a char that's
not an eight-bit-char, then it's an error.
To do what your string-make-unibyte does you should use
`encode-coding-string' where the coding system is passed explicitly.

I've changed my Emacs so that string-make-unibyte does the above
(i.e. signals an error if it encounters a non-byte char) and it works fairly
well, except for the few places where the elisp code is sloppy and needs to
be fixed.

>> Note that 1-3 are not mutually exclusive so we can use
>> them all.
> Yes, but, at least, I really want to avoid "(3) Make a
> series of new functions".

  (defun concat-unibyte (&rest x)
    (make-string-unibyte (apply 'concat x)))
  ...

so we don't need this series of new functions, but if some of them are used
often enough, we can add them of course.

Stefan
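Stefan's `make-string-unibyte` only relabels bytes and rejects everything else. Here is a minimal Python sketch of those semantics, working on lists of integer char codes. The raw-byte range 0x3FFF80..0x3FFFFF is an assumption borrowed from emacs-unicode. The `lenient` flag is hypothetical. It models the later proposal in this thread to accept both the Latin-1 192 and the eight-bit-char 192. `concat_unibyte` mirrors the `defun` above.

```python
# Hypothetical model of Stefan's proposed `make-string-unibyte': ASCII and
# raw-byte chars are accepted, anything else is an error.  Char codes are
# Python ints; the raw-byte range is an assumption from emacs-unicode.
RAW_BYTE_BASE = 0x3FFF00


def make_string_unibyte(chars, lenient=False):
    out = bytearray()
    for c in chars:
        if 0 <= c < 128:
            out.append(c)                    # ASCII
        elif RAW_BYTE_BASE + 128 <= c <= RAW_BYTE_BASE + 255:
            out.append(c - RAW_BYTE_BASE)    # eight-bit (raw byte) char
        elif lenient and 128 <= c <= 255:
            out.append(c)                    # also accept Latin-1 128..255
        else:
            raise ValueError(f"not a byte char: {c:#x}")
    return bytes(out)


def concat_unibyte(*seqs, lenient=False):
    # Mirrors (make-string-unibyte (apply 'concat x)).
    return make_string_unibyte([c for s in seqs for c in s], lenient)


print(make_string_unibyte([97, 0x3FFFC0]))          # b'a\xc0'
print(make_string_unibyte([97, 192], lenient=True))  # b'a\xc0'
```

In strict mode, `make_string_unibyte([97, 192])` raises `ValueError`. That is the case Handa points out next: in emacs-unicode, `(concat '(?a 192))` produces A-grave rather than a raw byte.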
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-19 0:06 UTC
Cc: emacs-devel, jas

In article <jwvn0atd38w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> I'm not sure whether it's better or worse. The problem I have with the
> introduction of a new type for chars is that it is a change that has far
> reaching consequences and I'm not sure it would solve all our problems
> since many of the problems have to do with bad elisp code.

I see. Apart from the design itself, I agree that it's difficult to
introduce a new type. But, when I discussed with Richard about the
Character type object a few years ago, he was not that negative provided
that it gives sure improvement.

>>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>>> `make-string-unibyte' and `make-string-multibyte'.
>> I think you mean string-make-unibyte/multibyte, but, for the
> No. My `make-string-unibyte' should only work to convert "bytes in
> multibyte string" to "bytes in unibyte string": there's no char, thus no
> coding-system.

I see. In emacs-unicode, I already introduced string-to-multibyte which, I think, is the same as your make-string-multibyte. But,

> If the multibyte string argument contains a char that's
> not an eight-bit-char, then it's an error.

Then, we can't use make-string-unibyte for the current case
because, in emacs-unicode, (concat '(?a 192)) returns a
multibyte string whose second element is A-grave, not an
eight-bit-char. Am I missing something?

> To do what your string-make-unibyte does you should use
> `encode-coding-string' where the coding system is passed explicitly.

Those are conceptually different things (I remember the
similar discussion we had a while ago).

encode-coding-string does:
  char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
                --CES--> encoded-byte-sequence

string-make-unibyte does:
  char-sequence --CCS--> code-point-sequence
                --concat--> code-point-sequence

These two yield the same result only when the CCS supports all
chars in "char-sequence" and the CES is stateless
(e.g. iso-latin-1).

> I've changed my Emacs so that string-make-unibyte does the above
> (i.e. signals an error if it encounters a non-byte char) and it works fairly
> well, except for the few places where the elisp code is sloppy and needs to
> be fixed.

How did you change it? string-make-unibyte internally uses
the function copy_text. Did you change it? But, then, each
time you copy a multibyte string into a unibyte buffer, you
should get an error.

>>> Note that 1-3 are not mutually exclusive so we can use
>>> them all.
>> Yes, but, at least, I really want to avoid "(3) Make a
>> series of new functions".
>   (defun concat-unibyte (&rest x)
>     (make-string-unibyte (apply 'concat x)))
>   ...

As I wrote above, this should signal an error on:

  (concat-unibyte '(?a 192))

> so we don't need this series of new functions, but if some of them are used
> often enough, we can add them of course.

---
Ken'ichi HANDA
handa@m17n.org
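The two pipelines can be contrasted with Python codecs standing in for Emacs coding systems. This is an assumption made for illustration, not how Emacs implements them. For a single-byte CCS with a stateless CES such as Latin-1, both paths give the same bytes. ISO-2022-JP has a stateful CES that wraps the JIS X 0208 code points in escape sequences, so its output is more than a concatenation of code points. Both function names are hypothetical.

```python
# Sketch of the two pipelines, with Python codecs standing in for Emacs
# coding systems (an assumption; this is not how Emacs implements them).
def encode_coding_string(s: str, coding: str) -> bytes:
    # chars -> (CCS, code point) pairs -> bytes laid out by the CES
    return s.encode(coding)


def make_unibyte_codepoints(s: str, ccs: str) -> bytes:
    # one code point per char, simply concatenated (single-byte CCS only)
    return b"".join(ch.encode(ccs) for ch in s)


latin = "caf\u00e9"  # "café"
same = encode_coding_string(latin, "latin-1") == make_unibyte_codepoints(latin, "latin-1")
print(same)  # True: stateless, single-byte CES

jp = "\u65e5\u672c"  # "日本", two JIS X 0208 characters
jp_bytes = encode_coding_string(jp, "iso2022_jp")
print(jp_bytes)  # code points wrapped in ESC $ B ... ESC ( B
```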
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-19 3:05 UTC
Cc: emacs-devel, jas

> I see. Apart from the design itself, I agree that it's difficult to
> introduce a new type. But, when I discussed with Richard about the
> Character type object a few years ago, he was not that negative provided
> that it gives sure improvement.

Sounds about right to me: we have one free tag that we could use for chars (and that I currently use to boost the max buffer size from 256MB to 512MB in my local code). But it needs to pay for itself.

> Then, we can't use make-string-unibyte for the current case
> because, in emacs-unicode, (concat '(?a 192)) returns a
> multibyte string whose second element is A-grave, not an
> eight-bit-char. Am I missing something?

Well, obviously we need to make it accept this case (i.e. accept both the
latin-1 192 and the eight-bit-char 192). I'm sure there'll be other issues. I haven't had much time to think about it and you're obviously better placed to foresee potential problems.

>> To do what your string-make-unibyte does you should use
>> `encode-coding-string' where the coding system is passed explicitly.
> Those are conceptually different things (I remember the
> similar discussion we had a while ago).
> encode-coding-string does:
>   char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>                 --CES--> encoded-byte-sequence
> string-make-unibyte does:
>   char-sequence --CCS--> code-point-sequence
>                 --concat--> code-point-sequence
> These two yield the same result only when CCS support all
> chars in "char-sequence" and CES is stateless
> (e.g. iso-latin-1).

You lost me here (I'm a poor soul who doesn't know much outside of the
latin-1 world). I thought that string-make-unibyte only behaves meaningfully for
"normal 8bit coding-systems" such as latin-1.

>> I've changed my Emacs so that string-make-unibyte does the above
>> (i.e. signals an error if it encounters a non-byte char) and it works fairly
>> well, except for the few places where the elisp code is sloppy and needs to
>> be fixed.
> How did you change it? string-make-unibyte internally uses
> the function copy_text. Did you change it? But, then, each
> time you copy a multibyte string into a unibyte buffer, you
> should get an error.

Of course: it's an error. A unibyte buffer cannot represent multibyte chars, so you need to encode them first (into a unibyte string).

Now to tell you the truth, my change had to accept a few (not so) special
cases and it took a bit of fiddling to make the code lenient enough to
accept elisp code I didn't feel like "fixing". I can't remember the details
off-hand, but I remember having problems with regexp matching functions
where multibyte regexps are used in unibyte buffers.

-- Stefan
* Re: eight-bit char handling in emacs-unicode
From: Juri Linkov @ 2003-11-19 10:46 UTC
Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> Now to tell you the truth, my change had to accept a few (not so) special
> cases and it took a bit of fiddling to make the code lenient enough to
> accept elisp code I didn't feel like "fixing". I can't remember the details
> off-hand, but I remember having problems with regexp matching functions
> where multibyte regexps are used in unibyte buffers.

Do you mean unibyte regexps in multibyte buffers? For example,
currently gnus/message.el has a wrong regexp that prevents Gnus
from being used in some language environments. To reproduce this bug,
you can eval the following:

  (progn
    (set-language-environment 'ukrainian)
    (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))

It fails with (invalid-regexp "Invalid range end").

Could you suggest how to fix this bug?

-- 
http://www.jurta.org/emacs/
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-19 13:48 UTC
Cc: emacs-devel

>> Now to tell you the truth, my change had to accept a few (not so) special
>> cases and it took a bit of fiddling to make the code lenient enough to
>> accept elisp code I didn't feel like "fixing". I can't remember the details
>> off-hand, but I remember having problems with regexp matching functions
>> where multibyte regexps are used in unibyte buffers.
> Do you mean unibyte regexps in multibyte buffers? For example,

No: multibyte is a superset of unibyte, so there's no problem searching for unibyte elements in a multibyte sequence.

> currently gnus/message.el has a wrong regexp that prevents Gnus
> from being used in some language environments. To reproduce this bug,
> you can eval the following:
>   (progn
>     (set-language-environment 'ukrainian)
>     (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))

In my Emacs this doesn't fail because the unibyte string is turned into multibyte without looking at the coding-system (i.e. it will only match ASCII and chars from eight-bit-control or eight-bit-graphic: probably not what the author intended).

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-20 23:41 UTC
Cc: monnier, emacs-devel

In article <87ptfovdnj.fsf@mail.jurta.org>, Juri Linkov <juri@jurta.org> writes:

>   (progn
>     (set-language-environment 'ukrainian)
>     (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))
> It fails with (invalid-regexp "Invalid range end").
> Could you suggest how to fix this bug?

The current Emacs simply converts the unibyte regexp string to multibyte. In the Ukrainian language environment, nonascii-translation-table converts ?\200 to 299040 but ?\237 to 2295, so the above regexp leads to "Invalid range end".

This behaviour itself is a bug. We must treat \200-\237 the same way as \200\201...\236\237 (emacs-unicode already does that).

But fixing that bug doesn't solve the Gnus problem, because the intention of the part "\200-\237" is apparently to match C1 control chars, not their multibyte equivalents in the current language environment. So changing the above as below is correct:

  (re-search-forward
   (string-as-multibyte "[\000-\007\013\015-\032\034-\037\200-\237]")
   nil t)

---
Ken'ichi HANDA
handa@m17n.org
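The intent Handa identifies is to match control characters by their raw code points, independent of any language environment. A regexp engine with separate character and byte patterns makes that intent explicit. The Python sketch below is an analogy, not the Gnus fix. It uses the same ranges as the elisp regexp, with the octal escapes rewritten in hex: `\000-\007` becomes `\x00-\x07` and `\200-\237` becomes `\x80-\x9f`.

```python
import re

# Same ranges as the Gnus regexp, written in hex.  Python str patterns
# always read \x80-\x9f as the code points U+0080..U+009F (C1 controls).
CONTROL_RE = re.compile(r"[\x00-\x07\x0b\x0d-\x1a\x1c-\x1f\x80-\x9f]")
# The same ranges applied to raw octets, for data that was never decoded.
CONTROL_BYTES_RE = re.compile(rb"[\x00-\x07\x0b\x0d-\x1a\x1c-\x1f\x80-\x9f]")

print(bool(CONTROL_RE.search("plain text")))     # False
print(bool(CONTROL_RE.search("bad\x85char")))    # True: U+0085 is a C1 control
print(bool(CONTROL_RE.search("\u0416")))         # False: Cyrillic Ж is not
print(bool(CONTROL_BYTES_RE.search(b"x\x9by")))  # True: byte 0x9B
```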
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-21 0:41 UTC
Cc: jas, emacs-devel

In article <jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> I see. Apart from the design itself, I agree that it's difficult to
>> introduce a new type. But, when I discussed with Richard about the
>> Character type object a few years ago, he was not that negative provided
>> that it gives sure improvement.
> Sounds about right to me: we have one free tag that we could use for chars

Yes, and as that is the last free tag, I still hesitate to consume it for the Character object.

>> Then, we can't use make-string-unibyte for the current case
>> because, in emacs-unicode, (concat '(?a 192)) returns a
>> multibyte string whose second element is A-grave, not an
>> eight-bit-char. Am I missing something?
> Well, obviously we need to make it accept this case (i.e. accept both the
> latin-1 192 and the eight-bit-char 192).

Then, I see your intention. But, isn't the semantics of such a function very weird?

>>> To do what your string-make-unibyte does you should use
>>> `encode-coding-string' where the coding system is passed explicitly.
>> Those are conceptually different things (I remember the
>> similar discussion we had a while ago).
>> encode-coding-string does:
>>   char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>>                 --CES--> encoded-byte-sequence
>> string-make-unibyte does:
>>   char-sequence --CCS--> code-point-sequence
>>                 --concat--> code-point-sequence
>> These two yield the same result only when CCS support all
>> chars in "char-sequence" and CES is stateless
>> (e.g. iso-latin-1).
> You lost me here (I'm a poor soul who doesn't know much outside of the
> latin-1 world).

CCS: Coded Character Set
CES: Character Encoding Scheme
coding-system of Emacs: a set of CCSs plus a CES.
  iso-latin-1: CCSs are ascii and latin-iso8859-1; CES is the 8-bit version of ISO-2022
  iso-2022-jp: CCSs are ascii, japanese-jisx0208, ...; CES is the 7-bit version of ISO-2022

> I thought that string-make-unibyte only behaves meaningfully for
> "normal 8bit coding-systems" such as latin-1.

Yes, but it doesn't mean it is conceptually the same as
encode-coding-string. The result of string-make-unibyte
should still be regarded as a sequence of characters, but the
result of encode-coding-string is a sequence of bytes.

Here exists an ambiguity of a unibyte string.
The number 192 can be regarded as:
  (1) just a number, a byte
  (2) a code point of some character set.
  (3) a character code

A unibyte string can contain (1) and (2) without
distinguishing them, but a multibyte string can contain (1)
and (3) while distinguishing them.

---
Ken'ichi HANDA
handa@m17n.org
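Handa's three readings of the number 192 can be separated explicitly in Python, which has distinct types for bytes and characters. This is an illustrative model only. `cp850` is an arbitrary second 8-bit charset, chosen to show that the same code point can name a different character.

```python
# Handa's three readings of the number 192, modelled in Python.
n = 192
as_byte = bytes([n])                   # (1) just a byte: b'\xc0'
as_latin1 = as_byte.decode("latin-1")  # (2) code point 192 of ISO-8859-1
as_cp850 = as_byte.decode("cp850")     # (2) code point 192 of another charset
as_char = chr(n)                       # (3) the character code U+00C0

print(as_latin1 == as_char)     # True: the two coincide for Latin-1
print(as_cp850 == as_char)      # False: same byte, different character
print(as_char.encode("utf-8"))  # b'\xc3\x80': the char as UTF-8 bytes
```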
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-21 5:27 UTC
Cc: jas, emacs-devel

>> I thought that string-make-unibyte only behaves meaningfully for
>> "normal 8bit coding-systems" such as latin-1.
> Yes, but it doesn't mean it is conceptually the same as
> encode-coding-string. The result of string-make-unibyte
> should still be regarded as a sequence of characters, but the
> result of encode-coding-string is a sequence of bytes.

Why/when is the distinction meaningful (given the fact that it can only be used meaningfully with 8bit coding-systems where the distinction seems more philosophical than anything else)?

> Here exists an ambiguity of a unibyte string.
> The number 192 can be regarded as:
>   (1) just a number, a byte
>   (2) a code point of some character set.
>   (3) a character code

But the second case is only possible for 8bit character sets, right?

Until now, I always thought that Emacs only dealt with
- byte streams representing encoded sequences of code points: case 1.
- sequences of internal character codes (internally encoded in emacs-mule
  or unicode depending on the branch you use): case 3.
Is there any place where we deal with sequences of code points of external
charsets really (other than in the degenerate case where such a sequence
is indistinguishable from case 1, maybe)?

> A unibyte string can contain (1) and (2) without
> distinguishing them, but a multibyte string can contain (1)
> and (3) while distinguishing them.

Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
character `a'?

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-21 6:27 UTC
Cc: jas, emacs-devel

In article <jwvzneqwbo3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> Yes, but it doesn't mean it is conceptually the same as
>> encode-coding-string. The result of string-make-unibyte
>> should still be regarded as a sequence of characters, but the
>> result of encode-coding-string is a sequence of bytes.
> Why/when is the distinction meaningful (given the fact that it
> can only be used meaningfully with 8bit coding-systems where the
> distinction seems more philosophical than anything else)?

It is perfectly possible to live in such an environment
where only the charset iso-8859-1 is used but only the
coding system utf-8 is used. In this environment, the
results of encode-coding-string and string-make-unibyte are
of course not the same, but still both operations are
meaningful.

>> Here exists an ambiguity of a unibyte string.
>> The number 192 can be regarded as:
>>   (1) just a number, a byte
>>   (2) a code point of some character set.
>>   (3) a character code
> But the second case is only possible for 8bit character sets, right?

Yes. But, as I wrote above, it doesn't mean that we are restricted to simple 8bit-oriented coding-systems.

> Until now, I always thought that Emacs only dealt with
> - byte streams representing encoded sequences of code points: case 1.
> - sequences of internal character codes (internally encoded in emacs-mule
>   or unicode depending on the branch you use): case 3.
> Is there any place where we deal with sequences of code points of external
> charsets really (other than in the degenerate case where such a sequence
> is indistinguishable from case 1, maybe)?

I'd like to repeat that although we don't have such an
environment now, it doesn't mean it is impossible to assume
such an environment.

>> A unibyte string can contain (1) and (2) without
>> distinguishing them, but a multibyte string can contain (1)
>> and (3) while distinguishing them.
> Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
> character `a'?

Good point. Of course not. I dared not mention that, to keep the discussion simpler.

---
Ken'ichi HANDA
handa@m17n.org
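Handa's example environment can be modelled with Python codecs. This is an assumption made for illustration, not Emacs internals. The only charset in use is ISO-8859-1, but the coding system is UTF-8. The encode-coding-string model produces UTF-8 bytes. The string-make-unibyte model produces exactly one Latin-1 code point per character.

```python
# Handa's example environment, modelled with Python codecs: the only
# charset in use is ISO-8859-1, but the coding system is UTF-8.
s = "caf\u00e9"                # "café"
encoded = s.encode("utf-8")    # encode-coding-string model
unibyte = s.encode("latin-1")  # string-make-unibyte model

print(encoded, len(encoded))   # b'caf\xc3\xa9' 5
print(unibyte, len(unibyte))   # b'caf\xe9' 4
```

Only the second result keeps one unit per character, which is why it can still be read as a sequence of characters in this environment.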
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-21 14:59 UTC
Cc: jas, emacs-devel

>> Why/when is the distinction meaningful (given the fact that it
>> can only be used meaningfully with 8bit coding-systems where the
>> distinction seems more philosophical than anything else)?
> It is perfectly possible to live in such an environment
> where only the charset iso-8859-1 is used but only the
> coding system utf-8 is used. In this environment, the
> results of encode-coding-string and string-make-unibyte are
> of course not the same, but still both operations are
> meaningful.

I see that encode-coding-string does the utf-8 encoding, but what
does string-make-unibyte do in such a case and what is it used for?

>> Until now, I always thought that Emacs only dealt with
>> - byte streams representing encoded sequences of code points: case 1.
>> - sequences of internal character codes (internally encoded in emacs-mule
>>   or unicode depending on the branch you use): case 3.
>> Is there any place where we deal with sequences of code points of external
>> charsets really (other than in the degenerate case where such a sequence
>> is indistinguishable from case 1, maybe)?
> I'd like to repeat that although we don't have such an
> environment now, it doesn't mean it is impossible to assume
> such an environment.

I guess I don't understand how that is possible (and useful) and what that
would look like.

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-22 1:25 UTC
Cc: jas, emacs-devel

In article <jwvvfpdsrab.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>> It is perfectly possible to live in such an environment
>> where only the charset iso-8859-1 is used but only the
>> coding system utf-8 is used. In this environment, the
>> results of encode-coding-string and string-make-unibyte are
>> of course not the same, but still both operations are
>> meaningful.
> I see that encode-coding-string does the utf-8 encoding, but what
> does string-make-unibyte do in such a case and what is it used for?

It gets iso-8859-1 code-points of all characters in a multibyte string and concatenates them (the same as what it does in latin-1 lang. env.). In such an environment, the user has no problem using a unibyte buffer, because it can represent all the characters he wants.

>>> Until now, I always thought that Emacs only dealt with
>>> - byte streams representing encoded sequences of code points: case 1.
>>> - sequences of internal character codes (internally encoded in emacs-mule
>>> or unicode depending on the branch you use): case 3.
>>> Is there any place where we deal with sequences of code points of external
>>> charsets really (other than in the degenerate case where such a sequence
>>> is indistinguishable from case 1, maybe).
>> I'd like to repeat that although we don't have such an
>> environment now,

Ah, no, we have UTF-8 lang. env. now.

>> it doesn't mean it is impossible to assume such
>> environment.
> I guess I don't understand how that is possible (and useful) and what that
> would look like.

Please try C-x C-m L utf-8 RET and see how string-make-unibyte and string-make-multibyte work.

---
Ken'ichi HANDA
handa@m17n.org
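The distinction Handa draws can be sketched with a small Python model. It is only an analogy under stated assumptions: the language environment's character set is ISO-8859-1, its default coding system is UTF-8, and `make_unibyte` and `encode` are hypothetical stand-ins for string-make-unibyte and encode-coding-string, not Emacs code.

```python
# Hypothetical model (not Emacs code): the language environment's
# character set is ISO-8859-1, but its default coding system is UTF-8.
CHARSET = "latin-1"
CODING = "utf-8"


def make_unibyte(s: str) -> bytes:
    """Stand-in for string-make-unibyte as described above: take each
    character's code point in CHARSET and concatenate them, one byte
    per character."""
    return s.encode(CHARSET)


def encode(s: str) -> bytes:
    """Stand-in for encode-coding-string with the default coding system."""
    return s.encode(CODING)


text = "caf\u00e9"  # "café"
print(len(text), len(make_unibyte(text)), len(encode(text)))  # 4 4 5
```

Both results are well-defined byte strings, but only the first keeps one byte per character, which is what lets a unibyte buffer hold the text unchanged in length.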
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-22 23:53 UTC
Cc: jas, emacs-devel

>>> It is perfectly possible to live in such an environment
>>> where only the charset iso-8859-1 is used but only the
>>> coding system utf-8 is used. In this environment, the
>>> results of encode-coding-string and string-make-unibyte are
>>> of course not the same, but still both operations are
>>> meaningful.
>> I see that encode-coding-string does the utf-8 encoding, but what
>> does string-make-unibyte do in such a case and what is it used for?
> It gets iso-8859-1 code-points of all characters in a
> multibyte string and concatenates them (the same as what it
> does in latin-1 lang. env.).

You mean it does the same as (encode-coding-string str 'latin-1)? Then why use string-make-unibyte?

> Please try C-x C-m L utf-8 RET and see how
> string-make-unibyte and string-make-multibyte work.

I'll try that, but I'd like to understand the motivation for making it work the way it works. I've always understood those two as "trying to DTRT" in a very ad-hoc way such that people that used to work in an 8bit non-ASCII environment don't need to worry about coding-systems and still have things working mostly correctly.

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-23 7:30 UTC
Cc: jas, emacs-devel

In article <jwvoev4ufqd.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>>> It is perfectly possible to live in such an environment
>>>> where only the charset iso-8859-1 is used but only the
>>>> coding system utf-8 is used. In this environment, the
>>>> results of encode-coding-string and string-make-unibyte are
>>>> of course not the same, but still both operations are
>>>> meaningful.
>>> I see that encode-coding-string does the utf-8 encoding, but what
>>> does string-make-unibyte do in such a case and what is it used for?
>> It gets iso-8859-1 code-points of all characters in a
>> multibyte string and concatenates them (the same as what it
>> does in latin-1 lang. env.).
> You mean it does the same as (encode-coding-string str 'latin-1)?

Not exactly the same when STR contains, for instance, Cyrillic characters. The two operations deal with unsupported characters differently. Encode-coding-string may behave leniently so that the result can be decoded back correctly (perhaps by adding some escape sequence). But, string-make-unibyte should never change the number of characters. And,

> Then why use string-make-unibyte?

There's no way to know that we should use the coding-system latin-1 in this situation. All we know is that the default coding-system is utf-8, and the default character set is iso-8859-1.

>> Please try C-x C-m L utf-8 RET and see how
>> string-make-unibyte and string-make-multibyte work.
> I'll try that, but I'd like to understand the motivation for making it work
> the way it works. I've always understood those two as "trying to DTRT" in
> a very ad-hoc way such that people that used to work in an 8bit non-ASCII
> environment don't need to worry about coding-systems and still have
> things working mostly correctly.

Doing unibyte<->multibyte conversion automatically may be an ad-hoc way. The way they work for unsupported characters may also be an ad-hoc way. But, the concept of unibyte<->multibyte conversion itself is not ad-hoc. Don't you think their meaning is very clear when you grasp them as my way? Do you see any inconsistency in my explanation about them?

---
Ken'ichi HANDA
handa@m17n.org
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-23 23:48 UTC
Cc: jas, emacs-devel

> But, the concept of unibyte<->multibyte conversion itself is
> not ad-hoc. Don't you think their meaning is very clear
> when you grasp them as my way? Do you see any inconsistency
> in my explanation about them?

No, as a matter of fact I don't see why in a utf-8 environment, it makes any sense to have a function that turns a multibyte string into a unibyte string encoded in latin-1 (without even complaining when it encounters other characters).

It'd make sense if the environment said "latin-1 when you can, utf-8 otherwise" or something like that, but then we would use encode-coding-string anyway.

Besides, if any non-latin-1 char is encountered by string-make-unibyte, then we end up with a unibyte string that has an unknown meaning because some chars might have been encoded in latin-1, and others in some other encoding.

I just don't know of a concrete case where it makes sense to use string-make-unibyte.

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-25 1:07 UTC
Cc: jas, emacs-devel

In article <jwvr7zybqvr.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>> But, the concept of unibyte<->multibyte conversion itself is
>> not ad-hoc. Don't you think their meaning is very clear
>> when you grasp them as my way? Do you see any inconsistency
>> in my explanation about them?
> No, as a matter of fact I don't see why in a utf-8 environment,
> it makes any sense to have a function that turns a multibyte string
> into a unibyte string encoded in latin-1

It seems that you keep saying that "A does B, thus it's nonsense". But, I'm arguing that "A does C".

It doesn't make sense because you treat the result as "a unibyte string encoded in Latin-1". It makes sense if you treat the result as "a unibyte string in which each byte represents a sequence of Unicode code-points", doesn't it?

> (without even complaining when it encounters other
> characters).

I think it's ok (or better) that string-make-unibyte complains in such a case.

> It'd make sense if the environment said "latin-1 when you can,
> utf-8 otherwise" or something like that, but then we would use
> encode-coding-string anyway.

It's itself nonsense to have such a coding system.

Do you agree with having string-make-unibyte if it signals an error on non-Latin-1 characters?

> Besides, if any non-latin-1 char is encountered by string-make-unibyte, then
> we end up with a unibyte string that has an unknown meaning because some
> chars might have been encoded in latin-1, and others in some other encoding.
> I just don't know of a concrete case where it makes sense to use
> string-make-unibyte.

I'll paraphrase my previous example as follows:

It is perfectly possible to live in such an environment where only the characters U+0000..U+00FF of Unicode are used but only the coding system utf-8 is used.

But, I don't claim that the above is a realistic case. Another non-realistic but concrete case is:

Use only the charset iso-8859-5 and the encoding CTEXT.

---
Ken'ichi HANDA
handa@m17n.org
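The stricter variant under discussion here, where string-make-unibyte signals an error on characters outside the environment's charset, might look like the following hypothetical Python sketch. The charset is a parameter so that both of Handa's examples (ISO-8859-1 and ISO-8859-5) can be tried; note that no coding system appears anywhere, which matches his claim that only the charset matters.

```python
def make_unibyte_strict(s: str, charset: str) -> bytes:
    """Hypothetical strict conversion: one byte per character, using only
    the language environment's single-byte charset. A character the
    charset cannot represent raises ValueError instead of being mangled."""
    try:
        return s.encode(charset)  # default errors="strict"
    except UnicodeEncodeError as err:
        bad = s[err.start]
        raise ValueError(f"{bad!r} is not in charset {charset}") from None


print(make_unibyte_strict("caf\u00e9", "latin-1"))  # b'caf\xe9'
print(make_unibyte_strict("\u041f", "iso-8859-5"))  # b'\xbf'
```

With `"iso-8859-5"` as the charset, Cyrillic text converts byte for byte, while a Latin-1 letter such as `é` is rejected.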
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-26 0:07 UTC
Cc: jas, emacs-devel

In article <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>> It seems that you keep saying that "A does B, thus it's
>> nonsense". But, I'm arguing that "A does C".
> Well, the thing is: I still don't understand what is C.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",

Yes, that's what I said.

> but since the output is a unibyte string,
> that restricts it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic

Yes. But I thought whether it is generic or not is not the point here.

> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).

My examples show that we can't use encode-coding-string. How can we use encode-coding-string without knowing what coding system to use? I haven't heard your answer yet.

>> It doesn't make sense because you treat the result as "a
>> unibyte string encoded in Latin-1".
>> It makes sense if you treat the result as "a unibyte string
>> in which each byte represents a sequence of Unicode
>> code-points", doesn't it?
> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it?

Yes. And that covers all the characters the user uses in this case.

>>> It'd make sense if the environment said "latin-1 when you can,
>>> utf-8 otherwise" or something like that, but then we would use
>>> encode-coding-string anyway.
>> It's itself nonsense to have such a coding system.
> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).

Ah, I see. But, my understanding is that string-make-unibyte/multibyte are designed not to change the number of characters to make the difference of unibyte/multibyte transparent in Lisp. That restriction leads to cases where non-supported characters are handled incorrectly. But, I think Richard's design policy was that incorrect handling of non-supported characters is better than a possibly more disastrous error caused by a change in the number of characters.

>> Do you agree with having string-make-unibyte if it signals an error on
>> non-Latin-1 characters?
> Of course: that's pretty much what I suggested: make-string-unibyte only
> accepts multibyte chars that correspond to "bytes".

I agree with that. But, it just changes the behaviour of the function in the error case. It doesn't change the concept of what it does.

>>> I just don't know of a concrete case where it makes sense to use
>>> string-make-unibyte.
>> I'll paraphrase my previous example as follows:
>> It is perfectly possible to live in such an environment
>> where only the characters U+0000..U+00FF of Unicode are
>> used but only the coding system utf-8 is used.
>> But, I don't claim that the above is a realistic case.
>> Another non-realistic but concrete case is:
>> Use only the charset iso-8859-5 and the encoding CTEXT.
> I don't see any use of string-make-unibyte in your two examples.

Again, I'd like to ask how to use encode-coding-string without knowing the proper coding-system in each case.

> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.

In the second case, of course, the "supported characters" are those included in the charset iso-8859-5, and string-make-unibyte should accept them. Again, the result is the same as encoding with the coding system iso-8859-5, but here we only know about the coding system CTEXT.

---
Ken'ichi HANDA
handa@m17n.org
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-26 14:14 UTC
Cc: jas, emacs-devel

>> but since the output is a unibyte string,
>> that restricts it to cases where the code-points can be encoded in 8 bits,
>> thus it doesn't sound very generic
> Yes. But I thought whether it is generic or not is not the point here.

Except that if it's not generic (in the sense that it does not behave meaningfully in all language environments), then it can't be used in generic elisp code, right?

>> and I don't see any application for it
>> (nor do I see any practical difference with using encode-coding-string
>> since the output AFAIK would be the same).
> My examples show that we can't use encode-coding-string.
> How can we use encode-coding-string without knowing what
> coding system to use? I haven't heard your answer yet.

I can't answer this question without knowing the answer to my question: what is string-make-unibyte used for?

I'm not saying that we can do something like:

(defun string-make-unibyte (s) (encode-coding-string s <blabla>))

but I'm saying that everywhere the current string-make-unibyte is used, we should be able to easily replace it by a call to encode-coding-string or a call to my make-string-unibyte (which does not pay attention to the language environment and only accepts multibyte chars that correspond to bytes, i.e. eight-bit-control or eight-bit-graphic, or ASCII, and multibyte chars whose internal code point is 128-255).

> But, my understanding is that
> string-make-unibyte/multibyte are designed not to change the
> number of characters to make the difference of
> unibyte/multibyte transparent in Lisp.

That is indeed an absolute requirement.

>> Of course: that's pretty much what I suggested: make-string-unibyte only
>> accepts multibyte chars that correspond to "bytes".
> I agree with that. But, it just changes the behaviour of
> the function in the error case. It doesn't change the concept
> of what it does.

Except that I said "byte", not "code point", which makes a difference in non-latin-1 locales.

>> I don't see any use of string-make-unibyte in your two examples.
> Again, I'd like to ask how to use encode-coding-string
> without knowing the proper coding-system in each case.

How could I know the coding-system to use when replacing `string-make-unibyte' if I don't have any actual call to string-make-unibyte to work with?

Stefan
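Stefan's proposed make-string-unibyte can be read as a conversion that ignores the language environment entirely. The Python sketch below encodes our reading of his message, which is an assumption: ASCII, raw eight-bit bytes, and characters with code points 128-255 all map to the byte of the same value, and anything else is an error. Raw bytes inside a string are modelled with Python's `surrogateescape` convention (byte 0xNN appears as U+DCNN), which is only an analogy for Emacs's eight-bit-control/eight-bit-graphic characters; the function name is hypothetical.

```python
def make_string_unibyte(s: str) -> bytes:
    """Hypothetical reading of Stefan's proposal: accept only characters
    that stand for bytes, independent of the language environment.

    Raw bytes inside a str use Python's "surrogateescape" convention
    (byte 0xNN, NN >= 0x80, appears as the lone surrogate U+DCNN)."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp <= 0xFF:                # ASCII, or code point 128-255
            out.append(cp)
        elif 0xDC80 <= cp <= 0xDCFF:  # raw byte 0x80-0xFF
            out.append(cp - 0xDC00)
        else:                         # a genuine non-byte character
            raise ValueError(f"{ch!r} does not correspond to a byte")
    return bytes(out)


raw_ff = b"\xff".decode("ascii", errors="surrogateescape")  # "\udcff"
print(make_string_unibyte("ab\u00e9" + raw_ff))  # b'ab\xe9\xff'
```

Unlike a charset-based conversion, this gives the same answer in every locale: U+0141 (Ł) is rejected even in a Latin-2 environment, where converting through the Latin-2 charset would map it to a byte. That is the "byte, not code point" difference Stefan points out.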
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-11-27 1:34 UTC
Cc: jas, emacs-devel

In article <jwvhe0rp6ml.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>> but since the output is a unibyte string,
>>> that restricts it to cases where the code-points can be encoded in 8 bits,
>>> thus it doesn't sound very generic
>> Yes. But I thought whether it is generic or not is not the point here.
> Except that if it's not generic (in the sense that it does not behave
> meaningfully in all language environments), then it can't be used in generic
> elisp code, right?

Yes. But, it simply means that inserting a multibyte string into a unibyte buffer can't be generic.

>> My examples show that we can't use encode-coding-string.
>> How can we use encode-coding-string without knowing what
>> coding system to use? I haven't heard your answer yet.
> I can't answer this question without knowing the answer to my question:
> what is string-make-unibyte used for?

It is used for converting a multibyte string to unibyte before it is inserted in a unibyte buffer.

> I'm not saying that we can do something like:
> (defun string-make-unibyte (s) (encode-coding-string s <blabla>))

??? I thought you were saying exactly that, because you wrote this earlier:

> To do what your string-make-unibyte does you should use
> `encode-coding-string' where the coding system is passed explicitly.

Anyway,

> but I'm saying that everywhere the current string-make-unibyte is
> used, we should be able to easily replace it by a call to
> encode-coding-string or a call to my make-string-unibyte (which does
> not pay attention to the language environment and only accepts multibyte
> chars that correspond to bytes, i.e. eight-bit-control or
> eight-bit-graphic, or ASCII, and multibyte chars whose internal code point
> is 128-255).

It's an ambiguous statement. Which are you saying? Replace string-make-unibyte by:

(1) encode-coding-string or make-string-unibyte.
(2) code that applies encode-coding-string or make-string-unibyte to the whole string depending on something (perhaps on the input string?).
(3) code that applies encode-coding-string to substrings where that is appropriate, and applies make-string-unibyte to the remaining substrings.
(4) something that I still don't understand.

>>> I don't see any use of string-make-unibyte in your two examples.
>> Again, I'd like to ask how to use encode-coding-string
>> without knowing the proper coding-system in each case.
> How could I know the coding-system to use when replacing
> `string-make-unibyte' if I don't have any actual call to
> string-make-unibyte to work with?

What strange logic! You have been arguing that we should replace string-make-unibyte with something that uses encode-coding-string. Then you should have an idea about what coding-system to use for encode-coding-string.

---
Ken'ichi HANDA
handa@m17n.org
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-11-27 14:23 UTC
Cc: jas, emacs-devel

>> I can't answer this question without knowing the answer to my question:
>> what is string-make-unibyte used for?
> It is used for converting a multibyte string to unibyte
> before it is inserted in a unibyte buffer.

I meant `what is "converting from multibyte to unibyte" used for'. I.e. it can be used for different things in different contexts and I can't answer in general, so I need a concrete case.

> It's an ambiguous statement. Which are you saying?
> Replace string-make-unibyte by:
> (1) encode-coding-string or make-string-unibyte.
> (2) code that applies encode-coding-string or
> make-string-unibyte to the whole string depending on
> something (perhaps on the input string?).
> (3) code that applies encode-coding-string to substrings
> where that is appropriate, and applies make-string-unibyte
> to the remaining substrings.
> (4) something that I still don't understand.

I'm saying that each *call* to string-make-unibyte can be replaced by a call to either encode-coding-string or make-string-unibyte. But the decision of which to use and which coding-system to use depends on the context.

Now why would we want to do the work of changing all those calls? Because all those that would use encode-coding-string are incorrect in using string-make-unibyte because they won't do the right thing in some language environments.

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-12-01 0:43 UTC
Cc: jas, emacs-devel

In article <jwvad6hlwu1.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>> I can't answer this question without knowing the answer to my question:
>>> what is string-make-unibyte used for?
>> It is used for converting a multibyte string to unibyte
>> before it is inserted in a unibyte buffer.
> I meant `what is "converting from multibyte to unibyte" used for'.
> I.e. it can be used for different things in different contexts and I can't
> answer in general, so I need a concrete case.

It is used for not losing information about text even if you kill text in a multibyte buffer and paste it in a unibyte buffer. When you kill the just-pasted text of the unibyte buffer and paste it back into the original multibyte buffer, you recover the same character sequence.

Anyway, I already showed you this example:

In a Latin-2 environment where the default encoding is CTEXT.

In that case too, inserting a multibyte latin-2 string into a unibyte buffer works the same way as in this case:

In a Latin-2 environment where the default encoding is iso-latin-2.

And, that's because the functionality of string-make-unibyte doesn't have to know about the coding system. All it has to know is which character set to use.

If you can't answer in general, please answer this concrete question. In a Latin-2 environment where one's primary character set is latin-iso8859-2 but the default encoding is CTEXT, how do you make insertion of a multibyte string (containing only latin-iso8859-2 characters) into a unibyte buffer work with your method? Such an insertion may happen when a user kills text in a multibyte buffer and yanks it into a unibyte buffer.

>> It's an ambiguous statement. Which are you saying?
>> Replace string-make-unibyte by:
>> (1) encode-coding-string or make-string-unibyte.
>> (2) code that applies encode-coding-string or
>> make-string-unibyte to the whole string depending on
>> something (perhaps on the input string?).
>> (3) code that applies encode-coding-string to substrings
>> where that is appropriate, and applies make-string-unibyte
>> to the remaining substrings.
>> (4) something that I still don't understand.
> I'm saying that each *call* to string-make-unibyte can be replaced
> by a call to either encode-coding-string or make-string-unibyte.
> But the decision of which to use and which coding-system to use
> depends on the context.

Are you talking about the actual Emacs Lisp code that explicitly calls make-string-unibyte? I've been talking about the functionality of make-string-unibyte itself, especially about the implicit call to the C function copy_text that does the same thing as make-string-unibyte. Is that why we seem to be talking at cross purposes?

> Now why would we want to do the work of changing all those calls?
> Because all those that would use encode-coding-string are incorrect
> in using string-make-unibyte because they won't do the right thing
> in some language environments.

What is the right thing to do when a multibyte Japanese text is being pasted into a unibyte buffer? I think signalling an error is the only right thing, and I've never objected to making copy_text and Fstring_make_unibyte signal an error in such a case.

---
Ken'ichi HANDA
handa@m17n.org
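Handa's kill/yank scenario can be modelled as a round trip. In this hypothetical Python sketch, only the Latin-2 charset is consulted; the CTEXT default encoding from his example never enters the computation, which illustrates why no coding system has to be chosen. The function names are illustrative stand-ins, not Emacs APIs.

```python
# Hypothetical model of the kill/yank scenario in a Latin-2 language
# environment. Only the environment's charset is consulted; the default
# coding system (CTEXT in Handa's example) plays no part.
CHARSET = "iso-8859-2"


def make_unibyte(s: str) -> bytes:
    """Multibyte text -> unibyte buffer: one byte per character."""
    return s.encode(CHARSET)


def make_multibyte(b: bytes) -> str:
    """Unibyte buffer -> multibyte text: one character per byte."""
    return b.decode(CHARSET)


killed = "\u0141\u00f3d\u017a"  # "Łódź", killed in a multibyte buffer
yanked = make_unibyte(killed)   # yanked into a unibyte buffer
print(len(killed), len(yanked))           # 4 4
print(make_multibyte(yanked) == killed)   # True
```

The round trip only recovers the original text because every character belongs to the Latin-2 charset; that is the "luck" Stefan objects to in his reply.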
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-12-01 16:15 UTC
Cc: jas, emacs-devel

> It is used for not losing information about text even if
> you kill text in a multibyte buffer and paste it in a
> unibyte buffer.

That's the kind of concrete case I needed, thank you. Now I'll have to go back and reread the thread to understand things better.

Are there other cases like that?

Also, should we really allow such a thing? I mean, it's a dangerous operation since it only works if the user is lucky enough to use just the right subset of characters. So we should at least signal an error if the conversion is unsafe (in that make-string-multibyte will not recover the original string).

BTW, in what kind of circumstances is the user presented with both a multibyte buffer and a unibyte buffer?

> Are you talking about the actual Emacs Lisp code that
> explicitly calls make-string-unibyte? I've been talking
> about the functionality of make-string-unibyte itself,
> especially about the implicit call to the C function
> copy_text that does the same thing as make-string-unibyte.
> Is that why we seem to be talking at cross
> purposes?

I'm talking about both.

> What is the right thing to do when a multibyte Japanese text
> is being pasted into a unibyte buffer?
> I think signalling an error is the only right thing, and
> I've never objected to making copy_text and
> Fstring_make_unibyte signal an error in such a case.

I agree on the signalling, of course; I just want to push it further and signal even when pasting latin-2 multibyte text into a unibyte buffer. After all, why should Slovak users be able to do that but Japanese users not? In my view, every time we use this kind of thing, we're taking a temporary shortcut that is "good enough for 8bit users" but not for the rest of the world.

AFAIK, unibyte buffers should only be used internally and never presented to the user. This is because unibyte buffers contain bytes (in my view) whereas the user wants to see characters.

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa @ 2003-12-02 13:07 UTC
Cc: jas, emacs-devel

In article <jwvd6b8ttfj.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>> It is used for not losing information about text even if
>> you kill text in a multibyte buffer and paste it in a
>> unibyte buffer.
> That's the kind of concrete case I needed, thank you.

I'm very glad that now we can start to argue on the same wavelength.

> Now I'll have to go back and reread the thread to understand things
> better.

Please.

> Are there other cases like that?

For instance, searching for a multibyte string in a unibyte buffer. But if we are searching for a regular expression that contains a character range (e.g. [a-z]), the current simple multibyte->unibyte conversion doesn't work in many cases. I fixed that in the unicode branch.

> Also, should we really allow such a thing?

I myself tend to agree with dropping this kind of unibyte support, but that should be decided by Richard.

> I mean, it's a dangerous operation since it only works if the user
> is lucky enough to use just the right subset of
> characters.

But we can expect such luck in many situations where people mostly use only characters belonging to their primary charset.

> So we should at least signal an error if the conversion is
> unsafe (in that make-string-multibyte will not recover the
> original string).

Shall we test it with HEAD to check how often such an error occurs?

> BTW, in what kind of circumstances is the user presented with both
> a multibyte buffer and a unibyte buffer?

Even if one starts Emacs with --unibyte, Emacs sometimes makes a multibyte buffer (e.g. C-h h). And even if one starts Emacs with --multibyte, he may have a file that contains, for instance, latin-1 characters and raw-byte data, and he may want to read such a file with the coding system raw-text (then C-x = always shows \000..\377).

>> Are you talking about the actual Emacs Lisp code that
>> explicitly calls make-string-unibyte? I've been talking
>> about the functionality of make-string-unibyte itself,
>> especially about the implicit call to the C function
>> copy_text that does the same thing as make-string-unibyte.
>> Is that why we seem to be talking at cross
>> purposes?
> I'm talking about both.

> I agree on the signalling, of course; I just want to push it further
> and signal even when pasting latin-2 multibyte text into a unibyte buffer.
> After all, why should Slovak users be able to do that but Japanese users
> not? In my view, every time we use this kind of thing, we're taking
> a temporary shortcut that is "good enough for 8bit users" but not for the
> rest of the world.

The fact that something doesn't work for double-byte charset users can't be a strong enough reason for dropping it for single-byte charset users.

> AFAIK, unibyte buffers should only be used internally and never presented
> to the user. This is because unibyte buffers contain bytes (in my view)
> whereas the user wants to see characters.

I agree that is a very clean view, and I myself have expressed the same thing several times. But it seems that Richard doesn't want to drop the current way of unibyte support.

---
Ken'ichi HANDA
handa@m17n.org
* Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier @ 2003-12-02 16:06 UTC
Cc: jas, emacs-devel

>> So we should at least signal an error if the conversion is
>> unsafe (in that make-string-multibyte will not recover the
>> original string).
> Shall we test it with HEAD to check how often such an error
> occurs?

That would be great.

>> BTW, in what kind of circumstances is the user presented with both
>> a multibyte buffer and a unibyte buffer?
> Even if one starts Emacs with --unibyte, Emacs sometimes
> makes a multibyte buffer (e.g. C-h h).

I guess in a unibyte session it makes sense, because in such a case unibyte buffers do contain characters and the user explicitly tells us "don't bother me about multiple charsets, just pretend all fits within 8bits".

> And even if one starts Emacs with --multibyte, he may have a file that
> contains, for instance, latin-1 characters and raw-byte data, and he may
> want to read such a file with the coding system raw-text (then C-x =
> always shows \000..\377).

Is such a buffer necessarily unibyte? Why not multibyte? Or is it for performance reasons? And what should happen if we paste 8859-5 or BIG5 text into such a buffer?

> The fact that something doesn't work for double-byte charset
> users can't be a strong enough reason for dropping it for
> single-byte charset users.

Agreed. But we should encourage people to "do it right" by calling the appropriate encoding/decoding functions so it works for all cases. I believe that a good way to encourage people is by discouraging the use of string-make-unibyte (and other ways to use copy_text similarly).

Stefan
* Re: eight-bit char handling in emacs-unicode
From: Richard Stallman @ 2003-11-25 4:28 UTC
Cc: jas, emacs-devel, handa

    No, as a matter of fact I don't see why in a utf-8 environment,
    it makes any sense to have a function that turns a multibyte string
    into a unibyte string encoded in latin-1 (without even complaining
    when it encounters other characters).

There are programs that need to do explicit encoding. This will always be necessary.
* Re: eight-bit char handling in emacs-unicode
From: Richard Stallman @ 2003-12-09 21:49 UTC
Cc: emacs-devel, handa, jas

    So you seem to be thinking about a piece of elisp (or maybe C) that
    will call string-make-unibyte, but I'm wondering which piece of code
    you're thinking of, because this piece of code will work if your
    keyboard uses latin-1 encoding, but not if it uses utf-8 encoding.

That may be good enough for some users.

    Also I'm wondering why this piece of code needs to use
    string-make-unibyte, instead of encode-coding-string (the only good
    reason I can think of is that the coding-system to use is not
    immediately apparent.

One possible reason to use string-make-unibyte is that you want things to work "as if they'd been converted by something else". As long as Emacs performs this conversion in other situations on its own, it is useful to make it available as a separate function.
* Re: BIG5-HKSCS? 2003-11-13 6:10 ` BIG5-HKSCS? Kenichi Handa 2003-11-13 6:51 ` BIG5-HKSCS? Simon Josefsson @ 2003-11-15 22:32 ` Simon Josefsson 2003-11-17 1:12 ` BIG5-HKSCS? Kenichi Handa 1 sibling, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-15 22:32 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilur80c50uj.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> Kenichi Handa <handa@m17n.org> writes:
>>> % cvs -z3 -d:pserver:anoncvs@subversions.gnu.org:/cvsroot/emacs co -r emacs-unicode-2 emacs
>
>> I tried starting Gnus on it, but it failed. It died with an elisp
>> backtrace regarding define-key or something like that within bbdb.
>> Since bbdb isn't a critical part,
>
> As bbdb is not a part of Emacs, I have no idea what is wrong
> with it.

Here is a test case:

(setq bbdb-mode-map (make-keymap))
(suppress-keymap bbdb-mode-map)
(define-key bbdb-mode-map [(?\;)] 'bbdb-record-edit-notes)
(define-key bbdb-mode-map [(??)] 'bbdb-help)

Thanks.
* Re: BIG5-HKSCS? 2003-11-15 22:32 ` BIG5-HKSCS? Simon Josefsson @ 2003-11-17 1:12 ` Kenichi Handa 2003-11-17 2:06 ` BIG5-HKSCS? Simon Josefsson 0 siblings, 1 reply; 58+ messages in thread
From: Kenichi Handa @ 2003-11-17 1:12 UTC
Cc: emacs-unicode, emacs-devel

In article <ilud6bte00n.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:

>> As bbdb is not a part of Emacs, I have no idea what is wrong
>> with it.

> Here is a test case:

> (setq bbdb-mode-map (make-keymap))
> (suppress-keymap bbdb-mode-map)
> (define-key bbdb-mode-map [(?\;)] 'bbdb-record-edit-notes)
> (define-key bbdb-mode-map [(??)] 'bbdb-help)

Thank you. I found a bug in handling Lucid style event type list. I've just installed a fix.

By the way, Oliver Scholz <epameinondas@gmx.de> has also started testing emacs-unicode and our discussion shifted to emacs-unicode@gnu.org mailing list. I think it is better to use that mailing list for bug-reports specific to emacs-unicode.

---
Ken'ichi HANDA
handa@m17n.org
* Re: BIG5-HKSCS? 2003-11-17 1:12 ` BIG5-HKSCS? Kenichi Handa @ 2003-11-17 2:06 ` Simon Josefsson 2003-11-17 5:45 ` BIG5-HKSCS? Eli Zaretskii 0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-17 2:06 UTC
Cc: emacs-unicode, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <ilud6bte00n.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>>> As bbdb is not a part of Emacs, I have no idea what is wrong
>>> with it.
>
>> Here is a test case:
>
>> (setq bbdb-mode-map (make-keymap))
>> (suppress-keymap bbdb-mode-map)
>> (define-key bbdb-mode-map [(?\;)] 'bbdb-record-edit-notes)
>> (define-key bbdb-mode-map [(??)] 'bbdb-help)
>
> Thank you. I found a bug in handling Lucid style event type
> list. I've just installed a fix.

Thanks, I'll try it.

> By the way, Oliver Scholz <epameinondas@gmx.de> has also
> started testing emacs-unicode and our discussion shifted to
> emacs-unicode@gnu.org mailing list. I think it is better to
> use that mailing list for bug-reports specific to
> emacs-unicode.

Where can I find archives for the list? I sent a report to the list, but it didn't appear to be a proper mailing list, so I got a user unknown bounce from <oldo@coli.uni-sb.de>.
* Re: BIG5-HKSCS? 2003-11-17 2:06 ` BIG5-HKSCS? Simon Josefsson @ 2003-11-17 5:45 ` Eli Zaretskii 2003-11-17 7:43 ` BIG5-HKSCS? Simon Josefsson 0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2003-11-17 5:45 UTC
Cc: emacs-unicode, emacs-devel, handa

> From: Simon Josefsson <jas@extundo.com>
> Date: Mon, 17 Nov 2003 03:06:22 +0100
>
> > By the way, Oliver Scholz <epameinondas@gmx.de> has also
> > started testing emacs-unicode and our discussion shifted to
> > emacs-unicode@gnu.org mailing list. I think it is better to
> > use that mailing list for bug-reports specific to
> > emacs-unicode.
>
> Where can I find archives for the list?

If you have a login on fencepost.gnu.org, I can tell you where to find the archives (mail me privately). Otherwise, tough.

> I sent a report to the list, but it didn't appear to be a proper
> mailing list, so I got a user unknown bounce from
> <oldo@coli.uni-sb.de>.

What do you mean by ``didn't appear to be a proper mailing list''?

What did you do and what happened, exactly?
* Re: BIG5-HKSCS? 2003-11-17 5:45 ` BIG5-HKSCS? Eli Zaretskii @ 2003-11-17 7:43 ` Simon Josefsson 2003-11-18 7:01 ` BIG5-HKSCS? Richard Stallman 0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-17 7:43 UTC
Cc: emacs-unicode, handa, emacs-devel

Eli Zaretskii <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Mon, 17 Nov 2003 03:06:22 +0100
>>
>> > By the way, Oliver Scholz <epameinondas@gmx.de> has also
>> > started testing emacs-unicode and our discussion shifted to
>> > emacs-unicode@gnu.org mailing list. I think it is better to
>> > use that mailing list for bug-reports specific to
>> > emacs-unicode.
>>
>> Where can I find archives for the list?
>
> If you have a login on fencepost.gnu.org, I can tell you where to
> find the archives (mail me privately). Otherwise, tough.

I found it.

>> I sent a report to the list, but it didn't appear to be a proper
>> mailing list, so I got a user unknown bounce from
>> <oldo@coli.uni-sb.de>.
>
> What do you mean by ``didn't appear to be a proper mailing list''?

I meant that it wasn't run by some mailing list software that set the sender address to the mailing list software, instead of maintaining the original sender address (i.e., my address). The consequence is that when there is a network (or other) problem for any member of the list, I get a bounce. Fortunately, the list doesn't appear to have many members. Moving the list to mail.gnu.org would, besides fixing that problem, allow public archiving of the list and user customization of delivery options.

> What did you do and what happened, exactly?

I sent a mail to the list and got a user unknown bounce from one member of the list.
* Re: BIG5-HKSCS? 2003-11-17 7:43 ` BIG5-HKSCS? Simon Josefsson @ 2003-11-18 7:01 ` Richard Stallman 2003-11-18 8:56 ` BIG5-HKSCS? Simon Josefsson 0 siblings, 1 reply; 58+ messages in thread
From: Richard Stallman @ 2003-11-18 7:01 UTC
Cc: eliz, emacs-devel, emacs-unicode, handa

> I meant that it wasn't run by some mailing list software that set the
> sender address to the mailing list software, instead of maintaining
> the original sender address (i.e., my address).

That way of running a list is a pain in the neck, because it makes sending a reply just to the sender of a message rather inconvenient. So we do not generally set up our lists that way.
* Re: BIG5-HKSCS? 2003-11-18 7:01 ` BIG5-HKSCS? Richard Stallman @ 2003-11-18 8:56 ` Simon Josefsson 2003-11-19 5:15 ` BIG5-HKSCS? Richard Stallman 0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-18 8:56 UTC
Cc: eliz, emacs-devel, emacs-unicode, handa

Richard Stallman <rms@gnu.org> writes:

>> I meant that it wasn't run by some mailing list software that set the
>> sender address to the mailing list software, instead of maintaining
>> the original sender address (i.e., my address).
>
> That way of running a list is a pain in the neck, because it makes
> sending a reply just to the sender of a message rather inconvenient.
> So we do not generally set up our lists that way.

I think there is some confusion here; the sender address is not the same as adding a Reply-To header, which I believe you are referring to. I agree that using mailing list software that adds a Reply-To header pointing to the mailing list itself is just wrong. But altering the sender address to the mailing list software itself avoids a torrent of bounces to everyone who sends a message to the list.

For example, for each and every message I have sent to emacs-unicode@gnu.org I have received a bounce message saying that one of the list's members is not known. If it were a large list, with thousands of subscribers, the risk that, say, 10 of the subscribers have disappeared from the net would be rather high. Getting tens of bounces for every message you send to a list is not good.

I think it would be best to make emacs-unicode@gnu.org a proper mail.gnu.org mailing list.
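The distinction drawn here is between the SMTP envelope sender (the MAIL FROM address, to which delivery-failure reports are sent) and the message headers such as From: and Reply-To: that readers and their mail clients use when replying. The Python sketch below uses only the standard email package to show a relay that may rewrite the envelope sender while passing the headers through untouched. The bounce address emacs-unicode-bounces@gnu.org and the function relay_parameters are hypothetical illustrations, not how the gnu.org list was actually configured; with smtplib, the returned envelope sender would be the from_addr argument of SMTP.send_message.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical bounce address owned by the list software (illustration only).
LIST_BOUNCE_ADDR = "emacs-unicode-bounces@gnu.org"


def relay_parameters(original: EmailMessage, rewrite_sender: bool):
    """Return (envelope_sender, message) for redistributing one post.

    The envelope sender (SMTP MAIL FROM) is where delivery failures are
    reported. Headers such as From: and Reply-To: are passed through
    unchanged, so replying to the author works in both modes.
    """
    if rewrite_sender:
        # List-manager style: bounces go back to the list software.
        envelope_sender = LIST_BOUNCE_ADDR
    else:
        # Plain alias style: bounces go back to the original poster.
        envelope_sender = parseaddr(str(original["From"]))[1]
    return envelope_sender, original


msg = EmailMessage()
msg["From"] = "Simon Josefsson <jas@extundo.com>"
msg["To"] = "emacs-unicode@gnu.org"
msg["Subject"] = "Re: BIG5-HKSCS?"
msg.set_content("Test post.")

for mode in (False, True):
    sender, relayed = relay_parameters(msg, rewrite_sender=mode)
    print(f"rewrite={mode}: bounces to {sender}, From: {relayed['From']}")
```

In both modes the From: header, and therefore a reply to the author, is the same; only the destination of bounce messages changes, which is the behaviour Simon is asking for.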
* Re: BIG5-HKSCS? 2003-11-18 8:56 ` BIG5-HKSCS? Simon Josefsson @ 2003-11-19 5:15 ` Richard Stallman 2003-11-20 5:48 ` BIG5-HKSCS? Simon Josefsson 0 siblings, 1 reply; 58+ messages in thread
From: Richard Stallman @ 2003-11-19 5:15 UTC
Cc: eliz, emacs-devel, emacs-unicode, handa

> I think there is some confusion here; the sender address is not the same as adding a Reply-To header, which I believe you are referring to. I agree that using mailing list software that adds a Reply-To header pointing to the mailing list itself is just wrong. But altering the sender address to the mailing list software itself avoids a torrent of bounces to everyone who sends a message to the list.

You are right, I did misunderstand that point. Sorry.

> I think it would be best to make emacs-unicode@gnu.org a proper mail.gnu.org mailing list.

It is a proper mailing list now, just not managed through mailman.

I have no opinion on whether to use mailman to manage that list, but I strongly object to the idea that there is something less valid or less proper about defining a mailing list as an alias.
* Re: BIG5-HKSCS? 2003-11-19 5:15 ` BIG5-HKSCS? Richard Stallman @ 2003-11-20 5:48 ` Simon Josefsson 2003-11-20 5:56 ` BIG5-HKSCS? Eli Zaretskii 0 siblings, 1 reply; 58+ messages in thread
From: Simon Josefsson @ 2003-11-20 5:48 UTC
Cc: eliz, emacs-devel, emacs-unicode, handa

Richard Stallman <rms@gnu.org> writes:

>> I think it would be best to make emacs-unicode@gnu.org a proper
>> mail.gnu.org mailing list.
>
> It is a proper mailing list now, just not managed through mailman.
>
> I have no opinion on whether to use mailman to manage that list,
> but I strongly object to the idea that there is something less
> valid or less proper about defining a mailing list as an alias.

Right. I used the term "proper mailing list" for lack of a better term to distinguish between mailing lists handled by mailman and mailing lists handled by sendmail aliases.

(Incidentally, lately I have been receiving bounces for eliz@gnu.org for every post I send to emacs-unicode@gnu.org... perhaps fixed by now though.)
* Re: BIG5-HKSCS? 2003-11-20 5:48 ` BIG5-HKSCS? Simon Josefsson @ 2003-11-20 5:56 ` Eli Zaretskii 2003-11-20 6:20 ` BIG5-HKSCS? Simon Josefsson 0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2003-11-20 5:56 UTC
Cc: emacs-unicode, emacs-devel, handa

> From: Simon Josefsson <jas@extundo.com>
> Date: Thu, 20 Nov 2003 06:48:49 +0100
>
> (Incidentally, lately I have been receiving bounces for eliz@gnu.org
> for every post I send to emacs-unicode@gnu.org... perhaps fixed by
> now though.)

When was the last bounce, please? I think I fixed that a few days ago.
* Re: BIG5-HKSCS? 2003-11-20 5:56 ` BIG5-HKSCS? Eli Zaretskii @ 2003-11-20 6:20 ` Simon Josefsson 0 siblings, 0 replies; 58+ messages in thread
From: Simon Josefsson @ 2003-11-20 6:20 UTC
Cc: emacs-unicode, emacs-devel, handa

Eli Zaretskii <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Thu, 20 Nov 2003 06:48:49 +0100
>>
>> (Incidentally, lately I have been receiving bounces for eliz@gnu.org
>> for every post I send to emacs-unicode@gnu.org... perhaps fixed by
>> now though.)
>
> When was the last bounce, please? I think I fixed that a few days
> ago.

The last time I posted (apart from the previous message) was 18 Nov. I haven't received a bounce for the previous mail yet, and since you received it, I guess it works...
end of thread

Thread overview: 58+ messages
2003-11-12 16:11 BIG5-HKSCS? Simon Josefsson
2003-11-13 1:53 ` BIG5-HKSCS? Kenichi Handa
2003-11-13 4:14 ` BIG5-HKSCS? Simon Josefsson
2003-11-13 5:34 ` BIG5-HKSCS? Kenichi Handa
2003-11-13 5:50 ` BIG5-HKSCS? Simon Josefsson
2003-11-13 4:49 ` BIG5-HKSCS? Simon Josefsson
2003-11-13 6:10 ` BIG5-HKSCS? Kenichi Handa
2003-11-13 6:51 ` BIG5-HKSCS? Simon Josefsson
2003-11-13 9:01 ` BIG5-HKSCS? Kenichi Handa
2003-11-13 13:29 ` BIG5-HKSCS? Oliver Scholz
2003-11-13 23:40 ` BIG5-HKSCS? Kenichi Handa
2003-11-14 13:35 ` BIG5-HKSCS? Oliver Scholz
2003-11-13 16:34 ` BIG5-HKSCS? Simon Josefsson
2003-11-14 0:47 ` eight-bit char handling in emacs-unicode Kenichi Handa
2003-11-14 13:25 ` Oliver Scholz
2003-11-15 1:09 ` Kenichi Handa
2003-11-15 10:26 ` Oliver Scholz
2003-11-15 21:47 ` Simon Josefsson
2003-11-15 3:04 ` Simon Josefsson
2003-11-16 15:03 ` Alex Schroeder
2003-11-17 21:17 ` Stefan Monnier
2003-11-18 7:33 ` Kenichi Handa
2003-11-18 17:12 ` Stefan Monnier
2003-11-19 0:06 ` Kenichi Handa
2003-11-19 3:05 ` Stefan Monnier
2003-11-19 10:46 ` Juri Linkov
2003-11-19 13:48 ` Stefan Monnier
2003-11-20 23:41 ` Kenichi Handa
2003-11-21 0:41 ` Kenichi Handa
2003-11-21 5:27 ` Stefan Monnier
2003-11-21 6:27 ` Kenichi Handa
2003-11-21 14:59 ` Stefan Monnier
2003-11-22 1:25 ` Kenichi Handa
2003-11-22 23:53 ` Stefan Monnier
2003-11-23 7:30 ` Kenichi Handa
2003-11-23 23:48 ` Stefan Monnier
2003-11-25 1:07 ` Kenichi Handa
[not found] ` <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
2003-11-26 0:07 ` Kenichi Handa
2003-11-26 14:14 ` Stefan Monnier
2003-11-27 1:34 ` Kenichi Handa
2003-11-27 14:23 ` Stefan Monnier
2003-12-01 0:43 ` Kenichi Handa
2003-12-01 16:15 ` Stefan Monnier
2003-12-02 13:07 ` Kenichi Handa
2003-12-02 16:06 ` Stefan Monnier
2003-11-25 4:28 ` Richard Stallman
[not found] ` <jwv7k1gtswz.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
2003-12-09 21:49 ` Richard Stallman
2003-11-15 22:32 ` BIG5-HKSCS? Simon Josefsson
2003-11-17 1:12 ` BIG5-HKSCS? Kenichi Handa
2003-11-17 2:06 ` BIG5-HKSCS? Simon Josefsson
2003-11-17 5:45 ` BIG5-HKSCS? Eli Zaretskii
2003-11-17 7:43 ` BIG5-HKSCS? Simon Josefsson
2003-11-18 7:01 ` BIG5-HKSCS? Richard Stallman
2003-11-18 8:56 ` BIG5-HKSCS? Simon Josefsson
2003-11-19 5:15 ` BIG5-HKSCS? Richard Stallman
2003-11-20 5:48 ` BIG5-HKSCS? Simon Josefsson
2003-11-20 5:56 ` BIG5-HKSCS? Eli Zaretskii
2003-11-20 6:20 ` BIG5-HKSCS? Simon Josefsson