* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on Windows
From: Kazuhiro Ito @ 2011-08-18 9:01 UTC
To: 9318

When I start Emacs and evaluate the code below, it returns an unexpected result.

(let ((func (lambda ()
              (with-temp-buffer
                (mapc 'insert '(166 25339))
                (encode-coding-region (point-min) (point-max) 'ctext-unix)
                (buffer-string)))))
  (cons (funcall func)
        (funcall func)))
-> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")

The car of the result is not constant, and in the worst case Emacs crashes. It doesn't occur on Linux. If I evaluate the form a second time, both the car and the cdr of the result are correct.

Using encode-coding-string instead of encode-coding-region has no problem.

(let ((func (lambda ()
              (encode-coding-string
               (mapconcat 'char-to-string '(166 25339) "")
               'ctext-unix))))
  (cons (funcall func)
        (funcall func)))
-> ("^[$(D\"C^[$(H*f^[(B" . "^[$(D\"C^[$(H*f^[(B")

Calling encode-coding-string once beforehand also avoids the problem.

(let ((func (lambda ()
              (with-temp-buffer
                (mapc 'insert '(166 25339))
                (encode-coding-region (point-min) (point-max) 'ctext-unix)
                (buffer-string)))))
  (encode-coding-string
   (mapconcat 'char-to-string '(166 25339) "")
   'ctext-unix)
  (cons (funcall func)
        (funcall func)))
-> ("^[$(D\"C^[$(H*f^[(B" . "^[$(D\"C^[$(H*f^[(B")

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on Windows
From: Andreas Schwab @ 2011-08-18 9:48 UTC
To: Kazuhiro Ito; +Cc: 9318

Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:

> Before calling encode-coding-string also can avoid problem.

Perhaps something is clobbered by some autoloading?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on Windows
From: Kazuhiro Ito @ 2011-08-18 21:33 UTC
To: Andreas Schwab; +Cc: 9318

> Perhaps something is clobbered by some autoloading?

I'm not sure I understand exactly what you mean, but these phenomena are reproducible with the precompiled binary (*1) started with the -Q option.

(*1) http://ftp.gnu.org/pub/gnu/emacs/windows/emacs-23.3-bin-i386.zip

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-19 13:46 UTC
To: 9318

> When I start Emacs and evaluate the below code, unexpected result returns.

> (let ((func (lambda ()
>               (with-temp-buffer
>                 (mapc 'insert '(166 25339))
>                 (encode-coding-region (point-min) (point-max) 'ctext-unix)
>                 (buffer-string)))))
>   (cons (funcall func)
>         (funcall func)))
> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")

> car of the result is not constant.

I noticed that this problem is not Windows-specific. I confirmed that it is reproducible with Emacs 23.3.1 (built with pkgsrc) on NetBSD/amd64, accessed via SSH from a remote host. However, it doesn't occur on openSUSE 11.3.

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Chong Yidong @ 2011-08-20 21:26 UTC
To: Kazuhiro Ito; +Cc: 9318

Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:

>> When I start Emacs and evaluate the below code, unexpected result returns.
>
>> (let ((func (lambda ()
>>               (with-temp-buffer
>>                 (mapc 'insert '(166 25339))
>>                 (encode-coding-region (point-min) (point-max) 'ctext-unix)
>>                 (buffer-string)))))
>>   (cons (funcall func)
>>         (funcall func)))
>> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
>
>> car of the result is not constant.
>
> I noticed this problem is not Windows specific. I confirmed that it
> is reproducible in Emacs 23.3.1 (build by pkgsrc) on NetBSD/amd64 via
> SSH from remote host. But it doesn't occur on openSUSE 11.3.

Could you run Emacs under a debugger, trigger the crash, and provide a
backtrace? (You will need to have compiled Emacs with debugging
symbols.)
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-21 0:17 UTC
To: Chong Yidong; +Cc: 9318

> >> When I start Emacs and evaluate the below code, unexpected result returns.
> >
> >> (let ((func (lambda ()
> >>               (with-temp-buffer
> >>                 (mapc 'insert '(166 25339))
> >>                 (encode-coding-region (point-min) (point-max) 'ctext-unix)
> >>                 (buffer-string)))))
> >>   (cons (funcall func)
> >>         (funcall func)))
> >> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
> >
> > I noticed this problem is not Windows specific. I confirmed that it
> > is reproducible in Emacs 23.3.1 (build by pkgsrc) on NetBSD/amd64 via
> > SSH from remote host. But it doesn't occur on openSUSE 11.3.
>
> Could you run Emacs under a debugger, trigger the crash, and provide a
> backtrace? (You will need to have compiled Emacs with debugging
> symbols.)

I built Emacs 23.3 with the "-O0 -g" options on NetBSD 5.1 (amd64) and started it with the command below (via SSH).

gdb --args emacs -Q --no-splash

Next, I typed in the code below and evaluated it with C-x C-e.

(progn
  (goto-char (point-min))
  (insert #x80)
  (insert (make-string 16 ?A))
  (encode-coding-region 1 18 'ctext-unix))

The backtrace is below. Please let me know if you need more information.


Program received signal SIGSEGV, Segmentation fault.
0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
5473 if (STRING_MARKED_P (ptr))
(gdb) bt full
#0  0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
        ptr = (struct Lisp_String *) 0x4141414141414140
        obj = 4702111234474983745
        cdr_count = 0
#1  0x0000000000557320 in mark_char_table (ptr=0x1281800) at alloc.c:5405
        val = 4702111234474983745
        size = 130
        i = 0
#2  0x0000000000557315 in mark_char_table (ptr=0x17f6c00) at alloc.c:5402
        val = 19404805
        size = 34
        i = 14
#3  0x0000000000557315 in mark_char_table (ptr=0x13ea700) at alloc.c:5402
        val = 25127941
        size = 18
        i = 6
#4  0x0000000000557315 in mark_char_table (ptr=0x10ba800) at alloc.c:5402
        val = 20883205
        size = 68
        i = 4
#5  0x0000000000557838 in mark_object (arg=17541125) at alloc.c:5567
        obj = 17541125
        cdr_count = 0
#6  0x0000000000557228 in mark_vectorlike (ptr=0xb16480) at alloc.c:5377
        size = 10
        i = 9
#7  0x0000000000557855 in mark_object (arg=11625605) at alloc.c:5569
        obj = 11625605
        cdr_count = 0
#8  0x0000000000557228 in mark_vectorlike (ptr=0xb56000) at alloc.c:5377
        size = 434
        i = 107
#9  0x0000000000557855 in mark_object (arg=11886597) at alloc.c:5569
        obj = 11886597
        cdr_count = 0
#10 0x00000000005577b0 in mark_object (arg=10786565) at alloc.c:5562
        h = (struct Lisp_Hash_Table *) 0xa49700
        obj = 10786565
        cdr_count = 0
#11 0x00000000005568ff in Fgarbage_collect () at alloc.c:5092
        bind = (struct specbinding *) 0xb96526
        catch = (struct catchtag *) 0x7f7fffffc508
        handler = (struct handler *) 0x10
        stack_top_variable = 0 '\0'
        i = 418
        message_p = 0
        total = {140187732526192, 140187732526008, 140187732526000, 4294967295, 12148454, 10960258, 10312685, 68}
        count = 8
        t1 = {tv_sec = 1313842937, tv_usec = 498976}
        t2 = {tv_sec = 0, tv_usec = 140187732530104}
        t3 = {tv_sec = 11465618, tv_usec = 0}
#12 0x0000000000577bb4 in Ffuncall (nargs=2, args=0x7f7fffffc4f0) at eval.c:2965
        fun = 10313885
        original_fun = 10959186
        funcar = 10762338
        numargs = 1
        lisp_numargs = 10950075
        val = 68
        backtrace = {next = 0x7f7fffffc9a0, function = 0x7f7fffffc4f8, args = 0x7f7fffffc500, nargs = 1, evalargs = 0 '\0', debug_on_exit = 0 '\0'}
        internal_args = (Lisp_Object *) 0x7f7fffffc500
        i = 0
#13 0x00000000005ce3c1 in Fbyte_code (bytestr=9300689, vector=9300725, maxdepth=12) at bytecode.c:680
        count = 7
        op = 1
        vectorp = (Lisp_Object *) 0x8deb00
        bytestr_length = 18
        stack = {pc = 0x96972f ")\207", top = 0x7f7fffffc4f8, bottom = 0x7f7fffffc4f0, byte_string = 9300689, byte_string_start = 0x96971f "\b\203\b", constants = 9300725, next = 0x7f7fffffcb40}
        top = (Lisp_Object *) 0x7f7fffffc4f0
        result = 10956883
#14 0x00000000005788cc in funcall_lambda (fun=9300621, nargs=1, arg_vector=0x7f7fffffca28) at eval.c:3220
        val = 10762242
        syms_left = 10762242
        next = 18577650
        count = 6
        i = 1
        optional = 0
        rest = 0
#15 0x000000000057821a in Ffuncall (nargs=2, args=0x7f7fffffca20) at eval.c:3077
        fun = 9300621
        original_fun = 18577602
        funcar = 18577842
        numargs = 1
        lisp_numargs = 10956963
        val = 10762242
        backtrace = {next = 0x7f7fffffced0, function = 0x7f7fffffca20, args = 0x7f7fffffca28, nargs = 1, evalargs = 0 '\0', debug_on_exit = 0 '\0'}
        internal_args = (Lisp_Object *) 0xa730a3
        i = 0
#16 0x00000000005ce3c1 in Fbyte_code (bytestr=9301185, vector=9301221, maxdepth=12) at bytecode.c:680
        count = 5
        op = 1
        vectorp = (Lisp_Object *) 0x8decf0
        bytestr_length = 31
        stack = {pc = 0x969692 "\v)B\211\034A\n=\204\033", top = 0x7f7fffffca28, bottom = 0x7f7fffffca20, byte_string = 9301185, byte_string_start = 0x969685 "\b\204\b", constants = 9301221, next = 0x0}
        top = (Lisp_Object *) 0x7f7fffffca20
        result = 10762242
#17 0x00000000005788cc in funcall_lambda (fun=9301109, nargs=1, arg_vector=0x7f7fffffcfa8) at eval.c:3220
        val = 140187732528832
        syms_left = 10762242
        next = 18577650
        count = 4
        i = 1
        optional = 0
        rest = 0
#18 0x000000000057821a in Ffuncall (nargs=2, args=0x7f7fffffcfa0) at eval.c:3077
        fun = 9301109
        original_fun = 11438610
        funcar = 5059672
        numargs = 1
        lisp_numargs = 5059670
        val = 10762242
        backtrace = {next = 0x7f7fffffd310, function = 0x7f7fffffcfa0, args = 0x7f7fffffcfa8, nargs = 1, evalargs = 0 '\0', debug_on_exit = 0 '\0'}
        internal_args = (Lisp_Object *) 0xa77993
        i = 0
#19 0x000000000057296b in Fcall_interactively (function=11438610, record_flag=10762242, keys=10790405) at callint.c:869
        val = 4
        args = (Lisp_Object *) 0x7f7fffffcfa0
        visargs = (Lisp_Object *) 0x7f7fffffcf80
        specs = 9301281
        filter_specs = 9301281
        teml = 5734938
        up_event = 10762242
        enable = 10762242
        speccount = 2
        next_event = 2
        prefix_arg = 10762242
        string = (unsigned char *) 0x7f7fffffcfc0 "P"
        tem = (unsigned char *) 0x61652c ""
        varies = (int *) 0x7f7fffffcf60
        i = 2
        j = 1
        count = 1
        foo = 1
        prompt1 = '\0' <repeats 99 times>
        tem1 = 0x0
        arg_from_tty = 0
        gcpro1 = {next = 0xa43802, var = 0xa43802, nvars = 0}
        gcpro2 = {next = 0xa53bc2, var = 0xa51c05, nvars = 10828738}
        gcpro3 = {next = 0xa55952, var = 0xa53bc2, nvars = 2}
        gcpro4 = {next = 0xa43802, var = 0xb4a776, nvars = 2}
        gcpro5 = {next = 0xa43802, var = 0xa43802, nvars = 10836306}
        key_count = 2
        record_then_fail = 0
        save_this_command = 11438610
        save_last_command = 11490098
        save_this_original_command = 11438610
        save_real_this_command = 11438610
#20 0x0000000000577f70 in Ffuncall (nargs=4, args=0x7f7fffffd3b0) at eval.c:3037
        fun = 10312397
        original_fun = 10978002
        funcar = 4294967297
        numargs = 3
        lisp_numargs = 10937344
        val = 315
        backtrace = {next = 0x0, function = 0x7f7fffffd3b0, args = 0x7f7fffffd3b8, nargs = 3, evalargs = 0 '\0', debug_on_exit = 0 '\0'}
        internal_args = (Lisp_Object *) 0x7f7fffffd3b8
        i = 0
#21 0x000000000057795d in call3 (fn=10978002, arg1=11438610, arg2=10762242, arg3=10762242) at eval.c:2857
        ret_ungc_val = 9301109
        gcpro1 = {next = 0x8dec75, var = 0xa43802, nvars = 4}
        args = {10978002, 11438610, 10762242, 10762242}
#22 0x00000000004e4bca in Fcommand_execute (cmd=11438610, record_flag=10762242, keys=10762242, special=10762242) at keyboard.c:10562
        final = 9301109
        tem = 10762242
        prefixarg = 10762242
#23 0x00000000004d564d in command_loop_1 () at keyboard.c:1906
        cmd = 11438610
        lose = 1
        keybuf = {96, 20, 8, 0, 140187732530800, 18451712, 1893, 0, 140187732530816, 1983, 18451712, 4294967317, 140187732530800, 6299742, 10656928, 216, 10937344, 7378697632079252736, 140187732530864, 9720, 274877896416, 140187732531032, 0, 140187732530872, 140187732530384, 0, 10762242, 12348018, 8166853, 10762242}
        i = 2
        prev_modiff = 158
        prev_buffer = (struct buffer *) 0xa51c00
        already_adjusted = 0
#24 0x0000000000575049 in internal_condition_case (bfun=0x4d3a17 <command_loop_1>, handlers=10851522, hfun=0x4d34bc <cmd_error>) at eval.c:1492
        val = 10762242
        c = {tag = 10762242, val = 10762242, next = 0x7f7fffffd880, gcpro = 0x0, jmp = {2129, 140187732531264, 140187732541408, 140187698962432, 140187696909296, 3, 140187732531000, 5722036, 0, 140187732531488, 18636288}, backlist = 0x0, handlerlist = 0x0, lisp_eval_depth = 0, pdlcount = 2, poll_suppress_count = 0, interrupt_input_blocked = 0, byte_stack = 0x0}
        h = {handler = 10851522, var = 10762242, chosen_clause = 0, tag = 0x7f7fffffd790, next = 0x0}
#25 0x00000000004d389f in command_loop_2 () at keyboard.c:1362
        val = 1
#26 0x0000000000574a0e in internal_catch (tag=10846786, func=0x4d3885 <command_loop_2>, arg=10762242) at eval.c:1228
        c = {tag = 10846786, val = 10762242, next = 0x0, gcpro = 0x0, jmp = {2129, 140187732531488, 140187732541408, 140187698962432, 140187696909296, 3, 140187732531288, 5720565, 4301358603, 10820608, 11046651}, backlist = 0x0, handlerlist = 0x0, lisp_eval_depth = 0, pdlcount = 2, poll_suppress_count = 0, interrupt_input_blocked = 0, byte_stack = 0x0}
#27 0x00000000004d3859 in command_loop () at keyboard.c:1341
No locals.
#28 0x00000000004d3004 in recursive_edit_1 () at keyboard.c:956
        count = 1
        val = 5059007
#29 0x00000000004d31a6 in Frecursive_edit () at keyboard.c:1018
        count = 0
        buffer = 10762242
#30 0x00000000004d169a in main (argc=3, argv=0x7f7fffffdb70) at emacs.c:1833
        dummy = 140187730444288
        stack_bottom_variable = 0 '\0'
        do_initial_setlocale = 1
        skip_args = 0
        rlim = {rlim_cur = 8720384, rlim_max = 33554432}
        no_loadup = 0
        junk = 0x0
        dname_arg = 0x0

Lisp Backtrace:
"eval-last-sexp-1" (0xffffca28)
"eval-last-sexp" (0xffffcfa8)
"call-interactively" (0xffffd3b8)

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-24 9:37 UTC
To: Chong Yidong; +Cc: 9318

> I built Emacs 23.3 with "-O0 -g" option on NetBSD 5.1 (amd64), and
> started with below commad (via SSH).
>
> gdb --args emacs -Q --no-splash
>
> Next, inputtedand below code and evaluated with C-x C-e.
>
> (progn
>   (goto-char (point-min))
>   (insert #x80)
>   (insert (make-string 16 ?A))
>   (encode-coding-region 1 18 'ctext-unix))
>
> backtrace is below. Please let me know if you need more information.
>
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> 5473 if (STRING_MARKED_P (ptr))

I think relocation of the buffer may cause the problem.

The comment on the CODING_DECODE_CHAR macro in coding.c says:

> /* This wrapper macro is used to preserve validity of pointers into
>    buffer text across calls to decode_char, which could cause
>    relocation of buffers if it loads a charset map, because loading a
>    charset map allocates large structures.  */

encode_coding_iso_2022() uses the ENCODE_ISO_CHARACTER macro, which uses the ENCODE_CHAR macro. ENCODE_CHAR calls encode_char(), which may load a charset map.

If this is the cause of the problem, encode_coding_emacs_mule() has the same problem.

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Eli Zaretskii @ 2011-08-24 12:06 UTC
To: Kazuhiro Ito; +Cc: cyd, 9318

> Date: Wed, 24 Aug 2011 18:37:24 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 9318@debbugs.gnu.org
>
> > (progn
> >   (goto-char (point-min))
> >   (insert #x80)
> >   (insert (make-string 16 ?A))
> >   (encode-coding-region 1 18 'ctext-unix))
> >
> > backtrace is below. Please let me know if you need more information.
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> > 5473 if (STRING_MARKED_P (ptr))
>
> I think relocation of buffer may cause the problem.
>
> The comment for CODING_DECODE_CHAR macro in coding.c says as below.
>
> > /* This wrapper macro is used to preserve validity of pointers into
> >    buffer text across calls to decode_char, which could cause
> >    relocation of buffers if it loads a charset map, because loading a
> >    charset map allocates large structures.  */
>
> encode_coding_iso_2022() uses ENCODE_ISO_CHARACTER macro, which uses
> ENCODE_CHAR macro. ENCODE_CHAR macro calls encode_char() and it may
> load a charset map.

But which pointer(s) in encode_coding_iso_2022 can be altered by
relocation? Do you actually see any of the pointers used by this
function modified by relocation of some buffer?
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-25 9:49 UTC
To: Eli Zaretskii; +Cc: 9318

> > > (progn
> > >   (goto-char (point-min))
> > >   (insert #x80)
> > >   (insert (make-string 16 ?A))
> > >   (encode-coding-region 1 18 'ctext-unix))
> > >
> > > backtrace is below. Please let me know if you need more information.
> > >
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> > > 5473 if (STRING_MARKED_P (ptr))
> >
> > I think relocation of buffer may cause the problem.
> >
> > The comment for CODING_DECODE_CHAR macro in coding.c says as below.
> >
> > > /* This wrapper macro is used to preserve validity of pointers into
> > >    buffer text across calls to decode_char, which could cause
> > >    relocation of buffers if it loads a charset map, because loading a
> > >    charset map allocates large structures.  */
> >
> > encode_coding_iso_2022() uses ENCODE_ISO_CHARACTER macro, which uses
> > ENCODE_CHAR macro. ENCODE_CHAR macro calls encode_char() and it may
> > load a charset map.
>
> But which pointer(s) in encode_coding_iso_2022 can be altered by
> relocation?

encode_coding() sets coding->destination with coding_set_destination() before calling encode_coding_iso_2022(). I think that at least the value coding->destination ought to hold can change inside encode_coding_iso_2022() when charset maps are loaded, leaving the saved pointer stale.

> Do you actually see any of the pointers used by this
> function modified by relocation of some buffer?

No, because I don't know how to check that.

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Andreas Schwab @ 2011-08-24 17:59 UTC
To: Kazuhiro Ito; +Cc: Chong Yidong, 9318

Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:

> I think relocation of buffer may cause the problem.

Does that help?

diff --git a/src/coding.c b/src/coding.c
index 65c8a76..f34a023 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -915,8 +915,8 @@ record_conversion_result (struct coding_system *coding,
     }
 }
 
-/* This wrapper macro is used to preserve validity of pointers into
-   buffer text across calls to decode_char, which could cause
+/* These wrapper macros are used to preserve validity of pointers into
+   buffer text across calls to decode_char/encode_char, which could cause
    relocation of buffers if it loads a charset map, because loading a
    charset map allocates large structures.  */
 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
@@ -935,6 +935,21 @@ record_conversion_result (struct coding_system *coding,
           src_end += offset; \
         } \
     } while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+  do { \
+    charset_map_loaded = 0; \
+    code = ENCODE_CHAR (charset, c); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
 
 
 /* If there are at least BYTES length of room at dst, allocate memory
@@ -2652,7 +2667,7 @@ encode_coding_emacs_mule (struct coding_system *coding)
             {
               charset = CHARSET_FROM_ID (preferred_charset_id);
               if (CHAR_CHARSET_P (c, charset))
-                code = ENCODE_CHAR (charset, c);
+                CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code);
               else
                 charset = char_charset (c, charset_list, &code);
             }
@@ -4185,7 +4200,8 @@ decode_coding_iso_2022 (struct coding_system *coding)
 
 #define ENCODE_ISO_CHARACTER(charset, c) \
   do { \
-    int code = ENCODE_CHAR ((charset), (c)); \
+    int code; \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code); \
  \
     if (CHARSET_DIMENSION (charset) == 1) \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-25 9:54 UTC
To: Andreas Schwab; +Cc: Chong Yidong, 9318

> > I think relocation of buffer may cause the problem.
>
> Does that help?
>
> diff --git a/src/coding.c b/src/coding.c
> index 65c8a76..f34a023 100644
> [...]

Andreas' patch resolved the problem only partially. It fixed the problem on NetBSD built with '-O0' CFLAGS, but not on NetBSD built with '-O2', nor on Windows.

I confirmed that adding protection of coding->dst_object on top of Andreas' patch fixed the problem on NetBSD with '-O2', but not on Windows. I don't know whether this approach is wrong or merely insufficient.
--- src/coding.c	2011-07-01 11:03:55 +0000
+++ src/coding.c	2011-08-24 23:39:49 +0000
@@ -7397,10 +7436,15 @@
       setup_ccl_program (&cclspec.ccl, CODING_CCL_ENCODER (coding));
     }
   do {
+    struct gcpro gcpro1;
+    GCPRO1 (coding->dst_object);
+
     coding_set_source (coding);
     consume_chars (coding, translation_table, max_lookup);
     coding_set_destination (coding);
     (*(coding->encoder)) (coding);
+
+    UNGCPRO;
   } while (coding->consumed_char < coding->src_chars);
 
   if (BUFFERP (coding->dst_object) && coding->produced_char > 0)

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-26 11:41 UTC
To: Andreas Schwab; +Cc: Chong Yidong, 9318

> > > I think relocation of buffer may cause the problem.
> >
> > Does that help?
>
> Andreas' patch resolved the problem partially. It resolved the problem on
> NetBSD with '-O0' CFLAGS, but failed on NetBSD with '-O2' and Windows.
>
> I confirmed that adding the protection of coding->dst_object to
> Andreas' patch resolved the problem on NetBSD with '-O2' but not on
> Windows. I don't know whether it is incorrect way or is not enough.

I noticed that char_charset() can also cause relocation of buffers, because it can call encode_char(). I confirmed that making similar changes to the callers of char_charset() fixes my problem (without the protection of coding->dst_object).

SUMMARY OF THE PROBLEM:
In encode_coding_XXX(), calling encode_char() can cause relocation of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR can also cause relocation, because they can call encode_char(). After using any of them, coding->destination, dst and dst_end must be updated as needed.

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-28 0:04 UTC
To: Andreas Schwab; +Cc: Chong Yidong, 9318

> SUMMARY OF THE PROBLEM:
> In encode_coding_XXX(), calling encode_char() could cause relocation
> of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
> could also cause relocation because they could call encode_char().
> After using of them, coding->destination, dst, dst_end should be
> updated as needed.

I noticed that the CHAR_CHARSET_P macro slipped through my check. CHAR_CHARSET_P can also cause relocation of buffers.

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-08-30 23:30 UTC
To: Andreas Schwab; +Cc: Chong Yidong, 9318

> > SUMMARY OF THE PROBLEM:
> > In encode_coding_XXX(), calling encode_char() could cause relocation
> > of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
> > could also cause relocation because they could call encode_char().
> > After using of them, coding->destination, dst, dst_end should be
> > updated as needed.
>
> I noticed CHAR_CHARSET_P macro slipped out of my check.
> CHAR_CHARSET_P could also cause relocation of buffers.

Here is a patch for the code that incorporates Andreas' patch. It fixes the problems in my environment. I think it would be better to change the interface of encode_designation_at_bol().

=== modified file 'src/coding.c'
--- src/coding.c	2011-05-09 09:59:23 +0000
+++ src/coding.c	2011-08-28 07:33:54 +0000
@@ -1026,6 +1026,54 @@
     } \
   } while (0)
 
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+  do { \
+    charset_map_loaded = 0; \
+    code = ENCODE_CHAR (charset, c); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+  do { \
+    charset_map_loaded = 0; \
+    charset = char_charset (c, charset_list, code_return); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+  do { \
+    charset_map_loaded = 0; \
+    result = CHAR_CHARSET_P(c, charset); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
 
 /* If there are at least BYTES length of room at dst, allocate memory
    for coding->destination and update dst and dst_end.  We don't have
@@ -2778,14 +2826,19 @@
 
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (CHAR_CHARSET_P (c, charset))
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (result)
                 code = ENCODE_CHAR (charset, c);
               else
-                charset = char_charset (c, charset_list, &code);
+                CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                    &code, charset);
             }
           else
-            charset = char_charset (c, charset_list, &code);
+            CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                &code, charset);
           if (! charset)
             {
               c = coding->default_char;
@@ -2794,7 +2847,8 @@
                   EMIT_ONE_ASCII_BYTE (c);
                   continue;
                 }
-              charset = char_charset (c, charset_list, &code);
+              CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                  &code, charset);
             }
           dimension = CHARSET_DIMENSION (charset);
           emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4317,8 +4371,9 @@
 
 #define ENCODE_ISO_CHARACTER(charset, c) \
   do { \
-    int code = ENCODE_CHAR ((charset),(c)); \
- \
+    int code; \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
+ \
     if (CHARSET_DIMENSION (charset) == 1) \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
     else \
@@ -4476,7 +4531,17 @@
           c = *charbuf++;
           if (c == '\n')
             break;
+
+          charset_map_loaded = 0;
           charset = char_charset (c, charset_list, NULL);
+          if (charset_map_loaded)
+            {
+              const unsigned char *orig = coding->destination;
+
+              coding_set_destination (coding);
+              dst += coding->destination - orig;
+            }
+
           id = CHARSET_ID (charset);
           reg = CODING_ISO_REQUEST (coding, id);
           if (reg >= 0 && r[reg] < 0)
@@ -4543,6 +4608,12 @@
           /* We have to produce designation sequences if any now.  */
           dst = encode_designation_at_bol (coding, charbuf, charbuf_end,
                                            dst);
+          if (charset_map_loaded)
+            {
+              EMACS_INT offset = coding->destination + coding->dst_bytes - dst_end;
+              dst_end += offset;
+              dst_prev += offset;
+            }
           bol_designation = 0;
           /* We are sure that designation sequences are all ASCII bytes.  */
           produced_chars += dst - dst_prev;
@@ -4616,12 +4687,17 @@
 
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (! CHAR_CHARSET_P (c, charset))
-                charset = char_charset (c, charset_list, NULL);
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (! result)
+                CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                    NULL, charset);
             }
           else
-            charset = char_charset (c, charset_list, NULL);
+            CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                NULL, charset);
           if (!charset)
             {
               if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4632,7 +4708,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, NULL);
+                  CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+                                      charset_list, NULL, charset);
                 }
             }
           ENCODE_ISO_CHARACTER (charset, c);
@@ -5064,7 +5141,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                              &code, charset);
 
           if (!charset)
             {
@@ -5076,7 +5155,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+                                      charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -5153,7 +5233,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                              &code, charset);
 
           if (! charset)
             {
@@ -5165,7 +5247,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+                                      charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -5747,7 +5831,9 @@
         }
       else
         {
-          charset = char_charset (c, charset_list, &code);
+          CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                              &code, charset);
+
           if (charset)
             {
               if (CHARSET_DIMENSION (charset) == 1)

-- 
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kenichi Handa @ 2011-12-01 1:56 UTC
To: Kazuhiro Ito; +Cc: schwab, 9318

In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:

> Here is the patch for the code, which contains Andreas' patch. In my
> environment, problems are fixed. I think it would be better that the
> interface of encode_designation_at_bol() is changed.

Oops, sorry, I have vaguely thought that your patch below
has already been applied, but just noticed that it was not.
I'll commit a slightly modified version including the
improved interface for encode_designation_at_bol soon.

By the way, it would be good if we had a way to suppress
buffer text relocation temporarily.

---
Kenichi Handa
handa@m17n.org

> === modified file 'src/coding.c'
> --- src/coding.c 2011-05-09 09:59:23 +0000
> +++ src/coding.c 2011-08-28 07:33:54 +0000
> @@ -1026,6 +1026,54 @@
>  } \
>  } while (0)
> +#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
> + do { \
> + charset_map_loaded = 0; \
> + code = ENCODE_CHAR (charset, c); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
> +
> +#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
> + do { \
> + charset_map_loaded = 0; \
> + charset = char_charset (c, charset_list, code_return); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
> +
> +#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
> + do { \
> + charset_map_loaded = 0; \
> + result = CHAR_CHARSET_P(c, charset); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
> +
>  /* If there are at least BYTES length of room at dst, allocate memory
>  for coding->destination and update dst and dst_end. We don't have
> @@ -2778,14 +2826,19 @@
>  if (preferred_charset_id >= 0)
>  {
> + int result;
> +
>  charset = CHARSET_FROM_ID (preferred_charset_id);
> - if (CHAR_CHARSET_P (c, charset))
> + CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
> + if (result)
>  code = ENCODE_CHAR (charset, c);
>  else
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
>  }
>  else
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
>  if (! charset)
>  {
>  c = coding->default_char;
> @@ -2794,7 +2847,8 @@
>  EMIT_ONE_ASCII_BYTE (c);
>  continue;
>  }
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
>  }
>  dimension = CHARSET_DIMENSION (charset);
>  emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
> @@ -4317,8 +4371,9 @@
>  #define ENCODE_ISO_CHARACTER(charset, c) \
>  do { \
> - int code = ENCODE_CHAR ((charset),(c)); \
> - \
> + int code; \
> + CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
> + \
>  if (CHARSET_DIMENSION (charset) == 1) \
>  ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
>  else \
> @@ -4476,7 +4531,17 @@
>  c = *charbuf++;
>  if (c == '\n')
>  break;
> +
> + charset_map_loaded = 0;
>  charset = char_charset (c, charset_list, NULL);
> + if (charset_map_loaded)
> + {
> + const unsigned char *orig = coding->destination;
> +
> + coding_set_destination (coding);
> + dst += coding->destination - orig;
> + }
> +
>  id = CHARSET_ID (charset);
>  reg = CODING_ISO_REQUEST (coding, id);
>  if (reg >= 0 && r[reg] < 0)
> @@ -4543,6 +4608,12 @@
>  /* We have to produce designation sequences if any now. */
>  dst = encode_designation_at_bol (coding, charbuf, charbuf_end, dst);
> + if (charset_map_loaded)
> + {
> + EMACS_INT offset = coding->destination + coding->dst_bytes - dst_end;
> + dst_end += offset;
> + dst_prev += offset;
> + }
>  bol_designation = 0;
>  /* We are sure that designation sequences are all ASCII bytes. */
>  produced_chars += dst - dst_prev;
> @@ -4616,12 +4687,17 @@
>  if (preferred_charset_id >= 0)
>  {
> + int result;
> +
>  charset = CHARSET_FROM_ID (preferred_charset_id);
> - if (! CHAR_CHARSET_P (c, charset))
> - charset = char_charset (c, charset_list, NULL);
> + CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
> + if (! result)
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + NULL, charset);
>  }
>  else
> - charset = char_charset (c, charset_list, NULL);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + NULL, charset);
>  if (!charset)
>  {
>  if (coding->mode & CODING_MODE_SAFE_ENCODING)
> @@ -4632,7 +4708,8 @@
>  else
>  {
>  c = coding->default_char;
> - charset = char_charset (c, charset_list, NULL);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c,
> + charset_list, NULL, charset);
>  }
>  }
>  ENCODE_ISO_CHARACTER (charset, c);
> @@ -5064,7 +5141,9 @@
>  else
>  {
>  unsigned code;
> - struct charset *charset = char_charset (c, charset_list, &code);
> + struct charset *charset;
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
>  if (!charset)
>  {
> @@ -5076,7 +5155,8 @@
>  else
>  {
>  c = coding->default_char;
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c,
> + charset_list, &code, charset);
>  }
>  }
>  if (code == CHARSET_INVALID_CODE (charset))
> @@ -5153,7 +5233,9 @@
>  else
>  {
>  unsigned code;
> - struct charset *charset = char_charset (c, charset_list, &code);
> + struct charset *charset;
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
>  if (! charset)
>  {
> @@ -5165,7 +5247,8 @@
>  else
>  {
>  c = coding->default_char;
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c,
> + charset_list, &code, charset);
>  }
>  }
>  if (code == CHARSET_INVALID_CODE (charset))
> @@ -5747,7 +5831,9 @@
>  }
>  else
>  {
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> +
>  if (charset)
>  {
>  if (CHARSET_DIMENSION (charset) == 1)
> --
> Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kenichi Handa @ 2011-12-05 7:10 UTC
To: 9318; +Cc: kzhr, schwab

In article <tl7zkfdnjgj.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> > Here is the patch for the code, which contains Andreas' patch. In my
> > environment, problems are fixed. I think it would be better that the
> > interface of encode_designation_at_bol() is changed.
> Oops, sorry, I have vaguely thought that your patch below
> has already been applied, but just noticed that it was not.
> I'll commit a slightly modified version including the
> improved interface for encode_designation_at_bol soon.

I've just installed the following changes. As I don't have
cygwin environment now, could you please check if this
change surely fix the problem?

---
Kenichi Handa
handa@m17n.org

2011-12-05 Kenichi Handa <handa@m17n.org>

	* coding.c (encode_designation_at_bol): New args charbuf_end and dst.
	Return the number of produced bytes. Callers changed.
	(coding_set_source): Return how many bytes coding->source was relocated.
	(coding_set_destination): Return how many bytes coding->destination was relocated.
	(CODING_DECODE_CHAR, CODING_ENCODE_CHAR, CODING_CHAR_CHARSET)
	(CODING_CHAR_CHARSET_P): Adjusted for the avove changes.

2011-12-05 Kazuhiro Ito <kzhr@d1.dion.ne.jp> (tiny change)

	* coding.c (CODING_CHAR_CHARSET_P): New macro.
	(encode_coding_emacs_mule, encode_coding_iso_2022): Use the above
	macro (Bug#9318).

2011-12-05 Andreas Schwab <schwab@linux-m68k.org>

	The following changes are to fix Bug#9318.
	* coding.c (CODING_ENCODE_CHAR, CODING_CHAR_CHARSET): New macros.
	(encode_coding_emacs_mule, ENCODE_ISO_CHARACTER)
	(encode_coding_iso_2022, encode_coding_sjis)
	(encode_coding_big5, encode_coding_charset): Use the above macros.

=== modified file 'src/coding.c'
--- src/coding.c 2011-11-07 01:57:07 +0000
+++ src/coding.c 2011-12-05 06:14:46 +0000
@@ -847,16 +847,16 @@
 static void decode_coding_raw_text (struct coding_system *);
 static int encode_coding_raw_text (struct coding_system *);
-static void coding_set_source (struct coding_system *);
-static void coding_set_destination (struct coding_system *);
+static EMACS_INT coding_set_source (struct coding_system *);
+static EMACS_INT coding_set_destination (struct coding_system *);
 static void coding_alloc_by_realloc (struct coding_system *, EMACS_INT);
 static void coding_alloc_by_making_gap (struct coding_system *, EMACS_INT, EMACS_INT);
 static unsigned char *alloc_destination (struct coding_system *, EMACS_INT, unsigned char *);
 static void setup_iso_safe_charsets (Lisp_Object);
-static unsigned char *encode_designation_at_bol (struct coding_system *,
- int *, unsigned char *);
+static int encode_designation_at_bol (struct coding_system *,
+ int *, int *, unsigned char *);
 static int detect_eol (const unsigned char *, EMACS_INT, enum coding_category);
 static Lisp_Object adjust_coding_eol_type (struct coding_system *, int);
@@ -915,27 +915,68 @@
 }
 }
-/* This wrapper macro is used to preserve validity of pointers into
- buffer text across calls to decode_char, which could cause
- relocation of buffers if it loads a charset map, because loading a
- charset map allocates large structures. */
+/* These wrapper macros are used to preserve validity of pointers into
+ buffer text across calls to decode_char, encode_char, etc, which
+ could cause relocation of buffers if it loads a charset map,
+ because loading a charset map allocates large structures. */
+
 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
 do { \
+ EMACS_INT offset; \
+ \
 charset_map_loaded = 0; \
 c = DECODE_CHAR (charset, code); \
- if (charset_map_loaded) \
+ if (charset_map_loaded \
+ && (offset = coding_set_source (coding))) \
 { \
- const unsigned char *orig = coding->source; \
- EMACS_INT offset; \
- \
- coding_set_source (coding); \
- offset = coding->source - orig; \
 src += offset; \
 src_base += offset; \
 src_end += offset; \
 } \
 } while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ code = ENCODE_CHAR (charset, c); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ charset = char_charset (c, charset_list, code_return); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ result = CHAR_CHARSET_P (c, charset); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
 /* If there are at least BYTES length of room at dst, allocate memory
 for coding->destination and update dst and dst_end. We don't have
@@ -1015,9 +1056,14 @@
 | ((p)[-1] & 0x3F))))
-static void
+/* Update coding->source from coding->src_object, and return how many
+ bytes coding->source was changed. */
+
+static EMACS_INT
 coding_set_source (struct coding_system *coding)
 {
+ const unsigned char *orig = coding->source;
+
 if (BUFFERP (coding->src_object))
 {
 struct buffer *buf = XBUFFER (coding->src_object);
@@ -1036,11 +1082,18 @@
 /* Otherwise, the source is C string and is never relocated
 automatically. Thus we don't have to update anything. */
 }
+ return coding->source - orig;
 }
-static void
+
+/* Update coding->destination from coding->dst_object, and return how
+ many bytes coding->destination was changed. */
+
+static EMACS_INT
 coding_set_destination (struct coding_system *coding)
 {
+ const unsigned char *orig = coding->destination;
+
 if (BUFFERP (coding->dst_object))
 {
 if (BUFFERP (coding->src_object) && coding->src_pos < 0)
@@ -1065,6 +1118,7 @@
 /* Otherwise, the destination is C string and is never relocated
 automatically. Thus we don't have to update anything. */
 }
+ return coding->destination - orig;
 }
@@ -2650,14 +2704,19 @@
 if (preferred_charset_id >= 0)
 {
+ int result;
+
 charset = CHARSET_FROM_ID (preferred_charset_id);
- if (CHAR_CHARSET_P (c, charset))
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (result)
 code = ENCODE_CHAR (charset, c);
 else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
 }
 else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
 if (! charset)
 {
 c = coding->default_char;
@@ -2666,7 +2725,8 @@
 EMIT_ONE_ASCII_BYTE (c);
 continue;
 }
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
 }
 dimension = CHARSET_DIMENSION (charset);
 emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4185,7 +4245,8 @@
 #define ENCODE_ISO_CHARACTER(charset, c) \
 do { \
- int code = ENCODE_CHAR ((charset), (c)); \
+ int code; \
+ CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
 \
 if (CHARSET_DIMENSION (charset) == 1) \
 ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
@@ -4283,15 +4344,19 @@
 /* Produce designation sequences of charsets in the line started from
- SRC to a place pointed by DST, and return updated DST.
+ CHARBUF to a place pointed by DST, and return the number of
+ produced bytes. DST should not directly point a buffer text area
+ which may be relocated by char_charset call.
 If the current block ends before any end-of-line, we may fail to
 find all the necessary designations. */
-static unsigned char *
-encode_designation_at_bol (struct coding_system *coding, int *charbuf,
+static int
+encode_designation_at_bol (struct coding_system *coding,
+ int *charbuf, int *charbuf_end,
 unsigned char *dst)
 {
+ unsigned char *orig;
 struct charset *charset;
 /* Table of charsets to be designated to each graphic register. */
 int r[4];
@@ -4309,7 +4374,7 @@
 for (reg = 0; reg < 4; reg++)
 r[reg] = -1;
- while (found < 4)
+ while (charbuf < charbuf_end && found < 4)
 {
 int id;
@@ -4334,7 +4399,7 @@
 ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding);
 }
- return dst;
+ return dst - orig;
 }
 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
@@ -4378,13 +4443,26 @@
 if (bol_designation)
 {
- unsigned char *dst_prev = dst;
-
 /* We have to produce designation sequences if any now. */
- dst = encode_designation_at_bol (coding, charbuf, dst);
- bol_designation = 0;
+ unsigned char desig_buf[16];
+ int nbytes;
+ EMACS_INT offset;
+
+ charset_map_loaded = 0;
+ nbytes = encode_designation_at_bol (coding, charbuf, charbuf_end,
+ desig_buf);
+ if (charset_map_loaded
+ && (offset = coding_set_destination (coding)))
+ {
+ dst += offset;
+ dst_end += offset;
+ }
+ memcpy (dst, desig_buf, nbytes);
+ dst += nbytes;
 /* We are sure that designation sequences are all ASCII bytes. */
- produced_chars += dst - dst_prev;
+ produced_chars += nbytes;
+ bol_designation = 0;
+ ASSURE_DESTINATION (safe_room);
 }
 c = *charbuf++;
@@ -4455,12 +4533,17 @@
 if (preferred_charset_id >= 0)
 {
+ int result;
+
 charset = CHARSET_FROM_ID (preferred_charset_id);
- if (! CHAR_CHARSET_P (c, charset))
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (! result)
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ NULL, charset);
 }
 else
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ NULL, charset);
 if (!charset)
 {
 if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4471,7 +4554,8 @@
 else
 {
 c = coding->default_char;
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, NULL, charset);
 }
 }
 ENCODE_ISO_CHARACTER (charset, c);
@@ -4897,7 +4981,9 @@
 else
 {
 unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
 if (!charset)
 {
@@ -4909,7 +4995,8 @@
 else
 {
 c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, &code, charset);
 }
 }
 if (code == CHARSET_INVALID_CODE (charset))
@@ -4984,7 +5071,9 @@
 else
 {
 unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
 if (! charset)
 {
@@ -4996,7 +5085,8 @@
 else
 {
 c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, &code, charset);
 }
 }
 if (code == CHARSET_INVALID_CODE (charset))
@@ -5572,7 +5662,9 @@
 }
 else
 {
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
+
 if (charset)
 {
 if (CHARSET_DIMENSION (charset) == 1)
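The wrapper macros in the installed change all follow one pattern: clear `charset_map_loaded`, make a call that might load a charset map (and therefore allocate and relocate buffer text), and only if a map was loaded, re-derive the destination and shift `dst` and `dst_end` to match. A minimal standalone sketch of that idea, assuming a simple `realloc`-grown buffer in place of Emacs buffer text, is below. `struct growbuf`, `lookup`, `encode_one`, `table_loaded`, and `map_loaded_flag` are hypothetical stand-ins (for buffer text, `char_charset`/`ENCODE_CHAR`, `CODING_ENCODE_CHAR`, a lazily loaded charset map, and `charset_map_loaded`), not Emacs code. One deliberate difference: rather than subtracting the old base pointer from the new one, as `coding_set_destination` now does, the sketch saves positions as offsets before the call, because strict ISO C does not define arithmetic on a pointer into a block that `realloc` may have freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for buffer text: BASE may move whenever the
   buffer is grown, invalidating raw pointers into the old block.  */
struct growbuf
{
  unsigned char *base;
  size_t size;
};

/* Hypothetical stand-ins for a lazily loaded charset map and for the
   charset_map_loaded flag.  */
static int table_loaded;
static int map_loaded_flag;

/* Analogue of char_charset/ENCODE_CHAR: the first call "loads a table",
   which allocates; here that allocation grows B, so B->base may move.
   Later calls allocate nothing.  */
static int
lookup (struct growbuf *b, int c)
{
  if (!table_loaded)
    {
      unsigned char *p = realloc (b->base, b->size * 2);
      if (p == NULL)
        abort ();
      b->base = p;
      b->size *= 2;
      table_loaded = 1;
      map_loaded_flag = 1;
    }
  return c & 0x7F;
}

/* Analogue of CODING_ENCODE_CHAR followed by emitting one byte.
   *DST and *DST_END point into B; their positions are saved as offsets
   before the call and rebuilt from the new base only if a table was
   loaded, so stale pointer values are never used.  */
static void
encode_one (struct growbuf *b, unsigned char **dst,
            unsigned char **dst_end, int c)
{
  size_t dst_off = (size_t) (*dst - b->base);
  size_t end_off = (size_t) (*dst_end - b->base);
  int code;

  map_loaded_flag = 0;
  code = lookup (b, c);
  if (map_loaded_flag)
    {
      *dst = b->base + dst_off;
      *dst_end = b->base + end_off;
    }
  assert (*dst < *dst_end);
  *(*dst)++ = (unsigned char) code;
}
```

Because the table in this sketch loads only once, only the first `encode_one` call can move the buffer, which parallels the thread's observation that only the first `encode-coding-region` call misbehaved. Recomputing from the saved offsets unconditionally would also be correct; the flag is kept only to mirror the structure of the Emacs macros.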
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kazuhiro Ito @ 2011-12-05 11:31 UTC
To: Kenichi Handa; +Cc: schwab, 9318

> In article <tl7zkfdnjgj.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
>
> > In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> > > Here is the patch for the code, which contains Andreas' patch. In my
> > > environment, problems are fixed. I think it would be better that the
> > > interface of encode_designation_at_bol() is changed.
>
> > Oops, sorry, I have vaguely thought that your patch below
> > has already been applied, but just noticed that it was not.
> > I'll commit a slightly modified version including the
> > improved interface for encode_designation_at_bol soon.
>
> I've just installed the following changes. As I don't have
> cygwin environment now, could you please check if this
> change surely fix the problem?

As far as I confirmed, the problems were fixed (except the point Paul
pointed out). Thank you.

Additionally, if you have time, please confirm Bug#8619 and Bug#9389.

--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Paul Eggert @ 2011-12-05 9:11 UTC
To: Kenichi Handa; +Cc: 9318

That patch (bzr 106613) causes Emacs to use an uninitialized variable;
I found this via static checking with GCC. I installed the following
further patch, which I think is right and anyway does not introduce a bug --
can you please check it? Thanks.

	* coding.c (encode_designation_at_bol): Don't use uninitialized
	local variable (Bug#9318).

=== modified file 'src/coding.c'
--- src/coding.c 2011-12-05 07:03:31 +0000
+++ src/coding.c 2011-12-05 09:00:44 +0000
@@ -4356,7 +4356,7 @@
 int *charbuf, int *charbuf_end,
 unsigned char *dst)
 {
- unsigned char *orig;
+ unsigned char *orig = dst;
 struct charset *charset;
 /* Table of charsets to be designated to each graphic register. */
 int r[4];
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
From: Kenichi Handa @ 2011-12-06 0:30 UTC
To: Paul Eggert; +Cc: 9318

In article <4EDC8AD9.3050004@cs.ucla.edu>, Paul Eggert <eggert@cs.ucla.edu> writes:

> That patch (bzr 106613) causes Emacs to use an uninitialized variable;
> I found this via static checking with GCC. I installed the following
> further patch, which I think is right and anyway does not introduce a bug --
> can you please check it? Thanks.

Oops, my fault. Yes, your patch is correct. Thank you.

---
Kenichi Handa
handa@m17n.org

> * coding.c (encode_designation_at_bol): Don't use uninitialized
> local variable (Bug#9318).

> === modified file 'src/coding.c'
> --- src/coding.c 2011-12-05 07:03:31 +0000
> +++ src/coding.c 2011-12-05 09:00:44 +0000
> @@ -4356,7 +4356,7 @@
>  int *charbuf, int *charbuf_end,
>  unsigned char *dst)
>  {
> - unsigned char *orig;
> + unsigned char *orig = dst;
>  struct charset *charset;
>  /* Table of charsets to be designated to each graphic register. */
>  int r[4];
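The revised `encode_designation_at_bol` contract (write into a caller-supplied scratch buffer such as `desig_buf`, stop at `charbuf_end`, and return a byte count) together with Paul Eggert's one-line fix can be sketched in isolation. `emit_prefix` and `append_prefix` below are hypothetical; the fixed 3-byte `ESC $ B` output is only a placeholder, not the real per-register designation logic. The line that matters is `unsigned char *orig = dst;`: without that initialization, the final `dst - orig` reads an indeterminate pointer, which is the problem GCC's static checking reported.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical analogue of encode_designation_at_bol: scan characters in
   [CHARBUF, CHARBUF_END) and, if any is non-ASCII, write a placeholder
   3-byte escape sequence (ESC $ B) to DST.  Returns the number of bytes
   written.  */
static int
emit_prefix (const int *charbuf, const int *charbuf_end, unsigned char *dst)
{
  unsigned char *orig = dst;  /* must be set before DST advances */
  int need_designation = 0;

  /* Bounded scan, like the `charbuf < charbuf_end' test added in the patch.  */
  while (charbuf < charbuf_end && !need_designation)
    if (*charbuf++ > 0x7F)
      need_designation = 1;

  if (need_designation)
    {
      *dst++ = 0x1B;
      *dst++ = '$';
      *dst++ = 'B';
    }
  return (int) (dst - orig);
}

/* Caller side, mirroring the desig_buf change: produce into a local
   scratch buffer, then copy into the real destination.  In Emacs the
   destination pointer is re-derived between these two steps in case a
   charset map was loaded; in this sketch DST is assumed stable.  */
static unsigned char *
append_prefix (const int *charbuf, const int *charbuf_end, unsigned char *dst)
{
  unsigned char scratch[16];
  int nbytes = emit_prefix (charbuf, charbuf_end, scratch);

  assert (nbytes >= 0 && (size_t) nbytes <= sizeof scratch);
  memcpy (dst, scratch, (size_t) nbytes);
  return dst + nbytes;
}
```

Returning a count instead of an updated pointer is what lets the caller decouple "produce the bytes" from "where do they finally go", since the final location may only be known after `coding_set_destination` has run.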
end of thread

Thread overview: 19+ messages
2011-08-18 9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
2011-08-18 9:48 ` Andreas Schwab
2011-08-18 21:33 ` Kazuhiro Ito
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
2011-08-20 21:26 ` Chong Yidong
2011-08-21 0:17 ` Kazuhiro Ito
2011-08-24 9:37 ` Kazuhiro Ito
2011-08-24 12:06 ` Eli Zaretskii
2011-08-25 9:49 ` Kazuhiro Ito
2011-08-24 17:59 ` Andreas Schwab
2011-08-25 9:54 ` Kazuhiro Ito
2011-08-26 11:41 ` Kazuhiro Ito
2011-08-28 0:04 ` Kazuhiro Ito
2011-08-30 23:30 ` Kazuhiro Ito
2011-12-01 1:56 ` Kenichi Handa
2011-12-05 7:10 ` Kenichi Handa
2011-12-05 11:31 ` Kazuhiro Ito
2011-12-05 9:11 ` Paul Eggert
2011-12-06 0:30 ` Kenichi Handa
Code repositories for project(s) associated with this public inbox: https://git.savannah.gnu.org/cgit/emacs.git