* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows
@ 2011-08-18 9:01 Kazuhiro Ito
2011-08-18 9:48 ` Andreas Schwab
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-18 9:01 UTC (permalink / raw)
To: 9318
When I start Emacs and evaluate the code below, it returns an unexpected result.
(let ((func (lambda ()
(with-temp-buffer
(mapc 'insert '(166 25339))
(encode-coding-region (point-min) (point-max) 'ctext-unix)
(buffer-string)))))
(cons (funcall func)
(funcall func)))
-> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
The car of the result is not constant. In the worst case, Emacs
crashes. It doesn't occur on Linux. If I evaluate the form a second time,
both the car and the cdr of the result are correct. Using
encode-coding-string instead of encode-coding-region avoids the problem.
(let ((func (lambda ()
(encode-coding-string
(mapconcat 'char-to-string '(166 25339) "")
'ctext-unix))))
(cons (funcall func)
(funcall func)))
-> ("^[$(D\"C^[$(H*f^[(B" . "^[$(D\"C^[$(H*f^[(B")
Calling encode-coding-string beforehand also avoids the problem.
(let ((func (lambda ()
(with-temp-buffer
(mapc 'insert '(166 25339))
(encode-coding-region (point-min) (point-max) 'ctext-unix)
(buffer-string)))))
(encode-coding-string
(mapconcat 'char-to-string '(166 25339) "") 'ctext-unix)
(cons (funcall func)
(funcall func)))
-> ("^[$(D\"C^[$(H*f^[(B" . "^[$(D\"C^[$(H*f^[(B")
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows
2011-08-18 9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
@ 2011-08-18 9:48 ` Andreas Schwab
2011-08-18 21:33 ` Kazuhiro Ito
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
2011-12-05 9:11 ` Paul Eggert
2 siblings, 1 reply; 19+ messages in thread
From: Andreas Schwab @ 2011-08-18 9:48 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 9318
Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> Before calling encode-coding-string also can avoid problem.
Perhaps something is clobbered by some autoloading?
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows
2011-08-18 9:48 ` Andreas Schwab
@ 2011-08-18 21:33 ` Kazuhiro Ito
0 siblings, 0 replies; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-18 21:33 UTC (permalink / raw)
To: Andreas Schwab; +Cc: 9318
> Perhaps something is clobbered by some autoloading?
I'm not sure I understand exactly what you mean, but the problem is
reproducible with the precompiled binary (*1) started with the -Q option.
(*1) http://ftp.gnu.org/pub/gnu/emacs/windows/emacs-23.3-bin-i386.zip
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-18 9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
2011-08-18 9:48 ` Andreas Schwab
@ 2011-08-19 13:46 ` Kazuhiro Ito
2011-08-20 21:26 ` Chong Yidong
2011-12-05 9:11 ` Paul Eggert
2 siblings, 1 reply; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-19 13:46 UTC (permalink / raw)
To: 9318
> When I start Emacs and evaluate the below code, unexpected result returns.
> (let ((func (lambda ()
> (with-temp-buffer
> (mapc 'insert '(166 25339))
> (encode-coding-region (point-min) (point-max) 'ctext-unix)
> (buffer-string)))))
> (cons (funcall func)
> (funcall func)))
> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
> car of the result is not constant.
I noticed this problem is not Windows-specific. I confirmed that it
is reproducible in Emacs 23.3.1 (built by pkgsrc) on NetBSD/amd64 via
SSH from a remote host. But it doesn't occur on openSUSE 11.3.
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
@ 2011-08-20 21:26 ` Chong Yidong
2011-08-21 0:17 ` Kazuhiro Ito
0 siblings, 1 reply; 19+ messages in thread
From: Chong Yidong @ 2011-08-20 21:26 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 9318
Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
>> When I start Emacs and evaluate the below code, unexpected result returns.
>
>> (let ((func (lambda ()
>> (with-temp-buffer
>> (mapc 'insert '(166 25339))
>> (encode-coding-region (point-min) (point-max) 'ctext-unix)
>> (buffer-string)))))
>> (cons (funcall func)
>> (funcall func)))
>> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
>
>> car of the result is not constant.
>
> I noticed this problem is not Windows specific. I confirmed that it
> is reproducible in Emacs 23.3.1 (build by pkgsrc) on NetBSD/amd64 via
> SSH from remote host. But it doesn't occur on openSUSE 11.3.
Could you run Emacs under a debugger, trigger the crash, and provide a
backtrace? (You will need to have compiled Emacs with debugging
symbols.)
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-20 21:26 ` Chong Yidong
@ 2011-08-21 0:17 ` Kazuhiro Ito
2011-08-24 9:37 ` Kazuhiro Ito
0 siblings, 1 reply; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-21 0:17 UTC (permalink / raw)
To: Chong Yidong; +Cc: 9318
> >> When I start Emacs and evaluate the below code, unexpected result returns.
> >
> >> (let ((func (lambda ()
> >> (with-temp-buffer
> >> (mapc 'insert '(166 25339))
> >> (encode-coding-region (point-min) (point-max) 'ctext-unix)
> >> (buffer-string)))))
> >> (cons (funcall func)
> >> (funcall func)))
> >> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
> >
> > I noticed this problem is not Windows specific. I confirmed that it
> > is reproducible in Emacs 23.3.1 (build by pkgsrc) on NetBSD/amd64 via
> > SSH from remote host. But it doesn't occur on openSUSE 11.3.
>
> Could you run Emacs under a debugger, trigger the crash, and provide a
> backtrace? (You will need to have compiled Emacs with debugging
> symbols.)
I built Emacs 23.3 with the "-O0 -g" options on NetBSD 5.1 (amd64), and
started it with the command below (via SSH).
gdb --args emacs -Q --no-splash
Next, I entered the code below and evaluated it with C-x C-e.
(progn
(goto-char (point-min))
(insert #x80)
(insert (make-string 16 ?A))
(encode-coding-region 1 18 'ctext-unix))
The backtrace is below. Please let me know if you need more information.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
5473 if (STRING_MARKED_P (ptr))
(gdb) bt full
#0 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
ptr = (struct Lisp_String *) 0x4141414141414140
obj = 4702111234474983745
cdr_count = 0
#1 0x0000000000557320 in mark_char_table (ptr=0x1281800) at alloc.c:5405
val = 4702111234474983745
size = 130
i = 0
#2 0x0000000000557315 in mark_char_table (ptr=0x17f6c00) at alloc.c:5402
val = 19404805
size = 34
i = 14
#3 0x0000000000557315 in mark_char_table (ptr=0x13ea700) at alloc.c:5402
val = 25127941
size = 18
i = 6
#4 0x0000000000557315 in mark_char_table (ptr=0x10ba800) at alloc.c:5402
val = 20883205
size = 68
i = 4
#5 0x0000000000557838 in mark_object (arg=17541125) at alloc.c:5567
obj = 17541125
cdr_count = 0
#6 0x0000000000557228 in mark_vectorlike (ptr=0xb16480) at alloc.c:5377
size = 10
i = 9
#7 0x0000000000557855 in mark_object (arg=11625605) at alloc.c:5569
obj = 11625605
cdr_count = 0
#8 0x0000000000557228 in mark_vectorlike (ptr=0xb56000) at alloc.c:5377
size = 434
i = 107
#9 0x0000000000557855 in mark_object (arg=11886597) at alloc.c:5569
obj = 11886597
cdr_count = 0
#10 0x00000000005577b0 in mark_object (arg=10786565) at alloc.c:5562
h = (struct Lisp_Hash_Table *) 0xa49700
obj = 10786565
cdr_count = 0
#11 0x00000000005568ff in Fgarbage_collect () at alloc.c:5092
bind = (struct specbinding *) 0xb96526
catch = (struct catchtag *) 0x7f7fffffc508
handler = (struct handler *) 0x10
stack_top_variable = 0 '\0'
i = 418
message_p = 0
total = {140187732526192, 140187732526008, 140187732526000, 4294967295,
12148454, 10960258, 10312685, 68}
count = 8
t1 = {
tv_sec = 1313842937,
tv_usec = 498976
}
t2 = {
tv_sec = 0,
tv_usec = 140187732530104
}
t3 = {
tv_sec = 11465618,
tv_usec = 0
}
#12 0x0000000000577bb4 in Ffuncall (nargs=2, args=0x7f7fffffc4f0) at eval.c:2965
fun = 10313885
original_fun = 10959186
funcar = 10762338
numargs = 1
lisp_numargs = 10950075
val = 68
backtrace = {
next = 0x7f7fffffc9a0,
function = 0x7f7fffffc4f8,
args = 0x7f7fffffc500,
nargs = 1,
evalargs = 0 '\0',
debug_on_exit = 0 '\0'
}
internal_args = (Lisp_Object *) 0x7f7fffffc500
i = 0
#13 0x00000000005ce3c1 in Fbyte_code (bytestr=9300689, vector=9300725, maxdepth=12)
at bytecode.c:680
count = 7
op = 1
vectorp = (Lisp_Object *) 0x8deb00
bytestr_length = 18
stack = {
pc = 0x96972f ")\207",
top = 0x7f7fffffc4f8,
bottom = 0x7f7fffffc4f0,
byte_string = 9300689,
byte_string_start = 0x96971f "\b\203\b",
constants = 9300725,
next = 0x7f7fffffcb40
}
top = (Lisp_Object *) 0x7f7fffffc4f0
result = 10956883
#14 0x00000000005788cc in funcall_lambda (fun=9300621, nargs=1,
arg_vector=0x7f7fffffca28) at eval.c:3220
val = 10762242
syms_left = 10762242
next = 18577650
count = 6
i = 1
optional = 0
rest = 0
#15 0x000000000057821a in Ffuncall (nargs=2, args=0x7f7fffffca20) at eval.c:3077
fun = 9300621
original_fun = 18577602
funcar = 18577842
numargs = 1
lisp_numargs = 10956963
val = 10762242
backtrace = {
next = 0x7f7fffffced0,
function = 0x7f7fffffca20,
args = 0x7f7fffffca28,
nargs = 1,
evalargs = 0 '\0',
debug_on_exit = 0 '\0'
}
internal_args = (Lisp_Object *) 0xa730a3
i = 0
#16 0x00000000005ce3c1 in Fbyte_code (bytestr=9301185, vector=9301221, maxdepth=12)
at bytecode.c:680
count = 5
op = 1
vectorp = (Lisp_Object *) 0x8decf0
bytestr_length = 31
stack = {
pc = 0x969692 "\v)B\211\034A\n=\204\033",
top = 0x7f7fffffca28,
bottom = 0x7f7fffffca20,
byte_string = 9301185,
byte_string_start = 0x969685 "\b\204\b",
constants = 9301221,
next = 0x0
}
top = (Lisp_Object *) 0x7f7fffffca20
result = 10762242
#17 0x00000000005788cc in funcall_lambda (fun=9301109, nargs=1,
arg_vector=0x7f7fffffcfa8) at eval.c:3220
val = 140187732528832
syms_left = 10762242
next = 18577650
count = 4
i = 1
optional = 0
rest = 0
#18 0x000000000057821a in Ffuncall (nargs=2, args=0x7f7fffffcfa0) at eval.c:3077
fun = 9301109
original_fun = 11438610
funcar = 5059672
numargs = 1
lisp_numargs = 5059670
val = 10762242
backtrace = {
next = 0x7f7fffffd310,
function = 0x7f7fffffcfa0,
args = 0x7f7fffffcfa8,
nargs = 1,
evalargs = 0 '\0',
debug_on_exit = 0 '\0'
}
internal_args = (Lisp_Object *) 0xa77993
i = 0
#19 0x000000000057296b in Fcall_interactively (function=11438610,
record_flag=10762242, keys=10790405) at callint.c:869
val = 4
args = (Lisp_Object *) 0x7f7fffffcfa0
visargs = (Lisp_Object *) 0x7f7fffffcf80
specs = 9301281
filter_specs = 9301281
teml = 5734938
up_event = 10762242
enable = 10762242
speccount = 2
next_event = 2
prefix_arg = 10762242
string = (unsigned char *) 0x7f7fffffcfc0 "P"
tem = (unsigned char *) 0x61652c ""
varies = (int *) 0x7f7fffffcf60
i = 2
j = 1
count = 1
foo = 1
prompt1 = '\0' <repeats 99 times>
tem1 = 0x0
arg_from_tty = 0
gcpro1 = {
next = 0xa43802,
var = 0xa43802,
nvars = 0
}
gcpro2 = {
next = 0xa53bc2,
var = 0xa51c05,
nvars = 10828738
}
gcpro3 = {
next = 0xa55952,
var = 0xa53bc2,
nvars = 2
}
gcpro4 = {
next = 0xa43802,
var = 0xb4a776,
nvars = 2
}
gcpro5 = {
next = 0xa43802,
var = 0xa43802,
nvars = 10836306
}
key_count = 2
record_then_fail = 0
save_this_command = 11438610
save_last_command = 11490098
save_this_original_command = 11438610
save_real_this_command = 11438610
#20 0x0000000000577f70 in Ffuncall (nargs=4, args=0x7f7fffffd3b0) at eval.c:3037
fun = 10312397
original_fun = 10978002
funcar = 4294967297
numargs = 3
lisp_numargs = 10937344
val = 315
backtrace = {
next = 0x0,
function = 0x7f7fffffd3b0,
args = 0x7f7fffffd3b8,
nargs = 3,
evalargs = 0 '\0',
debug_on_exit = 0 '\0'
}
internal_args = (Lisp_Object *) 0x7f7fffffd3b8
i = 0
#21 0x000000000057795d in call3 (fn=10978002, arg1=11438610, arg2=10762242,
arg3=10762242) at eval.c:2857
ret_ungc_val = 9301109
gcpro1 = {
next = 0x8dec75,
var = 0xa43802,
nvars = 4
}
args = {10978002, 11438610, 10762242, 10762242}
#22 0x00000000004e4bca in Fcommand_execute (cmd=11438610, record_flag=10762242,
keys=10762242, special=10762242) at keyboard.c:10562
final = 9301109
tem = 10762242
prefixarg = 10762242
#23 0x00000000004d564d in command_loop_1 () at keyboard.c:1906
cmd = 11438610
lose = 1
keybuf = {96, 20, 8, 0, 140187732530800, 18451712, 1893, 0,
140187732530816, 1983, 18451712, 4294967317, 140187732530800, 6299742, 10656928,
216, 10937344, 7378697632079252736, 140187732530864, 9720, 274877896416,
140187732531032, 0, 140187732530872, 140187732530384, 0, 10762242, 12348018,
8166853, 10762242}
i = 2
prev_modiff = 158
prev_buffer = (struct buffer *) 0xa51c00
already_adjusted = 0
#24 0x0000000000575049 in internal_condition_case (bfun=0x4d3a17 <command_loop_1>,
handlers=10851522, hfun=0x4d34bc <cmd_error>) at eval.c:1492
val = 10762242
c = {
tag = 10762242,
val = 10762242,
next = 0x7f7fffffd880,
gcpro = 0x0,
jmp = {2129, 140187732531264, 140187732541408, 140187698962432, 140187696909296,
3, 140187732531000, 5722036, 0, 140187732531488, 18636288},
backlist = 0x0,
handlerlist = 0x0,
lisp_eval_depth = 0,
pdlcount = 2,
poll_suppress_count = 0,
interrupt_input_blocked = 0,
byte_stack = 0x0
}
h = {
handler = 10851522,
var = 10762242,
chosen_clause = 0,
tag = 0x7f7fffffd790,
next = 0x0
}
#25 0x00000000004d389f in command_loop_2 () at keyboard.c:1362
val = 1
#26 0x0000000000574a0e in internal_catch (tag=10846786,
func=0x4d3885 <command_loop_2>, arg=10762242) at eval.c:1228
c = {
tag = 10846786,
val = 10762242,
next = 0x0,
gcpro = 0x0,
jmp = {2129, 140187732531488, 140187732541408, 140187698962432, 140187696909296,
3, 140187732531288, 5720565, 4301358603, 10820608, 11046651},
backlist = 0x0,
handlerlist = 0x0,
lisp_eval_depth = 0,
pdlcount = 2,
poll_suppress_count = 0,
interrupt_input_blocked = 0,
byte_stack = 0x0
}
#27 0x00000000004d3859 in command_loop () at keyboard.c:1341
No locals.
#28 0x00000000004d3004 in recursive_edit_1 () at keyboard.c:956
count = 1
val = 5059007
#29 0x00000000004d31a6 in Frecursive_edit () at keyboard.c:1018
count = 0
buffer = 10762242
#30 0x00000000004d169a in main (argc=3, argv=0x7f7fffffdb70) at emacs.c:1833
dummy = 140187730444288
stack_bottom_variable = 0 '\0'
do_initial_setlocale = 1
skip_args = 0
rlim = {
rlim_cur = 8720384,
rlim_max = 33554432
}
no_loadup = 0
junk = 0x0
dname_arg = 0x0
Lisp Backtrace:
"eval-last-sexp-1" (0xffffca28)
"eval-last-sexp" (0xffffcfa8)
"call-interactively" (0xffffd3b8)
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-21 0:17 ` Kazuhiro Ito
@ 2011-08-24 9:37 ` Kazuhiro Ito
2011-08-24 12:06 ` Eli Zaretskii
2011-08-24 17:59 ` Andreas Schwab
0 siblings, 2 replies; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-24 9:37 UTC (permalink / raw)
To: Chong Yidong; +Cc: 9318
> I built Emacs 23.3 with "-O0 -g" option on NetBSD 5.1 (amd64), and
> started with below command (via SSH).
>
> gdb --args emacs -Q --no-splash
>
> Next, inputted the below code and evaluated with C-x C-e.
>
> (progn
> (goto-char (point-min))
> (insert #x80)
> (insert (make-string 16 ?A))
> (encode-coding-region 1 18 'ctext-unix))
>
> backtrace is below. Please let me know if you need more information.
>
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> 5473 if (STRING_MARKED_P (ptr))
I think relocation of the buffer may be causing the problem.
The comment on the CODING_DECODE_CHAR macro in coding.c says:
> /* This wrapper macro is used to preserve validity of pointers into
> buffer text across calls to decode_char, which could cause
> relocation of buffers if it loads a charset map, because loading a
> charset map allocates large structures. */
encode_coding_iso_2022() uses the ENCODE_ISO_CHARACTER macro, which uses
the ENCODE_CHAR macro. ENCODE_CHAR calls encode_char(), which may
load a charset map. If this is the cause of the problem,
encode_coding_emacs_mule() has the same problem.
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-24 9:37 ` Kazuhiro Ito
@ 2011-08-24 12:06 ` Eli Zaretskii
2011-08-25 9:49 ` Kazuhiro Ito
2011-08-24 17:59 ` Andreas Schwab
1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2011-08-24 12:06 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: cyd, 9318
> Date: Wed, 24 Aug 2011 18:37:24 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 9318@debbugs.gnu.org
>
> > (progn
> > (goto-char (point-min))
> > (insert #x80)
> > (insert (make-string 16 ?A))
> > (encode-coding-region 1 18 'ctext-unix))
> >
> > backtrace is below. Please let me know if you need more information.
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> > 5473 if (STRING_MARKED_P (ptr))
>
> I think relocation of buffer may cause the problem.
>
> The comment for CODING_DECODE_CHAR macro in coding.c says as below.
>
> > /* This wrapper macro is used to preserve validity of pointers into
> > buffer text across calls to decode_char, which could cause
> > relocation of buffers if it loads a charset map, because loading a
> > charset map allocates large structures. */
>
> encode_coding_iso_2022() uses ENCODE_ISO_CHARACTER macro, which uses
> ENCODE_CHAR macro. ENCODE_CHAR macro calls encode_char() and it may
> load a charset map.
But which pointer(s) in encode_coding_iso_2022 can be altered by
relocation? Do you actually see any of the pointers used by this
function modified by relocation of some buffer?
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-24 9:37 ` Kazuhiro Ito
2011-08-24 12:06 ` Eli Zaretskii
@ 2011-08-24 17:59 ` Andreas Schwab
2011-08-25 9:54 ` Kazuhiro Ito
1 sibling, 1 reply; 19+ messages in thread
From: Andreas Schwab @ 2011-08-24 17:59 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: Chong Yidong, 9318
Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> I think relocation of buffer may cause the problem.
Does that help?
diff --git a/src/coding.c b/src/coding.c
index 65c8a76..f34a023 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -915,8 +915,8 @@ record_conversion_result (struct coding_system *coding,
}
}
-/* This wrapper macro is used to preserve validity of pointers into
- buffer text across calls to decode_char, which could cause
+/* These wrapper macros are used to preserve validity of pointers into
+ buffer text across calls to decode_char/encode_char, which could cause
relocation of buffers if it loads a charset map, because loading a
charset map allocates large structures. */
#define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
@@ -935,6 +935,21 @@ record_conversion_result (struct coding_system *coding,
src_end += offset; \
} \
} while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+ do { \
+ charset_map_loaded = 0; \
+ code = ENCODE_CHAR (charset, c); \
+ if (charset_map_loaded) \
+ { \
+ const unsigned char *orig = coding->destination; \
+ EMACS_INT offset; \
+ \
+ coding_set_destination (coding); \
+ offset = coding->destination - orig; \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
/* If there are at least BYTES length of room at dst, allocate memory
@@ -2652,7 +2667,7 @@ encode_coding_emacs_mule (struct coding_system *coding)
{
charset = CHARSET_FROM_ID (preferred_charset_id);
if (CHAR_CHARSET_P (c, charset))
- code = ENCODE_CHAR (charset, c);
+ CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code);
else
charset = char_charset (c, charset_list, &code);
}
@@ -4185,7 +4200,8 @@ decode_coding_iso_2022 (struct coding_system *coding)
#define ENCODE_ISO_CHARACTER(charset, c) \
do { \
- int code = ENCODE_CHAR ((charset), (c)); \
+ int code; \
+ CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code); \
\
if (CHARSET_DIMENSION (charset) == 1) \
ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-24 12:06 ` Eli Zaretskii
@ 2011-08-25 9:49 ` Kazuhiro Ito
0 siblings, 0 replies; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-25 9:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 9318
> > > (progn
> > > (goto-char (point-min))
> > > (insert #x80)
> > > (insert (make-string 16 ?A))
> > > (encode-coding-region 1 18 'ctext-unix))
> > >
> > > backtrace is below. Please let me know if you need more information.
> > >
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> > > 5473 if (STRING_MARKED_P (ptr))
> >
> > I think relocation of buffer may cause the problem.
> >
> > The comment for CODING_DECODE_CHAR macro in coding.c says as below.
> >
> > > /* This wrapper macro is used to preserve validity of pointers into
> > > buffer text across calls to decode_char, which could cause
> > > relocation of buffers if it loads a charset map, because loading a
> > > charset map allocates large structures. */
> >
> > encode_coding_iso_2022() uses ENCODE_ISO_CHARACTER macro, which uses
> > ENCODE_CHAR macro. ENCODE_CHAR macro calls encode_char() and it may
> > load a charset map.
>
> But which pointer(s) in encode_coding_iso_2022 can be altered by
> relocation?
encode_coding() sets coding->destination with coding_set_destination()
before calling encode_coding_iso_2022(). I think that at least the correct
value of coding->destination can change inside encode_coding_iso_2022()
when charset maps are loaded.
> Do you actually see any of the pointers used by this
> function modified by relocation of some buffer?
No, because I don't know how to check.
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-24 17:59 ` Andreas Schwab
@ 2011-08-25 9:54 ` Kazuhiro Ito
2011-08-26 11:41 ` Kazuhiro Ito
0 siblings, 1 reply; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-25 9:54 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Chong Yidong, 9318
> > I think relocation of buffer may cause the problem.
>
> Does that help?
>
> diff --git a/src/coding.c b/src/coding.c
> index 65c8a76..f34a023 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -915,8 +915,8 @@ record_conversion_result (struct coding_system *coding,
> }
> }
>
> -/* This wrapper macro is used to preserve validity of pointers into
> - buffer text across calls to decode_char, which could cause
> +/* These wrapper macros are used to preserve validity of pointers into
> + buffer text across calls to decode_char/encode_char, which could cause
> relocation of buffers if it loads a charset map, because loading a
> charset map allocates large structures. */
> #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
> @@ -935,6 +935,21 @@ record_conversion_result (struct coding_system *coding,
> src_end += offset; \
> } \
> } while (0)
> +#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
> + do { \
> + charset_map_loaded = 0; \
> + code = ENCODE_CHAR (charset, c); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
>
>
> /* If there are at least BYTES length of room at dst, allocate memory
> @@ -2652,7 +2667,7 @@ encode_coding_emacs_mule (struct coding_system *coding)
> {
> charset = CHARSET_FROM_ID (preferred_charset_id);
> if (CHAR_CHARSET_P (c, charset))
> - code = ENCODE_CHAR (charset, c);
> + CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code);
> else
> charset = char_charset (c, charset_list, &code);
> }
> @@ -4185,7 +4200,8 @@ decode_coding_iso_2022 (struct coding_system *coding)
> #define ENCODE_ISO_CHARACTER(charset, c) \
> do { \
> - int code = ENCODE_CHAR ((charset), (c)); \
> + int code; \
> + CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code); \
> \
> if (CHARSET_DIMENSION (charset) == 1) \
> ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
Andreas' patch resolved the problem partially. It fixed the problem on
NetBSD with '-O0' in CFLAGS, but not on NetBSD with '-O2' or on Windows.
I confirmed that adding protection of coding->dst_object to
Andreas' patch fixed the problem on NetBSD with '-O2' but not on
Windows. I don't know whether this is the wrong approach or simply not enough.
--- src/coding.c 2011-07-01 11:03:55 +0000
+++ src/coding.c 2011-08-24 23:39:49 +0000
@@ -7397,10 +7436,15 @@
setup_ccl_program (&cclspec.ccl, CODING_CCL_ENCODER (coding));
}
do {
+ struct gcpro gcpro1;
+ GCPRO1 (coding->dst_object);
+
coding_set_source (coding);
consume_chars (coding, translation_table, max_lookup);
coding_set_destination (coding);
(*(coding->encoder)) (coding);
+
+ UNGCPRO;
} while (coding->consumed_char < coding->src_chars);
if (BUFFERP (coding->dst_object) && coding->produced_char > 0)
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-25 9:54 ` Kazuhiro Ito
@ 2011-08-26 11:41 ` Kazuhiro Ito
2011-08-28 0:04 ` Kazuhiro Ito
0 siblings, 1 reply; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-26 11:41 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Chong Yidong, 9318
> > > I think relocation of buffer may cause the problem.
> >
> > Does that help?
>
> Andreas' patch resolved the problem partially. It resolved the problem on
> NetBSD with '-O0' CFLAGS, but failed on NetBSD with '-O2' and Windows.
>
> I confirmed that adding the protection of coding->dst_object to
> Andreas' patch resolved the problem on NetBSD with '-O2' but not on
> Windows. I don't know whether it is incorrect way or is not enough.
I noticed that char_charset() could cause relocation of buffers because it
can call encode_char(). I confirmed that making similar changes to the
callers of char_charset() fixed my problem (without the protection of
coding->dst_object).
SUMMARY OF THE PROBLEM:
In encode_coding_XXX(), calling encode_char() can cause relocation
of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
can also cause relocation because they can call encode_char().
After using any of them, coding->destination, dst and dst_end should be
updated as needed.
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-26 11:41 ` Kazuhiro Ito
@ 2011-08-28 0:04 ` Kazuhiro Ito
2011-08-30 23:30 ` Kazuhiro Ito
0 siblings, 1 reply; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-28 0:04 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Chong Yidong, 9318
> SUMMARY OF THE PROBLEM:
> In encode_coding_XXX(), calling encode_char() could cause relocation
> of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
> could also cause relocation because they could call encode_char().
> After using of them, coding->destination, dst, dst_end should be
> updated as needed.
I noticed that the CHAR_CHARSET_P macro had slipped past my check.
CHAR_CHARSET_P could also cause relocation of buffers.
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-28 0:04 ` Kazuhiro Ito
@ 2011-08-30 23:30 ` Kazuhiro Ito
2011-12-01 1:56 ` Kenichi Handa
0 siblings, 1 reply; 19+ messages in thread
From: Kazuhiro Ito @ 2011-08-30 23:30 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Chong Yidong, 9318
> > SUMMARY OF THE PROBLEM:
> > In encode_coding_XXX(), calling encode_char() could cause relocation
> > of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
> > could also cause relocation because they could call encode_char().
> > After using of them, coding->destination, dst, dst_end should be
> > updated as needed.
>
> I noticed CHAR_CHARSET_P macro slipped out of my check.
> CHAR_CHARSET_P could also cause relocation of buffers.
Here is the patch, which includes Andreas' patch. In my
environment, the problems are fixed. I think it would be better to
change the interface of encode_designation_at_bol().
=== modified file 'src/coding.c'
--- src/coding.c 2011-05-09 09:59:23 +0000
+++ src/coding.c 2011-08-28 07:33:54 +0000
@@ -1026,6 +1026,54 @@
} \
} while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+ do { \
+ charset_map_loaded = 0; \
+ code = ENCODE_CHAR (charset, c); \
+ if (charset_map_loaded) \
+ { \
+ const unsigned char *orig = coding->destination; \
+ EMACS_INT offset; \
+ \
+ coding_set_destination (coding); \
+ offset = coding->destination - orig; \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+ do { \
+ charset_map_loaded = 0; \
+ charset = char_charset (c, charset_list, code_return); \
+ if (charset_map_loaded) \
+ { \
+ const unsigned char *orig = coding->destination; \
+ EMACS_INT offset; \
+ \
+ coding_set_destination (coding); \
+ offset = coding->destination - orig; \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+ do { \
+ charset_map_loaded = 0; \
+ result = CHAR_CHARSET_P(c, charset); \
+ if (charset_map_loaded) \
+ { \
+ const unsigned char *orig = coding->destination; \
+ EMACS_INT offset; \
+ \
+ coding_set_destination (coding); \
+ offset = coding->destination - orig; \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
/* If there are at least BYTES length of room at dst, allocate memory
for coding->destination and update dst and dst_end. We don't have
@@ -2778,14 +2826,19 @@
if (preferred_charset_id >= 0)
{
+ int result;
+
charset = CHARSET_FROM_ID (preferred_charset_id);
- if (CHAR_CHARSET_P (c, charset))
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (result)
code = ENCODE_CHAR (charset, c);
else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ &code, charset);
}
else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (! charset)
{
c = coding->default_char;
@@ -2794,7 +2847,8 @@
EMIT_ONE_ASCII_BYTE (c);
continue;
}
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ &code, charset);
}
dimension = CHARSET_DIMENSION (charset);
emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4317,8 +4371,9 @@
#define ENCODE_ISO_CHARACTER(charset, c) \
do { \
- int code = ENCODE_CHAR ((charset),(c)); \
- \
+ int code; \
+ CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
+ \
if (CHARSET_DIMENSION (charset) == 1) \
ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
else \
@@ -4476,7 +4531,17 @@
c = *charbuf++;
if (c == '\n')
break;
+
+ charset_map_loaded = 0;
charset = char_charset (c, charset_list, NULL);
+ if (charset_map_loaded)
+ {
+ const unsigned char *orig = coding->destination;
+
+ coding_set_destination (coding);
+ dst += coding->destination - orig;
+ }
+
id = CHARSET_ID (charset);
reg = CODING_ISO_REQUEST (coding, id);
if (reg >= 0 && r[reg] < 0)
@@ -4543,6 +4608,12 @@
/* We have to produce designation sequences if any now. */
dst = encode_designation_at_bol (coding, charbuf, charbuf_end, dst);
+ if (charset_map_loaded)
+ {
+ EMACS_INT offset = coding->destination + coding->dst_bytes - dst_end;
+ dst_end += offset;
+ dst_prev += offset;
+ }
bol_designation = 0;
/* We are sure that designation sequences are all ASCII bytes. */
produced_chars += dst - dst_prev;
@@ -4616,12 +4687,17 @@
if (preferred_charset_id >= 0)
{
+ int result;
+
charset = CHARSET_FROM_ID (preferred_charset_id);
- if (! CHAR_CHARSET_P (c, charset))
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (! result)
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ NULL, charset);
}
else
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ NULL, charset);
if (!charset)
{
if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4632,7 +4708,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+ charset_list, NULL, charset);
}
}
ENCODE_ISO_CHARACTER (charset, c);
@@ -5064,7 +5141,9 @@
else
{
unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (!charset)
{
@@ -5076,7 +5155,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+ charset_list, &code, charset);
}
}
if (code == CHARSET_INVALID_CODE (charset))
@@ -5153,7 +5233,9 @@
else
{
unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (! charset)
{
@@ -5165,7 +5247,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+ charset_list, &code, charset);
}
}
if (code == CHARSET_INVALID_CODE (charset))
@@ -5747,7 +5831,9 @@
}
else
{
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+ &code, charset);
+
if (charset)
{
if (CHARSET_DIMENSION (charset) == 1)
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-30 23:30 ` Kazuhiro Ito
@ 2011-12-01 1:56 ` Kenichi Handa
2011-12-05 7:10 ` Kenichi Handa
0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2011-12-01 1:56 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: schwab, 9318
In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> Here is the patch for the code, which contains Andreas' patch. In my
> environment, problems are fixed. I think it would be better that the
> interface of encode_designation_at_bol() is changed.
Oops, sorry. I had vaguely thought that your patch below
had already been applied, but I just noticed that it was not.
I'll commit a slightly modified version, including the
improved interface for encode_designation_at_bol, soon.
By the way, it would be good if we had a way to suppress
buffer text relocation temporarily.
---
Kenichi Handa
handa@m17n.org
> === modified file 'src/coding.c'
> --- src/coding.c 2011-05-09 09:59:23 +0000
> +++ src/coding.c 2011-08-28 07:33:54 +0000
> @@ -1026,6 +1026,54 @@
> } \
> } while (0)
> +#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
> + do { \
> + charset_map_loaded = 0; \
> + code = ENCODE_CHAR (charset, c); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
> +
> +#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
> + do { \
> + charset_map_loaded = 0; \
> + charset = char_charset (c, charset_list, code_return); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
> +
> +#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
> + do { \
> + charset_map_loaded = 0; \
> + result = CHAR_CHARSET_P(c, charset); \
> + if (charset_map_loaded) \
> + { \
> + const unsigned char *orig = coding->destination; \
> + EMACS_INT offset; \
> + \
> + coding_set_destination (coding); \
> + offset = coding->destination - orig; \
> + dst += offset; \
> + dst_end += offset; \
> + } \
> + } while (0)
> +
> /* If there are at least BYTES length of room at dst, allocate memory
> for coding->destination and update dst and dst_end. We don't have
> @@ -2778,14 +2826,19 @@
> if (preferred_charset_id >= 0)
> {
> + int result;
> +
> charset = CHARSET_FROM_ID (preferred_charset_id);
> - if (CHAR_CHARSET_P (c, charset))
> + CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
> + if (result)
> code = ENCODE_CHAR (charset, c);
> else
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> }
> else
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> if (! charset)
> {
> c = coding->default_char;
> @@ -2794,7 +2847,8 @@
> EMIT_ONE_ASCII_BYTE (c);
> continue;
> }
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> }
> dimension = CHARSET_DIMENSION (charset);
> emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
> @@ -4317,8 +4371,9 @@
> #define ENCODE_ISO_CHARACTER(charset, c) \
> do { \
> - int code = ENCODE_CHAR ((charset),(c)); \
> - \
> + int code; \
> + CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
> + \
> if (CHARSET_DIMENSION (charset) == 1) \
> ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
> else \
> @@ -4476,7 +4531,17 @@
> c = *charbuf++;
> if (c == '\n')
> break;
> +
> + charset_map_loaded = 0;
> charset = char_charset (c, charset_list, NULL);
> + if (charset_map_loaded)
> + {
> + const unsigned char *orig = coding->destination;
> +
> + coding_set_destination (coding);
> + dst += coding->destination - orig;
> + }
> +
> id = CHARSET_ID (charset);
> reg = CODING_ISO_REQUEST (coding, id);
> if (reg >= 0 && r[reg] < 0)
> @@ -4543,6 +4608,12 @@
> /* We have to produce designation sequences if any now. */
> dst = encode_designation_at_bol (coding, charbuf, charbuf_end, dst);
> + if (charset_map_loaded)
> + {
> + EMACS_INT offset = coding->destination + coding->dst_bytes - dst_end;
> + dst_end += offset;
> + dst_prev += offset;
> + }
> bol_designation = 0;
> /* We are sure that designation sequences are all ASCII bytes. */
> produced_chars += dst - dst_prev;
> @@ -4616,12 +4687,17 @@
> if (preferred_charset_id >= 0)
> {
> + int result;
> +
> charset = CHARSET_FROM_ID (preferred_charset_id);
> - if (! CHAR_CHARSET_P (c, charset))
> - charset = char_charset (c, charset_list, NULL);
> + CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
> + if (! result)
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + NULL, charset);
> }
> else
> - charset = char_charset (c, charset_list, NULL);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + NULL, charset);
> if (!charset)
> {
> if (coding->mode & CODING_MODE_SAFE_ENCODING)
> @@ -4632,7 +4708,8 @@
> else
> {
> c = coding->default_char;
> - charset = char_charset (c, charset_list, NULL);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c,
> + charset_list, NULL, charset);
> }
> }
> ENCODE_ISO_CHARACTER (charset, c);
> @@ -5064,7 +5141,9 @@
> else
> {
> unsigned code;
> - struct charset *charset = char_charset (c, charset_list, &code);
> + struct charset *charset;
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> if (!charset)
> {
> @@ -5076,7 +5155,8 @@
> else
> {
> c = coding->default_char;
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c,
> + charset_list, &code, charset);
> }
> }
> if (code == CHARSET_INVALID_CODE (charset))
> @@ -5153,7 +5233,9 @@
> else
> {
> unsigned code;
> - struct charset *charset = char_charset (c, charset_list, &code);
> + struct charset *charset;
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> if (! charset)
> {
> @@ -5165,7 +5247,8 @@
> else
> {
> c = coding->default_char;
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c,
> + charset_list, &code, charset);
> }
> }
> if (code == CHARSET_INVALID_CODE (charset))
> @@ -5747,7 +5831,9 @@
> }
> else
> {
> - charset = char_charset (c, charset_list, &code);
> + CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
> + &code, charset);
> +
> if (charset)
> {
> if (CHARSET_DIMENSION (charset) == 1)
> --
> Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-12-01 1:56 ` Kenichi Handa
@ 2011-12-05 7:10 ` Kenichi Handa
2011-12-05 11:31 ` Kazuhiro Ito
0 siblings, 1 reply; 19+ messages in thread
From: Kenichi Handa @ 2011-12-05 7:10 UTC (permalink / raw)
To: 9318; +Cc: kzhr, schwab
In article <tl7zkfdnjgj.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> > Here is the patch for the code, which contains Andreas' patch. In my
> > environment, problems are fixed. I think it would be better that the
> > interface of encode_designation_at_bol() is changed.
> Oops, sorry. I had vaguely thought that your patch below
> had already been applied, but I just noticed that it was not.
> I'll commit a slightly modified version, including the
> improved interface for encode_designation_at_bol, soon.
I've just installed the following changes. As I don't have
a Cygwin environment now, could you please check whether this
change really fixes the problem?
---
Kenichi Handa
handa@m17n.org
2011-12-05 Kenichi Handa <handa@m17n.org>

* coding.c (encode_designation_at_bol): New args charbuf_end and
dst. Return the number of produced bytes. Callers changed.
(coding_set_source): Return how many bytes coding->source was
relocated.
(coding_set_destination): Return how many bytes
coding->destination was relocated.
(CODING_DECODE_CHAR, CODING_ENCODE_CHAR, CODING_CHAR_CHARSET)
(CODING_CHAR_CHARSET_P): Adjusted for the above changes.

2011-12-05 Kazuhiro Ito <kzhr@d1.dion.ne.jp> (tiny change)

* coding.c (CODING_CHAR_CHARSET_P): New macro.
(encode_coding_emacs_mule, encode_coding_iso_2022): Use the above
macro (Bug#9318).

2011-12-05 Andreas Schwab <schwab@linux-m68k.org>

The following changes are to fix Bug#9318.

* coding.c (CODING_ENCODE_CHAR, CODING_CHAR_CHARSET): New macros.
(encode_coding_emacs_mule, ENCODE_ISO_CHARACTER)
(encode_coding_iso_2022, encode_coding_sjis)
(encode_coding_big5, encode_coding_charset): Use the above macros.

=== modified file 'src/coding.c'
--- src/coding.c 2011-11-07 01:57:07 +0000
+++ src/coding.c 2011-12-05 06:14:46 +0000
@@ -847,16 +847,16 @@
static void decode_coding_raw_text (struct coding_system *);
static int encode_coding_raw_text (struct coding_system *);
-static void coding_set_source (struct coding_system *);
-static void coding_set_destination (struct coding_system *);
+static EMACS_INT coding_set_source (struct coding_system *);
+static EMACS_INT coding_set_destination (struct coding_system *);
static void coding_alloc_by_realloc (struct coding_system *, EMACS_INT);
static void coding_alloc_by_making_gap (struct coding_system *,
EMACS_INT, EMACS_INT);
static unsigned char *alloc_destination (struct coding_system *,
EMACS_INT, unsigned char *);
static void setup_iso_safe_charsets (Lisp_Object);
-static unsigned char *encode_designation_at_bol (struct coding_system *,
- int *, unsigned char *);
+static int encode_designation_at_bol (struct coding_system *,
+ int *, int *, unsigned char *);
static int detect_eol (const unsigned char *,
EMACS_INT, enum coding_category);
static Lisp_Object adjust_coding_eol_type (struct coding_system *, int);
@@ -915,27 +915,68 @@
}
}
-/* This wrapper macro is used to preserve validity of pointers into
- buffer text across calls to decode_char, which could cause
- relocation of buffers if it loads a charset map, because loading a
- charset map allocates large structures. */
+/* These wrapper macros are used to preserve validity of pointers into
+ buffer text across calls to decode_char, encode_char, etc, which
+ could cause relocation of buffers if it loads a charset map,
+ because loading a charset map allocates large structures. */
+
#define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
do { \
+ EMACS_INT offset; \
+ \
charset_map_loaded = 0; \
c = DECODE_CHAR (charset, code); \
- if (charset_map_loaded) \
+ if (charset_map_loaded \
+ && (offset = coding_set_source (coding))) \
{ \
- const unsigned char *orig = coding->source; \
- EMACS_INT offset; \
- \
- coding_set_source (coding); \
- offset = coding->source - orig; \
src += offset; \
src_base += offset; \
src_end += offset; \
} \
} while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ code = ENCODE_CHAR (charset, c); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ charset = char_charset (c, charset_list, code_return); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ result = CHAR_CHARSET_P (c, charset); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
/* If there are at least BYTES length of room at dst, allocate memory
for coding->destination and update dst and dst_end. We don't have
@@ -1015,9 +1056,14 @@
| ((p)[-1] & 0x3F))))
-static void
+/* Update coding->source from coding->src_object, and return how many
+ bytes coding->source was changed. */
+
+static EMACS_INT
coding_set_source (struct coding_system *coding)
{
+ const unsigned char *orig = coding->source;
+
if (BUFFERP (coding->src_object))
{
struct buffer *buf = XBUFFER (coding->src_object);
@@ -1036,11 +1082,18 @@
/* Otherwise, the source is C string and is never relocated
automatically. Thus we don't have to update anything. */
}
+ return coding->source - orig;
}
-static void
+
+/* Update coding->destination from coding->dst_object, and return how
+ many bytes coding->destination was changed. */
+
+static EMACS_INT
coding_set_destination (struct coding_system *coding)
{
+ const unsigned char *orig = coding->destination;
+
if (BUFFERP (coding->dst_object))
{
if (BUFFERP (coding->src_object) && coding->src_pos < 0)
@@ -1065,6 +1118,7 @@
/* Otherwise, the destination is C string and is never relocated
automatically. Thus we don't have to update anything. */
}
+ return coding->destination - orig;
}
@@ -2650,14 +2704,19 @@
if (preferred_charset_id >= 0)
{
+ int result;
+
charset = CHARSET_FROM_ID (preferred_charset_id);
- if (CHAR_CHARSET_P (c, charset))
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (result)
code = ENCODE_CHAR (charset, c);
else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
}
else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (! charset)
{
c = coding->default_char;
@@ -2666,7 +2725,8 @@
EMIT_ONE_ASCII_BYTE (c);
continue;
}
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
}
dimension = CHARSET_DIMENSION (charset);
emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4185,7 +4245,8 @@
#define ENCODE_ISO_CHARACTER(charset, c) \
do { \
- int code = ENCODE_CHAR ((charset), (c)); \
+ int code; \
+ CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
\
if (CHARSET_DIMENSION (charset) == 1) \
ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
@@ -4283,15 +4344,19 @@
/* Produce designation sequences of charsets in the line started from
- SRC to a place pointed by DST, and return updated DST.
+ CHARBUF to a place pointed by DST, and return the number of
+ produced bytes. DST should not directly point a buffer text area
+ which may be relocated by char_charset call.
If the current block ends before any end-of-line, we may fail to
find all the necessary designations. */
-static unsigned char *
-encode_designation_at_bol (struct coding_system *coding, int *charbuf,
+static int
+encode_designation_at_bol (struct coding_system *coding,
+ int *charbuf, int *charbuf_end,
unsigned char *dst)
{
+ unsigned char *orig;
struct charset *charset;
/* Table of charsets to be designated to each graphic register. */
int r[4];
@@ -4309,7 +4374,7 @@
for (reg = 0; reg < 4; reg++)
r[reg] = -1;
- while (found < 4)
+ while (charbuf < charbuf_end && found < 4)
{
int id;
@@ -4334,7 +4399,7 @@
ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding);
}
- return dst;
+ return dst - orig;
}
/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
@@ -4378,13 +4443,26 @@
if (bol_designation)
{
- unsigned char *dst_prev = dst;
-
/* We have to produce designation sequences if any now. */
- dst = encode_designation_at_bol (coding, charbuf, dst);
- bol_designation = 0;
+ unsigned char desig_buf[16];
+ int nbytes;
+ EMACS_INT offset;
+
+ charset_map_loaded = 0;
+ nbytes = encode_designation_at_bol (coding, charbuf, charbuf_end,
+ desig_buf);
+ if (charset_map_loaded
+ && (offset = coding_set_destination (coding)))
+ {
+ dst += offset;
+ dst_end += offset;
+ }
+ memcpy (dst, desig_buf, nbytes);
+ dst += nbytes;
/* We are sure that designation sequences are all ASCII bytes. */
- produced_chars += dst - dst_prev;
+ produced_chars += nbytes;
+ bol_designation = 0;
+ ASSURE_DESTINATION (safe_room);
}
c = *charbuf++;
@@ -4455,12 +4533,17 @@
if (preferred_charset_id >= 0)
{
+ int result;
+
charset = CHARSET_FROM_ID (preferred_charset_id);
- if (! CHAR_CHARSET_P (c, charset))
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (! result)
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ NULL, charset);
}
else
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ NULL, charset);
if (!charset)
{
if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4471,7 +4554,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, NULL, charset);
}
}
ENCODE_ISO_CHARACTER (charset, c);
@@ -4897,7 +4981,9 @@
else
{
unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (!charset)
{
@@ -4909,7 +4995,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, &code, charset);
}
}
if (code == CHARSET_INVALID_CODE (charset))
@@ -4984,7 +5071,9 @@
else
{
unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (! charset)
{
@@ -4996,7 +5085,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, &code, charset);
}
}
if (code == CHARSET_INVALID_CODE (charset))
@@ -5572,7 +5662,9 @@
}
else
{
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
+
if (charset)
{
if (CHARSET_DIMENSION (charset) == 1)
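
For readers following the patch, the pointer-adjustment pattern used by the wrapper macros can be sketched in isolation. This is only a toy model under stated assumptions, not Emacs code: toy_coding, toy_lookup, toy_set_destination, TOY_LOOKUP and toy_encode are hypothetical names, and "relocation" is simulated by sliding the text inside one fixed arena. The idea it illustrates is the one in the patch: clear a flag, make the call that may move the text, and if the flag was set, refresh the cached pointer and shift dst and dst_end by the returned offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for relocatable buffer text (hypothetical, not Emacs).  */
enum { ARENA_SIZE = 64, TEXT_SIZE = 16 };

struct toy_coding
{
  unsigned char arena[ARENA_SIZE];
  size_t text_start;		/* where the text currently lives */
  unsigned char *destination;	/* cached pointer to the text start */
};

static int toy_map_loaded;	/* plays the role of charset_map_loaded */

void
toy_init (struct toy_coding *coding)
{
  memset (coding->arena, 0, sizeof coding->arena);
  coding->text_start = 0;
  coding->destination = coding->arena;
}

/* Pretend lookup: a nonzero SHIFT simulates loading a map, which moves
   the text SHIFT bytes further into the arena and sets the flag.  */
static int
toy_lookup (struct toy_coding *coding, int c, size_t shift)
{
  if (shift > 0)
    {
      assert (coding->text_start + shift + TEXT_SIZE <= ARENA_SIZE);
      memmove (coding->arena + coding->text_start + shift,
	       coding->arena + coding->text_start, TEXT_SIZE);
      coding->text_start += shift;
      toy_map_loaded = 1;
    }
  return c & 0x7f;
}

/* Refresh the cached pointer and return how many bytes it moved.  */
static ptrdiff_t
toy_set_destination (struct toy_coding *coding)
{
  const unsigned char *orig = coding->destination;

  coding->destination = coding->arena + coding->text_start;
  return coding->destination - orig;
}

/* Wrapper: keep DST and DST_END valid across a call that may relocate.  */
#define TOY_LOOKUP(coding, dst, dst_end, c, shift, result)	\
  do {								\
    ptrdiff_t offset;						\
								\
    toy_map_loaded = 0;						\
    result = toy_lookup (coding, c, shift);			\
    if (toy_map_loaded						\
	&& (offset = toy_set_destination (coding)))		\
      {								\
	dst += offset;						\
	dst_end += offset;					\
      }								\
  } while (0)

/* Write one byte per character; the first lookup relocates the text by
   SHIFT_ON_FIRST bytes.  Returns the number of bytes produced.  */
int
toy_encode (struct toy_coding *coding, const int *chars, size_t n,
	    size_t shift_on_first)
{
  unsigned char *dst = coding->destination;
  unsigned char *dst_end = dst + TEXT_SIZE;
  size_t i;

  for (i = 0; i < n && dst < dst_end; i++)
    {
      int code;

      TOY_LOOKUP (coding, dst, dst_end, chars[i],
		  i == 0 ? shift_on_first : 0, code);
      *dst++ = (unsigned char) code;
    }
  return (int) (dst - coding->destination);
}
```

Without the adjustment inside TOY_LOOKUP, the first byte would be written at the old location of the text, which is roughly the kind of stale-pointer write the original report appears to show on the first call.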
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-08-18 9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
2011-08-18 9:48 ` Andreas Schwab
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
@ 2011-12-05 9:11 ` Paul Eggert
2011-12-06 0:30 ` Kenichi Handa
2 siblings, 1 reply; 19+ messages in thread
From: Paul Eggert @ 2011-12-05 9:11 UTC (permalink / raw)
To: Kenichi Handa; +Cc: 9318
That patch (bzr 106613) causes Emacs to use an uninitialized variable;
I found this via static checking with GCC. I installed the following
further patch, which I think is right and anyway does not introduce a bug --
can you please check it? Thanks.
* coding.c (encode_designation_at_bol): Don't use uninitialized
local variable (Bug#9318).
=== modified file 'src/coding.c'
--- src/coding.c 2011-12-05 07:03:31 +0000
+++ src/coding.c 2011-12-05 09:00:44 +0000
@@ -4356,7 +4356,7 @@
int *charbuf, int *charbuf_end,
unsigned char *dst)
{
- unsigned char *orig;
+ unsigned char *orig = dst;
struct charset *charset;
/* Table of charsets to be designated to each graphic register. */
int r[4];
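
As an illustration of why the initializer matters, here is a minimal sketch of the "return the number of produced bytes" convention combined with a small staging buffer. It is not the Emacs function: toy_emit_designation and toy_append_designation are hypothetical names, and the three-byte sequence ESC ( FINAL is used only as a sample payload. The byte count is computed as dst - orig, so orig has to be set to the starting value of dst before anything is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the Emacs function: write the three-byte
   escape sequence ESC ( FINAL_BYTE at DST and return the byte count.  */
int
toy_emit_designation (unsigned char *dst, unsigned char final_byte)
{
  unsigned char *orig = dst;	/* record the start before DST advances */

  *dst++ = 0x1b;		/* ESC */
  *dst++ = '(';
  *dst++ = final_byte;
  return (int) (dst - orig);
}

/* Stage the sequence in a small local buffer, then copy it to OUT.
   Returns the number of bytes appended, or 0 if ROOM is too small.  */
size_t
toy_append_designation (unsigned char *out, size_t room,
			unsigned char final_byte)
{
  unsigned char staging[16];
  int nbytes = toy_emit_designation (staging, final_byte);

  assert (nbytes > 0 && (size_t) nbytes <= sizeof staging);
  if ((size_t) nbytes > room)
    return 0;
  memcpy (out, staging, (size_t) nbytes);
  return (size_t) nbytes;
}
```

Staging into a local array means the helper never holds a pointer into text that might move; the caller copies the bytes only after it has re-validated its own destination pointer.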
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-12-05 7:10 ` Kenichi Handa
@ 2011-12-05 11:31 ` Kazuhiro Ito
0 siblings, 0 replies; 19+ messages in thread
From: Kazuhiro Ito @ 2011-12-05 11:31 UTC (permalink / raw)
To: Kenichi Handa; +Cc: schwab, 9318
> In article <tl7zkfdnjgj.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
>
> > In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> > > Here is the patch for the code, which contains Andreas' patch. In my
> > > environment, problems are fixed. I think it would be better that the
> > > interface of encode_designation_at_bol() is changed.
>
> > Oops, sorry, I have vaguely thought that your patch below
> > has already been applied, but just noticed that it was not.
> > I'll commit a slightly modified version including the
> > improved interface for encode_designation_at_bol soon.
>
> I've just installed the following changes. As I don't have
> a Cygwin environment now, could you please check whether this
> change really fixes the problem?
As far as I can confirm, the problems are fixed (except for the issue
Paul pointed out). Thank you.
Additionally, if you have time, please look at Bug#8619 and Bug#9389.
--
Kazuhiro Ito
* bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
2011-12-05 9:11 ` Paul Eggert
@ 2011-12-06 0:30 ` Kenichi Handa
0 siblings, 0 replies; 19+ messages in thread
From: Kenichi Handa @ 2011-12-06 0:30 UTC (permalink / raw)
To: Paul Eggert; +Cc: 9318
In article <4EDC8AD9.3050004@cs.ucla.edu>, Paul Eggert <eggert@cs.ucla.edu> writes:
> That patch (bzr 106613) causes Emacs to use an uninitialized variable;
> I found this via static checking with GCC. I installed the following
> further patch, which I think is right and anyway does not introduce a bug --
> can you please check it? Thanks.
Oops, my fault. Yes, your patch is correct. Thank you.
---
Kenichi Handa
handa@m17n.org
> * coding.c (encode_designation_at_bol): Don't use uninitialized
> local variable (Bug#9318).
> === modified file 'src/coding.c'
> --- src/coding.c 2011-12-05 07:03:31 +0000
> +++ src/coding.c 2011-12-05 09:00:44 +0000
> @@ -4356,7 +4356,7 @@
> int *charbuf, int *charbuf_end,
> unsigned char *dst)
> {
> - unsigned char *orig;
> + unsigned char *orig = dst;
> struct charset *charset;
> /* Table of charsets to be designated to each graphic register. */
> int r[4];
Thread overview: 19+ messages
2011-08-18 9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
2011-08-18 9:48 ` Andreas Schwab
2011-08-18 21:33 ` Kazuhiro Ito
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
2011-08-20 21:26 ` Chong Yidong
2011-08-21 0:17 ` Kazuhiro Ito
2011-08-24 9:37 ` Kazuhiro Ito
2011-08-24 12:06 ` Eli Zaretskii
2011-08-25 9:49 ` Kazuhiro Ito
2011-08-24 17:59 ` Andreas Schwab
2011-08-25 9:54 ` Kazuhiro Ito
2011-08-26 11:41 ` Kazuhiro Ito
2011-08-28 0:04 ` Kazuhiro Ito
2011-08-30 23:30 ` Kazuhiro Ito
2011-12-01 1:56 ` Kenichi Handa
2011-12-05 7:10 ` Kenichi Handa
2011-12-05 11:31 ` Kazuhiro Ito
2011-12-05 9:11 ` Paul Eggert
2011-12-06 0:30 ` Kenichi Handa