* Process output truncation when using UTF-8
From: Milan Zamazal @ 2003-05-28 8:12 UTC

In GNU Emacs 21.3.2 (i386-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2003-04-24 on raven, modified by Debian
configured using `configure i386-linux --prefix=/usr --sharedstatedir=/var/lib --libexecdir=/usr/lib --localstatedir=/var/lib --infodir=/usr/share/info --mandir=/usr/share/man --with-pop=yes --with-x=yes --with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: cs_CZ.ISO8859-2
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: C
  locale-coding-system: iso-8859-2
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

When I start a process, set its input and output encoding to utf-8 and
send a long string to it through process-send-string, usually some final
part of the sent string is missing on the recipient's side.

You can try to reproduce the problem by evaluating the following Elisp
file:

[-- Attachment #2: utf-8-bug.el --]
[-- Type: application/emacs-lisp, Size: 322 bytes --]

It should send the Czech Emacs tutorial to a process that stores it in
the /tmp/foo file.  On my system, the end of the tutorial (about the last
900 characters) is missing from /tmp/foo.  This doesn't happen when the
process encoding is set to iso-8859-2 instead.

I can reproduce the problem on my Debian GNU/Linux i386 system with both
Emacs 21.3 as packaged by Debian and the current Emacs CVS snapshot.

_______________________________________________
Bug-gnu-emacs mailing list
Bug-gnu-emacs@gnu.org
http://mail.gnu.org/mailman/listinfo/bug-gnu-emacs

* Re: Process output truncation when using UTF-8
From: Kenichi Handa @ 2003-05-30 8:16 UTC
Cc: bug-gnu-emacs

In article <874r3f1ocb.fsf@zamazal.org>, Milan Zamazal <pdm@zamazal.org> writes:
> When I start a process, set its input and output encoding
> to utf-8 and send a long string to it through
> process-send-string, usually some final part of the sent
> string is missing on the recipient's side.

Thank you for the report.  I've just installed the attached
change in RC and HEAD.  It should be applicable also to
Emacs 21.3.

---
Ken'ichi HANDA
handa@m17n.org

2003-05-30  Kenichi Handa  <handa@m17n.org>

        * coding.c (ccl_coding_driver): Set ccl->eight_bit_control
        properly before calling ccl_driver.

        * ccl.h (struct ccl_program) <eight_bit_control>: Comment fixed.

        * ccl.c (CCL_WRITE_CHAR): Increment extra_bytes only when it is
        nonzero.
        (ccl_driver): Initialize extra_bytes to ccl->eight_bit_control.
        (setup_ccl_program): Initialize ccl->eight_bit_control to zero.

Index: coding.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/coding.c,v
retrieving revision 1.222.4.16
retrieving revision 1.222.4.17
diff -u -c -r1.222.4.16 -r1.222.4.17
cvs server: conflicting specifications of output style
*** coding.c    7 Mar 2003 04:37:21 -0000       1.222.4.16
--- coding.c    30 May 2003 08:12:19 -0000      1.222.4.17
***************
*** 4477,4483 ****
--- 4477,4486 ----
            if (ccl->eol_type ==CODING_EOL_UNDECIDED)
              ccl->eol_type = CODING_EOL_LF;
            ccl->cr_consumed = coding->spec.ccl.cr_carryover;
+           ccl->eight_bit_control = coding->dst_multibyte;
          }
+       else
+         ccl->eight_bit_control = 1;
        ccl->multibyte = coding->src_multibyte;
        if (coding->spec.ccl.eight_bit_carryover[0] != 0)
          {
Index: ccl.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/ccl.h,v
retrieving revision 1.16
retrieving revision 1.16.12.1
diff -u -c -r1.16 -r1.16.12.1
cvs server: conflicting specifications of output style
*** ccl.h       27 Feb 2001 03:29:08 -0000      1.16
--- ccl.h       30 May 2003 08:12:38 -0000      1.16.12.1
***************
*** 65,72 ****
                                     system.  */
    int suppress_error;           /* If nonzero, don't insert error
                                     message in the output.  */
!   int eight_bit_control;        /* Set to nonzero if CCL_WRITE_CHAR
!                                    writes eight-bit-control char.  */
  };

  /* This data type is used for the spec field of the structure
--- 65,75 ----
                                     system.  */
    int suppress_error;           /* If nonzero, don't insert error
                                     message in the output.  */
!   int eight_bit_control;        /* If nonzero, ccl_driver counts all
!                                    eight-bit-control bytes written by
!                                    CCL_WRITE_CHAR.  After execution,
!                                    if no such byte is written, set
!                                    this value to zero.  */
  };

  /* This data type is used for the spec field of the structure
Index: ccl.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/ccl.c,v
retrieving revision 1.71
retrieving revision 1.71.12.1
diff -u -c -r1.71 -r1.71.12.1
cvs server: conflicting specifications of output style
*** ccl.c       17 May 2001 09:09:14 -0000      1.71
--- ccl.c       30 May 2003 08:13:01 -0000      1.71.12.1
***************
*** 717,723 ****
      if (bytes == 1)                                                     \
        {                                                                 \
          *dst++ = (ch);                                                  \
!         if ((ch) >= 0x80 && (ch) < 0xA0)                                \
            /* We may have to convert this eight-bit char to              \
               multibyte form later.  */                                  \
            extra_bytes++;                                                \
--- 717,723 ----
      if (bytes == 1)                                                     \
        {                                                                 \
          *dst++ = (ch);                                                  \
!         if (extra_bytes && (ch) >= 0x80 && (ch) < 0xA0)                 \
            /* We may have to convert this eight-bit char to              \
               multibyte form later.  */                                  \
            extra_bytes++;                                                \
***************
*** 731,736 ****
--- 731,737 ----
        CCL_SUSPEND (CCL_STAT_SUSPEND_BY_DST);                            \
    } while (0)

+
  /* Encode one character CH to multibyte form and write to the current
     output buffer.  The output bytes always forms a valid multibyte
     sequence.  */
***************
*** 874,880 ****
       each of them will be converted to multibyte form of 2-byte
       sequence.  For that conversion, we remember how many more bytes
       we must keep in DESTINATION in this variable.  */
!   int extra_bytes = 0;

    if (ic >= ccl->eof_ic)
      ic = CCL_HEADER_MAIN;
--- 875,881 ----
       each of them will be converted to multibyte form of 2-byte
       sequence.  For that conversion, we remember how many more bytes
       we must keep in DESTINATION in this variable.  */
!   int extra_bytes = ccl->eight_bit_control;

    if (ic >= ccl->eof_ic)
      ic = CCL_HEADER_MAIN;
***************
*** 1849,1855 ****
    ccl->ic = ic;
    ccl->stack_idx = stack_idx;
    ccl->prog = ccl_prog;
!   ccl->eight_bit_control = (extra_bytes > 0);
    if (consumed) *consumed = src - source;
    return (dst ? dst - destination : 0);
--- 1850,1856 ----
    ccl->ic = ic;
    ccl->stack_idx = stack_idx;
    ccl->prog = ccl_prog;
!   ccl->eight_bit_control = (extra_bytes > 1);
    if (consumed) *consumed = src - source;
    return (dst ? dst - destination : 0);
***************
*** 2004,2009 ****
--- 2005,2011 ----
    ccl->stack_idx = 0;
    ccl->eol_type = CODING_EOL_LF;
    ccl->suppress_error = 0;
+   ccl->eight_bit_control = 0;
    return 0;
  }

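Editorial sketch, not part of the patch: the ccl.c change makes extra_bytes
do double duty.  It starts from ccl->eight_bit_control, is incremented
only while nonzero, and the result is compared against 1 rather than 0.
The C fragment below models just that counting idiom.  The function name
saw_eight_bit_control and its signature are hypothetical, and the real
ccl_driver also writes bytes to a destination buffer, which is left out
here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the counting logic only.  COUNT_CONTROL plays
   the role of ccl->eight_bit_control on entry; the return value is what
   the patched code stores back into that field on exit.  */
static int
saw_eight_bit_control (const char *src, size_t len, int count_control)
{
  /* Starting at 1 lets "nonzero" double as the "counting enabled" flag.  */
  int extra_bytes = count_control ? 1 : 0;
  size_t i;

  assert (src != NULL);
  for (i = 0; i < len; i++)
    {
      unsigned char ch = (unsigned char) src[i];

      /* Same test as the patched CCL_WRITE_CHAR: count only when enabled.  */
      if (extra_bytes && ch >= 0x80 && ch < 0xA0)
        extra_bytes++;
    }
  /* Greater than 1 means at least one such byte was really seen.  */
  return extra_bytes > 1;
}
```

With counting disabled (count_control of 0) the function reports 0 even if
the input contains bytes in the 0x80..0x9F range, which matches the new
"extra_bytes &&" guard in the macro.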
* Re: Process output truncation when using UTF-8
From: Milan Zamazal @ 2003-06-03 14:59 UTC
Cc: bug-gnu-emacs

>>>>> "KH" == Kenichi Handa <handa@m17n.org> writes:

    KH> In article <874r3f1ocb.fsf@zamazal.org>, Milan Zamazal
    KH> <pdm@zamazal.org> writes:
    >> When I start a process, set its input and output encoding to
    >> utf-8 and send a long string to it through process-send-string,
    >> usually some final part of the sent string is missing on the
    >> recipient's side.

    KH> Thank you for the report.  I've just installed the attached
    KH> change in RC and HEAD.  It should be applicable also to Emacs
    KH> 21.3.

Thank you.  The case I've reported works well in Emacs 21.3, but this
one still truncates the output:

[-- Attachment #2: utf-8-bug.el --]
[-- Type: application/emacs-lisp, Size: 330 bytes --]

The difference is that the process encoding is set to utf-8-dos now
instead of utf-8.

Regards,

Milan Zamazal

-- 
Here is my advice, don't try to program the bleeding edge for the
general populace unless you really, really, really like migraines.
                                                   Neal H. Walfield

* Re: Process output truncation when using UTF-8
From: Kenichi Handa @ 2003-06-04 12:48 UTC
Cc: bug-gnu-emacs

In article <87el2bi4uy.fsf@zamazal.org>, Milan Zamazal <pdm@zamazal.org> writes:
> Thank you.  The case I've reported works well in Emacs 21.3, but this
> one still truncates the output:
[...]
> The difference is that the process encoding is set to utf-8-dos now
> instead of utf-8.

Thank you for testing it.  Please try this additional patch.

---
Ken'ichi HANDA
handa@m17n.org

Index: coding.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/coding.c,v
retrieving revision 1.285
retrieving revision 1.286
diff -u -c -r1.285 -r1.286
cvs server: conflicting specifications of output style
*** coding.c    2 Jun 2003 18:49:29 -0000       1.285
--- coding.c    4 Jun 2003 12:43:09 -0000       1.286
***************
*** 4505,4511 ****
    int magnification;

    if (coding->type == coding_type_ccl)
!     magnification = coding->spec.ccl.encoder.buf_magnification;
    else if (CODING_REQUIRE_ENCODING (coding))
      magnification = 3;
    else
--- 4505,4515 ----
    int magnification;

    if (coding->type == coding_type_ccl)
!     {
!       magnification = coding->spec.ccl.encoder.buf_magnification;
!       if (coding->eol_type == CODING_EOL_CRLF)
!         magnification *= 2;
!     }
    else if (CODING_REQUIRE_ENCODING (coding))
      magnification = 3;
    else

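Editorial sketch, not part of the patch: the change above doubles the
buffer magnification when the end-of-line type is CRLF.  A plausible
reading (an assumption, the message does not spell it out) is that every
LF can grow into the two bytes CR LF, so the worst case needs twice the
space.  The helper names below are hypothetical and only illustrate that
arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: worst-case output size for SRC_BYTES input bytes,
   given a per-byte MAGNIFICATION factor, doubled for CRLF end-of-line
   conversion as in the patch.  */
static size_t
encoding_buffer_size (size_t src_bytes, int magnification, int eol_is_crlf)
{
  assert (magnification > 0);
  if (eol_is_crlf)
    magnification *= 2;         /* each LF may become CR LF */
  return src_bytes * (size_t) magnification;
}

/* Number of bytes actually produced when every LF in SRC becomes CR LF.  */
static size_t
crlf_encoded_length (const char *src, size_t len)
{
  size_t out = 0;
  size_t i;

  for (i = 0; i < len; i++)
    out += (src[i] == '\n') ? 2 : 1;
  return out;
}
```

For an input made only of newlines, the undoubled size is too small and
the doubled size is exactly enough, which is consistent with the
truncation disappearing once the patch is applied.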
* Re: Process output truncation when using UTF-8
From: Milan Zamazal @ 2003-06-13 10:38 UTC
Cc: bug-gnu-emacs

>>>>> "KH" == Kenichi Handa <handa@m17n.org> writes:

    KH> In article <87el2bi4uy.fsf@zamazal.org>, Milan Zamazal
    KH> <pdm@zamazal.org> writes:
    >> Thank you.  The case I've reported works well in Emacs 21.3, but
    >> this one still truncates the output:
    KH> [...]
    >> The difference is that the process encoding is set to utf-8-dos
    >> now instead of utf-8.

    KH> Thank you for testing it.  Please try this additional patch.

Thank you.  The file is no longer truncated, but there's an extra blank
line inserted after each line of the text.  The unchanged test case
reproduces the problem:

[-- Attachment #2: utf-8-bug.el --]
[-- Type: application/emacs-lisp, Size: 330 bytes --]

It seems LF-LF is inserted at the ends of lines, instead of CR-LF.

If the process encoding is set to `utf-8' instead of `utf-8-dos',
everything is fine and there are no extra blank lines.

The problem was already present before I applied the additional patch.
I think it wasn't present in untouched Emacs 21.3, but I'm not sure.

Regards,

Milan Zamazal

-- 
The seeker after truth should be humbler than the dust.  The world crushes
the dust under its feet, but the seeker after truth should so humble himself
that even the dust could crush him.                         -- M. K. Gandhi

* Re: Process output truncation when using UTF-8
From: Kenichi Handa @ 2003-06-19 4:24 UTC
Cc: bug-gnu-emacs

>>>>>> "KH" == Kenichi Handa <handa@m17n.org> writes:

KH> In article <87el2bi4uy.fsf@zamazal.org>, Milan Zamazal
KH> <pdm@zamazal.org> writes:
>>> Thank you.  The case I've reported works well in Emacs 21.3, but
>>> this one still truncates the output:
KH> [...]
>>> The difference is that the process encoding is set to utf-8-dos
>>> now instead of utf-8.

KH> Thank you for testing it.  Please try this additional patch.

> Thank you.  The file is no longer truncated, but there's an extra blank
> line inserted after each line of the text.  The unchanged test case
> reproduces the problem:

> [2 utf-8-bug.el <application/emacs-lisp (7bit)>]

> [3 <text/plain (7bit)>]
> It seems LF-LF is inserted at the ends of lines, instead of CR-LF.

I can reproduce this bug, but it's not an error in the encoding
routine.  I can also reproduce it with this test case:

(set-language-environment "English")
(let ((process (start-process "foo" nil "tee" "/tmp/foo")))
  (set-process-coding-system process 'raw-text-dos 'raw-text-dos)
  (help-with-tutorial)
  (set-buffer "TUTORIAL")
  (process-send-string
   process (buffer-substring-no-properties (point-min) (point-max)))
  (process-send-eof process))

The encoding routine correctly produces CR LF, but somehow
CR is converted to LF (perhaps by pty because the following
test works correctly).

(set-language-environment "English")
(let* ((process-connection-type nil)
       (process (start-process "foo" nil "tee" "/tmp/foo")))
  (set-process-coding-system process 'raw-text-dos 'raw-text-dos)
  (help-with-tutorial)
  (set-buffer "TUTORIAL")
  (process-send-string
   process (buffer-substring-no-properties (point-min) (point-max)))
  (process-send-eof process))

---
Ken'ichi HANDA
handa@m17n.org

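Editorial sketch, not part of the message: binding process-connection-type
to nil makes Emacs talk to the subprocess through a pipe instead of a pty.
A pipe is a plain byte channel with no terminal line discipline, so CR
bytes should pass through untouched.  The C helper below (its name is
hypothetical) checks that property for a small buffer on a POSIX system.
It does not exercise a pty, so it shows only the half of the comparison
that works correctly.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: push LEN bytes through a pipe and report whether
   they come back byte for byte.  Returns 1 if identical, 0 if not, and
   -1 on a system error.  LEN is kept small so the write cannot block.  */
static int
pipe_preserves_bytes (const char *data, size_t len)
{
  char buf[64];
  int fds[2];
  ssize_t n;

  assert (len > 0 && len <= sizeof buf);
  if (pipe (fds) != 0)
    return -1;
  if (write (fds[1], data, len) != (ssize_t) len)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  close (fds[1]);               /* so the reader sees EOF after the data */
  n = read (fds[0], buf, sizeof buf);
  close (fds[0]);
  if (n < 0)
    return -1;
  return (size_t) n == len && memcmp (buf, data, len) == 0;
}
```

Calling pipe_preserves_bytes with a string that contains CR LF pairs is
expected to return 1, in line with the second test case producing a
correct /tmp/foo.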
* Re: Process output truncation when using UTF-8
From: Richard Stallman @ 2003-06-20 3:15 UTC
Cc: bug-gnu-emacs

    The encoding routine correctly produces CR LF, but somehow
    CR is converted to LF (perhaps by pty because the following
    test works correctly).

Is it the pty and tty mechanism that does the conversion, perhaps?

* Re: Process output truncation when using UTF-8
From: Kenichi Handa @ 2003-06-20 4:20 UTC
Cc: bug-gnu-emacs

In article <E19TCNP-0002oB-6h@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     The encoding routine correctly produces CR LF, but somehow
>     CR is converted to LF (perhaps by pty because the following
>     test works correctly).

> Is it the pty and tty mechanism that does the conversion, perhaps?

Perhaps.  I found this code in child_setup_tty (in sysdep.c).

#if 0
  /* This causes bugs in (for instance) telnet to certain sites.  */
  s.main.c_iflag &= ~ICRNL;     /* Disable map of CR to NL on input */
#ifdef INLCR  /* Just being cautious, since I can't check how
                 widespread INLCR is--rms.  */
  s.main.c_iflag &= ~INLCR;     /* Disable map of NL to CR on input */
#endif
#endif

I tried changing the first line above to "#if 1".  The newly built
Emacs doesn't do the CR->LF conversion, and thus works well with the
test case (utf-8-bug.el).

---
Ken'ichi HANDA
handa@m17n.org

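Editorial sketch, not part of the message: the disabled block quoted from
child_setup_tty clears two POSIX termios input flags.  ICRNL maps CR to NL
on input and INLCR maps NL to CR on input.  The fragment below applies the
same two bit operations to a bare flag word so the effect can be checked
without opening a terminal.  The function name without_cr_nl_mapping is
hypothetical.

```c
#include <assert.h>
#include <termios.h>

/* Hypothetical helper: return IFLAG with the CR/NL input translations
   switched off and every other bit left alone.  */
static tcflag_t
without_cr_nl_mapping (tcflag_t iflag)
{
  iflag &= ~(tcflag_t) ICRNL;   /* disable map of CR to NL on input */
#ifdef INLCR
  iflag &= ~(tcflag_t) INLCR;   /* disable map of NL to CR on input */
#endif
  assert ((iflag & ICRNL) == 0);
  return iflag;
}
```

In a real program the flag word would come from the c_iflag member of a
struct termios filled in by tcgetattr, and the modified structure would be
written back with tcsetattr.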
* Re: Process output truncation when using UTF-8
From: Richard Stallman @ 2003-06-21 4:56 UTC
Cc: bug-gnu-emacs

    #if 0
      /* This causes bugs in (for instance) telnet to certain sites.  */
      s.main.c_iflag &= ~ICRNL; /* Disable map of CR to NL on input */
    #ifdef INLCR  /* Just being cautious, since I can't check how
                     widespread INLCR is--rms.  */
      s.main.c_iflag &= ~INLCR; /* Disable map of NL to CR on input */
    #endif
    #endif

    I tried changing the first line above to "#if 1".  The newly built
    Emacs doesn't do the CR->LF conversion, and thus works well with the
    test case (utf-8-bug.el).

I don't know if that problem with (for instance) telnet still exists.
It might not--we could change this and see if anyone loses.

Is there a reason to use a pty in this case?  If you set
process-connection-type to nil for this application, does that solve
the problem?
