unofficial mirror of bug-gnu-emacs@gnu.org 
* Process output truncation when using UTF-8
@ 2003-05-28  8:12 Milan Zamazal
  2003-05-30  8:16 ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Milan Zamazal @ 2003-05-28  8:12 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

In GNU Emacs 21.3.2 (i386-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2003-04-24 on raven, modified by Debian
configured using `configure  i386-linux --prefix=/usr --sharedstatedir=/var/lib --libexecdir=/usr/lib --localstatedir=/var/lib --infodir=/usr/share/info --mandir=/usr/share/man --with-pop=yes --with-x=yes --with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: cs_CZ.ISO8859-2
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: C
  locale-coding-system: iso-8859-2
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

When I start a process, set its input and output encoding to utf-8 and
send a long string to it through process-send-string, usually some final
part of the sent string is missing on the recipient's side.

You can try to reproduce the problem by evaluating the following Elisp
file:


[-- Attachment #2: utf-8-bug.el --]
[-- Type: application/emacs-lisp, Size: 322 bytes --]

[-- Attachment #3: Type: text/plain, Size: 418 bytes --]


It should send the Czech Emacs tutorial to a process that stores it in
the /tmp/foo file.  On my system, roughly the last 900 characters of the
tutorial are missing from /tmp/foo.  This doesn't happen when the
process encoding is set to iso-8859-2 instead.

I can reproduce the problem on my Debian GNU/Linux i386 system with both
Emacs 21.3 as packaged by Debian and a current Emacs CVS snapshot.

[-- Attachment #4: Type: text/plain, Size: 148 bytes --]

_______________________________________________
Bug-gnu-emacs mailing list
Bug-gnu-emacs@gnu.org
http://mail.gnu.org/mailman/listinfo/bug-gnu-emacs

* Re: Process output truncation when using UTF-8
  2003-05-28  8:12 Process output truncation when using UTF-8 Milan Zamazal
@ 2003-05-30  8:16 ` Kenichi Handa
  2003-06-03 14:59   ` Milan Zamazal
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2003-05-30  8:16 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <874r3f1ocb.fsf@zamazal.org>, Milan Zamazal <pdm@zamazal.org> writes:
> When I start a process, set its input and output encoding
> to utf-8 and send a long string to it through
> process-send-string, usually some final part of the sent
> string is missing on the recipient's side.

Thank you for the report.  I've just installed the attached
change in RC and HEAD.   It should be applicable also to
Emacs 21.3.

---
Ken'ichi HANDA
handa@m17n.org

2003-05-30  Kenichi Handa  <handa@m17n.org>

	* coding.c (ccl_coding_driver): Set ccl->eight_bit_control
	properly before calling ccl_driver.

	* ccl.h (struct ccl_program) <eight_bit_control>: Comment fixed.

	* ccl.c (CCL_WRITE_CHAR): Increment extra_bytes only when it is
	nonzero.
	(ccl_driver): Initialize extra_bytes to ccl->eight_bit_control.
	(setup_ccl_program): Initialize ccl->eight_bit_control to zero.

Index: coding.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/coding.c,v
retrieving revision 1.222.4.16
retrieving revision 1.222.4.17
diff -u -c -r1.222.4.16 -r1.222.4.17
cvs server: conflicting specifications of output style
*** coding.c	7 Mar 2003 04:37:21 -0000	1.222.4.16
--- coding.c	30 May 2003 08:12:19 -0000	1.222.4.17
***************
*** 4477,4483 ****
--- 4477,4486 ----
        if (ccl->eol_type ==CODING_EOL_UNDECIDED)
  	ccl->eol_type = CODING_EOL_LF;
        ccl->cr_consumed = coding->spec.ccl.cr_carryover;
+       ccl->eight_bit_control = coding->dst_multibyte;
      }
+   else
+     ccl->eight_bit_control = 1;
    ccl->multibyte = coding->src_multibyte;
    if (coding->spec.ccl.eight_bit_carryover[0] != 0)
      {
Index: ccl.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/ccl.h,v
retrieving revision 1.16
retrieving revision 1.16.12.1
diff -u -c -r1.16 -r1.16.12.1
cvs server: conflicting specifications of output style
*** ccl.h	27 Feb 2001 03:29:08 -0000	1.16
--- ccl.h	30 May 2003 08:12:38 -0000	1.16.12.1
***************
*** 65,72 ****
  				   system.  */
    int suppress_error;		/* If nonzero, don't insert error
  				   message in the output.  */
!   int eight_bit_control;	/* Set to nonzero if CCL_WRITE_CHAR
! 				   writes eight-bit-control char.  */
  };
  
  /* This data type is used for the spec field of the structure
--- 65,75 ----
  				   system.  */
    int suppress_error;		/* If nonzero, don't insert error
  				   message in the output.  */
!   int eight_bit_control;	/* If nonzero, ccl_driver counts all
! 				   eight-bit-control bytes written by
! 				   CCL_WRITE_CHAR.  After execution,
! 				   if no such byte is written, set
! 				   this value to zero.  */
  };
  
  /* This data type is used for the spec field of the structure
Index: ccl.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/ccl.c,v
retrieving revision 1.71
retrieving revision 1.71.12.1
diff -u -c -r1.71 -r1.71.12.1
cvs server: conflicting specifications of output style
*** ccl.c	17 May 2001 09:09:14 -0000	1.71
--- ccl.c	30 May 2003 08:13:01 -0000	1.71.12.1
***************
*** 717,723 ****
  	if (bytes == 1)							\
  	  {								\
  	    *dst++ = (ch);						\
! 	    if ((ch) >= 0x80 && (ch) < 0xA0)				\
  	      /* We may have to convert this eight-bit char to		\
  		 multibyte form later.  */				\
  	      extra_bytes++;						\
--- 717,723 ----
  	if (bytes == 1)							\
  	  {								\
  	    *dst++ = (ch);						\
! 	    if (extra_bytes && (ch) >= 0x80 && (ch) < 0xA0)		\
  	      /* We may have to convert this eight-bit char to		\
  		 multibyte form later.  */				\
  	      extra_bytes++;						\
***************
*** 731,736 ****
--- 731,737 ----
        CCL_SUSPEND (CCL_STAT_SUSPEND_BY_DST);				\
    } while (0)
  
+ 
  /* Encode one character CH to multibyte form and write to the current
     output buffer.  The output bytes always forms a valid multibyte
     sequence.  */
***************
*** 874,880 ****
       each of them will be converted to multibyte form of 2-byte
       sequence.  For that conversion, we remember how many more bytes
       we must keep in DESTINATION in this variable.  */
!   int extra_bytes = 0;
  
    if (ic >= ccl->eof_ic)
      ic = CCL_HEADER_MAIN;
--- 875,881 ----
       each of them will be converted to multibyte form of 2-byte
       sequence.  For that conversion, we remember how many more bytes
       we must keep in DESTINATION in this variable.  */
!   int extra_bytes = ccl->eight_bit_control;
  
    if (ic >= ccl->eof_ic)
      ic = CCL_HEADER_MAIN;
***************
*** 1849,1855 ****
    ccl->ic = ic;
    ccl->stack_idx = stack_idx;
    ccl->prog = ccl_prog;
!   ccl->eight_bit_control = (extra_bytes > 0);
    if (consumed)
      *consumed = src - source;
    return (dst ? dst - destination : 0);
--- 1850,1856 ----
    ccl->ic = ic;
    ccl->stack_idx = stack_idx;
    ccl->prog = ccl_prog;
!   ccl->eight_bit_control = (extra_bytes > 1);
    if (consumed)
      *consumed = src - source;
    return (dst ? dst - destination : 0);
***************
*** 2004,2009 ****
--- 2005,2011 ----
    ccl->stack_idx = 0;
    ccl->eol_type = CODING_EOL_LF;
    ccl->suppress_error = 0;
+   ccl->eight_bit_control = 0;
    return 0;
  }
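
The counting scheme that the ccl.c hunks above introduce can be sketched
in isolation.  This is a simplified, hypothetical stand-in (the names
count_result and count_eight_bit_control are not from Emacs), assuming
only what the patch and its comments state: bytes in the range
0x80..0x9F may later need a 2-byte multibyte form, the counter starts
from the caller's flag, and it is incremented only when that flag was
nonzero.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of Emacs.  */
struct count_result
{
  int extra_bytes;		/* final value of the counter */
  int eight_bit_control;	/* nonzero if any 0x80..0x9F byte was counted */
};

/* Walk LEN bytes of SRC the way the patched CCL_WRITE_CHAR counts:
   the counter starts at 1 when ENABLED is nonzero (mirroring
   `extra_bytes = ccl->eight_bit_control'), and is bumped only while
   it is nonzero.  */
static struct count_result
count_eight_bit_control (const unsigned char *src, size_t len, int enabled)
{
  struct count_result r;
  int extra_bytes = enabled ? 1 : 0;
  size_t i;

  for (i = 0; i < len; i++)
    if (extra_bytes && src[i] >= 0x80 && src[i] < 0xA0)
      extra_bytes++;

  r.extra_bytes = extra_bytes;
  /* Mirrors `ccl->eight_bit_control = (extra_bytes > 1)': the initial
     1 is only a marker, so a real hit means the counter exceeded 1.  */
  r.eight_bit_control = (extra_bytes > 1);
  return r;
}
```

With the flag off, the counter stays at zero and no space is reserved
for a later conversion; with the flag on, the caller learns both
whether any such byte was written and how many.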

* Re: Process output truncation when using UTF-8
  2003-05-30  8:16 ` Kenichi Handa
@ 2003-06-03 14:59   ` Milan Zamazal
  2003-06-04 12:48     ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Milan Zamazal @ 2003-06-03 14:59 UTC (permalink / raw)
  Cc: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

>>>>> "KH" == Kenichi Handa <handa@m17n.org> writes:

    KH> In article <874r3f1ocb.fsf@zamazal.org>, Milan Zamazal
    KH> <pdm@zamazal.org> writes:
    >> When I start a process, set its input and output encoding to
    >> utf-8 and send a long string to it through process-send-string,
    >> usually some final part of the sent string is missing on the
    >> recipient's side.

    KH> Thank you for the report.  I've just installed the attached
    KH> change in RC and HEAD.  It should be applicable also to Emacs
    KH> 21.3.

Thank you.  The case I've reported works well in Emacs 21.3, but this
one still truncates the output:


[-- Attachment #2: utf-8-bug.el --]
[-- Type: application/emacs-lisp, Size: 330 bytes --]

[-- Attachment #3: Type: text/plain, Size: 276 bytes --]


The difference is that the process encoding is set to utf-8-dos now
instead of utf-8.

Regards,

Milan Zamazal

-- 
Here is my advice, don't try to program the bleeding edge for the
general populace unless you really, really, really like migraines.
						   Neal H. Walfield

* Re: Process output truncation when using UTF-8
  2003-06-03 14:59   ` Milan Zamazal
@ 2003-06-04 12:48     ` Kenichi Handa
  2003-06-13 10:38       ` Milan Zamazal
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2003-06-04 12:48 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <87el2bi4uy.fsf@zamazal.org>, Milan Zamazal <pdm@zamazal.org> writes:
> Thank you.  The case I've reported works well in Emacs 21.3, but this
> one still truncates the output:
[...]
> The difference is that the process encoding is set to utf-8-dos now
> instead of utf-8.

Thank you for testing it.  Please try this additional patch.

---
Ken'ichi HANDA
handa@m17n.org

Index: coding.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/coding.c,v
retrieving revision 1.285
retrieving revision 1.286
diff -u -c -r1.285 -r1.286
cvs server: conflicting specifications of output style
*** coding.c	2 Jun 2003 18:49:29 -0000	1.285
--- coding.c	4 Jun 2003 12:43:09 -0000	1.286
***************
*** 4505,4511 ****
    int magnification;
  
    if (coding->type == coding_type_ccl)
!     magnification = coding->spec.ccl.encoder.buf_magnification;
    else if (CODING_REQUIRE_ENCODING (coding))
      magnification = 3;
    else
--- 4505,4515 ----
    int magnification;
  
    if (coding->type == coding_type_ccl)
!     {
!       magnification = coding->spec.ccl.encoder.buf_magnification;
!       if (coding->eol_type == CODING_EOL_CRLF)
! 	magnification *= 2;
!     }
    else if (CODING_REQUIRE_ENCODING (coding))
      magnification = 3;
    else
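
Why a CRLF end-of-line type calls for a larger buffer estimate can be
seen with a small sketch.  The helper below is hypothetical and is not
Emacs code; it only assumes that every LF in the source becomes CR LF in
the output.  A text made up entirely of newlines then doubles in size,
which is presumably why the patch multiplies magnification by 2.

```c
#include <stddef.h>

/* Hypothetical helper, not Emacs code: copy SRC to DST, writing CR LF
   for every LF.  Returns the number of bytes written, or (size_t) -1
   if DST_SIZE is too small to hold the result.  */
static size_t
encode_eol_crlf (const char *src, size_t len, char *dst, size_t dst_size)
{
  size_t out = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      size_t need = (src[i] == '\n') ? 2 : 1;

      if (dst_size - out < need)
	return (size_t) -1;	/* destination buffer too small */
      if (src[i] == '\n')
	dst[out++] = '\r';
      dst[out++] = src[i];
    }
  return out;
}
```

A destination sized for the unexpanded text is enough only when the
input has no newlines; sizing it at twice the source length covers the
worst case of this conversion.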

* Re: Process output truncation when using UTF-8
  2003-06-04 12:48     ` Kenichi Handa
@ 2003-06-13 10:38       ` Milan Zamazal
  2003-06-19  4:24         ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Milan Zamazal @ 2003-06-13 10:38 UTC (permalink / raw)
  Cc: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 618 bytes --]

>>>>> "KH" == Kenichi Handa <handa@m17n.org> writes:

    KH> In article <87el2bi4uy.fsf@zamazal.org>, Milan Zamazal
    KH> <pdm@zamazal.org> writes:
    >> Thank you.  The case I've reported works well in Emacs 21.3, but
    >> this one still truncates the output:

    KH> [...]
    >> The difference is that the process encoding is set to utf-8-dos
    >> now instead of utf-8.

    KH> Thank you for testing it.  Please try this additional patch.

Thank you.  The file is no longer truncated, but there's an extra blank
line inserted after each line of the text.  The unchanged test case
reproduces the problem:


[-- Attachment #2: utf-8-bug.el --]
[-- Type: application/emacs-lisp, Size: 330 bytes --]

[-- Attachment #3: Type: text/plain, Size: 594 bytes --]


It seems LF-LF is inserted at the ends of lines, instead of CR-LF.

If the process encoding is set to `utf-8' instead of `utf-8-dos',
everything is fine and there are no extra blank lines.  The problem was
already present before applying the additional patch.  I think it wasn't
present in untouched Emacs 21.3, but I'm not sure.

Regards,

Milan Zamazal

-- 
The seeker after truth should be humbler than the dust.  The world crushes the
dust under its feet, but the seeker after truth should so humble himself that
even the dust could crush him.                                 -- M. K. Gandhi

* Re: Process output truncation when using UTF-8
  2003-06-13 10:38       ` Milan Zamazal
@ 2003-06-19  4:24         ` Kenichi Handa
  2003-06-20  3:15           ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2003-06-19  4:24 UTC (permalink / raw)
  Cc: bug-gnu-emacs

>>>>>>  "KH" == Kenichi Handa <handa@m17n.org> writes:

KH>  In article <87el2bi4uy.fsf@zamazal.org>, Milan Zamazal
KH>  <pdm@zamazal.org> writes:
>>>  Thank you.  The case I've reported works well in Emacs 21.3, but
>>>  this one still truncates the output:

KH>  [...]
>>>  The difference is that the process encoding is set to utf-8-dos
>>>  now instead of utf-8.

KH>  Thank you for testing it.  Please try this additional patch.

> Thank you.  The file is no longer truncated, but there's an extra blank
> line inserted after each line of the text.  The unchanged test case
> reproduces the problem:

> [2 utf-8-bug.el <application/emacs-lisp (7bit)>]

> [3  <text/plain (7bit)>]

> It seems LF-LF is inserted at the ends of lines, instead of CR-LF.

I can reproduce this bug, but it's not an error in the encoding
routine.  I can also reproduce it with this test case:

(set-language-environment "English")
(let ((process (start-process "foo" nil "tee" "/tmp/foo")))
  (set-process-coding-system process 'raw-text-dos 'raw-text-dos)
  (help-with-tutorial)
  (set-buffer "TUTORIAL")
  (process-send-string process (buffer-substring-no-properties
				(point-min) (point-max)))
  (process-send-eof process))

The encoding routine correctly produces CR LF, but somehow
CR is converted to LF (perhaps by the pty, because the
following test, which binds process-connection-type to nil,
works correctly).

(set-language-environment "English")
(let* ((process-connection-type nil)
       (process (start-process "foo" nil "tee" "/tmp/foo")))
  (set-process-coding-system process 'raw-text-dos 'raw-text-dos)
  (help-with-tutorial)
  (set-buffer "TUTORIAL")
  (process-send-string process (buffer-substring-no-properties
				(point-min) (point-max)))
  (process-send-eof process))
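
That a pipe hands bytes through without any end-of-line translation can
be checked outside Emacs.  The sketch below is a hypothetical helper,
not taken from Emacs; it assumes a POSIX system and a payload small
enough (below PIPE_BUF) that the write cannot block.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write LEN bytes of IN into a fresh pipe and
   read them back into OUT.  Returns the number of bytes read, or -1
   on error.  */
static ssize_t
roundtrip_through_pipe (const char *in, size_t len, char *out, size_t out_size)
{
  int fds[2];
  ssize_t n;

  if (pipe (fds) != 0)
    return -1;
  if (write (fds[1], in, len) != (ssize_t) len)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  close (fds[1]);		/* signal end of data to the reader */
  n = read (fds[0], out, out_size);
  close (fds[0]);
  return n;
}
```

Sending "a\r\nb\r\n" through this helper returns the same six bytes,
CR included, since no terminal line discipline sits between the two
ends of a pipe.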

---
Ken'ichi HANDA
handa@m17n.org

* Re: Process output truncation when using UTF-8
  2003-06-19  4:24         ` Kenichi Handa
@ 2003-06-20  3:15           ` Richard Stallman
  2003-06-20  4:20             ` Kenichi Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2003-06-20  3:15 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    The encoding routine correctly produces CR LF, but somehow
    CR is converted to LF (perhaps by pty because the following
    test works correctly).

Is it the pty and tty mechanism that does the conversion, perhaps?

* Re: Process output truncation when using UTF-8
  2003-06-20  3:15           ` Richard Stallman
@ 2003-06-20  4:20             ` Kenichi Handa
  2003-06-21  4:56               ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2003-06-20  4:20 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <E19TCNP-0002oB-6h@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     The encoding routine correctly produces CR LF, but somehow
>     CR is converted to LF (perhaps by pty because the following
>     test works correctly).

> Is it the pty and tty mechanism that does the conversion, perhaps?

Perhaps.  I found this code in child_setup_tty (in
sysdep.c).

#if 0  /* This causes bugs in (for instance) telnet to certain sites.  */
  s.main.c_iflag &= ~ICRNL;	/* Disable map of CR to NL on input */
#ifdef INLCR  /* Just being cautious, since I can't check how
		 widespread INLCR is--rms.  */
  s.main.c_iflag &= ~INLCR;	/* Disable map of NL to CR on input */
#endif
#endif

I tried changing the first line above to "#if 1".  The
newly built Emacs doesn't do the CR->LF conversion, and thus
works well with the test case (utf-8-bug.el).
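
What the disabled block does can be shown on a plain struct termios.
This is a minimal sketch with a hypothetical helper name; the real code
operates on s.main inside child_setup_tty, and this version assumes only
the standard POSIX <termios.h> flags.

```c
#include <termios.h>

/* Hypothetical helper mirroring the disabled block: clear the input
   flags that translate between CR and NL, leaving all other flags
   untouched.  */
static void
disable_cr_nl_mapping (struct termios *t)
{
  t->c_iflag &= ~ICRNL;		/* disable map of CR to NL on input */
#ifdef INLCR
  t->c_iflag &= ~INLCR;		/* disable map of NL to CR on input */
#endif
}
```

In a standalone program the structure would normally be filled in with
tcgetattr and applied with tcsetattr; the helper itself only edits the
flag word.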

---
Ken'ichi HANDA
handa@m17n.org

* Re: Process output truncation when using UTF-8
  2003-06-20  4:20             ` Kenichi Handa
@ 2003-06-21  4:56               ` Richard Stallman
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Stallman @ 2003-06-21  4:56 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    #if 0  /* This causes bugs in (for instance) telnet to certain sites.  */
      s.main.c_iflag &= ~ICRNL;	/* Disable map of CR to NL on input */
    #ifdef INLCR  /* Just being cautious, since I can't check how
		     widespread INLCR is--rms.  */
      s.main.c_iflag &= ~INLCR;	/* Disable map of NL to CR on input */
    #endif
    #endif

    I tried changing the first line above to "#if 1".  The
    newly built Emacs doesn't do the CR->LF conversion, and thus
    works well with the test case (utf-8-bug.el).

I don't know if that problem with (for instance) telnet still exists.
It might not--we could change this and see if anyone loses.

Is there a reason to use a pty in this case?  If you set
process-connection-type to nil for this application,
does that solve the problem?

end of thread, other threads:[~2003-06-21  4:56 UTC | newest]

Thread overview: 9+ messages
2003-05-28  8:12 Process output truncation when using UTF-8 Milan Zamazal
2003-05-30  8:16 ` Kenichi Handa
2003-06-03 14:59   ` Milan Zamazal
2003-06-04 12:48     ` Kenichi Handa
2003-06-13 10:38       ` Milan Zamazal
2003-06-19  4:24         ` Kenichi Handa
2003-06-20  3:15           ` Richard Stallman
2003-06-20  4:20             ` Kenichi Handa
2003-06-21  4:56               ` Richard Stallman

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
