unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#1006: garbled unicode characters in M-x term
@ 2008-09-19 19:34 ` Andreas Politz
From: Andreas Politz @ 2008-09-19 19:34 UTC
  To: bug-gnu-emacs


Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:


Problem: Under certain circumstances, multibyte characters in M-x term
become garbled and are displayed as single-byte escape sequences.

Example: Debian's aptitude (character U+2592)

From a post I made to gnu.emacs.help:

Ok, I think I found the problem. term uses `binary' as its input coding
system. After it has examined the input, it inserts the relevant
(visible) parts of it into the buffer. Only at this point does it decode
the bytes with the appropriate coding system (the variable
`locale-coding-system'). At some point it splits the input string to fit
the width of the `terminal'. The problem is that it measures bytes, not
characters. So the 3-byte character in question in aptitude, which mostly
sits in the last column, gets split into two strings of 1 and 2 bytes.
These two strings, when decoded and inserted independently, produce the
symptoms described above.
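To make the failure concrete, here is a small Python sketch of the mechanism described above. It is an illustration only, not term.el code: the helpers `split_by_bytes` and `decode_pieces` are hypothetical, and `errors="backslashreplace"` merely stands in for the way Emacs shows undecodable bytes as escapes.

```python
# Illustration only (not term.el): cut raw terminal output at a fixed
# byte width, then decode each piece on its own.

SHADE = "\u2592"  # U+2592 MEDIUM SHADE, the character aptitude draws


def split_by_bytes(data: bytes, width: int) -> list[bytes]:
    """Buggy split: chunks of at most `width` *bytes*."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def decode_pieces(pieces: list[bytes]) -> str:
    """Decode every chunk independently, as if inserted separately.

    backslashreplace renders undecodable bytes as escapes, similar to
    the single-byte escape sequences term displayed.
    """
    return "".join(p.decode("utf-8", errors="backslashreplace") for p in pieces)


line = ("ab" + SHADE).encode("utf-8")  # 5 bytes, but only 3 characters
pieces = split_by_bytes(line, 3)       # the shade is cut into 1 + 2 bytes
print(pieces)                          # [b'ab\xe2', b'\x96\x92']
print(decode_pieces(pieces))           # ab\xe2\x96\x92
```

U+2592 is E2 96 92 in UTF-8. A 3-byte cut leaves E2 at the end of the first piece and 96 92 at the start of the next, and neither piece is valid UTF-8 on its own.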

The solution would be to decode the string before checking its length.
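A minimal sketch of the proposed fix, i.e. turning the bytes into characters (decoding) before measuring the length. Again this is Python rather than Emacs Lisp, the helper `split_by_chars` is hypothetical, and it assumes every character occupies exactly one terminal column.

```python
# Sketch of the fix: decode first, then measure and split in characters.
# Assumption: one character == one terminal column. Real code must also
# handle double-width characters (in Emacs Lisp, `string-width' measures
# display columns).


def split_by_chars(data: bytes, width: int, encoding: str = "utf-8") -> list[str]:
    """Decode `data`, then cut it into chunks of at most `width` characters."""
    text = data.decode(encoding)
    return [text[i:i + width] for i in range(0, len(text), width)]


line = "ab\u2592cd".encode("utf-8")  # 7 bytes, 5 characters
chunks = split_by_chars(line, 3)
print(ascii(chunks))                  # ['ab\u2592', 'cd']
```

This assumes the complete byte string is already in hand. If process output arrives in separate reads, a read can itself end mid-character; Python's `codecs.getincrementaldecoder("utf-8")()` buffers such a trailing partial sequence until the next chunk arrives.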

-ap

If Emacs crashed, and you have the Emacs process in the gdb debugger,
please include the output from the following gdb commands:
     `bt full' and `xbacktrace'.
If you would like to further debug the crash, please read the file
/usr/share/emacs/22.2/etc/DEBUG for instructions.


In GNU Emacs 22.2.1 (i486-pc-linux-gnu, GTK+ Version 2.12.11)
  of 2008-07-25 on raven, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure  '--build=i486-linux-gnu' '--host=i486-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var/lib' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs22:/etc/emacs:/usr/local/share/emacs/22.2/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/22.2/site-lisp:/usr/share/emacs/site-lisp:/usr/share/emacs/22.2/leim' '--with-x=yes' '--with-x-toolkit=gtk' '--with-toolkit-scroll-bars' 'build_alias=i486-linux-gnu' 'host_alias=i486-linux-gnu' 'CFLAGS=-DDEBIAN -g -O2' 'LDFLAGS=-g' 'CPPFLAGS=''

Important settings:
   value of $LC_ALL: nil
   value of $LC_COLLATE: nil
   value of $LC_CTYPE: nil
   value of $LC_MESSAGES: nil
   value of $LC_MONETARY: nil
   value of $LC_NUMERIC: nil
   value of $LC_TIME: nil
   value of $LANG: en_US.UTF-8
   locale-coding-system: utf-8
   default-enable-multibyte-characters: t

Major mode: Fundamental

Minor modes in effect:
   shell-dirtrack-mode: t
   auto-fill-function: do-auto-fill
   show-paren-mode: t
   savehist-mode: t
   icomplete-mode: t
   global-hi-lock-mode: t
   hi-lock-mode: t
   display-time-mode: t
   tooltip-mode: t
   mouse-wheel-mode: t
   menu-bar-mode: t
   file-name-shadow-mode: t
   global-font-lock-mode: t
   font-lock-mode: t
   unify-8859-on-encoding-mode: t
   utf-translate-cjk-mode: t
   auto-compression-mode: t
   column-number-mode: t
   line-number-mode: t

Recent input:
C-x C-s M-x d i f f SPC u DEL C-g C-x o M-? m C-M-v
C-x k RET C-x C-g M-x d i f f RET RET t e r m . RET
C-x o C-v C-v C-v C-v C-v M-< M-x w o m a n RET d i
f f RET C-v C-v C-v M-v C-r i g n o r e C-r C-g C-x
b t e r C-s C-s C-g C-x o M-x C-g C-u M-x d i f f RET
RET t e r C-s RET w <return> C-x o C-n C-n C-n C-n
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-x o C-x
o M-< C-x k RET C-x o C-u M-x d i f f RET RET t e r
C-s RET DEL w <return> C-x C-g C-u C-g M-x d i f f
RET RET t e r m . RET C-x o C-v C-v C-v C-v C-v M-v
M-v M-v M-v M-v C-x o C-x C-w ~ / . e m / t e r m .
e l <return> C-x b f o RET C-n C-n C-n C-n C-n C-n
C-n C-n C-n C-n M-x r e p o SPC r t RET g r a <backspace>
<backspace> a r b e l e d DEL DEL DEL DEL l e d C-g

Recent messages:
Repeating command 1 other-window
Quit
Repeating command 1 other-window [2 times]
Saving file /home/andy/.emacs.d/term.el...
Wrote /home/andy/.emacs.d/term.el
Making completion list...
Loading emacsbug...done
Quit


* bug#1006: garbled unicode characters in M-x term
@ 2008-09-24 20:30 Chong Yidong
From: Chong Yidong @ 2008-09-24 20:30 UTC
  To: Andreas Politz; +Cc: 1006

> Ok, I think I found the problem. term uses `binary' as its input
> coding system. After it has examined the input, it inserts the
> relevant (visible) parts of it into the buffer. Only at this point
> does it decode the bytes with the appropriate coding system (the
> variable `locale-coding-system'). At some point it splits the input
> string to fit the width of the `terminal'. The problem is that it
> measures bytes, not characters. So the 3-byte character in question
> in aptitude, which mostly sits in the last column, gets split into
> two strings of 1 and 2 bytes. These two strings, when decoded and
> inserted independently, produce the symptoms described above.

Thanks for the analysis.  Could you try to write a patch to fix this?


* bug#1006: marked as done (garbled unicode characters in M-x term)
@ 2008-09-25  0:15   ` Emacs bug Tracking System
From: Emacs bug Tracking System @ 2008-09-25  0:15 UTC
  To: Chong Yidong

[-- Attachment #1: Type: text/plain, Size: 838 bytes --]


Your message dated Wed, 24 Sep 2008 20:07:46 -0400
with message-id <87myhxhyul.fsf@cyd.mit.edu>
and subject line Re: garbled unicode characters in M-x term
has caused the Emacs bug report #1006,
regarding garbled unicode characters in M-x term
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact don@donarmstrong.com
immediately.)


-- 
1006: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=1006
Emacs Bug Tracking System
Contact don@donarmstrong.com with problems

[-- Attachment #2: Type: message/rfc822, Size: 6852 bytes --]

From: Andreas Politz <politza@fh-trier.de>
To: bug-gnu-emacs@gnu.org
Subject: garbled unicode characters in M-x term
Date: Fri, 19 Sep 2008 21:34:51 +0200
Message-ID: <48D3FEDB.30009@fh-trier.de>

[-- Attachment #3: Type: message/rfc822, Size: 1672 bytes --]

From: Chong Yidong <cyd@stupidchicken.com>
To: Andreas Politz <politza@fh-trier.de>
Cc: 1006-done@emacsbugs.donarmstrong.com
Subject: Re: garbled unicode characters in M-x term
Date: Wed, 24 Sep 2008 20:07:46 -0400
Message-ID: <87myhxhyul.fsf@cyd.mit.edu>

>>>> Thanks for the analysis.  Could you try to write a patch to fix
>>>> this?
>>>>
>>> I did. It's a follow-up in the thread on emacs.bug.
>>
>> Hmm, I don't see your message.  Could you please mail it directly to
>> me?
>
> Sure, here it comes:

The patch looks good.  I've installed it into the Emacs CVS trunk, with
a few minor cosmetic changes.  Thanks very much for debugging and fixing
this.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git