unofficial mirror of help-gnu-emacs@gnu.org
* term/encoding problem
@ 2008-09-18 17:25 Andreas Politz
  2008-09-18 18:02 ` Peter Dyballa
       [not found] ` <mailman.19487.1221760968.18990.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 5+ messages in thread
From: Andreas Politz @ 2008-09-18 17:25 UTC
  To: help-gnu-emacs

Hi,

I am trying to make Debian's aptitude work in M-x term.  The
problem is that it uses a Unicode character which Emacs does not
display properly. It is this one:

9618 (#o22622, #x2592)

which looks like

▒

In term it is displayed as escape sequences and the tooltip says
'Untranslated unicode'. (describe-coding-system 'utf-8) tells me
that this character is outside its supported range. But I can insert
it in a buffer with (decode-char 'ucs 9618).

How do I make Emacs display this byte sequence properly?

-ap



* Re: term/encoding problem
  2008-09-18 17:25 term/encoding problem Andreas Politz
@ 2008-09-18 18:02 ` Peter Dyballa
       [not found] ` <mailman.19487.1221760968.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Dyballa @ 2008-09-18 18:02 UTC
  To: Andreas Politz; +Cc: help-gnu-emacs


On 18.09.2008 at 19:25, Andreas Politz wrote:

> How do I make Emacs display this byte sequence properly?


Is the *term* buffer set to UTF-8 encoding?

--
With peaceful regards

   Pete

A traffic jam is only annoying at the back; at the front it moves!







* Re: term/encoding problem
       [not found] ` <mailman.19487.1221760968.18990.help-gnu-emacs@gnu.org>
@ 2008-09-18 18:16   ` Andreas Politz
  2008-09-18 19:39     ` Peter Dyballa
       [not found]     ` <mailman.19499.1221766765.18990.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 5+ messages in thread
From: Andreas Politz @ 2008-09-18 18:16 UTC
  To: help-gnu-emacs

Peter Dyballa wrote:
> 
> On 18.09.2008 at 19:25, Andreas Politz wrote:
> 
>> How do I make Emacs display this byte sequence properly?
> 
> 
> Is the *term* buffer set to UTF-8 encoding?
> 

Yes, it is:

----------%<----------------
u -- mule-utf-8-unix

UTF-8 encoding for Emacs-supported Unicode characters.
It supports Unicode characters of these ranges:
     U+0000..U+33FF, U+E000..U+FFFF.
They correspond to these Emacs character sets:
     ascii, latin-iso8859-1, mule-unicode-0100-24ff,
     mule-unicode-2500-33ff, mule-unicode-e000-ffff
--------%<------------------

And I was wrong in saying that this character is outside
the supported range.

Note that I get an 'Invalid character' message when I try to
insert it via quoted-insert and its octal value
(C-q 22622).

-ap



* Re: term/encoding problem
  2008-09-18 18:16   ` Andreas Politz
@ 2008-09-18 19:39     ` Peter Dyballa
       [not found]     ` <mailman.19499.1221766765.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Dyballa @ 2008-09-18 19:39 UTC
  To: Andreas Politz; +Cc: help-gnu-emacs


On 18.09.2008 at 20:16, Andreas Politz wrote:

> Note that I get an 'Invalid character' message when I try to
> insert it via quoted-insert and its octal value
> (C-q 22622).

Ahh! So you're on GNU Emacs 22.x? I can reproduce it in 22.2. When
I check this character in Kermit's utf8.txt file, it is described as:

       character: ▒ (299218, #o1110322, #x490d2, U+2592)
         charset: mule-unicode-2500-33ff
		 (Unicode characters of the range U+2500..U+33FF.)

In its UTF-8 representation this character is encoded as three
bytes: E2 96 92. Read as an 8-bit extension of "ASCII", these show up as:
‚ ñ í. Using C-q 1 1 1 0 3 2 2 <some disturbance> I can insert
MEDIUM SHADE. It could be that this non-Unicode Emacs needs some extra
steps to handle this ...
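
The numbers quoted so far can be cross-checked outside Emacs. The following is only a minimal sketch in Python (not something used in the thread); it computes the Unicode code point of the character in decimal, octal and hex, plus its UTF-8 bytes. The variable names are made up for illustration.

```python
# Cross-check the values quoted in the thread for U+2592.
# This is plain Unicode/UTF-8 arithmetic; it says nothing about the
# internal (mule) character code that Emacs 22 reports.

ch = "\u2592"                      # the character aptitude draws

code_point = ord(ch)               # Unicode scalar value
utf8_bytes = ch.encode("utf-8")    # the bytes a UTF-8 terminal program sends

print("decimal:", code_point)
print("octal:  ", oct(code_point))
print("hex:    ", hex(code_point))
print("UTF-8:  ", " ".join(f"{b:02X}" for b in utf8_bytes))
```

The octal value printed here is the Unicode one (22622), which is what C-q rejects in Emacs 22; the longer octal number shown by Emacs is its own internal code for the same character.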

If no one on this list has an explanation, I'd write a bug report (see
Help menu), also mentioning the 'Invalid character' message. Although
it looks as if GNU Emacs 22.x expects 1110322 instead of
22622 ...

--
With peaceful regards

   Pete

Mac OS X is like a wigwam: no fences, no gates, but an apache inside.







* Re: term/encoding problem
       [not found]     ` <mailman.19499.1221766765.18990.help-gnu-emacs@gnu.org>
@ 2008-09-19 19:30       ` Andreas Politz
  0 siblings, 0 replies; 5+ messages in thread
From: Andreas Politz @ 2008-09-19 19:30 UTC
  To: help-gnu-emacs

Peter Dyballa wrote:
> 
> On 18.09.2008 at 20:16, Andreas Politz wrote:
> 
>> Note that I get an 'Invalid character' message when I try to
>> insert it via quoted-insert and its octal value
>> (C-q 22622).
> 
> Ahh! So you're on GNU Emacs 22.x? I can reproduce it in 22.2. When I
> check this character in Kermit's utf8.txt file, it is described as:
> 
Yes, Emacs 22.2.1.
>       character: ▒ (299218, #o1110322, #x490d2, U+2592)
>         charset: mule-unicode-2500-33ff
>          (Unicode characters of the range U+2500..U+33FF.)
> 
> In its UTF-8 representation this character is encoded as three bytes:
> E2 96 92. Read as an 8-bit extension of "ASCII", these show up as: ‚ ñ í.
> Using C-q 1 1 1 0 3 2 2 <some disturbance> I can insert MEDIUM SHADE. It
> could be that this non-Unicode Emacs needs some extra steps to handle this ...
> 
> If no one on this list has an explanation, I'd write a bug report (see
> Help menu), also mentioning the 'Invalid character' message. Although it
> looks as if GNU Emacs 22.x expects 1110322 instead of
> 22622 ...
> 
> -- 
From what I have learned since my first mail, Emacs 22 uses its own distinct
internal encoding for its buffers (mule), which explains the different
character codes.

But I think I found the problem. term uses `binary' as its input coding.
After it has examined the input, it inserts the relevant/visible parts
of it into the buffer. Only at this point does it decode the bytes with
the appropriate coding (variable: locale-coding-system).
At some point it splits the input string to make it fit the
width of the `terminal'. The problem is that it measures bytes, not
characters. So the 3-byte character in question, which in aptitude is mostly
in the last column, gets split into two strings of 1 and 2 bytes. These two
strings, when decoded and inserted independently, result in
what was described as the problem.
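
The effect described above can be reproduced outside Emacs. The following is only a rough Python model of that behaviour, not the actual term.el code: the function names, the sample line and the width of 4 are all made up for illustration. It splits a UTF-8 byte string at a fixed byte count, decodes the pieces independently, and compares that with an incremental decoder that carries incomplete sequences over to the next piece.

```python
import codecs

def split_by_bytes(data, width):
    """Cut a byte string into pieces of at most `width` bytes (not characters)."""
    return [data[i:i + width] for i in range(0, len(data), width)]

def decode_independently(chunks):
    """Decode every piece on its own; a cut multibyte sequence cannot survive."""
    return [chunk.decode("utf-8", errors="replace") for chunk in chunks]

def decode_incrementally(chunks):
    """Decode piece by piece, keeping incomplete sequences for the next piece."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

# Hypothetical line: three ASCII columns followed by the 3-byte character.
line = ("abc" + "\u2592").encode("utf-8")

# A "terminal width" of 4 bytes cuts the character into 1 + 2 bytes.
chunks = split_by_bytes(line, 4)

broken = "".join(decode_independently(chunks))
intact = decode_incrementally(chunks)

print(chunks)
print(repr(broken))
print(repr(intact))
```

In this model, measuring the width in characters after decoding, or holding back a trailing incomplete sequence as the incremental decoder does, both keep the character whole. Which of these suits term is a question for the bug report.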

I filed a bug report.
Thanks anyway.

-ap


