* term/encoding problem
From: Andreas Politz @ 2008-09-18 17:25 UTC
To: help-gnu-emacs
Hi,
I am trying to make Debian's aptitude work in M-x term. The
problem is that it uses a Unicode character which Emacs does not
display properly. It is this one:
9618 (#o22622, #x2592)
which looks like
▒
In term it is displayed as escape sequences, and the tooltip says
'Untranslated unicode'. (describe-coding-system 'utf-8) tells me
that this character is outside its supported range, but I can insert
it in a buffer with (decode-char 'ucs 9618).
How do I make Emacs display this byte sequence properly?
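The three spellings of the code point given above can be cross-checked outside Emacs. What follows is a minimal sketch, assuming Python 3; the variable names are illustrative only and nothing here touches Emacs itself.

```python
import unicodedata

# Code point quoted in the message: decimal 9618, octal 22622, hex 2592.
code_point = 9618

# All three notations must denote the same integer.
assert code_point == 0o22622 == 0x2592

char = chr(code_point)
label = f"U+{code_point:04X}"

print(char)                     # the character itself
print(unicodedata.name(char))   # its official Unicode name
print(label)                    # conventional U+ notation
```

The Unicode name printed here is the one from the Unicode character database, which may differ from informal names used elsewhere in the thread.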
-ap
* Re: term/encoding problem
From: Peter Dyballa @ 2008-09-18 18:02 UTC
To: Andreas Politz; +Cc: help-gnu-emacs
On 18.09.2008 at 19:25, Andreas Politz wrote:
> How do I make Emacs display this byte sequence properly?
Is the *term* buffer set to UTF-8 encoding?
--
With peaceful greetings
Pete
A traffic jam is only annoying at the back; up front it moves!
* Re: term/encoding problem
From: Andreas Politz @ 2008-09-18 18:16 UTC
To: help-gnu-emacs
Peter Dyballa wrote:
>
> On 18.09.2008 at 19:25, Andreas Politz wrote:
>
>> How do I make emacs display this byte sequence properly ?
>
>
> Is the *term* buffer set to UTF-8 encoding?
>
Yes, it is:
----------%<----------------
u -- mule-utf-8-unix
UTF-8 encoding for Emacs-supported Unicode characters.
It supports Unicode characters of these ranges:
U+0000..U+33FF, U+E000..U+FFFF.
They correspond to these Emacs character sets:
ascii, latin-iso8859-1, mule-unicode-0100-24ff,
mule-unicode-2500-33ff, mule-unicode-e000-ffff
--------%<------------------
And I was wrong in saying that this character is outside
the supported range.
Note that I get an 'Invalid character' message when I try to
insert it via quoted-insert and its octal value
(C-q 22622).
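The range check can be reproduced by hand. This is a minimal sketch, assuming Python 3; the two ranges are copied from the describe-coding-system output quoted above, and `in_supported_range` is a hypothetical helper name, not an Emacs function.

```python
# Ranges listed for mule-utf-8 in the output above.
SUPPORTED_RANGES = [(0x0000, 0x33FF), (0xE000, 0xFFFF)]


def in_supported_range(code_point: int) -> bool:
    """Return True if code_point lies inside one of the listed ranges."""
    return any(low <= code_point <= high for low, high in SUPPORTED_RANGES)


# The octal digits typed after C-q denote the same code point as #x2592.
typed = int("22622", 8)

print(in_supported_range(typed))    # the aptitude character
print(in_supported_range(0x4E00))   # a code point between the two ranges
```

So the Unicode code point itself lies inside U+2500..U+33FF, which suggests the 'Invalid character' message has another cause than the supported range.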
-ap
* Re: term/encoding problem
From: Peter Dyballa @ 2008-09-18 19:39 UTC
To: Andreas Politz; +Cc: help-gnu-emacs
On 18.09.2008 at 20:16, Andreas Politz wrote:
> Note that I get an 'Invalid character' message when I try to
> insert it via quoted-insert and its octal value
> (C-q 22622).
Ahh! So you're on GNU Emacs 22.x? I can reproduce it in 22.2. When
I check this character in Kermit's utf8.txt file, it is described as:
character: ▒ (299218, #o1110322, #x490d2, U+2592)
charset: mule-unicode-2500-33ff
(Unicode characters of the range U+2500..U+33FF.)
In its UTF-8 representation this character is encoded with these three
bytes: E2 96 92. Read as single-byte characters (in an 8-bit "ASCII"
rather than plain ASCII) these are: ‚ ñ í. Using
C-q 1 1 1 0 3 2 2 <some disturbance> I can insert
MEDIUM SHADE. It could be that this non-Unicode Emacs needs some extras to
handle this ...
If no one on this list has an explanation, I'd write a bug report (see
the Help menu), also mentioning the 'Invalid character' message. Although
it looks as if GNU Emacs 22.x expects 1110322 to be used
instead of 22622 ...
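The byte values and the internal code quoted above can be verified with a short script. This is a minimal sketch, assuming Python 3. The choice of Mac OS Roman as the 8-bit codec is an assumption made for illustration: it is a single-byte encoding that maps these three bytes to the characters shown above.

```python
ch = "\u2592"

# Encode the character as UTF-8 and show the bytes in hex.
utf8 = ch.encode("utf-8")
hex_bytes = " ".join(f"{b:02X}" for b in utf8)
print(hex_bytes)

# Reading the same three bytes with a single-byte codec yields three
# unrelated characters. Mac OS Roman is an assumed codec, see above.
misread = utf8.decode("mac_roman")
print(misread)

# The Emacs 22 internal code in its three spellings; it is a different
# number from the Unicode code point 0x2592.
internal_code = 299218
assert internal_code == 0o1110322 == 0x490D2
assert internal_code != 0x2592
```

This only confirms the arithmetic. How Emacs 22 derives its internal code from the Unicode code point is not shown here.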
--
With peaceful greetings
Pete
Mac OS X is like a wigwam: no fences, no gates, but an apache inside.
* Re: term/encoding problem
From: Andreas Politz @ 2008-09-19 19:30 UTC
To: help-gnu-emacs
Peter Dyballa wrote:
>
> On 18.09.2008 at 20:16, Andreas Politz wrote:
>
>> Note that I get an 'Invalid character' message when I try to
>> insert it via quoted-insert and its octal value
>> (C-q 22622).
>
> Ahh! So you're on GNU Emacs 22.x? I can reproduce it in 22.2. When I
> check this character in Kermit's utf8.txt file, it is described as:
>
Yes, Emacs 22.2.1.
> character: ▒ (299218, #o1110322, #x490d2, U+2592)
> charset: mule-unicode-2500-33ff
> (Unicode characters of the range U+2500..U+33FF.)
>
> In its UTF-8 representation this character is encoded with these three bytes:
> E2 96 92. Read as single-byte characters (in an 8-bit "ASCII" rather than
> plain ASCII) these are: ‚ ñ í. Using C-q 1 1 1 0 3 2 2 <some disturbance>
> I can insert MEDIUM SHADE. It could be that this non-Unicode Emacs needs
> some extras to handle this ...
>
> If no one on this list has an explanation, I'd write a bug report (see
> the Help menu), also mentioning the 'Invalid character' message. Although it
> looks as if GNU Emacs 22.x expects 1110322 to be used instead of
> 22622 ...
>
> --
From what I have learned since my first mail, Emacs 22 uses its own distinct
encoding for its buffers (mule), which explains the different byte codes.
But I think I found the problem. term uses `binary' as its input coding.
After it has examined the input, it inserts the relevant/visible parts
of it into the buffer. Only at this point does it decode the bytes with
the appropriate coding (variable: locale-coding-system).
At some point it splits the input string to make it fit the
width of the `terminal'. The problem is that it measures bytes, not
characters. So the 3-byte character in question, which aptitude mostly puts
in the last column, gets split into two strings of 1 and 2 bytes. These two
strings, when decoded and inserted independently, result in
what was described as the problem.
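The failure mode described above can be reproduced outside Emacs. This is a minimal sketch, assuming Python 3. It is not the term.el code: `COLUMNS` and the sample line are made up for illustration, and the incremental decoder at the end is only one possible way to avoid the split, not necessarily what a fix in Emacs would do.

```python
import codecs

COLUMNS = 4  # hypothetical terminal width

# Three ASCII characters plus U+2592 in the last column:
# 4 characters, but 6 bytes in UTF-8.
line = "abc\u2592".encode("utf-8")

# Splitting by byte count cuts the 3-byte sequence into 1 + 2 bytes.
chunks = [line[i:i + COLUMNS] for i in range(0, len(line), COLUMNS)]
print(chunks)

# Decoding each chunk on its own cannot recover the character;
# the partial sequences come out as replacement characters.
broken = "".join(c.decode("utf-8", errors="replace") for c in chunks)
print(broken)

# An incremental decoder buffers the incomplete tail of one chunk
# and completes it with the bytes of the next chunk.
decoder = codecs.getincrementaldecoder("utf-8")()
fixed = "".join(decoder.decode(c) for c in chunks)
fixed += decoder.decode(b"", final=True)
print(fixed)
```

Measuring the width in characters after decoding, rather than in bytes before decoding, would avoid the problem as well.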
I filed a bug report.
Thanks anyway.
-ap