From: Andreas Politz
Newsgroups: gmane.emacs.help
Subject: Re: term/encoding problem
Date: Fri, 19 Sep 2008 21:30:46 +0200
Organization: FH-Trier
Message-ID: <1221852859.787701@arno.fh-trier.de>
References: <1221758956.574694@arno.fh-trier.de> <1221761982.647629@arno.fh-trier.de>
To: help-gnu-emacs@gnu.org

Peter Dyballa wrote:
>
> On 18.09.2008 at 20:16, Andreas Politz wrote:
>
>> Note that I get a 'Invalid character' message, when I try to
>> insert it via quoted-insert and it's octal value
>> ( C-q 22622 ).
>
> Ahh! So you're with GNU Emacs 22.x? I can reproduce it in 22.2. Once I
> check this character in Kermit's utf8.txt file it's described as:
>

Yes, Emacs 22.2.1.

> character: ▒ (299218, #o1110322, #x490d2, U+2592)
> charset: mule-unicode-2500-33ff
> (Unicode characters of the range U+2500..U+33FF.)
>
> In UTF-8 presentation this character is encoded with these three bytes:
> E2 96 92. These are in "ASCII" (rather an 8-bit "ASCII"): ‚ ñ í. Using
> C-q 1 1 1 0 3 2 2 I can insert HALF SHADE. Could be
> this non-Unicode Emacs has to use some extras to handle this ...
>
> If no-one on this list has an explanation I'd write a bug report (see
> Help menu), also mentioning the 'Invalid character' message. Although it
> looks as if GNU Emacs 22.x seems to recommend to use 1110322 instead of
> 22622 ...
>
> --

From what I have learned since my first mail, Emacs 22 uses its own
internal encoding (mule) for its buffers, which explains the different
byte codes.

But I think I found the problem. term uses `binary' as its input coding
system. After it has examined the input, it inserts the relevant/visible
parts of it into the buffer. Only at this point does it decode the bytes
with the appropriate coding system (variable: locale-coding-system). At
some point it also splits the input string to make it fit the width of
the `terminal'. The problem is that it measures that width in bytes, not
characters.
So the 3-byte character in question, which in aptitude mostly sits in
the last column, gets split into two strings of 1 and 2 bytes. These two
strings, when decoded and inserted independently, produce what was
described as the problem.

I filed a bug report. Thanks anyway.

-ap
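
P.S. To make the byte-versus-character point concrete, here is a minimal
sketch in Python. It is not term.el's code; the function names and the
width of 4 are made up for illustration, and Python's U+FFFD replacement
marker only stands in for whatever garbage Emacs actually displays. It
cuts a UTF-8 byte string at a fixed byte count, so that U+2592 (E2 96 92)
is split into 1 + 2 bytes, and then compares decoding each piece on its
own with decoding that carries incomplete bytes over to the next piece.

```python
import codecs

# U+2592, the character from this thread; its UTF-8 form is E2 96 92.
SHADE = "\u2592"


def split_by_bytes(data, width):
    """Cut raw bytes every `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def decode_each(chunks):
    """Decode every chunk on its own; a cut character cannot be recovered."""
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)


def decode_incrementally(chunks):
    """Keep incomplete trailing bytes and prepend them to the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(c) for c in chunks)
    return text + decoder.decode(b"", final=True)


line = ("abc" + SHADE + "def").encode("utf-8")  # 9 bytes, 7 characters
chunks = split_by_bytes(line, 4)                # the cut lands inside E2 96 92
broken = decode_each(chunks)                    # the character is lost
intact = decode_incrementally(chunks)           # the character survives
```

The first chunk ends with the lone byte E2 and the second starts with
96 92, which is the same 1 + 2 split described above. Measuring the
width in characters, or holding back an incomplete trailing sequence
until the next chunk arrives, avoids the problem.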