From: Kenichi Handa
To: Nic Ferrier
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: strange UTF8 encoding problem (relevant to decoding-system-gone-awry?)
Date: Tue, 22 Feb 2005 16:38:06 +0900 (JST)
Message-ID: <200502220738.QAA22199@etlken.m17n.org>
In-reply-to: <87y8dnmjx5.fsf@kanga.tapsellferrier.co.uk> (message from Nic Ferrier on Thu, 17 Feb 2005 12:48:38 +0000)
References: <87y8dnmjx5.fsf@kanga.tapsellferrier.co.uk>

In article <87y8dnmjx5.fsf@kanga.tapsellferrier.co.uk>, Nic Ferrier writes:

> I've noted the current discussion on Emacs coding.

> I am experiencing a strange problem with Emacs encoding which I
> thought I might share.

> I'm reading the tcpd package's hosts_access man page with Emacs man
> from this version of Emacs:

>   GNU Emacs 21.3.50.22 (i686-pc-linux-gnu, GTK+ Version 2.4.10) of
>   2004-12-14

> In the man page viewed on a terminal there are nice little bullet
> characters. Hexdump shows these characters as B7 so obviously the
> terminal is not UTF-8.

> The UTF-8 sequence for B7 is 0302 0267.

> When I view the man page in Emacs with utf-8 encoding on by default I
> get a \267. Encoding the page as unix produces: \302\267 which
> *does* look like a valid UTF-8 byte sequence.

> When I do (what-cursor-position) on the character I get 302 which is
> the first byte in the sequence.

> I'm not sure what Emacs is doing here. It looks like valid UTF-8 and
> yet (what-cursor-position) obviously does not believe there is a UTF-8
> character.

> Anybody got any idea why the correct character doesn't display?

I can't reproduce it.  What I did was:

% LANG=de_DE.UTF-8 emacs -Q

and M-x man RET man RET

It decodes the UTF-8 output of the man command correctly.

What is the value of enable-multibyte-characters?  Can you
reproduce the bug with the -Q argument?

---
Ken'ichi HANDA
handa@m17n.org
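
The byte values discussed in the quoted report can be checked outside Emacs. The following is a minimal Python sketch, assuming the bullet in the man page is U+00B7 (MIDDLE DOT), which the thread does not confirm beyond the B7 hexdump. The helper name `octal_escapes` is hypothetical and only mimics the way Emacs displays raw bytes as octal escapes.

```python
# Assumption: the man page bullet is U+00B7 (MIDDLE DOT), i.e. the single
# byte B7 in Latin-1, as suggested by the hexdump mentioned in the report.
bullet = "\u00b7"

utf8_bytes = bullet.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = bullet.encode("latin-1")  # one byte in Latin-1


def octal_escapes(data: bytes) -> str:
    """Render bytes as backslash-octal escapes (hypothetical helper)."""
    return "".join(f"\\{b:o}" for b in data)


print(octal_escapes(utf8_bytes))    # \302\267
print(octal_escapes(latin1_bytes))  # \267

# A lone B7 byte is a UTF-8 continuation byte, so on its own it does not
# decode as UTF-8. This is only a plausible reading of the bare \267 in
# the report, not a diagnosis.
try:
    latin1_bytes.decode("utf-8")
    lone_b7_is_valid = True
except UnicodeDecodeError:
    lone_b7_is_valid = False

print(lone_b7_is_valid)
```

If the assumption holds, a bare \267 in the buffer would point to undecoded Latin-1 input, while \302\267 is the two-byte UTF-8 form of the same character.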