From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Nic Ferrier Newsgroups: gmane.emacs.devel Subject: strange UTF8 encoding problem (relevant to decoding-system-gone-awry?) Date: Thu, 17 Feb 2005 12:48:38 +0000 Message-ID: <87y8dnmjx5.fsf@kanga.tapsellferrier.co.uk> NNTP-Posting-Host: deer.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: sea.gmane.org 1108646324 7007 80.91.229.6 (17 Feb 2005 13:18:44 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Thu, 17 Feb 2005 13:18:44 +0000 (UTC) Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Feb 17 14:18:38 2005 Return-path: Original-Received: from lists.gnu.org ([199.232.76.165]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1D1lYE-0005WS-00 for ; Thu, 17 Feb 2005 14:18:38 +0100 Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D1loN-0005hn-DG for ged-emacs-devel@m.gmane.org; Thu, 17 Feb 2005 08:35:19 -0500 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D1ll0-0004iv-QD for emacs-devel@gnu.org; Thu, 17 Feb 2005 08:31:50 -0500 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D1lkW-0004Zi-BY for emacs-devel@gnu.org; Thu, 17 Feb 2005 08:31:31 -0500 Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D1lkU-0004Rd-QD for emacs-devel@gnu.org; Thu, 17 Feb 2005 08:31:18 -0500 Original-Received: from [80.168.156.78] (helo=owls-tree.tapsellferrier.co.uk) by monty-python.gnu.org with esmtp (TLSv1:RC4-SHA:128) (Exim 4.34) id 1D1l9R-0007Gz-EO for emacs-devel@gnu.org; Thu, 17 Feb 2005 07:53:01 -0500 Original-Received: from [217.207.145.34] (helo=kanga) by owls-tree.tapsellferrier.co.uk with esmtp (Exim 4.30 #1 (Debian)) id 1D1l5n-0001It-0B for ; Thu, 17 Feb 2005 12:49:15 +0000 Original-Received: from nferrier by kanga with local (Exim 4.34 #1 
(Debian)) id 1D1l5F-0000ew-FJ for ; Thu, 17 Feb 2005 12:48:41 +0000 Original-To: emacs-devel@gnu.org X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: main.gmane.org gmane.emacs.devel:33581 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:33581

I've noted the current discussion on Emacs coding. I am experiencing a
strange problem with Emacs encoding which I thought I might share.

I'm reading the tcpd package's hosts_access man page with Emacs man
from this version of Emacs:

  GNU Emacs 21.3.50.22 (i686-pc-linux-gnu, GTK+ Version 2.4.10) of 2004-12-14

In the man page viewed on a terminal there are nice little bullet
characters. Hexdump shows these characters as B7, so obviously the
terminal is not UTF-8. The UTF-8 sequence for B7 is 0302 0267 (octal).

When I view the man page in Emacs with utf-8 encoding on by default I
get a \267. Encoding the page as unix produces:

  \302\267

which *does* look like a valid UTF-8 byte sequence.

When I do (what-cursor-position) on the character I get 302, which is
the first byte in the sequence.

I'm not sure what Emacs is doing here. It looks like valid UTF-8 and
yet (what-cursor-position) obviously does not believe there is a UTF-8
character.

Anybody got any idea why the correct character doesn't display?

btw WoMan displays the manual page with the strange bullet converted
to an asterisk.

-- 
Nic Ferrier
http://www.tapsellferrier.co.uk
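
For reference, the byte arithmetic described above can be checked
outside Emacs. The following is only a sketch in Python, and it rests
on an assumption not confirmed in this message: that the B7 byte in the
man page is a Latin-1 (ISO 8859-1) middle dot. The helper name
octal_escapes is hypothetical, chosen just for this illustration.

```python
# Assumption: the man page stores the bullet as the single Latin-1 byte 0xB7.
raw = b"\xb7"

char = raw.decode("latin-1")   # U+00B7 MIDDLE DOT under that assumption
utf8 = char.encode("utf-8")    # the UTF-8 form of the same character


def octal_escapes(data: bytes) -> str:
    """Render bytes as backslash-octal escapes, e.g. \\302\\267."""
    return "".join("\\%03o" % b for b in data)


print(octal_escapes(raw))    # the lone byte from the man page
print(octal_escapes(utf8))   # the two-byte UTF-8 sequence

# A lone 0xB7 is a UTF-8 continuation byte, so it cannot be decoded as
# UTF-8 on its own; a decoder has to reject it or pass it through raw.
try:
    raw.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print(lone_byte_is_valid_utf8)
```

If the assumption holds, this matches the two observations in the
message: the file's single byte shows up as \267 when the buffer is
decoded as utf-8, and writing that character back out as UTF-8 gives
\302\267.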