From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 23:31:45 +0100
Message-ID: <87my2ign8u.fsf@lola.goethe.zz>
To: emacs-devel@gnu.org

Alan Mackenzie writes:

> OK - so what's happening is that ?ñ is unambiguously 241.  But Emacs
> cannot say whether that is unibyte 241 or multibyte 241, which it
> encodes as 4194289.  Despite not knowing, Emacs is determined never to
> confuse a 4194289 type of 241 with a 241 type of 241.
> So, despite the fact that the character 4194289 probably originated as
> a unibyte ?ñ,

?ñ is the code point of a character.  Unibyte strings contain bytes, not
characters.  ?ñ is a confusing way of writing 241 in the context of
unibyte, just like '\n' may be a confusing way of writing 10 in the
context of number bases.

> Why couldn't Emacs have simply displayed the character as "ñ"?

Because there is no character with a byte representation of 241.  You
are apparently demanding that Emacs display this "wild byte" as if it
were really encoded in latin-1.  What is so special about latin-1?
Latin-1 characters have a byte representation in utf-8, but it is not
241.

> Why does it have to enforce its internal dirty linen on an
> unsuspecting hacker?

It doesn't.  And since we are talking about a non-character isolated
byte, Emacs displays it as a non-character isolated byte rather than
throwing it out on the terminal and confusing the user with whatever the
terminal may make of it.

> That meaning is an artificial one imposed by Emacs itself.  Is there
> any pressing reason to distinguish 4194289 from 241 when displaying
> them as characters on a screen?

4194289 is the Emacs code point for "invalid raw byte with value 241",
241 is the Emacs code point for "Unicode character 241, part of latin-1
plane".

If you throw them to encode-region, the resulting unibyte string will
contain 241 for the first, but whatever external representation is
proper for the specified encoding for the second.

If you encode to latin-1, the distinction will get lost.  If you encode
to other encodings, it won't.

> Sorry, what the heck is "the byte with value 241"?  Does this concept
> have any meaning, any utility beyond the machiavellian one of
> confusing me?  How would one use "the byte with value 241", and why
> does it need to be kept distinct from "ñ"?

You can use Emacs to load an executable, change some string inside of it
(make sure that it contains the same number of bytes afterwards!) and
save, and everything you did not edit is the same.  That's a very fine
thing.  To have this work, Emacs needs an internal representation for
"byte with code x that is not valid as part of a character".

-- 
David Kastrup
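
The two code points and their behaviour under encoding can be sketched
outside Emacs.  What follows is a toy Python model, not Emacs code and
not how Emacs implements it: the names RAW_BYTE_BASE, decode_utf8 and
encode are made up for illustration, and the offset 0x3FFF00 is an
assumption inferred from the numbers in the message (4194289 - 241).
Undecodable bytes are detected with Python's own "surrogateescape"
error handler, which stands in for whatever Emacs does internally.

```python
# Toy model of "character 241" versus "raw byte 241".
# NOT Emacs code: every name here is hypothetical.

# Assumed offset, inferred from the message: 4194289 - 241 == 0x3FFF00.
RAW_BYTE_BASE = 0x3FFF00


def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 into code points.  A byte that is not part of a
    valid character becomes a raw-byte code point instead of being
    dropped or reinterpreted as latin-1."""
    code_points = []
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        value = ord(ch)
        if 0xDC80 <= value <= 0xDCFF:
            code_points.append(RAW_BYTE_BASE + (value - 0xDC00))
        else:
            code_points.append(value)
    return code_points


def encode(code_points: list[int], codec: str) -> bytes:
    """Encode code points.  Raw bytes are written back unchanged;
    real characters get whatever the codec prescribes."""
    out = bytearray()
    for cp in code_points:
        if cp >= RAW_BYTE_BASE + 0x80:
            out.append(cp - RAW_BYTE_BASE)
        else:
            out += chr(cp).encode(codec)
    return bytes(out)


print(decode_utf8(b"\xc3\xb1"))      # [241]      the character ñ
print(decode_utf8(b"\xf1"))          # [4194289]  a stray byte 241
print(encode([241], "utf-8"))        # b'\xc3\xb1'
print(encode([4194289], "utf-8"))    # b'\xf1'
print(encode([241], "latin-1"))      # b'\xf1'    distinction lost
print(encode([4194289], "latin-1"))  # b'\xf1'

# Binary data survives a decode/encode cycle untouched.
blob = b"\x7fELF\xf1caf\xc3\xa9"
print(encode(decode_utf8(blob), "utf-8") == blob)  # True
```

In this model only latin-1 makes the two code points collapse into the
same output byte, and the last line is the executable-editing case: the
bytes that were never valid characters come back out exactly as they
went in.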