From: "Davis Herring"
Reply-To: herring@lanl.gov
To: "Alan Mackenzie"
Cc: David Kastrup, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 11:25:05 -0800 (PST)
Message-ID: <47325.130.55.118.19.1258658705.squirrel@webmail.lanl.gov>
In-Reply-To: <20091119180848.GE1314@muc.de>
References: <20091118191258.GA2676@muc.de> <20091119082040.GA1720@muc.de> <874ooq8xay.fsf@wanchan.jasonrumney.net> <20091119141852.GC1720@muc.de> <20091119155848.GB1314@muc.de> <87aayiihe9.fsf@lola.goethe.zz> <20091119180848.GE1314@muc.de>
List-Id: "Emacs development discussions."

[I end up having to say the same thing several times here; I thought it
preferable to omitting any of Alan's questions or any aspect of the
problem. It's not meant to be a rant.]

> No, you (all of you) are missing the point. That point is that if an
> Emacs Lisp hacker writes "?ñ", it should work, regardless of
> what "codepoint" it has, what "bytes" represent it, whether those
> "bytes" are coded with a different codepoint, or what have you. All of
> that stuff is uninteresting. If it gets interesting, like now, it is
> because it is buggy.

When you wrote ?ñ, it did work -- that character has the Unicode (and
Emacs 23) code point 241, so that two-character token is entirely
equivalent to the token "241" in Emacs source. (This is independent of
the encoding of the source file: the same two characters might be
represented by many different octet sequences in the source file, but
you always get 241 as the value (which is a code point and is distinct
from octet sequences anyway).)

But you didn't insert that object! You forced it into a (perhaps
surprisingly: unibyte) string, which interpreted its argument (the
integer 241) as a raw byte value, because that's what unibyte strings
contain. When you then inserted the string, Emacs transformed it into a
(somewhat artificial) character whose meaning is "this was really the
byte 241, which, since it corresponds to no UTF-8 character, must merely
be reproduced literally on disk" and whose Emacs code point is 4194289.
(That integer looks like it could be derived from 241 by sign-extension
for the convenience of Emacs hackers; the connection is unimportant to
the user.)

> OK. Surely displaying it as "\361" is a bug? Should it not display as
> "\17777761". If it did, it would have saved half of my ranting.

No: characters are displayed according to their meaning, not their
internal code point. As it happens, this character's whole meaning is
"the byte #o361", so that's what's displayed.

> So, how did the character "ñ" get turned into the illegal byte #xf1? Is
> that the bug?

By its use in `aset' in a unibyte context (determined entirely by the
target string).

>> You assume that ?ñ is a character.
>
> I do indeed. It is self evident.

Its characterness is determined by context, because (as you know) Emacs
has no distinct character type. So, in the isolation of English prose,
we have no way of telling whether ?ñ "is" a character or an integer, any
more than we can guess about 241. (We can guess about the writer's
desires, but not about the real effects.)

> Now, would you too please just agree that when I execute the three forms
> above, and "ñ" should appear?

That's Stefan's point: should common string literals generate multibyte
strings (so as to change the meaning, not of the string, but of `aset',
to what you want)? Maybe: one could also address the issue by
disallowing `aset' on unibyte strings (or strings entirely) and
introducing `aset-unibyte' (and perhaps `aset-multibyte') so that the
argument interpretation (and the O(n) nature of the latter) would be
made clear to the programmer. Maybe the doc-string for `aset' should
just bear a really loud warning. It bears more consideration than merely
"yes" to your question, as reasonable as it seems.

> What is the correct Emacs internal representation for "ñ" and "ä"? They
> surely cannot share internal representations with other
> (non-)characters?

They have the unique internal representation as (mostly) Unicode code
points (integers) 241 and 228, which happen to be identical to the
representations of bytes of those values (which interpretation prevails
in a unibyte context).
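
For anyone who wants to check the numbers quoted above, here is a small
sketch. It is Python rather than Emacs Lisp and serves only as a
calculator; it does not call or reproduce any Emacs code. The names
RAW_BYTE_OFFSET, raw_byte_char and sign_extend_to_22_bits are
hypothetical, the offset is simply inferred from the two integers in
this message (4194289 and 241), and the 22-bit width is an assumption
chosen because it makes the sign-extension remark come out exactly.

```python
# Illustrative arithmetic only; nothing here is Emacs API.

# "\u00f1" is the character written as ?ñ in the message.
CODE_POINT = ord("\u00f1")

# One code point, several possible octet sequences in a source file.
latin1_octets = "\u00f1".encode("iso-8859-1")
utf8_octets = "\u00f1".encode("utf-8")

# Assumption: offset inferred from the message's numbers (4194289 - 241).
RAW_BYTE_OFFSET = 4194289 - 241


def raw_byte_char(byte: int) -> int:
    """Code point of the 'this was really a byte' character for `byte`,
    assuming the inferred offset applies to every byte in 128..255."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("expected a byte in the range 128..255")
    return byte + RAW_BYTE_OFFSET


def sign_extend_to_22_bits(byte: int) -> int:
    """Read `byte` as a signed 8-bit value and widen it to 22 bits
    (the 22-bit width is an assumption, see the note above)."""
    signed = byte - 256 if byte >= 128 else byte
    return signed & ((1 << 22) - 1)


def is_valid_utf8(octets: bytes) -> bool:
    """True if `octets` decodes as UTF-8 on its own."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(CODE_POINT, latin1_octets, utf8_octets)
    print(raw_byte_char(0xF1), sign_extend_to_22_bits(0xF1))
    print(oct(0xF1), oct(raw_byte_char(0xF1)))
    print(is_valid_utf8(bytes([0xF1])))
```

Under those assumptions both helpers give 4194289 for the byte 241, and
the octal forms of 241 and 4194289 are 361 and 17777761, which are the
two escapes Alan asked about. The lone byte 241 is not valid UTF-8 by
itself, whereas the two-octet UTF-8 encoding of the character is.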

Davis

-- 
This product is sold by volume, not by mass. If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.