From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 23:16:24 +0100
Organization: Organization?!?
Message-ID: <87r5rugnyf.fsf@lola.goethe.zz>
References: <20091119082040.GA1720@muc.de>
 <874ooq8xay.fsf@wanchan.jasonrumney.net>
 <20091119141852.GC1720@muc.de>
 <20091119155848.GB1314@muc.de>
 <87aayiihe9.fsf@lola.goethe.zz>
 <20091119180848.GE1314@muc.de>
 <834ooqqoln.fsf@gnu.org>
 <20091119205354.GF1314@muc.de>
To: emacs-devel@gnu.org

Alan Mackenzie writes:

> Hi, Eli!
>
> On Thu, Nov 19, 2009 at 09:52:20PM +0200, Eli Zaretskii wrote:
>> > Date: Thu, 19 Nov 2009 18:08:48 +0000
>> > From: Alan Mackenzie
>> > Cc: emacs-devel@gnu.org
>
>> > No, you (all of you) are missing the point.
>> > That point is that if an Emacs Lisp hacker writes "?ñ", it should
>> > work, regardless of what "codepoint" it has, what "bytes" represent
>> > it, whether those "bytes" are coded with a different codepoint, or
>> > what have you.
>
>> No can do, as long as we support both unibyte and multibyte buffers
>> and strings.
>
> This seems to be the big thing. That ?ñ has no unique meaning.

Wrong. It means the character code of the character ñ in Emacs'
internal encoding.

> The current situation violates the description on the elisp page
> "Basic Char Syntax", which describes the situation as I understood it
> up until half an hour ago.

Hm?

    2.3.3.1 Basic Char Syntax
    .........................

    Since characters are really integers, the printed representation of a
    character is a decimal number. This is also a possible read syntax for
    a character, but writing characters that way in Lisp programs is not
    clear programming. You should _always_ use the special read syntax
    formats that Emacs Lisp provides for characters. These syntax formats
    start with a question mark.

This makes very very very clear that we are talking about an integer
here. Not that the higher node does not also mention this:

    2.3.3 Character Type
    --------------------

    A "character" in Emacs Lisp is nothing more than an integer. In other
    words, characters are represented by their character codes. For
    example, the character `A' is represented as the integer 65.

>> > OK. Surely displaying it as "\361" is a bug?
>
>> If `a' can be represented as 97, then why cannot \361 be represented
>> as 4194289?
>
> ROFLMAO. If this weren't true, you couldn't invent it. ;-)

Since raw bytes above 127 are not legal utf-8 sequences and we want some
character representation for them, and since character codes 128 to 255
are already valid Unicode codepoints, the obvious solution is to use
numbers that aren't valid Unicode codepoints. One could have chosen
-128 to -255 for example.
Except that we don't have a natural algorithm for encoding those in a
superset of utf-8.

>> > So, how did the character "ñ" get turned into the illegal byte
>> > #xf1?
>
>> It did so because you used aset to put it into a unibyte string.
>
> So, what should I have done to achieve the desired effect? How should
> I modify "(aset nl 0 ?ü)" so that it does the Right Thing?

Using aset on strings is crude. If it were up to me, I would not allow
this operation at all.

>> > Are you saying that Emacs is converting "?ñ" and "?ä" into the
>> > wrong integers?
>
>> Emacs can convert it into 2 distinct integer representations. It
>> decides which one by the context. And you just happened to give it
>> the wrong context.
>
> OK, I understand that now, thanks.

Too bad that it's wrong. ?ñ is the integer that is Emacs' internal
character code for ñ. A single integer representation, only different
on Emacsen with different internal character codes.

If you want to produce an actual string from it, use char-to-string.

-- 
David Kastrup
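
For anyone who wants to check the points above in a scratch buffer, here
is a minimal sketch. It assumes a recent Emacs whose internal encoding is
Unicode-based, and a buffer or file in which ñ is decoded correctly. The
offset #x3fff00 is not stated anywhere in the thread; it is inferred from
the value 4194289 quoted for \361, so treat it as an assumption.

```elisp
;; ?ñ is read as one integer: the internal character code of ñ.
(integerp ?ñ)                                   ; => t
(= ?ñ #xf1)                                     ; => t (241)

;; char-to-string builds a fresh multibyte string from that integer,
;; so no aset on an existing (possibly unibyte) string is needed.
(multibyte-string-p (char-to-string ?ñ))        ; => t
(string-to-char (char-to-string ?ñ))            ; => 241

;; The raw byte #xf1 is a different thing.  "\361" is a unibyte string
;; holding that byte; converted to multibyte, the byte becomes a
;; character code outside the Unicode range, which ends at #x10ffff.
(multibyte-string-p "\361")                     ; => nil
(string-to-char (string-to-multibyte "\361"))   ; => 4194289
(= 4194289 (+ #x3fff00 #xf1))                   ; => t (inferred offset)
(> 4194289 #x10ffff)                            ; => t
```

Evaluate the forms one at a time with C-x C-e. The first two groups show
that ?ñ by itself is a single integer; the last group shows where the
"\361" display and the number 4194289 come from once a raw byte is
involved.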