From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 21:57:07 +0000
Message-ID: <20091119215707.GI1314@muc.de>
References: <20091118191258.GA2676@muc.de> <20091119082040.GA1720@muc.de> <874ooq8xay.fsf@wanchan.jasonrumney.net> <20091119141852.GC1720@muc.de> <20091119155848.GB1314@muc.de> <837htmqp0e.fsf@gnu.org>
In-Reply-To: <837htmqp0e.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, schwab@linux-m68k.org, monnier@iro.umontreal.ca, jasonr@gnu.org

Hi, Eli!

On Thu, Nov 19, 2009 at 09:43:29PM +0200, Eli Zaretskii wrote:
> > Date: Thu, 19 Nov 2009 15:58:48 +0000
> > From: Alan Mackenzie
> > Cc: emacs-devel@gnu.org, Andreas Schwab, Jason Rumney

> > > No: the string does not contain any characters, only bytes, because
> > > it's a unibyte string.

> > I'm thinking from the lisp viewpoint.  The string is a data structure
> > which contains characters.  I really don't want to have to think
> > about the difference between "chars" and "bytes" when I'm hacking
> > lisp.  If I do, then the abstraction "string" is broken.

> No, it isn't.  Emacs supports unibyte strings and multibyte strings.
> The latter hold characters, but the former hold raw bytes.  See
> "(elisp) Text Representations".

The abstraction is broken.
It is broken because it isn't abstract: its users have to think about
the way characters are represented.  In an effective abstraction, a user
could just write "ñ" or ?ñ and rely on the underlying mechanisms to
work.  Instead of the abstraction "string", we have two grossly inferior
abstractions, "unibyte string" and "multibyte string".

Please suggest to me the correct elisp to "replace the zeroth character
of an existing string with Spanish n-twiddle".  If this is impossible to
write, or if it's grossly larger than the buggy "(aset nl 0 ?ñ)", that
demonstrates the breakage.

> > > The byte 241 can be inserted in multibyte strings and buffers
> > > because it is also a char of code 4194289 (which gets displayed as
> > > \361).

> > Hang on a mo'!  How can the byte 241 "be" a char of code 4194289?  This
> > is some strange usage of the word "be" that I wasn't previously aware
> > of.  ;-)

> That's how Emacs 23 represents raw bytes in multibyte buffers and
> strings.

Why is it necessary to distinguish between 'A' and 65?  Surely they're
both just 0x41?  I'm missing something here.

> > At this point, would you please just agree with me that when I do
> >     (setq nl "\n")
> >     (aset nl 0 ?ñ)
> >     (insert nl)
> > , what should appear on the screen should be "ñ", NOT "\361"?

> No, I don't agree.  If you want to get a human-readable text string,
> don't use aset; use string operations instead.

There aren't any.  `store-substring' will fail if the bits-and-bytes
representation of the new bit differs in size from that of the old bit,
so it is surely no better than `aset'.  At least `aset' tries to convert
to multibyte.

I don't imagine anybody here would hold that the current state of
strings is ideal.  I'm still trying to piece together what the essence
of the problem is.

-- 
Alan Mackenzie (Nuremberg, Germany).
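The unibyte/multibyte distinction and the raw-byte mapping Eli describes (byte 241 appearing as char 4194289) can be observed directly. The Emacs Lisp sketch below assumes Emacs 23 or later; `my-describe-string` is a hypothetical helper introduced only for illustration, and the commented results are the expected values on those versions.

```elisp
;; Sketch (assumes Emacs 23 or later): unibyte vs. multibyte strings.

(defun my-describe-string (s)
  "Return a plist describing how string S is represented.
Hypothetical helper, for illustration only."
  (list :multibyte (multibyte-string-p s)
        :chars (length s)
        :bytes (string-bytes s)))

;; A pure-ASCII literal such as "\n" is read as a unibyte string.
(my-describe-string "\n")        ; => (:multibyte nil :chars 1 :bytes 1)

;; A \u escape forces a multibyte string: one character, but two
;; bytes in Emacs's internal UTF-8-based encoding.
(my-describe-string "\u00f1")    ; => (:multibyte t :chars 1 :bytes 2)

;; The octal escape "\361" gives a unibyte string holding the raw
;; byte 241.  Converting it to multibyte keeps it a raw byte, which
;; becomes the character #x3FFF00 + 241 = 4194289 (displayed as \361).
(aref (string-to-multibyte "\361") 0)               ; => 4194289

;; Turning the byte into the character U+00F1 requires saying how to
;; interpret it, i.e. decoding it with a coding system such as Latin-1.
(aref (decode-coding-string "\361" 'latin-1) 0)     ; => 241
```

The last two forms show the point of contention: the same byte yields either a raw-byte character or n-tilde, depending on whether a coding system is applied.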
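For the challenge posed above (replace the zeroth character of an existing string with n-tilde), here is one possible answer, sketched under the assumption of Emacs 23 or later. `my-string-replace-char` is a hypothetical name. It builds a new string with `concat`, which yields a multibyte result when a non-ASCII character is involved. The in-place variant first converts the string with `string-to-multibyte`, so `aset` operates on a multibyte string and stores U+00F1 as a character. `?\u00f1` is used instead of a literal `?ñ` so the code does not depend on the source file's encoding.

```elisp
;; Sketch (assumes Emacs 23 or later): replace one character of a
;; string with a non-ASCII character such as n-tilde (U+00F1).

(defun my-string-replace-char (string idx char)
  "Return a copy of STRING with the character at IDX replaced by CHAR.
Hypothetical helper, for illustration only.  The pieces are joined
with `concat', so the result is multibyte whenever CHAR is non-ASCII,
and a character such as U+00F1 is kept as that character."
  (concat (substring string 0 idx)
          (string char)
          (substring string (1+ idx))))

;; Non-destructive: returns a new string.
(my-string-replace-char "\n" 0 ?\u00f1)

;; Destructive variant: convert to multibyte first, so that `aset'
;; stores U+00F1 as a character.  For a unibyte argument,
;; `string-to-multibyte' returns a fresh string, so mutating it does
;; not touch the "\n" literal itself.
(let ((nl (string-to-multibyte "\n")))
  (aset nl 0 ?\u00f1)
  nl)
```

Both forms are only slightly longer than `(aset nl 0 ?ñ)`; the extra step is making the string's representation explicit.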