From: Jürgen Hartmann
Newsgroups: gmane.emacs.help
To: "help-gnu-emacs@gnu.org"
Subject: RE: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 7 Apr 2015 19:02:38 +0200
In-Reply-To: <83mw2khvc1.fsf@gnu.org>

Thank you for your comments and your caring advice, Eli Zaretskii:

> May I ask why you need to mess with unibyte strings?  (Your original
> message doesn't seem to present a real problem, just something that
> puzzled you.)

That's right: I was trying to learn something about the basic Lisp data types
and their constants and, as a side effect, trying to understand some of these
"cryptic" read and write sequences that one sees in Emacs from time to time.
In doing so, it was "\xBA" that imperceptibly lured me into the land of the
unibyte strings. And once there, as you warn below, the confusion started.

At first I thought that some hidden decoding based on some charsets or coding
systems was occurring. But now--thanks to Pascal Bourguignon and you--I know
the enemy, or at least its name.

>> It seems that one can use "\u00BA" to achieve this in a string constant;
>> it
>> evaluates to a multibyte string containing the integer 186:
>>
>>    "\u00BA"
>>    --> "º"
>
> Why can't you simply use the º character?
> Why do you need to use its codepoint?

Of course this would be possible. As said above, the focus here lies on the
rather abstract Lisp topic, namely the conversion of a hex code point to a
string.

>> ... For example the constant "\x3FFFBA" is a unibyte string
>> containing the integer 186:
>>
>>    "\x3FFFBA"
>>    --> "\272"
>
> "Contains" is incorrect here.  That constant _represents_ a raw byte
> whose value is 186.  Emacs goes out of its way under the hood to show
> you 186 when the buffer or string contains 0x3FFFBA.

What is the correct parlance here: Is it correct to say that the constant
"\x3FFFBA\x3FFFBB\x3FFFBC" is not a string because it does not contain (?)
any characters; rather it is just a sequence of raw bytes?

>> ...
>> This seems to be an undocumented feature.
>
> It's barely documented in the node "Text Representations" in the ELisp
> manual.

I knew that, and I learned from section "32.3 Converting Text
Representations" that the range [#x3FFF80..#x3FFFFF] of code points is used
for the multibyte representation of raw bytes. My surprise concerning the
behavior of "\x3FFFBA" refers to the fact that it is a unibyte string--from
the sentence "But beware:..." in section "2.3.8.2 Non-ASCII Characters in
Strings" of the ELisp manual I thought it would be different.
(But this was just my faulty interpretation.)

> This is a tricky issue, so you are well advised to stay away from
> unibyte strings as much as you can, for your sanity's sake.

It was not my fault--"\xBA" is the bad guy.

>> ...
>
> Don't try to learn about unibyte/multibyte strings using ASCII
> characters as examples, because ASCII is treated specially for obvious
> reasons.

Okay.

> ...
>
> Yes, and therefore you don't need to consider the multibyte property.
>
>> ...
>
> As they should: you are comparing a character with a raw byte.
>
>> ... definition of the term character according to which a character
>> actually
>> _is_ that integer (cf. lisp manual, section "2.3.3 Character Type").
>
> It is an integer, but note that no one told you anywhere that a raw
> byte is a character.  It's a raw byte.

Ah, that seems to be the key: raw bytes are not characters. (Up to now I
thought that raw bytes were a special set of characters that have different
representations in unibyte and multibyte contexts.) This distinction removes
all the apparent ambiguities.

In spite of my previous promise not to try to learn something about the
unibyte/multibyte topic from ASCII, I shyly dare to ask another question in
this context (don't beat me): Does the A in the unibyte string "A" represent
a character or a raw byte? Or both? In the latter case, is this the special
treatment of ASCII you talked about before?

> I'd still suggest that you try as much as you can not to use unibyte
> strings in your Lisp applications.  That way lies madness.

I will try to follow that advice--and I hope that it is not too late...

So, thank you very much for your enlightening answers.

Juergen