From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: =?iso-8859-1?B?SvxyZ2VuIEhhcnRtYW5u?= <juergen_hartmann_@hotmail.com>
Newsgroups: gmane.emacs.help
Subject: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 7 Apr 2015 15:55:48 +0200
Message-ID: <DUB124-W47A919DD324708DB061CCA8FD0@phx.gbl>
References: <mailman.76.1428326518.904.help-gnu-emacs@gnu.org>,
	<87pp7gu7by.fsf@kuiper.lan.informatimago.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1428414978 880 80.91.229.3 (7 Apr 2015 13:56:18 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 7 Apr 2015 13:56:18 +0000 (UTC)
To: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Tue Apr 07 15:56:10 2015
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>)
	id 1YfTz7-0005yq-Eu
	for geh-help-gnu-emacs@m.gmane.org; Tue, 07 Apr 2015 15:56:09 +0200
Original-Received: from localhost ([::1]:45160 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>)
	id 1YfTz7-0005Ot-19
	for geh-help-gnu-emacs@m.gmane.org; Tue, 07 Apr 2015 09:56:09 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:35572)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <juergen_hartmann_@hotmail.com>) id 1YfTyu-0005Mu-6d
	for help-gnu-emacs@gnu.org; Tue, 07 Apr 2015 09:55:58 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <juergen_hartmann_@hotmail.com>) id 1YfTyo-0002X1-71
	for help-gnu-emacs@gnu.org; Tue, 07 Apr 2015 09:55:56 -0400
Original-Received: from dub004-omc4s15.hotmail.com ([157.55.2.90]:65156)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <juergen_hartmann_@hotmail.com>) id 1YfTyn-0002W9-JE
	for help-gnu-emacs@gnu.org; Tue, 07 Apr 2015 09:55:50 -0400
Original-Received: from DUB124-W4 ([157.55.2.73]) by DUB004-OMC4S15.hotmail.com over
	TLS secured channel with Microsoft SMTPSVC(7.5.7601.22751); 
	Tue, 7 Apr 2015 06:55:48 -0700
X-TMN: [MLvpdHdtwKx6348ORNrRQBPJ8UCX3X8d]
X-Originating-Email: [juergen_hartmann_@hotmail.com]
Importance: Normal
In-Reply-To: <87pp7gu7by.fsf@kuiper.lan.informatimago.com>
X-OriginalArrivalTime: 07 Apr 2015 13:55:48.0263 (UTC)
	FILETIME=[8D132B70:01D0713A]
X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy]
X-Received-From: 157.55.2.90
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:103556
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/103556>

Thank you Pascal Bourguignon for your explanation:

> ...
>
>     (mapcar 'multibyte-string-p (list "\xBA" (concat '(#xBA))))
>     --> (nil t)
>
> string-equal (and therefore string=) don't ignore the multibyte property
> of a string.

So it's all about the multibyte property?

> You can use:
>
>     (mapcar 'string-as-unibyte (list "\xBA" (concat '(#xBA))))
>     --> ("\272" "\302\272")
>
> to see the difference.

I see: "\xBA" stays as it is (a unibyte string containing the raw byte
\272), while the multibyte string (concat '(#xBA)) gets converted into its
UTF-8 unibyte form.

> Now, it's hard to say how to "solve" this problem, basically, you asked
> for it: "\xBA" is not a valid way to write a string containing masculine
> ordinal.

It seems that one can use "\u00BA" to achieve this in a string constant; it
evaluates to a multibyte string containing the integer 186:

   "\u00BA"
   --> "º"

   (multibyte-string-p "\u00BA")
   --> t

   (append "\u00BA" ())
   --> (186)

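As a quick cross-check (only a sketch; the variable names are made up for
illustration), the "\u00BA" constant can be compared with the two strings
from the original question:

```elisp
;; Hypothetical names, for illustration only.
(defvar my-ordinal-u   "\u00BA")          ; multibyte string constant
(defvar my-ordinal-cat (concat '(#xBA)))  ; multibyte, built from the integer
(defvar my-ordinal-x   "\xBA")            ; unibyte string constant

(string= my-ordinal-u my-ordinal-cat)     ; --> t
(string= my-ordinal-u my-ordinal-x)       ; --> nil
```
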
I found it very surprising that it is not only the escape sequences
(characters) in the string constant that determine its multibyte property,
but that it also works the other way round: a \x escape sequence yields
different results depending on the multibyte property of the string constant
it is used in. For example, the constant "\x3FFFBA" is a unibyte string
containing the integer 186:

   "\x3FFFBA"
   --> "\272"

   (multibyte-string-p "\x3FFFBA")
   --> nil

   (append "\x3FFFBA" ())
   --> (186)

The constant "\x3FFFBA Ä", on the other hand, is a multibyte string in which
the sequence \x3FFFBA yields the integer 4194234:

   "\x3FFFBA Ä"
   --> "\272 Ä"

   (multibyte-string-p "\x3FFFBA Ä")
   --> t

   (append "\x3FFFBA Ä" ())
   --> (4194234 32 196)

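The two integers appear to be related by a fixed offset: 4194234 is
#x3FFFBA, that is #x3FFF00 + 186. A minimal sketch of that relation, assuming
the built-in functions unibyte-char-to-multibyte, multibyte-char-to-unibyte
and string-to-multibyte behave as their docstrings describe (the variable
names are made up for illustration):

```elisp
;; Hypothetical names, for illustration only.
(defvar my-byte #xBA)                                      ; the byte 186
(defvar my-raw-char (unibyte-char-to-multibyte my-byte))   ; raw-byte character

(= my-raw-char #x3FFFBA)                              ; --> t (4194234)
(= (multibyte-char-to-unibyte my-raw-char) my-byte)   ; --> t
(append (string-to-multibyte "\xBA") ())              ; --> (4194234)
```
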
This dependence of \x on the multibyte property seems to be an undocumented
feature.

> I guess you could extract back the bytes, and recreate the string
> correctly:
>
>     (map 'string 'identity (map 'list 'identity "\xBA"))
>     --> "º"
>
>     (string= (map 'string 'identity (map 'list 'identity "\xBA"))
>              (concat '(#xBA)))
>     --> t

So reassembling the string by means of map 'string results in a string
containing the same integer as "\xBA", namely 186, but as a multibyte string
with the corresponding interpretation of its contents?

In this respect it is interesting to compare another pair of strings: "A" and
(substring "AÄ" 0 1). Both of them contain the same integer, namely 65, and
are printed as "A". They differ only in their multibyte property: the former
is a unibyte string, the latter a multibyte one:

   "A"
   --> "A"

   (multibyte-string-p "A")
   --> nil

   (append "A" ())
   --> (65)

and

   (substring "AÄ" 0 1)
   --> "A"

   (multibyte-string-p (substring "AÄ" 0 1))
   --> t

   (append (substring "AÄ" 0 1) ())
   --> (65)

The point is that they compare equal in spite of their different multibyte
property:

   (string= "A" (substring "AÄ" 0 1))
   --> t

So, as you said before: "string-equal (and therefore string=) don't ignore
the multibyte property of a string". But it seems that it is not this
property per se that makes the difference, but the differing interpretation
of the string's contents that results from it.

> (On the other hand, one might argue that having both unibyte and
> multibyte strings in a lisp implementation is not a good idea, and
> there's the opportunity for a big refactoring and simplification).
>
> ...

At least it makes it hard to keep the concepts clear.

To illustrate this, consider the strings "A" and (substring "AÄ" 0 1) from
above. They have the same integer content, differ only in their multibyte
property, and compare equal.

If we just change their integer values (in both strings alike) from 65 to
186, we get the pair "\xBA" and (concat '(#xBA)) that we also discussed
before. Here, too, the only difference lies in the multibyte property, while
the integer values are the same. But this time the strings compare unequal.

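One way to make the difference visible is to bring both strings into the
multibyte representation before comparing them. The following is only a
sketch; my-multibyte-string= is a made-up helper, not an Emacs function:

```elisp
;; Hypothetical helper: compare two strings after converting both to
;; multibyte, so that unibyte bytes >= 128 become raw-byte characters.
(defun my-multibyte-string= (a b)
  (string= (string-to-multibyte a) (string-to-multibyte b)))

(my-multibyte-string= "A" (substring "A\u00C4" 0 1))   ; --> t
(my-multibyte-string= "\xBA" (concat '(#xBA)))         ; --> nil
```

After the conversion both strings of the first pair hold the character code
65, while the second pair holds 4194234 and 186 respectively, so that
comparison fails.
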
One might say that this is not surprising, because this time the integers are
interpreted as different characters. But this would contradict the
definition of the term character, according to which a character actually
_is_ that integer (cf. the Emacs Lisp manual, section "2.3.3 Character Type").

Do we reach the limit of the definition of what a character is?

But this gets pretty philosophical. For practical purposes you helped me
a lot, and I think I now have a better feeling for this topic.

Thank you very much.

Jürgen