From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.help
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 07 Apr 2015 20:28:59 +0300
Message-ID: <834morj19g.fsf@gnu.org>
References: <mailman.76.1428326518.904.help-gnu-emacs@gnu.org>
	<87pp7gu7by.fsf@kuiper.lan.informatimago.com>
	<DUB124-W47A919DD324708DB061CCA8FD0@phx.gbl>
	<83mw2khvc1.fsf@gnu.org>
	<DUB124-W45A69F132A4E17352678BBA8FD0@phx.gbl>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
X-Trace: ger.gmane.org 1428427756 32438 80.91.229.3 (7 Apr 2015 17:29:16 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 7 Apr 2015 17:29:16 +0000 (UTC)
To: help-gnu-emacs@gnu.org
In-reply-to: <DUB124-W45A69F132A4E17352678BBA8FD0@phx.gbl>
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
Xref: news.gmane.org gmane.emacs.help:103560
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/103560>

> From: Jürgen Hartmann <juergen_hartmann_@hotmail.com>
> Date: Tue, 7 Apr 2015 19:02:38 +0200
>
> Thank you for your comments and your caring advises, Eli Zaretskii:
>
> > May I ask why you need to mess with unibyte strings?  (Your original
> > message doesn't seem to present a real problem, just something that
> > puzzled you.)
>
> That's right: I was trying to learn something about the basic Lisp data types
> and their constants and, as a side effect, trying to understand some of these
> "cryptic" read and write sequences that one sees in Emacs from time to time.

A worthy goal.

> First I thought that some hidden decoding based on some charsets or coding
> systems occurs.

Actually, some sort of "decoding" does occur, albeit perhaps not in
the use cases you tried -- Emacs will sometimes silently convert
unibyte characters to their locale-dependent multibyte equivalents.

This whole area of unibyte strings is replete with dwim-ish hacks and
kludges, all in an attempt to do what the user expects.  Thus the
confusion and the advice to stay away from that gray area.

> >> ... For example the constant "\x3FFFBA" is an unibyte string
> >> containing the integer 186:
> >>
> >>    "\x3FFFBA"
> >>    --> "\272"
> >
> > "Contains" is incorrect here.  That constant _represents_ a raw byte
> > whose value is 186.  Emacs goes out of its way under the hood to show
> > you 186 when the buffer or string contains 0x3FFFBA.
>
> What is the correct parlance here: Is it correct to say that the constant
> "\x3FFFBA\x3FFFBB\x3FFFBC" is not a string because it does not contain (?)
> any characters; rather it is just a sequence of raw bytes?

It's a "unibyte string", which, by definition, contains raw bytes.

But it is actually better to say that the raw bytes there are \272,
\273 and \274, and not \x3FFFBA, \x3FFFBB and \x3FFFBC.  The latter are
just the representation Emacs uses for the former; Emacs goes out of
its way not to show that internal representation to the user.
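
To make the distinction concrete, here is a minimal sketch.  The
function name my-raw-byte-demo is made up for illustration;
unibyte-string, string-to-multibyte and char-charset are standard
functions.  It looks at the same raw byte twice: as 186 in a unibyte
string, and as #x3FFFBA after an explicit conversion to multibyte.

```elisp
(defun my-raw-byte-demo ()
  "Show one raw byte in its unibyte and multibyte forms (illustrative only)."
  (let* ((uni (unibyte-string 186))          ; the single raw byte \272
         (multi (string-to-multibyte uni)))  ; same byte, multibyte string
    (list (multibyte-string-p uni)           ; nil
          (aref uni 0)                       ; 186
          (multibyte-string-p multi)         ; t
          (= (aref multi 0) #x3FFFBA)        ; t
          (char-charset (aref multi 0)))))   ; eight-bit
```

Evaluating (my-raw-byte-demo) should return (nil 186 t t eight-bit).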

> >> ... definition of the term character according to which a character
> >> actually
> >> _is_ that integer (cf. lisp manual, section "2.3.3 Character Type").
> >
> > It is an integer, but note that no one told you anywhere that a raw
> > byte is a character.  It's a raw byte.
>
> Ah, that seems to be the key: raw bytes are not characters.

Exactly.

> (Up to now I thought that raw bytes are a special set of characters
> that have different representations in unibyte and multibyte
> contexts.)

They _are_ a special "character set", but only in the very technical
sense of "character set" in Emacs.  By their nature and their
properties in Emacs, they are not characters.

> In spite of my previous promise not to try to learn something about the
> unibyte/multibyte topic from ASCII, I shily dare to ask another question in
> this context (don't beat me): Does the A in the unibyte string "A" represent
> a character or a raw byte? Or both? In the latter case, is this that special
> treatment of ASCII you talked about before?

Raw bytes are only those whose value is above 127, so A is a
character.

For subtle technical reasons (or maybe by some historical accident), a
pure-ASCII string is a unibyte string, although it contains
characters, not raw bytes.  So having a unibyte string does not yet
mean you have raw bytes in it.
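
A small sketch of that difference, assuming a hypothetical helper
named my-string-has-raw-bytes-p (it is not part of Emacs).  It
converts the string to multibyte, which maps every byte above 127 to
a character of the eight-bit charset, and then looks for such
characters:

```elisp
(defun my-string-has-raw-bytes-p (string)
  "Return non-nil if STRING contains at least one raw byte.
Hypothetical helper, not part of Emacs."
  (let ((found nil))
    ;; In a unibyte string, every byte above 127 becomes a character
    ;; of the `eight-bit' charset when converted to multibyte.
    (dolist (ch (append (string-to-multibyte string) nil) found)
      (when (eq (char-charset ch) 'eight-bit)
        (setq found t)))))

(multibyte-string-p "A")                            ; nil: unibyte
(my-string-has-raw-bytes-p "A")                     ; nil: a character
(my-string-has-raw-bytes-p (unibyte-string 65 186)) ; t: \272 is a raw byte
```

Both strings are unibyte, but only the second one holds a raw byte.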

> > I'd still suggest that you try as much as you can not to use unibyte
> > strings in your Lisp applications.  That way lies madness.
>
> I will try to follow that advice--and I hope that it is not too late...

Practically the only valid use case for manipulating unibyte strings
of raw bytes is encoding or decoding text by calling
encode-coding-region and its ilk.  E.g., an application that
needs to send base64-encoded text needs first to encode it using
whatever coding-system is appropriate, which produces unibyte text
containing raw bytes, and then call base64-encode-region to produce
the final result.  And similarly for decoding such stuff.  You will
see examples of this in Gnus and Rmail, for example.
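
A rough sketch of that two-step pipeline, using the string variants
(encode-coding-string, base64-encode-string) instead of the region
functions for brevity.  The function names and the choice of utf-8
are assumptions made for the example; a real application would use
whatever coding-system its protocol calls for.

```elisp
(defun my-encode-for-transport (text)
  "Encode TEXT as UTF-8, then as base64 (illustrative only)."
  ;; Step 1 yields a unibyte string of raw bytes; step 2 turns it
  ;; into pure-ASCII base64 text.
  (let ((bytes (encode-coding-string text 'utf-8)))
    (base64-encode-string bytes)))

(defun my-decode-from-transport (b64)
  "Reverse `my-encode-for-transport' (illustrative only)."
  (decode-coding-string (base64-decode-string b64) 'utf-8))

(my-encode-for-transport "J\u00FCrgen")        ; "SsO8cmdlbg=="
(my-decode-from-transport "SsO8cmdlbg==")      ; the original 6 characters
```

The intermediate unibyte string never needs to be looked at or
displayed; it only exists between the two calls.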

> So, thank you very much for your enlightening answers.

You are welcome.