From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 07 Apr 2015 20:28:59 +0300
Message-ID: <834morj19g.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

> From: Jürgen Hartmann
> Date: Tue, 7 Apr 2015 19:02:38 +0200
>
> Thank you for your comments and your caring advice, Eli Zaretskii:
>
> > May I ask why you need to mess with unibyte strings? (Your original
> > message doesn't seem to present a real problem, just something that
> > puzzled you.)
>
> That's right: I was trying to learn something about the basic Lisp data types
> and their constants and, as a side effect, trying to understand some of these
> "cryptic" read and write sequences that one sees in Emacs from time to time.

A worthy goal.

> First I thought that some hidden decoding based on some charsets or coding
> systems occurs.

Actually, some sort of "decoding" does occur, albeit perhaps not in
the use cases you tried -- Emacs will sometimes silently convert
unibyte characters to their locale-dependent multibyte equivalents.
This whole area of unibyte strings is replete with dwim-ish hacks and
kludges, all in an attempt to do what the user expects. Hence the
confusion, and the advice to stay away from that gray area.

> >> ...
> >> For example the constant "\x3FFFBA" is a unibyte string
> >> containing the integer 186:
> >>
> >>    "\x3FFFBA"
> >>    --> "\272"
> >
> > "Contains" is incorrect here. That constant _represents_ a raw byte
> > whose value is 186. Emacs goes out of its way under the hood to show
> > you 186 when the buffer or string contains 0x3FFFBA.
>
> What is the correct parlance here: Is it correct to say that the constant
> "\x3FFFBA\x3FFFBB\x3FFFBC" is not a string because it does not contain (?)
> any characters; rather it is just a sequence of raw bytes?

It's a "unibyte string", which, by definition, contains raw bytes.
But it is actually better to say that the raw bytes there are \272,
\273 and \274, not \x3FFFBA, \x3FFFBB and \x3FFFBC. The latter are
just the representation Emacs uses for the former; Emacs goes out of
its way not to show that internal representation to the user.

> >> ... definition of the term character according to which a character
> >> actually
> >> _is_ that integer (cf. lisp manual, section "2.3.3 Character Type").
> >
> > It is an integer, but note that no one told you anywhere that a raw
> > byte is a character. It's a raw byte.
>
> Ah, that seems to be the key: raw bytes are not characters.

Exactly.

> (Up to now I thought that raw bytes are a special set of characters
> that have different representations in unibyte and multibyte
> contexts.)

They _are_ a special "character set", but only in the very technical
sense of "character set" in Emacs. By their nature and their
properties in Emacs, they are not characters.

> In spite of my previous promise not to try to learn something about the
> unibyte/multibyte topic from ASCII, I shyly dare to ask another question in
> this context (don't beat me): Does the A in the unibyte string "A" represent
> a character or a raw byte? Or both? In the latter case, is this that special
> treatment of ASCII you talked about before?

Raw bytes are only those whose value is above 127, so A is a
character.

For subtle technical reasons (or maybe by some historical accident), a
pure-ASCII string is a unibyte string, although it contains
characters, not raw bytes. So having a unibyte string does not yet
mean you have raw bytes in it.

> > I'd still suggest that you try as much as you can not to use unibyte
> > strings in your Lisp applications. That way lies madness.
>
> I will try to follow that advice--and I hope that it is not too late...

Practically the only valid use case where you need to manipulate
unibyte strings of raw bytes is when you need to encode or decode
strings by calling encode-coding-region and its ilk. E.g., an
application that needs to send base64-encoded text first needs to
encode it using whatever coding-system is appropriate, which produces
unibyte text containing raw bytes, and then call base64-encode-region
to produce the final result. And similarly for decoding such stuff.
You will see examples of this in Gnus and Rmail, for example.

> So, thank you very much for your enlightening answers.

You are welcome.
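
For the archives, here is a minimal sketch of the two points above:
how a raw byte relates to its internal representation, and the
encode-then-base64 round trip. It is only an illustration, not code
from Gnus or Rmail. The my/ names are made up for this example, utf-8
is just an assumed coding system, and the string variants
(encode-coding-string, base64-encode-string) are used instead of the
region functions mentioned above purely to keep the example short.

```elisp
;;; Illustration only.  All my/ names are hypothetical.

(defun my/raw-byte-demo ()
  "Return a list showing raw byte 186 in its different guises."
  (let ((unibyte "\272"))               ; octal escape, read as a unibyte string
    (list
     ;; The string is unibyte ...
     (multibyte-string-p unibyte)
     ;; ... and its only element is the raw byte 186.
     (aref unibyte 0)
     ;; Internally, a raw byte above 127 is represented as #x3FFF00 + byte.
     (unibyte-char-to-multibyte 186)
     ;; Converting the string to multibyte exposes that representation.
     (aref (string-to-multibyte unibyte) 0)
     ;; Raw bytes belong to the technical charset `eight-bit'.
     (char-charset (unibyte-char-to-multibyte 186))
     ;; ASCII is different: ?A is an ordinary character.
     (char-charset ?A))))

(defun my/encode-for-transport (text coding)
  "Encode multibyte TEXT with CODING, then base64-encode the raw bytes."
  ;; `encode-coding-string' returns a unibyte string of raw bytes;
  ;; that is what `base64-encode-string' expects as input.
  (base64-encode-string (encode-coding-string text coding) t))

(defun my/decode-from-transport (b64 coding)
  "Reverse `my/encode-for-transport': base64-decode B64, then decode with CODING."
  ;; `base64-decode-string' returns unibyte text, which is then
  ;; decoded back into ordinary multibyte characters.
  (decode-coding-string (base64-decode-string b64) coding))
```

Evaluating (my/raw-byte-demo) should show the unibyte string holding
186, while the multibyte conversion holds #x3FFFBA. Passing a string
such as (string ?J #xFC ?r ?g ?e ?n) through my/encode-for-transport
and back through my/decode-from-transport with 'utf-8 should return
the original text; the unibyte string of raw bytes exists only in the
middle step.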