all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 07 Apr 2015 20:28:59 +0300	[thread overview]
Message-ID: <834morj19g.fsf@gnu.org> (raw)
In-Reply-To: <DUB124-W45A69F132A4E17352678BBA8FD0@phx.gbl>

> From: Jürgen Hartmann <juergen_hartmann_@hotmail.com>
> Date: Tue, 7 Apr 2015 19:02:38 +0200
> 
> Thank you for your comments and your caring advises, Eli Zaretskii:
> 
> > May I ask why you need to mess with unibyte strings?  (Your original
> > message doesn't seem to present a real problem, just something that
> > puzzled you.)
> 
> That's right: I was trying to learn something about the basic Lisp data types
> and their constants and, as a side effect, trying to understand some of these
> "cryptic" read and write sequences that one sees in Emacs from time to time.

A worthy goal.

> First I thought that some hidden decoding based on some charsets or coding
> systems occurs.

Actually, some sort of "decoding" does occur, albeit perhaps not in
the use cases you tried -- Emacs will sometimes silently convert
unibyte characters to their locale-dependent multibyte equivalents.

This whole area of unibyte strings is replete with dwim-ish hacks and
kludges, all in an attempt to do what the user expects.  Thus the
confusion and the advice to stay away of that gray area.

> >> ... For example the constant "\x3FFFBA" is an unibyte string
> >> containing the integer 186:
> >>
> >>    "\x3FFFBA"
> >>    --> "\272"
> >
> > "Contains" is incorrect here.  That constant _represents_ a raw byte
> > whose value is 186.  Emacs goes out of its way under the hood to show
> > you 186 when the buffer or string contains 0x3FFFBA.
> 
> What is the correct parlance here: Is it correct to say that the constant
> "\x3FFFBA\x3FFFBB\x3FFFBC" is not a string because it does not contain (?)
> any characters; rather it is just a sequence of raw bytes?

It's a "unibyte string", which, by definition, contains raw bytes.

But it is actually better to say that the raw bytes there are \272 and
not \x3FFFBC.  The latter is just the representation Emacs uses for
the former, Emacs goes out of its way not to show that internal
representation to the user.

> >> ... definition of the term character according to which a character
> >> actually
> >> _is_ that integer (cf. lisp manual, section "2.3.3 Character Type").
> >
> > It is an integer, but note that no one told you anywhere that a raw
> > byte is a character.  It's a raw byte.
> 
> Ah, that seems to be the key: raw bytes are not characters.

Exactly.

> (Up to now I thought that raw bytes are a special set of characters
> that have different representations in unibyte and multibyte
> contexts.)

They _are_ a special "character set", but only in the very technical
sense of "character set" in Emacs.  By their nature and their
properties in Emacs, they are not characters.

> In spite of my previous promise not to try to learn something about the
> unibyte/multibyte topic from ASCII, I shily dare to ask another question in
> this context (don't beat me): Does the A in the unibyte string "A" represent
> a character or a raw byte? Or both? In the latter case, is this that special
> treatment of ASCII you talked about before?

Raw bytes are only those whose value is above 127, so A is a
character.

For subtle technical reasons (or maybe by some historical accident), a
pure-ASCII string is a unibyte string, although it contains
characters, not raw bytes.  So having a unibyte string does not yet
mean you have raw bytes in it.

> > I'd still suggest that you try as much as you can not to use unibyte
> > strings in your Lisp applications.  That way lies madness.
> 
> I will try to follow that advice--and I hope that it is not too late...

By far the only valid use case where you need to manipulate unibyte
strings of raw bytes is if you need to encode or decode strings by
calling encode-coding-region and its ilk.  E.g., an application that
needs to send base64-encoded text needs first to encode it using
whatever coding-system is appropriate, which produces unibyte text
containing raw bytes, and then call base64-encode-region to produce
the final result.  And similarly for decoding such stuff.  You will
see examples of this in Gnus and Rmail, for example.

> So, thank you very much for your enlightening answers.

You are welcome.




  reply	other threads:[~2015-04-07 17:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <mailman.76.1428326518.904.help-gnu-emacs@gnu.org>
2015-04-07  0:10 ` Differences between identical strings in Emacs lisp Pascal J. Bourguignon
2015-04-07 13:55   ` [Solved] " Jürgen Hartmann
2015-04-07 14:22     ` Eli Zaretskii
2015-04-07 17:02       ` Jürgen Hartmann
2015-04-07 17:28         ` Eli Zaretskii [this message]
2015-04-08 11:01           ` Jürgen Hartmann
2015-04-08 11:59             ` Eli Zaretskii
2015-04-08 12:37               ` Stefan Monnier
2015-04-09 10:38                 ` Jürgen Hartmann
2015-04-09 12:32                   ` Stefan Monnier
2015-04-09 12:45                   ` Eli Zaretskii
2015-04-10  2:35                     ` Richard Wordingham
2015-04-10  4:46                       ` Stefan Monnier
2015-04-10 12:24                         ` Jürgen Hartmann
2015-04-09 10:36               ` Jürgen Hartmann
2015-04-07 18:24         ` Thien-Thi Nguyen
2015-04-09 10:40           ` Jürgen Hartmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=834morj19g.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=help-gnu-emacs@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.