From: "Jürgen Hartmann" <juergen_hartmann_@hotmail.com>
To: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
Subject: RE: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 7 Apr 2015 19:02:38 +0200
Message-ID: <DUB124-W45A69F132A4E17352678BBA8FD0@phx.gbl>
In-Reply-To: <83mw2khvc1.fsf@gnu.org>
Thank you for your comments and your considerate advice, Eli Zaretskii:
> May I ask why you need to mess with unibyte strings? (Your original
> message doesn't seem to present a real problem, just something that
> puzzled you.)
That's right: I was trying to learn something about the basic Lisp data types
and their constants and, as a side effect, trying to understand some of these
"cryptic" read and write sequences that one sees in Emacs from time to time.
In doing so, it was "\xBA" that imperceptibly lured me into the land of the
unibyte strings. And once there, as you warn below, the confusion started.
First I thought that some hidden decoding based on some charsets or coding
systems occurs. But now--thanks to Pascal Bourguignon and you--I know the
enemy, or at least its name.
>> It seems that one can use "\u00BA" to achieve this in a string constant; it
>> evaluates to a multibyte string containing the integer 186:
>>
>> "\u00BA"
>> --> "º"
>
> Why can't you simply use the º character? why do you need to use its
> codepoint?
Of course this would be possible. As said above, the focus here lies on the
rather abstract Lisp topic, namely the conversion of a hex code point to a
string.
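
A minimal sketch of that conversion, assuming only the built-in functions
`string' and `multibyte-string-p'; the helper name `my-codepoint-to-string'
is made up for illustration:

```elisp
;; Hypothetical helper (the name is an assumption, not from the manual).
(defun my-codepoint-to-string (code)
  "Return a one-character string for the character whose code is CODE."
  (string code))

;; Built from the code point #xBA (186); should be the same as "\u00BA".
(my-codepoint-to-string #xBA)

;; As far as I understand, both of these are multibyte strings ...
(multibyte-string-p (my-codepoint-to-string #xBA))
(multibyte-string-p "\u00BA")

;; ... whereas the plain hex escape gives a unibyte string.
(multibyte-string-p "\xBA")
```

Going through `string' avoids the hex escape in a string constant
altogether, which is where the unibyte surprise comes from.
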
>> ... For example the constant "\x3FFFBA" is a unibyte string
>> containing the integer 186:
>>
>> "\x3FFFBA"
>> --> "\272"
>
> "Contains" is incorrect here. That constant _represents_ a raw byte
> whose value is 186. Emacs goes out of its way under the hood to show
> you 186 when the buffer or string contains 0x3FFFBA.
What is the correct parlance here: Is it correct to say that the constant
"\x3FFFBA\x3FFFBB\x3FFFBC" is not a string because it does not contain (?)
any characters; rather it is just a sequence of raw bytes?
>> ...
>> This seems to be an undocumented feature.
>
> It's barely documented in the node "Text Representations" in the ELisp
> manual.
I knew that, and I learned from section "32.3 Converting Text Representations"
that the range [#x3FFF80..#x3FFFFF] of code points is used for the multibyte
representation of raw bytes. My surprise concerning the behavior of
"\x3FFFBA" refers to the fact that it is a unibyte string; from the sentence
"But beware:..." in section "2.3.8.2 Non-ASCII Characters in Strings" of the
ELisp manual I thought it would be different. (But this was just my faulty
interpretation.)
> This is a tricky issue, so you are well advised to stay away from
> unibyte strings as much as you can, for your sanity's sake.
It was not my fault--"\xBA" is the bad guy.
>> ...
>
> Don't try to learn about unibyte/multibyte strings using ASCII
> characters as examples, because ASCII is treated specially for obvious
> reasons.
Okay.
> ...
>
> Yes, and therefore you don't need to consider the multibyte property.
>
>> ...
>
> As they should: you are comparing a character with a raw byte.
>
>> ... definition of the term character according to which a character
>> actually _is_ that integer (cf. lisp manual, section "2.3.3 Character Type").
>
> It is an integer, but note that no one told you anywhere that a raw
> byte is a character. It's a raw byte.
Ah, that seems to be the key: raw bytes are not characters. (Up to now I
thought that raw bytes are a special set of characters that have different
representations in unibyte and multibyte contexts.) This distinction removes
all the apparent ambiguities.
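
The distinction can be made visible with a small sketch. It assumes that
`string-to-multibyte' behaves as described in section 32.3 of the ELisp
manual; the helper name `my-first-code' is made up for illustration:

```elisp
;; Hypothetical helper (the name is an assumption); uses only `aref'.
(defun my-first-code (str)
  "Return the integer code of the first element of STR."
  (aref str 0))

;; The character º in a multibyte string: code 186.
(my-first-code "\u00BA")

;; The raw byte 186 in a unibyte string: also 186.
(my-first-code "\xBA")

;; The same raw byte after conversion to a multibyte string: #x3FFFBA,
;; that is, a code in the range [#x3FFF80..#x3FFFFF].
(my-first-code (string-to-multibyte "\xBA"))
```

So the character and the raw byte share the integer 186 only as long as the
raw byte lives in a unibyte string; in a multibyte string they are kept
apart.
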
In spite of my previous promise not to try to learn something about the
unibyte/multibyte topic from ASCII, I shyly dare to ask another question in
this context (don't beat me): Does the A in the unibyte string "A" represent
a character or a raw byte? Or both? In the latter case, is this that special
treatment of ASCII you talked about before?
> I'd still suggest that you try as much as you can not to use unibyte
> strings in your Lisp applications. That way lies madness.
I will try to follow that advice--and I hope that it is not too late...
So, thank you very much for your enlightening answers.
Juergen