From: "Jürgen Hartmann" <juergen_hartmann_@hotmail.com>
To: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
Subject: RE: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Wed, 8 Apr 2015 13:01:16 +0200
Message-ID: <DUB124-W33489FF54D2A924A11EBBAA8FC0@phx.gbl>
In-Reply-To: <834morj19g.fsf@gnu.org>

Thank you, Eli Zaretskii, for your explanations:

>> [About mapping between unibyte and multibyte strings]
>>
>> First I thought that some hidden decoding based on some charsets or coding
>> systems occurs.
>
> Actually, some sort of "decoding" does occur, albeit perhaps not in
> the use cases you tried -- Emacs will sometimes silently convert
> unibyte characters to their locale-dependent multibyte equivalents.

On which occasions is such a conversion done? Does this have anything to do
with the charset that is individually defined in language-info-alist for
nearly every language environment?

> This whole area of unibyte strings is replete with dwim-ish hacks and
> kludges, all in an attempt to do what the user expects.  Thus the
> confusion and the advice to stay away from that gray area.

Sounds like the well-known design conflict between "behaving smart" and
"being straight".

>> [About "\x3FFFBA\x3FFFBB\x3FFFBC"]
>
> It's a "unibyte string", which, by definition, contains raw bytes.
>
> But it is actually better to say that the raw bytes there are \272 and
> not \x3FFFBA.  The latter is just the representation Emacs uses for
> the former; Emacs goes out of its way not to show that internal
> representation to the user.
>
>> ...
>>
>> Ah, that seems to be the key: raw bytes are not characters.
>
> Exactly.

Great! Lesson learned.

>> [About raw bytes]
>
> They _are_ a special "character set", but only in the very technical
> sense of "character set" in Emacs.  By their nature and their
> properties in Emacs, they are not characters.
>
>> [About characters and raw bytes in unibyte context]
>
> Raw bytes are only those whose value is above 127, so A is a
> character.
>
> For subtle technical reasons (or maybe by some historical accident), a
> pure-ASCII string is a unibyte string, although it contains
> characters, not raw bytes.  So having a unibyte string does not yet
> mean you have raw bytes in it.
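
The difference between "unibyte string" and "string of raw bytes" can be
checked directly. The forms below are a minimal sketch, assuming a reasonably
recent Emacs and evaluation one form at a time (for example in M-x ielm). The
sample strings are illustrative choices, not taken from the thread.

```elisp
;; A pure-ASCII literal is a unibyte string, yet it holds a character.
(multibyte-string-p "A")        ; => nil
(aref "A" 0)                    ; => 65, the character ?A

;; A literal whose only non-ASCII content is an octal escape below 256
;; is also unibyte, but this one holds a raw byte.
(multibyte-string-p "\272")     ; => nil
(aref "\272" 0)                 ; => 186 (#xBA), a raw byte

;; A Unicode escape forces a multibyte string holding a real character.
(multibyte-string-p "\u00e4")   ; => t
```

So `multibyte-string-p' alone does not tell whether a string contains raw
bytes; the values of its elements matter as well.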

It seems that all my related observations that puzzled me before can be well
explained by the strict distinction between characters and raw bytes and the
mapping between the latter's integer representations in the range
[0x80..0xFF] in a unibyte context and in the range [0x3FFF80..0x3FFFFF] in a
multibyte context.
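
The mapping between the two ranges can be observed with the standard
conversion functions. This is only a sketch under the assumption of a
reasonably recent Emacs; the byte #xBA (octal \272) is taken from the example
string discussed earlier in the thread.

```elisp
;; Unibyte context: the raw byte is simply the integer #xBA.
(aref "\272" 0)                          ; => 186 (#xBA)

;; Multibyte context: the same raw byte is represented as #x3FFFBA.
(aref (string-to-multibyte "\272") 0)    ; => 4194234 (#x3FFFBA)

;; Converting single values between the two representations.
(unibyte-char-to-multibyte #xBA)         ; => 4194234 (#x3FFFBA)
(multibyte-char-to-unibyte #x3FFFBA)     ; => 186 (#xBA)

;; ASCII values are characters in both contexts and are left unchanged.
(unibyte-char-to-multibyte ?A)           ; => 65

;; The offset between the two ranges is constant.
(- #x3FFF80 #x80)                        ; => 4194048 (#x3FFF00)
```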

> By far the only valid use case where you need to manipulate unibyte
> strings of raw bytes is if you need to encode or decode strings by
> calling encode-coding-region and its ilk.  E.g., an application that
> needs to send base64-encoded text needs first to encode it using
> whatever coding-system is appropriate, which produces unibyte text
> containing raw bytes, and then call base64-encode-region to produce
> the final result.  And similarly for decoding such stuff.  You will
> see examples of this in Gnus and Rmail, for example.
>
>> So, thank you very much for your enlightening answers.
>
> You are welcome.
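
The encode-then-base64 sequence described above can be sketched as follows.
The two helper names are hypothetical, utf-8 is an assumed coding system, and
the string variants `encode-coding-string' and `base64-encode-string' stand in
for the region functions named in the quoted text.

```elisp
;; Hypothetical helper: encode TEXT to raw bytes first, then to base64.
(defun my-base64-encode-text (text)
  "Return the base64 form of TEXT encoded as UTF-8 (assumed coding system)."
  ;; `encode-coding-string' yields a unibyte string of raw bytes;
  ;; the second argument t suppresses line breaks in the base64 output.
  (base64-encode-string (encode-coding-string text 'utf-8) t))

;; Hypothetical helper: the reverse direction.
(defun my-base64-decode-text (b64)
  "Return the text whose UTF-8 bytes are base64-encoded in B64."
  (decode-coding-string (base64-decode-string b64) 'utf-8))

;; The intermediate result is unibyte text containing raw bytes.
(multibyte-string-p (encode-coding-string "\u00e4" 'utf-8))   ; => nil

(my-base64-encode-text "\u00e4")                              ; => "w6Q="
(my-base64-decode-text "w6Q=")                                ; => "ä"
```

Passing the multibyte string straight to `base64-encode-string' without the
encoding step signals an error, which is why the intermediate unibyte form is
needed.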

Thank you very much.

Juergen


