* Differences between identical strings in Emacs lisp
From: Jürgen Hartmann @ 2015-04-06 13:21 UTC (permalink / raw)
To: help-gnu-emacs@gnu.org
What is the difference between the string represented by the constant "\xBA"
and the result of (concat '(#xBA))?
Background:
When I start Emacs 24.4 on Linux with the -Q option and the POSIX locale, to
get clean conditions, i.e.
LC_ALL=C emacs -Q
the evaluation of
"\xBA"
in *scratch* (lisp interaction mode) yields a result
printed as "\272".
In contrast to that, the result of
(concat '(#xBA))
is printed as "º", i.e. the "masculine ordinal indicator" glyph in double
quotes. The glyph's character is described by the command describe-char as
follows:
-----------------------------------------------------------------------------
position: 235 of 341 (69%), column: 1
character: º (displayed as º) (codepoint 186, #o272, #xba)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xBA
script: latin
syntax: _ which means: symbol
category: .:Base, L:Left-to-right (strong), h:Korean, j:Japanese, l:Latin
to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
buffer code: #xC2 #xBA
file code: #xC2 #xBA (encoded by coding system nil)
display: by this font (glyph code)
xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-15-*-*-*-m-0-iso10646-1 (#x7C)
Character code properties: customize what to show
name: MASCULINE ORDINAL INDICATOR
general-category: Lo (Letter, Other)
decomposition: (super 111) (super 'o')
There are text properties here:
face font-lock-string-face
fontified t
-----------------------------------------------------------------------------
Obviously the result of (concat '(#xBA)) gets interpreted (decoded) on the
basis of the Unicode charset, while "\xBA" is treated as a raw byte.
Comparing these strings directly also shows that they are different:
(string= "\xBA" (concat '(#xBA)))
evaluates to nil.
On the other hand, the expressions
(append "\xBA" ())
and
(append (concat '(#xBA)) ())
both evaluate to (186), indicating that the strings contain the same
character(s). So they are identical.
How to resolve this contradiction?
Since I could not find a clue in the manuals or via Google, any explanation,
idea, hint, or link is greatly appreciated.
Juergen
* Re: Differences between identical strings in Emacs lisp
From: Stefan Monnier @ 2015-04-07 21:10 UTC (permalink / raw)
To: help-gnu-emacs
> both evaluate to (186), indicating that the strings contain the same
> character(s). So they are identical.
No: the "\xBA" string does not contain any character, it only contains
bytes (we call such "string of bytes" a "unibyte string" and the usual
"string of characters" is called a "multibyte string").
And yes, the (integer) codes of the bytes of "\xBA" happen to be
identical to the (integer) codes of the characters of (concat '(#xBA)).
So (equal (append "\xBA" nil) (append "º" nil)) is non-nil.
Note that the same applies to: (equal (append "\xBA" nil) (append [#xBA] nil))
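A minimal sketch of this distinction, assuming a reasonably recent Emacs. The helper `my-string-summary` is hypothetical (not an Emacs function); it only reports the multibyte flag, the character count, the byte count (via `string-bytes`) and the integer codes that `append` extracts. The two strings agree on length and codes but not on their stored bytes, which is consistent with `string=` returning nil for them.

```elisp
;; Hypothetical helper, not part of Emacs: summarize how a string is
;; stored and which integer codes `append' extracts from it.
(defun my-string-summary (s)
  "Return a plist describing the storage of string S."
  (list :multibyte (multibyte-string-p s)
        :chars (length s)
        :bytes (string-bytes s)
        :codes (append s nil)))

;; Unibyte "string of bytes": one byte whose value is 186.
(my-string-summary "\xBA")
;; => (:multibyte nil :chars 1 :bytes 1 :codes (186))

;; Multibyte "string of characters": one character, U+00BA, stored
;; internally as the two bytes #xC2 #xBA.
(my-string-summary (concat '(#xBA)))
;; => (:multibyte t :chars 1 :bytes 2 :codes (186))
```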
Stefan
* RE: Differences between identical strings in Emacs lisp
From: Jürgen Hartmann @ 2015-04-08 11:02 UTC (permalink / raw)
To: help-gnu-emacs@gnu.org
@Stefan Monnier: Thank you for your clarification:
>> both evaluate to (186), indicating that the strings contain the same
>> character(s). So they are identical.
>
> No: the "\xBA" string does not contain any character, it only contains
> bytes (we call such "string of bytes" a "unibyte string" and the usual
> "string of characters" is called a "multibyte string").
That's very important indeed, according to the golden rule:
"Clarity of concept requires clarity of terms."
> And yes, the (integer) codes of the bytes of "\xBA" happen to be
> identical to the (integer) codes of the characters of (concat '(#xBA)).
>
> So (equal (append "\xBA" nil) (append "º" nil)) is non-nil.
> Note that the same applies to: (equal (append "\xBA" nil) (append [#xBA] nil))
I think my problem was that I had missed the type (unibyte vs. multibyte) of
my strings, the fact that characters and raw bytes are different things, and
that the (integer) codes of raw bytes get converted between unibyte and
multibyte contexts. Because of the latter we have equality, for example,
between "\xBA" and (concat '(#x3FFFBA)):
(string= "\xBA" (concat '(#x3FFFBA)))
--> t
Again, thank you for your input.
Juergen
* RE: Differences between identical strings in Emacs lisp
From: Jürgen Hartmann @ 2015-04-08 13:44 UTC (permalink / raw)
To: help-gnu-emacs@gnu.org
Argh! Writing about it, I made the same mistake again.
Please forget the wrong example in my previous post:
> Because of the latter we have equality for example
> between "\xBA" and (concat '(#x3FFFBA)):
>
> (string= "\xBA" (concat '(#x3FFFBA)))
> --> t
Of course "\xBA" and "\x3FFFBA" represent the same raw byte \272, and both are
in a unibyte string. Therefore they are trivially equal.
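The general point about conversion between contexts still holds, even though this example did not show it. A sketch, assuming current Emacs behaviour: in a multibyte context the raw byte #xBA is represented by the eight-bit character #x3FFFBA, which is distinct from U+00BA, while `concat` returns a unibyte string when every element is ASCII or a raw-byte character. That is why the retracted example compared two unibyte strings.

```elisp
;; A raw byte moved into a multibyte context becomes an "eight-bit"
;; character: #xBA maps to #x3FFFBA (4194234), not to U+00BA.
(unibyte-char-to-multibyte #xBA)            ; => 4194234
(multibyte-char-to-unibyte #x3FFFBA)        ; => 186

;; The same mapping applies to whole strings.
(append (string-to-multibyte "\xBA") nil)   ; => (4194234)

;; `concat' of only ASCII and raw-byte characters yields a unibyte
;; string, so this is the same one-byte string as "\xBA".
(multibyte-string-p (concat '(#x3FFFBA)))   ; => nil
```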
And what makes it even more embarrassing: I had already gotten this right in
an earlier post.
Sorry.
Juergen
* Re: Differences between identical strings in Emacs lisp
From: Pascal J. Bourguignon @ 2015-04-07 0:10 UTC (permalink / raw)
To: help-gnu-emacs
Jürgen Hartmann <juergen_hartmann_@hotmail.com> writes:
> What is the difference between the string represented by the constant "\xBA"
> and the result of (concat '(#xBA))?
(mapcar 'multibyte-string-p (list "\xBA" (concat '(#xBA))))
--> (nil t)
string-equal (and therefore string=) does not ignore the multibyte property
of a string.
You can use:
(mapcar 'string-as-unibyte (list "\xBA" (concat '(#xBA))))
--> ("\272" "\302\272")
to see the difference.
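One nuance, as a sketch under the assumption that `string=` compares the stored representations: for pure-ASCII text the unibyte and multibyte representations are byte-for-byte identical, so the multibyte flag alone does not make two strings unequal. It matters for "\xBA" because the stored bytes differ ("\272" versus "\302\272").

```elisp
;; ASCII text is stored identically in unibyte and multibyte strings,
;; so `string=' treats the two as equal despite the different flags.
(let ((uni "abc")
      (multi (string-to-multibyte "abc")))
  (list (multibyte-string-p uni)     ; => nil
        (multibyte-string-p multi)   ; => t
        (string= uni multi)))        ; => t
```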
Now, it's hard to say how to "solve" this problem. Basically, you asked
for it: "\xBA" is not a valid way to write a string containing the masculine
ordinal indicator.
I guess you could extract the bytes and recreate the string correctly:
(require 'cl-lib)
(cl-map 'string 'identity (cl-map 'list 'identity "\xBA"))
--> "º"
(string= (cl-map 'string 'identity (cl-map 'list 'identity "\xBA"))
         (concat '(#xBA)))
--> t
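For comparison, a sketch of two ways to get the multibyte "º" back using only built-in functions. The first reinterprets each byte value as a character code, like the round trip above, and only gives the intended text for bytes meant as ISO 8859-1. The second is a different technique from the one in this post: it decodes the bytes with an explicit coding system (`latin-1` here, or `utf-8` for the two-byte UTF-8 form).

```elisp
;; 1. Treat each byte value as a character code (Latin-1 semantics).
(concat (append "\xBA" nil))                ; => "º"

;; 2. Decode the bytes with an explicit coding system.
(decode-coding-string "\xBA" 'latin-1)      ; => "º"
(decode-coding-string "\xC2\xBA" 'utf-8)    ; => "º"
```

Decoding states the intended interpretation of the bytes explicitly, so it keeps working for bytes that are not Latin-1.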
(On the other hand, one might argue that having both unibyte and
multibyte strings in a lisp implementation is not a good idea, and
there's the opportunity for a big refactoring and simplification).
--
__Pascal Bourguignon__ http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk