all messages for Emacs-related lists mirrored at yhetil.org
* Differences between identical strings in Emacs lisp
@ 2015-04-06 13:21 Jürgen Hartmann
  2015-04-07 21:10 ` Stefan Monnier
From: Jürgen Hartmann @ 2015-04-06 13:21 UTC
  To: help-gnu-emacs@gnu.org

What is the difference between the string represented by the constant "\xBA"
and the result of (concat '(#xBA))?

Background:

When I start Emacs 24.4 in Linux with the -Q option and the POSIX locale to
have clean conditions, i.e.

   LC_ALL=C emacs -Q

the evaluation of

   "\xBA"

in *scratch* (lisp interaction mode) yields a result
printed as "\272".

In contrast to that, the result of

   (concat '(#xBA))

is printed as "º", i.e. the "masculine ordinal indicator" glyph in double
quotes. The glyph's character is described by the command describe-char as
follows:

-----------------------------------------------------------------------------
             position: 235 of 341 (69%), column: 1
            character: º (displayed as º) (codepoint 186, #o272, #xba)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xBA
               script: latin
               syntax: _     which means: symbol
             category: .:Base, L:Left-to-right (strong), h:Korean, j:Japanese, l:Latin
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xC2 #xBA
            file code: #xC2 #xBA (encoded by coding system nil)
              display: by this font (glyph code)
    xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-15-*-*-*-m-0-iso10646-1 (#x7C)

Character code properties: customize what to show
  name: MASCULINE ORDINAL INDICATOR
  general-category: Lo (Letter, Other)
  decomposition: (super 111) (super 'o')

There are text properties here:
  face                 font-lock-string-face
  fontified            t

[back]
-----------------------------------------------------------------------------

Obviously the result of (concat '(#xBA)) gets interpreted (decoded) on the
basis of the unicode charset, while "\xBA" is treated as a raw byte.

Comparing these strings directly also shows that they are different:

   (string= "\xBA" (concat '(#xBA)))

evaluates to nil.

On the other hand, the expressions

   (append "\xBA" ())

and

   (append (concat '(#xBA)) ())

both evaluate to (186), indicating that the strings contain the same
character(s). So they are identical.

How to resolve this contradiction?

Since I could not find a clue in the manuals or via google, any explanation,
idea, hint, link is greatly appreciated.

Juergen

 		 	   		  



* Re: Differences between identical strings in Emacs lisp
       [not found] <mailman.76.1428326518.904.help-gnu-emacs@gnu.org>
@ 2015-04-07  0:10 ` Pascal J. Bourguignon
From: Pascal J. Bourguignon @ 2015-04-07  0:10 UTC
  To: help-gnu-emacs

Jürgen Hartmann <juergen_hartmann_@hotmail.com> writes:

> What is the difference between the string represented by the constant "\xBA"
> and the result of (concat '(#xBA))?

    (mapcar 'multibyte-string-p (list "\xBA" (concat '(#xBA))))
    --> (nil t)

string-equal (and therefore string=) does not ignore the multibyte property
of a string.


You can use:

    (mapcar 'string-as-unibyte  (list "\xBA" (concat '(#xBA))))
    --> ("\272" "\302\272")

to see the difference.
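
A small sketch along the same lines, assuming only the built-ins length and
string-bytes: both strings have one element, but their internal storage
differs.

    ;; Unibyte literal vs. multibyte string built from a character code.
    (let ((raw "\xBA")
          (chr (concat '(#xBA))))
      (list (length raw) (string-bytes raw)    ; one element, one byte
            (length chr) (string-bytes chr)))  ; one element, two bytes
    --> (1 1 1 2)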



Now, it's hard to say how to "solve" this problem. Basically, you asked
for it: "\xBA" is not a valid way to write a string containing the
masculine ordinal indicator.

I guess you could extract back the bytes, and recreate the string
correctly:

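    ;; cl-map comes from cl-lib (plain `map' needs the older cl package).
    (require 'cl-lib)

    (cl-map 'string 'identity (cl-map 'list 'identity "\xBA"))
    --> "º"

    (string= (cl-map 'string 'identity (cl-map 'list 'identity "\xBA"))
             (concat '(#xBA)))
    --> t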



(On the other hand, one might argue that having both unibyte and
multibyte strings in a lisp implementation is not a good idea, and
there's the opportunity for a big refactoring and simplification).

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk



* Re: Differences between identical strings in Emacs lisp
  2015-04-06 13:21 Differences between identical strings in Emacs lisp Jürgen Hartmann
@ 2015-04-07 21:10 ` Stefan Monnier
  2015-04-08 11:02   ` Jürgen Hartmann
From: Stefan Monnier @ 2015-04-07 21:10 UTC
  To: help-gnu-emacs

> both evaluate to (186), indicating that the strings contain the same
> character(s).  So they are identical.

No: the "\xBA" string does not contain any character, it only contains
bytes (we call such "string of bytes" a "unibyte string" and the usual
"string of characters" is called a "multibyte string").

And yes, the (integer) codes of the bytes of "\xBA" happen to be
identical to the (integer) codes of the characters of (concat '(#xBA)).

So (equal (append "\xBA" nil) (append "º" nil)) is non-nil.
Note that the same applies to: (equal (append "\xBA" nil) (append [#xBA] nil))
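
A hedged sketch of the usual way to go between the two kinds of string,
assuming the byte is meant as Latin-1 (that choice of coding system is an
assumption, not something stated above): decode bytes to get characters,
encode characters to get bytes.

    ;; Bytes -> characters: interpret the byte #xBA as Latin-1.
    (string= (decode-coding-string "\xBA" 'latin-1)
             (concat '(#xBA)))
    --> t

    ;; Characters -> bytes: UTF-8 encodes U+00BA as the two bytes C2 BA.
    (equal (encode-coding-string (concat '(#xBA)) 'utf-8)
           "\xC2\xBA")
    --> t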


        Stefan





* RE: Differences between identical strings in Emacs lisp
  2015-04-07 21:10 ` Stefan Monnier
@ 2015-04-08 11:02   ` Jürgen Hartmann
  2015-04-08 13:44     ` Jürgen Hartmann
From: Jürgen Hartmann @ 2015-04-08 11:02 UTC
  To: help-gnu-emacs@gnu.org

@Stefan Monnier: Thank you for your clarification:

>> both evaluate to (186), indicating that the strings contain the same
>> character(s).  So they are identical.
>
> No: the "\xBA" string does not contain any character, it only contains
> bytes (we call such "string of bytes" a "unibyte string" and the usual
> "string of characters" is called a "multibyte string").

That's very important, indeed--according to the golden rule:
"Clarity of concept requires clarity of terms."

> And yes, the (integer) codes of the bytes of "\xBA" happen to be
> identical to the (integer) codes of the characters of (concat '(#xBA)).
>
> So (equal (append "\xBA" nil) (append "º" nil)) is non-nil.
> Note that the same applies to: (equal (append "\xBA" nil) (append [#xBA] nil))

I think my problem was that I have missed the type--unibyte vs. multibyte--of
my strings, the fact that characters and raw bytes are different things, and
that the (integer) codes of raw bytes get converted between unibyte and
multibyte contexts. Because of the latter we have equality for example
between "\xBA" and (concat '(#x3FFFBA)):

   (string= "\xBA" (concat '(#x3FFFBA)))
   --> t

Again, thank you for your input.

Juergen

 		 	   		  



* RE: Differences between identical strings in Emacs lisp
  2015-04-08 11:02   ` Jürgen Hartmann
@ 2015-04-08 13:44     ` Jürgen Hartmann
From: Jürgen Hartmann @ 2015-04-08 13:44 UTC
  To: help-gnu-emacs@gnu.org

Argh! Writing about it, I made the same mistake again.

Please forget the wrong example in my previous post:

> Because of the latter we have equality for example
> between "\xBA" and (concat '(#x3FFFBA)):
>
>    (string= "\xBA" (concat '(#x3FFFBA)))
>    --> t

Of course "\xBA" and "\x3FFFBA" represent the same raw byte \272, and both
are in a unibyte string. Therefore they are trivially equal.
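
One way to check this, as a sketch (no result is claimed here; evaluate it in
*scratch* to see what your Emacs version returns):

   ;; Is the result of concat unibyte, and which codes does it contain?
   (let ((s (concat '(#x3FFFBA))))
     (list (multibyte-string-p s) (append s nil)))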

And what makes it even more embarrassing: I already wrote it right in another
post before.

Sorry.

Juergen

 		 	   		  

