bug#4848: 23.1.50; \u and \x in string


* bug#4848: 23.1.50; \u and \x in string
@ 2009-11-02  5:31 Richard Stallman
  2009-11-02  7:17 ` Stefan Monnier
  2016-06-14  2:45 ` Noam Postavsky
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Stallman @ 2009-11-02  5:31 UTC (permalink / raw)
  To: emacs-pretest-bug

"\ue1" gives the error "Non-hex digit used for Unicode escape".
Why doesn't it work to give the Unicode character á?

Note that \xe1 does not work for this any more.
It gives a different character, which displays as \341 and
is described as follows by C-x =.

  Char: \341 (4194273, #o17777741, #x3fffe1, raw-byte) point=442 of 2980 (15%) column=0

That too is confusing, and certainly not documented clearly where \x
is explained.  Is there any way to specify unicode e1 with \x?


In GNU Emacs 23.1.50.4 (mipsel-unknown-linux-gnu, GTK+ Version 2.12.12)
 of 2009-08-11 on theobromine2
configured using `configure  'CFLAGS=-O0 -g -Wno-pointer-sign' 'mipsel-unknown-linux-gnu' 'build_alias=mipsel-unknown-linux-gnu' 'host_alias=mipsel-unknown-linux-gnu' 'target_alias=mipsel-unknown-linux-gnu''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: RMAIL Edit

Minor modes in effect:
  shell-dirtrack-mode: t
  diff-auto-refine-mode: t
  gpm-mouse-mode: t
  display-battery-mode: t
  tooltip-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t
  abbrev-mode: t

Recent input:
b R TAB RET ESC < C-u C-n C-u C-u C-n C-u C-n C-n C-n 
C-n C-f 4 b o u t C-_ C-x b o u t - 2 2 RET C-a C-p 
C-x 4 b R TAB RET C-u ESC x c o m p a r e RET C-x o 
C-x o C-x b RET C-b C-b C-b C-b | ESC C-x C-x C-s C-x 
b RET C-x o C-b C-b C-x ESC ESC ESC p ESC p RET C-x 
o C-x o C-x o C-x C-g C-x 4 b RET C-a ESC f C-f C-@ 
ESC C-f ESC w ESC : C-y RET C-x o ESC : ( l o o k i 
n g - a t SPC C-y ) RET C-x o C-e ESC b ESC d 2 4 0 
ESC C-x C-x o ESC : ESC p RET C-x = C-x o o C-_ C-x 
o ESC : ESC p C-e ESC DEL ESC DEL ESC DEL " \ 2 4 0 
DEL DEL DEL x a 0 " ) RET C-u C-x = C-\ a ' C-g e C-x 
= C-f a ' C-b C-x = ESC : ESC p C-e C-b C-b ESC DEL 
DEL C-\ a ' C-e RET C-x = ESC : ESC p C-e C-b C-b DEL 
\ 3 4 1 RET C-x = ESC : ESC p C-e C-b C-b DEL DEL DEL 
x e 1 RET C-x = ESC : ESC p C-e C-b C-b C-b C-b DEL 
u C-e RET ESC : ESC p C-e C-b C-b C-b C-b ESC u C-e 
RET ESC : ESC p C-e C-b C-b C-b C-b 0 0 C-e RET ESC 
x r e p o r t SPC e m a c s SPC b u g RET

Recent messages:
Char: =e1 (225, #o341, #xe1) point=1382 of 28873 (5%) column=57
t
Char: =e1 (225, #o341, #xe1) point=1382 of 28873 (5%) column=57
nil
Char: =e1 (225, #o341, #xe1) point=1382 of 28873 (5%) column=57
nil
Char: =e1 (225, #o341, #xe1) point=1382 of 28873 (5%) column=57
let: Non-hex digit used for Unicode escape [2 times]
t
Source file `/home/rms/emacs-cvs/lisp/mail/emacsbug.el' newer than byte-compiled file

Load-path shadows:
None found.






* bug#4848: 23.1.50; \u and \x in string
  2009-11-02  5:31 bug#4848: 23.1.50; \u and \x in string Richard Stallman
@ 2009-11-02  7:17 ` Stefan Monnier
  2009-11-02  7:33   ` Jason Rumney
  2009-11-03 13:39   ` Richard Stallman
  2016-06-14  2:45 ` Noam Postavsky
  1 sibling, 2 replies; 10+ messages in thread
From: Stefan Monnier @ 2009-11-02  7:17 UTC (permalink / raw)
  To: rms; +Cc: emacs-pretest-bug, 4848

> "\ue1" gives the error "Non-hex digit used for Unicode escape".
> Why doesn't it work to give the Unicode character á?

I think you mean \u00e1

> Note that \xe1 does not work for this any more.

Indeed, this refers to the byte 225 rather than to the char 225.

> That too is confusing, and certainly not documented clearly where \x
> is explained.  Is there any way to specify unicode e1 with \x?

\x00e1 also works like \u00e1.


        Stefan
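
As a minimal sketch of the answer above, assuming Emacs 23 or later, the following compares spellings of U+00E1 that do not rely on how \x is parsed. The function name bug4848-a-acute-spellings is made up for illustration and is not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: compare three ways of
;; writing the character U+00E1 in Lisp source.
(defun bug4848-a-acute-spellings ()
  "Return comparisons between three spellings of U+00E1."
  (let ((four-digit  "\u00e1")        ; the u escape takes exactly four hex digits
        (eight-digit "\U000000e1")    ; the U escape takes exactly eight hex digits
        (from-code   (string #xe1)))  ; build the string from the character code
    (list (string= four-digit eight-digit)
          (string= four-digit from-code)
          (aref four-digit 0)                ; character code of the first element
          (multibyte-string-p four-digit))))

(bug4848-a-acute-spellings)
;; ⇒ (t t 225 t)
```

All three forms give a multibyte string holding the single character 225, so any of them can stand in for the two-digit \xe1 when the character rather than the byte is wanted.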






* bug#4848: 23.1.50; \u and \x in string
  2009-11-02  7:17 ` Stefan Monnier
@ 2009-11-02  7:33   ` Jason Rumney
  2009-11-03 13:39   ` Richard Stallman
  1 sibling, 0 replies; 10+ messages in thread
From: Jason Rumney @ 2009-11-02  7:33 UTC (permalink / raw)
  To: Stefan Monnier, 4848; +Cc: rms

Stefan Monnier wrote:

>> "\ue1" gives the error "Non-hex digit used for Unicode escape".
>> Why doesn't it work to give the Unicode character á?
>>     
>
> I think you mean \u00e1
>   

I think the error message means "Insufficient hex digits used for 
Unicode escape".

>> Note that \xe1 does not work for this any more.
>>     
>
> Indeed, this refers to the byte 225 rather than to the char 225.
>   
>
> \x00e1 also works like \u00e1.
>   

That is definitely confusing.







* bug#4848: 23.1.50; \u and \x in string
  2009-11-02  7:17 ` Stefan Monnier
  2009-11-02  7:33   ` Jason Rumney
@ 2009-11-03 13:39   ` Richard Stallman
  2009-11-03 14:49     ` Stefan Monnier
  2009-11-03 18:35     ` Eli Zaretskii
  1 sibling, 2 replies; 10+ messages in thread
From: Richard Stallman @ 2009-11-03 13:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-pretest-bug, 4848

    > "\ue1" gives the error "Non-hex digit used for Unicode escape".
    > Why doesn't it work to give the Unicode character á?

    I think you mean \u00e1

Why shouldn't \ue1 work?

    > Note that \xe1 does not work for this any more.

    Indeed, this refers to the byte 225 rather than to the char 225.

This needs to be documented.  But is it a good meaning for \x?  It
will rarely be useful this way.  Also, is it an incompatible change?






* bug#4848: 23.1.50; \u and \x in string
  2009-11-03 13:39   ` Richard Stallman
@ 2009-11-03 14:49     ` Stefan Monnier
  2009-11-05  1:57       ` Richard Stallman
  2009-11-03 18:35     ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2009-11-03 14:49 UTC (permalink / raw)
  To: rms; +Cc: emacs-pretest-bug, 4848

>> "\ue1" gives the error "Non-hex digit used for Unicode escape".
>> Why doesn't it work to give the Unicode character á?
>     I think you mean \u00e1
> Why shouldn't \ue1 work?

Because the \u format is \uNNNN with exactly 4 hex digits.

>> Note that \xe1 does not work for this any more.
>     Indeed, this refers to the byte 225 rather than to the char 225.
> This needs to be documented.  But is it a good meaning for \x?  It
> will rarely be useful this way.  Also, is it an incompatible change?

I haven't managed to keep track of all the changes w.r.t how we treat
\NNN vs \xMM vs \xMMMMM and how it impacts whether the resulting string
is unibyte or multibyte.  My understanding is that there have been
several incompatible changes in this area (and some of those were
inevitable).  E.g. in Emacs-22:

   ELISP> "\222"
   "\222"
   ELISP> "\xa4"
   "\xa4"
   ELISP> (multibyte-string-p "\222")
   nil
   ELISP> (multibyte-string-p "\xa4")
   t
   ELISP> (multibyte-string-p "\xa45")
   t
   ELISP> 

whereas in Emacs-23.1:

   ELISP> "\222"
   "\222"
   ELISP> "\xa4"
   "\244"
   ELISP> (multibyte-string-p "\222")
   nil
   ELISP> (multibyte-string-p "\xa4")
   nil
   ELISP> (multibyte-string-p "\xa45")
   t
   ELISP> 

Of course, given the fact that char-numbers have changed, the
backward compatibility of \xNNNN is irrelevant since they do not
represent the same char any more.


        Stefan
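
To check the same three literals in whatever Emacs is at hand without an IELM session, a small sketch along these lines may help. It assumes Emacs 23 or later; the function name bug4848-multibyte-flags is invented for this example.

```elisp
;; Hypothetical helper, not part of Emacs: report whether each literal
;; from the transcripts above is read as a multibyte string.
(defun bug4848-multibyte-flags ()
  "Return `multibyte-string-p' for an octal, a short hex and a long hex escape."
  (mapcar #'multibyte-string-p
          (list "\222"      ; octal escape for byte 146
                "\xa4"      ; two-digit hex escape for byte 164
                "\xa45")))  ; three hex digits: character code #xa45

(bug4848-multibyte-flags)
;; ⇒ (nil nil t), matching the Emacs-23.1 transcript above
```

Evaluating the form with M-: or in *scratch* gives the three flags in one list, which makes it easy to compare versions.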






* bug#4848: 23.1.50; \u and \x in string
  2009-11-03 13:39   ` Richard Stallman
  2009-11-03 14:49     ` Stefan Monnier
@ 2009-11-03 18:35     ` Eli Zaretskii
  2009-11-05  1:56       ` Richard Stallman
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2009-11-03 18:35 UTC (permalink / raw)
  To: rms, 4848

> From: Richard Stallman <rms@gnu.org>
> Date: Tue, 03 Nov 2009 08:39:00 -0500
> Cc: emacs-pretest-bug@gnu.org, 4848@emacsbugs.donarmstrong.com
> 
> This needs to be documented.

I'm not sure what you wanted to be documented.  Is the description in
"(elisp)General Escape Syntax" what you were looking for?






* bug#4848: 23.1.50; \u and \x in string
  2009-11-03 18:35     ` Eli Zaretskii
@ 2009-11-05  1:56       ` Richard Stallman
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Stallman @ 2009-11-05  1:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 4848

    I'm not sure what you wanted to be documented.  Is the description in
    "(elisp)General Escape Syntax" what you were looking for?

The version I have is from August.  If it has been substantially
improved since then, maybe it is good.  The text from August was
inadequate and even wrong:

      To use hex, write a question mark followed by a backslash, @samp{x},
    and the hexadecimal character code.  You can use any number of hex
    digits, so you can represent any character code in this way.
    Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
    character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
    @iftex
    @samp{@`a}.
    @end iftex
    @ifnottex
    @samp{a} with grave accent.
    @end ifnottex

And here is something from Non-ASCII In Strings:

      You can also represent a multibyte non-@acronym{ASCII} character with its
    character code: use a hex escape, @samp{\x@var{nnnnnnn}}, with as many
    digits as necessary.  (Multibyte non-@acronym{ASCII} character codes are all
    greater than 256.)  Any character which is not a valid hex digit
    terminates this construct.  If the next character in the string could be
    interpreted as a hex digit, write @w{@samp{\ }} (backslash and space) to
    terminate the hex escape---for example, @w{@samp{\x8e0\ }} represents
    one character, @samp{a} with grave accent.  @w{@samp{\ }} in a string
    constant is just like backslash-newline; it does not contribute any
    character to the string, but it does terminate the preceding hex escape.






* bug#4848: 23.1.50; \u and \x in string
  2009-11-03 14:49     ` Stefan Monnier
@ 2009-11-05  1:57       ` Richard Stallman
  2009-11-05  2:48         ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Stallman @ 2009-11-05  1:57 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-pretest-bug, 4848

    > Why shouldn't \ue1 work?

    Because the \u format is \uNNNN with exactly 4 hex digits.

In other words, "it doesn't work because we decided it shouldn't work".
But why shouldn't it work?  Why shouldn't two digits be allowed?

Is there a good reason not to allow that?






* bug#4848: 23.1.50; \u and \x in string
  2009-11-05  1:57       ` Richard Stallman
@ 2009-11-05  2:48         ` Stefan Monnier
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Monnier @ 2009-11-05  2:48 UTC (permalink / raw)
  To: rms; +Cc: 4848

>> Why shouldn't \ue1 work?
>     Because the \u format is \uNNNN with exactly 4 hex digits.

> In other words, "it doesn't work because we decided it shouldn't work".
> But why shouldn't it work?  Why shouldn't two digits be allowed?
> Is there a good reason not to allow that?

I think the \u format is taken from C and it doesn't have an "end" like
our \x format has.  So for example "\u11111" means (concat "\u1111" "1").


        Stefan
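
The difference between the fixed-width \u and the open-ended \x can be sketched as follows, assuming Emacs 23 or later. The function name bug4848-escape-widths is hypothetical and used only here.

```elisp
;; Hypothetical helper, not part of Emacs: contrast the fixed-width
;; u escape with the open-ended x escape.
(defun bug4848-escape-widths ()
  "Return facts about how many hex digits each escape consumes."
  (let ((fixed "\u11111")      ; U+1111 followed by the digit 1
        (open  "\x11111")      ; one character, code #x11111
        (ended "\x1111\ 1"))   ; backslash-space ends the hex escape
    (list (length fixed)                         ; two characters
          (aref fixed 0)                         ; 4369, i.e. #x1111
          (aref fixed 1)                         ; 49, the digit 1
          (string= fixed (concat "\u1111" "1"))  ; the equivalence stated above
          (length open)                          ; x consumed all five digits
          (string= ended fixed))))               ; explicit terminator gives the same string

(bug4848-escape-widths)
;; ⇒ (2 4369 49 t 1 t)
```

Because \u always stops after four digits, a shorter form such as \ue1 would make the end of the escape ambiguous whenever a hex digit follows; \x avoids this only through the separate backslash-space terminator.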







* bug#4848: 23.1.50; \u and \x in string
  2009-11-02  5:31 bug#4848: 23.1.50; \u and \x in string Richard Stallman
  2009-11-02  7:17 ` Stefan Monnier
@ 2016-06-14  2:45 ` Noam Postavsky
  1 sibling, 0 replies; 10+ messages in thread
From: Noam Postavsky @ 2016-06-14  2:45 UTC (permalink / raw)
  To: 4848-done

"Non-ASCII In Strings" now (24.5) says the following which explains
about "\xN" producing unibyte characters.

   You can also use hexadecimal escape sequences (‘\xN’) and octal
escape sequences (‘\N’) in string constants.  *But beware:* If a string
constant contains hexadecimal or octal escape sequences, and these
escape sequences all specify unibyte characters (i.e., less than 256),
and there are no other literal non-ASCII characters or Unicode-style
escape sequences in the string, then Emacs automatically assumes that it
is a unibyte string.  That is to say, it assumes that all non-ASCII
characters occurring in the string are 8-bit raw bytes.
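
A short sketch of what that paragraph means for the \xe1 case from the original report, assuming Emacs 23 or later. The function name bug4848-raw-byte-demo is invented for illustration, and decoding as latin-1 is just one way to turn the byte into the intended character.

```elisp
;; Hypothetical helper, not part of Emacs: show that a two-digit hex
;; escape yields a raw byte, and how it relates to U+00E1.
(defun bug4848-raw-byte-demo ()
  "Return facts about the unibyte string written with a two-digit hex escape."
  (let ((bytes "\xe1"))   ; unibyte string holding one raw byte
    (list (aref bytes 0)                        ; the byte value, 225
          (aref (string-to-multibyte bytes) 0)  ; raw-byte character #x3fffe1
          (string= bytes "\u00e1")              ; not the same string as U+00E1
          (string= (decode-coding-string bytes 'latin-1)
                   "\u00e1"))))                 ; decoding the byte as Latin-1 gives U+00E1

(bug4848-raw-byte-demo)
;; ⇒ (225 4194273 nil t)
```

The second element, 4194273, is the same code that C-x = printed in the original report, which is what a raw byte becomes once it lands in a multibyte buffer.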


end of thread, other threads:[~2016-06-14  2:45 UTC | newest]

Thread overview: 10+ messages
2009-11-02  5:31 bug#4848: 23.1.50; \u and \x in string Richard Stallman
2009-11-02  7:17 ` Stefan Monnier
2009-11-02  7:33   ` Jason Rumney
2009-11-03 13:39   ` Richard Stallman
2009-11-03 14:49     ` Stefan Monnier
2009-11-05  1:57       ` Richard Stallman
2009-11-05  2:48         ` Stefan Monnier
2009-11-03 18:35     ` Eli Zaretskii
2009-11-05  1:56       ` Richard Stallman
2016-06-14  2:45 ` Noam Postavsky
