Unicode character read representation

unofficial mirror of emacs-devel@gnu.org 

* Unicode character read representation
@ 2009-02-21 14:07 Chong Yidong
  2009-02-21 14:22 ` Harald Hanche-Olsen
  2009-02-24 11:14 ` Kenichi Handa
  0 siblings, 2 replies; 8+ messages in thread
From: Chong Yidong @ 2009-02-21 14:07 UTC
  To: emacs-devel

From objects.texi in the Lisp manual:

  `\U00NNNNNN' represents the character whose Unicode code point is
  `U+NNNNNN', if such a character is supported by Emacs.  If the
  corresponding character is not supported, Emacs signals an error.

Are there any Unicode code points not supported by Emacs, or is this
sentence obsolete?


* Re: Unicode character read representation
  2009-02-21 14:07 Unicode character read representation Chong Yidong
@ 2009-02-21 14:22 ` Harald Hanche-Olsen
  2009-02-24 11:14 ` Kenichi Handa
  1 sibling, 0 replies; 8+ messages in thread
From: Harald Hanche-Olsen @ 2009-02-21 14:22 UTC
  To: emacs-devel

+ Chong Yidong <cyd@stupidchicken.com>:

> From objects.texi in the Lisp manual:
> 
>   `\U00NNNNNN' represents the character whose Unicode code point is
>   `U+NNNNNN', if such a character is supported by Emacs.  If the
>   corresponding character is not supported, Emacs signals an error.
> 
> Are there any Unicode code points not supported by Emacs, or is this
> sentence obsolete?

I don't know the answer to your question, but it appears to me that
some code points SHOULD not be supported. The most famous example is
U+FFFE, a noncharacter, which is why U+FEFF ZERO WIDTH NO-BREAK SPACE
is useful as a byte-order mark. (But Emacs reads "\U0000FFFE" just
fine. A bug?)
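
To make the byte-order-mark remark concrete, here is a minimal sketch in C,
assuming a UTF-16 stream. The type and function names are hypothetical and
not taken from Emacs. Because a byte-swapped U+FEFF would read as U+FFFE,
which never appears as a character in valid text, the first two bytes are
enough to tell the two byte orders apart. (A UTF-32 little-endian stream
also begins with FF FE; this sketch does not handle that case.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not an Emacs API.  */
enum byte_order { ORDER_UNKNOWN, ORDER_BIG_ENDIAN, ORDER_LITTLE_ENDIAN };

/* Guess the byte order of a UTF-16 stream from its first two bytes.
   U+FEFF serialized big-endian is FE FF; little-endian it is FF FE.
   Read with the wrong order, the mark would decode as U+FFFE, a
   noncharacter, so the two cases cannot be confused.  */
enum byte_order
detect_utf16_byte_order (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  if (len < 2)
    return ORDER_UNKNOWN;
  if (buf[0] == 0xFE && buf[1] == 0xFF)
    return ORDER_BIG_ENDIAN;
  if (buf[0] == 0xFF && buf[1] == 0xFE)
    return ORDER_LITTLE_ENDIAN;
  return ORDER_UNKNOWN;	/* No byte-order mark present.  */
}
```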

- Harald





* Re: Unicode character read representation
  2009-02-21 14:07 Unicode character read representation Chong Yidong
  2009-02-21 14:22 ` Harald Hanche-Olsen
@ 2009-02-24 11:14 ` Kenichi Handa
  2009-02-24 22:16   ` Stefan Monnier
  1 sibling, 1 reply; 8+ messages in thread
From: Kenichi Handa @ 2009-02-24 11:14 UTC
  To: Chong Yidong; +Cc: emacs-devel

In article <87hc2n28a4.fsf@cyd.mit.edu>, Chong Yidong <cyd@stupidchicken.com> writes:

> From objects.texi in the Lisp manual:
>   `\U00NNNNNN' represents the character whose Unicode code point is
>   `U+NNNNNN', if such a character is supported by Emacs.  If the
>   corresponding character is not supported, Emacs signals an error.

> Are there any Unicode code points not supported by Emacs,

No.

> or is this sentence obsolete?

Not completely obsolete, but should be modified somehow.

First, #x0..#x3FFFFF are all valid Emacs character codes.

Some U+NNNNNN values are valid Unicode code points that are
"noncharacters" (e.g. U+FFFE, U+FFFF), some are invalid
Unicode code points (U+110000..U+3FFFFF), and some are invalid
both as Unicode code points and as Emacs character codes
(U+400000 and over).

Currently Emacs signals an error only for U+400000 and over,
and I'm not sure how strictly we should interpret
\U.. notation.
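
The categories above can be sketched as a small C classifier. This is only
an illustration: the enum and function names are hypothetical and do not
come from the Emacs sources. The limits 0x10FFFF (last Unicode code point)
and 0x3FFFFF (last Emacs character code) are the ones discussed in this
thread, and the noncharacter test follows the Unicode definition (U+FDD0
through U+FDEF, plus the last two code points of every plane). Surrogates
are not treated specially here.

```c
#include <assert.h>

/* Hypothetical categories for this sketch; not an Emacs API.  */
enum code_class
{
  CODE_UNICODE,		/* Ordinary Unicode code point.  */
  CODE_NONCHARACTER,	/* Valid code point, but a noncharacter.  */
  CODE_EMACS_ONLY,	/* Beyond Unicode, still an Emacs character code.  */
  CODE_INVALID		/* Not even an Emacs character code.  */
};

enum code_class
classify_code (unsigned long code)
{
  if (code > 0x3FFFFF)
    return CODE_INVALID;
  if (code > 0x10FFFF)
    return CODE_EMACS_ONLY;
  /* Noncharacters: U+FDD0..U+FDEF, and every code point whose low
     16 bits are FFFE or FFFF.  */
  if ((code >= 0xFDD0 && code <= 0xFDEF) || (code & 0xFFFE) == 0xFFFE)
    return CODE_NONCHARACTER;
  return CODE_UNICODE;
}

/* Human-readable label for a category, e.g. for diagnostics.  */
const char *
code_class_name (enum code_class c)
{
  switch (c)
    {
    case CODE_UNICODE:      return "Unicode code point";
    case CODE_NONCHARACTER: return "Unicode noncharacter";
    case CODE_EMACS_ONLY:   return "Emacs-only character code";
    case CODE_INVALID:      return "invalid";
    }
  assert (!"unknown code_class");
  return "?";
}
```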

---
Kenichi Handa
handa@m17n.org


* Re: Unicode character read representation
  2009-02-24 11:14 ` Kenichi Handa
@ 2009-02-24 22:16   ` Stefan Monnier
  2009-02-26  7:28     ` Kenichi Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2009-02-24 22:16 UTC
  To: Kenichi Handa; +Cc: Chong Yidong, emacs-devel

> Currently Emacs signals an error only for U+400000 and over,
> and I'm not sure how strictly we should interpret
> \U.. notation.

I think the \U notation should only work for actual Unicode chars
(assuming the \x{..} notation can be used for everything else).


        Stefan





* Re: Unicode character read representation
  2009-02-24 22:16   ` Stefan Monnier
@ 2009-02-26  7:28     ` Kenichi Handa
  2009-02-26 15:08       ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: Kenichi Handa @ 2009-02-26  7:28 UTC
  To: Stefan Monnier; +Cc: cyd, emacs-devel

In article <jwvprh7qy68.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > Currently Emacs signals an error only for U+400000 and over,
> > and I'm not sure how strictly we should interpret
> > \U.. notation.

> I think the \U notation should only work for actual Unicode chars
> (assuming the \x{..} notation can be used for everything else).

For instance 0xFFFF is a valid Unicode code-point, but is
not a character.  Should it be accepted or not?

---
Kenichi Handa
handa@m17n.org





* Re: Unicode character read representation
  2009-02-26  7:28     ` Kenichi Handa
@ 2009-02-26 15:08       ` Stefan Monnier
  2009-02-27  0:51         ` Kenichi Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2009-02-26 15:08 UTC
  To: Kenichi Handa; +Cc: cyd, emacs-devel

>> > Currently Emacs signals an error only for U+400000 and over,
>> > and I'm not sure how strictly we should interpret
>> > \U.. notation.

>> I think the \U notation should only work for actual Unicode chars
>> (assuming the \x{..} notation can be used for everything else).

> For instance 0xFFFF is a valid Unicode code-point, but is
> not a character.  Should it be accepted or not?

Yes, it should.  But I think that \U003FFFFF shouldn't, since it's not
a valid Unicode code point.


        Stefan





* Re: Unicode character read representation
  2009-02-26 15:08       ` Stefan Monnier
@ 2009-02-27  0:51         ` Kenichi Handa
  2009-02-27  1:45           ` Chong Yidong
  0 siblings, 1 reply; 8+ messages in thread
From: Kenichi Handa @ 2009-02-27  0:51 UTC
  To: Stefan Monnier; +Cc: cyd, emacs-devel

In article <jwvzlg9dyo4.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > For instance 0xFFFF is a valid Unicode code-point, but is
> > not a character.  Should it be accepted or not?

> Yes, it should.  But I think that \U003FFFFF shouldn't, since it's not
> a valid Unicode code point.

Ok, I've just installed this change.

Index: lread.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/lread.c,v
retrieving revision 1.403
retrieving revision 1.404
diff -u -r1.403 -r1.404
--- lread.c	25 Feb 2009 12:47:24 -0000	1.403
+++ lread.c	27 Feb 2009 00:48:03 -0000	1.404
@@ -2205,7 +2205,7 @@
       /* A Unicode escape. We only permit them in strings and characters,
 	 not arbitrarily in the source code, as in some other languages.  */
       {
-	int i = 0;
+	unsigned int i = 0;
 	int count = 0;
 
 	while (++count <= unicode_hex_count)
@@ -2222,7 +2222,8 @@
 		break;
 	      }
 	  }
-
+	if (i > 0x10FFFF)
+	  error ("Non-Unicode character: 0x%x", i);
 	return i;
       }
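
The effect of the change can be sketched as a standalone C function. This
is a simplified illustration, not the lread.c code: the function name and
the -1 error convention are assumptions (the patch itself calls `error`),
and the digit count parameter plays the role of `unicode_hex_count`, i.e.
4 for \u and 8 for \U. The accumulator is unsigned, as in the patch, so
that eight hex digits such as FFFFFFFF cannot turn into a negative value
and slip past the `> 0x10FFFF` comparison; that is presumably the reason
for the type change.

```c
#include <assert.h>
#include <stddef.h>

/* Parse exactly HEX_COUNT hexadecimal digits starting at S and return
   the resulting code point.  Return -1 if a non-hex character (or the
   end of the string) is met first, or if the value lies beyond the
   last Unicode code point, U+10FFFF.  Hypothetical helper, not the
   Emacs reader itself.  */
long
parse_unicode_escape (const char *s, int hex_count)
{
  unsigned long code = 0;	/* Unsigned: 8 digits must not go negative.  */

  assert (s != NULL);

  for (int n = 0; n < hex_count; n++)
    {
      unsigned char c = (unsigned char) s[n];
      unsigned long digit;

      if (c >= '0' && c <= '9')
	digit = c - '0';
      else if (c >= 'a' && c <= 'f')
	digit = c - 'a' + 10;
      else if (c >= 'A' && c <= 'F')
	digit = c - 'A' + 10;
      else
	return -1;		/* Also catches a premature '\0'.  */

      code = (code << 4) | digit;
    }

  if (code > 0x10FFFF)
    return -1;			/* Non-Unicode character.  */
  return (long) code;
}
```

Under this sketch, noncharacters such as U+FFFF are still accepted, while
anything above U+10FFFF is rejected, which matches the policy agreed on
above.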
 
---
Kenichi Handa
handa@m17n.org





* Re: Unicode character read representation
  2009-02-27  0:51         ` Kenichi Handa
@ 2009-02-27  1:45           ` Chong Yidong
  0 siblings, 0 replies; 8+ messages in thread
From: Chong Yidong @ 2009-02-27  1:45 UTC
  To: Kenichi Handa; +Cc: Stefan Monnier, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> Yes, it should.  But I think that \U003FFFFF shouldn't, since it's not
>> a valid Unicode code point.
>
> Ok, I've just installed this change.

Thank you.  I've updated the Lisp manual.




