unofficial mirror of emacs-devel@gnu.org 
From: Aidan Kehoe <kehoea@parhasard.net>
Subject: [PATCH] Unicode Lisp reader escapes
Date: Sat, 29 Apr 2006 17:35:55 +0200
Message-ID: <17491.34779.959316.484740@parhasard.net>


I realise you are all focused on the release with an intensity that would
scare small children, were any of them let near, but if any of you have a
minute free, I’d love to hear philosophical and technical objections to the
below.

The background is that it hasn’t ever been possible to consistently specify
a non-Latin-1 character by means of a general escape sequence, since what
character a given integer represents varies from release to release and even
from invocation to invocation. The below allows you to specify a backslash
escape with exactly four or exactly eight hexadecimal digits in a character
or string, and have the editor interpret them as the corresponding Unicode
code point. So, ?\u20AC would be interpreted as the Euro sign, "\u0448" as
Cyrillic sha, and ?\U0001D0ED as Byzantine musical symbol arktiko ke.
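
To make the "exactly four or exactly eight" rule concrete, here is a minimal
standalone sketch in C. It is not the code from the patch: the names
hex_digit_value and parse_unicode_escape are hypothetical, and it works on a
plain NUL-terminated string rather than on the reader's READCHAR stream. It
only illustrates the digit-counting logic.

```c
/* Hypothetical helper, not part of the patch: the value of one ASCII
   hex digit, or -1 if C is not a hex digit.  Explicit range checks
   avoid any locale dependence.  */
static int
hex_digit_value (int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: S points just past the backslash, at 'u' or 'U'.
   On success, store the code point in *CODE and return the number of
   characters consumed (5 for \u, 9 for \U).  Return -1 if the escape
   letter is wrong or a hex digit is missing.  */
static int
parse_unicode_escape (const char *s, unsigned long *code)
{
  int ndigits, i;
  unsigned long value = 0;

  if (s[0] == 'u')
    ndigits = 4;
  else if (s[0] == 'U')
    ndigits = 8;
  else
    return -1;

  for (i = 1; i <= ndigits; i++)
    {
      /* A terminating NUL is not a hex digit, so a short escape is
         rejected before we can read past the end of the string.  */
      int d = hex_digit_value (s[i]);
      if (d < 0)
        return -1;
      value = (value << 4) | (unsigned long) d;
    }

  *code = value;
  return ndigits + 1;
}
```

Under these assumptions, "u20AC" yields 0x20AC after consuming five
characters, "U0001D0ED" yields 0x1D0ED after nine, and an escape with too few
digits is rejected rather than silently accepted.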

Why not wait until the Unicode branch is merged? Well, that won’t solve the
problem either; people naturally want their code to be as compatible as
possible, so they will avoid the assumption that the integer-to-character
mapping is Unicode compatible as long as there are editors in the wild for
which that is not true. If this is integrated a good bit before the Unicode
branch is (which is what I would like), it will mean people can use this
syntax (which most modern programming languages have already, and which
people use) and be sure it’s compatible years before what would otherwise be
the case. 

lispref/ChangeLog addition:

2006-04-29  Aidan Kehoe  <kehoea@parhasard.net>

	* objects.texi (Character Type):
	Describe the Unicode character escape syntax; \uABCD or \U00ABCDEF
	specifies Unicode characters U+ABCD and U+ABCDEF respectively. 
	

src/ChangeLog addition:

2006-04-29  Aidan Kehoe  <kehoea@parhasard.net>

	* lread.c (read_escape):
	Provide a Unicode character escape syntax; \u followed by exactly
	four or \U followed by exactly eight hex digits in a character or
	string is read as a Unicode character with that code point.


GNU Emacs Trunk source patch:
Diff command:   cvs -q diff -u
Files affected: src/lread.c lispref/objects.texi

Index: lispref/objects.texi
===================================================================
RCS file: /sources/emacs/emacs/lispref/objects.texi,v
retrieving revision 1.51
diff -u -u -r1.51 objects.texi
--- lispref/objects.texi	6 Feb 2006 11:55:10 -0000	1.51
+++ lispref/objects.texi	29 Apr 2006 15:15:09 -0000
@@ -431,6 +431,20 @@
 bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.
 @end ifnottex
 
+@cindex unicode character escape
+  Emacs provides a syntax for specifying characters by their Unicode code
+points.  @samp{?\uABCD} will give you an Emacs character that maps to
+the code point @samp{U+ABCD} in Unicode-based representations (UTF-8
+text files, Unicode-oriented fonts, etc.) There is a slightly different
+syntax for specifying characters with code points above @samp{#xFFFF};
+@samp{\U00ABCDEF} will give you an Emacs character that maps to the code
+point @samp{U+ABCDEF} in Unicode-based representations, if such an Emacs
+character exists.
+
+  Unlike in some other languages, while this syntax is available for
+character literals, and (see later) in strings, it is not available
+elsewhere in your Lisp source code.
+
 @cindex @samp{\} in character constant
 @cindex backslash in character constant
 @cindex octal character code
Index: src/lread.c
===================================================================
RCS file: /sources/emacs/emacs/src/lread.c,v
retrieving revision 1.350
diff -u -u -r1.350 lread.c
--- src/lread.c	27 Feb 2006 02:04:35 -0000	1.350
+++ src/lread.c	29 Apr 2006 15:15:10 -0000
@@ -1743,6 +1743,9 @@
      int *byterep;
 {
   register int c = READCHAR;
+  /* \u takes exactly four hex digits, \U exactly eight.  Default to the
+     behaviour for \u, and change this value in the case that \U is seen.  */
+  int unicode_hex_count = 4;
 
   *byterep = 0;
 
@@ -1907,6 +1910,48 @@
 	return i;
       }
 
+    case 'U':
+      /* Post-Unicode-2.0: exactly eight hex digits.  Falls through to 'u'.  */
+      unicode_hex_count = 8;
+    case 'u':
+
+      /* A Unicode escape. We only permit them in strings and characters,
+	 not arbitrarily in the source code as in some other languages. */
+      {
+	int i = 0;
+	int count = 0;
+	Lisp_Object lisp_char;
+	while (++count <= unicode_hex_count)
+	  {
+	    c = READCHAR;
+	    /* isdigit(), isalpha() may be locale-specific, which we don't
+	       want. */
+	    if      (c >= '0' && c <= '9')  i = (i << 4) + (c - '0');
+	    else if (c >= 'a' && c <= 'f')  i = (i << 4) + (c - 'a') + 10;
+	    else if (c >= 'A' && c <= 'F')  i = (i << 4) + (c - 'A') + 10;
+	    else
+	      {
+		error ("Non-hex digit used for Unicode escape");
+		break;
+	      }
+	  }
+
+	lisp_char = call2 (intern ("decode-char"), intern ("ucs"),
+			   make_number (i));
+
+	if (NILP (lisp_char))
+	  {
+	    /* This is ugly and horrible and trashes the user's data.  */
+	    i = MAKE_CHAR (charset_katakana_jisx0201,
+			   34 + 128, 46 + 128);
+	    return i;
+	  }
+	else
+	  {
+	    return XFASTINT (lisp_char);
+	  }
+      }
+
     default:
       if (BASE_LEADING_CODE_P (c))
 	c = read_multibyte (c, readcharfun);
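
The documentation hunk above says the new escapes map to a code point "in
Unicode-based representations (UTF-8 text files, ...)". As a rough
illustration of what that means for the three examples in this message, the
following standalone sketch encodes a code point as UTF-8. The function name
utf8_encode is hypothetical, and nothing here is taken from Emacs, where the
actual conversion in the patch goes through decode-char.

```c
/* Hypothetical helper: encode CODE as UTF-8 into BUF, which must have
   room for 4 bytes.  Return the number of bytes written, or 0 if CODE
   is a surrogate or lies above U+10FFFF and so has no UTF-8 form.  */
static int
utf8_encode (unsigned long code, unsigned char *buf)
{
  if (code < 0x80)
    {
      buf[0] = (unsigned char) code;
      return 1;
    }
  if (code < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (code >> 6));
      buf[1] = (unsigned char) (0x80 | (code & 0x3F));
      return 2;
    }
  if (code >= 0xD800 && code <= 0xDFFF)
    return 0;			/* Surrogates are not characters.  */
  if (code < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (code >> 12));
      buf[1] = (unsigned char) (0x80 | ((code >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (code & 0x3F));
      return 3;
    }
  if (code <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (code >> 18));
      buf[1] = (unsigned char) (0x80 | ((code >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((code >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (code & 0x3F));
      return 4;
    }
  return 0;			/* Beyond the Unicode range.  */
}
```

With this sketch, U+20AC becomes the bytes E2 82 AC, U+0448 becomes D1 88,
and U+1D0ED becomes F0 9D 83 AD. The placeholder U+ABCDEF used in the
documentation lies above U+10FFFF, so it has no encoding here, which is one
case where the "if such an Emacs character exists" caveat applies.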

-- 
In the beginning God created the heavens and the earth. And God was a
bug-eyed, hexagonal smurf with a head of electrified hair; and God said:
“Si, mi chiamano Mimi...”
