From: Michal Nazarewicz <mina86@mina86.com>
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [RFC 16/18] Refactor character class checking;
	optimise ASCII case
Date: Tue,  4 Oct 2016 03:10:39 +0200
Message-ID: <1475543441-10493-16-git-send-email-mina86@mina86.com>
References: <1475543441-10493-1-git-send-email-mina86@mina86.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
To: 24603@debbugs.gnu.org
In-Reply-To: <1475543441-10493-1-git-send-email-mina86@mina86.com>
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/124004>

Use a lookup table to map Unicode general categories to character
classes.  This generalises the lowercasep, uppercasep et al. functions.

Furthermore, provide another lookup table for ASCII characters so that
the common case can be optimised and the Unicode general category
lookup avoided.
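To illustrate the two-level lookup described above, here is a minimal, self-contained sketch.  All names in it (BIT_ALPHA, slow_bits, ascii_bits, is_alpha) are hypothetical.  slow_bits only stands in for the char-table based general category lookup and knows nothing but a few Latin-1 letters.  Unlike the patch, the sketch fills the ASCII table at start-up instead of writing it out as a const initialiser.  The counter makes it possible to observe that ASCII input never reaches the slow path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical class bits, in the spirit of CHAR_BIT_* in character.h.  */
enum { BIT_ALPHA = 1 << 0, BIT_DIGIT = 1 << 1 };

/* Number of times the slow path has run, so a caller can observe that
   ASCII characters never reach it.  */
static int slow_lookups;

/* Hypothetical stand-in for the Unicode general category lookup (in Emacs
   that goes through Vunicode_category_table).  For illustration it only
   knows the Latin-1 letters U+00C0..U+00FF, minus U+00D7 and U+00F7.  */
static unsigned char slow_bits (int c)
{
  slow_lookups++;
  if (c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7)
    return BIT_ALPHA;
  return 0;
}

/* One entry per ASCII code point, filled in once by init_ascii_bits.  */
static unsigned char ascii_bits[128];

static void init_ascii_bits (void)
{
  for (int c = 0; c < 128; c++)
    {
      unsigned char b = 0;
      if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        b |= BIT_ALPHA;
      if (c >= '0' && c <= '9')
        b |= BIT_DIGIT;
      ascii_bits[c] = b;
    }
}

/* ASCII input costs a single array load.  Anything else takes the slow
   lookup.  */
static bool is_alpha (int c)
{
  assert (c >= 0);  /* Emacs characters are non-negative.  */
  if ((unsigned) c < sizeof ascii_bits)
    return ascii_bits[c] & BIT_ALPHA;
  return slow_bits (c) & BIT_ALPHA;
}
```

After init_ascii_bits (), checking 'a', 'Z' and '5' leaves slow_lookups at zero.  Checking U+00E9 and U+00D7 runs the slow path once each.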

Using a lookup table in place of chains of comparisons may improve
performance, although I have not measured it.  More importantly, having
the lookup table will allow the regex engine to be optimised in an
upcoming patch.  Stay tuned. ;)
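Replacing the comparison chains with a table is only valid if both give the same answer for every category.  A minimal sketch of that equivalence follows.  It uses a hypothetical seven-entry subset of general categories (CAT_*) and hypothetical BIT_* masks, not the real unicode_category_t and CHAR_BIT_* values.

```c
#include <assert.h>
#include <stdbool.h>

/* A hypothetical subset of Unicode general categories.  */
typedef enum { CAT_UNKNOWN, CAT_Lu, CAT_Ll, CAT_Nd, CAT_Po, CAT_Zs, CAT_Cc,
               CAT_COUNT } category;

enum { BIT_ALPHA = 1 << 0, BIT_ALNUM = 1 << 1, BIT_PRINT = 1 << 2 };

/* Old style: one chain of comparisons per predicate.  */
static bool alpha_by_compare (category gc)
{
  return gc == CAT_Lu || gc == CAT_Ll;
}

static bool alnum_by_compare (category gc)
{
  return gc == CAT_Lu || gc == CAT_Ll || gc == CAT_Nd;
}

/* New style: one shared table, then one load and one AND per predicate.  */
static const unsigned char cat_bits[CAT_COUNT] = {
  [CAT_UNKNOWN] = 0,
  [CAT_Lu] = BIT_ALPHA | BIT_ALNUM | BIT_PRINT,
  [CAT_Ll] = BIT_ALPHA | BIT_ALNUM | BIT_PRINT,
  [CAT_Nd] = BIT_ALNUM | BIT_PRINT,
  [CAT_Po] = BIT_PRINT,
  [CAT_Zs] = BIT_PRINT,
  [CAT_Cc] = 0,
};

static bool alpha_by_table (category gc)
{
  assert ((unsigned) gc < CAT_COUNT);
  return cat_bits[gc] & BIT_ALPHA;
}

static bool alnum_by_table (category gc)
{
  assert ((unsigned) gc < CAT_COUNT);
  return cat_bits[gc] & BIT_ALNUM;
}
```

Each table-based predicate costs one load and one AND no matter how many categories qualify.  The comparison chains, by contrast, grow with the number of categories.  Whether that difference is measurable remains, as noted above, unmeasured.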

* src/character.c (alphanumericp, alphabeticp, uppercasep, lowercasep,
graphicp, printablep): Replace with static inline functions defined in
the header file.
(char_unicode_category): Make non-static so the header file can use it.
(category_char_bits): New lookup table mapping Unicode
general category to character classes.
(ascii_char_bits): New lookup table mapping ASCII characters to
character classes.

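* src/character.h (CHAR_BIT_ALNUM, CHAR_BIT_ALPHA, CHAR_BIT_UPPER)
(CHAR_BIT_LOWER, CHAR_BIT_GRAPH, CHAR_BIT_PRINT): New macros.
(char_unicode_category, category_char_bits, ascii_char_bits): Declare.
(unicode_alphanumericp, unicode_alphabeticp,
unicode_uppercasep, unicode_lowercasep, unicode_graphicp,
unicode_printablep, _ascii_alphanumericp, _ascii_alphabeticp,
_ascii_uppercasep, _ascii_lowercasep, _ascii_graphicp,
_ascii_printablep): New static inline functions which are special cases
of the respective unprefixed functions.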

* src/regex.c (ISALNUM, ISALPHA): Remove special cases for ASCII
characters since alphanumericp and alphabeticp already handle those.
(ISUPPER, ISLOWER): Move next to ISALNUM and ISALPHA.
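ISALNUM and ISALPHA now rely entirely on ascii_char_bits for ASCII input, so it is worth checking that the table agrees with the range checks being removed.  The sketch below copies the CHAR_BIT_* values and the ascii_char_bits initialiser from this patch, with the helper macros parenthesised.  OLD_ISALNUM, OLD_ISALPHA and table_has are hypothetical names introduced only for this check.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in character.h in this patch.  */
#define CHAR_BIT_ALNUM        (1 << 0)
#define CHAR_BIT_ALPHA        (1 << 1)
#define CHAR_BIT_UPPER        (1 << 2)
#define CHAR_BIT_LOWER        (1 << 3)
#define CHAR_BIT_GRAPH        (1 << 4)
#define CHAR_BIT_PRINT        (1 << 5)

/* Each class implies the ones it is built on.  */
#define P_ (CHAR_BIT_PRINT)
#define G_ (CHAR_BIT_GRAPH | P_)
#define N_ (CHAR_BIT_ALNUM | G_)
#define U_ (CHAR_BIT_UPPER | CHAR_BIT_ALPHA | N_)
#define L_ (CHAR_BIT_LOWER | CHAR_BIT_ALPHA | N_)

static const unsigned char ascii_char_bits[128] = {
   0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
   0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
  P_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_,
  N_, N_, N_, N_, N_, N_, N_, N_, N_, N_, G_, G_, G_, G_, G_, G_,
  G_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_,
  U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, G_, G_, G_, G_, G_,
  G_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_,
  L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, G_, G_, G_, G_,  0,
};

/* ASCII branches of the regex.c macros removed by this patch.  */
#define OLD_ISALNUM(c) (((c) >= 'a' && (c) <= 'z')	\
			|| ((c) >= 'A' && (c) <= 'Z')	\
			|| ((c) >= '0' && (c) <= '9'))
#define OLD_ISALPHA(c) (((c) >= 'a' && (c) <= 'z')	\
			|| ((c) >= 'A' && (c) <= 'Z'))

/* Whether ASCII character C is in class BIT according to the table.  */
static bool table_has (int c, int bit)
{
  assert (c >= 0 && c < 128);
  return ascii_char_bits[c] & bit;
}
```

Looping over all 128 code points confirms that the alnum and alpha bits match the removed macros.  The same loop shows the upper, lower, graph and print bits match the obvious ASCII ranges.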
---
 src/character.c | 168 +++++++++++++++++++++++++-------------------------------
 src/character.h |  77 +++++++++++++++++++++++---
 src/regex.c     |  20 ++-----
 3 files changed, 151 insertions(+), 114 deletions(-)

diff --git a/src/character.c b/src/character.c
index 707ae10..63f89d3 100644
--- a/src/character.c
+++ b/src/character.c
@@ -960,104 +960,88 @@ character is not ASCII nor 8-bit character, an error is signaled.  */)
   return make_number (c);
 }
 
-static unicode_category_t
+/* Return C’s Unicode general category (or UNICODE_CATEGORY_UNKNOWN). */
+unicode_category_t
 char_unicode_category (int c)
 {
   Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
   return INTEGERP (category) ? XINT (category) : UNICODE_CATEGORY_UNKNOWN;
 }
 
-/* Return true if C is a upper case character.  This does not imply mean it
-   has a lower case form. */
-bool
-uppercasep (int c)
-{
-  unicode_category_t gen_cat = char_unicode_category (c);
-
-  /* See UTS #18.  There are additional characters that should be
-     here, those designated as Other_uppercase; FIXME.  */
-  return gen_cat == UNICODE_CATEGORY_Lu;
-}
-
-/* Return true if C is a lower case character.  This does not imply mean it
-   has an upper case form. */
-bool
-lowercasep (int c)
-{
-  unicode_category_t gen_cat = char_unicode_category (c);
-
-  /* See UTS #18.  There are additional characters that should be
-     here, those designated as Other_lowercase; FIXME.  */
-  return gen_cat == UNICODE_CATEGORY_Ll;
-}
-
-/* Return true if C is an alphabetic character.  */
-bool
-alphabeticp (int c)
-{
-  unicode_category_t gen_cat = char_unicode_category (c);
-
-  /* See UTS #18.  There are additional characters that should be
-     here, those designated as Other_uppercase, Other_lowercase,
-     and Other_alphabetic; FIXME.  */
-  return (gen_cat == UNICODE_CATEGORY_Lu
-	  || gen_cat == UNICODE_CATEGORY_Ll
-	  || gen_cat == UNICODE_CATEGORY_Lt
-	  || gen_cat == UNICODE_CATEGORY_Lm
-	  || gen_cat == UNICODE_CATEGORY_Lo
-	  || gen_cat == UNICODE_CATEGORY_Mn
-	  || gen_cat == UNICODE_CATEGORY_Mc
-	  || gen_cat == UNICODE_CATEGORY_Me
-	  || gen_cat == UNICODE_CATEGORY_Nl);
-}
-
-/* Return true if C is a alphabetic or decimal-number character.  */
-bool
-alphanumericp (int c)
-{
-  unicode_category_t gen_cat = char_unicode_category (c);
-
-  /* See UTS #18.  Same comment as for alphabeticp applies.  FIXME. */
-  return (gen_cat == UNICODE_CATEGORY_Lu
-	  || gen_cat == UNICODE_CATEGORY_Ll
-	  || gen_cat == UNICODE_CATEGORY_Lt
-	  || gen_cat == UNICODE_CATEGORY_Lm
-	  || gen_cat == UNICODE_CATEGORY_Lo
-	  || gen_cat == UNICODE_CATEGORY_Mn
-	  || gen_cat == UNICODE_CATEGORY_Mc
-	  || gen_cat == UNICODE_CATEGORY_Me
-	  || gen_cat == UNICODE_CATEGORY_Nl
-	  || gen_cat == UNICODE_CATEGORY_Nd);
-}
-
-/* Return true if C is a graphic character.  */
-bool
-graphicp (int c)
-{
-  unicode_category_t gen_cat = char_unicode_category (c);
-
-  /* See UTS #18.  */
-  return (!(gen_cat == UNICODE_CATEGORY_UNKNOWN
-	    || gen_cat == UNICODE_CATEGORY_Zs /* space separator */
-	    || gen_cat == UNICODE_CATEGORY_Zl /* line separator */
-	    || gen_cat == UNICODE_CATEGORY_Zp /* paragraph separator */
-	    || gen_cat == UNICODE_CATEGORY_Cc /* control */
-	    || gen_cat == UNICODE_CATEGORY_Cs /* surrogate */
-	    || gen_cat == UNICODE_CATEGORY_Cn)); /* unassigned */
-}
-
-/* Return true if C is a printable character.  */
-bool
-printablep (int c)
-{
-  unicode_category_t gen_cat = char_unicode_category (c);
-
-  /* See UTS #18.  */
-  return (!(gen_cat == UNICODE_CATEGORY_UNKNOWN
-	    || gen_cat == UNICODE_CATEGORY_Cc /* control */
-	    || gen_cat == UNICODE_CATEGORY_Cs /* surrogate */
-	    || gen_cat == UNICODE_CATEGORY_Cn)); /* unassigned */
-}
+#define CHAR_BIT_ALNUM_ CHAR_BIT_ALNUM | CHAR_BIT_GRAPH | CHAR_BIT_PRINT
+#define CHAR_BIT_ALPHA_ CHAR_BIT_ALPHA | CHAR_BIT_ALNUM_
+
+/* See UTS #18 and DerivedCoreProperties.txt.  alpha, alnum, upper and
+   lower are missing some characters, namely those designated as
+   Other_uppercase, Other_lowercase and Other_alphabetic; FIXME.  */
+
+const unsigned char category_char_bits[] = {
+  [UNICODE_CATEGORY_UNKNOWN] = 0,
+  [UNICODE_CATEGORY_Lu] = CHAR_BIT_ALPHA_ | CHAR_BIT_UPPER,
+  [UNICODE_CATEGORY_Ll] = CHAR_BIT_ALPHA_ | CHAR_BIT_LOWER,
+  [UNICODE_CATEGORY_Lt] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_Lm] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_Lo] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_Mn] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_Mc] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_Me] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_Nd] = CHAR_BIT_ALNUM_,
+  [UNICODE_CATEGORY_Nl] = CHAR_BIT_ALPHA_,
+  [UNICODE_CATEGORY_No] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Pc] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Pd] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Ps] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Pe] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Pi] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Pf] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Po] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Sm] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Sc] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Sk] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_So] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Zs] = CHAR_BIT_PRINT,
+  [UNICODE_CATEGORY_Zl] = CHAR_BIT_PRINT,
+  [UNICODE_CATEGORY_Zp] = CHAR_BIT_PRINT,
+  [UNICODE_CATEGORY_Cc] = 0,
+  [UNICODE_CATEGORY_Cf] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Cs] = 0,
+  [UNICODE_CATEGORY_Co] = CHAR_BIT_PRINT | CHAR_BIT_GRAPH,
+  [UNICODE_CATEGORY_Cn] = 0,
+};
+
+#undef CHAR_BIT_ALNUM_
+#undef CHAR_BIT_ALPHA_
+
+#define P_ CHAR_BIT_PRINT
+#define G_ CHAR_BIT_GRAPH | P_
+#define N_ CHAR_BIT_ALNUM | G_
+#define U_ CHAR_BIT_UPPER | CHAR_BIT_ALPHA | N_
+#define L_ CHAR_BIT_LOWER | CHAR_BIT_ALPHA | N_
+
+const unsigned char ascii_char_bits[] = {
+/*\0  ...                                                     \17 */
+   0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+/*\20  ...                                                    \37 */
+   0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+/*' ' '!' '"' '#' '$' '%' '&' ''' '(' ')' '*' '+' ',' '-' '.' '/' */
+  P_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_, G_,
+/*'0' '1' '2' '3' '4' '5' '6' '7' '8' '9' ':' ';' '<' '=' '>' '?' */
+  N_, N_, N_, N_, N_, N_, N_, N_, N_, N_, G_, G_, G_, G_, G_, G_,
+/*'@' 'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' */
+  G_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_,
+/*'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' '[' '\' ']' '^' '_' */
+  U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, U_, G_, G_, G_, G_, G_,
+/*'`' 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' */
+  G_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_,
+/*'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' '{' '|' '}' '~' \177 */
+  L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, L_, G_, G_, G_, G_,  0,
+};
+
+#undef P_
+#undef G_
+#undef N_
+#undef U_
+#undef L_
 
 void
 syms_of_character (void)
diff --git a/src/character.h b/src/character.h
index 5931c5c..6dc95ad 100644
--- a/src/character.h
+++ b/src/character.h
@@ -652,8 +652,78 @@ typedef enum {
   UNICODE_CATEGORY_Cs,
   UNICODE_CATEGORY_Co,
   UNICODE_CATEGORY_Cn
+  /* Don’t forget to extend category_char_bits in character.c when new entries
+     are added here. */
 } unicode_category_t;
 
+extern unicode_category_t char_unicode_category (int);
+
+/* Limited set of character categories which are syntax-independent.
+ * Testing for them does not depend on run-time state such as the
+ * current syntax table. */
+#define CHAR_BIT_ALNUM        (1 << 0)
+#define CHAR_BIT_ALPHA        (1 << 1)
+#define CHAR_BIT_UPPER        (1 << 2)
+#define CHAR_BIT_LOWER        (1 << 3)
+#define CHAR_BIT_GRAPH        (1 << 4)
+#define CHAR_BIT_PRINT        (1 << 5)
+
+/* Map from Unicode general category to character classes of its characters.
+ *
+ * Only character classes defined by CHAR_BIT_* above are present.
+ *
+ * This is an array of bit masks, so, for example, ‘category_char_bits[gc] &
+ * CHAR_BIT_ALPHA’ tells you whether characters in general category GC are
+ * alphabetic or not. */
+extern const unsigned char category_char_bits[];
+
+/* Map from ASCII character to character classes the character is in.
+ *
+ * Only character classes defined by CHAR_BIT_* above are present.
+ *
+ * This is an array of bit masks, so, for example, ‘ascii_char_bits[ch] &
+ * CHAR_BIT_ALPHA’ tells you whether character CH is alphabetic or not. */
+extern const unsigned char ascii_char_bits[128];
+
+#define DEFINE_CHAR_TEST(name, bit)					\
+  static inline bool unicode_ ## name (int c) {			\
+    return category_char_bits[char_unicode_category (c)] & (bit);	\
+  }									\
+  static inline bool _ascii_ ## name (int c) {				\
+    return ascii_char_bits[c] & (bit);					\
+  }									\
+  static inline bool name (int c) {					\
+    return (unsigned)c < (sizeof ascii_char_bits / sizeof *ascii_char_bits) ? \
+      _ascii_ ## name (c) : unicode_ ## name (c);			\
+  }
+
+/* For TEST being one of:
+     alphanumericp
+     alphabeticp
+     uppercasep
+     lowercasep
+     graphicp
+     printablep
+   define
+     bool TEST (int c);
+     bool unicode_TEST (int c);
+     bool _ascii_TEST (int c);
+   which test whether C has the given character property.  TEST works for any
+   character, Unicode or not.  unicode_TEST works for any character as well but
+   is potentially slower for ASCII characters (since it requires a Unicode
+   category lookup).  _ascii_TEST works for ASCII characters only; passing
+   it a non-ASCII character is undefined behaviour (out-of-bounds read). */
+
+DEFINE_CHAR_TEST (alphanumericp, CHAR_BIT_ALNUM)
+DEFINE_CHAR_TEST (alphabeticp, CHAR_BIT_ALPHA)
+DEFINE_CHAR_TEST (uppercasep, CHAR_BIT_UPPER)
+DEFINE_CHAR_TEST (lowercasep, CHAR_BIT_LOWER)
+DEFINE_CHAR_TEST (graphicp, CHAR_BIT_GRAPH)
+DEFINE_CHAR_TEST (printablep, CHAR_BIT_PRINT)
+
+#undef DEFINE_CHAR_TEST
+
+
 extern EMACS_INT char_resolve_modifier_mask (EMACS_INT) ATTRIBUTE_CONST;
 extern int char_string (unsigned, unsigned char *);
 extern int string_char (const unsigned char *,
@@ -676,13 +746,6 @@ extern ptrdiff_t lisp_string_width (Lisp_Object, ptrdiff_t,
 extern Lisp_Object Vchar_unify_table;
 extern Lisp_Object string_escape_byte8 (Lisp_Object);
 
-extern bool uppercasep (int);
-extern bool lowercasep (int);
-extern bool alphabeticp (int);
-extern bool alphanumericp (int);
-extern bool graphicp (int);
-extern bool printablep (int);
-
 /* Return a translation table of id number ID.  */
 #define GET_TRANSLATION_TABLE(id) \
   (XCDR (XVECTOR (Vtranslation_table_vector)->contents[(id)]))
diff --git a/src/regex.c b/src/regex.c
index 1917a84..02da1fb 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -313,6 +313,11 @@ enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
 
 /* The rest must handle multibyte characters.  */
 
+# define ISALNUM(c) alphanumericp (c)
+# define ISALPHA(c) alphabeticp (c)
+# define ISUPPER(c) uppercasep (c)
+# define ISLOWER(c) lowercasep (c)
+
 # define ISGRAPH(c) (SINGLE_BYTE_CHAR_P (c)				\
 		     ? (c) > ' ' && !((c) >= 0177 && (c) <= 0240)	\
 		     : graphicp (c))
@@ -321,19 +326,6 @@ enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
 		    ? (c) >= ' ' && !((c) >= 0177 && (c) <= 0237)	\
 		     : printablep (c))
 
-# define ISALNUM(c) (IS_REAL_ASCII (c)			\
-		    ? (((c) >= 'a' && (c) <= 'z')	\
-		       || ((c) >= 'A' && (c) <= 'Z')	\
-		       || ((c) >= '0' && (c) <= '9'))	\
-		    : alphanumericp (c))
-
-# define ISALPHA(c) (IS_REAL_ASCII (c)			\
-		    ? (((c) >= 'a' && (c) <= 'z')	\
-		       || ((c) >= 'A' && (c) <= 'Z'))	\
-		    : alphabeticp (c))
-
-# define ISLOWER(c) lowercasep (c)
-
 # define ISPUNCT(c) (IS_REAL_ASCII (c)				\
 		    ? ((c) > ' ' && (c) < 0177			\
 		       && !(((c) >= 'a' && (c) <= 'z')		\
@@ -343,8 +335,6 @@ enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
 
 # define ISSPACE(c) (SYNTAX (c) == Swhitespace)
 
-# define ISUPPER(c) uppercasep (c)
-
 # define ISWORD(c) (SYNTAX (c) == Sword)
 
 #else /* not emacs */
-- 
2.8.0.rc3.226.g39d4020
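For readers unfamiliar with the DEFINE_CHAR_TEST pattern in character.h, a single macro pastes the property name into three functions.  One is ASCII-only, one is general, and the third is a dispatcher that chooses between them.  Below is a minimal sketch of the same pattern.  It uses hypothetical lookups (ascii_lookup, slow_lookup) and hypothetical predicate names in place of the real tables.

```c
#include <assert.h>
#include <stdbool.h>

enum { BIT_UPPER = 1 << 0, BIT_LOWER = 1 << 1 };

/* Hypothetical fast lookup, valid for ASCII (0..127) only.  */
static unsigned ascii_lookup (int c)
{
  assert (c >= 0 && c < 128);
  if (c >= 'A' && c <= 'Z')
    return BIT_UPPER;
  if (c >= 'a' && c <= 'z')
    return BIT_LOWER;
  return 0;
}

/* Hypothetical general lookup, valid for any character.  Beyond ASCII it
   only knows U+00C0..U+00D6 (upper case) and U+00E0..U+00F6 (lower case).  */
static unsigned slow_lookup (int c)
{
  if (c >= 0 && c < 128)
    return ascii_lookup (c);
  if (c >= 0xC0 && c <= 0xD6)
    return BIT_UPPER;
  if (c >= 0xE0 && c <= 0xF6)
    return BIT_LOWER;
  return 0;
}

/* Generate three predicates per property, as DEFINE_CHAR_TEST does.  */
#define DEFINE_TEST(name, bit)                                      \
  static inline bool slow_##name (int c)                            \
  {                                                                 \
    return slow_lookup (c) & (bit);                                 \
  }                                                                 \
  static inline bool ascii_##name (int c)                           \
  {                                                                 \
    return ascii_lookup (c) & (bit);                                \
  }                                                                 \
  static inline bool name (int c)                                   \
  {                                                                 \
    return (unsigned) c < 128 ? ascii_##name (c) : slow_##name (c); \
  }

DEFINE_TEST (is_upper, BIT_UPPER)
DEFINE_TEST (is_lower, BIT_LOWER)

#undef DEFINE_TEST
```

The sketch spells the ASCII-only variants ascii_NAME rather than _ascii_NAME.  The C standard reserves file-scope identifiers that begin with an underscore, so the patch's _ascii_ prefix may be worth reconsidering.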