From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Michal Nazarewicz <mina86@mina86.com>
Newsgroups: gmane.emacs.bugs
Subject: bug#24071: [PATCH] Refactor regex character class parsing in [:name:]
Date: Tue, 26 Jul 2016 00:54:05 +0200
Message-ID: <1469487245-11126-1-git-send-email-mina86@mina86.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1469487330 3368 80.91.229.3 (25 Jul 2016 22:55:30 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 25 Jul 2016 22:55:30 +0000 (UTC)
To: 24071@debbugs.gnu.org
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Jul 26 00:55:18 2016
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1bRomH-000064-UH
	for geb-bug-gnu-emacs@m.gmane.org; Tue, 26 Jul 2016 00:55:14 +0200
Original-Received: from localhost ([::1]:35728 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1bRomH-0007pM-84
	for geb-bug-gnu-emacs@m.gmane.org; Mon, 25 Jul 2016 18:55:13 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46896)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1bRomA-0007lV-3L
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:55:08 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1bRom6-000671-Lx
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:55:05 -0400
Original-Received: from debbugs.gnu.org ([208.118.235.43]:53196)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1bRom6-00066x-I6
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:55:02 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1bRom6-00051i-Bv
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:55:02 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Michal Nazarewicz <mina86@mina86.com>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Mon, 25 Jul 2016 22:55:02 +0000
Resent-Message-ID: <handler.24071.B.146948727419279@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 24071
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
X-Debbugs-Original-To: bug-gnu-emacs@gnu.org
Original-Received: via spool by submit@debbugs.gnu.org id=B.146948727419279
	(code B ref -1); Mon, 25 Jul 2016 22:55:02 +0000
Original-Received: (at submit) by debbugs.gnu.org; 25 Jul 2016 22:54:34 +0000
Original-Received: from localhost ([127.0.0.1]:37300 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1bRold-00050t-Nd
	for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:34 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:55824)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <mpn@google.com>) id 1bRolb-00050f-3C
	for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:31 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mpn@google.com>) id 1bRolT-0005sH-Dg
	for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:25 -0400
Original-Received: from lists.gnu.org ([2001:4830:134:3::11]:42783)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mpn@google.com>) id 1bRolT-0005r0-AK
	for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:23 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46781)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mpn@google.com>) id 1bRolP-0007ax-Pr
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:54:21 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mpn@google.com>) id 1bRolN-0005oH-Ap
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:54:19 -0400
Original-Received: from mail-wm0-x231.google.com ([2a00:1450:400c:c09::231]:38653)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mpn@google.com>) id 1bRolM-0005nJ-RY
	for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:54:17 -0400
Original-Received: by mail-wm0-x231.google.com with SMTP id o80so177033260wme.1
	for <bug-gnu-emacs@gnu.org>; Mon, 25 Jul 2016 15:54:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113;
	h=sender:from:to:subject:date:message-id:mime-version
	:content-transfer-encoding;
	bh=FcEjFQ+83mh++7v+kFomi+vajanEptI4Ru6908zsEqY=;
	b=ZsT0HoecG0SMNwu/vqCFqGH7O25ioJb5b2PBoVK18dOYTl0vbwRoFF++hLC2ohJjN3
	6J3PxK8FBzfshgeiv2SRFUL7DyBdKDS5CU17GH9Gh/KGaplSgg4otCHyf7lHxwWipk7E
	bSxRblcVS63U+kc1frwBoou5AMz7RqZDNLcHV/df6HOL9yVto+GgRV0BZuIA/555iuGD
	eUhPdq/adMKFU+kALXOzT3oHdD3geYYkLJKDuJPVao+l1v03WMnlutz2VWjGEDXNWyOq
	IB7orx1syhLsUnA+t6ozTst3HwMl+Xn/btohypqAoFNh7cbphmA61s4jSHhLyjvYjP/4
	2wxA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:sender:from:to:subject:date:message-id
	:mime-version:content-transfer-encoding;
	bh=FcEjFQ+83mh++7v+kFomi+vajanEptI4Ru6908zsEqY=;
	b=E63BUVx4JXhyTDcVwAh8EmUNeGmPh//Pj/LEdoivl5/crch/5k4v3T6Ikh14zpxTJa
	ERjCirGVsHWpX7aDKNYe/hY3mfDMLu0PQHfhl26QZWWxAXjZA/zPWw1nXmL4MINBx6Ik
	ro/K0FXwy+Hoy3Ei5RM1RU2b1LA3aUL6XsoYPE6QLlA2DpjPUzAL9Y9ydaYKYYyYWoW2
	PUhzEJKmhuT7imJCXObN585iVYarH4I8pg6Mxp16K2774FJvMYS2oP/0OBtGizAN56K3
	KsBYooI/wrjv+WhEUbbNN8twfvUrWB3WXz69FGr6GJi9jOgdnnk/m1/eHCpM063DYCap
	Vx6A==
X-Gm-Message-State: ALyK8tJq6QZWepkuAFBhTuEH9lhyKtwAPOraNPGGE18G7mGKdjsDeChS/r5U5XmKo301DJc4
X-Received: by 10.28.199.4 with SMTP id x4mr40468597wmf.70.1469487254383;
	Mon, 25 Jul 2016 15:54:14 -0700 (PDT)
Original-Received: from mpn.zrh.corp.google.com ([172.16.113.135])
	by smtp.gmail.com with ESMTPSA id
	c8sm17983949wjm.19.2016.07.25.15.54.12 for <bug-gnu-emacs@gnu.org>
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Mon, 25 Jul 2016 15:54:12 -0700 (PDT)
Original-Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942)
	id D022B1E35E8; Tue, 26 Jul 2016 00:54:11 +0200 (CEST)
X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs/>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
	<bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:121541
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/121541>

re_wctype function is used in three separate places and in all of
those places almost exact code extracting the name from [:name:]
surrounds it.  Furthermore, re_wctype requires a NUL-terminated
string, so the name of the character class is copied to a temporary
buffer.

The code duplication and unnecessary memory copying can be avoided by
pushing the responsibility of parsing the whole [:name:] sequence to
the function.

Furthermore, since now the function has access to the length of the
character class name (since it’s doing the parsing), it can take
advantage of that information in skipping some string comparisons and
using a constant-length memcmp instead of strcmp which needs to take
care of NUL bytes.

* src/regex.c (re_wctype): Delete function.  Replace it with:
(re_wctype_parse): New function which parses a whole [:name:] string
and returns a RECC_* constant or -1 if the string is not of [:name:]
format.
(regex_compile): Use re_wctype_parse.
* src/syntax.c (skip_chars): Use re_wctype_parse.
---
 src/regex.c  | 312 +++++++++++++++++++++++++++++------------------------------
 src/regex.h  |  14 +--
 src/syntax.c |  96 +++++-------------
 3 files changed, 183 insertions(+), 239 deletions(-)

 Unless there is some opposition to this patch, I’ll submit it in
 a week or so.

diff --git a/src/regex.c b/src/regex.c
index 297bf71..50e6862 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -1969,29 +1969,98 @@ struct range_table_work_area
 
 #if ! WIDE_CHAR_SUPPORT
 
-/* Map a string to the char class it names (if any).  */
+/* Parse a character class, i.e. string such as "[:name:]".  *strp
+   points to the string to be parsed and limit is length, in bytes, of
+   that string.
+
+   If *strp point to a string that begins with "[:name:]", where name is
+   a non-empty sequence of lower case letters, *strp will be advanced past the
+   closing square bracket and RECC_* constant which maps to the name will be
+   returned.  If name is not a valid character class name zero, or RECC_ERROR,
+   is returned.
+
+   Otherwise, if *strp doesn’t begin with "[:name:]", -1 is returned.
+
+   The function can be used on ASCII as well as multitude strings.
+ */
 re_wctype_t
-re_wctype (const_re_char *str)
+re_wctype_parse (const unsigned char **strp, unsigned limit)
 {
-  const char *string = (const char *) str;
-  if      (STREQ (string, "alnum"))	return RECC_ALNUM;
-  else if (STREQ (string, "alpha"))	return RECC_ALPHA;
-  else if (STREQ (string, "word"))	return RECC_WORD;
-  else if (STREQ (string, "ascii"))	return RECC_ASCII;
-  else if (STREQ (string, "nonascii"))	return RECC_NONASCII;
-  else if (STREQ (string, "graph"))	return RECC_GRAPH;
-  else if (STREQ (string, "lower"))	return RECC_LOWER;
-  else if (STREQ (string, "print"))	return RECC_PRINT;
-  else if (STREQ (string, "punct"))	return RECC_PUNCT;
-  else if (STREQ (string, "space"))	return RECC_SPACE;
-  else if (STREQ (string, "upper"))	return RECC_UPPER;
-  else if (STREQ (string, "unibyte"))	return RECC_UNIBYTE;
-  else if (STREQ (string, "multibyte"))	return RECC_MULTIBYTE;
-  else if (STREQ (string, "digit"))	return RECC_DIGIT;
-  else if (STREQ (string, "xdigit"))	return RECC_XDIGIT;
-  else if (STREQ (string, "cntrl"))	return RECC_CNTRL;
-  else if (STREQ (string, "blank"))	return RECC_BLANK;
-  else return 0;
+  const char *beg = (const char *)*strp, *end, *it;
+
+  if (limit < 5 || beg[0] != '[' || beg[1] != ':')
+    return -1;
+
+  end = beg + limit - 2;  /* ‘- 2’ for the closing ":]" */
+  beg += 2;  /* skip opening "[:" */
+  it = beg;
+  while (it != end && *it >= 'a' && *it <= 'z')
+    ++it;
+  if (it[0] != ':' || it[1] != ']')
+    return -1;
+
+  *strp = (const unsigned char *)(it + 2);
+
+  /* Sort tests in the length=five case by frequency the classes to minimise
+     number of times we fail the comparison.  The frequencies of character class
+     names used in Emacs sources as of 2016-07-15:
+
+     $ find \( -name \*.c -o -name \*.el \) -exec grep -h '\[:[a-z]*:]' {} + |
+           sed 's/]/]\n/g' |grep -o '\[:[a-z]*:]' |sort |uniq -c |sort -nr
+         213 [:alnum:]
+         104 [:alpha:]
+          62 [:space:]
+          39 [:digit:]
+          36 [:blank:]
+          26 [:upper:]
+          24 [:word:]
+          21 [:lower:]
+          10 [:punct:]
+          10 [:ascii:]
+           9 [:xdigit:]
+           4 [:nonascii:]
+           4 [:graph:]
+           2 [:print:]
+           2 [:cntrl:]
+           1 [:ff:]
+
+     If you update this list, consider also updating chain of or’ed conditions
+     in execute_charset function.
+   */
+
+  switch (it - beg) {
+  case 4:
+    if (!memcmp (beg, "word", 4))      return RECC_WORD;
+    break;
+  case 5:
+    if (!memcmp (beg, "alnum", 5))     return RECC_ALNUM;
+    if (!memcmp (beg, "alpha", 5))     return RECC_ALPHA;
+    if (!memcmp (beg, "space", 5))     return RECC_SPACE;
+    if (!memcmp (beg, "digit", 5))     return RECC_DIGIT;
+    if (!memcmp (beg, "blank", 5))     return RECC_BLANK;
+    if (!memcmp (beg, "upper", 5))     return RECC_UPPER;
+    if (!memcmp (beg, "lower", 5))     return RECC_LOWER;
+    if (!memcmp (beg, "punct", 5))     return RECC_PUNCT;
+    if (!memcmp (beg, "ascii", 5))     return RECC_ASCII;
+    if (!memcmp (beg, "graph", 5))     return RECC_GRAPH;
+    if (!memcmp (beg, "print", 5))     return RECC_PRINT;
+    if (!memcmp (beg, "cntrl", 5))     return RECC_CNTRL;
+    break;
+  case 6:
+    if (!memcmp (beg, "xdigit", 6))    return RECC_XDIGIT;
+    break;
+  case 7:
+    if (!memcmp (beg, "unibyte", 7))   return RECC_UNIBYTE;
+    break;
+  case 8:
+    if (!memcmp (beg, "nonascii", 8))  return RECC_NONASCII;
+    break;
+  case 9:
+    if (!memcmp (beg, "multibyte", 9)) return RECC_MULTIBYTE;
+    break;
+  }
+
+  return RECC_ERROR;
 }
 
 /* True if CH is in the char class CC.  */
@@ -2776,10 +2845,74 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax,
 	      {
 		boolean escaped_char = false;
 		const unsigned char *p2 = p;
+		re_wctype_t cc;
 		re_wchar_t ch;
 
 		if (p == pend) FREE_STACK_RETURN (REG_EBRACK);
 
+		/* See if we're at the beginning of a possible character
+		   class.  */
+		if (syntax & RE_CHAR_CLASSES &&
+		    (cc = re_wctype_parse(&p, pend - p)) != -1)
+		  {
+		    if (cc == 0)
+		      FREE_STACK_RETURN (REG_ECTYPE);
+
+		    if (p == pend)
+		      FREE_STACK_RETURN (REG_EBRACK);
+
+#ifndef emacs
+		    for (ch = 0; ch < (1 << BYTEWIDTH); ++ch)
+		      if (re_iswctype (btowc (ch), cc))
+			{
+			  c = TRANSLATE (ch);
+			  if (c < (1 << BYTEWIDTH))
+			    SET_LIST_BIT (c);
+			}
+#else  /* emacs */
+		    /* Most character classes in a multibyte match just set
+		       a flag.  Exceptions are is_blank, is_digit, is_cntrl, and
+		       is_xdigit, since they can only match ASCII characters.
+		       We don't need to handle them for multibyte.  They are
+		       distinguished by a negative wctype.  */
+
+		    /* Setup the gl_state object to its buffer-defined value.
+		       This hardcodes the buffer-global syntax-table for ASCII
+		       chars, while the other chars will obey syntax-table
+		       properties.  It's not ideal, but it's the way it's been
+		       done until now.  */
+		    SETUP_BUFFER_SYNTAX_TABLE ();
+
+		    for (ch = 0; ch < 256; ++ch)
+		      {
+			c = RE_CHAR_TO_MULTIBYTE (ch);
+			if (! CHAR_BYTE8_P (c)
+			    && re_iswctype (c, cc))
+			  {
+			    SET_LIST_BIT (ch);
+			    c1 = TRANSLATE (c);
+			    if (c1 == c)
+			      continue;
+			    if (ASCII_CHAR_P (c1))
+			      SET_LIST_BIT (c1);
+			    else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0)
+			      SET_LIST_BIT (c1);
+			  }
+		      }
+		    SET_RANGE_TABLE_WORK_AREA_BIT
+		      (range_table_work, re_wctype_to_bit (cc));
+#endif	/* emacs */
+		    /* In most cases the matching rule for char classes only
+		       uses the syntax table for multibyte chars, so that the
+		       content of the syntax-table is not hardcoded in the
+		       range_table.  SPACE and WORD are the two exceptions.  */
+		    if ((1 << cc) & ((1 << RECC_SPACE) | (1 << RECC_WORD)))
+		      bufp->used_syntax = 1;
+
+		    /* Repeat the loop. */
+		    continue;
+		  }
+
 		/* Don't translate yet.  The range TRANSLATE(X..Y) cannot
 		   always be determined from TRANSLATE(X) and TRANSLATE(Y)
 		   So the translation is done later in a loop.  Example:
@@ -2803,119 +2936,6 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax,
 		      break;
 		  }
 
-		/* See if we're at the beginning of a possible character
-		   class.  */
-
-		if (!escaped_char &&
-		    syntax & RE_CHAR_CLASSES && c == '[' && *p == ':')
-		  {
-		    /* Leave room for the null.  */
-		    unsigned char str[CHAR_CLASS_MAX_LENGTH + 1];
-		    const unsigned char *class_beg;
-
-		    PATFETCH (c);
-		    c1 = 0;
-		    class_beg = p;
-
-		    /* If pattern is `[[:'.  */
-		    if (p == pend) FREE_STACK_RETURN (REG_EBRACK);
-
-		    for (;;)
-		      {
-		        PATFETCH (c);
-		        if ((c == ':' && *p == ']') || p == pend)
-		          break;
-			if (c1 < CHAR_CLASS_MAX_LENGTH)
-			  str[c1++] = c;
-			else
-			  /* This is in any case an invalid class name.  */
-			  str[0] = '\0';
-		      }
-		    str[c1] = '\0';
-
-		    /* If isn't a word bracketed by `[:' and `:]':
-		       undo the ending character, the letters, and
-		       leave the leading `:' and `[' (but set bits for
-		       them).  */
-		    if (c == ':' && *p == ']')
-		      {
-			re_wctype_t cc = re_wctype (str);
-
-			if (cc == 0)
-			  FREE_STACK_RETURN (REG_ECTYPE);
-
-                        /* Throw away the ] at the end of the character
-                           class.  */
-                        PATFETCH (c);
-
-                        if (p == pend) FREE_STACK_RETURN (REG_EBRACK);
-
-#ifndef emacs
-			for (ch = 0; ch < (1 << BYTEWIDTH); ++ch)
-			  if (re_iswctype (btowc (ch), cc))
-			    {
-			      c = TRANSLATE (ch);
-			      if (c < (1 << BYTEWIDTH))
-				SET_LIST_BIT (c);
-			    }
-#else  /* emacs */
-			/* Most character classes in a multibyte match
-			   just set a flag.  Exceptions are is_blank,
-			   is_digit, is_cntrl, and is_xdigit, since
-			   they can only match ASCII characters.  We
-			   don't need to handle them for multibyte.
-			   They are distinguished by a negative wctype.  */
-
-			/* Setup the gl_state object to its buffer-defined
-			   value.  This hardcodes the buffer-global
-			   syntax-table for ASCII chars, while the other chars
-			   will obey syntax-table properties.  It's not ideal,
-			   but it's the way it's been done until now.  */
-			SETUP_BUFFER_SYNTAX_TABLE ();
-
-			for (ch = 0; ch < 256; ++ch)
-			  {
-			    c = RE_CHAR_TO_MULTIBYTE (ch);
-			    if (! CHAR_BYTE8_P (c)
-				&& re_iswctype (c, cc))
-			      {
-				SET_LIST_BIT (ch);
-				c1 = TRANSLATE (c);
-				if (c1 == c)
-				  continue;
-				if (ASCII_CHAR_P (c1))
-				  SET_LIST_BIT (c1);
-				else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0)
-				  SET_LIST_BIT (c1);
-			      }
-			  }
-			SET_RANGE_TABLE_WORK_AREA_BIT
-			  (range_table_work, re_wctype_to_bit (cc));
-#endif	/* emacs */
-			/* In most cases the matching rule for char classes
-			   only uses the syntax table for multibyte chars,
-			   so that the content of the syntax-table is not
-			   hardcoded in the range_table.  SPACE and WORD are
-			   the two exceptions.  */
-			if ((1 << cc) & ((1 << RECC_SPACE) | (1 << RECC_WORD)))
-			  bufp->used_syntax = 1;
-
-			/* Repeat the loop. */
-			continue;
-		      }
-		    else
-		      {
-			/* Go back to right after the "[:".  */
-			p = class_beg;
-			SET_LIST_BIT ('[');
-
-			/* Because the `:' may start the range, we
-			   can't simply set bit and repeat the loop.
-			   Instead, just set it to C and handle below.  */
-			c = ':';
-		      }
-		  }
-
 		if (p < pend && p[0] == '-' && p[1] != ']')
 		  {
 
@@ -4659,28 +4679,8 @@ execute_charset (const_re_char **pp, unsigned c, unsigned corig, bool unibyte)
       re_wchar_t range_start, range_end;
 
   /* Sort tests by the most commonly used classes with some adjustment to which
-     tests are easiest to perform.  Frequencies of character class names used in
-     Emacs sources as of 2016-07-15:
-
-     $ find \( -name \*.c -o -name \*.el \) -exec grep -h '\[:[a-z]*:]' {} + |
-           sed 's/]/]\n/g' |grep -o '\[:[a-z]*:]' |sort |uniq -c |sort -nr
-         213 [:alnum:]
-         104 [:alpha:]
-          62 [:space:]
-          39 [:digit:]
-          36 [:blank:]
-          26 [:upper:]
-          24 [:word:]
-          21 [:lower:]
-          10 [:punct:]
-          10 [:ascii:]
-           9 [:xdigit:]
-           4 [:nonascii:]
-           4 [:graph:]
-           2 [:print:]
-           2 [:cntrl:]
-           1 [:ff:]
-   */
+     tests are easiest to perform.  Take a look at comment in re_wctype_parse
+     for table with frequencies of character class names. */
 
       if ((class_bits & BIT_MULTIBYTE) ||
 	  (class_bits & BIT_ALNUM && ISALNUM (c)) ||
diff --git a/src/regex.h b/src/regex.h
index 817167a..01b659a 100644
--- a/src/regex.h
+++ b/src/regex.h
@@ -585,25 +585,13 @@ extern void regfree (regex_t *__preg);
 /* Solaris 2.5 has a bug: <wchar.h> must be included before <wctype.h>.  */
 # include <wchar.h>
 # include <wctype.h>
-#endif
 
-#if WIDE_CHAR_SUPPORT
-/* The GNU C library provides support for user-defined character classes
-   and the functions from ISO C amendment 1.  */
-# ifdef CHARCLASS_NAME_MAX
-#  define CHAR_CLASS_MAX_LENGTH CHARCLASS_NAME_MAX
-# else
-/* This shouldn't happen but some implementation might still have this
-   problem.  Use a reasonable default value.  */
-#  define CHAR_CLASS_MAX_LENGTH 256
-# endif
 typedef wctype_t re_wctype_t;
 typedef wchar_t re_wchar_t;
 # define re_wctype wctype
 # define re_iswctype iswctype
 # define re_wctype_to_bit(cc) 0
 #else
-# define CHAR_CLASS_MAX_LENGTH  9 /* Namely, `multibyte'.  */
 # ifndef emacs
 #  define btowc(c) c
 # endif
@@ -621,7 +609,7 @@ typedef enum { RECC_ERROR = 0,
 } re_wctype_t;
 
 extern char re_iswctype (int ch,    re_wctype_t cc);
-extern re_wctype_t re_wctype (const unsigned char* str);
+extern re_wctype_t re_wctype_parse (const unsigned char **strp, unsigned limit);
 
 typedef int re_wchar_t;
 
diff --git a/src/syntax.c b/src/syntax.c
index f8d987b..667de40 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1691,44 +1691,22 @@ skip_chars (bool forwardp, Lisp_Object string, Lisp_Object lim,
       /* At first setup fastmap.  */
       while (i_byte < size_byte)
 	{
-	  c = str[i_byte++];
-
-	  if (handle_iso_classes && c == '['
-	      && i_byte < size_byte
-	      && str[i_byte] == ':')
+	  if (handle_iso_classes)
 	    {
-	      const unsigned char *class_beg = str + i_byte + 1;
-	      const unsigned char *class_end = class_beg;
-	      const unsigned char *class_limit = str + size_byte - 2;
-	      /* Leave room for the null.  */
-	      unsigned char class_name[CHAR_CLASS_MAX_LENGTH + 1];
-	      re_wctype_t cc;
-
-	      if (class_limit - class_beg > CHAR_CLASS_MAX_LENGTH)
-		class_limit = class_beg + CHAR_CLASS_MAX_LENGTH;
-
-	      while (class_end < class_limit
-		     && *class_end >= 'a' && *class_end <= 'z')
-		class_end++;
-
-	      if (class_end == class_beg
-		  || *class_end != ':' || class_end[1] != ']')
-		goto not_a_class_name;
-
-	      memcpy (class_name, class_beg, class_end - class_beg);
-	      class_name[class_end - class_beg] = 0;
-
-	      cc = re_wctype (class_name);
+	      const unsigned char *ch = str + i_byte;
+	      re_wctype_t cc = re_wctype_parse (&ch, size_byte - i_byte);
 	      if (cc == 0)
 		error ("Invalid ISO C character class");
-
-	      iso_classes = Fcons (make_number (cc), iso_classes);
-
-	      i_byte = class_end + 2 - str;
-	      continue;
+	      if (cc != -1)
+		{
+		  iso_classes = Fcons (make_number (cc), iso_classes);
+		  i_byte = ch - str;
+		  continue;
+		}
 	    }
 
-	not_a_class_name:
+	  c = str[i_byte++];
+
 	  if (c == '\\')
 	    {
 	      if (i_byte == size_byte)
@@ -1808,54 +1786,32 @@ skip_chars (bool forwardp, Lisp_Object string, Lisp_Object lim,
       while (i_byte < size_byte)
 	{
 	  int leading_code = str[i_byte];
-	  c = STRING_CHAR_AND_LENGTH (str + i_byte, len);
-	  i_byte += len;
 
-	  if (handle_iso_classes && c == '['
-	      && i_byte < size_byte
-	      && STRING_CHAR (str + i_byte) == ':')
+	  if (handle_iso_classes)
 	    {
-	      const unsigned char *class_beg = str + i_byte + 1;
-	      const unsigned char *class_end = class_beg;
-	      const unsigned char *class_limit = str + size_byte - 2;
-	      /* Leave room for the null.	 */
-	      unsigned char class_name[CHAR_CLASS_MAX_LENGTH + 1];
-	      re_wctype_t cc;
-
-	      if (class_limit - class_beg > CHAR_CLASS_MAX_LENGTH)
-		class_limit = class_beg + CHAR_CLASS_MAX_LENGTH;
-
-	      while (class_end < class_limit
-		     && *class_end >= 'a' && *class_end <= 'z')
-		class_end++;
-
-	      if (class_end == class_beg
-		  || *class_end != ':' || class_end[1] != ']')
-		goto not_a_class_name_multibyte;
-
-	      memcpy (class_name, class_beg, class_end - class_beg);
-	      class_name[class_end - class_beg] = 0;
-
-	      cc = re_wctype (class_name);
+	      const unsigned char *ch = str + i_byte;
+	      re_wctype_t cc = re_wctype_parse (&ch, size_byte - i_byte);
 	      if (cc == 0)
 		error ("Invalid ISO C character class");
-
-	      iso_classes = Fcons (make_number (cc), iso_classes);
-
-	      i_byte = class_end + 2 - str;
-	      continue;
+	      if (cc != -1)
+		{
+		  iso_classes = Fcons (make_number (cc), iso_classes);
+		  i_byte = ch - str;
+		  continue;
+		}
 	    }
 
-	not_a_class_name_multibyte:
-	  if (c == '\\')
+	  if (leading_code== '\\')
 	    {
-	      if (i_byte == size_byte)
+	      if (++i_byte == size_byte)
 		break;
 
 	      leading_code = str[i_byte];
-	      c = STRING_CHAR_AND_LENGTH (str + i_byte, len);
-	      i_byte += len;
 	    }
+	  c = STRING_CHAR_AND_LENGTH (str + i_byte, len);
+	  i_byte += len;
+
+
 	  /* Treat `-' as range character only if another character
 	     follows.  */
 	  if (i_byte + 1 < size_byte
-- 
2.8.0.rc3.226.g39d4020