unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
	emacs-devel <emacs-devel@gnu.org>
Subject: Re: :alnum: broken?
Date: Wed, 26 Feb 2020 15:10:46 +0100	[thread overview]
Message-ID: <03A37C4B-9FE8-4A25-9851-79BC8265455E@acm.org> (raw)
In-Reply-To: <b1ea62a0-54f2-55f4-de1d-f9be6caf2b82@cs.ucla.edu>

[-- Attachment #1: Type: text/plain, Size: 408 bytes --]

I just made this very mistake while adding a new regexp-error checking feature to xr. Needless to say I now am strongly in favour of turning it into a hard error.

Patch attached. It was written for master, but I would suggest it go in emacs-27.

The error message could be improved. For the benefit of isearch-forward-regexp, it's probably a good idea if it doesn't start or end in a square bracket.


[-- Attachment #2: 0001-Signal-an-error-for-the-regexp-alnum.patch --]
[-- Type: application/octet-stream, Size: 3737 bytes --]

From 014a7a7dce5ae23b8a47dd68eaaef0a5cb985b46 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 26 Feb 2020 14:46:01 +0100
Subject: [PATCH] Signal an error for the regexp "[:alnum:]"

Omitting the extra brackets is a common mistake; see discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-02/msg00215.html

* src/regex-emacs.c (reg_errcode_t, re_error_msgid): Add REG_ECLASSBR.
(regex_compile): Check for the mistake.
* test/src/regex-emacs-tests.el (regexp-invalid): Test.
* etc/NEWS: Announce.
---
 etc/NEWS                      |  5 +++++
 src/regex-emacs.c             | 21 ++++++++++++++++++++-
 test/src/regex-emacs-tests.el |  4 ++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/etc/NEWS b/etc/NEWS
index 54aab1a5b6..404b4b9ebd 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -190,6 +190,11 @@ Emacs now supports bignums so this old glitch is no longer needed.
 'previous-system-time-locale' have been removed, as they were created
 by mistake and were not useful to Lisp code.
 
+** The regexp mistake '[:digit:]' is now an error.
+The correct syntax is '[[:digit:]]'.  Previously, forgetting the extra
+brackets silently resulted in a regexp that did not at all work as
+intended.
+
 \f
 * Lisp Changes in Emacs 28.1
 
diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 694431c95e..2648e1d6ae 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -818,7 +818,8 @@ print_double_string (re_char *where, re_char *string1, ptrdiff_t size1,
   REG_ESIZE,		/* Compiled pattern bigger than 2^16 bytes.  */
   REG_ERPAREN,		/* Unmatched ) or \); not returned from regcomp.  */
   REG_ERANGEX,		/* Range striding over charsets.  */
-  REG_ESIZEBR           /* n or m too big in \{n,m\} */
+  REG_ESIZEBR,          /* n or m too big in \{n,m\} */
+  REG_ECLASSBR,         /* Missing [] around [:class:].  */
 } reg_errcode_t;
 
 static const char *re_error_msgid[] =
@@ -842,6 +843,7 @@ print_double_string (re_char *where, re_char *string1, ptrdiff_t size1,
    [REG_ERPAREN] = "Unmatched ) or \\)",
    [REG_ERANGEX ] = "Range striding over charsets",
    [REG_ESIZEBR ] = "Invalid content of \\{\\}",
+   [REG_ECLASSBR] = "Class syntax is [[:digit:]], not [:digit:]",
   };
 
 /* For 'regs_allocated'.  */
@@ -2000,6 +2002,23 @@ regex_compile (re_char *pattern, ptrdiff_t size,
 
 	    laststart = b;
 
+            /* Check for the mistake of forgetting the extra square brackets,
+               as in "[:alpha:]".  */
+            if (*p == ':')
+              {
+                re_char *q = p + 1;
+                while (q != pend && *q != ']')
+                  {
+                    if (*q == ':')
+                      {
+                        if (q + 1 != pend && q[1] == ']' && q > p + 1)
+                          FREE_STACK_RETURN (REG_ECLASSBR);
+                        break;
+                      }
+                    q++;
+                  }
+              }
+
 	    /* Test '*p == '^' twice, instead of using an if
 	       statement, so we need only one BUF_PUSH.  */
 	    BUF_PUSH (*p == '^' ? charset_not : charset);
diff --git a/test/src/regex-emacs-tests.el b/test/src/regex-emacs-tests.el
index f9372e37b1..d268b97080 100644
--- a/test/src/regex-emacs-tests.el
+++ b/test/src/regex-emacs-tests.el
@@ -803,4 +803,8 @@ regexp-multibyte-unibyte
   (should-not (string-match "å" "\xe5"))
   (should-not (string-match "[å]" "\xe5")))
 
+(ert-deftest regexp-invalid ()
+  (should-error (string-match "[:space:]" "")
+                :type 'invalid-regexp))
+
 ;;; regex-emacs-tests.el ends here
-- 
2.21.1 (Apple Git-122.3)


  parent reply	other threads:[~2020-02-26 14:10 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-21 18:58 :alnum: broken? Stephen Leake
2020-02-21 19:00 ` Paul Eggert
2020-02-21 19:32   ` Mattias Engdegård
2020-02-21 21:28   ` Stephen Leake
2020-02-22  1:09     ` Paul Eggert
2020-02-22  7:48       ` Eli Zaretskii
2020-02-22 21:28         ` Paul Eggert
2020-02-23  3:28           ` Eli Zaretskii
2020-02-23 10:21       ` Mattias Engdegård
2020-02-23 18:13         ` Paul Eggert
2020-02-23 18:27           ` Eli Zaretskii
2020-02-23 19:34             ` Óscar Fuentes
2020-02-23 22:12               ` Drew Adams
2020-02-25  3:57                 ` Richard Stallman
2020-02-25 14:37                   ` Stefan Monnier
2020-02-25 15:45                     ` Drew Adams
2020-02-25 15:40                   ` Drew Adams
2020-02-25  9:33                 ` Andreas Schwab
2020-02-25 13:53                   ` Clément Pit-Claudel
2020-02-25 15:40                   ` Drew Adams
2020-02-23 18:40           ` Mattias Engdegård
2020-02-26 14:10           ` Mattias Engdegård [this message]
2020-02-26 14:54             ` Drew Adams
2020-02-26 15:48             ` Stefan Monnier
2020-02-26 21:00               ` Paul Eggert
2020-02-26 21:18                 ` Mattias Engdegård
2020-02-26 21:24                   ` Clément Pit-Claudel
2020-02-26 22:01                     ` Mattias Engdegård
2020-02-26 22:38                       ` Eli Zaretskii
2020-02-27 17:57                         ` Mattias Engdegård
2020-02-27 23:17                           ` Óscar Fuentes
2020-02-28  8:09                           ` Eli Zaretskii
2020-02-28  8:48                             ` Paul Eggert
2020-02-28 13:11                               ` Eli Zaretskii
2020-02-28 17:41                                 ` Paul Eggert
2020-02-28 20:09                                   ` Eli Zaretskii
2020-02-28 20:25                                     ` Paul Eggert
2020-02-28 20:38                                       ` Eli Zaretskii
2020-02-28 21:04                                         ` Mattias Engdegård
2020-02-28 21:40                                           ` Eli Zaretskii
2020-02-29 11:43                                             ` Mattias Engdegård
2020-02-29 12:07                                               ` Eli Zaretskii
2020-02-29 14:24                                                 ` Stefan Monnier
2020-02-29 14:14                                               ` Stefan Monnier
2020-02-29 17:33                                                 ` Óscar Fuentes
2020-02-29 19:52                                                   ` Stefan Monnier
2020-02-29 21:12                                                     ` Óscar Fuentes
2020-02-29 22:22                                                       ` Marcin Borkowski
2020-02-29 22:34                                                         ` Clément Pit-Claudel
2020-03-01 22:44                                                           ` Marcin Borkowski
2020-03-02  3:07                                                             ` Stefan Monnier
2020-03-02  7:15                                                               ` Marcin Borkowski
2020-03-02  7:41                                                                 ` Emanuel Berg via Emacs development discussions.
2020-03-02 16:14                                                                   ` Drew Adams
2020-03-02 16:51                                                                     ` Emanuel Berg via Emacs development discussions.
2020-03-02  7:56                                                                 ` Joost Kremers
2020-03-02  9:44                                                                   ` Štěpán Němec
2020-03-02 10:43                                                                     ` Joost Kremers
2020-03-02 13:37                                                                 ` Clément Pit-Claudel
2020-03-02 17:03                                                                 ` Stefan Monnier
2020-03-02 18:23                                                                   ` Clément Pit-Claudel
2020-02-29 23:02                                                         ` Andrea Corallo
2020-03-01 22:41                                                           ` Marcin Borkowski
2020-02-29 22:58                                                       ` Stefan Monnier
2020-02-29 23:28                                                         ` Óscar Fuentes
2020-02-27  1:33                       ` Paul Eggert
2020-02-26 16:01             ` Andreas Schwab
2020-02-26 21:06               ` Mattias Engdegård
2020-02-27  8:43                 ` Andreas Schwab
2020-02-27 18:05                   ` Mattias Engdegård
2020-02-22  9:09     ` Eli Zaretskii
2020-02-23  3:49     ` Richard Stallman
2020-02-23  7:51       ` Paul Eggert
2020-02-23 15:07         ` Eli Zaretskii
2020-02-21 19:01 ` Noam Postavsky
2020-02-21 19:04 ` Andreas Schwab
     [not found] <<86wo8flqct.fsf@stephe-leake.org>
     [not found] ` <<f89e340f-9a19-50e1-4421-57fd3e235548@cs.ucla.edu>
     [not found]   ` <<86sgj3ljf0.fsf@stephe-leake.org>
     [not found]     ` <<E1j5iGS-0000r6-Id@fencepost.gnu.org>
     [not found]       ` <<18bd2b64-1c2f-055f-4fa0-092bdb1da531@cs.ucla.edu>
     [not found]         ` <<838sktibpu.fsf@gnu.org>
2020-02-23 16:52           ` Drew Adams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03A37C4B-9FE8-4A25-9851-79BC8265455E@acm.org \
    --to=mattiase@acm.org \
    --cc=eggert@cs.ucla.edu \
    --cc=emacs-devel@gnu.org \
    --cc=stephen_leake@stephe-leake.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).