From mboxrd@z Thu Jan 1 00:00:00 1970
From: Noam Postavsky
Newsgroups: gmane.emacs.bugs
Subject: bug#24914: 24.5; isearch-regexp: wrong error message
Date: Tue, 05 Dec 2017 21:52:52 -0500
Message-ID: <871sk8h8jf.fsf@users.sourceforge.net>
References: <7c208ac0-8aa2-4db8-a38d-760f91c50500@default>
 <87h8t7ix7m.fsf@users.sourceforge.net>
 <87d13visrh.fsf@users.sourceforge.net>
 <87shcrgg8g.fsf@users.sourceforge.net>
 <87h8t6gegl.fsf@users.sourceforge.net>
 <878tehhlwo.fsf@users.sourceforge.net>
 <3a58fdaf-10c0-42e6-8c74-753ce24b969e@default>
 <87609lgv8r.fsf@users.sourceforge.net>
In-Reply-To: (Drew Adams's message of "Tue, 5 Dec 2017 07:31:21 -0800 (PST)")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.90 (gnu/linux)
Cc: 24914@debbugs.gnu.org
To: Drew Adams
X-GNU-PR-Message: followup 24914
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: confirmed
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:140739
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

tags 24914 + patch
quit

Drew Adams writes:

> Such error text does not, generally and directly, tell
> users that the input is incomplete.
> Users very familiar
> with regexps might understand that such a msg implies
> that input is incomplete, but not everyone will get that.

Hmm, I hadn't considered that possibility, but I will allow that it
*could* be a symptom of my being overly familiar with regexp syntax.

> You apparently think there is never any value in
> telling users that the input pattern is not
> complete as a regexp.  I disagree.  We apparently
> agree that at least in some cases the specific
> regexp-invalidity message is more helpful.

Okay, I've looked at the error messages a bit more closely, and I
believe all the "Invalid ..." ones should never be considered
"incomplete".  See commit message for details.

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline;
 filename=0001-Raise-limit-of-regexp-repetition-Bug-24914.patch
Content-Description: patch

>From 1d32f4d28521a143c333ef4cc125419661e3a3a9 Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Sat, 2 Dec 2017 19:01:54 -0500
Subject: [PATCH] Raise limit of regexp repetition (Bug#24914)

* src/regex.h (RE_DUP_MAX): Raise limit to 2^31-1.
* etc/NEWS: Announce it.
* doc/lispref/searching.texi (Regexp Backslash): Document it.

* src/regex.h (reg_errcode_t): Add REG_ESIZEBR code.
* src/regex.c (re_error_msgid): Add corresponding entry.
(GET_INTERVAL_COUNT): Return it instead of the more generic REG_BADBR
when encountering a repetition greater than RE_DUP_MAX.

* lisp/isearch.el (isearch-search): Don't convert errors starting with
"Invalid" into "incomplete".  Such errors are not incomplete, in the
sense that they cannot be corrected by appending more characters to the
end of the regexp.  The affected error messages are:

- REG_BADPAT "Invalid regular expression"
  - \\(?X:\\) where X is not a legal group number
  - \\_X where X is not < or >
- REG_ECOLLATE "Invalid collation character"
  - There is no code to throw this.
- REG_ECTYPE "Invalid character class name"
  - [[:foo:] where foo is not a valid class name
- REG_ESUBREG "Invalid back reference"
  - \N where N is referenced before matching group N
- REG_BADBR "Invalid content of \\{\\}"
  - \\{N,M\\} where N < 0, M < N, M or N larger than max
  - \\{NX where X is not a digit or backslash
  - \\{N\\X where X is not a }
- REG_ERANGE "Invalid range end"
  - There is no code to throw this.
- REG_BADRPT "Invalid preceding regular expression"
  - We never throw this.  It would usually indicate a "*" with no
    preceding regexp text, but Emacs allows that to match a literal
    "*".
---
 doc/lispref/searching.texi | 9 ++++++++-
 etc/NEWS                   | 8 ++++++++
 lisp/isearch.el            | 2 +-
 src/regex.c                | 5 +++--
 src/regex.h                | 9 ++++++---
 5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 755fa554bb..724d66b5e3 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -639,7 +639,14 @@ Regexp Backslash
 is a more general postfix operator that specifies repetition with a
 minimum of @var{m} repeats and a maximum of @var{n} repeats.  If @var{m}
 is omitted, the minimum is 0; if @var{n} is omitted, there is no
-maximum.
+maximum.  For both forms, @var{m} and @var{n}, if specified, may be no
+larger than
+@ifnottex
+2**31 @minus{} 1
+@end ifnottex
+@tex
+@math{2^{31}-1}
+@end tex
 
 For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
 @samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
diff --git a/etc/NEWS b/etc/NEWS
index 4ccf468693..579cad058e 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -509,6 +509,14 @@ instead.
 ** The new user option 'arabic-shaper-ZWNJ-handling' controls how to
 handle ZWNJ in Arabic text rendering.
 
++++
+** The limit on repetitions in regexps has been raised to 2^31-1.
+It was previously undocumented and limited to 2^15-1.  For example,
+the following regular expression was previously invalid, but is now
+accepted:
+
+  x\{32768\}
+
 
 * Editing Changes in Emacs 26.1
 
diff --git a/lisp/isearch.el b/lisp/isearch.el
index 13fa97ea71..093185a096 100644
--- a/lisp/isearch.el
+++ b/lisp/isearch.el
@@ -2851,7 +2851,7 @@ isearch-search
      (setq isearch-error (car (cdr lossage)))
      (cond
       ((string-match
-        "\\`Premature \\|\\`Unmatched \\|\\`Invalid "
+        "\\`Premature \\|\\`Unmatched "
         isearch-error)
        (setq isearch-error "incomplete input"))
       ((and (not isearch-regexp)
diff --git a/src/regex.c b/src/regex.c
index 330f2f78a8..ab74f457d4 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -1200,7 +1200,8 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
     gettext_noop ("Premature end of regular expression"), /* REG_EEND */
     gettext_noop ("Regular expression too big"), /* REG_ESIZE */
     gettext_noop ("Unmatched ) or \\)"), /* REG_ERPAREN */
-    gettext_noop ("Range striding over charsets") /* REG_ERANGEX */
+    gettext_noop ("Range striding over charsets"), /* REG_ERANGEX */
+    gettext_noop ("Invalid content of \\{\\}, repetitions too big") /* REG_ESIZEBR */
   };
 
 /* Whether to allocate memory during matching.  */
@@ -1921,7 +1922,7 @@ while (REMAINING_AVAIL_SLOTS <= space) { \
             if (num < 0)                                            \
               num = 0;                                              \
             if (RE_DUP_MAX / 10 - (RE_DUP_MAX % 10 < c - '0') < num) \
-              FREE_STACK_RETURN (REG_BADBR);                        \
+              FREE_STACK_RETURN (REG_ESIZEBR);                      \
             num = num * 10 + c - '0';                               \
             if (p == pend)                                          \
               FREE_STACK_RETURN (REG_EBRACE);                       \
diff --git a/src/regex.h b/src/regex.h
index 9fa8356011..b829848586 100644
--- a/src/regex.h
+++ b/src/regex.h
@@ -270,8 +270,10 @@
 #ifdef RE_DUP_MAX
 # undef RE_DUP_MAX
 #endif
-/* If sizeof(int) == 2, then ((1 << 15) - 1) overflows.  */
-#define RE_DUP_MAX (0x7fff)
+/* If sizeof(int) == 4, then ((1 << 31) - 1) overflows.  This used to
+   be limited to 0x7fff, but Emacs never supported 16 bit platforms
+   anyway.  */
+#define RE_DUP_MAX (0x7fffffff)
 
 
 /* POSIX `cflags' bits (i.e., information for `regcomp').  */
@@ -337,7 +339,8 @@
   REG_EEND,             /* Premature end.  */
   REG_ESIZE,            /* Compiled pattern bigger than 2^16 bytes.  */
   REG_ERPAREN,          /* Unmatched ) or \); not returned from regcomp.  */
-  REG_ERANGEX           /* Range striding over charsets.  */
+  REG_ERANGEX,          /* Range striding over charsets.  */
+  REG_ESIZEBR           /* n or m too big in \{n,m\} */
 } reg_errcode_t;
 
 /* This data structure represents a compiled pattern.  Before calling
-- 
2.11.0

--=-=-=--
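
The overflow guard in GET_INTERVAL_COUNT may be easier to follow outside the macro. What follows is only a sketch in plain C, not code from regex.c: the helper name `sketch_parse_count` and the constant `SKETCH_DUP_MAX` are hypothetical stand-ins, and the sketch assumes a 32-bit `int` and a count made of ASCII digits only. It tests each digit before multiplying, so `num * 10 + digit` is never computed when it would exceed the maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the patched RE_DUP_MAX (assumption: int is 32 bits).  */
#define SKETCH_DUP_MAX 0x7fffffff

/* Hypothetical helper, not part of regex.c.  Accumulate the leading
   decimal digits of S into *OUT.  Return 0 on success, or -1 if the
   count would exceed SKETCH_DUP_MAX (the case the patch reports as
   REG_ESIZEBR).  */
static int
sketch_parse_count (const char *s, int *out)
{
  int num = 0;

  assert (s != NULL && out != NULL);

  for (; *s >= '0' && *s <= '9'; s++)
    {
      int digit = *s - '0';

      /* Same comparison as in the macro: num * 10 + digit fits iff
         num <= MAX / 10, minus one when digit > MAX % 10.  The test
         runs before the multiplication, so nothing overflows.  */
      if (SKETCH_DUP_MAX / 10 - (SKETCH_DUP_MAX % 10 < digit) < num)
        return -1;

      num = num * 10 + digit;
    }

  *out = num;
  return 0;
}
```

Under these assumptions, a call such as `sketch_parse_count ("32768", &n)` succeeds and stores 32768 in `n`, while `"2147483648"` is rejected at its last digit.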