From mboxrd@z Thu Jan 1 00:00:00 1970
From: Noam Postavsky
Newsgroups: gmane.emacs.bugs
Subject: bug#24914: 24.5; isearch-regexp: wrong error message
Date: Sat, 09 Dec 2017 21:18:05 -0500
Message-ID: <87vahfe36q.fsf@users.sourceforge.net>
In-Reply-To: <83bmj9wan8.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 08 Dec 2017 16:35:07 +0200")
To: Eli Zaretskii
Cc: 24914@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Eli Zaretskii writes:

>> I thought it would be easier to document the limit if it's fixed
>> across all machines.  Otherwise we would have to say something like "For both
>> forms, m and n, if specified, may be no larger than INT_MAX, which is
>> usually 2**31 - 1, but could be 2**63 - 1 depending on the compiler used
>> for building Emacs".
>
> Isn't int 32 bit wide everywhere?

I might have been mixing up int with long when I was thinking about
this; it seems only a few very obscure platforms have 64 bit ints.
According to [1], everywhere but "HAL Computer Systems port of Solaris
to the SPARC64" and "Classic UNICOS" has 32 bit ints.

[1]: https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models

> And anyway, since the bitmap is stored in an int, isn't INT_MAX TRT?

Unfortunately, all this discussion of int size seems to be academic.  I
took another look at the code: there is another limit due to the regexp
opcode format.  We can raise the limit to 2^16-1 though.

Here is the use of RE_DUP_MAX, which makes it seem like int-size is the
main limit:

/* Get the next unsigned number in the uncompiled pattern.  */
#define GET_INTERVAL_COUNT(num)                                 \
  ...
    if (RE_DUP_MAX / 10 - (RE_DUP_MAX % 10 < c - '0') < num)   \
      FREE_STACK_RETURN (REG_ESIZEBR);                          \

static reg_errcode_t
regex_compile (const_re_char *pattern, size_t size,
{
  ...
  int lower_bound = 0, upper_bound = -1;
  [...]
  GET_INTERVAL_COUNT (lower_bound);

But then

  INSERT_JUMP2 (succeed_n, laststart, b + 5 + nbytes, lower_bound);

/* Like `STORE_JUMP2', but for inserting.  Assume `b' is the buffer end.  */
#define INSERT_JUMP2(op, loc, to, arg) \
  insert_op2 (op, loc, (to) - (loc) - 3, arg, b)

/* Like `insert_op1', but for two two-byte parameters ARG1 and ARG2.  */
                                  ^^^^^^^^
static void
insert_op2 (re_opcode_t op, unsigned char *loc, int arg1, int arg2, unsigned char *end)
{
  ...
  store_op2 (op, loc, arg1, arg2);
}

/* Like `store_op1', but for two two-byte parameters ARG1 and ARG2.  */
                                 ^^^^^^^^
static void
store_op2 (re_opcode_t op, unsigned char *loc, int arg1, int arg2)
{
  *loc = (unsigned char) op;
  STORE_NUMBER (loc + 1, arg1);
  STORE_NUMBER (loc + 3, arg2);
}

/* Store NUMBER in two contiguous bytes starting at DESTINATION.  */
                   ^^^^^^^^^^^^^^^^^^^^
#define STORE_NUMBER(destination, number)       \
  do {                                          \
    (destination)[0] = (number) & 0377;         \
    (destination)[1] = (number) >> 8;           \
  } while (0)

Here is the updated patch:

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment;
 filename=0001-Raise-limit-of-regexp-repetition-Bug-24914.patch
Content-Description: patch

>From 6c3ead6bd5c61801915dcedbb8dd17622610a899 Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Sat, 2 Dec 2017 19:01:54 -0500
Subject: [PATCH] Raise limit of regexp repetition (Bug#24914)

* src/regex.h (RE_DUP_MAX): Raise limit to 2^16-1.
* etc/NEWS: Announce it.
* doc/lispref/searching.texi (Regexp Backslash): Document it.
* test/src/regex-tests.el (regex-repeat-limit): Test it.

* src/regex.h (reg_errcode_t): Add REG_ESIZEBR code.
* src/regex.c (re_error_msgid): Add corresponding entry.
(GET_INTERVAL_COUNT): Return it instead of the more generic REG_BADBR
when encountering a repetition greater than RE_DUP_MAX.

* lisp/isearch.el (isearch-search): Don't convert errors starting with
"Invalid" into "incomplete".  Such errors are not incomplete, in the
sense that they cannot be corrected by appending more characters to the
end of the regexp.  The affected error messages are:

- REG_BADPAT "Invalid regular expression"
  - \\(?X:\\) where X is not a legal group number
  - \\_X where X is not < or >
- REG_ECOLLATE "Invalid collation character"
  - There is no code to throw this.
- REG_ECTYPE "Invalid character class name"
  - [[:foo:] where foo is not a valid class name
- REG_ESUBREG "Invalid back reference"
  - \N where N is referenced before matching group N
- REG_BADBR "Invalid content of \\{\\}"
  - \\{N,M\\} where N < 0, M < N, M or N larger than max
  - \\{NX where X is not a digit or backslash
  - \\{N\\X where X is not a }
- REG_ERANGE "Invalid range end"
  - There is no code to throw this.
- REG_BADRPT "Invalid preceding regular expression"
  - We never throw this.  It would usually indicate a "*" with no
    preceding regexp text, but Emacs allows that to match a literal "*".
---
 doc/lispref/searching.texi | 10 +++++++++-
 etc/NEWS                   |  8 ++++++++
 lisp/isearch.el            |  2 +-
 src/regex.c                |  5 +++--
 src/regex.h                |  9 ++++++---
 test/src/regex-tests.el    |  6 ++++++
 6 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 755fa554bb..ab52cf2802 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -639,7 +639,15 @@ Regexp Backslash
 is a more general postfix operator that specifies repetition with a
 minimum of @var{m} repeats and a maximum of @var{n} repeats.  If @var{m}
 is omitted, the minimum is 0; if @var{n} is omitted, there is no
-maximum.
+maximum.  For both forms, @var{m} and @var{n}, if specified, may be no
+larger than
+@ifnottex
+2**16 @minus{} 1
+@end ifnottex
+@tex
+@math{2^{16}-1}
+@end tex
+.
 
 For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
 @samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
diff --git a/etc/NEWS b/etc/NEWS
index 64b53d88c8..c7efc53f6a 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -509,6 +509,14 @@ instead.
 ** The new user option 'arabic-shaper-ZWNJ-handling' controls how to
 handle ZWNJ in Arabic text rendering.
 
++++
+** The limit on repetitions in regexps has been raised to 2^16-1.
+It was previously undocumented and limited to 2^15-1.  For example,
+the following regular expression was previously invalid, but is now
+accepted:
+
+    x\{32768\}
+
 
 * Editing Changes in Emacs 26.1
 
diff --git a/lisp/isearch.el b/lisp/isearch.el
index 13fa97ea71..093185a096 100644
--- a/lisp/isearch.el
+++ b/lisp/isearch.el
@@ -2851,7 +2851,7 @@ isearch-search
      (setq isearch-error (car (cdr lossage)))
      (cond
       ((string-match
-        "\\`Premature \\|\\`Unmatched \\|\\`Invalid "
+        "\\`Premature \\|\\`Unmatched "
         isearch-error)
        (setq isearch-error "incomplete input"))
       ((and (not isearch-regexp)
diff --git a/src/regex.c b/src/regex.c
index 330f2f78a8..ab74f457d4 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -1200,7 +1200,8 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
     gettext_noop ("Premature end of regular expression"), /* REG_EEND */
     gettext_noop ("Regular expression too big"), /* REG_ESIZE */
     gettext_noop ("Unmatched ) or \\)"), /* REG_ERPAREN */
-    gettext_noop ("Range striding over charsets") /* REG_ERANGEX */
+    gettext_noop ("Range striding over charsets"), /* REG_ERANGEX */
+    gettext_noop ("Invalid content of \\{\\}, repetitions too big") /* REG_ESIZEBR */
   };
 
 /* Whether to allocate memory during matching.  */
@@ -1921,7 +1922,7 @@ while (REMAINING_AVAIL_SLOTS <= space) { \
             if (num < 0)                                                \
               num = 0;                                                  \
             if (RE_DUP_MAX / 10 - (RE_DUP_MAX % 10 < c - '0') < num)    \
-              FREE_STACK_RETURN (REG_BADBR);                            \
+              FREE_STACK_RETURN (REG_ESIZEBR);                          \
             num = num * 10 + c - '0';                                   \
             if (p == pend)                                              \
               FREE_STACK_RETURN (REG_EBRACE);                           \
diff --git a/src/regex.h b/src/regex.h
index 9fa8356011..4c8632d6aa 100644
--- a/src/regex.h
+++ b/src/regex.h
@@ -270,8 +270,10 @@
 #ifdef RE_DUP_MAX
 # undef RE_DUP_MAX
 #endif
-/* If sizeof(int) == 2, then ((1 << 15) - 1) overflows.  */
-#define RE_DUP_MAX (0x7fff)
+/* Repeat counts are stored in opcodes as 2 byte integers.  This was
+   previously limited to 7fff because the parsing code uses signed
+   ints.  But Emacs only runs on 32 bit platforms anyway.  */
+#define RE_DUP_MAX (0xffff)
 
 
 /* POSIX `cflags' bits (i.e., information for `regcomp').  */
@@ -337,7 +339,8 @@
   REG_EEND,             /* Premature end.  */
   REG_ESIZE,            /* Compiled pattern bigger than 2^16 bytes.  */
   REG_ERPAREN,          /* Unmatched ) or \); not returned from regcomp.  */
-  REG_ERANGEX           /* Range striding over charsets.  */
+  REG_ERANGEX,          /* Range striding over charsets.  */
+  REG_ESIZEBR           /* n or m too big in \{n,m\} */
 } reg_errcode_t;
 
 /* This data structure represents a compiled pattern.  Before calling
diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el
index b1f1ea71ce..872d16a085 100644
--- a/test/src/regex-tests.el
+++ b/test/src/regex-tests.el
@@ -677,4 +677,10 @@ regex-tests-TESTS
 This evaluates the TESTS test cases from glibc."
   (should-not (regex-tests-TESTS)))
 
+(ert-deftest regex-repeat-limit ()
+  "Test the #xFFFF repeat limit."
+  (should (string-match "\\`x\\{65535\\}" (make-string 65535 ?x)))
+  (should-not (string-match "\\`x\\{65535\\}" (make-string 65534 ?x)))
+  (should-error (string-match "\\`x\\{65536\\}" "X") :type 'invalid-regexp))
+
 ;;; regex-tests.el ends here
-- 
2.11.0

--=-=-=--
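
To make the two-byte limit concrete, here is a small standalone sketch; it is not Emacs code. `store_number` mirrors the STORE_NUMBER macro quoted in the message. `load_number` and `roundtrip` are hypothetical helpers that read the bytes back as an unsigned value. That unsigned reading is an assumption made for illustration, and the real extraction code in regex.c may treat the high byte differently.

```c
/* Function form of the STORE_NUMBER macro quoted in the message:
   low byte first, then high byte.  */
void
store_number (unsigned char *destination, int number)
{
  destination[0] = number & 0377;
  /* The macro relies on the conversion to unsigned char to drop the
     upper bits; the mask here makes that truncation explicit.  */
  destination[1] = (number >> 8) & 0377;
}

/* Hypothetical reader (an assumption, not taken from regex.c):
   interpret the two bytes as an unsigned 16-bit value.  */
int
load_number (const unsigned char *source)
{
  return source[0] | (source[1] << 8);
}

/* Hypothetical helper: store N in two bytes and read it back.  */
int
roundtrip (int n)
{
  unsigned char buf[2];
  store_number (buf, n);
  return load_number (buf);
}
```

Under these assumptions, `roundtrip (65535)` comes back unchanged, while `roundtrip (65536)` wraps around to 0 because only sixteen bits are kept. That is why the opcode format, rather than the width of int, bounds the repeat count.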