From: Christian Johansson <christian@cvj.se>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>,
	Andreas Schwab <schwab@linux-m68k.org>,
	53680@debbugs.gnu.org
Subject: bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
Date: Tue, 1 Feb 2022 14:12:19 +0100
Message-ID: <BA0DE426-164A-436B-AA98-A9BD5CCD15AC@cvj.se>
In-Reply-To: <965416B7-4ACC-4571-B2C2-8607A741212F@acm.org>

Alright, thanks for helping me find the correct regexp. This issue only appeared on some peculiar strings.

> On 1 Feb 2022, at 13:56, Mattias Engdegård <mattiase@acm.org> wrote:
> 
> 
>> 
>> (string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·") 
> 
> The diagnosis by Lars and Andreas is correct. Let's look at it more closely, first translating the regexp to rx for ease of reasoning, and see if we can make it work:
> 
> (rx (* (in "\t\r "))
>    "implements"
>    (+ (in "\t\r "))
>    (+ (group
>        (* (in "\t\r "))
>        (+ (in "0-9A-Za-z" "\\_"))
>        (? ",")))
>    (* (in "\t\r "))
>    "{" eol)
> 
> The first line is meaningless since it can match the empty string, but you probably want to anchor the start of "implements" so that it doesn't match "house_implements". Let's also drop the capture group, and we get:
> 
> (rx symbol-start "implements"
>    (+ (in "\t\r "))
>    (+ (* (in "\t\r "))
>       (+ (in "0-9A-Za-z" "\\_"))
>       (? ","))
>    (* (in "\t\r "))
>    "{" eol)
> 
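A minimal sketch of what the symbol-start anchor buys. The names my-unanchored-re, my-anchored-re and my-match-pos are made up for illustration, and the standard syntax table is assumed (there `_` is a symbol constituent, so "house_implements" counts as one symbol).

```elisp
;; Hypothetical names, for illustration only.
(defvar my-unanchored-re (rx "implements"))
(defvar my-anchored-re (rx symbol-start "implements"))

(defun my-match-pos (regexp string)
  "Return the start of the first match of REGEXP in STRING, or nil.
The standard syntax table is used so that `symbol-start' does not
depend on whatever buffer happens to be current."
  (with-syntax-table (standard-syntax-table)
    (string-match regexp string)))

(my-match-pos my-unanchored-re "house_implements")    ; => 6
(my-match-pos my-anchored-re "house_implements")      ; => nil
(my-match-pos my-anchored-re "class A implements B")  ; => 8
```

In a buffer whose major mode gives `_` a different syntax class, the anchored result may differ, which is why the sketch pins the syntax table.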
> You clearly want to match a non-empty sequence of 'words' separated with whitespace and/or commas, but the pattern is ambiguous: all inter-word separators are optional, so the same text can be split into words in many different ways. Let's make them mandatory:
> 
> (rx symbol-start "implements"
>     ;; mandatory whitespace
>     (+ (in "\t\r "))
>     ;; then a word
>     (+ (in "0-9A-Za-z" "\\_"))
>     ;; then maybe more words, each prefixed by spaces or comma
>     (* (+ (in "\t\r ,"))   ; fast and loose
>        (+ (in "0-9A-Za-z" "\\_")))
>     ;; finally whitespace before the curly bracket
>     (* (in "\t\r "))
>     "{" eol)
> 
> which is reasonably efficient, since all ambiguity is now gone: the regexp can (almost) only match in one way.
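The rewritten pattern can be tried out with a sketch like the following. The names my-implements-re and my-implements-pos and the first sample string are assumptions made for illustration; the second sample is the string from the original report with the visible-space dots written as plain spaces.

```elisp
;; Hypothetical names; the regexp body is the rx form quoted above.
(defvar my-implements-re
  (rx symbol-start "implements"
      (+ (in "\t\r "))                  ; mandatory whitespace
      (+ (in "0-9A-Za-z" "\\_"))        ; first word
      (* (+ (in "\t\r ,"))              ; separators: spaces or commas
         (+ (in "0-9A-Za-z" "\\_")))    ; further words
      (* (in "\t\r "))                  ; whitespace before the bracket
      "{" eol))

(defun my-implements-pos (string)
  "Return where `my-implements-re' first matches in STRING, or nil.
Uses the standard syntax table so `symbol-start' is predictable."
  (with-syntax-table (standard-syntax-table)
    (string-match my-implements-re string)))

;; An invented one-line declaration: matches at "implements".
(my-implements-pos "class Foo implements \\Bar, \\Baz {")   ; => 10

;; The string from the report: no match, and the call returns promptly.
(my-implements-pos
 "ariable implements \\Magento\\Framework\\Event\\OberserverInterface\r{\r    public function __construct()\r ")
;; => nil
```

As far as I can tell the reported string is a non-match by design of the pattern: the "{" is followed by "\r", and eol only matches before a newline or at the end of the string. The difference from the original regexp is that the failure is now found quickly instead of after trying every way of splitting the words.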
> 
> Note the "fast and loose" pattern where we accept any number of spaces or commas. Here it depends on your grammar, but if you want exactly one comma separating each word, that subexpression would be something like
> 
>  (* (in "\t\r "))
>  ","
>  (* (in "\t\r "))
> 
> instead.
> 
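For the stricter grammar, here is a sketch with the comma subexpression substituted for the "fast and loose" separator. The names my-strict-implements-re and my-strict-implements-pos and the sample strings are invented for illustration; whether this grammar is the right one depends on the language being parsed.

```elisp
;; Hypothetical names; exactly one comma is required between words.
(defvar my-strict-implements-re
  (rx symbol-start "implements"
      (+ (in "\t\r "))
      (+ (in "0-9A-Za-z" "\\_"))
      (* (* (in "\t\r "))               ; optional space before the comma
         ","
         (* (in "\t\r "))               ; optional space after the comma
         (+ (in "0-9A-Za-z" "\\_")))
      (* (in "\t\r "))
      "{" eol))

(defun my-strict-implements-pos (string)
  "Return where `my-strict-implements-re' first matches in STRING, or nil."
  (with-syntax-table (standard-syntax-table)
    (string-match my-strict-implements-re string)))

(my-strict-implements-pos "implements A, B {")   ; => 0
(my-strict-implements-pos "implements A,, B {")  ; => nil (doubled comma)
(my-strict-implements-pos "implements A B {")    ; => nil (comma missing)
```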




