From: Stefan Monnier <monnier@iro.umontreal.ca>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: Emacs developers <emacs-devel@gnu.org>
Subject: Re: master ea93326: Add `union' and `intersection' to rx (bug#37849)
Date: Sun, 15 Dec 2019 15:04:29 -0500
Message-ID: <jwvwoaxmklo.fsf-monnier+emacs@gnu.org>
In-Reply-To: <379396AE-D709-4F6F-AE7C-30321A5452C4@acm.org> ("Mattias Engdegård"'s message of "Sun, 15 Dec 2019 20:23:17 +0100")
>>> A bit overkill just for matching a set of constant strings, don't you think?
>> I think there's a lot of implicit assumptions here.
>> Yes, there are cases where you may want the "longest match" rule and
>> where `posix-string-match` can be too costly, but the ones I can think
>> of seem to be fairly contrived.
> Perhaps I should have underlined that it is only literal strings that is of
> immediate concern, since that is what regexp-opt is used for. It is not
> a contrived situation to have a set of strings -- keywords, for instance --
> not necessarily anchored by something else at the end.
We need more elements for a realistic scenario. E.g. when the regexp
match fails, `posix-string-match` has the same cost as `string-match`,
so not only do you need the regexp not to be anchored to something else
at the end, but you also need all of the below:
- the match should be frequent enough for performance to matter
- the match should almost always succeed
- it needs to matter exactly where the match ends
- one of the matched words needs to be a prefix of another
- you cannot "extract the next word" and look it up in a hash-table instead
of performing a regexp match
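To make the last two conditions concrete, here is a minimal sketch. It is written in Python rather than Emacs Lisp, on the assumption that Python's `re` module behaves like Emacs's backtracking matcher for this purpose: alternatives are tried left to right and the first one that matches wins. The keyword list and the `keyword_at` helper are hypothetical names made up for the illustration.

```python
import re

# Hypothetical keyword set in which one word is a prefix of another.
KEYWORDS = ["if", "ifdef", "else"]

# Plain alternation in the order given. A backtracking matcher takes the
# first alternative that matches, which is not necessarily the longest.
first_match = re.compile("|".join(re.escape(k) for k in KEYWORDS))

# Alternative to a regexp match: extract the next word, then look it up.
# A Python set is hash-based, so it plays the role of the hash table.
WORD = re.compile(r"[A-Za-z_]+")
KEYWORD_SET = frozenset(KEYWORDS)


def keyword_at(text, pos=0):
    """Return the keyword starting at POS in TEXT, or None."""
    m = WORD.match(text, pos)
    if m and m.group() in KEYWORD_SET:
        return m.group()
    return None


# The alternation stops after "if" even though "ifdef" is present.
print(first_match.match("ifdef FOO").group())
# The word lookup sees the whole word, so the prefix problem never arises.
print(keyword_at("ifdef FOO"))
```

When the word lookup is possible, where the alternation match ends stops mattering, which is why the scenario only bites when that option is unavailable.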
FWIW, I think we can fix this by using a non-backtracking regexp
matcher, but I don't see it as a strong motivation for such a change
(there are good motivations for that, but this one is a pretty weak one
in my book).
Stefan