From: Stefan Monnier <monnier@iro.umontreal.ca>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: Emacs developers <emacs-devel@gnu.org>
Subject: Re: master ea93326: Add `union' and `intersection' to rx (bug#37849)
Date: Sun, 15 Dec 2019 15:04:29 -0500
Message-ID: <jwvwoaxmklo.fsf-monnier+emacs@gnu.org>
In-Reply-To: <379396AE-D709-4F6F-AE7C-30321A5452C4@acm.org> ("Mattias Engdegård"'s message of "Sun, 15 Dec 2019 20:23:17 +0100")

>>> A bit overkill just for matching a set of constant strings, don't you think?
>> I think there's a lot of implicit assumptions here.
>> Yes, there are cases where you may want the "longest match" rule and
>> where `posix-string-match` can be too costly, but the ones I can think
>> of seem to be fairly contrived.
> Perhaps I should have underlined that it is only literal strings that are of
> immediate concern, since that is what regexp-opt is used for. It is not
> a contrived situation to have a set of strings -- keywords, for instance --
> not necessarily anchored by something else at the end.

We need more elements for a realistic scenario.  E.g. when the regexp
match fails, `posix-string-match` has the same cost as `string-match`,
so not only do you need the end of the regexp not to be anchored to
something else, but you also need all of the below:

- the match should be frequent enough for performance to matter
- the match should almost always succeed
- it needs to matter exactly where the match ends
- one of the matched words needs to be a prefix of another
- you can't "extract the next word" and look it up in a hash-table instead
  of performing a regexp match
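
To make the prefix condition and the hash-table alternative concrete,
here is a minimal sketch.  It is written in Python rather than Elisp, on
the assumption that Python's `re` module, being a backtracking matcher
that takes the first alternative that matches, is a fair stand-in for
`string-match`.  The keyword list and the `keyword_at` helper are
hypothetical, chosen only for illustration.

```python
import re

# Hypothetical keyword set; "for" is a prefix of "foreach".
KEYWORDS = ["for", "foreach", "in"]

# A backtracking alternation takes the first alternative that matches,
# not the longest one.
leftmost_first = re.compile("|".join(map(re.escape, KEYWORDS)))

# Trying longer alternatives first emulates the longest-match rule
# for a set of literal strings.
longest_first = re.compile(
    "|".join(map(re.escape, sorted(KEYWORDS, key=len, reverse=True)))
)

KEYWORD_SET = frozenset(KEYWORDS)
WORD = re.compile(r"[A-Za-z_]+")


def keyword_at(text, pos=0):
    """Extract the next word at POS and look it up in a hash-based set."""
    m = WORD.match(text, pos)
    if m and m.group() in KEYWORD_SET:
        return m.group()
    return None


text = "foreach x in xs"
print(leftmost_first.match(text).group())  # for
print(longest_first.match(text).group())   # foreach
print(keyword_at(text))                    # foreach
```

The first matcher stops after "for" because that alternative comes
first, so the end of the match differs from what the longest-match rule
would give.  The other two approaches agree on "foreach", and the set
lookup needs no alternation over the keywords at all.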

FWIW, I think we can fix this by using a non-backtracking regexp
matcher, but I don't see it as a strong motivation for such a change
(there are good motivations for that, but this one is a pretty weak one
in my book).


        Stefan



