* Any performance comparison/guide of/for Emacs regex?
@ 2011-01-18 15:38 Oleksandr Gavenko
2011-01-18 15:59 ` Deniz Dogan
0 siblings, 1 reply; 6+ messages in thread
From: Oleksandr Gavenko @ 2011-01-18 15:38 UTC (permalink / raw)
To: help-gnu-emacs
Any performance comparison/guide of/for Emacs regex?
Which construction use to get more performance?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any performance comparison/guide of/for Emacs regex?
2011-01-18 15:38 Any performance comparison/guide of/for Emacs regex? Oleksandr Gavenko
@ 2011-01-18 15:59 ` Deniz Dogan
0 siblings, 0 replies; 6+ messages in thread
From: Deniz Dogan @ 2011-01-18 15:59 UTC (permalink / raw)
To: Oleksandr Gavenko; +Cc: help-gnu-emacs
2011/1/18 Oleksandr Gavenko <gavenko@bifit.com.ua>:
> Any performance comparison/guide of/for Emacs regex?
>
> Which construction use to get more performance?
>
Try regexp-opt: http://www.emacswiki.org/emacs/RegexpOpt
--
Deniz Dogan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any performance comparison/guide of/for Emacs regex?
[not found] <mailman.16.1295365226.21909.help-gnu-emacs@gnu.org>
@ 2011-01-18 18:29 ` Stefan Monnier
2011-01-18 21:48 ` Oleksandr Gavenko
[not found] ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
2011-01-18 21:27 ` Tim X
1 sibling, 2 replies; 6+ messages in thread
From: Stefan Monnier @ 2011-01-18 18:29 UTC (permalink / raw)
To: help-gnu-emacs
> Any performance comparison/guide of/for Emacs regex?
Not really. What you need to know is that Emacs's regexp engine is
based on backtracking, so things like .*\\(.*\\).* is insanely
inefficient and will make you think Emacs is frozen.
Generally, you'll prefer to use regexps that don't require backtracking.
E.g. [^(\n]*( works much better than .*( since when encountering an open
parenthesis, the matcher will know for sure that it has to leave the *
loop, rather than having to try both cases in sequence.
Stefan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any performance comparison/guide of/for Emacs regex?
[not found] <mailman.16.1295365226.21909.help-gnu-emacs@gnu.org>
2011-01-18 18:29 ` Stefan Monnier
@ 2011-01-18 21:27 ` Tim X
1 sibling, 0 replies; 6+ messages in thread
From: Tim X @ 2011-01-18 21:27 UTC (permalink / raw)
To: help-gnu-emacs
Oleksandr Gavenko <gavenko@bifit.com.ua> writes:
> Any performance comparison/guide of/for Emacs regex?
>
> Which construction use to get more performance?
>
>
None that I'm aware of.
With respect to performance, just follow the golden rule and make sure
your regexp is anchored in some way.
Many regexp implementations use a backtrackin form of regexp. If you
don't anchor your regexp in some way, the amount of work it does in
trying to find a match before giving up can grow very fast and can even
give the impression the system is locked up.
Common anchoring techniques include using ^ and $ to anchor your regexp
to the start/end of a line and avoiding regexp with 'match everything'
type wildcards at the start and end. If you know your regexp needs to
match a specific sequence, include it and be as specific as possible.
Use the correct meta characters such as ?, *, + etc. Take advantage of
shy groups and non-greedy forms when they make sense. In general, be as
precise as you can.
Tim
--
tcross (at) rapttech dot com dot au
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any performance comparison/guide of/for Emacs regex?
2011-01-18 18:29 ` Stefan Monnier
@ 2011-01-18 21:48 ` Oleksandr Gavenko
[not found] ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 6+ messages in thread
From: Oleksandr Gavenko @ 2011-01-18 21:48 UTC (permalink / raw)
To: help-gnu-emacs
On 2011-01-18 20:29, Stefan Monnier wrote:
>> Any performance comparison/guide of/for Emacs regex?
>
> Not really. What you need to know is that Emacs's regexp engine is
> based on backtracking, so things like .*\\(.*\\).* is insanely
> inefficient and will make you think Emacs is frozen.
>
Thanks for answer!
How about if \\(.*\\) used to enclose a set of `\|'
alternatives or for referencing in font-lock expressions?
> Generally, you'll prefer to use regexps that don't require backtracking.
> E.g. [^(\n]*( works much better than .*( since when encountering an open
> parenthesis, the matcher will know for sure that it has to leave the *
> loop, rather than having to try both cases in sequence.
>
I get basic understand but read more about backtracking
in regex engine some later. Thanks.
--
Best regards!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any performance comparison/guide of/for Emacs regex?
[not found] ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
@ 2011-01-19 1:22 ` Stefan Monnier
0 siblings, 0 replies; 6+ messages in thread
From: Stefan Monnier @ 2011-01-19 1:22 UTC (permalink / raw)
To: help-gnu-emacs
>> Not really. What you need to know is that Emacs's regexp engine is
>> based on backtracking, so things like .*\\(.*\\).* is insanely
>> inefficient and will make you think Emacs is frozen.
> How about if \\(.*\\) used to enclose a set of `\|'
> alternatives or for referencing in font-lock expressions?
The \(..\) subgroups are cheap, so I wouldn't worry about them
w.r.t performance. If you really care about their performance impact,
you can use the non-numbered (aka "shy") groups which are written
\(?:...\), but I doubt you'd be able to measure a difference.
Stefan
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-01-19 1:22 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-18 15:38 Any performance comparison/guide of/for Emacs regex? Oleksandr Gavenko
2011-01-18 15:59 ` Deniz Dogan
[not found] <mailman.16.1295365226.21909.help-gnu-emacs@gnu.org>
2011-01-18 18:29 ` Stefan Monnier
2011-01-18 21:48 ` Oleksandr Gavenko
[not found] ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
2011-01-19 1:22 ` Stefan Monnier
2011-01-18 21:27 ` Tim X
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).