unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
* Any performance comparison/guide of/for Emacs regex?
@ 2011-01-18 15:38 Oleksandr Gavenko
  2011-01-18 15:59 ` Deniz Dogan
  0 siblings, 1 reply; 6+ messages in thread
From: Oleksandr Gavenko @ 2011-01-18 15:38 UTC (permalink / raw)
  To: help-gnu-emacs

Any performance comparison/guide of/for Emacs regex?

Which construction use to get more performance?




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Any performance comparison/guide of/for Emacs regex?
  2011-01-18 15:38 Oleksandr Gavenko
@ 2011-01-18 15:59 ` Deniz Dogan
  0 siblings, 0 replies; 6+ messages in thread
From: Deniz Dogan @ 2011-01-18 15:59 UTC (permalink / raw)
  To: Oleksandr Gavenko; +Cc: help-gnu-emacs

2011/1/18 Oleksandr Gavenko <gavenko@bifit.com.ua>:
> Any performance comparison/guide of/for Emacs regex?
>
> Which construction use to get more performance?
>

Try regexp-opt: http://www.emacswiki.org/emacs/RegexpOpt

-- 
Deniz Dogan



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Any performance comparison/guide of/for Emacs regex?
       [not found] <mailman.16.1295365226.21909.help-gnu-emacs@gnu.org>
@ 2011-01-18 18:29 ` Stefan Monnier
  2011-01-18 21:48   ` Oleksandr Gavenko
       [not found]   ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
  2011-01-18 21:27 ` Tim X
  1 sibling, 2 replies; 6+ messages in thread
From: Stefan Monnier @ 2011-01-18 18:29 UTC (permalink / raw)
  To: help-gnu-emacs

> Any performance comparison/guide of/for Emacs regex?

Not really.  What you need to know is that Emacs's regexp engine is
based on backtracking, so things like .*\\(.*\\).* is insanely
inefficient and will make you think Emacs is frozen.

Generally, you'll prefer to use regexps that don't require backtracking.
E.g. [^(\n]*( works much better than .*( since when encountering an open
parenthesis, the matcher will know for sure that it has to leave the *
loop, rather than having to try both cases in sequence.


        Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Any performance comparison/guide of/for Emacs regex?
       [not found] <mailman.16.1295365226.21909.help-gnu-emacs@gnu.org>
  2011-01-18 18:29 ` Any performance comparison/guide of/for Emacs regex? Stefan Monnier
@ 2011-01-18 21:27 ` Tim X
  1 sibling, 0 replies; 6+ messages in thread
From: Tim X @ 2011-01-18 21:27 UTC (permalink / raw)
  To: help-gnu-emacs

Oleksandr Gavenko <gavenko@bifit.com.ua> writes:

> Any performance comparison/guide of/for Emacs regex?
>
> Which construction use to get more performance?
>
>

None that I'm aware of. 

With respect to performance, just follow the golden rule and make sure
your regexp is anchored in some way. 

Many regexp implementations use a backtrackin form of regexp. If you
don't anchor your regexp in some way, the amount of work it does in
trying to find a match before giving up can grow very fast and can even
give the impression the system is locked up. 

Common anchoring techniques include using ^ and $ to anchor your regexp
to the start/end of a line and avoiding regexp with 'match everything'
type wildcards at the start and end. If you know your regexp needs to
match a specific sequence, include it and be as specific as possible.
Use the correct meta characters such as ?, *, + etc. Take advantage of
shy groups and non-greedy forms when they make sense. In general, be as
precise as you can.

Tim

-- 
tcross (at) rapttech dot com dot au


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Any performance comparison/guide of/for Emacs regex?
  2011-01-18 18:29 ` Any performance comparison/guide of/for Emacs regex? Stefan Monnier
@ 2011-01-18 21:48   ` Oleksandr Gavenko
       [not found]   ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 6+ messages in thread
From: Oleksandr Gavenko @ 2011-01-18 21:48 UTC (permalink / raw)
  To: help-gnu-emacs

On 2011-01-18 20:29, Stefan Monnier wrote:
>> Any performance comparison/guide of/for Emacs regex?
>
> Not really.  What you need to know is that Emacs's regexp engine is
> based on backtracking, so things like .*\\(.*\\).* is insanely
> inefficient and will make you think Emacs is frozen.
>
Thanks for answer!

How about if \\(.*\\) used to enclose a set of `\|'
alternatives or for referencing in font-lock expressions?

> Generally, you'll prefer to use regexps that don't require backtracking.
> E.g. [^(\n]*( works much better than .*( since when encountering an open
> parenthesis, the matcher will know for sure that it has to leave the *
> loop, rather than having to try both cases in sequence.
>
I get basic understand but read more about backtracking
in regex engine some later. Thanks.

-- 
Best regards!




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Any performance comparison/guide of/for Emacs regex?
       [not found]   ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
@ 2011-01-19  1:22     ` Stefan Monnier
  0 siblings, 0 replies; 6+ messages in thread
From: Stefan Monnier @ 2011-01-19  1:22 UTC (permalink / raw)
  To: help-gnu-emacs

>> Not really.  What you need to know is that Emacs's regexp engine is
>> based on backtracking, so things like .*\\(.*\\).* is insanely
>> inefficient and will make you think Emacs is frozen.
> How about if \\(.*\\) used to enclose a set of `\|'
> alternatives or for referencing in font-lock expressions?

The \(..\) subgroups are cheap, so I wouldn't worry about them
w.r.t performance.  If you really care about their performance impact,
you can use the non-numbered (aka "shy") groups which are written
\(?:...\), but I doubt you'd be able to measure a difference.


        Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-01-19  1:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <mailman.16.1295365226.21909.help-gnu-emacs@gnu.org>
2011-01-18 18:29 ` Any performance comparison/guide of/for Emacs regex? Stefan Monnier
2011-01-18 21:48   ` Oleksandr Gavenko
     [not found]   ` <mailman.3.1295387371.19799.help-gnu-emacs@gnu.org>
2011-01-19  1:22     ` Stefan Monnier
2011-01-18 21:27 ` Tim X
2011-01-18 15:38 Oleksandr Gavenko
2011-01-18 15:59 ` Deniz Dogan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).