unofficial mirror of emacs-devel@gnu.org 
* Compiled regexp?
@ 2013-01-31 13:40 Bastien
  2013-01-31 14:08 ` Christopher Schmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Bastien @ 2013-01-31 13:40 UTC (permalink / raw)
  To: emacs-devel; +Cc: Christopher Schmidt, Carsten Dominik

After Christopher submitted a patch for org-mode, he and Carsten
discussed the difference between these two patterns:

  ;; Concat in defconst
  (defconst my-pattern (concat "^" "xyz"))
  (re-search-forward my-pattern ...)

  ;; Concat in re-search-forward
  (defconst my-partial-pattern "xyz")
  (re-search-forward (concat "^" my-partial-pattern) ...)

Both Carsten and I thought there was some optimization done
by Emacs so that the first pattern is more efficient than the
second one.  (concat "^" "xyz") would be "cached", not eval'ed
each time you search for my-pattern.

Christopher pointed to compile_pattern in the C part of Emacs,
suggesting that there would be no difference between the two
in terms of performance.

Can anyone confirm this is the case?

Thanks,

-- 
 Bastien





* Re: Compiled regexp?
  2013-01-31 13:40 Compiled regexp? Bastien
@ 2013-01-31 14:08 ` Christopher Schmidt
  2013-01-31 14:11   ` Bastien
  2013-01-31 14:26 ` Andreas Schwab
  2013-01-31 15:28 ` Stefan Monnier
  2 siblings, 1 reply; 8+ messages in thread
From: Christopher Schmidt @ 2013-01-31 14:08 UTC (permalink / raw)
  To: emacs-devel

Bastien <bzg@gnu.org> writes:
> After Christopher submitted a patch for org-mode, he and Carsten
> discussed the difference between these two patterns:
>
>   ;; Concat in defconst
>   (defconst my-pattern (concat "^" "xyz"))
>   (re-search-forward my-pattern ...)
>
>   ;; Concat in re-search-forward
>   (defconst my-partial-pattern "xyz")
>   (re-search-forward (concat "^" my-partial-pattern) ...)
>
> Both Carsten and I thought there was some optimization done
> by Emacs so that the first pattern is more efficient than the
> second one.  (concat "^" "xyz") would be "cached", not eval'ed
> each time you search for my-pattern.

No, this is not what we discussed.  re-search-forward is a regular
function and eval'ing (re-search-forward (concat "^" my-partial-pattern)
...) will result in concat being eval'ed, too.  The overhead is
negligible, though.

We discussed how cached compiled regexps are looked up, i.e. per object
or per content.  If the former were true, (re-search-forward (concat "^"
my-partial-pattern) ...) would result in the regular expression being
compiled each and every time because concat returns a new lisp object.
That's slow.

I pointed out that this does not look sensible to me.  Ultimately I
took a glimpse at the code.  compile_pattern in search.c compares
the regexps using Fstring_equal.

        Christopher
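
To make the per-object versus per-content distinction concrete, here is a
minimal Emacs Lisp sketch.  The helper name my-compare-concat-results is
hypothetical; my-partial-pattern is the constant from the original message.

```elisp
;; Illustration only; `my-compare-concat-results' is a made-up helper.
(defconst my-partial-pattern "xyz")

(defun my-compare-concat-results ()
  "Compare two separately built copies of the same regexp string."
  (let ((a (concat "^" my-partial-pattern))
        (b (concat "^" my-partial-pattern)))
    (list (eq a b)             ; nil: `concat' returns a fresh object each time
          (string-equal a b)   ; t: same characters
          (equal a b))))       ; t: same contents

(my-compare-concat-results)    ; => (nil t t)
```

A lookup that compared patterns with `eq' would therefore miss on every call
of the second form, while a lookup that compares by content would hit.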




* Re: Compiled regexp?
  2013-01-31 14:08 ` Christopher Schmidt
@ 2013-01-31 14:11   ` Bastien
  0 siblings, 0 replies; 8+ messages in thread
From: Bastien @ 2013-01-31 14:11 UTC (permalink / raw)
  To: emacs-devel

Christopher Schmidt <christopher@ch.ristopher.com> writes:

> We discussed how cached compiled regexps are looked up, i.e. per object
> or per content.  If the former were true, (re-search-forward (concat "^"
> my-partial-pattern) ...) would result in the regular expression being
> compiled each and every time because concat returns a new lisp object.
> That's slow.

Yes, that's what I tried to say, thanks for clarifying it.

-- 
 Bastien




* Re: Compiled regexp?
  2013-01-31 13:40 Compiled regexp? Bastien
  2013-01-31 14:08 ` Christopher Schmidt
@ 2013-01-31 14:26 ` Andreas Schwab
  2013-01-31 14:42   ` Dominik, Carsten
  2013-01-31 15:28 ` Stefan Monnier
  2 siblings, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2013-01-31 14:26 UTC (permalink / raw)
  To: Bastien; +Cc: Christopher Schmidt, Carsten Dominik, emacs-devel

Bastien <bzg@gnu.org> writes:

> After Christopher submitted a patch for org-mode, he and Carsten
> discussed the difference between these two patterns:
>
>   ;; Concat in defconst
>   (defconst my-pattern (concat "^" "xyz"))
>   (re-search-forward my-pattern ...)
>
>   ;; Concat in re-search-forward
>   (defconst my-partial-pattern "xyz")
>   (re-search-forward (concat "^" my-partial-pattern) ...)
>
> Both Carsten and I thought there was some optimization done
> by Emacs so that the first pattern is more efficient than the
> second one.  (concat "^" "xyz") would be "cached", not eval'ed
> each time you search for my-pattern.

The first pattern evaluates (concat "^" "xyz") once when my-pattern is
defined.  The second case evaluates (concat "^" my-partial-pattern) each
time the containing sexp is evaluated.  Other than that there is no
difference wrt. re-search-forward.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Compiled regexp?
  2013-01-31 14:26 ` Andreas Schwab
@ 2013-01-31 14:42   ` Dominik, Carsten
  0 siblings, 0 replies; 8+ messages in thread
From: Dominik, Carsten @ 2013-01-31 14:42 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Bastien, Christopher Schmidt, Dominik, Carsten,
	<emacs-devel@gnu.org>


On 31 jan. 2013, at 15:26, Andreas Schwab <schwab@linux-m68k.org>
 wrote:

> Bastien <bzg@gnu.org> writes:
> 
>> After Christopher submitted a patch for org-mode, he and Carsten
>> discussed the difference between these two patterns:
>> 
>>  ;; Concat in defconst
>>  (defconst my-pattern (concat "^" "xyz"))
>>  (re-search-forward my-pattern ...)
>> 
>>  ;; Concat in re-search-forward
>>  (defconst my-partial-pattern "xyz")
>>  (re-search-forward (concat "^" my-partial-pattern) ...)
>> 
>> Both Carsten and I thought there was some optimization done
>> by Emacs so that the first pattern is more efficient than the
>> second one.  (concat "^" "xyz") would be "cached", not eval'ed
>> each time you search for my-pattern.
> 
> The first pattern evaluates (concat "^" "xyz") once when my-pattern is
> defined.  The second case evaluates (concat "^" my-partial-pattern) each
> time the containing sexp is evaluated.  Other than that there is no
> difference wrt. re-search-forward.

Hi Andreas, thanks for the reply.  Just to double-check, let me rephrase
the question:

The cache of compiled regular expressions is keyed on string contents,
not on a particular string object (so the cache looks up an earlier
compiled string with something like `equal', not with `eq').  Is this
correct?  I ask because the difference between the two procedures shown
above is that in the first case the same Lisp object is passed to
re-search-forward, while in the second case it is a different string of
equal content each time.

Thank you.

- Carsten
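
The difference between keying a cache on the object and keying it on the
contents can be sketched with a toy model.  This is not the Emacs regexp
cache itself (which, per Christopher's message, lives in compile_pattern in
search.c); a hash table merely stands in for it, and my-cache-hit-p and the
symbol `compiled' are hypothetical names.

```elisp
;; Toy model only: a hash table stands in for the regexp cache.
;; `my-cache-hit-p' and the symbol `compiled' are hypothetical.
(defconst my-partial-pattern "xyz")

(defun my-cache-hit-p (test)
  "Store one pattern in a table using TEST, then look up a fresh copy of it."
  (let ((cache (make-hash-table :test test)))
    (puthash (concat "^" my-partial-pattern) 'compiled cache)
    (gethash (concat "^" my-partial-pattern) cache)))

(my-cache-hit-p 'eq)     ; => nil       (lookup by object identity misses)
(my-cache-hit-p 'equal)  ; => compiled  (lookup by contents hits)
```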



* Re: Compiled regexp?
  2013-01-31 13:40 Compiled regexp? Bastien
  2013-01-31 14:08 ` Christopher Schmidt
  2013-01-31 14:26 ` Andreas Schwab
@ 2013-01-31 15:28 ` Stefan Monnier
  2013-01-31 15:34   ` Dominik, Carsten
  2013-01-31 15:55   ` Tom Tromey
  2 siblings, 2 replies; 8+ messages in thread
From: Stefan Monnier @ 2013-01-31 15:28 UTC (permalink / raw)
  To: Bastien; +Cc: Christopher Schmidt, Carsten Dominik, emacs-devel

>   ;; Concat in defconst
>   (defconst my-pattern (concat "^" "xyz"))
>   (re-search-forward my-pattern ...)
-vs-
>   ;; Concat in re-search-forward
>   (defconst my-partial-pattern "xyz")
>   (re-search-forward (concat "^" my-partial-pattern) ...)
[...]
> Can anyone confirm this is the case?

The second will incur the additional cost of the `concat' at each
iteration, obviously, but other than that the current Emacs code will
not take advantage of the fact that the same string is passed in the
first code, whereas a new string is used in the second.

I don't foresee the Emacs code changing such that it's significantly more
efficient to reuse the exact same string rather than a copy of it.

If we ever try to do something to avoid the cost of re-compiling
a regexp, I'd expect that we'd provide a way to have explicit access to
compiled regexps (with a `regexp-compile' function, the result of which
would be accepted by re-search-forward as an alternative to a string).


        Stefan "Who generally prefers the first form"
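
Anyone wanting to measure the per-call cost of `concat' could start from a
rough sketch like the following, which uses the standard `benchmark-run'
macro.  The helper name my-time-both-forms, the buffer contents and the
repetition count are arbitrary assumptions; timings will vary by machine,
so none are shown.

```elisp
(require 'benchmark)

(defconst my-pattern (concat "^" "xyz"))
(defconst my-partial-pattern "xyz")

;; Hypothetical helper, for illustration only.
(defun my-time-both-forms ()
  "Return (PRECOMPUTED . CONCAT-PER-CALL), elapsed seconds for each form."
  (with-temp-buffer
    (insert "abc\nxyz\n")
    (cons
     ;; First form: the pattern string is built once, in the defconst.
     (car (benchmark-run 10000
            (goto-char (point-min))
            (re-search-forward my-pattern nil t)))
     ;; Second form: `concat' runs again on every call.
     (car (benchmark-run 10000
            (goto-char (point-min))
            (re-search-forward (concat "^" my-partial-pattern) nil t))))))

(my-time-both-forms)
```

Both loops search the same small buffer from its start, so any difference
between the two numbers reflects the repeated `concat' and whatever the
regexp cache does with a fresh string of equal contents.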




* Re: Compiled regexp?
  2013-01-31 15:28 ` Stefan Monnier
@ 2013-01-31 15:34   ` Dominik, Carsten
  2013-01-31 15:55   ` Tom Tromey
  1 sibling, 0 replies; 8+ messages in thread
From: Dominik, Carsten @ 2013-01-31 15:34 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Bastien, Christopher Schmidt, Dominik, Carsten,
	<emacs-devel@gnu.org>


On 31 jan. 2013, at 16:28, Stefan Monnier <monnier@iro.umontreal.ca>
 wrote:

>>  ;; Concat in defconst
>>  (defconst my-pattern (concat "^" "xyz"))
>>  (re-search-forward my-pattern ...)
> -vs-
>>  ;; Concat in re-search-forward
>>  (defconst my-partial-pattern "xyz")
>>  (re-search-forward (concat "^" my-partial-pattern) ...)
> [...]
>> Can anyone confirm this is the case?
> 
> The second will incur the additional cost of the `concat' at each
> iteration, obviously, but other than that the current Emacs code will
> not take advantage of the fact that the same string is passed in the
> first code, whereas a new string is used in the second.


OK, this is clear, thank you very much.

- Carsten


> 
> I don't foresee the Emacs code changing such that it's significantly more
> efficient to reuse the exact same string rather than a copy of it.
> 
> If we ever try to do something to avoid the cost of re-compiling
> a regexp, I'd expect that we'd provide a way to have explicit access to
> compiled regexps (with a `regexp-compile' function, the result of which
> would be accepted by re-search-forward as an alternative to a string).
> 
> 
>        Stefan "Who generally prefers the first form"





* Re: Compiled regexp?
  2013-01-31 15:28 ` Stefan Monnier
  2013-01-31 15:34   ` Dominik, Carsten
@ 2013-01-31 15:55   ` Tom Tromey
  1 sibling, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2013-01-31 15:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Bastien, Christopher Schmidt, emacs-devel, Carsten Dominik

>>>>> "Stefan" == Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

Stefan> If we ever try to do something to avoid the cost of re-compiling
Stefan> a regexp, I'd expect that we'd provide a way to have explicit access to
Stefan> compiled regexps (with a `regexp-compile' function, the result of which
Stefan> would be accepted by re-search-forward as an alternative to a string).

Back when I did some Emacs profiling, regexp compilation didn't show up
in the top 5.  Regexp matching did, though.  I wasn't specifically
looking at regexp performance, so I didn't try to isolate the
effects of the compiled-regexp cache.

Tom


