* skip-chars-forward v. re-search-forward
@ 2003-04-02 17:43 Dave Love
2003-04-02 19:49 ` Andreas Schwab
2003-04-03 10:37 ` Richard Stallman
0 siblings, 2 replies; 11+ messages in thread
From: Dave Love @ 2003-04-02 17:43 UTC (permalink / raw)
I've tended to use skip-chars-forward instead of re-search-forward
since it's byte-coded and I assumed it's more efficient. However, I
made a measurement and found that skip-chars-forward was actually
slower than re-search-forward.
Is there a good reason for maintaining the two implementations?
Implementing skip-chars-forward using re-search-forward would have the
advantage that character classes would work with it, which they
currently don't.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-02 17:43 skip-chars-forward v. re-search-forward Dave Love
@ 2003-04-02 19:49 ` Andreas Schwab
2003-04-03 22:53 ` Richard Stallman
2003-04-04 10:48 ` Dave Love
2003-04-03 10:37 ` Richard Stallman
1 sibling, 2 replies; 11+ messages in thread
From: Andreas Schwab @ 2003-04-02 19:49 UTC (permalink / raw)
Cc: bug-gnu-emacs
Dave Love <d.love@dl.ac.uk> writes:
|> I've tended to use skip-chars-forward instead of re-search-forward
|> since it's byte-coded and I assumed it's more efficient. However, I
|> made a measurement and found that skip-chars-forward was actually
|> slower than re-search-forward.
Both skip-chars-forward and re-search-forward are builtin functions. But
re-search-forward uses a cache for recently used regexps to avoid the
overhead of compiling it, whereas skip-chars-forward always sets up the
search map anew.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-02 17:43 skip-chars-forward v. re-search-forward Dave Love
2003-04-02 19:49 ` Andreas Schwab
@ 2003-04-03 10:37 ` Richard Stallman
2003-04-04 10:52 ` Dave Love
2003-04-24 1:41 ` Kenichi Handa
1 sibling, 2 replies; 11+ messages in thread
From: Richard Stallman @ 2003-04-03 10:37 UTC (permalink / raw)
Cc: bug-gnu-emacs
I've tended to use skip-chars-forward instead of re-search-forward
since it's byte-coded and I assumed it's more efficient. However, I
made a measurement and found that skip-chars-forward was actually
slower than re-search-forward.
That is really surprising. Can you figure out what makes
skip-chars-forward so slow? We ought to be able to make it faster
than re-search-forward without much effort, so we may as well try.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-02 19:49 ` Andreas Schwab
@ 2003-04-03 22:53 ` Richard Stallman
2003-04-04 10:48 ` Dave Love
1 sibling, 0 replies; 11+ messages in thread
From: Richard Stallman @ 2003-04-03 22:53 UTC (permalink / raw)
Cc: bug-gnu-emacs
Both skip-chars-forward and re-search-forward are builtin functions. But
re-search-forward uses a cache for recently used regexps to avoid the
overhead of compiling it, whereas skip-chars-forward always sets up the
search map anew.
If skip-chars-forward had a cache, it would probably run faster
than regexp searching.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-02 19:49 ` Andreas Schwab
2003-04-03 22:53 ` Richard Stallman
@ 2003-04-04 10:48 ` Dave Love
1 sibling, 0 replies; 11+ messages in thread
From: Dave Love @ 2003-04-04 10:48 UTC (permalink / raw)
Cc: bug-gnu-emacs
Andreas Schwab <schwab@suse.de> writes:
> But re-search-forward uses a cache for recently used regexps to
> avoid the overhead of compiling it, whereas skip-chars-forward
> always sets up the search map anew.
Yes, but that doesn't address the question.
Anyhow, caching doesn't account for the experimental data, at least
when searching a big buffer for a missing pattern. re-search-forward
is faster in one-off use.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-03 10:37 ` Richard Stallman
@ 2003-04-04 10:52 ` Dave Love
2003-04-24 1:41 ` Kenichi Handa
1 sibling, 0 replies; 11+ messages in thread
From: Dave Love @ 2003-04-04 10:52 UTC (permalink / raw)
Cc: bug-gnu-emacs
Richard Stallman <rms@gnu.org> writes:
> That is really surprising. Can you figure out what makes
> skip-chars-forward so slow?
It would be best for a regexp expert to look at it.
> We ought to be able to make it faster than re-search-forward without
> much effort, so we may as well try.
I don't know enough for that to be clear to me, though it's suggested
by the separate implementation for the special case and
skip-chars-forward being byte-coded. [Why not re-search-forward,
which must be more common?]
I could believe that the introduction of multibyte changed factors
which are relevant, but that's just a guess.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-03 10:37 ` Richard Stallman
2003-04-04 10:52 ` Dave Love
@ 2003-04-24 1:41 ` Kenichi Handa
2003-04-26 2:33 ` Richard Stallman
1 sibling, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2003-04-24 1:41 UTC (permalink / raw)
Cc: bug-gnu-emacs
In article <E19126X-0005aH-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> I've tended to use skip-chars-forward instead of re-search-forward
> since it's byte-coded and I assumed it's more efficient. However, I
> made a measurement and found that skip-chars-forward was actually
> slower than re-search-forward.
> That is really surprising. Can you figure out what makes
> skip-chars-forward so slow? We ought to be able to make it faster
> than re-search-forward without much effort, so we may as well try.
I've just installed a fix for skip-chars-forward. I think
the reason of the slowness was that it uses FETCH_CHAR
naively. I changed the code to use the common technique of
*p, *stop, *endp.
The attached benchmark code (using a 25M-byte file that
doesn't contain "%" nor syntax `"') shows that now
skip-chars-forward/backward is slightly faster than
re-search-forward/backward.
(search-forward "%") 0.561530
(skip-chars-forward "^%") 1.174494
(re-search-forward "[%]") 1.239955
(skip-syntax-forward "^\"") 2.712070
(re-search-forward "\\s\"") 26.080168
(search-backward "%") 0.992879
(skip-chars-backward "^%") 3.079179
(re-search-backward "[%]") 10.005354
It's still surprising that search-forward/backward is more
than twice faster. I have no idea what kind of magic the
boyer-moore search is using.
I tried it on GNU/Linux (devian). Could someone please try
it on the different system?
---
Ken'ichi HANDA
handa@m17n.org
(defun benchtest ()
(require 'benchmark)
(save-excursion
(let ((coding-system-for-read 'utf-8))
(set-buffer
(find-file-noselect "/project/mule/UNIDATA/Unihan-3.2.0.txt")))
;; It is sure that the buffer doesn't contain "%" nor syntax `"'.
(concat
(format "%-30S %f\n" '(search-forward "%")
(car (benchmark-run
10 (progn (goto-char 1)
(search-forward "%" nil 'move)))))
(format "%-30S %f\n" '(skip-chars-forward "^%")
(car (benchmark-run
10 (progn (goto-char 1)
(skip-chars-forward "^%")))))
(format "%-30S %f\n" '(re-search-forward "[%]")
(car (benchmark-run
10 (progn (goto-char 1)
(re-search-forward "[%]" nil 'move)))))
(format "%-30S %f\n" '(skip-syntax-forward "^\"")
(car (benchmark-run
10 (progn (goto-char 1)
(skip-syntax-forward "^\"")))))
(format "%-30S %f\n" '(re-search-forward "\\s\"")
(car (benchmark-run
10 (progn (goto-char 1)
(re-search-forward "\\s\"" nil 'move)))))
(format "%-30S %f\n" '(search-backward "%")
(car (benchmark-run
10 (progn (goto-char (point-max))
(search-backward "%" nil 'move)))))
(format "%-30S %f\n" '(skip-chars-backward "^%")
(car (benchmark-run
10 (progn (goto-char (point-max))
(skip-chars-backward "^%")))))
(format "%-30S %f\n" '(re-search-backward "[%]")
(car (benchmark-run
10 (progn (goto-char (point-max))
(re-search-backward "[%]" nil 'move))))))))
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-24 1:41 ` Kenichi Handa
@ 2003-04-26 2:33 ` Richard Stallman
2003-04-29 17:20 ` Dave Love
2003-05-01 7:11 ` Kenichi Handa
0 siblings, 2 replies; 11+ messages in thread
From: Richard Stallman @ 2003-04-26 2:33 UTC (permalink / raw)
Cc: bug-gnu-emacs
It's still surprising that search-forward/backward is more
than twice faster. I have no idea what kind of magic the
boyer-moore search is using.
The comparison between search-forward and skip-chars-forward
does not seem meaningful to me, because they do different jobs.
The only case which both of these functions can do is
(search-forward "a") and (skip-chars-forward "^a").
The time taken by search-forward is inversely proportional to the
search string size in favorable cases, so if you tried a longer
string, it was probably faster.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-26 2:33 ` Richard Stallman
@ 2003-04-29 17:20 ` Dave Love
2003-05-01 7:11 ` Kenichi Handa
1 sibling, 0 replies; 11+ messages in thread
From: Dave Love @ 2003-04-29 17:20 UTC (permalink / raw)
Cc: Kenichi Handa
Richard Stallman <rms@gnu.org> writes:
> The comparison between search-forward and skip-chars-forward
> does not seem meaningful to me, because they do different jobs.
> The only case which both of these functions can do is
> (search-forward "a") and (skip-chars-forward "^a").
That's probably not an uncommon use, and it looks as though the
optimizer might usefully translate one to the other. I was originally
comparing with re-search-forward, though. The skip-chars versions
still don't seem to have a big advantage, and they have the
disadvantage of not supporting char classes.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-04-26 2:33 ` Richard Stallman
2003-04-29 17:20 ` Dave Love
@ 2003-05-01 7:11 ` Kenichi Handa
2003-05-02 7:05 ` Richard Stallman
1 sibling, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2003-05-01 7:11 UTC (permalink / raw)
Cc: bug-gnu-emacs
In article <E199FUs-0003kd-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> It's still surprising that search-forward/backward is more
> than twice faster. I have no idea what kind of magic the
> boyer-moore search is using.
> The comparison between search-forward and skip-chars-forward
> does not seem meaningful to me, because they do different jobs.
> The only case which both of these functions can do is
> (search-forward "a") and (skip-chars-forward "^a").
That is what I did in my benchmark test.
(search-forward "%")
vs (skip-chars-forward "^%")
vs (re-search-forward "[%]")
> The time taken by search-forward is inversely proportional to the
> search string size in favorable cases, so if you tried a longer
> string, it was probably faster.
Yes, I know that. I was surprised because search-forward
was twice faster even if the search string size is one.
---
Ken'ichi HANDA
handa@m17n.org
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: skip-chars-forward v. re-search-forward
2003-05-01 7:11 ` Kenichi Handa
@ 2003-05-02 7:05 ` Richard Stallman
0 siblings, 0 replies; 11+ messages in thread
From: Richard Stallman @ 2003-05-02 7:05 UTC (permalink / raw)
Cc: bug-gnu-emacs
Yes, I know that. I was surprised because search-forward
was twice faster even if the search string size is one.
This could be a matter of how tightly coded the inner loop is.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2003-05-02 7:05 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-04-02 17:43 skip-chars-forward v. re-search-forward Dave Love
2003-04-02 19:49 ` Andreas Schwab
2003-04-03 22:53 ` Richard Stallman
2003-04-04 10:48 ` Dave Love
2003-04-03 10:37 ` Richard Stallman
2003-04-04 10:52 ` Dave Love
2003-04-24 1:41 ` Kenichi Handa
2003-04-26 2:33 ` Richard Stallman
2003-04-29 17:20 ` Dave Love
2003-05-01 7:11 ` Kenichi Handa
2003-05-02 7:05 ` Richard Stallman
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).