skip-chars-forward v. re-search-forward

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* skip-chars-forward v. re-search-forward
@ 2003-04-02 17:43 Dave Love
  2003-04-02 19:49 ` Andreas Schwab
  2003-04-03 10:37 ` Richard Stallman
  0 siblings, 2 replies; 11+ messages in thread
From: Dave Love @ 2003-04-02 17:43 UTC (permalink / raw)


I've tended to use skip-chars-forward instead of re-search-forward
since it's byte-coded and I assumed it's more efficient.  However, I
made a measurement and found that skip-chars-forward was actually
slower than re-search-forward.

Is there a good reason for maintaining the two implementations?
Implementing skip-chars-forward using re-search-forward would have the
advantage that character classes would work with it, which they
currently don't.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-02 17:43 skip-chars-forward v. re-search-forward Dave Love
@ 2003-04-02 19:49 ` Andreas Schwab
  2003-04-03 22:53   ` Richard Stallman
  2003-04-04 10:48   ` Dave Love
  2003-04-03 10:37 ` Richard Stallman
  1 sibling, 2 replies; 11+ messages in thread
From: Andreas Schwab @ 2003-04-02 19:49 UTC (permalink / raw)
  Cc: bug-gnu-emacs

Dave Love <d.love@dl.ac.uk> writes:

|> I've tended to use skip-chars-forward instead of re-search-forward
|> since it's byte-coded and I assumed it's more efficient.  However, I
|> made a measurement and found that skip-chars-forward was actually
|> slower than re-search-forward.

Both skip-chars-forward and re-search-forward are builtin functions.  But
re-search-forward uses a cache for recently used regexps to avoid the
overhead of compiling it, whereas skip-chars-forward always sets up the
search map anew.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-02 19:49 ` Andreas Schwab
@ 2003-04-03 22:53   ` Richard Stallman
  2003-04-04 10:48   ` Dave Love
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Stallman @ 2003-04-03 22:53 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    Both skip-chars-forward and re-search-forward are builtin functions.  But
    re-search-forward uses a cache for recently used regexps to avoid the
    overhead of compiling it, whereas skip-chars-forward always sets up the
    search map anew.

If skip-chars-forward had a cache, it would probably run faster
than regexp searching.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-02 19:49 ` Andreas Schwab
  2003-04-03 22:53   ` Richard Stallman
@ 2003-04-04 10:48   ` Dave Love
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Love @ 2003-04-04 10:48 UTC (permalink / raw)
  Cc: bug-gnu-emacs

Andreas Schwab <schwab@suse.de> writes:

> But re-search-forward uses a cache for recently used regexps to
> avoid the overhead of compiling it, whereas skip-chars-forward
> always sets up the search map anew.

Yes, but that doesn't address the question.

Anyhow, caching doesn't account for the experimental data, at least
when searching a big buffer for a missing pattern.  re-search-forward
is faster in one-off use.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-02 17:43 skip-chars-forward v. re-search-forward Dave Love
  2003-04-02 19:49 ` Andreas Schwab
@ 2003-04-03 10:37 ` Richard Stallman
  2003-04-04 10:52   ` Dave Love
  2003-04-24  1:41   ` Kenichi Handa
  1 sibling, 2 replies; 11+ messages in thread
From: Richard Stallman @ 2003-04-03 10:37 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    I've tended to use skip-chars-forward instead of re-search-forward
    since it's byte-coded and I assumed it's more efficient.  However, I
    made a measurement and found that skip-chars-forward was actually
    slower than re-search-forward.

That is really surprising.  Can you figure out what makes
skip-chars-forward so slow?  We ought to be able to make it faster
than re-search-forward without much effort, so we may as well try.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-03 10:37 ` Richard Stallman
@ 2003-04-04 10:52   ` Dave Love
  2003-04-24  1:41   ` Kenichi Handa
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Love @ 2003-04-04 10:52 UTC (permalink / raw)
  Cc: bug-gnu-emacs

Richard Stallman <rms@gnu.org> writes:

> That is really surprising.  Can you figure out what makes
> skip-chars-forward so slow?

It would be best for a regexp expert to look at it.

> We ought to be able to make it faster than re-search-forward without
> much effort, so we may as well try.

I don't know enough for that to be clear to me, though it's suggested
by the separate implementation for the special case and
skip-chars-forward being byte-coded.  [Why not re-search-forward,
which must be more common?]

I could believe that the introduction of multibyte changed factors
which are relevant, but that's just a guess.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-03 10:37 ` Richard Stallman
  2003-04-04 10:52   ` Dave Love
@ 2003-04-24  1:41   ` Kenichi Handa
  2003-04-26  2:33     ` Richard Stallman
  1 sibling, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2003-04-24  1:41 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <E19126X-0005aH-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     I've tended to use skip-chars-forward instead of re-search-forward
>     since it's byte-coded and I assumed it's more efficient.  However, I
>     made a measurement and found that skip-chars-forward was actually
>     slower than re-search-forward.

> That is really surprising.  Can you figure out what makes
> skip-chars-forward so slow?  We ought to be able to make it faster
> than re-search-forward without much effort, so we may as well try.

I've just installed a fix for skip-chars-forward.  I think
the reason of the slowness was that it uses FETCH_CHAR
naively.  I changed the code to use the common technique of
*p, *stop, *endp.

The attached benchmark code (using a 25M-byte file that
doesn't contain "%" nor syntax `"') shows that now
skip-chars-forward/backward is slightly faster than
re-search-forward/backward.

(search-forward "%")           0.561530
(skip-chars-forward "^%")      1.174494
(re-search-forward "[%]")      1.239955
(skip-syntax-forward "^\"")    2.712070
(re-search-forward "\\s\"")    26.080168
(search-backward "%")          0.992879
(skip-chars-backward "^%")     3.079179
(re-search-backward "[%]")     10.005354

It's still surprising that search-forward/backward is more
than twice faster.  I have no idea what kind of magic the
boyer-moore search is using.

I tried it on GNU/Linux (devian).  Could someone please try
it on the different system?

---
Ken'ichi HANDA
handa@m17n.org

(defun benchtest ()
  (require 'benchmark)
  (save-excursion
    (let ((coding-system-for-read 'utf-8))
      (set-buffer
       (find-file-noselect "/project/mule/UNIDATA/Unihan-3.2.0.txt")))
    ;; It is sure that the buffer doesn't contain "%" nor syntax `"'.
    (concat
     (format "%-30S %f\n" '(search-forward "%")
	     (car (benchmark-run
		   10 (progn (goto-char 1)
			     (search-forward "%" nil 'move)))))
     (format "%-30S %f\n" '(skip-chars-forward "^%")
	     (car (benchmark-run
		   10 (progn (goto-char 1)
			     (skip-chars-forward "^%")))))
     (format "%-30S %f\n" '(re-search-forward "[%]")
	     (car (benchmark-run
		   10 (progn (goto-char 1)
			     (re-search-forward "[%]" nil 'move)))))
     (format "%-30S %f\n" '(skip-syntax-forward "^\"")
	     (car (benchmark-run
		   10 (progn (goto-char 1)
			     (skip-syntax-forward "^\"")))))
     (format "%-30S %f\n" '(re-search-forward "\\s\"")
	     (car (benchmark-run
		   10 (progn (goto-char 1)
			     (re-search-forward "\\s\"" nil 'move)))))
     (format "%-30S %f\n" '(search-backward "%")
	     (car (benchmark-run
		   10 (progn (goto-char (point-max))
			     (search-backward "%" nil 'move)))))
     (format "%-30S %f\n" '(skip-chars-backward "^%")
	     (car (benchmark-run
		   10 (progn (goto-char (point-max))
			     (skip-chars-backward "^%")))))
     (format "%-30S %f\n" '(re-search-backward "[%]")
	     (car (benchmark-run
		   10 (progn (goto-char (point-max))
			     (re-search-backward "[%]" nil 'move))))))))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-24  1:41   ` Kenichi Handa
@ 2003-04-26  2:33     ` Richard Stallman
  2003-04-29 17:20       ` Dave Love
  2003-05-01  7:11       ` Kenichi Handa
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Stallman @ 2003-04-26  2:33 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    It's still surprising that search-forward/backward is more
    than twice faster.  I have no idea what kind of magic the
    boyer-moore search is using.

The comparison between search-forward and skip-chars-forward
does not seem meaningful to me, because they do different jobs.
The only case which both of these functions can do is
(search-forward "a") and (skip-chars-forward "^a").

The time taken by search-forward is inversely proportional to the
search string size in favorable cases, so if you tried a longer
string, it was probably faster.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-26  2:33     ` Richard Stallman
@ 2003-04-29 17:20       ` Dave Love
  2003-05-01  7:11       ` Kenichi Handa
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Love @ 2003-04-29 17:20 UTC (permalink / raw)
  Cc: Kenichi Handa

Richard Stallman <rms@gnu.org> writes:

> The comparison between search-forward and skip-chars-forward
> does not seem meaningful to me, because they do different jobs.
> The only case which both of these functions can do is
> (search-forward "a") and (skip-chars-forward "^a").

That's probably not an uncommon use, and it looks as though the
optimizer might usefully translate one to the other.  I was originally
comparing with re-search-forward, though.  The skip-chars versions
still don't seem to have a big advantage, and they have the
disadvantage of not supporting char classes.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-04-26  2:33     ` Richard Stallman
  2003-04-29 17:20       ` Dave Love
@ 2003-05-01  7:11       ` Kenichi Handa
  2003-05-02  7:05         ` Richard Stallman
  1 sibling, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2003-05-01  7:11 UTC (permalink / raw)
  Cc: bug-gnu-emacs

In article <E199FUs-0003kd-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     It's still surprising that search-forward/backward is more
>     than twice faster.  I have no idea what kind of magic the
>     boyer-moore search is using.

> The comparison between search-forward and skip-chars-forward
> does not seem meaningful to me, because they do different jobs.
> The only case which both of these functions can do is
> (search-forward "a") and (skip-chars-forward "^a").

That is what I did in my benchmark test.
  (search-forward "%")
    vs (skip-chars-forward "^%")
    vs (re-search-forward "[%]")

> The time taken by search-forward is inversely proportional to the
> search string size in favorable cases, so if you tried a longer
> string, it was probably faster.

Yes, I know that.  I was surprised because search-forward
was twice faster even if the search string size is one.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: skip-chars-forward v. re-search-forward
  2003-05-01  7:11       ` Kenichi Handa
@ 2003-05-02  7:05         ` Richard Stallman
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Stallman @ 2003-05-02  7:05 UTC (permalink / raw)
  Cc: bug-gnu-emacs

    Yes, I know that.  I was surprised because search-forward
    was twice faster even if the search string size is one.

This could be a matter of how tightly coded the inner loop is.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2003-05-02  7:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-04-02 17:43 skip-chars-forward v. re-search-forward Dave Love
2003-04-02 19:49 ` Andreas Schwab
2003-04-03 22:53   ` Richard Stallman
2003-04-04 10:48   ` Dave Love
2003-04-03 10:37 ` Richard Stallman
2003-04-04 10:52   ` Dave Love
2003-04-24  1:41   ` Kenichi Handa
2003-04-26  2:33     ` Richard Stallman
2003-04-29 17:20       ` Dave Love
2003-05-01  7:11       ` Kenichi Handa
2003-05-02  7:05         ` Richard Stallman

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).