From: philippe schnoebelen <schnoebelen.ph@gmail.com>
To: emacs-devel@gnu.org
Subject: regular expressions that match nothing
Date: Tue, 14 May 2019 09:25:59 +0200
Message-ID: <CADqJmR29MymVD96EznUsXRNaM9J6L7+s_Gj4pzuQMsCtVs_7RA@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 982 bytes --]
I was very happy to see that in v27.0.50 (regexp-opt nil) now properly
returns a regular expression that matches nothing, namely a\`. Thanks to
whoever fixed that old bug.
I was wondering why (regexp-opt nil) uses a\` and not \'a or another option
like \=a\=, so I did some profiling (see attached code).
The different options that I tried have more or less the same response time
when one checks, via looking-at, whether the regexp matches at point. But
when one searches for a match across a whole buffer, some options are
notably faster than others. And a\` is not the best: \=a\=, for example, is
much faster. Maybe some other solutions would be even faster.
Of course this may depend on the internals of the specific regexp library
at hand; I do not know enough to judge. In fact, I believe that a solid
regular expression library should provide a dedicated regular expression
that matches nothing, with simple special-case handling that guarantees
the best response time.
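As a minimal sketch of why these candidates cannot match (an illustration only, not part of the attached profiler): the helper name `my-never-matches-p` is hypothetical, and the check simply restarts a search from every buffer position, since \= depends on where point is when the search begins.

```elisp
;; Hypothetical helper (not from the attachment): non-nil when REGEXP
;; matches nowhere in TEXT, whatever position the search starts from.
(defun my-never-matches-p (regexp text)
  "Return non-nil if REGEXP matches at no position of TEXT."
  (with-temp-buffer
    (insert text)
    (let ((pos (point-min))
          (matched nil))
      ;; \\= is relative to point, so try every possible starting point.
      (while (and (not matched) (<= pos (point-max)))
        (goto-char pos)
        (setq matched (re-search-forward regexp nil t))
        (setq pos (1+ pos)))
      (not matched))))

(my-never-matches-p "a\\`" "banana")     ; "a", then beginning of buffer
(my-never-matches-p "\\'a" "banana")     ; end of buffer, then "a"
(my-never-matches-p "\\=a\\=" "banana")  ; point, "a", then point again
(my-never-matches-p "\\." "a.b")         ; control: a literal dot does match
```

The first three calls should return t and the last one nil; the last pattern is an ordinary regexp included only as a control, like the "\\." entry in the attached code.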
--phs
[-- Attachment #1.2: Type: text/html, Size: 1153 bytes --]
[-- Attachment #2: profile-empty-regexp.el --]
[-- Type: application/octet-stream, Size: 1542 bytes --]
(defun profile-empty-regexps (&optional buffer)
  "Report some matching times for several regular expressions."
  (interactive)
  (unless buffer
    (setq buffer (find-file-noselect (locate-library "regexp-opt"))))
  (with-output-to-temp-buffer "Profiling"
    (princ (profile-one-regexp "a\\`" buffer)) (princ "\n")
    (princ (profile-one-regexp "\\'a" buffer)) (princ "\n")
    (princ (profile-one-regexp "\\=.\\=" buffer)) (princ "\n")
    (princ (profile-one-regexp "\\=a\\=" buffer)) (princ "\n")
    (princ (profile-one-regexp "\\." buffer)) (princ "\n")))
(defun profile-one-regexp (regexp buffer &optional nbrepeats)
  ;; The workhorse
  (setq nbrepeats (or nbrepeats 50000))
  (let (start-time duration1 duration2 found)
    (with-current-buffer buffer
      (save-excursion
        (setq start-time (current-time))
        (goto-char (point-min))
        (dotimes (_ nbrepeats)
          (looking-at regexp))
        (setq duration1 (time-subtract (current-time) start-time))
        (setq start-time (current-time))
        (goto-char (point-min))
        (dotimes (_ nbrepeats)
          (when (re-search-forward regexp nil t)
            (setq found t)
            (goto-char (point-min)))) ;; return to test position
        (setq duration2 (time-subtract (current-time) start-time))))
    (format "Testing regexp %s %d times\n\tmatch at point-min: %.4fs\n\tsearch in buffer %s (size %d): %.4fs\n%s"
            regexp nbrepeats (float-time duration1) (buffer-name buffer)
            (buffer-size buffer) (float-time duration2)
            (if found "\t*** WARNING *** a match was found\n" ""))))
Thread overview: 37+ messages
2019-05-14 7:25 philippe schnoebelen [this message]
2019-05-14 10:14 ` regular expressions that match nothing Mattias Engdegård
2019-05-14 19:41 ` Stefan Monnier
2019-05-15 16:21 ` Mattias Engdegård
2019-05-15 19:41 ` Alan Mackenzie
2019-05-16 10:54 ` Mattias Engdegård
2019-05-16 23:18 ` Phil Sainty
2019-05-17 9:43 ` Alan Mackenzie
2019-05-17 10:17 ` Mattias Engdegård
2019-05-17 12:53 ` Stefan Monnier
2019-05-15 20:17 ` Michael Heerdegen
2019-05-15 21:06 ` Stefan Monnier
2019-05-15 21:07 ` Mattias Engdegård
2019-05-15 21:38 ` Michael Heerdegen
2019-05-16 6:57 ` More re odditie [Was: regular expressions that match nothing] phs
2019-05-16 9:29 ` Mattias Engdegård
2019-05-16 10:59 ` phs
2019-05-16 12:31 ` Stefan Monnier
2019-05-16 18:35 ` Michael Heerdegen
2019-05-16 20:31 ` Mattias Engdegård
2019-05-16 21:01 ` Global and local definitions of non-functions/variable (was: More re odditie [Was: regular expressions that match nothing]) Stefan Monnier
2019-05-20 16:26 ` Bootstrap/autoload policy (was Re: regular expressions that match nothing) Mattias Engdegård
2019-05-22 14:02 ` Stefan Monnier
2019-05-22 14:07 ` Mattias Engdegård
2019-05-22 14:24 ` Stefan Monnier
2019-05-22 15:06 ` Mattias Engdegård
2019-05-22 15:53 ` Stefan Monnier
2019-05-22 16:40 ` Mattias Engdegård
2019-05-22 19:08 ` Stefan Monnier
2019-05-26 12:05 ` Basil L. Contovounesios
2019-05-16 18:12 ` regular expressions that match nothing Eric Abrahamsen
2019-05-19 4:30 ` Re: " net june
2019-05-19 5:00 ` HaiJun Zhang
2019-05-19 7:32 ` Mattias Engdegård
2019-05-20 7:56 ` philippe schnoebelen
2019-05-20 23:19 ` Richard Stallman
2019-05-19 14:12 ` Re: " Drew Adams