all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
To: Agustin Martin <agustin.martin@hispalinux.es>
Cc: 16800@debbugs.gnu.org
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Mon, 24 Feb 2014 03:02:51 +0400	[thread overview]
Message-ID: <20140223230251.GA30257@openwall.com> (raw)
In-Reply-To: <20140223195659.GA23581@openwall.com>

I've performed some tests against my .org file (not in emacs -Q):

(insert
 (mapconcat (lambda (re)
              (save-excursion
                (let ((time (current-time))
                      (count 0))
                  (while (re-search-backward re nil t)
                    (setq count (1+ count)))
                  (format "%d: %S :: %s" count (subtract-time (current-time) time) re))))
            '("\\<[[:alpha:]]"
              "\\b[[:alpha:]]"
              "\\([^[:alpha:]]\\|\\b\\)[[:alpha:]]"
              "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\)[[:alpha:]]"
              "[^[:alpha:]][[:alpha:]]"
              "\\(?:\\b\\|'\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\|\\`\\)\\([[:alpha:]]+\\)"
              "\\([^[:alpha:]]\\|\\`\\)\\(?:[[:alpha:]]+\\)"
              "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]+")
            "\n"))

Matches| Time              | Regexp tried
299158: (0 2 841190 614000) :: \<[[:alpha:]]
299158: (0 2 876846 547000) :: \b[[:alpha:]]
307919: (0 3 321676 163000) :: \([^[:alpha:]]\|\b\)[[:alpha:]]
307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
299518: (0 2 998895 976000) :: \(?:\b\|'\)[[:alpha:]]
307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+

I should admit that word search breaks things even for setup with
[[:alpha:]]: a0a is 1 word for emacs and 2 for flyspell. I missed it
because Russian behaves differently (there is word boundary on border
between digits and Russian letters). My bad.

307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
These two suggest that it may provide a speed up if we do not check
beginning of buffer in regexp but check it separately. But I doubt it
is worth it.

On Sun, Feb 23, 2014 at 11:56:59PM +0400, Aleksey Cherepanov wrote:
> Also not capturing group ("\\(?:") could be used because we do not
> need a match data of the first group. It should work faster but I
> don't really know.

307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
The test shows that not capturing group is faster.

> Maybe it would be faster to not capture word but capture one char or
> void but I doubt the difference would be noticable.

307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+
Unexpectedly capturing of word works a bit faster. Maybe it is not a
word but the second group and it would work differently for search
forward. Or alpha+ instead of fixed word caused it. Anyway the
difference is very small.

Capturing word allows us to make a function to wrap a word into regexp
like word-search-regexp function wraps a word for
word-search-forward/-backward functions.

> I guess that \b would work faster than the group so we could have 'if'
> statement around the whole loop that has one implementation with \b
> for case when casechars are "[[:alpha:]]" and not-casechars are
> "[^[:alpha:]]" and another implementation as above for other cases.
> But it seems cumbersome.

My guess is wrong: \b works slower than the group. Also it is
inappropriate at all.

Thanks!

-- 
Regards,
Aleksey Cherepanov





  reply	other threads:[~2014-02-23 23:02 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 20:56 bug#16800: 24.3; flyspell works slow on very short words at the end of big file Aleksey Cherepanov
2014-02-21 10:15 ` Eli Zaretskii
2014-02-21 14:38   ` Agustin Martin
2014-02-21 15:12     ` Eli Zaretskii
2014-02-21 15:21       ` Eli Zaretskii
2014-02-22 12:44       ` Aleksey Cherepanov
2014-02-22 13:10         ` Eli Zaretskii
2014-02-22 16:02           ` Aleksey Cherepanov
2014-02-22 16:41             ` Eli Zaretskii
2014-02-22 18:55               ` Aleksey Cherepanov
2014-02-22 20:16                 ` Aleksey Cherepanov
2014-02-22 21:03                 ` Eli Zaretskii
2014-02-23  1:26                   ` Agustin Martin
2014-02-23 18:36                     ` Eli Zaretskii
2014-02-23 19:56                     ` Aleksey Cherepanov
2014-02-23 23:02                       ` Aleksey Cherepanov [this message]
2014-02-24 16:03                         ` Aleksey Cherepanov
2014-02-26 20:32                           ` Agustin Martin
2014-02-28 11:45                             ` Agustin Martin
2014-02-28 11:51                               ` Eli Zaretskii
2014-03-01 21:44                                 ` Aleksey Cherepanov
2014-03-02  3:56                                   ` Eli Zaretskii
2014-03-09 17:36                                     ` Agustin Martin
2014-03-09 18:02                                       ` Aleksey Cherepanov
2014-03-09 18:24                                         ` Eli Zaretskii
2014-02-28 23:11                               ` Aleksey Cherepanov
2014-03-01 10:33                                 ` Aleksey Cherepanov
2014-03-01 15:50                                   ` Aleksey Cherepanov
2014-03-01 21:39                                 ` Aleksey Cherepanov
2014-03-09 17:25                                 ` Agustin Martin
2015-03-06 21:46                                   ` Agustin Martin
2015-03-07  8:09                                     ` Eli Zaretskii
2014-02-23 20:39                     ` Aleksey Cherepanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140223230251.GA30257@openwall.com \
    --to=aleksey.4erepanov@gmail.com \
    --cc=16800@debbugs.gnu.org \
    --cc=agustin.martin@hispalinux.es \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.