From: Aleksey Cherepanov
Newsgroups: gmane.emacs.bugs
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Mon, 24 Feb 2014 03:02:51 +0400
Message-ID: <20140223230251.GA30257@openwall.com>
References: <20140221143855.GA6018@agmartin.aq.upm.es> <83k3co4hzd.fsf@gnu.org> <20140222124413.GA4971@openwall.com> <83vbw72t05.fsf@gnu.org> <20140222160217.GA15616@openwall.com> <83ios72j8b.fsf@gnu.org> <20140222185511.GA23643@openwall.com> <838ut23lo9.fsf@gnu.org> <20140223195659.GA23581@openwall.com>
To: Agustin Martin
Cc: 16800@debbugs.gnu.org

I've run some tests against my .org file (not in emacs -Q):

    (insert
     (mapconcat
      (lambda (re)
        (save-excursion
          ;; Start from the end so that the whole buffer is scanned.
          (goto-char (point-max))
          (let ((time (current-time))
                (count 0))
            ;; Count every match of RE, searching backward.
            (while (re-search-backward re nil t)
              (setq count (1+ count)))
            ;; One result line: "COUNT: ELAPSED-TIME :: REGEXP".
            (format "%d: %S :: %s"
                    count (subtract-time (current-time) time) re))))
      '("\\<[[:alpha:]]"
        "\\b[[:alpha:]]"
        "\\([^[:alpha:]]\\|\\b\\)[[:alpha:]]"
        "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]"
        "\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]"
        "\\(?:[^[:alpha:]]\\)[[:alpha:]]"
        "[^[:alpha:]][[:alpha:]]"
        "\\(?:\\b\\|'\\)[[:alpha:]]"
        "\\(?:[^[:alpha:]]\\|\\`\\)\\([[:alpha:]]+\\)"
        "\\([^[:alpha:]]\\|\\`\\)\\(?:[[:alpha:]]+\\)"
        "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]+")
      "\n"))

Results (the time is a (HIGH LOW MICROSEC PICOSEC) list, so the first
line means about 2.84 seconds):

    Matches  Time                   Regexp tried
    299158: (0 2 841190 614000) :: \<[[:alpha:]]
    299158: (0 2 876846 547000) :: \b[[:alpha:]]
    307919: (0 3 321676 163000) :: \([^[:alpha:]]\|\b\)[[:alpha:]]
    307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
    307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
    307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
    307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
    299518: (0 2 998895 976000) :: \(?:\b\|'\)[[:alpha:]]
    307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
    307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
    307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+

I should admit that Emacs word search gives wrong results even when the
casechars are [[:alpha:]]: "a0a" is one word for Emacs but two words
for flyspell. I missed this because Russian behaves differently (Emacs
sees a word boundary between digits and Russian letters). My bad.

    307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
    307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]

These two (about 2.76 s each) are a bit faster than
\(?:[^[:alpha:]]\|\`\)[[:alpha:]] (about 2.82 s), which suggests we
might gain some speed by not matching the beginning of the buffer in
the regexp and checking for it separately. But I doubt it is worth it.

On Sun, Feb 23, 2014 at 11:56:59PM +0400, Aleksey Cherepanov wrote:
> Also not capturing group ("\\(?:") could be used because we do not
> need a match data of the first group. It should work faster but I
> don't really know.

    307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
    307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]

The test shows that the non-capturing group is faster (about 2.82 s
against 3.29 s).

> Maybe it would be faster to not capture word but capture one char or
> void but I doubt the difference would be noticable.

    307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
    307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
    307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+

Unexpectedly, capturing the word works a bit faster. Maybe the cause is
not the word capture itself but the fact that it is the second group,
and forward search might behave differently. Or [[:alpha:]]+ in place
of a fixed word caused it. Anyway, the difference is very small (the
three timings are within 0.1 s of each other).

Capturing the word lets us write a function that wraps a word into a
regexp, just as the word-search-regexp function wraps a word for
word-search-forward/-backward.
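Such a function, and a backward search built on it, could look roughly
like the sketch below. It assumes flyspell's not-casechars regexp is
passed in as a string (defaulting here to "[^[:alpha:]]"); the names
my-flyspell-word-regexp and my-flyspell-word-search-backward are
hypothetical, not functions from flyspell.el. The delimiter uses a
non-capturing group and the word is captured in group 1, as in the
tests above. The trailing boundary is checked after the match instead
of inside the regexp, because re-search-backward only accepts matches
that end at or before the starting point.

```elisp
;; Sketch only: the `my-flyspell-' names are hypothetical, not flyspell.el API.
(defun my-flyspell-word-regexp (word &optional not-casechars)
  "Return a regexp matching WORD preceded by NOT-CASECHARS or buffer start.
NOT-CASECHARS defaults to \"[^[:alpha:]]\".  Group 1 captures WORD;
the leading delimiter group is non-capturing."
  (let ((nc (or not-casechars "[^[:alpha:]]")))
    (concat "\\(?:" nc "\\|\\`\\)\\(" (regexp-quote word) "\\)")))

(defun my-flyspell-word-search-backward (word &optional bound not-casechars)
  "Search backward from point for WORD as a whole word.
Return the position where WORD starts, or nil if there is none
after BOUND.  Point is left at the start of the last match tried.
The trailing boundary is tested separately: consuming the following
delimiter in the regexp would hide a word that ends right at point."
  (let ((case-fold-search nil)
        (nc (or not-casechars "[^[:alpha:]]"))
        (found nil))
    ;; An empty WORD would match empty strings and could loop forever.
    (when (> (length word) 0)
      (let ((re (my-flyspell-word-regexp word nc)))
        (while (and (not found) (re-search-backward re bound t))
          (let ((start (match-beginning 1))
                (end (match-end 1)))
            ;; Accept only if WORD is followed by a delimiter or the end.
            (when (save-excursion
                    (goto-char end)
                    (or (eobp) (looking-at nc)))
              (setq found start))))))
    found))
```

With the default delimiter, searching backward for "a" from the end of
a buffer containing "a0a xa a" finds positions 8, 3 and 1: the "a" in
"xa" is skipped, and "a0a" counts as two words, as flyspell sees it.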
> I guess that \b would work faster than the group so we could have 'if'
> statement around the whole loop that has one implementation with \b
> for case when casechars are "[[:alpha:]]" and not-casechars are
> "[^[:alpha:]]" and another implementation as above for other cases.
> But it seems cumbersome.

My guess was wrong: \b is slower than the non-capturing group (about
2.88 s against 2.82 s). It is also unsuitable here anyway, since it
follows Emacs word boundaries (see the "a0a" case above).

Thanks!

--
Regards,
Aleksey Cherepanov