From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]".
Date: Fri, 4 Dec 2015 23:00:00 +0000
Message-ID: <20151204230000.GC6070@acm.fritz.box>
References: <20151204192126.73199.qmail@mail.muc.de>
To: Artur Malabarba
Cc: 22090@debbugs.gnu.org
Xref: news.gmane.org
 gmane.emacs.bugs:109629

Hello, Artur.

On Fri, Dec 04, 2015 at 08:49:42PM +0000, Artur Malabarba wrote:
> 2015-12-04 19:21 GMT+00:00 Alan Mackenzie:
> > Would you like any help to sort out these regexps? I have some expertise
> > in doing this, having half-written fix-re.el, a program which analyses
> > and corrects just the sort of thing you're talking about.

> Maybe you can help then. The situation is actually quite simple.
> We have a regexp for matching anything that 'a' should match (for
> instance, that might look like "\\(a[´`]?\\|[áà𝑎]\\)"), and we have
> another for matching anything that A could match (e.g.
> "\\(A[`´]?\\|[ÁÀ]\\)").

Each of these regexps looks intrinsically blameless.

> When case-fold-search is on the previous code would simply join these
> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)".

Quick question: _why_ do you need to join them? Given that
case-fold-search is enabled, couldn't you just use, say, the lower case
version?

> The problem is that (when case-fold-search is on) this creates a lot
> of redundancy. There are two paths in that regexp that match "a",
> there are two paths that match "à" and so on (but it's not full
> redundancy, for instance, only one path matches 𝑎).

Yes. This is the killer danger in regexps (at least with the sort of
regexp engine we've got). But it looks to me as though this redundancy
would be quite easy to eliminate - you just need three regexp fragments
for the letter "a" - a lower case one, an upper case one and a
case-fold-search one.

The other thing is that for that single character "a", a 39-character
regexp fragment is being generated. Might this have something to do with
the "[Too many words]" error I got last night (which comes from the
regexp engine returning a "too long regexp" error)? Even if you can
reduce that to, say, 19 characters, that's only winning a factor of 2 in
the slide towards a too-long regexp.
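
To make the "why join them?" question concrete, here is a rough sketch in Python rather than Emacs Lisp. It is only an illustration under stated assumptions: the two fragments quoted above are transliterated into Python syntax (groups are written ( | ) instead of \( \| \)), re.IGNORECASE stands in for case-fold-search, and the names LOWER, UPPER, JOINED, SAMPLES, matches and disagreements are made up for this example. Python's engine is not Emacs's, so this says nothing about speed or about the "too long regexp" limit; it only checks, on a handful of sample strings, whether the joined form accepts anything the lower case form alone does not.

```python
import re

# Python transliterations of the two fragments quoted above.
# (Assumption: Emacs's \( \| \) become ( | ) in Python syntax.)
LOWER = "(a[´`]?|[áà𝑎])"
UPPER = "(A[`´]?|[ÁÀ])"

# What "simply joining" the two fragments produces.
JOINED = "(" + LOWER + "|" + UPPER + ")"

# Assumption: re.IGNORECASE approximates case-fold-search. The two
# engines are not identical, so treat the result as indicative only.
FLAGS = re.IGNORECASE

# A few hand-picked inputs, including two that should not match.
SAMPLES = ["a", "A", "à", "À", "á", "Á", "a´", "A`", "𝑎", "b", "aa"]


def matches(pattern, text):
    """Return True if pattern matches the whole of text, ignoring case."""
    return re.fullmatch(pattern, text, FLAGS) is not None


def disagreements(samples=SAMPLES):
    """Return the samples on which JOINED and LOWER give different answers."""
    return [s for s in samples if matches(JOINED, s) != matches(LOWER, s)]


if __name__ == "__main__":
    print("samples where the two regexps disagree:", disagreements())
    print("lengths in Python syntax:", len(LOWER), "vs", len(JOINED))
```

If the list of disagreements comes out empty, then for these samples at least the upper case half of the joined regexp adds length and duplicate paths without adding any matches.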

It might well be that a very long regexp has to be divided into shorter
sections (a typical long RE will be a sequence of sub-expressions,
rather than lots of alternatives inside \(...\|........\)).

-- 
Alan Mackenzie (Nuremberg, Germany).