From: Artur Malabarba
Newsgroups: gmane.emacs.bugs
Subject: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]".
Date: Sat, 5 Dec 2015 17:23:53 +0000
References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box>
In-Reply-To: <20151204230000.GC6070@acm.fritz.box>
Reply-To: bruce.connor.am@gmail.com
To: Alan Mackenzie
Cc: 22090@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

2015-12-04 23:00 GMT+00:00 Alan Mackenzie:
>> When case-fold-search is on the previous code would simply join these
>> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)".
>
> Quick question: _why_ do you need to join them?  Given that
> case-fold-search is enabled, couldn't you just use, say, the lower case
> version?

Because there are some characters in each regexp that don't have
lower/upper-case equivalents. For instance, if I use the
"\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but
it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰).

> it looks to me that this redundancy would
> be quite easy to eliminate - you just need three regexp fragments for
> the letter "a" - a lower case one, an upper case one and a
> case-fold-search one.

Yes, we could go that route. It's just going to add complexity to the
code that generates the char-fold-table (which is already quite dense)
and I wonder if it's worth such a corner-case. Like I said, 'a' already
matches A and À, how much do we want to support this extra case-folding?

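The point about characters without lower/upper-case equivalents can be sketched outside Emacs. The snippet below is only an analogy in Python, not the char-fold code: it assumes Python's `re.IGNORECASE` treats these characters comparably to `case-fold-search`, it rewrites the fragments in Python syntax, and the names `LOWER`, `UPPER` and `matches` are made up for illustration. The upper-case fragment is extended with the extra characters listed above.

```python
import re

# Python-syntax analogues of the two fragments discussed above.
# Assumption: Emacs's "\\(...\\|...\\)" is written "(?:...|...)" here.
LOWER = "(?:a[´`]?|[áà𝑎])"
# Upper-case fragment, extended with the extra characters listed above.
UPPER = "(?:A[`´]?|[ÁÀ𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰])"


def matches(pattern, text):
    """Return True if PATTERN matches all of TEXT, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


samples = "AÀ𝔸🄰"

# Case folding lets the lower-case fragment reach A and À, but not the
# characters that have no lower-case counterpart.
lower_only = {c: matches(LOWER, c) for c in samples}

# Joining both fragments covers all four sample characters.
joined = {c: matches(f"(?:{LOWER}|{UPPER})", c) for c in samples}

print(lower_only)
print(joined)
```

In this sketch the lower-case fragment alone matches A and À but misses 𝔸 and 🄰, which is why the two fragments were being joined.
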
> The other thing is that for that single character "a" a 39 character
> regexp fragment is being generated.  Might this have something to do
> with the "[Too many words]" error I got last night (which comes from the
> regexp engine returning a "too long regexp" error)?

Yes.

> Even if you can reduce that to, say 19 characters, that's only winning a
> factor of 2 in the slide towards a too long regexp.  It might well be
> that for a very long regexp, you might have to divide it into shorter
> sections (a typical long RE will be a sequence of sub expressions,
> rather than lots of alternatives inside \(...\|........\)).

I don't understand what you mean. Could you elaborate?
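
As a footnote to the length question, the 39-character figure can be checked by counting the characters of the joined regexp quoted earlier in this thread. The Python sketch below does that and gives a rough estimate of how the total grows with the search string. It assumes, purely for illustration, that every character expands to a fragment as long as the one for "a"; `FRAGMENT` and `estimated_length` are hypothetical names, and the limit at which the regexp engine reports "too long" is not stated here.

```python
# The regexp generated for the single character "a", written as the regexp
# itself (one backslash per escape, not the doubled Lisp string form).
FRAGMENT = r"\(\(a[´`]?\|[áà𝑎]\)\|\(A[`´]?\|[ÁÀ]\)\)"


def estimated_length(n_chars, per_char=len(FRAGMENT)):
    """Rough regexp length if every search character expanded like "a"."""
    return n_chars * per_char


print(len(FRAGMENT))
for n in (10, 50, 100):
    # Second column: current fragment size; third: a halved, 19-character one.
    print(n, estimated_length(n), estimated_length(n, per_char=19))
```

Halving the fragment only halves the slope; the total still grows linearly with the length of the search string.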