From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Artur Malabarba <bruce.connor.am@gmail.com>
Newsgroups: gmane.emacs.bugs
Subject: bug#22090: Isearch is sluggish and eventually refuses further service
	with "[Too many words]".
Date: Sat, 5 Dec 2015 17:23:53 +0000
Message-ID: <CAAdUY-Jj8_pK78x2hyXWLwiOADkmad97Q6LJxtqnirMKCrSfOg@mail.gmail.com>
References: <mailman.1363.1449242229.31583.bug-gnu-emacs@gnu.org>
	<20151204192126.73199.qmail@mail.muc.de>
	<CAAdUY-KvOC_r6bi50n5pia1uY7vUfwALzBEupFYfX0BYc+vCvw@mail.gmail.com>
	<20151204230000.GC6070@acm.fritz.box>
Reply-To: bruce.connor.am@gmail.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1449336262 30624 80.91.229.3 (5 Dec 2015 17:24:22 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 5 Dec 2015 17:24:22 +0000 (UTC)
Cc: 22090@debbugs.gnu.org
To: Alan Mackenzie <acm@muc.de>
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sat Dec 05 18:24:13 2015
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1a5GZ7-0005xo-OE
	for geb-bug-gnu-emacs@m.gmane.org; Sat, 05 Dec 2015 18:24:09 +0100
Original-Received: from localhost ([::1]:47261 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1a5GZ7-0002a4-18
	for geb-bug-gnu-emacs@m.gmane.org; Sat, 05 Dec 2015 12:24:09 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39772)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5GZ3-0002Zv-AD
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 12:24:06 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5GZ0-0006nw-5T
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 12:24:05 -0500
Original-Received: from debbugs.gnu.org ([208.118.235.43]:50336)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5GZ0-0006nr-0i
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 12:24:02 -0500
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5GYz-0000Bo-N8
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 12:24:01 -0500
X-Loop: help-debbugs@gnu.org
Resent-From: Artur Malabarba <bruce.connor.am@gmail.com>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Sat, 05 Dec 2015 17:24:01 +0000
Resent-Message-ID: <handler.22090.B22090.1449336236717@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 22090
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
Original-Received: via spool by 22090-submit@debbugs.gnu.org id=B22090.1449336236717
	(code B ref 22090); Sat, 05 Dec 2015 17:24:01 +0000
Original-Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 17:23:56 +0000
Original-Received: from localhost ([127.0.0.1]:40044 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1a5GYu-0000BV-BM
	for submit@debbugs.gnu.org; Sat, 05 Dec 2015 12:23:56 -0500
Original-Received: from mail-lb0-f175.google.com ([209.85.217.175]:36431)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <arturmalabarba@gmail.com>) id 1a5GYs-0000BN-B4
	for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 12:23:55 -0500
Original-Received: by lbblt2 with SMTP id lt2so36517527lbb.3
	for <22090@debbugs.gnu.org>; Sat, 05 Dec 2015 09:23:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; 
	h=mime-version:reply-to:sender:in-reply-to:references:date:message-id
	:subject:from:to:cc:content-type:content-transfer-encoding;
	bh=68Ase23Gf9dCza8G+nxR6lnUjlEurqSOhn4NpnO3H4U=;
	b=c/p/f25WJlAhOD6cpIr8OvRfXS80aiEEWWJKxOiN/VehnLASucaGaHy+yNrpN3C17b
	0CN2N1VyazkmqOp9YHVOTJ/AZnqhkVyKVHw9Pw3qdBhYBDanPnhnOiUeA/i3vGbqDCKX
	BjxNeQ6oa46kLU1vKyNva9vnfChbbTMEKO6eGLOMcfSBegfVHRCXr0adkEOpJU9zPhZD
	P1foY0qFfZtRI+0OIfRNnaIiVFNXOhFnbr7cf57I8P/fFTH2f8Y+9bHadUHgtHWUjZU9
	kV62VBtebeSqHW7/+X9mChzNeCvdtszYXtFJ4rDdLyl2eU5Ws1FUqR7zvCc7aUChQBbf
	X7CA==
X-Received: by 10.112.242.167 with SMTP id wr7mr9142395lbc.69.1449336233464;
	Sat, 05 Dec 2015 09:23:53 -0800 (PST)
Original-Received: by 10.112.202.99 with HTTP; Sat, 5 Dec 2015 09:23:53 -0800 (PST)
In-Reply-To: <20151204230000.GC6070@acm.fritz.box>
X-Google-Sender-Auth: UgEnmVy8hJExEwm-vHntK2KhD7I
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.bugs:109657
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/109657>

2015-12-04 23:00 GMT+00:00 Alan Mackenzie <acm@muc.de>:
>> When case-fold-search is on, the previous code would simply join these
>> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)".
>
> Quick question: _why_ do you need to join them?  Given that
> case-fold-search is enabled, couldn't you just use, say, the lower case
> version?

Because there are some characters in each regexp that don't have
lower/upper-case equivalents. For instance, if I use the
"\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but
it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰).
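A small sketch of why one fragment is not enough, in Python rather than Elisp. Python's `re.IGNORECASE` stands in for case-fold-search, the Emacs `\\(...\\|...\\)` groups are rewritten as `(?:...|...)`, and, as an assumption for illustration only, the upper-case fragment is taken to also list the extra capital A variants above.

```python
import re

# Fragments from this thread, rewritten in Python regexp syntax.
LOWER_A = "(?:a[´`]?|[áà𝑎])"
# Assumption: the upper-case fragment also lists the capital A variants
# mentioned above; the excerpt quoted earlier only shows [ÁÀ].
UPPER_A = "(?:A[`´]?|[ÁÀ𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰])"
# What the old code did with case-fold-search on: join both fragments.
JOINED_A = f"(?:{LOWER_A}|{UPPER_A})"

def folds_to(pattern, char):
    """True if pattern matches exactly char, ignoring case
    (a rough stand-in for case-fold-search being on)."""
    return re.fullmatch(pattern, char, re.IGNORECASE) is not None

print(folds_to(LOWER_A, "A"), folds_to(LOWER_A, "À"))  # True True
print(folds_to(LOWER_A, "𝔸"))                          # False
print(folds_to(JOINED_A, "𝔸"))                         # True
```

Case folding gets the lower-case fragment to A and À, but 𝔸 has no lower-case counterpart, so it is only reachable through the upper-case fragment.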

> it looks to me that this redundancy would
> be quite easy to eliminate - you just need three regexp fragments for
> the letter "a" - a lower case one, an upper case one and a
> case-fold-search one.

Yes, we could go that route. It's just going to add complexity to the
code that generates the char-fold-table (which is already quite dense),
and I wonder whether it's worth it for such a corner case. Like I said, 'a'
already matches A and À, so how much do we want to support this extra
case-folding?
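To make the trade-off concrete, here is a rough sketch of the three-fragment idea, in Python rather than Elisp. The option lists and the helpers `alternation` and `three_fragments` are hypothetical, and `str.lower` stands in for Emacs's case tables.

```python
import re

def alternation(options):
    """Regexp matching any of the literal options, longest first so that
    a two-character option such as 'a´' is tried before plain 'a'."""
    ordered = sorted(set(options), key=lambda s: (-len(s), s))
    return "(?:" + "|".join(re.escape(o) for o in ordered) + ")"

def three_fragments(lower_opts, upper_opts):
    """Return (lower, upper, case_fold) fragments for one letter.

    The case-fold fragment keeps all lower-case options plus only those
    upper-case options that lower-casing cannot reach, so it is meant to
    be used with case-insensitive matching."""
    covered = {o.lower() for o in lower_opts}
    extra = [o for o in upper_opts if o.lower() not in covered]
    return (alternation(lower_opts),
            alternation(upper_opts),
            alternation(list(lower_opts) + extra))

# Hypothetical, abbreviated option lists for "a".
LOWER = ["a", "a´", "a`", "á", "à", "𝑎"]
UPPER = ["A", "A´", "A`", "Á", "À", "𝔸", "🄰"]
lower, upper, case_fold = three_fragments(LOWER, UPPER)
print(case_fold)
```

Even this toy version needs an extra "is it already covered by folding?" check per letter, which is the added complexity in question. Whether `str.lower` agrees with Emacs's case tables for every character is not checked here.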

> The other thing is that for that single character "a" a 39 character
> regexp fragment is being generated.  Might this have something to do
> with the "[Too many words]" error I got last night (which comes from the
> regexp engine returning a "too long regexp" error)?

Yes.
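The 39-character figure can be checked with a little arithmetic. Each folded "a" expands to the whole fragment, so the regexp grows linearly with the number of such characters in the search string. A Python sketch; `folded_length` is a hypothetical helper that ignores anything else isearch adds to the regexp.

```python
# The per-character fragment for "a" quoted earlier, written as the
# regexp itself (not as an escaped Lisp string literal).
FRAGMENT_A = r"\(\(a[´`]?\|[áà𝑎]\)\|\(A[`´]?\|[ÁÀ]\)\)"

def folded_length(search, fragments):
    """Length of the regexp built by replacing each character of search
    with its fragment; characters without a fragment are kept as-is."""
    return sum(len(fragments.get(c, c)) for c in search)

print(len(FRAGMENT_A))                             # 39
print(folded_length("banana", {"a": FRAGMENT_A}))  # 3 + 3*39 = 120
```

This does not model the regexp engine's actual size limit; it only shows how quickly the regexp handed to it grows.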

> Even if you can reduce that to, say 19 characters, that's only winning a
> factor of 2 in the slide towards a too long regexp.  It might well be
> that for a very long regexp, you might have to divide it into shorter
> sections (a typical long RE will be a sequence of sub expressions,
> rather than lots of alternatives inside \(...\|........\)).

I don't understand what you mean. Could you elaborate?
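One possible reading of the suggestion, as a sketch under stated assumptions: instead of compiling one huge regexp, split the search string into short pieces, compile each piece separately, and require the pieces to match one after another. The fold table `FOLD` and the functions `compile_chunks`, `match_chunks` and `search_chunked` are made up for illustration, Python's `re` stands in for the Emacs regexp engine, and none of this is how isearch currently works.

```python
import re

# Hypothetical fold table: only "a" has a fragment (Python syntax);
# every other character is matched literally.
FOLD = {"a": "(?:a[´`]?|[áà𝑎]|A[`´]?|[ÁÀ])"}

def compile_chunks(search, size):
    """Split search into pieces of at most `size` characters and compile
    one short regexp per piece instead of one long regexp."""
    pieces = [search[i:i + size] for i in range(0, len(search), size)]
    return [re.compile("".join(FOLD.get(c, re.escape(c)) for c in p))
            for p in pieces]

def match_chunks(chunks, text, pos):
    """Return the end of a match of all chunks in sequence starting at pos,
    or None.  Each chunk must match exactly text[pos:end]; candidate ends
    are tried longest first, backtracking into earlier chunks if needed."""
    if not chunks:
        return pos
    for end in range(len(text), pos - 1, -1):
        if chunks[0].fullmatch(text, pos, end):
            result = match_chunks(chunks[1:], text, end)
            if result is not None:
                return result
    return None

def search_chunked(search, text, size=2):
    """Leftmost (start, end) match of the chunked pattern in text, or None."""
    chunks = compile_chunks(search, size)
    for start in range(len(text) + 1):
        end = match_chunks(chunks, text, start)
        if end is not None:
            return start, end
    return None

print(search_chunked("banana", "xx bànÁn𝑎 yy"))  # (3, 9)
```

Each compiled piece stays short, but matching has to backtrack across piece boundaries (the loop over candidate `end` positions), which is where the extra cost of this approach would show up.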