From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Mackenzie <acm@muc.de>
Newsgroups: gmane.emacs.bugs
Subject: bug#22090: Isearch is sluggish and eventually refuses further service
	with "[Too many words]".
Date: Sat, 5 Dec 2015 18:52:20 +0000
Message-ID: <20151205185220.GF2698@acm.fritz.box>
References: <mailman.1363.1449242229.31583.bug-gnu-emacs@gnu.org>
	<20151204192126.73199.qmail@mail.muc.de>
	<CAAdUY-KvOC_r6bi50n5pia1uY7vUfwALzBEupFYfX0BYc+vCvw@mail.gmail.com>
	<20151204230000.GC6070@acm.fritz.box>
	<CAAdUY-Jj8_pK78x2hyXWLwiOADkmad97Q6LJxtqnirMKCrSfOg@mail.gmail.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1449341495 9317 80.91.229.3 (5 Dec 2015 18:51:35 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 5 Dec 2015 18:51:35 +0000 (UTC)
Cc: 22090@debbugs.gnu.org
To: Artur Malabarba <bruce.connor.am@gmail.com>
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sat Dec 05 19:51:24 2015
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1a5HvW-0004f3-K1
	for geb-bug-gnu-emacs@m.gmane.org; Sat, 05 Dec 2015 19:51:22 +0100
Original-Received: from localhost ([::1]:47443 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1a5HvV-0005ut-S2
	for geb-bug-gnu-emacs@m.gmane.org; Sat, 05 Dec 2015 13:51:21 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:52629)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5HvF-0005bq-92
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 13:51:06 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5HvC-0002m7-2G
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 13:51:05 -0500
Original-Received: from debbugs.gnu.org ([208.118.235.43]:50368)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5HvB-0002lt-V6
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 13:51:01 -0500
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1a5HvB-0002Lz-N0
	for bug-gnu-emacs@gnu.org; Sat, 05 Dec 2015 13:51:01 -0500
X-Loop: help-debbugs@gnu.org
Resent-From: Alan Mackenzie <acm@muc.de>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Sat, 05 Dec 2015 18:51:01 +0000
Resent-Message-ID: <handler.22090.B22090.14493414359010@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 22090
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
Original-Received: via spool by 22090-submit@debbugs.gnu.org id=B22090.14493414359010
	(code B ref 22090); Sat, 05 Dec 2015 18:51:01 +0000
Original-Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 18:50:35 +0000
Original-Received: from localhost ([127.0.0.1]:40076 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1a5Huk-0002LF-TL
	for submit@debbugs.gnu.org; Sat, 05 Dec 2015 13:50:35 -0500
Original-Received: from mail.muc.de ([193.149.48.3]:56514)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <acm@muc.de>) id 1a5HuQ-0002Ki-J1
	for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 13:50:33 -0500
Original-Received: (qmail 6765 invoked by uid 3782); 5 Dec 2015 18:50:13 -0000
Original-Received: from acm.muc.de (p548A4450.dip0.t-ipconnect.de [84.138.68.80]) by
	colin.muc.de (tmda-ofmipd) with ESMTP;
	Sat, 05 Dec 2015 19:50:12 +0100
Original-Received: (qmail 4907 invoked by uid 1000); 5 Dec 2015 18:52:20 -0000
Content-Disposition: inline
In-Reply-To: <CAAdUY-Jj8_pK78x2hyXWLwiOADkmad97Q6LJxtqnirMKCrSfOg@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Delivery-Agent: TMDA/1.1.12 (Macallan)
X-Primary-Address: acm@muc.de
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.bugs:109663
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/109663>

Hello, Artur.

On Sat, Dec 05, 2015 at 05:23:53PM +0000, Artur Malabarba wrote:
> 2015-12-04 23:00 GMT+00:00 Alan Mackenzie <acm@muc.de>:
> >> When case-fold-search is on the previous code would simply join these
> >> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)".

> > Quick question: _why_ do you need to join them?  Given that
> > case-fold-search is enabled, couldn't you just use, say, the lower case
> > version?

> Because there are some characters in each regexp that don't have
> lower/upper-case equivalents. For instance, if I use the
> "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but
> it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰).

OK, thanks.

> > it looks to me that this redundancy would
> > be quite easy to eliminate - you just need three regexp fragments for
> > the letter "a" - a lower case one, an upper case one and a
> > case-fold-search one.

> Yes, we could go that route. It's just going to add complexity to the
> code that generates the char-fold-table (which is already quite dense)
> and I wonder if it's worth such a corner-case. Like I said, 'a'
> already matches A and À, how much do we want to support this extra
> case-folding?

But it seems the complexity (and it can't honestly be that much,
surely?) is intrinsic to the task being carried out.  Sticking a "\\|"
between the upper case and lower case versions clearly doesn't work.

Seriously, how difficult can it be to generate

    "\\([Aa][´`]?\\|[áà𝑎ÁÀ]\\)"

, which is a blameless regexp, given where you've already got to?
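
A minimal sketch of how such a merged fragment might be assembled, assuming
the table generator already has the base letters, the combining marks and
the precomposed characters available as plain strings.  The function name
`my-char-fold-merge' and its argument layout are hypothetical, not part of
the existing char-fold code, and it assumes none of the characters is
special inside a bracket expression ("]", "^" or "-"):

```elisp
;; Hypothetical helper, for illustration only.
(defun my-char-fold-merge (bases marks precomposed)
  "Return a single regexp fragment for one folded letter.
BASES is a string of base letters in both cases, e.g. \"Aa\".
MARKS is a string of characters that may optionally follow a base.
PRECOMPOSED is a string of single characters that match on their own.
Assumes no character in any argument is special inside [...]."
  (concat "\\(["
          bases
          "]"
          ;; Only emit the optional-mark part when there are marks.
          (if (string= marks "") "" (concat "[" marks "]?"))
          "\\|["
          precomposed
          "]\\)"))

;; Building the fragment for the letter "a" from both cases at once:
(my-char-fold-merge "Aa" "´`" "áà𝑎ÁÀ")
```

The point of the design is that the upper and lower case variants share one
bracket expression each, rather than being two complete alternatives glued
together with "\\|".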

> > The other thing is that for that single character "a" a 39 character
> > regexp fragment is being generated.  Might this have something to do
> > with the "[Too many words]" error I got last night (which comes from the
> > regexp engine returning a "too long regexp" error)?

> yes

I was afraid of that.

> > Even if you can reduce that to, say 19 characters, that's only winning a
> > factor of 2 in the slide towards a too long regexp.  It might well be
> > that for a very long regexp, you might have to divide it into shorter
> > sections (a typical long RE will be a sequence of sub expressions,
> > rather than lots of alternatives inside \(...\|........\)).

> I don't understand what you mean. Could you elaborate?

Once you've generated the long regexp, if it's too long, you can split
it up into, say, 3 pieces A, B, C, such that (equal re (concat A B C)).

Then you can do something like:

    (and (search-forward-regexp A bound noerror)
         (search-forward-regexp (concat "\\=" B) bound noerror)
         (search-forward-regexp (concat "\\=" C) bound noerror))

.  Though, thinking about it, it might be less painful to enhance the
regexp engine to take longer regexps.
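
A slightly fuller sketch of the piecewise search, for what it's worth.  The
bare `and' form gives up if B or C fails right after the first match of A,
so this version retries A one character further on.  The function name
`my-search-forward-pieces' is hypothetical, and it uses `re-search-forward',
which is the same function as `search-forward-regexp'.  It assumes the
regexp was split at top-level boundaries (never inside a group), and it is
not fully equivalent to matching the whole regexp in one go, since the
engine cannot backtrack from one piece into the previous one:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-search-forward-pieces (pieces &optional bound)
  "Search forward for consecutive matches of the regexps in PIECES.
PIECES is a non-empty list of regexps whose concatenation is the
original (too long) regexp.  BOUND limits the search as in
`re-search-forward'.  Return the end of the match and leave point
there, or return nil and leave point unchanged."
  (let ((start (point))
        (limit (or bound (point-max)))
        (found nil)
        (done nil))
    (while (not (or found done))
      (if (not (re-search-forward (car pieces) limit t))
          (setq done t)                 ; first piece not found at all
        (let ((beg (match-beginning 0))
              (rest (cdr pieces))
              (ok t))
          ;; Each later piece must match exactly at point ("\\=").
          (while (and ok rest)
            (setq ok (re-search-forward (concat "\\=" (car rest)) limit t))
            (setq rest (cdr rest)))
          (cond (ok (setq found (point)))
                ((>= beg limit) (setq done t)) ; nowhere left to retry
                (t (goto-char (1+ beg)))))))   ; retry one char later
    (unless found (goto-char start))
    found))

;; Example: search for "abc" given as three pieces.
(with-temp-buffer
  (insert "xxabyabc")
  (goto-char (point-min))
  (my-search-forward-pieces '("a" "b" "c")))
```

In the example the first "a" is followed by "b" but not by "c", so the
search has to move on to the second "a" before all three pieces match.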

-- 
Alan Mackenzie (Nuremberg, Germany).