From mboxrd@z Thu Jan 1 00:00:00 1970
From: Artur Malabarba
Newsgroups: gmane.emacs.bugs
Subject: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]".
Date: Sun, 6 Dec 2015 12:50:24 +0000
Reply-To: bruce.connor.am@gmail.com
To: Alan Mackenzie
Cc: 22090@debbugs.gnu.org
In-Reply-To: <20151205185220.GF2698@acm.fritz.box>

2015-12-05 18:52 GMT+00:00 Alan Mackenzie:
> But it seems the complexity (and it can't honestly be that much,
> surely?) is intrinsic to the task being carried out.  Sticking a "\\|"
> between the upper case and lower case versions clearly doesn't work.
>
> Seriously, how difficult can it be to generate
>
>     "\\([Aa][´`]?\\|[áà𝑎ÁÀ]\\)"
>
> , which is a blameless regexp, given where you've already got to?

Oh, I see. I thought you were talking about mutually exclusive regexps.
Indeed, a regexp like that would be trivial to generate. But is it
really blameless? I mean, if "\\(A\\|a\\)" can lead to extremely slow
searches, doesn't the same happen with "[Aa]"?

Anyway, at this point I'm just asking for future knowledge/reference.
According to Eli, the current implementation is in accordance with the
Unicode Standard. So it's probably best to keep it this way, at least
for the first release of the feature.

> Once you've generated the long regexp, if it's too long, you can split
> it up into, say, 3 pieces A, B, C, such that (equal re (concat A B C)).
>
> Then you can do something like:
>
>     (and (search-forward-regexp A bound noerror)
>          (search-forward-regexp (concat "\\=" B) bound noerror)
>          (search-forward-regexp (concat "\\=" C) bound noerror))
>
> .  Though, thinking about it, it might be less painful to enhance the
> regexp engine to take longer regexps.

Besides, char-folding is supposed to turn strings into regexps usable
anywhere, and the split search above wouldn't work with that.

I've added a clause to the function so that it won't do any
char-folding if the resulting regexp would be longer than 5k chars
(instead it will just regexp-quote). That will at least prevent the
"[Too many words]" error in isearch. (I already had this clause in
there before, but it was using 10k, which apparently is not enough.)
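
For reference, the length guard described above could look roughly like
the sketch below. This is only an illustration under stated
assumptions, not the code that was committed: the names
my-fold-max-length, my-fold-alist and my-fold-to-regexp are
hypothetical, and the folding table is a toy with a single entry (a
trimmed version of the regexp quoted earlier). Only the 5k threshold
and the regexp-quote fallback come from the message.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: none of these names exist in Emacs itself.

(defvar my-fold-max-length 5000
  "Assumed cap on the folded regexp's length (the message says 5k chars).")

(defvar my-fold-alist
  '((?a . "\\(?:[Aa][´`]?\\|[áàÁÀ]\\)"))
  "Toy folding table mapping a character to a regexp for its variants.")

(defun my-fold-to-regexp (string)
  "Return a folding regexp for STRING.
If the folded regexp would exceed `my-fold-max-length', give up on
folding and return (regexp-quote STRING) instead."
  (let ((folded (mapconcat
                 (lambda (ch)
                   ;; Use the table entry if there is one, otherwise
                   ;; match the character literally.
                   (or (cdr (assq ch my-fold-alist))
                       (regexp-quote (char-to-string ch))))
                 string "")))
    (if (> (length folded) my-fold-max-length)
        (regexp-quote string)
      folded)))
```

With this toy table, a short string such as "a" folds normally, while
a string of 1000 "a" characters would fold to well over 5000 chars and
is therefore returned as a plain quoted literal. The fallback gives up
folding entirely for long inputs, which is the trade-off described
above: the result is always a single regexp usable anywhere, at the
cost of no folding for very long search strings.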