From: "Daniel Colascione"
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master 938d252 4/4: Make regex matching reentrant; update syntax during match
Date: Sun, 17 Jun 2018 11:51:33 -0700
Message-ID: <04e89d2beffedcc102b811863910c1ec.squirrel@dancol.org>
References: <20180616204650.8423.73499@vcs0.savannah.gnu.org> <20180616204653.86AFC203CB@vcs0.savannah.gnu.org>
To: "Stefan Monnier"
Cc: Daniel Colascione, emacs-devel@gnu.org

>> (unfreeze_pattern, freeze_pattern): New functions.
>> (compile_pattern): Return a regexp_cache pointer instead of the
>> re_pattern_buffer, allowing callers to use `freeze_pattern' if
>> needed. Do not consider busy patterns as cache hit candidates;
>> error if we run out of non-busy cache entries.
>
> IIRC the main/only reason why you can't use a compiled pattern in
> a reentrant way is because the \{N,M\} repetitions use a counter that's
> stored directly within the compiled pattern.
>
> But these are fairly rare.
>
> So we could easily change the code to add a boolean stating whether there
> is such a repetition-counter in the pattern, and if there isn't then
> "freeze" can just do nothing because we can freely use that pattern
> multiple times at the same time.

Good idea. My reading of the "smart jump" stuff in regex.c suggested that
we use the optimization for _all_ greedy Kleene star constructs though,
not just the bounded ones. Am I wrong?

But anyway, I think the regex code needs a major overhaul. I was actually
thinking about forking and vendoring RE2. Granted, having done that, you'd
need a C++ compiler to build Emacs, but it's probably one of the better
actively-maintained DFA-based regex engines around.
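
For concreteness, here is a rough sketch in C of the flag idea discussed above. It is not the code in search.c or regex.c: the struct layout and the names `cache_entry`, `mutates_self`, `sketch_lookup`, `sketch_freeze` and `sketch_unfreeze` are all hypothetical. The `mutates_self` flag is assumed to be set by the compiler for any construct that writes into the compiled pattern during matching, which would have to include the "smart jump" case if that really applies to all greedy stars. One deviation from the suggestion quoted above: freezing is not a complete no-op for non-mutating patterns here. It still counts users, so that a pattern in use is not evicted and recompiled underneath a running match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 4

/* Hypothetical cache entry; not the layout Emacs actually uses.  */
struct cache_entry
{
  const char *source;  /* pattern text, or NULL if the slot is empty */
  bool mutates_self;   /* assumed flag: matching writes into the pattern */
  int users;           /* number of matches currently using this entry */
};

static struct cache_entry cache[CACHE_SIZE];

/* Return an entry for SOURCE, or NULL if no slot can be used.
   An entry that is in use and mutates itself is never a cache hit;
   any entry that is in use is never evicted.  */
struct cache_entry *
sketch_lookup (const char *source, bool mutates_self)
{
  struct cache_entry *victim = NULL;
  for (size_t i = 0; i < CACHE_SIZE; i++)
    {
      struct cache_entry *e = &cache[i];
      bool in_use = e->users > 0;
      if (e->source && strcmp (e->source, source) == 0
          && !(in_use && e->mutates_self))
        return e;  /* hit: either idle, or safe to share */
      /* Prefer an empty slot over evicting an idle entry.  */
      if (!in_use && (!victim || (victim->source && !e->source)))
        victim = e;
    }
  if (victim)
    {
      /* A real implementation would compile the pattern here.  */
      victim->source = source;
      victim->mutates_self = mutates_self;
    }
  return victim;
}

/* Mark E as in use for the duration of a match.  */
void
sketch_freeze (struct cache_entry *e)
{
  e->users++;
}

/* Release one use of E.  */
void
sketch_unfreeze (struct cache_entry *e)
{
  if (e->users > 0)
    e->users--;
}
```

With this shape, a reentrant match on a pattern without counters reuses the same entry, while a reentrant match on a self-modifying pattern gets a second entry, and the lookup fails only when every slot is in use.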