From: "Daniel Colascione"
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master 938d252 4/4: Make regex matching reentrant; update syntax during match
Date: Sun, 17 Jun 2018 12:18:13 -0700
Message-ID: <8db0c426708a79bbf02e596a37675e95.squirrel@dancol.org>
References: <20180616204650.8423.73499@vcs0.savannah.gnu.org> <20180616204653.86AFC203CB@vcs0.savannah.gnu.org> <04e89d2beffedcc102b811863910c1ec.squirrel@dancol.org>
In-Reply-To: <04e89d2beffedcc102b811863910c1ec.squirrel@dancol.org>
Cc: Daniel Colascione, Stefan Monnier, emacs-devel@gnu.org
To: "Daniel Colascione"

>>> (unfreeze_pattern, freeze_pattern): New functions.
>>> (compile_pattern): Return a regexp_cache pointer instead of the
>>> re_pattern_buffer, allowing callers to use `freeze_pattern' if
>>> needed. Do not consider busy patterns as cache hit candidates;
>>> error if we run out of non-busy cache entries.
>>
>> IIRC the main/only reason why you can't use a compiled pattern in
>> a reentrant way is because the \{N,M\} repetitions use a counter that's
>> stored directly within the compiled pattern.
>>
>> But these are fairly rare.
>>
>> So we could easily change the code to add a boolean stating whether there
>> is such a repetition-counter in the pattern, and if there isn't then
>> "freeze" can just do nothing because we can freely use that pattern
>> multiple times at the same time.
>
> Good idea. My reading of the "smart jump" stuff in regex.c suggested that
> we use the optimization for _all_ greedy Kleene star constructs though,
> not just the bounded ones. Am I wrong?

Oh, yeah. One more thing. The busy flag isn't *just* to prevent the regex
bytecode engine confusing itself with self-modifying bytecode. It also
serves to protect the cache slot holding the regex pattern from reuse,
which would be disastrous whether or not the bytecode of a particular
pattern happens to be mutable. Under your proposal, we can get cache hits
on busy patterns, but we still have to mark them busy in the first place
so we can prevent this reuse.
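
To make the distinction concrete, here is a rough sketch of the two jobs
the busy marking does. This is not the code in search.c; every name in it
(`cache_entry', `acquire_pattern', `release_pattern', `mutable_code',
`users', `CACHE_SIZE') is made up for illustration. It also assumes a use
count rather than a single boolean, since once a busy pattern can be
handed out twice, the slot has to stay protected until the last user is
done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified pattern cache; not the real Emacs structures.  */
#define CACHE_SIZE 2
#define MAX_PATTERN 64

struct cache_entry
{
  char source[MAX_PATTERN]; /* Regexp text this slot was compiled from.  */
  bool in_use;              /* Slot holds a compiled pattern.  */
  bool mutable_code;        /* Assumed flag: matching rewrites the bytecode.  */
  unsigned users;           /* Running matches; nonzero means "busy".  */
};

static struct cache_entry cache[CACHE_SIZE];

/* Return a slot for SOURCE with its use count bumped, or NULL if the
   pattern is too long or every usable slot is busy.  */
static struct cache_entry *
acquire_pattern (const char *source, bool mutable_code)
{
  struct cache_entry *victim = NULL;

  if (strlen (source) >= MAX_PATTERN)
    return NULL;

  for (size_t i = 0; i < CACHE_SIZE; i++)
    {
      struct cache_entry *e = &cache[i];

      if (e->in_use && strcmp (e->source, source) == 0)
        {
          /* Cache hit.  A busy slot may be shared only when its
             bytecode is immutable; otherwise keep looking.  */
          if (e->users == 0 || !e->mutable_code)
            {
              e->users++;
              return e;
            }
        }
      else if (e->users == 0 && victim == NULL)
        /* Only a non-busy slot may be overwritten, whether or not
           its bytecode is mutable.  */
        victim = e;
    }

  if (victim == NULL)
    return NULL; /* Out of non-busy slots; the caller should signal an error.  */

  strcpy (victim->source, source);
  victim->in_use = true;
  victim->mutable_code = mutable_code;
  victim->users = 1;
  return victim;
}

/* Drop one use of E; the slot becomes evictable when the count hits zero.  */
static void
release_pattern (struct cache_entry *e)
{
  assert (e->users > 0);
  e->users--;
}
```

With a two-slot cache, acquiring an immutable pattern twice returns the
same slot both times, yet a request for a third, different pattern gets
NULL until both users have called `release_pattern'. That second part is
the slot protection I mean: it is needed even when sharing is allowed.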