From: Stefan Monnier
Newsgroups: gmane.emacs.help
Subject: Re: will we ever have zero width assertions in regexps?
Date: Fri, 28 Jan 2011 21:51:55 -0500
To: help-gnu-emacs@gnu.org

>> To get rid of the occasional pathological case where matching takes
>> forever and Emacs appears to be frozen.  Programmers who are used to
>> backtracking matchers will usually intuitively stay away from regexps
>> that can show such behaviors, but not all programmers do, and even if
>> you're careful there are cases that are hard to avoid.

> Did you try it with Perl recently (last 10 years or so)?

No, but neither did I bump into pathological cases in Perl before that
(when I did use it).

> As I said, I put some optimizations which in most (AFAIK) practical
> senses remove such pathologies.  (The underlying problems remain; the
> optimizations are only "heuristic"; but one needs to be extra
> inventive to circumvent the optimizations.)

A typical case could look something like "foo *(.*?) *bar" when
matching "foo .... baZ".  Emacs's regexp engine is not very clever and
doesn't do much in terms of avoiding backtracking (it mostly takes care
of A*B when A can only match a single char and B can only start with
a char that's not matched by A), but I can't think of too many ways to
handle the above one efficiently within a "backtracking regexp matcher"
framework.

>> Another minor reason is that it can be handy to have an incremental
>> matching primitive, so you can match over a long string one chunk at
>> a time.  I'm not sure how often this would be useful, but I've come
>> across a few cases where it seemed like it could be put to good use
>> (tho, for lack of experience with it, I can't swear that it would turn
>> out to be a good idea).

> Do not know what you mean by this...

Basically, provide a primitive like (match-string RE STRING LIMIT) that
can not only say "matched between START and END", but also "reached
LIMIT without yet finding a match, here's the suspended SEARCH-STATE at
LIMIT", so you can later resume the search starting at LIMIT by passing
that state.


        Stefan
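
For illustration only, here is a minimal Python sketch of what such a
resumable search could look like when the matcher simulates an NFA
instead of backtracking.  Nothing here is Emacs's API: the names
compile_pattern and search_chunk, the ("matched", START, END) and
("suspended", STATE) return shapes, and the tiny regexp subset
(literals, ".", and postfix "*" on a single character) are all
assumptions made for the example.  It reports the earliest-ending
match, which is not necessarily what Emacs or Perl would report.

```python
from typing import Dict, List, Optional, Tuple, Union

Item = Tuple[str, bool]   # (literal character or ".", is it starred?)
State = Dict[int, int]    # NFA position -> earliest start offset of a live thread
Result = Union[Tuple[str, int, int], Tuple[str, State]]


def compile_pattern(pattern: str) -> List[Item]:
    """Compile a tiny subset: literals, '.', and postfix '*' on one character."""
    items: List[Item] = []
    for ch in pattern:
        if ch == "*":
            if not items or items[-1][1]:
                raise ValueError("'*' must follow a single character")
            items[-1] = (items[-1][0], True)
        else:
            items.append((ch, False))
    return items


def _close(items: List[Item], threads: State) -> None:
    """Let threads skip starred items (zero repetitions), keeping earliest starts."""
    for pos in range(len(items)):
        if pos in threads and items[pos][1]:
            start = threads[pos]
            if pos + 1 not in threads or start < threads[pos + 1]:
                threads[pos + 1] = start


def search_chunk(items: List[Item], chunk: str, offset: int = 0,
                 state: Optional[State] = None) -> Result:
    """Search one chunk; `offset` is the chunk's position in the whole text.

    Returns ("matched", start, end) or, if the chunk ran out first,
    ("suspended", state) where `state` can be passed to the next call.
    """
    threads: State = dict(state) if state is not None else {}
    final = len(items)
    for i in range(len(chunk) + 1):
        here = offset + i
        threads.setdefault(0, here)      # a match may begin at any position
        _close(items, threads)
        if final in threads:
            return ("matched", threads[final], here)
        if i == len(chunk):
            break                        # reached the limit: suspend
        ch = chunk[i]
        advanced: State = {}
        for pos, start in threads.items():
            sym, starred = items[pos]
            if sym == "." or sym == ch:
                target = pos if starred else pos + 1
                if target not in advanced or start < advanced[target]:
                    advanced[target] = start
        threads = advanced
    return ("suspended", threads)


if __name__ == "__main__":
    items = compile_pattern("ab*c")
    status, saved = search_chunk(items, "xxabbb")         # limit reached, no match yet
    print(status)
    print(search_chunk(items, "bbcyy", offset=6, state=saved))
```

The suspended state is just the set of live NFA threads, so it stays
small (at most one entry per pattern position) and the work per input
character is bounded by the pattern length.  A backtracking matcher has
no comparably compact state to hand back, which is part of why this
primitive goes naturally with a non-backtracking engine.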