From: Oliver Scholz
Newsgroups: gmane.emacs.help
Subject: Re: regexp and strings you don't want
Date: Fri, 29 Aug 2003 20:50:18 +0200
References: <6c185cf3.0308251145.6af55ffc@posting.google.com> <844r02luds.fsf@slowfox.is.informatik.uni-duisburg.de>

[Yet another follow-up to myself ...]
[Superseded because of a flaky patch]

Oliver Scholz writes:

> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>
>> chaz2@thedoghousemail.com (Chaz) writes:
>>
>>> For example, how can I search for a paragraph beginning with "The"
>>> that does NOT include the word "top"?
>>
>> It is possible to build a regexp that does this (disregarding the
>> paragraph problem at the moment), but it is not pretty.
>>
>> Some regexp implementations have the feature you're looking for to
>> make it convenient, but the Emacs implementation doesn't.
>>
>> Let me rephrase this in terms of lines instead of paragraphs.
>>
>> The idea is this: search for a line that begins with The and then
>> does not have top after it, as follows: after The, we allow any
>> characters that aren't t.  We also allow a t followed by something
>> that's not o, and also a to that's followed by something that's not
>> p.  And so on:
>>
>> "^The\\([^t]*\\($\\|t$\\|t[^o]\\|to$\\|to[^p]\\)\\)*$"
>
> Hmm. This is not really human readable. Would it be hard and/or bad
> to extend `rx' so that it allows for (not STRING)? À la:
>
>     (looking-at (rx (and line-start
>                          "The "
>                          (not "top"))))
>
> Whereas `(not "top")' would compile to a normal regexp in the way you
> described it. WDYT?

[...]

I've played a bit with this (patch below). But I think I am a bit
puzzled.
With my patch, `(rx (not "top"))' translates to:

"\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"

Is this actually correct? What does the concept of a regexp that
matches a sequence of characters that does _not_ contain a certain
sequence of characters actually mean? Should it match any sequence of
characters not identical to the unwanted one (including the empty
string) or should it match only sequences of the same length? Or any
non-empty sequence of characters not identical with the unwanted one?

With my patch:

(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The top lirum larum")
==> nil

(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The to lirum larum")
==> 0

(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The lirum larum")
==> nil

Is this good or bad?

    Oliver
    (puzzled)

Index: lisp/emacs-lisp/rx.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/emacs-lisp/rx.el,v
retrieving revision 1.3
diff -u -r1.3 rx.el
--- lisp/emacs-lisp/rx.el	23 Dec 2002 17:43:24 -0000	1.3
+++ lisp/emacs-lisp/rx.el	29 Aug 2003 18:46:18 -0000
@@ -334,6 +334,7 @@
                        '(digit control hex-digit blank graphic printing
                                alphanumeric letter ascii nonascii lower
                                punctuation space upper word))
+               (stringp form)
                (and (consp form)
                     (memq (car form) '(not any in syntax category:))))
     (error "Rx `not' syntax error: %s" form))
@@ -343,27 +344,41 @@
 (defun rx-not (form)
   "Parse and produce code from FORM.  FORM is `(not ...)'."
   (rx-check form)
-  (let ((result (rx-to-string (cadr form) 'no-group)))
-    (cond ((string-match "\\`\\[^" result)
-           (if (= (length result) 4)
-               (substring result 2 3)
-             (concat "[" (substring result 2))))
-          ((string-match "\\`\\[" result)
-           (concat "[^" (substring result 1)))
-          ((string-match "\\`\\\\s." result)
-           (concat "\\S" (substring result 2)))
-          ((string-match "\\`\\\\S." result)
-           (concat "\\s" (substring result 2)))
-          ((string-match "\\`\\\\c." result)
-           (concat "\\C" (substring result 2)))
-          ((string-match "\\`\\\\C." result)
-           (concat "\\c" (substring result 2)))
-          ((string-match "\\`\\\\B" result)
-           (concat "\\b" (substring result 2)))
-          ((string-match "\\`\\\\b" result)
-           (concat "\\B" (substring result 2)))
-          (t
-           (concat "[^" result "]")))))
+  (if (stringp (cadr form))
+      (rx-reverse-string (cadr form))
+    (let ((result (rx-to-string (cadr form) 'no-group)))
+      (cond ((string-match "\\`\\[^" result)
+             (if (= (length result) 4)
+                 (substring result 2 3)
+               (concat "[" (substring result 2))))
+            ((string-match "\\`\\[" result)
+             (concat "[^" (substring result 1)))
+            ((string-match "\\`\\\\s." result)
+             (concat "\\S" (substring result 2)))
+            ((string-match "\\`\\\\S." result)
+             (concat "\\s" (substring result 2)))
+            ((string-match "\\`\\\\c." result)
+             (concat "\\C" (substring result 2)))
+            ((string-match "\\`\\\\C." result)
+             (concat "\\c" (substring result 2)))
+            ((string-match "\\`\\\\B" result)
+             (concat "\\b" (substring result 2)))
+            ((string-match "\\`\\\\b" result)
+             (concat "\\B" (substring result 2)))
+            (t
+             (concat "[^" result "]"))))))
+
+(defun rx-reverse-string (string)
+  (let ((list nil))
+    (dotimes (i (length string))
+      (push (rx-reverse-string-1 i string) list))
+    (concat "\\(?:"
+            (mapconcat 'identity (nreverse list) "\\|")
+            "\\)")))
+
+(defun rx-reverse-string-1 (n string)
+  (concat (substring string 0 n)
+          "[^" (string (aref string n)) "]*"))
 
 
 (defun rx-repeat (form)

-- 
12 Fructidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
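
To try the string-building part of the patch without rebuilding rx.el, the two helpers can be evaluated on their own. What follows is only a sketch under that assumption: the `my-' names are hypothetical stand-ins for `rx-reverse-string' and `rx-reverse-string-1', and the "^The " prefix is written by hand where the examples above use `line-start'. Like the patch, it does no quoting, so it is only meaningful for strings without regexp-special characters.

```elisp
;; Hypothetical stand-alone copies of the patch's helpers, renamed
;; with a `my-' prefix so they cannot clash with anything in rx.el.

(defun my-not-string-branch (n string)
  "Return one alternative: the first N chars of STRING, followed by
any run of characters other than the char at index N.
Like the patch, this does not quote regexp-special characters."
  (concat (substring string 0 n)
          "[^" (string (aref string n)) "]*"))

(defun my-not-string-regexp (string)
  "Return a shy group with one alternative per position in STRING."
  (let ((branches nil))
    (dotimes (i (length string))
      (push (my-not-string-branch i string) branches))
    (concat "\\(?:"
            (mapconcat #'identity (nreverse branches) "\\|")
            "\\)")))

(my-not-string-regexp "top")
;; => "\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"

;; The three test strings from the message, with the regexp
;; assembled by hand instead of through `rx'.
(let ((re (concat "^The " (my-not-string-regexp "top") " lirum larum")))
  (list (string-match re "The top lirum larum")
        (string-match re "The to lirum larum")
        (string-match re "The lirum larum")))
;; => (nil 0 nil)
```

The third result is nil simply because the group matches the empty string and there is then no second space left for " lirum larum" to match, which is one concrete face of the "including the empty string?" question raised above.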