unofficial mirror of help-gnu-emacs@gnu.org
From: Oliver Scholz <alkibiades@gmx.de>
Subject: Re: regexp and strings you don't want
Date: Fri, 29 Aug 2003 20:50:18 +0200
Message-ID: <usmnk1eol.fsf@ID-87814.user.dfncis.de>
In-Reply-To: ud6eobfvj.fsf@ID-87814.user.dfncis.de

[Yet another follow-up to myself ...]
[Superseded because of a flaky patch]

Oliver Scholz <alkibiades@gmx.de> writes:

> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>
>> chaz2@thedoghousemail.com (Chaz) writes:
>>
>>> For example, how can I search for a paragraph beginning with "The"
>>> that does NOT include the word "top"?
>>
>> It is possible to build a regexp that does this (disregarding the
>> paragraph problem at the moment), but it is not pretty.
>>
>> Some regexp implementations have the feature you're looking for to
>> make it convenient, but the Emacs implementation doesn't.
>>
>> Let me rephrase this in terms of lines instead of paragraphs.
>>
>> The idea is this: search for a line that begins with The and then
>> does not have top after it, as follows: after The, we allow any
>> characters that aren't t.  We also allow a t followed by something
>> that's not o, and also a to that's followed by something that's not
>> p.  And so on:
>>
>> "^The\\([^t]*\\($\\|t$\\|t[^o]\\|to$\\|to[^p]\\)\\)*$"
>
> Hmm. This is not really human-readable. Would it be hard and/or bad
> to extend `rx' so that it allows for (not STRING)? À la:
>
> (looking-at (rx (and line-start
> 		     "The "
> 		     (not "top"))))
>
> Here `(not "top")' would compile to a normal regexp in the way you
> described. WDYT?
[...]

I've played a bit with this (patch below). But I think I am a bit
puzzled. With my patch, `(rx (not "top"))' translates to:

"\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"

Is this actually correct?

What does the concept of a regexp that matches a sequence of
characters that does _not_ contain a certain sequence of characters
actually mean?

Should it match any sequence of characters not identical to the
unwanted one (including the empty string) or should it match only
sequences of the same length? Or any non-empty sequence of characters
not identical with the unwanted one?

With my patch:

(string-match (rx (and line-start
		       "The "
		       (not "top")
 		       " lirum larum"))
	      "The top lirum larum")
 ==> nil

(string-match (rx (and line-start
		       "The "
		       (not "top")
 		       " lirum larum"))
	      "The to lirum larum")
 ==> 0

(string-match (rx (and line-start
		       "The "
		       (not "top")
 		       " lirum larum"))
	      "The lirum larum")
 ==> nil

Is this good or bad?
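
To probe the question without applying the patch, here is a minimal
sketch that writes out by hand the regexp the patched `rx' would
produce for the form above and tries it on a few more strings. The
names `my-not-top-regexp' and `my-probe' are made up for this
illustration and are not part of rx.el; the inputs "tap" and "cat"
are additional test cases, not taken from the session above.

```elisp
;; Hand-written equivalent of what the patched `rx' generates for
;;   (and line-start "The " (not "top") " lirum larum")
;; The middle part is the alternation quoted earlier in this message.
(defvar my-not-top-regexp
  (concat "^The "
          "\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"
          " lirum larum"))

(defun my-probe (string)
  "Return the result of matching `my-not-top-regexp' against STRING."
  ;; Bind `case-fold-search' to nil so the result does not depend on
  ;; the user's case-folding setting.
  (let ((case-fold-search nil))
    (string-match my-not-top-regexp string)))

(my-probe "The top lirum larum")   ; ==> nil
(my-probe "The to lirum larum")    ; ==> 0
(my-probe "The tap lirum larum")   ; ==> 0
(my-probe "The cat lirum larum")   ; ==> nil
```

The last case suggests the translation is stricter than "anything but
top": "cat" is rejected because `[^t]*' stops at the `t' and the other
two branches must start with `t'. So the generated regexp seems to
describe "a prefix of top followed by characters other than the next
letter of top" rather than "a string that is not top".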

    Oliver (puzzled)


Index: lisp/emacs-lisp/rx.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/emacs-lisp/rx.el,v
retrieving revision 1.3
diff -u -r1.3 rx.el
--- lisp/emacs-lisp/rx.el	23 Dec 2002 17:43:24 -0000	1.3
+++ lisp/emacs-lisp/rx.el	29 Aug 2003 18:46:18 -0000
@@ -334,6 +334,7 @@
 		    '(digit control hex-digit blank graphic printing
 			    alphanumeric letter ascii nonascii lower
 			    punctuation space upper word))
+	      (stringp form)
 	      (and (consp form)
 		   (memq (car form) '(not any in syntax category:))))
     (error "Rx `not' syntax error: %s" form))
@@ -343,27 +344,41 @@
 (defun rx-not (form)
   "Parse and produce code from FORM.  FORM is `(not ...)'."
   (rx-check form)
-  (let ((result (rx-to-string (cadr form) 'no-group)))
-    (cond ((string-match "\\`\\[^" result)
-	   (if (= (length result) 4)
-	       (substring result 2 3)
-	     (concat "[" (substring result 2))))
-	  ((string-match "\\`\\[" result)
-	   (concat "[^" (substring result 1)))
-	  ((string-match "\\`\\\\s." result)
-	   (concat "\\S" (substring result 2)))
-	  ((string-match "\\`\\\\S." result)
-	   (concat "\\s" (substring result 2)))
-	  ((string-match "\\`\\\\c." result)
-	   (concat "\\C" (substring result 2)))
-	  ((string-match "\\`\\\\C." result)
-	   (concat "\\c" (substring result 2)))
-	  ((string-match "\\`\\\\B" result)
-	   (concat "\\b" (substring result 2)))
-	  ((string-match "\\`\\\\b" result)
-	   (concat "\\B" (substring result 2)))
-	  (t
-	   (concat "[^" result "]")))))
+  (if (stringp (cadr form))
+      (rx-reverse-string (cadr form))
+    (let ((result (rx-to-string (cadr form) 'no-group)))
+      (cond ((string-match "\\`\\[^" result)
+	     (if (= (length result) 4)
+		 (substring result 2 3)
+	       (concat "[" (substring result 2))))
+	    ((string-match "\\`\\[" result)
+	     (concat "[^" (substring result 1)))
+	    ((string-match "\\`\\\\s." result)
+	     (concat "\\S" (substring result 2)))
+	    ((string-match "\\`\\\\S." result)
+	     (concat "\\s" (substring result 2)))
+	    ((string-match "\\`\\\\c." result)
+	     (concat "\\C" (substring result 2)))
+	    ((string-match "\\`\\\\C." result)
+	     (concat "\\c" (substring result 2)))
+	    ((string-match "\\`\\\\B" result)
+	     (concat "\\b" (substring result 2)))
+	    ((string-match "\\`\\\\b" result)
+	     (concat "\\B" (substring result 2)))
+	    (t
+	     (concat "[^" result "]"))))))
+
+(defun rx-reverse-string (string)
+  (let ((list nil))
+    (dotimes (i (length string))
+      (push (rx-reverse-string-1 i string) list))
+    (concat "\\(?:"
+	    (mapconcat 'identity (nreverse list) "\\|")
+	    "\\)")))
+
+(defun rx-reverse-string-1 (n string)
+  (concat (substring string 0 n)
+	  "[^" (string (aref string n)) "]*"))
 
 
 (defun rx-repeat (form)

-- 
12 Fructidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
