unofficial mirror of bug-gnu-emacs@gnu.org 
* (regexp-opt-depth "[\\(]") => 1  :-(
@ 2003-04-20 22:30 Alan Mackenzie
  2003-04-23 11:26 ` Alan Mackenzie
From: Alan Mackenzie @ 2003-04-20 22:30 UTC


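There Is No Alternative:  regexp-opt-depth MUST analyse its argument
properly, rather than just searching it for an unescaped "\\(" that is
not followed by "?"; a "\\(" inside a character set such as "[\\(]" does
not open a group.  The following rewrite of regexp-opt-depth does just that.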

[Well, OK, it would be as well for somebody else to check the formulation
of regexp-opt-not-groupie*-re.  ;-]

The patch below passes the following test cases:
(regexp-opt-depth "(asdf)")             => 0
(regexp-opt-depth "\\(asdf\\)")         => 1
(regexp-opt-depth "\\(\\(asdf\\)\\)")   => 2
(regexp-opt-depth "\\(?:asdf\\)")       => 0
(regexp-opt-depth "[\\(]")              => 0

(regexp-opt-depth "[a]\\(]asd\\)")      => 1
(regexp-opt-depth "[^a]\\(]asd\\)")     => 1
(regexp-opt-depth "[]\\(]asd)")         => 0
(regexp-opt-depth "[^]\\(]asd)")        => 0
(regexp-opt-depth "\\(? \\)")           signals "invalid regexp".
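The subject-line failure can be reproduced directly. The sketch below is a minimal illustration, not part of the patch: it copies the search regexp of the old regexp-opt-depth (the "!" lines the patch removes) into a hypothetical constant my-old-lp-re, and shows that it finds a group opener inside the character set "[\\(]", while Emacs itself compiles that string as a plain set.

```elisp
;; The search regexp from the old regexp-opt-depth, under the hypothetical
;; name `my-old-lp-re'.  It looks for "(" preceded by an odd-length run of
;; backslashes and followed by any character except "?".
(defconst my-old-lp-re "\\(\\`\\|[^\\]\\)\\\\\\(\\\\\\\\\\)*([^?]")

;; It matches inside the character set "[\\(]": the "[" satisfies "[^\\]",
;; then come the backslash, "(" and "]".  So the old loop counts one group.
(string-match my-old-lp-re "[\\(]")    ; => 0

;; Yet Emacs compiles "[\\(]" as a set matching a backslash or "(": if the
;; "\\(" opened a group, the missing "\\)" would signal invalid-regexp.
(string-match "[\\(]" "(")              ; => 0
```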

*************************************************************************
*** regexp-opt.1.24.el	Fri Apr 18 18:34:34 2003
--- regexp-opt.acm.1.24.el	Sun Apr 20 22:05:33 2003
***************
*** 110,115 ****
--- 110,124 ----
  	   (re (regexp-opt-group sorted-strings open)))
        (if words (concat "\\<" re "\\>") re))))
  
+ (defconst regexp-opt-not-groupie*-re
+   (let ((harmless-ch "[^\\\\[]")
+         (esc-pair-not-lp "\\\\[^(]")
+         (class "\\[^?]?[^]]*]")
+         (shy-lp "\\\\(\\?:"))
+     (concat "\\(" harmless-ch "\\|" esc-pair-not-lp
+             "\\|" class "\\|" shy-lp "\\)*"))
+   "Matches any part of a regular expression EXCEPT for non-shy \"\\\\(\"s")
+ 
  ;;;###autoload
  (defun regexp-opt-depth (regexp)
    "Return the depth of REGEXP.
***************
*** 120,130 ****
      (string-match regexp "")
      ;; Count the number of open parentheses in REGEXP.
      (let ((count 0) start)
!       (while (string-match "\\(\\`\\|[^\\]\\)\\\\\\(\\\\\\\\\\)*([^?]"
! 			   regexp start)
! 	(setq count (1+ count)
! 	      ;; Go back 2 chars (one for [^?] and one for [^\\]).
! 	      start (- (match-end 0) 2)))
        count)))
  \f
  ;;; Workhorse functions.
--- 129,141 ----
      (string-match regexp "")
      ;; Count the number of open parentheses in REGEXP.
      (let ((count 0) start)
!       (while
!           (progn
!             (string-match regexp-opt-not-groupie*-re regexp start)
!             (setq start (match-end 0))
!             (< start (1- (length regexp))))
!         (setq count (1+ count)
!               start (+ start 2)))        ; step START over "\\("
        count)))
  \f
  ;;; Workhorse functions.
*************************************************************************
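To see how the new loop uses the constant: each string-match of it starts at START and, for well-formed regexps it handles, runs up to but not over the next non-shy "\\(", so every time the match stops short of the end there is one more group to count. A sketch, with the constant from the patch above copied under the hypothetical name my-not-groupie-re-v1:

```elisp
;; Copy of the first version of regexp-opt-not-groupie*-re from the patch,
;; under the hypothetical name `my-not-groupie-re-v1'.
(defconst my-not-groupie-re-v1
  (let ((harmless-ch "[^\\\\[]")        ; any char except \ and [
        (esc-pair-not-lp "\\\\[^(]")    ; \ followed by anything but (
        (class "\\[^?]?[^]]*]")         ; a character set, naively
        (shy-lp "\\\\(\\?:"))           ; the shy-group opener \(?:
    (concat "\\(" harmless-ch "\\|" esc-pair-not-lp
            "\\|" class "\\|" shy-lp "\\)*")))

;; The match always succeeds at START (it may be empty) and stops just
;; before the next non-shy "\\(":
(string-match my-not-groupie-re-v1 "[a]\\(]asd\\)")     ; => 0
(match-end 0)                                            ; => 3, before "\\("

;; Stepping START over those two characters, the next match runs to the
;; end of the string, so the loop stops with COUNT = 1.
(string-match my-not-groupie-re-v1 "[a]\\(]asd\\)" 5)   ; => 5
(match-end 0)                                            ; => 11, the length
```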

-- 
Alan Mackenzie (Munich, Germany)


* Re: (regexp-opt-depth "[\\(]") => 1  :-(
From: Alan Mackenzie @ 2003-04-23 11:26 UTC




On Sun, 20 Apr 2003, Alan Mackenzie wrote:

>There Is No Alternative:  regexp-opt-depth MUST analyse its argument
>properly.  The following rewrite of regexp-opt-depth does just that.

>[Well, OK, it would be as well for somebody else to check the formulation
>of regexp-opt-not-groupie*-re.  ;-]

Many thanks to RMS for doing just that and telling me:
# This is a good idea, but it fails on "[[:alpha:]\\(]".
# I think the value for `class' needs to be more sophisticated.
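The failure RMS points out comes from the first `class' sub-regexp, which ends the character set at the first "]" it sees. A small check, with that sub-regexp copied under the hypothetical name my-naive-class-re:

```elisp
;; The `class' part of the first version of regexp-opt-not-groupie*-re,
;; under the hypothetical name `my-naive-class-re'.
(defconst my-naive-class-re "\\[^?]?[^]]*]")

;; It stops at the first "]", which closes [:alpha:], not the enclosing set:
(string-match my-naive-class-re "[[:alpha:]\\(]")   ; => 0
(match-end 0)                                        ; => 10, before "\\(]"

;; The leftover "\\(" is then counted as a group, although Emacs compiles
;; the whole string as one character set; an unmatched group opener would
;; have signaled invalid-regexp here instead.
(string-match "[[:alpha:]\\(]" "(")                  ; => 0
```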

>The patch below passes the following test cases:
>(regexp-opt-depth "(asdf)")             => 0
>(regexp-opt-depth "\\(asdf\\)")         => 1
>(regexp-opt-depth "\\(\\(asdf\\)\\)")   => 2
>(regexp-opt-depth "\\(?:asdf\\)")       => 0
>(regexp-opt-depth "[\\(]")              => 0

>(regexp-opt-depth "[a]\\(]asd\\)")      => 1
>(regexp-opt-depth "[^a]\\(]asd\\)")     => 1
>(regexp-opt-depth "[]\\(]asd)")         => 0
>(regexp-opt-depth "[^]\\(]asd)")        => 0
>(regexp-opt-depth "\\(? \\)")           signals "invalid regexp".

Here is the amended patch with that more sophisticated regexp for class.
In addition to the above test cases, the newer version passes these:

(regexp-opt-depth "[[:alpha:]\\(]")     => 0
(regexp-opt-depth "[[:alpha]\\(")       signals "invalid regexp"
(regexp-opt-depth "[[:alpha]\\(\\)")    => 1
(regexp-opt-depth "[[:alp$ha:]\\(\\)")  signals "invalid regexp"
(regexp-opt-depth "[[alpha:]\\(]\\)")   => 1

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
2003-04-23  Alan Mackenzie  <acm@muc.de>

        * regexp-opt.el (regexp-opt-depth): Don't count a "\\(" which appears
        inside a character set.
        (regexp-opt-not-groupie*-re): New constant.

*** regexp-opt.1.24.el	Fri Apr 18 18:34:34 2003
--- regexp-opt.acm.1.24.el	Tue Apr 22 20:52:53 2003
***************
*** 110,115 ****
--- 110,133 ----
  	   (re (regexp-opt-group sorted-strings open)))
        (if words (concat "\\<" re "\\>") re))))
  
+ (defconst regexp-opt-not-groupie*-re
+   (let* ((harmless-ch "[^\\\\[]")
+          (esc-pair-not-lp "\\\\[^(]")
+          (class-harmless-ch "[^][]")
+          (class-lb-harmless "[^]:]")
+          (class-lb-colon-maybe-charclass ":\\([a-z]+:]\\)?")
+          (class-lb (concat "\\[\\(" class-lb-harmless
+                            "\\|" class-lb-colon-maybe-charclass "\\)"))
+          (class
+           (concat "\\[^?]?"
+                   "\\(" class-harmless-ch
+                   "\\|" class-lb "\\)*"
+                   "\\[?]"))         ; special handling for bare [ at end of re
+          (shy-lp "\\\\(\\?:"))
+     (concat "\\(" harmless-ch "\\|" esc-pair-not-lp
+             "\\|" class "\\|" shy-lp "\\)*"))
+   "Matches any part of a regular expression EXCEPT for non-shy \"\\\\(\"s")
+ 
  ;;;###autoload
  (defun regexp-opt-depth (regexp)
    "Return the depth of REGEXP.
***************
*** 120,130 ****
      (string-match regexp "")
      ;; Count the number of open parentheses in REGEXP.
      (let ((count 0) start)
!       (while (string-match "\\(\\`\\|[^\\]\\)\\\\\\(\\\\\\\\\\)*([^?]"
! 			   regexp start)
! 	(setq count (1+ count)
! 	      ;; Go back 2 chars (one for [^?] and one for [^\\]).
! 	      start (- (match-end 0) 2)))
        count)))
  \f
  ;;; Workhorse functions.
--- 138,149 ----
      (string-match regexp "")
      ;; Count the number of open parentheses in REGEXP.
      (let ((count 0) start)
!       (while
!           (progn
!             (string-match regexp-opt-not-groupie*-re regexp start)
!             (setq start (+ (match-end 0) 2))  ; +2 for "\\(" after match-end.
!             (<= start (length regexp)))
!         (setq count (1+ count)))
        count)))
  \f
  ;;; Workhorse functions.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
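To re-run the test cases from both messages in batch Emacs, here is a self-contained sketch. It copies the amended constant and the new loop from the patch above under hypothetical names (my-not-groupie-re-v2, my-regexp-depth-v2), so it does not depend on whichever regexp-opt.el is installed; the save-match-data wrapper is an assumption, since the patch context does not show the whole function.

```elisp
;; Copy of the amended regexp-opt-not-groupie*-re (hypothetical name).
(defconst my-not-groupie-re-v2
  (let* ((harmless-ch "[^\\\\[]")
         (esc-pair-not-lp "\\\\[^(]")
         (class-harmless-ch "[^][]")
         (class-lb-harmless "[^]:]")
         (class-lb-colon-maybe-charclass ":\\([a-z]+:]\\)?")
         ;; "[" inside a set: either a plain char or a [:name:] class.
         (class-lb (concat "\\[\\(" class-lb-harmless
                           "\\|" class-lb-colon-maybe-charclass "\\)"))
         (class
          (concat "\\[^?]?"
                  "\\(" class-harmless-ch
                  "\\|" class-lb "\\)*"
                  "\\[?]"))           ; bare [ just before the closing ]
         (shy-lp "\\\\(\\?:"))
    (concat "\\(" harmless-ch "\\|" esc-pair-not-lp
            "\\|" class "\\|" shy-lp "\\)*")))

;; Copy of the new loop in regexp-opt-depth (hypothetical name).
(defun my-regexp-depth-v2 (regexp)
  "Count the non-shy groups in REGEXP, as the amended patch does."
  (save-match-data
    (string-match regexp "")          ; signals invalid-regexp if malformed
    (let ((count 0) start)
      (while
          (progn
            ;; The match stops just before the next non-shy "\\(", if any;
            ;; +2 steps START over that "\\(".
            (string-match my-not-groupie-re-v2 regexp start)
            (setq start (+ (match-end 0) 2))
            (<= start (length regexp)))
        (setq count (1+ count)))
      count)))

(my-regexp-depth-v2 "[[:alpha:]\\(]")     ; => 0
(my-regexp-depth-v2 "[[alpha:]\\(]\\)")   ; => 1
```

The cases that signal an error do so from the (string-match regexp "") check, before the constant is used at all.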

-- 
Alan Mackenzie (Munich, Germany)



Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
