Re: using non-Emacs regexp syntax

From: Stuart D. Herring @ 2006-12-01 22:35 UTC
  Cc: rms, emacs-devel


> If you don't mind, I'll work on it now.  Changes can be added to whatever
> .el file in the distribution later.
>
> Also, is there sense in supporting conversion to and from several formats?
> E.g. some require that plus operator is escaped, while everything else is
> not.  E.g. something like this:
>
> 	(convert-regexp :sed :emacs some-regexp)
> 			FROM   TO   PATTERN-STRING
>
> Of course, it will add more complexity, but it shouldn't be much of a
> problem for users of this function and implementing it in Lisp should
> still be not hard.
I've already started on this sort of thing, writing a converter just
between the two formats supported by GNU grep.  (These are
"GNU-extended-basic-RE" and "extended-RE with backreferences".)  As it
happens, that conversion can be done in either direction with one
function because the formats are so similar.  I had planned to go on to
the more general case, but for now I'll just provide what I have for
comment and/or use.  (I have copyright papers on file, so any use is
fine.)  Paul, if you'd like, we can collaborate on this, or whichever of
us you prefer can carry on with it.
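
A rough sketch of how the more general case might look, taking FROM and
TO arguments as in the quoted proposal.  It assumes each syntax can be
described solely by which operators need a backslash to be special; the
names my-regexp-escaped-operators and my-convert-regexp and the syntax
keywords are hypothetical, and bracket expressions are not handled at
all, so this is an illustration rather than a finished design:

```elisp
;; Hypothetical table: for each syntax, the operator characters that are
;; special only when preceded by a backslash.
(defvar my-regexp-escaped-operators
  '((:grep-basic    . "?+(){}|")   ; GNU basic RE
    (:grep-extended . "")          ; extended RE
    (:emacs         . "(){}|"))    ; Emacs: ? and + are special unescaped
  "Alist mapping a syntax keyword to its backslash-requiring operators.")

(defun my-convert-regexp (from to re)
  "Convert regexp RE from syntax FROM to syntax TO.
Sketch only: bracket expressions are not treated specially."
  (let ((from-esc (string-to-list
                   (cdr (assq from my-regexp-escaped-operators))))
        (to-esc (string-to-list
                 (cdr (assq to my-regexp-escaped-operators))))
        (i 0) (len (length re)) ret)
    (while (< i len)
      (let* ((escaped (and (eq (aref re i) ?\\) (< (1+ i) len)))
             (ch (aref re (if escaped (1+ i) i)))
             ;; Toggle the backslash only for operators whose escaping
             ;; convention differs between the two syntaxes.
             (flip (not (eq (and (memq ch from-esc) t)
                            (and (memq ch to-esc) t)))))
        (when (if flip (not escaped) escaped)
          (push ?\\ ret))
        (push ch ret)
        (setq i (+ i (if escaped 2 1)))))
    (concat (nreverse ret))))
```

With this table, (my-convert-regexp :grep-basic :grep-extended
"a\\(b\\|c\\)\\+") gives "a(b|c)+", and converting that result from
:grep-extended to :emacs gives "a\\(b\\|c\\)+".  Syntaxes that differ in
more than escaping would need more than a table of characters.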

For reference/goal purposes, I've been looking at the (somewhat outdated)
Mastering Regular Expressions and it describes these syntaxes:
1. vi
2. (modern) grep
3. egrep
4. sed
5. lex
6. old awk
7. new awk(s) (don't know how different they really are from each other or
from old awk)
8. Emacs
9. Perl (obviously we can only convert a subset of Perl's syntax...)
10. Tcl
11. a Tcl library called Expect (although I don't know if/why it has a
different syntax from Tcl itself)
12. Python (complicated by the old regex and the new re packages, and how
the former had a variable syntax)

Hope it's helpful,
Davis

PS - I originally wrote this using some convenience macros of mine.  It
seems to work now that I've rewritten it in standard Lisp, but if it
doesn't, that rewrite is probably why.

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.

[-- Attachment #2: convert-re.el --]
[-- Type: application/octet-stream, Size: 1396 bytes --]

;; Remember the exceedingly-basic regexes as used by sed(1)... might need to
;; support them too, although converting into them can be a pain.  Obviously,
;; in general you can't have just one function.

(defun convert-regexp (re)
	"Convert the regexp RE from basic to extended format or back."
	(let ((chars (string-to-list re)) ret backslash)
		(while chars
			(let ((curchar (car chars)))
				(cond
				 ((eq curchar ?\\)
					(unless (setq backslash (not backslash))
						(push ?\\ ret) (push ?\\ ret)))
				 ((eq curchar ?\[)
					(if backslash (progn (push ?\\ ret) (push ?\[ ret))
						;; Otherwise, it's a character class:
						(push ?\[ ret)
						(setq chars (cdr chars))
						(let ((level 1) (first 0))
							(while (and chars (> level 0))
								(let ((clch (car chars)))
									(push clch ret)
									(cond
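									 ;; `incf'/`decf' need the cl package; plain `setq' avoids that.
									 ((eq clch ?\[) (setq level (1+ level)))
									 ((eq clch ?\]) (unless first (setq level (1- level))))
									 ((eq clch ?^) (if first (setq first t)))))
								;; FIRST steps t -> 0 -> nil.  It starts at 0 (true for one
								;; character); a leading `^' resets it to t, so the character
								;; after the `^' still counts as first and a `]' there is literal.
								(setq first (and first (unless (numberp first) 0)))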
								(unless (zerop level) (setq chars (cdr chars)))))))
				 ((memq curchar (string-to-list "?+()|{}"))
					(unless backslash (push ?\\ ret))
					(push (car chars) ret))
				 (t (if backslash (push ?\\ ret)) (push (car chars) ret))))
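			;; BACKSLASH steps t -> 0 -> nil, so it stays true for exactly one
			;; character after the backslash that set it.
			(setq backslash (and backslash (unless (numberp backslash) 0))
						chars (cdr chars)))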
		(concat (nreverse ret))))
