all messages for Emacs-related lists mirrored at yhetil.org
* using non-Emacs regexp syntax
From: Paul Pogonyshev @ 2006-11-28 20:56 UTC


Hi,

Is there a function to convert non-Emacs regexps (e.g. "ab(c+|d)") to
Emacs regexps (for this example, "ab\(c+\|d\)")?
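
To make the difference concrete, here is a small sketch that uses only
the built-in string-match; the sample strings are made up for
illustration.  Emacs reads bare parentheses and "|" as literal
characters, so the same pattern text means different things in the two
syntaxes:

```elisp
;; Emacs syntax: \( \) group and \| alternates (backslashes are
;; doubled inside a Lisp string).
(string-match "ab\\(c+\\|d\\)" "xabccc")   ; => 1

;; The non-Emacs spelling is still a valid Emacs regexp, but here the
;; parentheses and the bar only match themselves.
(string-match "ab(c+|d)" "xabccc")         ; => nil
(string-match "ab(c+|d)" "ab(cc|d)")       ; => 0
```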

If there is none, are you interested in adding such functions?  (Of
course, not now, but after the release.)  I assume it is not worth it
to implement in C, so a Lisp implementation is in order?

(The abstract task is this: to be able to read regexps from an (XML)
file that should be readable not only by Emacs; since Emacs syntax
is not widespread, the regexps would use a different syntax.)

Paul

* Re: using non-Emacs regexp syntax
From: Stuart D. Herring @ 2006-12-01 22:35 UTC
  Cc: rms, emacs-devel


> If you don't mind, I'll work on it now.  Changes can be added to whatever
> .el file in the distribution later.
>
> Also, does it make sense to support conversion to and from several
> formats?  For example, some require that the plus operator be escaped,
> while everything else is not.  Something like this:
>
> 	(convert-regexp :sed :emacs some-regexp)
> 			FROM   TO   PATTERN-STRING
>
> Of course, it will add more complexity, but it shouldn't be much of a
> problem for users of this function, and implementing it in Lisp should
> still not be hard.

I've already started on this sort of thing, writing a converter just
between the two formats supported by GNU grep.  (These are
"GNU-extended-basic-RE" and "extended-RE with backreferences".)  As it
happens, that conversion can be done with one function because the formats
are so similar.  I had planned to go on to the more general case, but for
now I'll just provide what I have for comment and/or use.  (I have papers,
so any use is fine.)  If, Paul, you'd like, we can collaborate on this, or
one of us of your choice can go on with it.
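
For the more general case, one possible shape is a table-driven
converter with the FROM/TO call form proposed in the quoted message.
What follows is only a sketch under stated assumptions, not the code
attached below: the names my-convert-regexp, my-regexp-syntaxes and
my-regexp-bracket-end and the keywords :grep-basic, :grep-extended and
:emacs are all hypothetical.  The table records which characters need a
backslash to act as operators (GNU basic syntax escapes ( ) | { } + ?,
Emacs escapes ( ) | { }, extended syntax escapes none of them).  The
sketch ignores anchors, backreferences and every other
context-dependent rule.

```elisp
(defconst my-regexp-toggled-chars '(?\( ?\) ?| ?{ ?} ?+ ?\?)
  "Characters whose operator or literal role can depend on a backslash.")

(defconst my-regexp-syntaxes
  '((:grep-basic    ?\( ?\) ?| ?{ ?} ?+ ?\?)
    (:grep-extended)
    (:emacs         ?\( ?\) ?| ?{ ?}))
  "Hypothetical table mapping a syntax name to the characters that
need a backslash in that syntax to act as operators.")

(defun my-regexp-bracket-end (re start)
  "Return the index just past the bracket expression at START in RE."
  (let ((j (1+ start)) (len (length re)))
    ;; A leading ^ and then a leading ] do not close the bracket.
    (when (and (< j len) (eq (aref re j) ?^)) (setq j (1+ j)))
    (when (and (< j len) (eq (aref re j) ?\])) (setq j (1+ j)))
    (while (and (< j len) (not (eq (aref re j) ?\])))
      (if (and (eq (aref re j) ?\[) (< (1+ j) len) (eq (aref re (1+ j)) ?:))
          ;; Skip a class such as [:alpha:] so its ] is not the end.
          (let ((close (string-match ":\\]" re (+ j 2))))
            (setq j (if close (+ close 2) len)))
        (setq j (1+ j))))
    (min len (1+ j))))

(defun my-convert-regexp (from to re)
  "Convert regexp string RE from syntax FROM to syntax TO.
FROM and TO are keys of `my-regexp-syntaxes'."
  (dolist (syntax (list from to))
    (unless (assq syntax my-regexp-syntaxes)
      (error "Unknown regexp syntax: %S" syntax)))
  (let ((from-esc (cdr (assq from my-regexp-syntaxes)))
        (to-esc (cdr (assq to my-regexp-syntaxes)))
        (i 0) (len (length re)) (out '()))
    (while (< i len)
      (let* ((escaped (and (eq (aref re i) ?\\) (< (1+ i) len)))
             (step (if escaped 2 1))
             (ch (aref re (if escaped (1+ i) i))))
        (if (and (eq ch ?\[) (not escaped))
            ;; Bracket expressions are copied through unchanged.
            (let ((end (my-regexp-bracket-end re i)))
              (setq out (append (nreverse (string-to-list (substring re i end)))
                                out)
                    i end))
          (when (memq ch my-regexp-toggled-chars)
            ;; CH is an operator in FROM when its escaping matches the table.
            (let ((operator (eq escaped (and (memq ch from-esc) t))))
              ;; Re-escape so that CH keeps the same role in TO.
              (setq escaped (if operator
                                (and (memq ch to-esc) t)
                              (not (memq ch to-esc))))))
          (when escaped (push ?\\ out))
          (push ch out)
          (setq i (+ i step)))))
    (concat (nreverse out))))

(my-convert-regexp :grep-extended :emacs "ab(c+|d)")    ; => "ab\\(c+\\|d\\)"
(my-convert-regexp :emacs :grep-basic "ab\\(c+\\|d\\)") ; => "ab\\(c\\+\\|d\\)"
```

Keeping the per-syntax differences in data means that a further syntax
with the same kind of difference would be one more table row rather
than one more function; syntaxes that differ in other ways (sed's more
basic form, Perl) would need more than this table can express.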

For reference/goal purposes, I've been looking at the (somewhat outdated)
Mastering Regular Expressions and it describes these syntaxes:
1. vi
2. (modern) grep
3. egrep
4. sed
5. lex
6. old awk
7. new awk(s) (don't know how different they really are from each other or
from old awk)
8. Emacs
9. Perl (obviously we can only convert a subset of Perl's syntax...)
10. Tcl
11. a Tcl library called Expect (although I don't know if/why it has a
different syntax from Tcl itself)
12. Python (complicated by the old regex and the new re packages, and how
the former had a variable syntax)

Hope it's helpful,
Davis

PS - I originally wrote this using some convenience macros of mine.  It
seems to work after I standardized it, but if it doesn't, that's
probably why.

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.

[-- Attachment #2: convert-re.el --]
[-- Type: application/octet-stream, Size: 1396 bytes --]

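;; Remember the exceedingly basic regexes as used by sed(1); they might need
;; to be supported too, although converting into them can be a pain.
;; Obviously, in general you can't have just one function.

(defun convert-regexp (re)
  "Convert the regexp RE from basic to extended format or back.
The two formats differ only in whether ?, +, (, ), |, { and } need a
backslash to act as operators, so one function converts both ways."
  ;; BACKSLASH and FIRST are three-state flags: t when just set, 0 one
  ;; step later, then nil.  Any non-nil value counts as set.
  (let ((chars (string-to-list re)) ret backslash)
    (while chars
      (let ((curchar (car chars)))
        (cond
         ((eq curchar ?\\)
          ;; A second backslash in a row is a literal backslash.
          (unless (setq backslash (not backslash))
            (push ?\\ ret) (push ?\\ ret)))
         ((eq curchar ?\[)
          (if backslash (progn (push ?\\ ret) (push ?\[ ret))
            ;; Otherwise, it's a character class; copy it unchanged.
            (push ?\[ ret)
            (setq chars (cdr chars))
            (let ((level 1) (first 0))
              (while (and chars (> level 0))
                (let ((clch (car chars)))
                  (push clch ret)
                  (cond
                   ;; Only [: [. and [= open a nested construct; a bare [
                   ;; is literal here.  Plain arithmetic replaces
                   ;; incf/decf, which needed the cl package.
                   ((and (eq clch ?\[) (memq (cadr chars) '(?: ?. ?=)))
                    (setq level (1+ level)))
                   ;; A ] in first position is literal.
                   ((eq clch ?\]) (unless first (setq level (1- level))))
                   ;; A leading ^ keeps the next char in first position.
                   ((eq clch ?^) (if first (setq first t)))))
                (setq first (and first (unless (numberp first) 0)))
                (unless (zerop level) (setq chars (cdr chars)))))))
         ;; These characters flip between operator and literal.
         ((memq curchar (string-to-list "?+()|{}"))
          (unless backslash (push ?\\ ret))
          (push curchar ret))
         (t (if backslash (push ?\\ ret)) (push curchar ret))))
      (setq backslash (and backslash (unless (numberp backslash) 0))
            chars (cdr chars)))
    (concat (nreverse ret))))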


Thread overview: 15+ messages
2006-11-28 20:56 using non-Emacs regexp syntax Paul Pogonyshev
2006-11-29 16:26 ` Richard Stallman
2006-11-29 16:38   ` Drew Adams
2006-11-29 17:23     ` David Kastrup
2006-11-29 19:13       ` Paul Pogonyshev
2006-11-29 20:53         ` Jari Aalto
2006-11-30  2:11         ` Drew Adams
2006-11-30 14:26           ` Stefan Monnier
2006-12-05  5:16             ` Drew Adams
2006-12-01 20:30       ` Stuart D. Herring
2006-11-29 19:06   ` Paul Pogonyshev
  -- strict thread matches above, loose matches on Subject: below --
2006-12-01 22:35 Stuart D. Herring
2006-12-01 22:54 ` Paul Pogonyshev
2006-12-03 20:22   ` Juri Linkov
2006-12-02  2:38 ` Stefan Monnier
