unofficial mirror of help-gnu-emacs@gnu.org
* regexp to match a sexp?
@ 2006-07-29  1:55 Drew Adams
From: Drew Adams @ 2006-07-29  1:55 UTC


This is a job description for a regexp.

I'm looking for a regexp that will match (only) a sexp. Ideally, by using
syntax classes, it might be able to adapt to (some) other languages, but a
regexp that worked for only Emacs-Lisp sexps would be a great start.

For Lisp, for example, here are some considerations I can think of (are
there others?):

1. It would need to match either a list sexp or an atomic sexp. The former
would be something like \s(.*\s), but `.' would need to be refined here
(how?). The latter would have to account for symbols, as in \s_+, but also
arbitrary numbers and strings. (Is anything else a readable Emacs-Lisp
atom?)

2. Strings and the possible escaping of `"' would be one headache that would
need to be dealt with carefully, as always.

3. It would need to be effectively recursive or some approximation thereof,
for example, with some limit placed on nesting. That is, it would need to
allow for nested sexps.

4. It would need to deal properly with quoting, `''. Dealing with backquote
syntax, ``', would be a plus.
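A rough sketch of the atomic cases in items 1, 2 and 4, written in Python's `re` syntax so it can be tried outside Emacs. It is an approximation under stated assumptions, not Emacs's reader: literal characters stand in for Emacs syntax classes, any run of characters other than whitespace, brackets, quotes, semicolon, backquote and comma counts as a symbol or number, and only the quote, function-quote (#'), backquote, comma and comma-at prefixes are recognized.

```python
import re

# Rough approximation of Emacs-Lisp atoms in Python `re` syntax.
# Assumption: literal characters stand in for Emacs syntax classes.
STRING = r'"(?:[^"\\]|\\.)*"'                # "..." allowing \" and other escapes
SYMBOL_OR_NUMBER = r"""[^\s()\[\]"';`,]+"""  # foo, foo-bar, 42, -1.5e3, 1+
PREFIX = r"(?:#'|,@|[',`])*"                 # #'  ,@  '  ,  `  (any number of them)

ATOM = re.compile(PREFIX + "(?:" + STRING + "|" + SYMBOL_OR_NUMBER + ")", re.DOTALL)

def atoms(text):
    """Return each (possibly quoted) atom found in TEXT, left to right."""
    return ATOM.findall(text)

print(atoms(r'''(list 'foo "a \"b\" c" 42 #'car)'''))
# ['list', "'foo", '"a \\"b\\" c"', '42', "#'car"]
```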

Can something like this be done in a reasonable way? What's a good regexp
that you could use, e.g., to search for one or more sexps?

I'm not looking for a way to search for or scan a sexp *without* using a
regexp; I know there are ways to do that. I'm wondering what can be done
*with* a regexp. IOW, imagine that all you have is `C-M-s' (but don't worry
about the expression being too complex to type interactively).

I'm less interested in hearing "it can't be done" than in attempts to do the
job, even if in a rough way.

Any ideas?


* Re: regexp to match a sexp?
@ 2006-07-29  2:24 ` Pascal Bourguignon
From: Pascal Bourguignon @ 2006-07-29  2:24 UTC


"Drew Adams" <drew.adams@oracle.com> writes:
> This is a job description for a regexp.
>
> I'm looking for a regexp that will match (only) a sexp. 

In general, it's not possible.  Regexps cannot match recursive grammars.


> Ideally, by using
> syntax classes, it might be able to adapt to (some) other languages, but a
> regexp that worked for only Emacs-Lisp sexps would be a great start.
>
> For Lisp, for example, here are some considerations I can think of (are
> there others?):
>
> 1. It would need to match either a list sexp or an atomic sexp. The former
> would be something like \s(.*\s), but `.' would need to be refined here
> (how?). The latter would have to account for symbols, as in \s_+, but also
> arbitrary numbers and strings. (Is anything else a readable Emacs-Lisp
> atom?)

Are you excluding sublists?  With sublists excluded, it's possible,
but note that with \s(.*\s) you cannot ensure that the opening bracket
corresponds to the closing bracket. You'll have to write:

(ELEMENTS)\|\[ELEMENTS\]\|{ELEMENTS}\|<ELEMENTS>\|...

ELEMENTS ::= \("\([^"\\]\|\\.\)*"\|[^]["(){}<>]\)*
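Pascal's scheme can be tried directly with Python's `re` module. In this sketch ELEMENTS is a repetition of either a complete string or one character that is neither a quote nor a bracket, which is what excludes sublists; each delimiter pair gets its own alternative, so an opener can only be closed by its partner. The <...> alternative is left out, on the assumption that only (), [] and {} matter for Lisp.

```python
import re

STRING = r'"(?:[^"\\]|\\.)*"'              # "..." with backslash escapes
PLAIN = r'[^"()\[\]{}]'                    # not a quote, not any bracket
ELEMENTS = "(?:" + STRING + "|" + PLAIN + ")*"

# One alternative per delimiter pair, so "(" can only be closed by ")".
FLAT_LIST = re.compile(
    r"\(" + ELEMENTS + r"\)"
    + "|" + r"\[" + ELEMENTS + r"\]"
    + "|" + r"\{" + ELEMENTS + r"\}",
    re.DOTALL,
)

def flat_lists(text):
    """Return every bracketed group in TEXT that contains no nested brackets."""
    return [m.group(0) for m in FLAT_LIST.finditer(text)]

print(flat_lists('(a "x)" b) [c] (d (e) f)'))
# ['(a "x)" b)', '[c]', '(e)']
```

A ")" inside a string does not end the list, and a list that contains a sublist is skipped in favour of the innermost group that fits.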


> 2. Strings and the possible escaping of `"' would be one headache that would
> need to be dealt with carefully, as always.

No, they are simple to deal with in regexps. See above.


> 3. It would need to be effectively recursive or some approximation thereof,
> for example, with some limit placed on nesting. That is, it would need to
> allow for nested sexps.

Yes, that's why it's not possible with regexps. Regexps are not
recursive by definition.


> 4. It would need to deal properly with quoting, `''. Dealing with backquote
> syntax, ``', would be a plus.
>
> Can something like this be done in a reasonable way? What's a good regexp
> that you could use, e.g., to search for one or more sexps?
>
> I'm not looking for a way to search for or scan a sexp *without* using a
> regexp; I know there are ways to do that. I'm wondering what can be done
> *with* a regexp. IOW, imagine that all you have is `C-M-s' (but don't worry
> about the expression being too complex to type interactively).
>
> I'm less interested in hearing "it can't be done" than in attempts to do the
> job, even if in a rough way.

Well, since it's not possible, it can't be done, but you can still go
through the mirror and see if it's possible, since you prefer that.  Good bye.



Now, if your purpose is to _*PARSE*_ sexps instead of using regexps,
then you can easily write a sexp parser.  This is one of the simplest
grammars there is.  In Emacs, of course you can use the provided sexp
parser, with functions such as: forward-sexp, backward-sexp,
(thing-at-point 'sexp), read-from-string, etc...
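To make the contrast concrete: in the toy reader below, a regexp only splits the text into tokens, and nesting is handled by recursion, which is the part a single regexp cannot express. This is a minimal Python sketch, not Emacs's `read`: the token rules are assumptions, only the quote prefix is expanded, strings keep their quotes, and there is no error handling (a missing closing paren, for example, raises IndexError).

```python
import re

# The regexp only tokenizes: parens, quote, strings, and bare atoms.
TOKEN = re.compile(r"""[()']|"(?:[^"\\]|\\.)*"|[^\s()'"]+""", re.DOTALL)

def tokenize(text):
    return TOKEN.findall(text)

def read(tokens, pos=0):
    """Read one sexp starting at tokens[pos]; return (sexp, next_pos)."""
    tok = tokens[pos]
    if tok == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":          # recursion handles any depth
            item, pos = read(tokens, pos)
            items.append(item)
        return items, pos + 1
    if tok == "'":
        quoted, pos = read(tokens, pos + 1)
        return ["quote", quoted], pos
    return tok, pos + 1                     # atom: symbol, number or string

print(read(tokenize("(defun f (x) '(a \"b c\"))")))
# (['defun', 'f', ['x'], ['quote', ['a', '"b c"']]], 12)
```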

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay


* RE: regexp to match a sexp?
@ 2006-07-29  3:59   ` Drew Adams
From: Drew Adams @ 2006-07-29  3:59 UTC


    > I'm looking for a regexp that will match (only) a sexp.

    In general, it's not possible.  Regexps cannot match recursive grammars.

I know that. Again, as I said:

    > I'm less interested in hearing "it can't be done" than in
    > attempts to do the job, even if in a rough way.

    Are you excluding sublists?

No, as I said. Excluding them would be one legitimate approximation. But the
real exercise is to allow for some degree of nesting.

    With sublists excluded, it's possible,
    but note that with \s(.*\s) you cannot ensure that the opening bracket
    corresponds to the closing bracket.

Yes.

Of course, with no sublists, you can use \s(\S)*\s).
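One caveat, if "no sublists" restricts only the list being matched while the surrounding text may still nest: \S) excludes closing parens but not opening ones, so a match that starts at an outer paren runs on to the first closing paren it meets. Excluding openers as well, with something like \s([^()]*\s), avoids that. The Python stand-ins below assume literal round parens in place of the syntax classes and only illustrate the difference.

```python
import re

# Python stand-ins for the Emacs patterns (assumption: round parens only).
LOOSE = re.compile(r"\([^)]*\)")    # like \s(\S)*\s): excludes closers only
TIGHT = re.compile(r"\([^()]*\)")   # like \s([^()]*\s): excludes openers too

def first_match(pattern, text):
    """Return the leftmost match of PATTERN in TEXT, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match(LOOSE, "(a (b) c)"))  # (a (b)   <- not a sexp
print(first_match(TIGHT, "(a (b) c)"))  # (b)
```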

    You'll have to write:

    (ELEMENTS)\|\[ELEMENTS\]\|{ELEMENTS}\|<ELEMENTS>\|...
    ELEMENTS ::= \("\([^"\\]\|\\.\)*"\|[^]["(){}<>]\)*

I'm interested in candidate regexps that might be useful in limited
contexts. Yes, different regexps will have different limitations, and
therefore be differently useful.

Think of this as a quick-and-dirty `C-M-s' that might (*might*) just
usefully find a sexp some of the time.

    > 2. Strings and the possible escaping of `"' would be one
    > headache that would need to be dealt with carefully, as always.

    No, they are simple to deal with in regexps. See above.

In practice, we do deal with them in Emacs, albeit with limited success. I'm
not trying to write a sexp grammar; I'm asking about what can be done,
practically, with regexps, in terms of matching sexps.

    > 3. It would need to be effectively recursive or some
    > approximation thereof, for example, with some limit placed
    > on nesting. That is, it would need to allow for nested sexps.

    Yes, that's why it's not possible with regexps. Regexps are not
    recursive by definition.

Yes, I know that.

Some approximations can be made. You proposed an approximation of zero
nesting. That in itself can be useful in some contexts, but one level of
nesting is also useful, and two levels,...
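Bounded nesting can be built mechanically: start from the zero-nesting pattern and, at each extra level, allow the previous pattern as one more kind of element. A minimal Python sketch, assuming only round parens delimit lists and strings are "..." with backslash escapes:

```python
import re

STRING = r'"(?:[^"\\]|\\.)*"'  # "..." with backslash escapes
PLAIN = r'[^"()]'              # any character that is not a quote or a paren

def nested_list_pattern(depth):
    """Regexp for a (...) list containing sublists nested at most DEPTH deep."""
    # Depth 0: elements are strings or plain characters, no sublists.
    pattern = r"\((?:" + STRING + "|" + PLAIN + r")*\)"
    # Each extra level admits the previous pattern as one more element type.
    for _ in range(depth):
        pattern = r"\((?:" + STRING + "|" + PLAIN + "|" + pattern + r")*\)"
    return pattern

def find_list(text, depth):
    """Leftmost list in TEXT that fits within DEPTH levels of nesting."""
    m = re.search(nested_list_pattern(depth), text, re.DOTALL)
    return m.group(0) if m else None

for d in range(3):
    print(d, find_list("(a (b (c)) d)", d))
# 0 (c)
# 1 (b (c))
# 2 (a (b (c)) d)
```

Each level embeds the previous pattern once, so the regexp grows linearly with depth, and because the three element alternatives start with different characters, backtracking should not blow up. A pattern that is too shallow still finds the innermost lists that fit.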

    > 4. It would need to deal properly with quoting, `''. Dealing
    > with backquote syntax, ``', would be a plus.
    >
    > Can something like this be done in a reasonable way? What's a
    > good regexp that you could use, e.g., to search for one or
    > more sexps?
    >
    > I'm not looking for a way to search for or scan a sexp
    > *without* using a regexp; I know there are ways to do that.
    > I'm wondering what can be done *with* a regexp. IOW, imagine
    > that all you have is `C-M-s' (but don't worry
    > about the expression being too complex to type interactively).
    >
    > I'm less interested in hearing "it can't be done" than in
    > attempts to do the job, even if in a rough way.

    Well, since it's not possible, it can't be done, but you can still go
    through the mirror and see if it's possible, since you prefer
    that.  Good bye.

    Now, if your purpose is to _*PARSE*_ sexps instead of using regexps,

I stated that my purpose was the opposite. If I ask about turtles in
Alabama, why do you tell me what you know about wine in Tuscany? ;-)

    then you can easily write a sexp parser.  This is one of the simplest
    grammars there is.  In Emacs, of course you can use the provided sexp
    parser, with functions such as: forward-sexp, backward-sexp,
    (thing-at-point 'sexp), read-from-string, etc...

I know about those. No, I'm not interested in them here.

I'm interested in how much could usefully be done with a regexp.
The question is what you can do with `C-M-s' - what interesting regexps
would you use to find which classes of sexp or almost-sexp?
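For `C-M-s' itself, a pattern with bounded nesting, of the kind described above for one or two levels, can be emitted as Emacs regexp text. The Python helper below is hypothetical, not an Emacs facility; it uses Emacs's syntax, in which ( and ) are literal, \( and \) group, and \| separates alternatives, and it assumes round parens only. The printed text is typed as-is at the `C-M-s' prompt (in a Lisp string each backslash would have to be doubled). Since `.' does not match a newline in Emacs, a backslash-newline inside a string is not covered, and a long match may still hit Emacs's regexp stack limit.

```python
# Emacs regexp syntax, not Python's: "(" and ")" match themselves,
# "\(" "\)" group, "\|" separates alternatives.
EMACS_STRING = r'"\([^"\\]\|\\.\)*"'   # inside [...] a backslash is literal
EMACS_PLAIN = r'[^"()]'                # not a quote, not a paren

def emacs_nested_list_regexp(depth):
    """Emacs regexp for a (...) list with sublists nested at most DEPTH deep."""
    element = EMACS_STRING + r"\|" + EMACS_PLAIN
    pattern = "(" + r"\(" + element + r"\)*" + ")"
    for _ in range(depth):
        # Allow the previous level's pattern as one more element.
        pattern = "(" + r"\(" + element + r"\|" + pattern + r"\)*" + ")"
    return pattern

print(emacs_nested_list_regexp(0))
# (\("\([^"\\]\|\\.\)*"\|[^"()]\)*)
```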


* Re: regexp to match a sexp?
@ 2006-07-29  4:22 ` Pascal Bourguignon
From: Pascal Bourguignon @ 2006-07-29  4:22 UTC


"Drew Adams" <drew.adams@oracle.com> writes:

> I stated that my purpose was the opposite. If I ask about turtles in
> Alabama, why do you tell me what you know about wine in Tuscany? ;-)

Because there is absolutely no turtle in Alabama.  The closest you can
get is wine in Tuscany. If you don't want it, then you're just asking
for something impossible, so be happy with what you have, that is,
nothing.

>     then you can easily write a sexp parser.  This is one of the simplest
>     grammars there is.  In Emacs, of course you can use the provided sexp
>     parser, with functions such as: forward-sexp, backward-sexp,
>     (thing-at-point 'sexp), read-from-string, etc...
>
> I know about those. No, I'm not interested in them here.
>
> I'm interested in how much could usefully be done with a regexp.
> The question is what you can do with `C-M-s' - what interesting regexps
> would you use to find which classes of sexp or almost-sexp?


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"By filing this bug report you have challenged the honor of my
family. Prepare to die!"
